Preprocessing an image to identify the text area before extracting text with tesseract-OCR

I am using ImageMagick to preprocess a receipt image before using the tesseract-OCR engine to extract the text. I removed noise from the image using:

convert input.png -colorspace gray \
  \( +clone -blur 0x2 \) +swap -compose divide -composite \
  -linear-stretch 5%x0%   photocopy.png

      

Now I need to crop the image to the area that contains the text. ImageMagick has a masking feature that can remove the border around a shape, but in my case creating the mask does not seem to work because of the uneven background.
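To make the goal concrete, here is a rough sketch of the kind of crop described above, not a tested solution for real receipts. It assumes the cleaned-up image (for example photocopy.png) has dark text on a light background, that a fixed 50% threshold separates the two, and that the kernel sizes suit the scan resolution. The function name crop_text_area and both kernel sizes are hypothetical choices for illustration.

```shell
# Hypothetical helper: crop_text_area IN OUT
# Assumes IN is already cleaned up (dark text on a light background).
crop_text_area() {
  cta_in="$1"
  cta_out="$2"

  # Build a temporary mask in memory and ask only for its bounding box:
  #   -threshold/-negate : text becomes white on a black background
  #   Close (15x5)       : merges neighbouring glyphs into larger blobs
  #   Open  (7x7)        : drops specks smaller than the kernel
  #   %@                 : trim bounding box as WxH+X+Y, without trimming
  cta_box=$(convert "$cta_in" -threshold 50% -negate \
    -morphology Close Rectangle:15x5 \
    -morphology Open Rectangle:7x7 \
    -format '%@' info:) || return 1

  # Crop the original (not the mask) to that box and reset the canvas offset.
  convert "$cta_in" -crop "$cta_box" +repage "$cta_out"
}
```

It would be called as `crop_text_area photocopy.png cropped.png`. Any dark region larger than the Open kernel, such as a shadow along the edge of the receipt, still extends the box, which is exactly the uneven-background problem mentioned above.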

I have read about the Stroke Width Transform (SWT), which is used to identify text in natural images. Can I do something similar with ImageMagick (or maybe another handy image-processing tool for developers) to identify the text so that the borders can be dropped? Thanks in advance.
