Preprocessing an image to identify the text area before extracting text with tesseract-OCR

I am using ImageMagick to preprocess a receipt image before using the tesseract-OCR engine to extract the text. I removed noise from the image using:

convert input.png -colorspace gray \
  \( +clone -blur 0x2 \) +swap -compose divide -composite \
  -linear-stretch 5%x0%   photocopy.png

      

Now I need to crop the image to the area that contains the text. ImageMagick has a masking feature for removing the border around a shape, but in my case creating the mask doesn't seem to work because the background of the image is uneven.

I have read about the Stroke Width Transform (SWT), which is used to identify text in natural images. Can I do this with ImageMagick (or maybe another handy image processing tool for developers) to identify the text so that the borders can be dropped? Thanks in advance.
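
To make the goal concrete, here is a rough sketch of the kind of crop I am after. It is not SWT: it swaps in ImageMagick's blur-then-trim trick for noisy images, and it assumes photocopy.png already has dark text on a nearly white background. The function name crop_text_area is made up for illustration, and the blur and fuzz values are guesses that would need tuning per image.

```shell
#!/bin/sh
# Hypothetical helper: crop an image to the bounding box of its dark content.
# Assumption: the input has dark text on a near-white background, as
# photocopy.png should after the divide/linear-stretch step above.
crop_text_area() {
  src=$1
  dst=$2

  # Blur a temporary copy so isolated specks fade into the background, then
  # ask -trim which box it would keep. %wx%h%O prints that box as WxH+X+Y.
  # The 0x15 blur and 15% fuzz are starting guesses, not tuned values.
  box=$(convert "$src" -virtual-pixel edge -blur 0x15 -fuzz 15% -trim \
    -format '%wx%h%O' info:)

  # Apply the box to the unblurred image and reset the virtual canvas.
  convert "$src" -crop "$box" +repage "$dst"
}
```

It would be called as `crop_text_area photocopy.png cropped.png`, followed by `tesseract cropped.png result`. The box is measured on a blurred copy because a plain -trim stops at the first stray speck near the edge, while the crop itself is taken from the sharp image so the text stays readable for OCR.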

imagemagick ocr tesseract


