Sample preprocessing for identification of a text area in image acquisition before using tesseract-OCR to extract text

Question


I am using ImageMagick to preprocess a receipt image before passing it to the tesseract-OCR engine to extract the text. I removed noise from the image using:

convert input.png -colorspace gray \
  \( +clone -blur 0x2 \) +swap -compose divide -composite \
  -linear-stretch 5%x0%   photocopy.png
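The command above is a divide-by-blur normalization. Below is a minimal sketch that wraps it in a reusable shell function with comments on what each step does. The function name `clean_for_ocr` and its arguments are hypothetical. It assumes ImageMagick 6's `convert` is on the PATH (with ImageMagick 7, `magick` is the equivalent). The blur sigma and stretch values are the ones from the question and may need tuning per image.

```shell
# clean_for_ocr IN BASE (hypothetical helper): writes BASE-clean.png,
# a lighting-normalized grayscale copy of IN intended as tesseract input.
clean_for_ocr() {
  in=$1
  base=$2
  # Divide the gray image by a blurred copy of itself. The blur estimates
  # the local background brightness, so the division flattens shadows and
  # lighting gradients. The sigma (2, as in the question) may need raising
  # so that the blur smooths over the text strokes rather than following them.
  # -linear-stretch 5%x0% then clips the darkest 5% of pixels to black,
  # boosting text contrast without clipping any highlights.
  convert "$in" -colorspace gray \
    \( +clone -blur 0x2 \) +swap -compose divide -composite \
    -linear-stretch 5%x0% "$base-clean.png" || return 1
}
```

For example, `clean_for_ocr receipt.png receipt` writes `receipt-clean.png`, which can then be passed to tesseract as `tesseract receipt-clean.png receipt` (tesseract writes the recognized text to `receipt.txt`).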

Now I need to crop the area containing the text. ImageMagick has a masking feature for removing the border around an image, but in my case creating the mask doesn't seem to work because the background is uneven.

I went through SWT (Stroke Width Transform), a method for identifying text in natural images, here. Can I achieve something similar with ImageMagick (or another handy image-processing tool for developers) to identify the text so the borders can be dropped? Thanks in advance.
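SWT itself is not an ImageMagick operator. A much simpler heuristic, and explicitly not SWT, is to binarize the cleaned image, use morphology to merge characters into line blobs and drop small specks, and then crop to the bounding box of what remains. The sketch below assumes ImageMagick 6's `convert`. The function name `crop_to_text`, the 60% threshold, and the kernel sizes are illustrative guesses. Because it treats every remaining dark region as text, dark borders that survive normalization will still widen the box.

```shell
# crop_to_text IN OUT (hypothetical helper): crops IN to the bounding box
# of its dark pixels after morphological cleanup and prints that box.
crop_to_text() {
  in=$1
  out=$2
  # Pixels at or below 60% intensity count as "ink"; -negate makes ink
  # white on a black background. Close (dilate, then erode) with a wide
  # 15x3 rectangle merges neighbouring characters on a line; Open with
  # Square:2 (a 5x5 square) removes white specks smaller than the kernel.
  # '%@' prints the trim bounding box (WxH+X+Y) without actually trimming.
  bbox=$(convert "$in" -colorspace gray -threshold 60% -negate \
           -morphology Close Rectangle:15x3 \
           -morphology Open Square:2 \
           -format '%@' info:) || return 1
  # Crop the original image (not the mask) and reset the page offset.
  convert "$in" -crop "$bbox" +repage "$out" || return 1
  echo "$bbox"
}
```

Running `crop_to_text photocopy.png cropped.png` on the output of the cleanup command would then give a tighter image for tesseract. The threshold and kernel sizes should be tuned to the text size of the receipts.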


imagemagick ocr tesseract

Sanjay sharma, Jan 8 '15 at 11:17


No one has answered this question yet

Similar questions:

Image processing to improve the accuracy of Tesseract OCR
OCR: Image to Text?
Improve OCR Tesseract results with blurry text
Remove the border of an image with ImageMagick
OCR of text printed on a metal plate
Tesseract works only for images containing nothing but text: crop the image to get only the text part
ImageMagick: remove background noise and leave it white
Preserving color composer when performing OCR with ImageMagick and tesseract
Why does tesseract-ocr not detect the text that is in the box?
ImageMagick for image preview for tesseract-ocr

