Image preprocessing to identify the text area before using Tesseract OCR to extract text
I am using ImageMagick to preprocess a receipt image before using the Tesseract OCR engine to extract the text. I removed noise from the image using:
convert input.png -colorspace gray \
\( +clone -blur 0x2 \) +swap -compose divide -composite \
-linear-stretch 5%x0% photocopy.png
Now I need to crop the image down to the area that contains the text. ImageMagick can remove the border around an image's shape by using a mask, but in my case creating the mask doesn't seem to work because the background of the image is uneven.
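For reference, a minimal sketch of the kind of crop step meant here, assuming the normalized photocopy.png ends up with a near-white background. It uses ImageMagick's -fuzz and -trim options, not a mask and not SWT. The helper name crop_text_area, the output name cropped.png, and the 15% default tolerance are hypothetical and would need tuning per image.

```shell
# Hypothetical helper: trim near-white margins from an already normalized image.
# Assumes ImageMagick's `convert` is on PATH.
crop_text_area() {
  in="$1"            # input image, e.g. photocopy.png
  out="$2"           # output image, e.g. cropped.png
  fuzz="${3:-15%}"   # how far from white a pixel may be and still count as margin

  # Add a 1px white border so -trim keys on white, trim everything within the
  # fuzz tolerance of that colour, then reset the virtual canvas with +repage.
  convert "$in" -bordercolor white -border 1x1 \
    -fuzz "$fuzz" -trim +repage "$out"
}

# Usage (not run here):
#   crop_text_area photocopy.png cropped.png 15%
```

This only strips margins that are roughly uniform after normalization. Dark clutter near the edges of the receipt would still stop the trim early, which is where a text detector such as SWT would be needed.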
I came across the Stroke Width Transform (SWT), which is used to identify text in natural images. Can I do this with ImageMagick (or another handy image processing tool for developers) to identify the text so the borders can be dropped? Thanks in advance.