Image processing to improve Tesseract OCR

I am using Tesseract to convert documents to text. Document quality varies widely, and I'm looking for advice on which image processing steps can improve the results. I've noticed that highly pixelated text, such as fax output, is especially difficult for Tesseract; presumably the jagged edges of the characters confuse its shape recognition.

Which image processing techniques will improve accuracy? I applied a Gaussian blur to smooth the pixelated images and saw a slight improvement, but I'm hoping there is a more targeted method that gives better results: for example, a filter tuned to black-and-white images that smooths irregular edges, followed by a filter that increases contrast to make the characters sharper.
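To make the smooth-then-sharpen idea concrete, here is a minimal sketch in plain Python. It rests on several assumptions that are not part of my actual setup: the image is a list of rows of 8-bit grayscale values, the function names are hypothetical, a 3x3 mean filter stands in for the Gaussian blur, and a fixed threshold of 128 stands in for the contrast step. A real pipeline would more likely rely on an image library.

```python
def box_blur(img):
    """Apply a 3x3 mean filter; border pixels reuse their nearest neighbour."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index to the image
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index to the image
                    total += img[yy][xx]
            out[y][x] = total // 9
    return out


def binarize(img, threshold=128):
    """Push every pixel to pure black (0) or pure white (255)."""
    return [[255 if p >= threshold else 0 for p in row] for row in img]


def smooth_edges(img, threshold=128):
    """Blur first so stray pixels are averaged away, then restore hard edges."""
    return binarize(box_blur(img), threshold)
```

With these settings an isolated black or white pixel is absorbed by its surroundings, while a straight boundary between a black region and a white region stays where it was. Whether this actually helps Tesseract on fax-quality input is something I have not verified.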

Any general tips for anyone new to image processing?

+3
tesseract




No one has answered this question yet

Check out similar questions:

110
image processing to improve the accuracy of Tesseract OCR
21
Image preprocessing for OCR Tesseract with OpenCV
15
iOS Tesseract OCR Image Preperation
10
Is there a way to improve Tesseract OCR with small fonts?
4
Digital numbers on Tesseract OCR
3
Tesseract OCR, reading low resolution font / pixels (especially numbers)
2
Next step of image preprocessing for OCR with Tesseract (tess4j)
1
How to fix Tesseract OCR page segmentation using image processing?
0
Enlarge OCR faxes with Tesseract
-1
Android OCR Project


