Image processing to improve Tesseract OCR
I am using Tesseract to convert documents to text. Document quality varies widely, and I'm looking for advice on what image processing can improve the results. I noticed that highly pixelated text, such as fax machine output, is especially difficult for Tesseract to handle. Apparently the jagged edges of the characters confuse its shape recognition.
What image processing techniques will improve accuracy? I used a Gaussian blur to smooth out the pixelated images and saw a slight improvement, but I hope there is a more specific method that gives better results. For example, a filter tuned to black-and-white images that smooths out irregular edges, followed by a filter that increases the contrast to make the characters sharper.
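To make the blur-then-contrast idea concrete, here is a minimal pure-Python sketch of what I have in mind. It assumes a grayscale image stored as a list of rows with values from 0 to 255. The function names, the 3x3 kernel, and the cutoff of 128 are my own illustrative choices, not anything Tesseract requires, and a real pipeline would presumably use an image library instead of nested loops.

```python
# 3x3 Gaussian-like kernel; the weights sum to 16.
KERNEL = [[1, 2, 1],
          [2, 4, 2],
          [1, 2, 1]]


def gaussian_blur_3x3(img):
    """Blur a grayscale image (list of rows, values 0-255).

    Pixels outside the image are taken from the nearest edge pixel.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index
                    acc += KERNEL[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc // 16
    return out


def threshold(img, cutoff=128):
    """Push every pixel to pure black (0) or pure white (255)."""
    return [[255 if p >= cutoff else 0 for p in row] for row in img]


def smooth_and_binarize(img, cutoff=128):
    """Blur first to soften jagged edges, then threshold to restore contrast."""
    return threshold(gaussian_blur_3x3(img), cutoff)


if __name__ == "__main__":
    # White page with a single stray black pixel in the middle.
    page = [[255] * 5 for _ in range(5)]
    page[2][2] = 0
    for row in smooth_and_binarize(page):
        print(row)
```

The blur spreads each pixel over its neighbours, so isolated specks and single-pixel jags fade toward the background, and the threshold then snaps everything back to pure black or white. Whether this actually helps Tesseract on fax-quality scans is exactly what I am unsure about.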
Any general tips for anyone new to image processing?