Image processing to improve Tesseract OCR

Question

I am using Tesseract to convert documents to text. Document quality varies widely, and I'm looking for advice on which image processing steps can improve the results. I have noticed that heavily pixelated text, such as that produced by fax machines, is especially hard for Tesseract to handle; presumably the jagged edges of the characters confuse its shape recognition.

Which image processing techniques will improve accuracy? I applied a Gaussian blur to smooth the pixelated images and saw a slight improvement, but I'm hoping there is a more specific method that gives better results: for example, a filter tuned to black-and-white images that smooths irregular edges, followed by a filter that increases contrast to make the characters sharper.
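To make the idea concrete, here is a minimal sketch of the two-step filter I have in mind: blur a black-and-white image, then threshold it back to pure black and white. It is plain Python on a list of lists (1 = ink, 0 = paper) so it runs without any dependencies; the function names `box_blur`, `threshold`, and `smooth_edges`, the 3x3 window, and the 0.5 cutoff are all my own assumptions for illustration, not anything Tesseract provides. A box blur stands in for the Gaussian blur here, and a real pipeline would presumably use an image library instead.

```python
def box_blur(img):
    """Average each pixel with its 3x3 neighbourhood (clipped at the borders).

    `img` is a list of rows holding 0 (paper) or 1 (ink); the result holds
    floats in [0, 1].
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            count = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        total += img[ny][nx]
                        count += 1
            out[y][x] = total / count
    return out


def threshold(gray, cutoff=0.5):
    """Push every pixel back to pure ink (1) or pure paper (0)."""
    return [[1 if value > cutoff else 0 for value in row] for row in gray]


def smooth_edges(img, cutoff=0.5):
    """Blur, then re-binarize: removes specks and fills single-pixel notches."""
    return threshold(box_blur(img), cutoff)


if __name__ == "__main__":
    # A solid 5x5 block of ink with a one-pixel hole in the middle.
    block = [[1] * 5 for _ in range(5)]
    block[2][2] = 0
    for row in smooth_edges(block):
        print("".join("#" if p else "." for p in row))
```

With these settings the filter acts like a majority vote over each 3x3 window, so it drops isolated specks and fills one-pixel holes. It will also erase strokes that are only one pixel wide, which is why I suspect enlarging the image first matters for small fax text.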

Any general tips for anyone new to image processing?

+3

tesseract

Sagar Parmar, June 30 '15 at 6:24

No one has answered this question yet
