Training Tesseract for Specific Words - Possible?

I want to use Tesseract to extract 10-20 keywords from a document. The document contains only English characters and words. I'm interested in something like "Age: 23". Here Age is the keyword I'm interested in, and I want to extract 23 (the value for that keyword).

The first approach that comes to mind is to OCR the entire page into text and then search the recognized text for the keywords. But from a Tesseract training point of view, is there a better approach that gives higher accuracy, given that I know the keywords in advance?
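The first approach (OCR the whole page, then search the text) can be sketched as below. This is only an illustration: the function name extract_fields and the sample text are hypothetical, the OCR step itself is assumed to have already produced a plain string, and the separator set (":", "=", "-") is a guess at what the documents contain.

```python
import re

def extract_fields(text, keywords):
    """Return {keyword: value} for each 'Keyword: value' pair found in OCR text."""
    found = {}
    for keyword in keywords:
        # Tolerate some OCR noise around the separator: optional whitespace
        # and ':', '=' or '-' between the keyword and its value.
        pattern = re.compile(
            r"\b" + re.escape(keyword) + r"\s*[:=\-]\s*(\S+)",
            re.IGNORECASE,
        )
        match = pattern.search(text)
        if match:
            # Only the first whitespace-delimited token is taken as the value.
            found[keyword] = match.group(1)
    return found

# Hypothetical OCR output standing in for the recognized page text.
ocr_text = "Name: Jane Doe\nAge: 23\nCity: Paris"
print(extract_fields(ocr_text, ["Age", "City"]))  # {'Age': '23', 'City': 'Paris'}
```

Multi-word values would need a different capture group, for example matching up to the end of the line instead of the first token.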

I am more or less aware of the limitations of Tesseract OCR and am trying to get the best accuracy within these constraints. Thanks for all your recommendations.

+2


1 answer


Try the "bazaar" config in Tesseract, which enables matching against user-supplied words and patterns.
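A minimal sketch of feeding Tesseract a keyword list and a value pattern through its user-words and user-patterns files, which is the mechanism the bazaar config turns on. Everything named here is an assumption for illustration: the helper names, the file names, and the pattern "\d\*" (digits, repeated) are hypothetical, and the --user-words / --user-patterns command-line options are assumed to be available in the installed Tesseract version. How much these hints help depends on the engine in use, so treat it as something to experiment with.

```python
import tempfile
from pathlib import Path

def write_hint_files(directory, keywords, patterns):
    """Write one keyword per line and one pattern per line; return both paths."""
    directory = Path(directory)
    words_path = directory / "keywords.user-words"        # hypothetical file name
    patterns_path = directory / "values.user-patterns"    # hypothetical file name
    words_path.write_text("\n".join(keywords) + "\n", encoding="utf-8")
    patterns_path.write_text("\n".join(patterns) + "\n", encoding="utf-8")
    return words_path, patterns_path

def build_command(image, output_base, words_path, patterns_path):
    """Build (but do not run) a tesseract command line that uses the hint files."""
    return [
        "tesseract", str(image), str(output_base),
        "--user-words", str(words_path),
        "--user-patterns", str(patterns_path),
    ]

# Demo: create the files in a temporary directory and show the command.
with tempfile.TemporaryDirectory() as tmp:
    words, patterns = write_hint_files(tmp, ["Age", "Name"], [r"\d\*"])
    print(build_command("page.png", "page_out", words, patterns))
```

The command list can be passed to subprocess.run once Tesseract is installed; the recognized text can then be searched for the keywords as before.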



+4

