Python Tesseract OCR training for specific wordlist

Question

I am new to OCR and Tesseract.

So far, I have a working script that extracts text from images reasonably well.

My question: is it possible to train Tesseract to extract only the words / characters listed in some kind of dictionary file?

For example, I have a .txt file with a large list of people's names, and I want to teach Tesseract that "SONIA" is not "50NlA", that "YANNICK" is not "VANNlD", etc.

If Tesseract has a list of all possible names, can it give better accuracy? And if the original image contains a lot of people's names plus other information about those people, but I only want to get the names from the OCR and ignore the "noisy information", what can I do? Sorry if this is a stupid question.
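To make the goal concrete: independent of any Tesseract training, the name list could also be applied after OCR, by snapping each recognized token to its closest known name and dropping everything else. The following is only a minimal sketch of that post-processing idea using Python's standard `difflib`; it is not a Tesseract feature. The function names, the confusable-character map, and the 0.6 similarity cutoff are all assumptions made for illustration and would need tuning on real data.

```python
import difflib

# Hypothetical map of characters that OCR often confuses in upper-case names.
# This list is an assumption; extend it for the fonts and images at hand.
CONFUSABLE = str.maketrans({"5": "S", "0": "O", "1": "I", "l": "I", "|": "I"})


def normalize(token):
    """Replace commonly confused characters, then upper-case the token."""
    return token.translate(CONFUSABLE).upper()


def snap_to_wordlist(token, names, cutoff=0.6):
    """Return the closest known name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(normalize(token), names, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def extract_names(ocr_text, names):
    """Keep only the tokens of the OCR output that resemble a known name."""
    found = []
    for token in ocr_text.split():
        name = snap_to_wordlist(token.strip(".,;:()"), names)
        if name is not None:
            found.append(name)
    return found


if __name__ == "__main__":
    # In practice the names would be read from the .txt file, one per line.
    names = ["SONIA", "YANNICK", "PEDRO"]
    print(extract_names("Name: 50NlA age 34 VANNlD", names))
```

A lower cutoff recovers more badly damaged names but also lets more noise through, so with a large name list a stricter value may be needed.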

I have read this https://groups.google.com/forum/#!topic/tesseract-ocr/r5qkHxQOT98 and the guide http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html and created the eng.user-words and bazaar files. What should the next step be? So far it gives me the same results.

Thanks a lot for your time and patience.

+3

python string image-processing ocr tesseract

Inês Martins, Jun 12 '15 at 11:15

No one has answered this question yet
