Tesseract character recognition issues in Android (but not iOS?)

I have created an application that uses Tesseract (v3.03 rc1) to recognize certain text strings. Unfortunately, they are printed in a special font, which required me to create my own trained data file. I've built the app for iOS (using https://github.com/gali8/Tesseract-OCR-iOS for inspiration) and for Android (using https://github.com/rmtheis/tess-two/ for inspiration).

The workflow for both platforms is as follows:

  • I select a bounding rectangle around the relevant text in the preview screen and crop the image accordingly.

  • I use OpenCV to produce a binary image (OpenCV's adaptive thresholding, with the same parameters on both platforms).

  • I pass this binary image to Tesseract. Both platforms (Android and iOS) use the same trained data file.

And yet, iOS recognizes the text strings very well, while Android keeps misidentifying certain characters (6s for Ss, As for Hs).

On both platforms I use the same whitelist string, disable load_freq_dawg and load_system_dawg, and opt to save blob choices.
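For reference, here is a minimal sketch of how the settings described above could be kept in one place so that they are applied identically everywhere. It is plain Java and only an illustration: `OcrSettings` is a hypothetical helper, the whitelist value is up to the caller, and the variable names `load_freq_dawg` and `save_blob_choices` are my assumption about which Tesseract variables are meant.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical helper: collects the Tesseract variables in one map so the
// same set can be pushed into whichever OCR wrapper is in use.
final class OcrSettings {

    // Builds the variable map. "T"/"F" are the string forms Tesseract
    // accepts for boolean variables.
    static Map<String, String> variables(String whitelist) {
        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("tessedit_char_whitelist", whitelist);
        vars.put("load_system_dawg", "F");   // no system dictionary
        vars.put("load_freq_dawg", "F");     // no frequent-word dictionary (assumed name)
        vars.put("save_blob_choices", "T");  // keep per-blob alternatives (assumed name)
        return vars;
    }

    // Applies every variable through the supplied setter, in insertion order.
    static void apply(Map<String, String> vars, BiConsumer<String, String> setter) {
        vars.forEach(setter);
    }
}
```

With tess-two, the setter could be a method reference such as `baseApi::setVariable`, assuming `baseApi` is a `TessBaseAPI` instance and its `setVariable(String, String)` method is used.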

Has anyone encountered a similar situation before? Am I missing a setting on Android that iOS handles automatically? Is there something Android-specific that I haven't thought of?

Any thoughts or advice would be greatly appreciated!



1 answer


So, after a lot of work, I found out what was wrong with my Android app (luckily, it wasn't a problem with Tesseract at all). Since I am more familiar with iOS applications than Android ones, I was not sure how to get the trained data file into the app without requiring the user to download the file onto an external storage device. I found inspiration in this project ( http://www.codeproject.com/Tips/840623/Android-Character-Recognition ), as it loads the trained data file automatically.

However, I had misunderstood how it works. I originally thought that TessDataManager searched the project's local tesseract/tessdata folder for the trained data file (as happens on iOS). That is not what it does. Instead, it checks the app's internal file structure (/data/data/<package name>/files/tesseract/tessdata/<trained data file>) to see whether the file exists, and if it does not, it copies over the trained data file stored in the project's resources (res/raw) directory. In my case it defaulted to the eng file, so it never read my own font file.
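To make the check-then-copy behaviour concrete, here is a minimal sketch in plain Java. It is not the TessDataManager code from the linked project; `TrainedDataInstaller` and `ResourceOpener` are hypothetical names, and the language name is passed in explicitly so that nothing silently falls back to eng. On Android, the data root would typically be something like `new File(context.getFilesDir(), "tesseract").toPath()`, and the opener would wrap however the file is bundled (for example a raw resource or an asset). Note that `java.nio.file` is only available on Android from API level 26; on older versions the same logic would need `File` and `FileOutputStream`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper illustrating "check internal storage, copy if missing".
final class TrainedDataInstaller {

    // Hypothetical abstraction over the bundled copy of the file.
    interface ResourceOpener {
        InputStream open(String fileName) throws IOException;
    }

    // Ensures <dataRoot>/tessdata/<language>.traineddata exists, copying it
    // from the bundled resource only when it is missing. Returns dataRoot,
    // which is the directory Tesseract expects as its data path.
    static Path ensureInstalled(Path dataRoot, String language, ResourceOpener bundled)
            throws IOException {
        Path tessdata = dataRoot.resolve("tessdata");
        Files.createDirectories(tessdata);

        String fileName = language + ".traineddata";
        Path target = tessdata.resolve(fileName);

        if (Files.notExists(target)) {
            // File not present yet: copy the bundled version into place.
            try (InputStream in = bundled.open(fileName)) {
                Files.copy(in, target);
            }
        }
        return dataRoot;
    }
}
```

The important detail for my problem is that the same language name has to be used both here and when initializing Tesseract; if either side is left at eng, the custom font file is never read.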



I hope this helps someone else who runs into a similar problem. Thanks to Robin and RmTheis for your help!
