Android OCR: detect digits only using the popular Tesseract fork tess-two

I am using tess-two (https://github.com/rmtheis/tess-two), the popular Tesseract OCR fork for Android. I have integrated everything and it works.

But I only need to detect numbers. My code at the moment is:

TessBaseAPI baseApi = new TessBaseAPI();
baseApi.init(pathToLngFile, langName);
baseApi.setImage(bitmap);
String recognizedText = baseApi.getUTF8Text();
baseApi.end();
doSomething(recognizedText); 

      

I looked at the FAQ entry on recognizing only digits: https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_recognize_only_digits?

I am using version 3, and for that version the FAQ only offers a command-line solution, which is not relevant to an Android project (I think). So I tried to apply the solution for versions before 3 and added this line:

baseApi.setVariable("tessedit_char_whitelist", "0123456789");

      

My question is what to do about init(). I don't need any language, but I still have to call the init method, and there isn't an init() without arguments.

EDIT: to be more specific:

My end goal is to read a simple document (not a blank Excel sheet) that looks like the attached image: a title and three columns separated by spaces.

My requirement is to make sense of the numbers: to be able to separate them and determine which numbers belong to which row and column.

Thanks,



2 answers


I wanted to do the same, and after a little research I decided to recognize everything, text and numbers, and then keep only the numbers. This works for me:

// Replace every run of non-digit characters with a single space
recognizedText = recognizedText.replaceAll("[^0-9]+", " "); 

      

Now you can do whatever you want with the numbers.

For example, I use this code to split the numbers into a String array and display them in a TextView:



// Trim leading/trailing spaces, then split on the single spaces left by the replacement above
String[] justnumbers = recognizedText.trim().split(" ");
// Show the numbers as a comma-separated list, stripping the "[" and "]" that Arrays.toString adds
YourTextView.setText(Arrays.toString(justnumbers).replaceAll("\\[|\\]", ""));
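
Note that replacing every non-digit run with a space also swallows line breaks, so the row information the question asks about is lost. A minimal sketch of one way to keep it, in plain Java: it assumes the OCR text contains one line per table row and that columns are separated by non-digit characters. The class and method names are hypothetical and not part of tess-two.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of tess-two: keeps row/column structure
// while discarding everything that is not a digit.
final class NumberGridParser {

    // Returns one inner list per text line that contains digits (a row);
    // each inner list holds the digit runs of that line, left to right.
    static List<List<String>> parse(String recognizedText) {
        List<List<String>> rows = new ArrayList<>();
        // Split on line breaks first so that row boundaries are not lost.
        for (String line : recognizedText.split("\\r?\\n")) {
            // Same idea as the answer: collapse each non-digit run into one space.
            String cleaned = line.replaceAll("[^0-9]+", " ").trim();
            if (cleaned.isEmpty()) {
                continue; // e.g. a title or blank line without digits
            }
            List<String> columns = new ArrayList<>();
            for (String token : cleaned.split(" ")) {
                columns.add(token);
            }
            rows.add(columns);
        }
        return rows;
    }
}
```

Calling NumberGridParser.parse(recognizedText) gives rows.get(r).get(c) for the number in row r, column c. A title line that itself contains digits would show up as a row, and a value such as 1.5 would be split into two entries, so this only fits documents with plain integer columns.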

      


Hope it helps.



I did it a little differently. Maybe it will be helpful for someone.

So, you need to initialize the API first.

TessBaseAPI baseApi = new TessBaseAPI();
baseApi.init(datapath, language, ocrEngineMode);

      



Then set the page segmentation mode and the following variables:

baseApi.setPageSegMode(TessBaseAPI.PageSegMode.PSM_SINGLE_LINE);
baseApi.setVariable(TessBaseAPI.VAR_CHAR_BLACKLIST, "!?@#$%&*()<>_-+=/:;'\"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, ".,0123456789");
baseApi.setVariable("classify_bln_numeric_mode", "1");

      

With these settings, the engine only recognizes digits, plus the "." and "," characters allowed by the whitelist.
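
Because the whitelist also lets "." and "," through, the recognized text may still contain stray separators. A minimal sketch of a post-filter in plain Java, assuming a number has at most one "." or "," between digit groups. The class and method names are hypothetical and not part of tess-two.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of tess-two.
final class NumericTokenFilter {

    // Digits, optionally followed by one '.' or ',' and more digits.
    private static final Pattern NUMBER = Pattern.compile("\\d+(?:[.,]\\d+)?");

    // Returns every substring of the OCR output that looks like a number,
    // in order of appearance; isolated '.' and ',' characters are dropped.
    static List<String> extract(String recognizedText) {
        List<String> numbers = new ArrayList<>();
        Matcher matcher = NUMBER.matcher(recognizedText);
        while (matcher.find()) {
            numbers.add(matcher.group());
        }
        return numbers;
    }
}
```

The separator is kept exactly as recognized, so deciding whether "," means a decimal point or a thousands separator is left to the caller.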
