Android OCR: detect digits only using the popular Tesseract fork tess-two
I am using tess-two, the popular Tesseract OCR fork for Android: https://github.com/rmtheis/tess-two . I have integrated everything and it works.
But I only need to detect numbers. My code at the moment is:
TessBaseAPI baseApi = new TessBaseAPI();
baseApi.init(pathToLngFile, langName);
baseApi.setImage(bitmap);
String recognizedText = baseApi.getUTF8Text();
baseApi.end();
doSomething(recognizedText);
I looked at this FAQ entry: https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_recognize_only_digits ?
I am using the V3 version, and for that version the FAQ offers only a command-line solution, which is not relevant to an Android project (I think ...). So I tried the solution for versions before V3 and added this line:
baseApi.setVariable("tessedit_char_whitelist", "0123456789");
My question is what to do with init()? I don't need any language, but I still have to call an init method, and there is no init() ...
EDIT: more specifically
My end goal is to read a simple document (not a blank Excel sheet) that looks like the attached image (a title and three columns separated by spaces).
My requirement is to make sense of the numbers: to be able to separate them and determine which numbers belong to which row and column.
Thanks,
I wanted to do the same, and after a little research I decided to recognize everything, text and numbers, and then keep only the numbers. This works for me:
// Replaces every run of characters other than the digits 0 to 9 with a single space
recognizedText = recognizedText.replaceAll("[^0-9]+", " ");
Now you can do whatever you want with the numbers.
For example, I use this code to split the numbers into a String array and display them in a TextView:
String[] justnumbers = recognizedText.trim().split(" "); // trims leading/trailing spaces and splits the numbers
YourTextView.setText(Arrays.toString(justnumbers).replaceAll("\\[|\\]", "")); // shows the numbers in the TextView without the "[]" added by Arrays.toString
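The filtering step above can also be tried outside Android. The following is only a minimal sketch in plain Java: the class name DigitFilter and the method extractNumbers are hypothetical helpers (not part of tess-two), and the sample string stands in for whatever getUTF8Text() returns. It also guards the case where the text contains no digits at all, because "".split(" ") would otherwise yield one empty string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of tess-two): keeps only the digit runs from OCR output.
final class DigitFilter {

    // Returns every run of consecutive digits found in the text, in order of appearance.
    static List<String> extractNumbers(String recognizedText) {
        List<String> numbers = new ArrayList<>();
        if (recognizedText == null) {
            return numbers;
        }
        // Replace each run of non-digit characters with one space,
        // then trim so that split() produces no leading empty token.
        String cleaned = recognizedText.replaceAll("[^0-9]+", " ").trim();
        if (cleaned.isEmpty()) {
            return numbers; // the text contained no digits
        }
        for (String token : cleaned.split(" ")) {
            numbers.add(token);
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Sample string standing in for the result of baseApi.getUTF8Text().
        System.out.println(String.join(", ", extractNumbers("Total: 12 items, 345 USD")));
        // prints "12, 345"
    }
}
```

Note that this approach splits decimal values such as "4.5" into "4" and "5", since the dot is removed together with all other non-digit characters.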
You can see how it works here .
Hope it helps.
I did it a little differently. Maybe it will be helpful for someone.
So, you need to initialize the API first.
TessBaseAPI baseApi = new TessBaseAPI();
baseApi.init(datapath, language, ocrEngineMode);
Then set the following variables:
baseApi.setPageSegMode(TessBaseAPI.PageSegMode.PSM_SINGLE_LINE);
baseApi.setVariable(TessBaseAPI.VAR_CHAR_BLACKLIST, "!?@#$%&*()<>_-+=/:;'\"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, ".,0123456789");
baseApi.setVariable("classify_bln_numeric_mode", "1");
This way, the engine only recognizes digits, plus the '.' and ',' characters allowed by the whitelist.
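The question also asks how to tell which numbers belong to which row and column. A rough sketch of that post-processing step in plain Java follows. It assumes the recognized text comes back with one document row per line and columns separated by whitespace, which would need a page segmentation mode other than PSM_SINGLE_LINE (or one recognition call per line). The class NumberTableParser and its parse method are hypothetical names, not part of tess-two.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of tess-two): turns digit-only OCR text into rows and columns.
final class NumberTableParser {

    // Splits the text into lines (rows) and each line on whitespace (columns).
    // Lines that contain no digit, such as blank lines, are skipped.
    static List<List<String>> parse(String recognizedText) {
        List<List<String>> rows = new ArrayList<>();
        if (recognizedText == null) {
            return rows;
        }
        for (String line : recognizedText.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (!trimmed.matches(".*[0-9].*")) {
                continue; // nothing numeric on this line
            }
            List<String> columns = new ArrayList<>();
            for (String cell : trimmed.split("\\s+")) {
                columns.add(cell);
            }
            rows.add(columns);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample text standing in for the result of baseApi.getUTF8Text().
        List<List<String>> table = parse("10 20 30\n\n4.5  6,7 8\n");
        // Row 1, column 0 of the sample text.
        System.out.println(table.get(1).get(0)); // prints "4.5"
    }
}
```

Splitting on whitespace only works if the OCR output preserves the gaps between columns. If it does not, the word bounding boxes would be a more reliable basis for assigning columns.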