Tesseract setVariable whitelist for another language
The Tesseract SetVariable whitelist works fine for English. For example, I use this to recognize only digits and letters in an image (excluding special characters such as & * ^ % !):
myOCR->SetVariable("tessedit_char_whitelist",
"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ");
But I cannot do the same for Russian:
myOCR->SetVariable("tessedit_char_whitelist", "0123456789");
Does it work on a different principle? Because this doesn't work. Instead of all the characters I defined, I only get digits in the output; Tesseract ignores all the Russian letters I have whitelisted. The blacklist didn't work either. Is there a way to fix this? Thanks.
3 answers
I had a similar problem on Android (tess-two). It can be solved simply by writing the non-ASCII characters as Unicode escapes; an online tool that converts UTF-8 text to Java escape sequences can generate them for you. For example, for your character set:
tess.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "0123456789\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043A\u043B\u043C\u043D\u043E\u043F\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044A\u044B\u044C\u044D\u044E\u044F\u0410\u0411\u0412\u0413\u0414\u0415\u0416\u0417\u0418\u0419\u041A\u041B\u041C\u041E\u041F\u0420\u0421\u0422\u0423\u0424\u0425\u0426\u0427\u0428\u0429\u042D\u042E\u042F");
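For the C++ API used in the question, the same idea can be sketched by building the whitelist as explicit UTF-8 bytes, so the result does not depend on how the source file is saved. This is only a minimal sketch: it assumes Tesseract reads the whitelist value as UTF-8 and that the engine was initialized with the Russian language data. `buildRussianWhitelist` and `appendUtf8TwoByte` are hypothetical helper names, not part of Tesseract.

```cpp
#include <cassert>
#include <string>

// Append one code point in the range U+0080..U+07FF as a two-byte
// UTF-8 sequence. The basic Cyrillic letters U+0410..U+044F fall
// inside this range.
static void appendUtf8TwoByte(std::string& out, unsigned cp) {
    assert(cp >= 0x80 && cp <= 0x7FF);
    out += static_cast<char>(0xC0 | (cp >> 6));    // lead byte
    out += static_cast<char>(0x80 | (cp & 0x3F));  // continuation byte
}

// Hypothetical helper: digits followed by every letter in
// U+0410..U+044F (uppercase and lowercase Russian letters).
// The letters U+0401 and U+0451 are not included here.
std::string buildRussianWhitelist() {
    std::string whitelist = "0123456789";
    for (unsigned cp = 0x0410; cp <= 0x044F; ++cp) {
        appendUtf8TwoByte(whitelist, cp);
    }
    return whitelist;
}

// Usage, assuming myOCR is an initialized tesseract::TessBaseAPI*:
//   myOCR->SetVariable("tessedit_char_whitelist",
//                      buildRussianWhitelist().c_str());
```

Unlike the escaped string above, this loop covers the whole U+0410..U+044F range, so trim it if some letters should stay excluded.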