Detecting large and small font sizes with the Java Tesseract OCR implementation

Is it possible to OCR an image with Tesseract OCR and detect text of different font sizes in it? If so, do I need a third-party library, or can I use pure Java? For example,

I want to tell the title of a newspaper page apart from its content by font size.

Any help on this would be appreciated.



2 answers


You can use the `ResultIterator.WordFontAttributes` API method (there is an example in Java using Tess4J) to get font information, including the font name and point size of the recognized text.





Tesseract's hOCR output includes line and word bounding boxes, which you can use to determine the size of the text. You can also have it report the font point size in the output by setting the config variable `hocr_font_info`.
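Below is a minimal pure-Java sketch of the bounding-box approach. It assumes you already have hOCR output as a string (for example from running `tesseract image.png out hocr`). It reads each line's `bbox`, takes the box height as a proxy for font size, and treats lines much taller than the median as titles. The class name, the `SAMPLE_HOCR` fragment and the 1.5 threshold are hypothetical. The regexes assume Tesseract's usual hOCR layout (`ocr_line`/`ocrx_word` spans whose `title` starts with `bbox x0 y0 x1 y1`) and plain text inside word spans. A real HTML/XML parser would be more robust.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HocrFontSizeSketch {

    /** One OCR line: its words joined by spaces, and its bounding-box height in pixels. */
    record Line(String text, int height) {}

    // Line-level spans (newer Tesseract versions may also emit ocr_header, ocr_caption or
    // ocr_textfloat). Captures y0 and y1 from "bbox x0 y0 x1 y1".
    private static final Pattern LINE_BBOX = Pattern.compile(
            "class=['\"]ocr_(?:line|header|caption|textfloat)['\"][^>]*?title=['\"]bbox \\d+ (\\d+) \\d+ (\\d+)");

    // Word spans; assumes the word text contains no nested tags such as <strong> or <em>.
    private static final Pattern WORD = Pattern.compile(
            "class=['\"]ocrx_word['\"][^>]*>([^<]*)</span>");

    // Hypothetical hOCR fragment: one tall headline line and two shorter body lines.
    static final String SAMPLE_HOCR =
            "<span class='ocr_line' id='line_1_1' title=\"bbox 40 30 600 90; x_size 60\">"
          + "<span class='ocrx_word' id='word_1_1' title='bbox 40 30 300 90; x_wconf 96'>Election</span> "
          + "<span class='ocrx_word' id='word_1_2' title='bbox 320 30 600 90; x_wconf 95'>Results</span></span>"
          + "<span class='ocr_line' id='line_1_2' title=\"bbox 40 120 600 142; x_size 22\">"
          + "<span class='ocrx_word' id='word_1_3' title='bbox 40 120 120 142; x_wconf 91'>Voters</span> "
          + "<span class='ocrx_word' id='word_1_4' title='bbox 130 120 200 142; x_wconf 92'>turned</span></span>"
          + "<span class='ocr_line' id='line_1_3' title=\"bbox 40 150 600 174; x_size 24\">"
          + "<span class='ocrx_word' id='word_1_5' title='bbox 40 150 90 174; x_wconf 90'>out</span></span>";

    /** Splits hOCR into lines and measures each line's bounding-box height (y1 - y0). */
    static List<Line> parseLines(String hocr) {
        List<Integer> starts = new ArrayList<>();
        List<Integer> heights = new ArrayList<>();
        Matcher m = LINE_BBOX.matcher(hocr);
        while (m.find()) {
            starts.add(m.start());
            heights.add(Integer.parseInt(m.group(2)) - Integer.parseInt(m.group(1)));
        }
        List<Line> lines = new ArrayList<>();
        for (int i = 0; i < starts.size(); i++) {
            // Words of line i lie between its span and the start of the next line's span.
            int end = (i + 1 < starts.size()) ? starts.get(i + 1) : hocr.length();
            Matcher w = WORD.matcher(hocr.substring(starts.get(i), end));
            List<String> words = new ArrayList<>();
            while (w.find()) {
                words.add(w.group(1));
            }
            lines.add(new Line(String.join(" ", words), heights.get(i)));
        }
        return lines;
    }

    /** Returns the text of lines at least `factor` times taller than the median line height. */
    static List<String> titles(List<Line> lines, double factor) {
        if (lines.isEmpty()) {
            return List.of();
        }
        int[] h = lines.stream().mapToInt(Line::height).sorted().toArray();
        int n = h.length;
        double median = (n % 2 == 1) ? h[n / 2] : (h[n / 2 - 1] + h[n / 2]) / 2.0;
        List<String> result = new ArrayList<>();
        for (Line line : lines) {
            if (line.height() >= factor * median) {
                result.add(line.text());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Line> lines = parseLines(SAMPLE_HOCR);
        System.out.println("Titles: " + titles(lines, 1.5));
    }
}
```

Running `main` on the sample prints `Titles: [Election Results]`. The line heights are 60, 22 and 24 pixels, so the median is 24 and only the 60-pixel line passes the 1.5× cut-off. Comparing against the median rather than a fixed pixel value keeps the heuristic independent of scan resolution. If you enable `hocr_font_info`, you could read the per-word `x_fsize` value instead of box heights, assuming your Tesseract build and engine mode actually report it.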


