Tesseract OCR: Parsing Table Cells

I am using Tesseract-OCR v4.0.0 (alpha?) from the command line to extract text from the PNG table shown below:

Input data image (png)

I want Tesseract-OCR to read everything in one cell before moving on to the next cell, rather than continuing to the next word on the same line.

Expected:

. . . John Smith 07 March,2017 Chicago Milwaukee Detroit Pacific Ocean . . .

Actual:

. . . John Smith 07 March,2017 Chicago Pacific Ocean Milwaukee Detroit . . .

I tried:

  • Changing the page segmentation mode with the -psm flag, using values from 0 to 13. Most modes give nearly the same output with slight differences; the rest give unreadable results.
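For reference, the sweep over segmentation modes can be reproduced with a small script. This is only a sketch under assumptions: it assumes a `tesseract` executable on the PATH that accepts the `--psm` spelling (older builds used `-psm`, as above), and `table.png` is a placeholder file name, not the actual input.

```python
import subprocess


def build_command(image_path, psm):
    """Return the tesseract command line for one page segmentation mode."""
    # Using "stdout" as the output base prints the text instead of writing a file.
    # Assumption: this build accepts "--psm"; swap in "-psm" for older builds.
    return ["tesseract", image_path, "stdout", "--psm", str(psm)]


def sweep_psm(image_path, modes=range(0, 14)):
    """Run tesseract once per mode and collect the text (or the error) for each."""
    results = {}
    for psm in modes:
        proc = subprocess.run(
            build_command(image_path, psm),
            capture_output=True,
            text=True,
        )
        if proc.returncode == 0:
            results[psm] = proc.stdout
        else:
            results[psm] = "ERROR: " + proc.stderr.strip()
    return results
```

Calling `sweep_psm("table.png")` returns a dict keyed by mode number, which makes it easy to compare the outputs side by side.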

Is there any other way to configure Tesseract to read the entire content of one cell before moving on to the next? Otherwise, are there any workarounds?
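One possible workaround, sketched here under assumptions rather than as a confirmed Tesseract feature: take word-level bounding boxes (for example from Tesseract's TSV output, which reports left, top, width, height and text per word) and reorder the words by the cell they fall in. The cell rectangles below are hypothetical; they would have to come from somewhere else, such as detecting the table's ruling lines. All coordinates are made up for illustration.

```python
from collections import namedtuple

# Word boxes use the same fields as Tesseract's TSV columns.
Word = namedtuple("Word", "text left top width height")
# Hypothetical cell rectangles, listed in the desired reading order.
Cell = namedtuple("Cell", "left top right bottom")


def cell_index(word, cells):
    """Return the index of the cell containing the word's center point."""
    cx = word.left + word.width / 2
    cy = word.top + word.height / 2
    for i, c in enumerate(cells):
        if c.left <= cx < c.right and c.top <= cy < c.bottom:
            return i
    return len(cells)  # words outside every cell are placed last


def read_by_cell(words, cells):
    """Order words cell by cell, then top to bottom, then left to right."""
    # Simplification: words on one text line are assumed to share the same top.
    ordered = sorted(words, key=lambda w: (cell_index(w, cells), w.top, w.left))
    return " ".join(w.text for w in ordered)


# Made-up layout: a three-line cell next to a one-line cell.
cells = [Cell(0, 0, 100, 60), Cell(100, 0, 250, 60)]
words = [  # listed line by line, as in the actual output
    Word("Chicago", 10, 5, 60, 12),
    Word("Pacific", 110, 5, 50, 12),
    Word("Ocean", 165, 5, 45, 12),
    Word("Milwaukee", 10, 22, 80, 12),
    Word("Detroit", 10, 39, 55, 12),
]
print(read_by_cell(words, cells))  # Chicago Milwaukee Detroit Pacific Ocean
```

Real word boxes on the same line rarely share an exact top value, so a practical version would bucket words into lines with some tolerance before sorting.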
