OCR position matches a frame to a field on a credit card
I am developing an OCR for credit card detection.
After scanning the image, I get a list of words with their positions. Any tips / suggestions on the best approach for determining which words correspond to each field of the credit card (number, date, name)?
For example:
position = 96.00 491.00
text = CARDHOLDER
Thank you in advance
Your first problem is that most OCR engines are not optimized for small amounts of text that take up most of the "page" (or card image, in your case) in spatially separated chunks. They expect lines or pages of text from a scanned book or newspaper, so they are unlikely to handle this well out of the box when analyzing the image.
Since the font is fairly uniform, the characters themselves will likely be recognized well, but the layout will confuse the page segmentation algorithm, so the extracted text may not come out in the correct order. For example, the first "1234" of the card number and the smaller "1234" below it form a single column of text, as do the second group of four digits and the expiration date beneath it.
For specialized cases where you know the layout in advance, you really want to develop your own page segmentation algorithm to break the image into zones, e.g. card number, cardholder name, expiration date. It shouldn't be too hard, because the layout of these elements is standardized on credit cards. Assuming good preprocessing and binarization, you can basically take a horizontal histogram (ink per row) and split the image at the valleys.
Then, extract each zone as a separate image, containing only one line of text, and feed it to OCR.
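The horizontal-histogram zoning described above can be sketched roughly like this. This is a minimal sketch, not a production implementation: it assumes you already have a binarized image as a 2D NumPy array (0 = ink, 255 = background), and the function name and conventions are my own, not from any library.

```python
import numpy as np

def split_into_zones(binary):
    """Split a binarized image (2D array, 0 = ink, 255 = background)
    into horizontal text bands separated by empty rows."""
    # Horizontal histogram: count ink pixels in each row.
    ink_per_row = np.sum(binary < 128, axis=1)

    zones, start = [], None
    for y, ink in enumerate(ink_per_row):
        if ink > 0 and start is None:
            start = y                      # entering a text band
        elif ink == 0 and start is not None:
            zones.append((start, y))       # leaving a text band
            start = None
    if start is not None:                  # band runs to the bottom edge
        zones.append((start, len(ink_per_row)))
    return zones
```

Each `(top, bottom)` pair can then be cropped out (e.g. `binary[top:bottom, :]`) and fed to the OCR as a separate single-line image. In practice you may also want to merge bands separated by only a pixel or two of background, since binarization noise can split a line.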
Alternatively (quick and dirty approach):
- Tell the OCR that what it is looking at is a single block of text (i.e. prevent it from trying to figure out the page layout for itself). With Tesseract you can do this via the `-psm` parameter (page segmentation mode), probably set to 6 (but experiment to see which value gives the best results).
- Have Tesseract produce hOCR output, which you can enable in its config file. The hOCR format includes bounding boxes for each line, given relative to the entire image.
- Write an algorithm that compares the bounding boxes in the hOCR output to where you know each card field should be (look for some percentage of overlap; it won't match exactly, for obvious reasons).
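The overlap-matching step could look something like the following. The zone coordinates are hypothetical placeholders for a 640x400 card image; the real values depend on your image resolution, and you would parse the actual boxes out of the hOCR `bbox` attributes (here they are just passed in as `(x1, y1, x2, y2)` tuples).

```python
# Hypothetical field zones for a 640x400 card image (x1, y1, x2, y2).
ZONES = {
    "number": (40, 160, 600, 220),
    "expiry": (40, 240, 300, 290),
    "name":   (40, 310, 450, 360),
}

def overlap_fraction(box, zone):
    """Fraction of box's area that falls inside zone."""
    x1 = max(box[0], zone[0]); y1 = max(box[1], zone[1])
    x2 = min(box[2], zone[2]); y2 = min(box[3], zone[3])
    if x2 <= x1 or y2 <= y1:
        return 0.0
    inter = (x2 - x1) * (y2 - y1)
    area = (box[2] - box[0]) * (box[3] - box[1])
    return inter / area

def classify(box, threshold=0.5):
    """Assign an hOCR line box to the card field it overlaps most,
    or None if no field overlaps it enough."""
    best = max(ZONES, key=lambda field: overlap_fraction(box, ZONES[field]))
    return best if overlap_fraction(box, ZONES[best]) >= threshold else None
```

Using a threshold rather than exact equality is the point here: OCR bounding boxes are tight around the glyphs, so they will sit somewhere inside your expected zone rather than matching it pixel for pixel.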