Change font size and threshold for scanned Arabic words
I am working on an Arabic OCR system for printed, scanned documents. Some of the scanned documents are set in a font size of 8, which is quite small. I want to resize the text to a height of 60px, but resizing introduces artifacts, possibly because of the nature of Arabic characters, and some characters may overlap. I have tried local thresholding methods after resizing, but the results are still unacceptable. Any ideas?
This is an example image:
This is the same example after resizing and applying a local adaptive threshold using 50 as the window size:
As you can see, there are some gaps in some characters like this:
Is there a way to resize the image while preserving the shape of the text?
My approach to fixing character gaps:
- Threshold the original image before resizing, using local adaptive thresholding with a window size of 16 (this resolves the character gaps, but the holes in the characters get filled in). Name it smallbw.
- Resize smallbw with imresize(smallbw, [nh nw], 'nearest') and fill the holes in the symbols using imfill.
- Resize the original image to a height of 60px using imresize(originalIm, [nh nw], 'nearest') and name it largebw.
- Fill the holes in largebw with imfill and name it bwfill.
- Extract the holes from largebw as bwholes = bwfill - largebw.
- Finally, subtract bwholes from smallbw to get this:
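The steps above can be sketched roughly as follows. This is only an illustration under several assumptions, not the exact code I ran: the file name word.png, the offset C, and the anonymous function localThresh are placeholders (localThresh is a simple local-mean threshold standing in for the adaptive thresholding function from the link below), largebw is assumed to be thresholded with a window of 50 after resizing, and the final subtraction is assumed to use the resized, hole-filled version of smallbw so that the sizes match.

```matlab
% Hypothetical input file; replace with the scanned word or line image.
originalIm = im2double(imread('word.png'));
if size(originalIm, 3) == 3
    originalIm = rgb2gray(originalIm);      % work on a grayscale image
end

% Stand-in local threshold (assumption): a pixel counts as text (true) when
% it is darker than the mean of its ws-by-ws neighbourhood by more than C.
localThresh = @(im, ws, C) ...
    (imfilter(im, fspecial('average', ws), 'replicate') - im) > C;
C = 0.03;                                   % assumed offset, tune per scan

% Target size: 60px high, width scaled to keep the aspect ratio.
nh = 60;
nw = round(size(originalIm, 2) * nh / size(originalIm, 1));

% Step 1: threshold at the original size with a window of 16.
smallbw = localThresh(originalIm, 16, C);

% Step 2: upscale the binary image and fill the holes in the symbols.
smallbwLarge = imfill(imresize(smallbw, [nh nw], 'nearest'), 'holes');

% Step 3: upscale the original image, then threshold it (window 50 assumed).
largebw = localThresh(imresize(originalIm, [nh nw], 'nearest'), 50, C);

% Step 4: fill the holes in largebw.
bwfill = imfill(largebw, 'holes');

% Step 5: the holes are the pixels added by the fill (bwfill - largebw).
bwholes = bwfill & ~largebw;

% Step 6: cut the recovered holes back out of the upscaled small image.
result = smallbwLarge & ~bwholes;

imshow(~result);                            % show as dark text on white
```

The logical expressions bwfill & ~largebw and smallbwLarge & ~bwholes are used in place of arithmetic subtraction only because the images are logical arrays; the intent is the same as in the list above.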
You can see here that the gap in the symbol from the third image above has been resolved, but another problem arises: some symbols may overlap, as shown here.
These are the best results I have been able to achieve so far. Are there any other ideas that might solve these problems? If you think this problem has no solution, how could I approach it without resizing? Would using size 12 text instead of 8 help?
Useful links: Using the local adaptive threshold method
Operating system: Windows 7
Programming environment: MATLAB R2013a with the Image Processing Toolbox