Change font size and threshold for scanned Arabic words

I am working on an Arabic OCR system for printed, scanned documents. Some of the scanned documents are set in a font size of 8, which is quite small in height. I want to resize the text to a height of 60px, but resizing introduces artifacts because of the nature of Arabic characters, and some characters may overlap. I've used local thresholding methods after resizing, but the results are still unacceptable. Any ideas?

This is an example image:

Image 1

This is the same example after resizing and applying a local adaptive threshold using 50 as the window size:

Image 2

As you can see, there are some gaps in some characters like this:

Image 3

Is there a way to resize the image while preserving the shape of the text?

My approach to fixing character gaps:

  • Threshold the original image before resizing, using local adaptive thresholding with a window size of 16, and name the result smallbw. (This resolves the character gaps, but the holes in the characters get filled.)

  • Resize smallbw with imresize(smallbw, [nh nw], 'nearest') and fill the holes in the symbols using imfill.

  • Resize the original image to a height of 60px using imresize(originalIm, [nh nw], 'nearest') and name it largebw.

  • Fill the holes in largebw with imfill and name the result bwfill.

  • Extract the holes from largebw as bwholes = bwfill - largebw.

  • Finally, subtract bwholes from the resized smallbw to get this:

Image 4
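For reference, here is a minimal MATLAB sketch of the steps above. Several things in it are assumptions rather than part of my actual code: the file name word.png is hypothetical, the local-mean threshold (localThresh) stands in for the adaptive threshold function from the link below, the offset C is a guess, and largebw is assumed to be thresholded with a window size of 50 as for Image 2. Text pixels are treated as foreground (true) so that imfill fills the loops inside characters.

```matlab
% Sketch of the gap-fixing steps; names marked "assumed" are not from the post.
originalIm = im2double(imread('word.png'));   % hypothetical file name
if size(originalIm, 3) == 3
    originalIm = rgb2gray(originalIm);        % work on a single grayscale channel
end

% Local mean threshold (assumed stand-in for the linked method): a pixel is
% text (true) if it is darker than its ws-by-ws neighbourhood mean by more than C.
localThresh = @(I, ws, C) I < (imfilter(I, fspecial('average', ws), 'replicate') - C);
C = 0.03;                                     % assumed offset

nh = 60;                                      % target height in pixels
nw = round(size(originalIm, 2) * nh / size(originalIm, 1));  % keep aspect ratio

% Step 1: threshold at the original size (closes gaps, but also fills holes).
smallbw = localThresh(originalIm, 16, C);

% Step 2: upscale the binary image and fill the holes in the symbols.
smallbwUp = imresize(smallbw, [nh nw], 'nearest');
smallbwUp = imfill(smallbwUp, 'holes');

% Step 3: upscale the original, then threshold it (window size 50, assumed).
largeIm = imresize(originalIm, [nh nw], 'nearest');
largebw = localThresh(largeIm, 50, C);

% Step 4: fill the holes in largebw.
bwfill = imfill(largebw, 'holes');

% Step 5: the holes are what imfill added (logical form of bwfill - largebw).
bwholes = bwfill & ~largebw;

% Step 6: cut the holes back out of the upscaled smallbw.
result = smallbwUp & ~bwholes;

imshow(~result);                              % display as dark text on white
```

The logical operations (& and ~) are used instead of arithmetic subtraction because subtracting logical arrays in MATLAB yields doubles, which can include -1 where the second image is set and the first is not.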

You can see here that the gap in the symbol shown in Image 3 has been resolved, but another problem arises: some symbols may overlap, as shown here.

Image 5

These are the best results I have been able to achieve so far. Are there any other ideas that might solve these problems? If you think this cannot be solved while resizing, how could I approach it without resizing? Would it help to use text in a 12 font size instead of 8?

Useful links: Using the local adaptive threshold method

Operating system: Windows 7

Programming environment: MATLAB 2013a with the Image Processing Toolbox
