Tessnet2 using Tesseract Engine - Why is it giving very bad results?
I am trying to use Tessnet2, which wraps the Tesseract engine, in C#. For many of the test images I pass to Tessnet2, the result is very poor and almost nothing is recognized correctly.
This is my code, from the Program.cs class of a C# console project:
using System;
using System.Drawing;
using tessnet2;

class Program
{
    static void Main(string[] args)
    {
        try
        {
            Bitmap image = new Bitmap(@"C:\Users\hp\Desktop\eurotext.tif");
            var ocr = new Tesseract();
            // When I tried adding SetVariable(...) here, it didn't change the output much
            ocr.Init(@"C:\Program Files (x86)\Tesseract-OCR", "eng", true);
            var result = ocr.DoOCR(image, Rectangle.Empty);
            // Print each recognized word with its confidence value
            foreach (Word word in result)
                Console.WriteLine("{0} : {1}", word.Confidence, word.Text);
            Console.ReadLine();
        }
        catch (Exception exception)
        {
            // Show the actual failure instead of a bare "Error"
            Console.WriteLine("Error: " + exception.Message);
        }
    }
}
For example, this is a sample test image, "eurotext.tif" (large, binary, 300 dpi):
And this is the Tessnet2 result for this image:
I followed this tutorial to learn how to use Tessnet2: https://code.msdn.microsoft.com/windowsdesktop/How-to-use-Tessnet2-library-716be12f
I also used this page to try to call SetVariable(...) correctly, but with no luck; the output was not much different: http://www.sk-spell.sk.cx/tesseract-ocr-en
I found the Tesseract guidelines for improving recognition quality: http://code.google.com/p/tesseract-ocr/wiki/ImproveQuality
- It says, "Tesseract works best with text using a DPI of at least 300 dpi." This sample image is 300 dpi.
- This sample image is also binary, which should give a better result according to many people on various sites.
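To rule out the image itself, the two properties above (resolution and bit depth) can be read back from the file with System.Drawing. This is only a minimal sketch: `ImageCheck` and `MeetsGuidelines` are names made up for illustration, and it assumes that "binary" means the file is stored as 1 bit per pixel (a black-and-white image saved as 8 bpp grayscale would fail this check even though it looks binary).

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;

static class ImageCheck
{
    // Hypothetical helper: true when the image is at least 300 dpi in both
    // directions and is stored as 1 bit per pixel (assumed meaning of "binary").
    public static bool MeetsGuidelines(float dpiX, float dpiY, PixelFormat format)
    {
        return dpiX >= 300f && dpiY >= 300f
            && format == PixelFormat.Format1bppIndexed;
    }

    static void Main()
    {
        // Same test image as in the question
        using (var image = new Bitmap(@"C:\Users\hp\Desktop\eurotext.tif"))
        {
            Console.WriteLine("Size: {0} x {1} px", image.Width, image.Height);
            Console.WriteLine("Resolution: {0} x {1} dpi",
                image.HorizontalResolution, image.VerticalResolution);
            Console.WriteLine("Pixel format: {0}", image.PixelFormat);
            Console.WriteLine("Meets guidelines: {0}",
                MeetsGuidelines(image.HorizontalResolution,
                                image.VerticalResolution,
                                image.PixelFormat));
        }
    }
}
```

If this prints the expected resolution and pixel format, the input image is probably not the cause, and the problem is more likely in how the engine is initialized or configured.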
I've looked everywhere for a solution that can improve accuracy and I've found many posts and people with similar problems but no working solution.
What is the cause of this problem? How can I solve it?
I'm just getting started with this topic, so please bear with me if the solution is trivial.
Thanks!