Incorrect text extracted from image: how to improve tesseract-ocr 3.0 results in C#?


I am having trouble extracting text from an image using tesseract-ocr-setup-3.02.02.exe in .NET. I have used the simple yatt class (yatt / tesseract-ocr-class.cs) from here.

I downloaded and installed tesseract-ocr-setup-3.02.02.exe from here, then used the yatt class in C# like this:

        // Point the yatt wrapper at the installed Tesseract 3.02 executable.
        TesseractOCR ocr = new TesseractOCR(@"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe");
        using (Bitmap bmp = new Bitmap(@"C:\ocr\cap.jpg"))
        {
            divOCRText.InnerHtml = ocr.OCRFromBitmap(bmp);
        }

It extracts text from the image, but there are many issues: the extracted text has many spelling mistakes. Can somebody tell me what I am doing wrong?

Here is the image to OCR:

[image: input image]

Extracted text (screenshot):

[image: screenshot of extracted text]

Here is the tessdata installed on my PC:

[image: installed tessdata]
