I'm having issues reading white text on a bright background: Tesseract finds the text itself, but it cannot really read it correctly.
The result I keep getting is LanEerus, which is not that far off, to be honest.
What I'm wondering is: what image pre-processing could fix this? I'm using Photoshop to pre-process the image manually before trying to do it in code, to find out what works first.
I've tried converting it to a bitmap, but that makes the borders of the text pretty ragged, resulting in Tesseract just reading random characters.
Inverting colors and/or grayscaling doesn't seem to do the trick, either.
Anyone have any ideas? I know it's a pretty bad background for the text in this case. Trust me, I wish the background were different!
My code for the tests:
import java.io.File;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;

File file = new File("C:\\tess\\lando.png");
ITesseract tess = new Tesseract();
tess.setDatapath("tessdata");
// doOCR throws TesseractException, so the enclosing method must handle it
System.out.println(tess.doOCR(file));
EDIT
I have read through Tesseract's "Improving the quality" documentation but couldn't get those tips to work.
EDIT 2
After using OpenCV to preprocess the image with grayscaling, color inversion, Gaussian blur and adaptive thresholding, I get this resulting image, but no better reading. If anything, it's worse.
Here's one possible solution. It's in Python, but it should be clear enough for a Java port. We will apply a method called gain division: the idea is to build a model of the background and then weight each input pixel by that model. The output gain should be relatively constant across most of the image, which gets rid of most of the background color variation. We can then use a morphological chain to clean the result a little.

The first step is to apply the gain division. The operations you need are straightforward: a morphological closing with a big rectangular structuring element, plus some data type conversions - be careful with those. This is the image you should see after the method is applied: very cool, the background is almost gone. Let's get a binary image using Otsu's thresholding:
This is the binary image:
We have a nice image of the edges of the text. We can get a black background and white text if we flood-fill the background with white. However, we should be careful with the characters: if a character's outline is broken, the flood-fill operation will erase it. Let's first make sure the characters are closed by applying a morphological closing. This is the resulting image:
As you can see, the edges are a lot stronger and, most importantly, closed. Now we can flood-fill the background with white. Here, the flood-fill seed point is located at the image origin (x = 0, y = 0). We get this image:
We are almost there. As you can see, the holes inside some of the characters (e.g., the "a", "d" and "o") are not filled, and this can produce noise for the OCR. Let's try to fill them. We can exploit the fact that these holes are all children of a parent contour: we isolate the child contours and, again, apply a flood-fill to fill them. But first, do not forget to invert the image. This is the resulting mask:
Cool, let's apply the OCR. I'm using pyocr. The output: