I'm preparing an image for OCR with Tesseract (pre-trained for this custom font) in Java, using the OpenCV library.
The image contains blue text. After resizing it and binarizing it with OpenCV's inRange() method I get a black-and-white image, but some letters are connected, and Tesseract sometimes misreads them. There are a few more problems: the original text is quite small, its border pixels always have slightly different RGB values, and the background is always different too.
I tried widening the range so that inRange() captures more pixels, but that produced many more connected characters. When I narrowed the range instead, some letters became barely visible and Tesseract can't read them.
Please advise how to separate those characters with white pixels in the binarized image. Or is there a more efficient way to extract text from colored images? Any advice on text extraction or recognition is welcome, not only for Tesseract and OpenCV.
All the text in your images is blue. As a first step, try the color-filtering approach described in this tesseract user forum. It is in Python, but there should be something similar for Java.
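To give an idea of what color filtering can look like in Java, here is a minimal sketch that uses only the JDK (no OpenCV), so it is not the code from the forum post. It converts each pixel to hue/saturation/brightness and keeps only pixels whose hue falls in a blue band, which tends to tolerate the slightly different RGB values at letter borders better than a fixed RGB range. The class name `BlueTextFilter` and all four thresholds are my own assumptions and will need tuning on your images.

```java
import java.awt.Color;
import java.awt.image.BufferedImage;

final class BlueTextFilter {

    // Hypothetical thresholds, chosen only for illustration; tune them on real samples.
    static final float HUE_MIN = 0.55f;        // about 200 degrees on the hue circle
    static final float HUE_MAX = 0.75f;        // about 270 degrees
    static final float MIN_SATURATION = 0.25f; // rejects gray and white pixels
    static final float MIN_BRIGHTNESS = 0.20f; // rejects near-black pixels

    /** Returns true if the packed RGB pixel falls inside the assumed blue band. */
    static boolean isBlue(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        // RGBtoHSB returns hue, saturation and brightness, each in the range 0..1.
        float[] hsb = Color.RGBtoHSB(r, g, b, null);
        return hsb[0] >= HUE_MIN && hsb[0] <= HUE_MAX
                && hsb[1] >= MIN_SATURATION
                && hsb[2] >= MIN_BRIGHTNESS;
    }

    /** Builds a new image with black pixels where the source is blue and white elsewhere. */
    static BufferedImage isolateBlueText(BufferedImage src) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                out.setRGB(x, y, isBlue(src.getRGB(x, y)) ? 0x000000 : 0xFFFFFF);
            }
        }
        return out;
    }
}
```

You would load the image with `ImageIO.read(...)`, pass it to `BlueTextFilter.isolateBlueText(...)`, and feed the result to Tesseract. If you prefer to stay in OpenCV, the rough equivalent is to convert to HSV with `Imgproc.cvtColor` and call `Core.inRange` on the HSV image instead of the RGB one. Note that OpenCV scales hue to 0..179 for 8-bit images, so the thresholds above do not carry over directly.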