Java Tesseract improve text reading from image


I am trying to read text from a League of Legends chat screen.

To achieve this, I created a Java application using Tesseract. However, the returned text is not completely correct.

This is the code I'm using to get the text from an image, using Tess4J (https://sourceforge.net/projects/tess4j/):

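import java.io.File;
import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import net.sourceforge.tess4j.util.LoadLibs;

File file = new File("screenshots/screenshot-15.59.19.png");
ITesseract instance = new Tesseract();
// Restrict recognition to the characters that can appear in the chat.
instance.setTessVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz():[] ");

// Extract the bundled tessdata folder and point Tesseract at it.
File tessDataFolder = LoadLibs.extractTessResources("tessdata");
instance.setDatapath(tessDataFolder.getAbsolutePath());

try {
    String result = instance.doOCR(file);
    System.out.println(result);
} catch (TesseractException e) {
    System.err.println(e.getMessage());
}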

The tessdata folder contains the English trained data file: https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata

For the image, OCR returns the following result (I removed personal details from the picture and the result):

Worlds map zccwti 3r: enabledl
They can be Logged on or oil in Options a Vidw
Type map [or a [m cl (ommands

[00: [ s]  (Draven): gt:
[00: [ 7] W (Draven): 3g
[00: [ 7] (Dram): agg
[00: [ s] (Draven): 33g
[00: [ 9] (Draven): ga
[00:20] (Draven): galgaBg
[00:2I] (Draven): gagBa
[00:23] (Dymnpv Flash 0 Ready
[00:23] (0mm): Te[epon : Rudy
[00:24] (Draven): arfar
[00:25] (Draven): m3
[00:27] (omen): 2m Heal
[00:27] (Dymnpv 2m Exhauxt
[00:20] (Draven): mmlr
[00:29] (Draven): ms

How can I improve my code to get the correct text?

If that is not possible, how can I improve the screenshot in Java or any other programming language (by creating a new application)?
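One preprocessing step that might help is enlarging the screenshot and converting it to grayscale before OCR, since small in-game fonts are often hard for Tesseract to read. The following is only a minimal sketch of that idea using standard java.awt classes. The class name OcrPreprocessor, the method name upscaleToGray and the scale factor are hypothetical choices for illustration, and I have not verified how much this improves the result on this particular screenshot.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper (not part of Tess4J): enlarges an image and converts it to grayscale.
final class OcrPreprocessor {

    // Returns a new grayscale image whose width and height are `factor` times the source's.
    static BufferedImage upscaleToGray(BufferedImage source, int factor) {
        int width = source.getWidth() * factor;
        int height = source.getHeight() * factor;

        // TYPE_BYTE_GRAY makes the drawing step convert colors to gray levels.
        BufferedImage scaled = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = scaled.createGraphics();
        try {
            // Bicubic interpolation keeps glyph edges smoother than nearest-neighbor scaling.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(source, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return scaled;
    }
}
```

The screenshot could be loaded with ImageIO.read(file), passed through upscaleToGray with a factor such as 3 (an assumed value to experiment with), and the resulting BufferedImage handed to instance.doOCR, which also accepts a BufferedImage. Because the chat shows light text on a dark background, inverting the colors may be worth trying as well.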

The data I eventually want is the game time, e.g. [00:05], and the text after the ':'.
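To make the target format concrete, here is a rough sketch of how a cleanly recognized line such as "[00:20] (Draven): galgaBg" could be split into the game time and the message with a regular expression. The class ChatLineParser and its nested ChatLine type are hypothetical names made up for this illustration. The pattern assumes the layout "[mm:ss] (Name): text", so lines where the OCR garbled the timestamp or dropped the colon simply do not match.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the game time, sender and message from one OCR line.
final class ChatLineParser {

    // Assumed layout: "[mm:ss] (Name): message".
    private static final Pattern CHAT_LINE =
            Pattern.compile("\\[(\\d{2}:\\d{2})\\]\\s*\\(([^)]*)\\)\\s*:\\s*(.*)");

    // Simple value holder for one parsed chat line.
    static final class ChatLine {
        final String time;
        final String sender;
        final String message;

        ChatLine(String time, String sender, String message) {
            this.time = time;
            this.sender = sender;
            this.message = message;
        }
    }

    // Returns the parsed line, or an empty Optional if the line does not fit the assumed layout.
    static Optional<ChatLine> parse(String line) {
        Matcher m = CHAT_LINE.matcher(line.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new ChatLine(m.group(1), m.group(2), m.group(3).trim()));
    }
}
```

Splitting the OCR result on line breaks and calling ChatLineParser.parse on each line would then keep only the lines that were recognized well enough to fit the expected layout.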
