Tesseract (Tess4J) gives different results in 'Debug' mode than in 'Run' mode

I am running Tesseract through Tess4J 5.7.0 on Java 11.0.16 to extract all words from an image. My code uses no threads. When I launch it in 'Run' mode, I get a list of 174 words. When I launch the exact same code with the same image in 'Debug' mode, with no breakpoints set, I get a different list of 169 words. The detected skew angle also differs: 0.0842 in 'Run' mode versus 0.0421 in 'Debug' mode.

The 'Run' mode list starts like this:

Word 01, x: 1398.00, y: 0266.00, x+w: 1410.00, y+h: 0284.00, text: 1
Word 02, x: 0081.00, y: 0311.00, x+w: 0255.00, y+h: 0333.00, text: TERT
Word 03, x: 0268.00, y: 0304.00, x+w: 0290.00, y+h: 0341.00, text: vo
Word 04, x: 0302.00, y: 0312.00, x+w: 0346.00, y+h: 0334.00, text: LL
Word 05, x: 0422.00, y: 0311.00, x+w: 0500.00, y+h: 0334.00, text: a
Word 06, x: 0576.00, y: 0311.00, x+w: 0641.00, y+h: 0333.00, text: a
Word 07, x: 0762.00, y: 0307.00, x+w: 0872.00, y+h: 0324.00, text: Ref
Word 08, x: 0889.00, y: 0307.00, x+w: 0995.00, y+h: 0324.00, text: Number:
Word 09, x: 1018.00, y: 0307.00, x+w: 1141.00, y+h: 0325.00, text: 3491159
Word 10, x: 1271.00, y: 0308.00, x+w: 1381.00, y+h: 0329.00, text: Commp
Word 11, x: 1396.00, y: 0307.00, x+w: 1502.00, y+h: 0325.00, text: Number:

And the 'Debug' mode list starts like this:

Word 01, x: 1397.00, y: 0266.00, x+w: 1409.00, y+h: 0283.00, text: 1
Word 02, x: 0080.00, y: 0310.00, x+w: 0234.00, y+h: 0328.00, text: SEE
Word 03, x: 0268.00, y: 0300.00, x+w: 0289.00, y+h: 0336.00, text: Oo
Word 04, x: 0302.00, y: 0300.00, x+w: 0348.00, y+h: 0336.00, text: Tone
Word 05, x: 0364.00, y: 0310.00, x+w: 0430.00, y+h: 0332.00, text: SE
Word 06, x: 0481.00, y: 0310.00, x+w: 0528.00, y+h: 0332.00, text: IE
Word 07, x: 0556.00, y: 0310.00, x+w: 0640.00, y+h: 0332.00, text: ETE
Word 08, x: 0761.00, y: 0306.00, x+w: 0871.00, y+h: 0323.00, text: Ref
Word 09, x: 0888.00, y: 0306.00, x+w: 0994.00, y+h: 0323.00, text: Number:
Word 10, x: 1017.00, y: 0306.00, x+w: 1140.00, y+h: 0324.00, text: 3491159
Word 11, x: 1270.00, y: 0308.00, x+w: 1380.00, y+h: 0328.00, text: Commp

Note that the x and y coordinates also differ between the two lists.
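To quantify how far the two lists diverge, the dumps can be compared programmatically. The following is a minimal sketch, assuming each line follows exactly the "Word NN, x: ..., text: ..." format shown in the excerpts; `WordDumpDiff`, `Box`, and the method names are hypothetical helpers and not part of Tess4J.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Tess4J): compares two word dumps
// written in the "Word NN, x: ..., text: ..." format.
final class WordDumpDiff {

    // One parsed dump line; the field names are illustrative.
    static final class Box {
        final double left, top, right, bottom;
        final String text;

        Box(double left, double top, double right, double bottom, String text) {
            this.left = left;
            this.top = top;
            this.right = right;
            this.bottom = bottom;
            this.text = text;
        }
    }

    private static final Pattern LINE = Pattern.compile(
            "Word \\d+, x: ([\\d.]+), y: ([\\d.]+), x\\+w: ([\\d.]+), y\\+h: ([\\d.]+), text: (.*)");

    // Parses a single dump line, rejecting anything that does not match the format.
    static Box parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected line: " + line);
        }
        return new Box(
                Double.parseDouble(m.group(1)),
                Double.parseDouble(m.group(2)),
                Double.parseDouble(m.group(3)),
                Double.parseDouble(m.group(4)),
                m.group(5));
    }

    static List<Box> parseAll(List<String> lines) {
        List<Box> boxes = new ArrayList<>();
        for (String line : lines) {
            boxes.add(parse(line));
        }
        return boxes;
    }

    // Index of the first entry whose text differs; if one list is a prefix
    // of the other, the shorter length; -1 if the lists have identical texts.
    static int firstTextMismatch(List<Box> a, List<Box> b) {
        int shared = Math.min(a.size(), b.size());
        for (int i = 0; i < shared; i++) {
            if (!a.get(i).text.equals(b.get(i).text)) {
                return i;
            }
        }
        return a.size() == b.size() ? -1 : shared;
    }

    // Largest absolute difference among the four box coordinates.
    static double maxShift(Box a, Box b) {
        double dx = Math.max(Math.abs(a.left - b.left), Math.abs(a.right - b.right));
        double dy = Math.max(Math.abs(a.top - b.top), Math.abs(a.bottom - b.bottom));
        return Math.max(dx, dy);
    }
}
```

Applied to the excerpts above, the first text mismatch is at index 1 (Word 02, TERT versus SEE), and the box of Word 01 shifts by one pixel between the two modes.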

The PageIterator level is 3 (RIL_WORD) and the PageSegMode is 6 (PSM_SINGLE_BLOCK).

These differences heavily affect the final results, so they are quite undesirable. Any idea why this is happening? Any suggestions on how to get exactly the same list in 'Debug' and 'Run' mode, given that all other running conditions are the same?

I have tried the same without de-skewing the images, and with different values for PageIterator and PageSegMode. I also added a short sleep call, even though I am not using threads.
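One way to check whether the running conditions really are identical is to print a small fingerprint of the JVM and its environment in both modes and compare the output. The following is a minimal sketch using only the standard library; `RunFingerprint` is a hypothetical helper, and the choice of properties is an assumption about what might matter (`jna.library.path` because Tess4J loads the native library through JNA, `TESSDATA_PREFIX` because Tesseract reads it to locate its data files, `OMP_THREAD_LIMIT` on the assumption that the native build uses OpenMP).

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: prints settings that could differ between launch modes.
final class RunFingerprint {

    // Collects JVM and environment settings; missing values are shown as "null".
    static Map<String, String> environment() {
        Map<String, String> info = new LinkedHashMap<>();
        info.put("java.version", System.getProperty("java.version"));
        // A debug launch typically adds a JDWP agent argument here.
        info.put("jvm.args", ManagementFactory.getRuntimeMXBean().getInputArguments().toString());
        info.put("user.dir", System.getProperty("user.dir"));
        info.put("java.library.path", System.getProperty("java.library.path"));
        info.put("jna.library.path", String.valueOf(System.getProperty("jna.library.path")));
        info.put("TESSDATA_PREFIX", String.valueOf(System.getenv("TESSDATA_PREFIX")));
        info.put("OMP_THREAD_LIMIT", String.valueOf(System.getenv("OMP_THREAD_LIMIT")));
        info.put("locale", Locale.getDefault().toString());
        info.put("charset", Charset.defaultCharset().name());
        return info;
    }

    // Lowercase hex SHA-256 of the given bytes, to confirm both runs read the same image.
    static String sha256(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Optional first argument: path to the image being processed.
    public static void main(String[] args) throws IOException {
        environment().forEach((key, value) -> System.out.println(key + " = " + value));
        if (args.length > 0) {
            byte[] image = Files.readAllBytes(Paths.get(args[0]));
            System.out.println("image.sha256 = " + sha256(image));
        }
    }
}
```

Calling `RunFingerprint.environment()` at the start of the OCR program in both modes and diffing the two outputs would show whether the native library path, the tessdata location, or the JVM arguments differ between 'Run' and 'Debug'.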

Thank you very much.
