We are trying to use Tesseract with Tess4j for OCR text extraction.
After running Tesseract continuously for a while, we see the RAM used by the application increase gradually. During this time the heap still has free space. We also monitored off-heap memory using jconsole, and it looks normal as well. But the application's RSS (resident set size) keeps increasing.
My guess is that Tesseract leaks memory that it allocates during OCR, but I'm not sure. If you have any ideas on how to investigate further, please share.
I had the same issue for the last few days. I resolved it by removing Tess4J and using Tika 1.27 + Tesseract. I used an ExecutorService to run 3 threads at a time, which kept memory within limits.
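For illustration, here is a minimal sketch of capping OCR work at a fixed number of threads with an ExecutorService. It is not the poster's actual code: the class name BoundedOcr, the method runAll and the ocr function parameter are hypothetical, and the real Tika or Tesseract call would be passed in as the function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: runs an OCR function over many inputs with a bounded thread pool.
final class BoundedOcr {

    // Applies `ocr` to every input with at most `maxThreads` jobs running at once.
    // Results are returned in the same order as the inputs.
    static <I, O> List<O> runAll(List<I> inputs, Function<I, O> ocr, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<O>> futures = new ArrayList<>();
            for (I input : inputs) {
                Callable<O> task = () -> ocr.apply(input);
                futures.add(pool.submit(task));
            }
            List<O> results = new ArrayList<>();
            for (Future<O> future : futures) {
                results.add(future.get()); // blocks until that job has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for OCR", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("OCR task failed", e.getCause());
        } finally {
            pool.shutdown(); // release the worker threads once all jobs are submitted
        }
    }
}
```

A call would look something like BoundedOcr.runAll(files, file -> extractText(file), 3), where extractText is a placeholder for whatever performs the OCR. The fixed pool only limits how many OCR jobs run at the same time; it does not by itself fix a leak inside a native library.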
While the code given above works, I later made it simpler by just spawning a process to call tesseract directly.
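A rough sketch of the process-spawning approach, assuming the tesseract binary is on the PATH and using its "stdout" output base so the recognized text comes back on standard output. The class name TesseractCli and its method names are made up for this example. Since each OCR run happens in its own short-lived process, any memory the native code holds is returned to the OS when that process exits.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical wrapper around the tesseract command-line tool.
final class TesseractCli {

    // Builds: tesseract <image> stdout -l <lang>
    static List<String> buildCommand(String imagePath, String lang) {
        return List.of("tesseract", imagePath, "stdout", "-l", lang);
    }

    // Runs tesseract in a child process and returns the recognized text.
    static String ocr(String imagePath, String lang) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(imagePath, lang))
                .redirectError(ProcessBuilder.Redirect.DISCARD) // drop diagnostics so stderr cannot fill up and block
                .start();
        // Read all of stdout before waiting, so the child never blocks on a full pipe.
        byte[] output = process.getInputStream().readAllBytes();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return new String(output, StandardCharsets.UTF_8);
    }
}
```

For example, TesseractCli.ocr("page.png", "eng") would return the text of page.png if tesseract and the English language data are installed. This can be combined with a small thread pool to limit how many tesseract processes run at once.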