Converting hOCR-formatted text to JSON


I'm trying to implement a Java class that converts the hOCR output from Tesseract into JSON. At the moment we use ABBYY for our OCR service, and it returns JSON-formatted data for the locations of the words on the OCR'd image. Tesseract only returns hOCR, so I need to convert Tesseract's output to match ABBYY's.
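For reference, here is a minimal sketch of the kind of conversion I have in mind. It assumes the word-level markup Tesseract typically emits, i.e. `<span class='ocrx_word' ... title='bbox x0 y0 x1 y1; x_wconf N'>word</span>` with `class` appearing before `title`, and it pulls the words out with a regular expression rather than a real HTML parser. The class name `HocrToJson` and the JSON field names (`text`, `left`, `top`, `right`, `bottom`, `confidence`) are placeholders I made up; they are not the actual ABBYY schema and would need to be renamed to match it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HocrToJson {

    // One word span: group 3 is the title attribute, group 4 the inner markup.
    // Assumes class comes before title, as in typical Tesseract output.
    private static final Pattern WORD = Pattern.compile(
        "<span\\b[^>]*\\bclass=(['\"])ocrx_word\\1[^>]*\\btitle=(['\"])(.*?)\\2[^>]*>(.*?)</span>",
        Pattern.DOTALL);
    // hOCR bounding box: x0 y0 x1 y1 (left, top, right, bottom) in pixels.
    private static final Pattern BBOX =
        Pattern.compile("bbox\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)");
    // Optional per-word confidence.
    private static final Pattern CONF =
        Pattern.compile("x_wconf\\s+(\\d+(?:\\.\\d+)?)");

    public static String convert(String hocr) {
        List<String> items = new ArrayList<>();
        Matcher m = WORD.matcher(hocr);
        while (m.find()) {
            String title = m.group(3);
            Matcher box = BBOX.matcher(title);
            if (!box.find()) {
                continue; // skip words without a usable bounding box
            }
            Matcher conf = CONF.matcher(title);
            // Drop nested tags such as <strong>/<em>, then decode entities.
            String text = unescape(m.group(4).replaceAll("<[^>]+>", "")).trim();
            // Field names below are hypothetical, not ABBYY's real schema.
            items.add("{\"text\":\"" + escape(text) + "\""
                + ",\"left\":" + Integer.parseInt(box.group(1))
                + ",\"top\":" + Integer.parseInt(box.group(2))
                + ",\"right\":" + Integer.parseInt(box.group(3))
                + ",\"bottom\":" + Integer.parseInt(box.group(4))
                + ",\"confidence\":" + (conf.find() ? conf.group(1) : "null")
                + "}");
        }
        return "[" + String.join(",", items) + "]";
    }

    // Decodes the five basic HTML entities; &amp; is handled last.
    private static String unescape(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">")
                .replace("&quot;", "\"").replace("&#39;", "'")
                .replace("&amp;", "&");
    }

    // Escapes a string for use inside a JSON string literal.
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sample =
            "<span class='ocr_line' title='bbox 36 92 200 116'>"
            + "<span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 95'>Hello</span> "
            + "<span class='ocrx_word' id='word_1_2' title='bbox 110 92 200 116; x_wconf 91'>World</span>"
            + "</span>";
        System.out.println(convert(sample));
    }
}
```

A regex keeps the sketch dependency-free, but it is brittle: it would miss spans whose attributes are ordered differently or that carry more than one class. For production use, an HTML parser and a JSON library would probably be the safer choice.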
