Extract only the handwritten text from a PDF using OCR


I am working on OCR handwriting-to-text conversion using the Google Cloud Vision API.

The input is a PDF with 1-5 pages. The catch is that each page can have a default header and footer printed on it, and the student's handwritten answer sits between them.

I am doing this in Node.js, but I am open to other suggestions if they let me do this accurately.

The issue is that Google OCR converts everything into plain text without indicating which parts were printed and which were handwritten.

Is there any way I can achieve that?


There is 1 best solution below

Poala Astrid On

A feature request has been filed asking for the Google Cloud Vision API's DOCUMENT_TEXT_DETECTION to indicate whether detected text is handwritten or typed/printed. Many users have been waiting for this feature. To stay updated, you can follow the feature request or check the release notes.

There are other ways to achieve this using third-party tools such as ABBYY, Nanonets OCR API, Kofax, etc.
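Since the printed header and footer sit at fixed positions on every page, one possible workaround is to filter the OCR result by position instead of by writing style. The sketch below is only an illustration under assumptions, not a feature of the Vision API: it assumes each page object has the shape of an entry in `fullTextAnnotation.pages` (blocks with a `boundingBox`, paragraphs, words, symbols), and that the header occupies roughly the top 15% and the footer the bottom 10% of the page. The function names and both fractions are hypothetical and would need tuning against your own template.

```javascript
// Sketch: keep only OCR blocks whose vertical centre lies between the
// printed header band and the printed footer band of a page.

// Vertical centre of a block as a fraction (0..1) of the page height,
// or null when the block has no usable bounding box.
function blockCenterY(block, pageHeight) {
  const box = block.boundingBox || {};
  // PDF results typically carry normalizedVertices (already 0..1);
  // image results carry pixel vertices that are divided by the page height.
  const normalized = box.normalizedVertices || [];
  const useNormalized = normalized.length > 0;
  const vertices = useNormalized ? normalized : box.vertices || [];
  if (vertices.length === 0) return null;
  if (!useNormalized && !pageHeight) return null;
  const ys = vertices.map((v) => v.y || 0);
  const center = (Math.min(...ys) + Math.max(...ys)) / 2;
  return useNormalized ? center : center / pageHeight;
}

// Rebuild a block's text: symbols form words, words are joined by
// spaces, paragraphs are joined by newlines.
function blockText(block) {
  return (block.paragraphs || [])
    .map((p) =>
      (p.words || [])
        .map((w) => (w.symbols || []).map((s) => s.text).join(""))
        .join(" ")
    )
    .join("\n");
}

// Text of all blocks outside the assumed header and footer bands.
function extractBodyText(page, headerFraction = 0.15, footerFraction = 0.1) {
  const kept = [];
  for (const block of page.blocks || []) {
    const y = blockCenterY(block, page.height);
    if (y === null) continue;
    if (y > headerFraction && y < 1 - footerFraction) {
      kept.push(blockText(block));
    }
  }
  return kept.join("\n");
}
```

You would call `extractBodyText(page)` for each page of the response and join the results. Note that this only separates regions of the page: handwriting that strays into the header or footer band is dropped, and any printed text in the middle of the page is kept.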