How to recognize data from different fields of a form using OCR in Java?


here is the form

I have an image of a form that contains different fields such as name, number, and address. I want to recognize the data in these fields and save it to a database. My OCR is working fine, but I don't know how to extract the data of a specific field (name, address) from the image before passing it to OCR. Put simply, I want to know how to tell whether the characters in the OCR output come from the name field, the address field, or some other field.

2

There are 2 best solutions below

4
Osiris On

Since you know the exact areas of the form where the different fields are located, you can use an image manipulation library to crop the image and send only the specific regions to the OCR engine.
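A minimal sketch of the cropping idea, using only the standard `java.awt` classes. The class name `FormFieldCropper`, the field names, and the pixel coordinates are hypothetical and would have to be measured on the real form. The OCR engine is passed in as a plain `Function<BufferedImage, String>`, since the question does not say which engine is used.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class FormFieldCropper {

    /** Crops one sub-image per named region, clipping each region to the image bounds. */
    public static Map<String, BufferedImage> cropFields(BufferedImage form,
                                                        Map<String, Rectangle> regions) {
        Rectangle bounds = new Rectangle(0, 0, form.getWidth(), form.getHeight());
        Map<String, BufferedImage> crops = new LinkedHashMap<>();
        for (Map.Entry<String, Rectangle> entry : regions.entrySet()) {
            Rectangle r = entry.getValue().intersection(bounds);
            if (r.isEmpty()) {
                throw new IllegalArgumentException(
                        "Region for field '" + entry.getKey() + "' lies outside the image");
            }
            // getSubimage returns a view that shares pixel data with the original image.
            crops.put(entry.getKey(), form.getSubimage(r.x, r.y, r.width, r.height));
        }
        return crops;
    }

    /** Runs the supplied OCR function on each cropped field and collects the trimmed text. */
    public static Map<String, String> readFields(BufferedImage form,
                                                 Map<String, Rectangle> regions,
                                                 Function<BufferedImage, String> ocr) {
        Map<String, String> values = new LinkedHashMap<>();
        cropFields(form, regions)
                .forEach((field, crop) -> values.put(field, ocr.apply(crop).trim()));
        return values;
    }
}
```

The regions could be declared once per form layout, for example `regions.put("name", new Rectangle(120, 80, 400, 40))` (made-up coordinates), and the resulting map of field name to text can then be written to the database. This only works if every scanned form has the same size and alignment; otherwise the image would need to be scaled or deskewed first.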

Check this SO question.

0
A.H On

You have two ways to get the data you want: either use @Osiris's solution, or add a text mining layer.

First solution: take the image and cut it into pieces, one for each region that contains the data you need. For example, cut the image into two pieces, one containing the name and the other containing the address, by cropping the original image based on the fields' positions (X and Y). For that you need an image library to manipulate the original image.

Second solution: add a text mining layer and skip the cropping. Here you run OCR on the whole image and use models that detect names and addresses in the resulting text (duckling.ai). You can train your own model, or you can use a chatbot engine and train it to detect names and addresses as entities (recast.ai or Rasa, for example).
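To illustrate where a text mining layer would sit, here is a deliberately simple stand-in: instead of a trained entity model, it uses a regular expression to pick out lines that carry a printed label. It assumes the OCR output keeps labels such as `Name:` or `Address -` on the same line as the value, which may not hold for every form; the class name `OcrTextFieldExtractor` and the label list are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OcrTextFieldExtractor {

    // Matches lines such as "Name: John Smith" or "Address - 12 Main St".
    // The label set (name, number, address) is an assumption based on the question.
    private static final Pattern LABELED_LINE = Pattern.compile(
            "^\\s*(name|number|address)\\s*[:\\-]\\s*(.+?)\\s*$",
            Pattern.CASE_INSENSITIVE);

    /** Maps each recognized label (lower-cased) to the text that follows it. */
    public static Map<String, String> extract(String ocrText) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : ocrText.split("\\R")) {
            Matcher m = LABELED_LINE.matcher(line);
            if (m.matches()) {
                // Keep the first occurrence of each label; later duplicates are ignored.
                fields.putIfAbsent(m.group(1).toLowerCase(Locale.ROOT), m.group(2));
            }
        }
        return fields;
    }
}
```

A trained model or an entity-extraction service would replace the regular expression when the form has no reliable printed labels or when values span several lines.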