My task is to extract text from a scanned document/ JPG and then get only below mentioned 6 values so that I can auto-fill a form-data in my next screen/ activity.
I used google cloud vision api in my android app with a Blaze version(paid), And I got the result as a text block, but I want to extract only some of information out of them, how I can achieve that?
Bills or receipt can be different all the time but I want 6 things out of all the invoices text block for Ex -
- Vendor
- Account
- Description
- Due Date
- Invoice Number
- Amount
Is there any tool/3rd party library available so that I can use in my android development.
Note - I don't think any sample of receipt or bill image needed for this because it can be any type of bill or invoice we just need to extract 6 mentioned things from that extracted text.
In the next scenarios I will create two fictive bill formats, then write the code algorithm to parse them. I will write only the algorithm because I don't know JAVA.
On the first column we have great pictures from two bills. In the second column we have text data obtained from OCR software. It's like a simple text file, with no logic implemented. But we know certain keywords that can make it have meaning. Bellow is the algorithm that translates the meaningless file in a perfect logical JSON.
At the end you will know which template extracted most of the data, and based on this which one to use for that bill. Feel free to ask anything in comments. If I stayed 45 minute to write this answer, I'll surely answer to your comments as well. :)