Is it possible to track document fields using custom object detection or segmentation (or any other ML technique)?

The ask is to do key element extraction from a document: first name, last name, identity card number, expiry date, etc.

I understand that regular expressions can solve this problem to some extent, but I am looking at PP-Structure from PaddleOCR and at models such as LayoutLM and LayoutXLM. It seems people are already doing this, so the concepts do exist.

My question is specifically whether this is possible on low-compute devices such as Android or iOS phones, using mobile-compatible deep learning frameworks such as TensorFlow Lite or ncnn.

There are 1 best solutions below

Yes, it is possible. I did it using YOLOv8.

Train an object detection model on document images in which each field is labeled.
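As a rough sketch of that training step, assuming the `ultralytics` package and a dataset laid out in `images/train`, `images/val` and `images/test` folders: the field names, the dataset root and the function names below are placeholders I made up, not something the library prescribes.

```python
# Hypothetical field labels; replace them with the fields on your own documents.
FIELD_CLASSES = ["first_name", "last_name", "id_number", "expiry_date"]


def build_dataset_yaml(root, classes):
    """Return the text of a YOLO-style dataset description file."""
    lines = [
        f"path: {root}",
        "train: images/train",
        "val: images/val",
        "test: images/test",
        "names:",
    ]
    # One "index: name" entry per field class.
    lines += [f"  {i}: {name}" for i, name in enumerate(classes)]
    return "\n".join(lines) + "\n"


def train_field_detector(data_yaml, epochs=100, imgsz=640):
    """Fine-tune a small pretrained YOLOv8 model on the labeled fields.

    Needs `pip install ultralytics`; the import is local so that the
    YAML helper above can be used without the package installed.
    """
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")  # nano variant, the smallest of the family
    return model.train(data=data_yaml, epochs=epochs, imgsz=imgsz)
```

Write the string from `build_dataset_yaml("/data/id_cards", FIELD_CLASSES)` to a file such as `fields.yaml`, then call `train_field_detector("fields.yaml")`. The epoch count and image size are only starting points.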

Preparing the right data is very important here, with proper validation and test sets of images. Also, don't use the same images for training and validation: the validation score will look good even when the model has overfitted, so you won't notice the problem.
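One simple way to keep the three sets disjoint is to shuffle the file list once and slice it. This is a minimal sketch; the 70/15/15 ratio and the function name are my own assumptions, not a rule.

```python
import random


def split_dataset(image_paths, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle once, then cut into disjoint train/val/test lists."""
    # De-duplicate so the same file can never land in two sets.
    paths = sorted(set(str(p) for p in image_paths))
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(paths)

    n_val = round(len(paths) * val_frac)
    n_test = round(len(paths) * test_frac)
    return {
        "val": paths[:n_val],
        "test": paths[n_val:n_val + n_test],
        "train": paths[n_val + n_test:],
    }
```

If you photographed the same physical document several times, it is safer to group those shots and split by document rather than by file, otherwise near-identical images end up on both sides.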

The default model produced by YOLOv8 is a PyTorch model, which can be converted to ONNX (Open Neural Network Exchange) and from there to a TensorFlow Lite model.
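With the `ultralytics` package the conversion can be requested in one call, and the intermediate formats are handled for you. The sketch below assumes the default output location of a training run for the weights path; adjust it to wherever your `best.pt` ended up. The second helper is my own addition for checking what the export produced.

```python
from pathlib import Path


def export_to_tflite(weights="runs/detect/train/weights/best.pt", imgsz=640):
    """Export trained YOLOv8 weights to TensorFlow Lite.

    Needs `pip install ultralytics` plus the TensorFlow export
    dependencies, which the package tries to install on first use.
    """
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.export(format="tflite", imgsz=imgsz)


def list_tflite_files(export_dir):
    """List (path, size_in_bytes) for every .tflite file, smallest first."""
    files = [(p, p.stat().st_size) for p in Path(export_dir).rglob("*.tflite")]
    return sorted(files, key=lambda item: item[1])
```

Looking at the file sizes is a quick sanity check before you bundle a model into the app, since the smaller file is usually the one you want on a phone.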

Once you have the TFLite model, you can use it in an Android app to predict the different field labels, crop the image to each predicted box, and run OCR on the crops with ML Kit or PaddleOCR.
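The cropping step is mostly box arithmetic. In the app you would write it in Kotlin or Java, but here is the idea in Python. I am assuming the detections have already been decoded into `(label, confidence, (cx, cy, w, h))` tuples with coordinates normalized to 0..1; the function name and the thresholds are placeholders.

```python
def field_crops(detections, img_w, img_h, min_conf=0.5, pad=0.05):
    """Turn detections into one pixel crop box per field label.

    Returns {label: (left, top, right, bottom)} in pixels.
    """
    # Keep only the most confident detection for each label.
    best = {}
    for label, conf, box in detections:
        if conf < min_conf:
            continue
        if label not in best or conf > best[label][0]:
            best[label] = (conf, box)

    crops = {}
    for label, (_, (cx, cy, w, h)) in best.items():
        # Grow the box a little so OCR does not lose edge characters.
        half_w = w * (1 + pad) / 2
        half_h = h * (1 + pad) / 2
        # Convert to pixels and clamp to the image bounds.
        left = max(0, round((cx - half_w) * img_w))
        top = max(0, round((cy - half_h) * img_h))
        right = min(img_w, round((cx + half_w) * img_w))
        bottom = min(img_h, round((cy + half_h) * img_h))
        crops[label] = (left, top, right, bottom)
    return crops
```

Each crop rectangle is then cut out of the original full-resolution image, not the resized model input, and passed to the OCR engine. The text that comes back is the value for that field.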