I am currently building a custom OCR extractor with Google's Document AI. My documents are usually around 8-14 pages long, and I have created a single schema that covers all possible pages. Using this schema, I manually annotate every page of the imported documents. However, when I evaluate my model, it seems that it can only accurately predict labels on the first page. Does anyone know what causes this issue? Thanks a lot!
** I'll expand on how I have been annotating, since this may be the reason for the issue. I have around 40 labels, and each page of the document uses around 5-15 of them, depending on the type of content on that page. On each page, I annotate only the labels that exist on that specific page. For example, page 1 has company_name, company_address, and company_type, so I label only those, leave the other 30 or so labels empty, and move on to the next page. Is this correct, or am I missing something in this step?
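
To make it easier to check what actually got labeled on each page, here is a minimal sketch that tallies label types per page from an annotated document exported as JSON. It assumes the export follows the Document AI `Document` JSON layout, where each entry in `entities` has a `type` and a `pageAnchor.pageRefs[].page` index (0-based, and possibly omitted for the first page). The function name `labels_per_page` and the `director_name` label are made up for illustration; I have not confirmed that my own exports look exactly like this.

```python
from collections import defaultdict


def labels_per_page(document: dict) -> dict[int, set[str]]:
    """Map each 0-based page index to the set of label types annotated on it."""
    pages = defaultdict(set)
    for entity in document.get("entities", []):
        page_refs = entity.get("pageAnchor", {}).get("pageRefs", [])
        for ref in page_refs:
            # Assumption: a missing "page" field means the first page (index 0),
            # and the index may be serialized as a string in the JSON export.
            page_index = int(ref.get("page", 0))
            pages[page_index].add(entity.get("type", ""))
    return dict(pages)


# Hypothetical, hand-written sample in the assumed export shape.
sample = {
    "entities": [
        {"type": "company_name", "pageAnchor": {"pageRefs": [{}]}},
        {"type": "company_address", "pageAnchor": {"pageRefs": [{"page": "0"}]}},
        {"type": "director_name", "pageAnchor": {"pageRefs": [{"page": "1"}]}},
    ]
}

for page, labels in sorted(labels_per_page(sample).items()):
    print(page, sorted(labels))
```

If the later pages come back with no labels here even though I annotated them in the console, that would suggest the annotations are not being saved or exported per page the way I expect.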