Document AI - Effect of multi-page files on model performance


I’ve noticed that it’s possible to upload multi-page files to Document AI, so that all pages stay connected to each other by being associated with the same file.

My use case is invoice files that I would like to extract data from, using a custom extractor.
Most of the invoices are one-pagers, but some of them span 2 pages. In those cases the second page is usually leaner than the first and carries only a small part of the information.

My question is - will there be a difference in the trained model's performance between the following file upload mechanisms:
  1. Uploading each page as a separate file, even when an invoice spans multiple pages (I split it in preprocessing beforehand)
  2. Uploading each file as is, without splitting it into pages

I assume that the performance of option 2 will be equal to or better than that of option 1. My question is mainly whether it makes a difference at all, since uploading pages separately has its own advantages for us (our use case is a bit more complicated; I simplified it for the explanation).
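To make the two options concrete, here is a minimal sketch of how the same set of pages would be grouped into upload units under each one. It does not call the Document AI API; `Page`, `invoice_id`, `per_page_units` and `per_invoice_units` are hypothetical names used only for illustration, and the real preprocessing would operate on actual PDF pages.

```python
from collections import defaultdict
from typing import NamedTuple


class Page(NamedTuple):
    invoice_id: str   # hypothetical identifier linking the pages of one invoice
    page_number: int  # 1-based position of the page within its invoice


def per_page_units(pages):
    """Option 1: every page becomes its own upload unit."""
    return [[page] for page in sorted(pages)]


def per_invoice_units(pages):
    """Option 2: all pages of an invoice stay together in one upload unit."""
    grouped = defaultdict(list)
    for page in sorted(pages):
        grouped[page.invoice_id].append(page)
    return list(grouped.values())


# One single-page invoice and one two-page invoice.
pages = [Page("inv-001", 1), Page("inv-002", 1), Page("inv-002", 2)]
option_1 = per_page_units(pages)     # three units, one page each
option_2 = per_invoice_units(pages)  # two units, the second with two pages
```

Under option 1 the second page of `inv-002` is uploaded with no link to its first page, which is the situation I am asking about.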
