Google Document AI Form Parser is not returning entities for all pages

214 Views Asked by Emma At 07 June 2025 at 08:16

I am trying Google Document AI with the standard Form Parser. I processed a 60-page PDF file, and the OCR result returned entities for the first few pages only; the remaining pages do not include entities in the response. I couldn't find any documentation about this field. Is it an optional field?

Is there any way to force this field to be present in every response?

If not, is there any way to identify which kinds of pages are likely to have this field in the response and which are not? That is, what should a page look like for entity detection to run?

from google.cloud import documentai

# input_file and output_file are gs:// URIs defined elsewhere.
gcs_docs = [
    documentai.GcsDocument(
        gcs_uri=input_file,
        mime_type='application/pdf'
    )
]

gcs_documents = documentai.GcsDocuments(documents=gcs_docs)
input_config = documentai.BatchDocumentsInputConfig(gcs_documents=gcs_documents)

# Write one page per output shard and keep only the fields listed in field_mask.
gcs_output_config = documentai.DocumentOutputConfig.GcsOutputConfig(
    gcs_uri=output_file,
    field_mask="text,entities,pages.pageNumber,pages.formFields",
    sharding_config={"pages_per_shard": 1, "pages_overlap": 0}
)


There is 1 best solution below

Nestor On 30 November 2023 at 19:07

According to this article, Form Parser can detect 11 generic entities. Are the files images that were converted to PDF? The API might be having trouble detecting some entities because of image quality, for example.
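To see which pages actually came back without entities, one option is to scan the output shards after the batch job finishes. This is only a rough sketch: it assumes the JSON shards have already been downloaded from the output bucket into a local folder (the folder and the helper names `summarize_shard` and `shards_without_entities` are made up for illustration), and that each shard uses the `entities`, `pages.pageNumber` and `pages.formFields` keys requested in the field mask from the question.

```python
import json
from pathlib import Path


def summarize_shard(doc: dict) -> dict:
    """Summarize one Document JSON shard that is already parsed into a dict."""
    pages = doc.get("pages", [])
    return {
        # Page numbers contained in this shard (one page per shard in the question).
        "page_numbers": [p.get("pageNumber") for p in pages],
        # A missing "entities" key is treated the same as an empty list.
        "entity_count": len(doc.get("entities", [])),
        # Form fields are reported per page, so add them up across pages.
        "form_field_count": sum(len(p.get("formFields", [])) for p in pages),
    }


def shards_without_entities(output_dir: str) -> list:
    """Return a summary for every *.json shard in output_dir that has no entities."""
    missing = []
    for path in sorted(Path(output_dir).glob("*.json")):
        doc = json.loads(path.read_text(encoding="utf-8"))
        summary = summarize_shard(doc)
        if summary["entity_count"] == 0:
            missing.append({"file": path.name, **summary})
    return missing
```

Comparing the pages listed by `shards_without_entities` with the source PDF (scanned image or native text, and whether form fields were still found) may help narrow down whether image quality is the cause.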

Can you try different versions of the API to see which one is stable for your use case? See version management here. (If the problem persists, I would recommend filing a support case about this, which may also help improve the processor's entity detection.)
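For trying versions: a specific processor version is addressed by its own resource name, which can be passed as `name` in the request in place of the plain processor name. Below is a small sketch that only builds those names. The helper `processor_version_name` and all of the IDs are placeholders made up for illustration; the real version IDs have to be looked up for your own processor.

```python
def processor_version_name(project_id: str, location: str,
                           processor_id: str, version_id: str) -> str:
    """Build the resource name of one processor version."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/processors/{processor_id}/processorVersions/{version_id}"
    )


# Placeholder IDs, not real ones: replace them with your own values.
candidate_versions = ["my-version-a", "my-version-b"]
names = [
    processor_version_name("my-project", "us", "my-processor-id", v)
    for v in candidate_versions
]
```

Running the same PDF against each name in `names` and comparing the number of pages that come back with entities should show whether one version behaves more consistently.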
