How do you scale Google Cloud Document AI processing?

From https://cloud.google.com/document-ai/docs/process-forms, I can see some examples of processing single files. But in most cases, companies have buckets full of documents. How do you scale Document AI processing in that case? Do you use Document AI in conjunction with Spark, or is there another way?
612 views · Asked by Kevin Eid

There are 2 solutions below.

Answer from Holt Skinner:
You will need to use Batch Processing to handle multiple documents at once with Document AI.
This page in the Cloud Documentation shows how to make Batch Processing requests with REST and the Client Libraries.
https://cloud.google.com/document-ai/docs/send-request#batch-process
This codelab also illustrates how to do this in Python with the OCR Processor. https://codelabs.developers.google.com/codelabs/docai-ocr-python
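For reference, a batch request in Python looks roughly like the sketch below, based on the client library shown in the linked docs and codelab. The project, location, processor ID, and bucket URIs are placeholders you must fill in:

```python
from google.api_core.client_options import ClientOptions
from google.cloud import documentai

# Placeholders -- substitute your own values.
PROJECT_ID = "your-project"
LOCATION = "us"  # or "eu"
PROCESSOR_ID = "your-processor-id"
GCS_INPUT_PREFIX = "gs://your-bucket/input/"
GCS_OUTPUT_URI = "gs://your-bucket/output/"

client = documentai.DocumentProcessorServiceClient(
    client_options=ClientOptions(
        api_endpoint=f"{LOCATION}-documentai.googleapis.com"
    )
)

request = documentai.BatchProcessRequest(
    name=client.processor_path(PROJECT_ID, LOCATION, PROCESSOR_ID),
    # Process every file under the prefix instead of a single inline document.
    input_documents=documentai.BatchDocumentsInputConfig(
        gcs_prefix=documentai.GcsPrefix(gcs_uri_prefix=GCS_INPUT_PREFIX)
    ),
    # Results are written back to Cloud Storage as Document JSON files.
    document_output_config=documentai.DocumentOutputConfig(
        gcs_output_config=documentai.DocumentOutputConfig.GcsOutputConfig(
            gcs_uri=GCS_OUTPUT_URI
        )
    ),
)

# batch_process_documents returns a long-running operation; result() blocks
# until the whole batch finishes (or the timeout expires).
operation = client.batch_process_documents(request=request)
operation.result(timeout=1800)
```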
Second answer:

I could only find the following: batch_process_documents processes many documents and returns an async operation whose output is saved to Cloud Storage. From there, I think we can parametrize the job with an input bucket prefix and distribute it over several machines.

All of that could be orchestrated via Airflow, for example, as in the sketch below.
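One possible shape for that parametrized job is sketched below. This is only an illustration: list_document_uris and shard_into_requests are made-up helper names, the PDF filter is an assumption about the bucket contents, and the chunk size of 50 is a guess, since the per-request document limit varies by processor:

```python
from google.cloud import documentai, storage

def list_document_uris(bucket_name: str, prefix: str) -> list[str]:
    """List the PDFs under a bucket prefix (the .pdf filter is illustrative)."""
    client = storage.Client()
    return [
        f"gs://{bucket_name}/{blob.name}"
        for blob in client.list_blobs(bucket_name, prefix=prefix)
        if blob.name.lower().endswith(".pdf")
    ]

def shard_into_requests(uris: list[str], chunk_size: int = 50):
    """Yield one BatchDocumentsInputConfig per chunk of files.

    Batch requests cap the number of input documents per call (the exact
    limit depends on the processor), so a large bucket must be split into
    several requests.
    """
    for i in range(0, len(uris), chunk_size):
        docs = [
            documentai.GcsDocument(gcs_uri=uri, mime_type="application/pdf")
            for uri in uris[i : i + chunk_size]
        ]
        yield documentai.BatchDocumentsInputConfig(
            gcs_documents=documentai.GcsDocuments(documents=docs)
        )
```

Each yielded input config could then be passed to batch_process_documents from its own Airflow task, so the shards run in parallel across workers without needing Spark at all.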