I have a BATCH pipeline that needs to write to BigQuery, truncating the destination table. I'm using the STORAGE_WRITE_API method, but the table is not truncated; instead, the rows are appended.
.apply(BigQueryIO.<RunQueryResponse>write()
    .to(new TableReference()
        .setProjectId(clientProject)
        .setDatasetId(firestoreStateDataset)
        .setTableId(table))
    .withFormatFunction(new RunQueryResponseToTableRow())
    .withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API)
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
    // Expected to replace the table contents, but rows are appended instead.
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE));
I know WRITE_TRUNCATE doesn't work on streaming pipelines, but this is a BATCH pipeline. Does STORAGE_WRITE_API not support WRITE_TRUNCATE?
The table is not partitioned.
If I switch to the default method, FILE_LOADS, it works.
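For reference, this is the variant that truncates as expected (a minimal sketch; in a batch pipeline, FILE_LOADS is also what you get when .withMethod is omitted):

.apply(BigQueryIO.<RunQueryResponse>write()
    .to(new TableReference()
        .setProjectId(clientProject)
        .setDatasetId(firestoreStateDataset)
        .setTableId(table))
    .withFormatFunction(new RunQueryResponseToTableRow())
    // FILE_LOADS stages files and runs a BigQuery load job,
    // and load jobs honor WRITE_TRUNCATE.
    .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE));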
STORAGE_WRITE_API does support both real-time streaming and batch data processing into BigQuery; however, under the hood it always writes through streams, which is why WRITE_TRUNCATE is not honored even in batch pipelines.
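One possible workaround, if you want to keep STORAGE_WRITE_API, is to truncate the table yourself before the pipeline runs and then write with WRITE_APPEND. A minimal sketch using the google-cloud-bigquery client library; the project, dataset, and table values are the question's placeholders:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;

public class TruncateBeforeWrite {
  public static void main(String[] args) throws Exception {
    // Placeholders from the question; substitute your own identifiers.
    String clientProject = "my-project";
    String firestoreStateDataset = "my_dataset";
    String table = "my_table";

    // Empty the destination table with a DML statement up front...
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
    String sql = String.format(
        "TRUNCATE TABLE `%s.%s.%s`", clientProject, firestoreStateDataset, table);
    bigquery.query(QueryJobConfiguration.newBuilder(sql).build());

    // ...then run the Beam pipeline with STORAGE_WRITE_API and
    // WriteDisposition.WRITE_APPEND, since the table is already empty.
  }
}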
The Storage Write API documentation on batch loading data using the pending type provides instructions and code samples you can reference for your use case. Alternatively, you can explore configuring BigQuery load jobs for your batch data pipelines.
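For the load-job route outside of Beam, a minimal sketch with the google-cloud-bigquery Java client might look like the following; the GCS URI and table names are hypothetical:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.TableId;

public class LoadJobTruncate {
  public static void main(String[] args) throws Exception {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
    // Hypothetical identifiers; replace with your own.
    TableId tableId = TableId.of("my-project", "my_dataset", "my_table");
    LoadJobConfiguration config =
        LoadJobConfiguration.newBuilder(tableId, "gs://my-bucket/data.json")
            .setFormatOptions(FormatOptions.json())
            // Load jobs honor WRITE_TRUNCATE: the table is replaced atomically.
            .setWriteDisposition(JobInfo.WriteDisposition.WRITE_TRUNCATE)
            .build();
    Job job = bigquery.create(JobInfo.of(config)).waitFor();
    if (job.getStatus().getError() != null) {
      throw new RuntimeException(job.getStatus().getError().toString());
    }
  }
}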