Bigquery dbt_external_tables External Data Configuration

3.2k Views Asked by At

I need some help when using the dbt_external_tables package.

I realized that in the csv I have in GCS some lines appear to have line breaks and this is causing some issues when trying to query the table created by the macro.

Sometimes when doing this configuration of the external table manually the BigQuery UI has two options: Allow jagged rows (CSV) Allow quoted newlines (CSV) true I usually put those options in true and sometimes the issues are solved. I don't know how to do this using the dbt_external_tables.

This is important as I am receiving this errors when trying to query the table created by dbt "Error while reading table: kpi-process.file_csv.History, error message: CSV table references column position 9, but line starting at position:10956 contains only 7 columns. "

1

There are 1 best solutions below

0
On

The dbt-external-tables package supports passing a dictionary of options for BigQuery external tables, which maps to the options documented here. In your case, it sounds like you want to turn on allow_jagged_rows and allow_quoted_newlines, so you can specify them like so:

version: 2
sources:
 - name: my_external_source
   tables:
     - name: my_external_table
       location: 'gs://bucket/path/*'
       options:
         format: csv
         allow_jagged_rows: true
         allow_quoted_newlines: true

And dbt will template a DDL statement accordingly:

create or replace external statement my_external_source.my_external_table
options (
  format = 'csv',
  allow_jagged_rows = true,
  allow_quoted_newlines = true,
  uris = ['gs://bucket/path/*']
)