bigrquery bq_table_load csv file with tab delimiter


I am trying to use bigrquery's bq_table_load() command to load a tab-delimited csv file from Google Cloud Storage into BigQuery. It works, but it doesn't automatically recognize the column names. Doing the same thing interactively (i.e. in the BigQuery cloud console) works well. Comparing the job metadata for the two jobs (the R-induced job vs. the cloud console job), I note that the column delimiter is not set to TAB for the R job. This is despite me including it in my command call, e.g. as follows:

bq_table_load(<x>,<uri>, fieldDelimiter="Tab",    source_format = "CSV", autodetect=TRUE)

I tried all sorts of variations of this, but nothing seems to work (i.e. the R job always has the comma delimiter set). Here are some of the variations I tried:

bq_table_load(<x>,<uri>, field_delimiter="Tab",    source_format = "CSV", autodetect=TRUE)

bq_table_load(<x>,<uri>, field_delimiter="\t",    source_format = "CSV", autodetect=TRUE)

bq_table_load(<x>,<uri>, field_delimiter="tab",    source_format = "CSV", autodetect=TRUE)

Any suggestions?

1 Answer

You can define the schema explicitly using a schema file; a sample is given below.

Sample bq load command, where $schema_dir/$TABLENAME.json is the schema file:

bq --nosync load --source_format=CSV --skip_leading_rows=3 --allow_jagged_rows=TRUE --max_bad_records=10000 \
   --allow_quoted_newlines=TRUE $projectid:$dataset.$TABLENAME \
   $csv_data_path/$FILENAME $schema_dir/$TABLENAME.json

Sample Schema file

[
  {
    "mode": "NULLABLE",
    "name": "C1",
    "type": "STRING"
  }
]
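
The same idea can be sketched directly in R: instead of relying on autodetect, pass an explicit schema to bq_table_load() via its fields argument, built with bq_field()/bq_fields(). The table, bucket, and column names below are hypothetical, and it is an assumption that field_delimiter is forwarded to the load job's fieldDelimiter setting in your bigrquery version; check ?bq_perform_load for the extra arguments it accepts.

```r
library(bigrquery)

# Hypothetical project/dataset/table names -- replace with your own.
tbl <- bq_table("my-project", "my_dataset", "my_table")

bq_table_load(
  tbl,
  source_uris = "gs://my-bucket/my_file.tsv",  # hypothetical URI
  source_format = "CSV",
  field_delimiter = "\t",   # assumed to map to the job's fieldDelimiter
  nskip = 1,                # skip the header row since autodetect is off
  fields = bq_fields(list(  # explicit schema, mirroring the JSON file above
    bq_field("C1", "STRING", mode = "NULLABLE")
  ))
)
```

With an explicit schema the column names no longer depend on autodetection, so the job succeeds even if the delimiter setting is not picked up as expected.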