Replace character in large txt file

I have a series of .txt files that I need to upload to Google Cloud in order to create some tables out of them in BigQuery. The files are tables whose separator is a ';'.

For some reason, BigQuery has trouble identifying the columns (whether automatically or not) when the files are in this shape, even when ';' is specified as the separator.

As a test, I generated a sample table, found-and-replaced the ';' with ',', and saved the file as .csv. BigQuery then created the correct table without any issue.

Question: should I find-and-replace all the separators in all the tables, or am I missing something?

If yes, how can I do it from the command prompt on OS 10 (e.g. with sed)? The files are large, and substituting the character in a timely manner is a problem even with UltraEdit.
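For large files, a streaming rewrite avoids loading everything into memory at once. As a minimal Python sketch (the file names below are placeholders), using the csv module so that any ';' inside quoted fields is not mistaken for a separator:

import csv

# Stream the semicolon-delimited file and write it back out comma-delimited.
# The csv module handles quoting, so a ';' inside a quoted field stays data.
with open("input.txt", newline="") as src, \
     open("output.csv", "w", newline="") as dst:
    reader = csv.reader(src, delimiter=";")
    writer = csv.writer(dst, delimiter=",")
    for row in reader:
        writer.writerow(row)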

Best!

BEST ANSWER

To address this issue, you can upload such file(s) as having just one column, and then, once the data is in BigQuery, parse it using SQL into whatever schema you need. I think this is the most generic and easiest way to deal with it. Usually SPLIT(col, ';') is enough, as in the quite simplified example below; for more complex cases there is REGEXP_EXTRACT_ALL().

#standardSQL
-- Stand-in for a table loaded with a single STRING column
WITH `project.dataset.table` AS (
  SELECT 'aa,a;bbb;ccc' col
)
-- Split each raw line on ';' and pick fields out by position;
-- SAFE_OFFSET returns NULL instead of failing if a field is missing
SELECT
  SPLIT(col, ';')[SAFE_OFFSET(0)] col1,
  SPLIT(col, ';')[SAFE_OFFSET(1)] col2,
  SPLIT(col, ';')[SAFE_OFFSET(2)] col3
FROM `project.dataset.table`

with this result:

Row col1    col2    col3     
1   aa,a    bbb     ccc
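
For the first step (loading the file as just one column), one option is to give the load job a field delimiter that never occurs in the data, so each line lands intact in a single STRING field. A sketch using the google-cloud-bigquery Python client, where the table ID and file name are placeholders and the tab character is assumed not to appear in the data:

from google.cloud import bigquery

client = bigquery.Client()

# Load each whole line into one STRING column by declaring a delimiter
# ('\t' here) that is assumed not to occur in the data.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    field_delimiter="\t",
    quote_character="",  # disable quote handling so ';' and '"' pass through
    schema=[bigquery.SchemaField("col", "STRING")],
)

with open("input.txt", "rb") as f:
    job = client.load_table_from_file(
        f, "project.dataset.table", job_config=job_config
    )
job.result()  # wait for the load to finish

After that, the SPLIT() query above turns the single column into the final schema.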