I'm using Google Cloud Storage as a source and BigQuery as the destination for a Data Transfer feature that's available in BigQuery.So, on the data source details when creating a new data transfer I need to input the 'Cloud Storage URI', but I don't know how to exclude files with a certain text such as '_All_Data' on the file name.
I've tried to use the pattern bucket/filepath1/2023/*/*/filename_2023*!_All_Data.csv which was provided by ChatGPT, but it didn't work. I was expecting that I would be able to extract all the files in 2023 with the filename starting with 'filename_2023', and exclude all the files that contains '_All_Data'.
Basically, my expectations is that I would be able to extract files such as:
bucket/filepath1/2023/05/03/filename_20230503.csvbucket/filepath1/2023/05/02/filename_20230502.csvbucket/filepath1/2023/03/03/filename_20230303.csv
And exclude files such as:
bucket/filepath1/2023/05/03/filename_20230503_All_Data.csvbucket/filepath1/2023/05/02/filename_20230502_All_Data.csvbucket/filepath1/2023/03/03/filename_20230303_All_Data.csv
give this a try. I checked this worked. basically you will have to copy the required files to another bucket and then use data transfer service from this new bucket.
the
grep -v 'All'will ensure the file names havingAllin it will be excluded and other files will be copied to target-gcs-bucket