BigQuery Data Transfer from S3: intermittent success

When using the BigQuery Data Transfer Service to move data from S3 into BigQuery, I get intermittent success (I've actually only seen it work correctly once).

The success:

6:00:48 PM  Summary: succeeded 1 jobs, failed 0 jobs.   
6:00:14 PM  Job bqts_5f*** (table test_json_data) completed successfully. Number of records: 516356, with errors: 0.    
5:59:13 PM  Job bqts_5f*** (table test_json_data) started.  
5:59:12 PM  Processing files from Amazon S3 matching: "s3://bucket-name/*.json" 
5:59:12 PM  Moving data from Amazon S3 to Google Cloud complete: Moved 2661 object(s).  
5:58:50 PM  Starting transfer from Amazon S3 for files with prefix: "s3://bucket-name/" 
5:58:49 PM  Starting transfer from Amazon S3 for files modified before 2020-07-27T16:48:49-07:00 (exclusive).   
5:58:49 PM  Transfer load date: 20200727    
5:58:48 PM  Dispatched run to data source with id 138***3616

The usual case, though, is just 0 successes and 0 failures, like the following:

8:33:13 PM  Summary: succeeded 0 jobs, failed 0 jobs.   
8:32:38 PM  Processing files from Amazon S3 matching: "s3://bucket-name/*.json" 
8:32:38 PM  Moving data from Amazon S3 to Google Cloud complete: Moved 3468 object(s).  
8:32:14 PM  Starting transfer from Amazon S3 for files with prefix: "s3://bucket-name/" 
8:32:14 PM  Starting transfer from Amazon S3 for files modified between 2020-07-27T16:48:49-07:00 and 2020-07-27T19:22:14-07:00 (exclusive).    
8:32:13 PM  Transfer load date: 20200728    
8:32:13 PM  Dispatched run to data source with id 13***0415

What might be going on such that the second log above doesn't have the Job bqts... run? Is there somewhere I can get more details about these data transfer jobs? I had a different job that ran into a JSON error, so I don't believe it was that.

Thanks!

There is 1 best solution below

I was a bit confused by the logging, since it finds and moves the objects (for example, "Moved 3468 object(s)") even in the runs where no load job starts.

I believe I misread the docs. I had previously thought that an Amazon S3 URI of s3://bucket-name/*.json would crawl the whole bucket for JSON files. Even though the log message above seems to suggest that, with the s3://bucket-name/*.json URI it only loads files into BigQuery that sit at the top level of the bucket.
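To make that concrete, here is a rough Python sketch of the behavior I observed, assuming the wildcard does not match across "/" in object keys. It is only a model of what I saw, not the transfer service's actual matching logic, and the function name and sample keys are made up for illustration.

```python
import re


def matches_without_crossing_slash(pattern: str, key: str) -> bool:
    """Return True if key matches pattern when '*' may not match '/'.

    Assumption: this models the observed top-level-only behavior; it is
    not taken from the BigQuery Data Transfer Service implementation.
    """
    # Escape the literal pieces and join them with "[^/]*" for each wildcard.
    regex = "[^/]*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, key) is not None


# Hypothetical object keys, for illustration only.
keys = ["a.json", "2020/07/27/b.json", "logs/c.json", "notes.txt"]
pattern = "*.json"  # the part of s3://bucket-name/*.json after the bucket

loaded = [k for k in keys if matches_without_crossing_slash(pattern, k)]
skipped = [k for k in keys if k.endswith(".json") and k not in loaded]

print("would be loaded:", loaded)
print("nested, not loaded:", skipped)
```

Running a check like this over a real listing of the bucket's keys would show how many .json objects sit under a prefix. If all of them do, that would be consistent with a run that moves thousands of objects and then reports 0 jobs.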