When using the BigQuery Data Transfer Service to move data into BigQuery from S3, I get intermittent success (I've actually only seen it work correctly one time).
The success:
6:00:48 PM Summary: succeeded 1 jobs, failed 0 jobs.
6:00:14 PM Job bqts_5f*** (table test_json_data) completed successfully. Number of records: 516356, with errors: 0.
5:59:13 PM Job bqts_5f*** (table test_json_data) started.
5:59:12 PM Processing files from Amazon S3 matching: "s3://bucket-name/*.json"
5:59:12 PM Moving data from Amazon S3 to Google Cloud complete: Moved 2661 object(s).
5:58:50 PM Starting transfer from Amazon S3 for files with prefix: "s3://bucket-name/"
5:58:49 PM Starting transfer from Amazon S3 for files modified before 2020-07-27T16:48:49-07:00 (exclusive).
5:58:49 PM Transfer load date: 20200727
5:58:48 PM Dispatched run to data source with id 138***3616
The usual result, though, is just 0 successes and 0 failures, like the following:
8:33:13 PM Summary: succeeded 0 jobs, failed 0 jobs.
8:32:38 PM Processing files from Amazon S3 matching: "s3://bucket-name/*.json"
8:32:38 PM Moving data from Amazon S3 to Google Cloud complete: Moved 3468 object(s).
8:32:14 PM Starting transfer from Amazon S3 for files with prefix: "s3://bucket-name/"
8:32:14 PM Starting transfer from Amazon S3 for files modified between 2020-07-27T16:48:49-07:00 and 2020-07-27T19:22:14-07:00 (exclusive).
8:32:13 PM Transfer load date: 20200728
8:32:13 PM Dispatched run to data source with id 13***0415
What might be going on such that the second log above doesn't have the Job bqts... run? Is there somewhere I can get more details about these data transfer jobs? A different job of mine ran into a JSON error and that showed up as an error, so I don't believe that is the cause here.
Thanks!
I was a bit confused by the logging, since it reports finding and moving the objects (for example, "Moved 3468 object(s)").
I believe I misread the docs. I had previously thought that an Amazon S3 URI of
s3://bucket-name/*.json
would crawl the whole bucket for JSON files, but even though the message above seems to indicate as much, it only loads files into BigQuery that are at the top level of the bucket (for the s3://bucket-name/*.json URI).
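To check how many objects this affects, here is a minimal sketch in Python. It assumes the behavior I observed above (the wildcard only picks up top-level keys), which I have not confirmed against the docs. The function name `split_by_depth` and the sample keys are made up for illustration; in practice the key list would come from a listing of the bucket.

```python
def split_by_depth(keys, suffix=".json"):
    """Partition object keys ending in `suffix` into top-level and nested keys.

    Assumption (from the behavior observed in this post, not from the docs):
    a URI like s3://bucket-name/*.json only loads keys with no "/" in them.
    """
    top_level, nested = [], []
    for key in keys:
        if not key.endswith(suffix):
            continue  # not a JSON file, so the pattern would not apply anyway
        if "/" in key:
            nested.append(key)      # lives under a "directory" prefix
        else:
            top_level.append(key)   # sits directly in the bucket root
    return top_level, nested


# Hypothetical keys, as they might appear in a bucket listing.
keys = ["a.json", "2020/07/27/b.json", "logs/c.json", "readme.txt"]
top_level, nested = split_by_depth(keys)
print("expected to load:", top_level)
print("moved but apparently not loaded:", nested)
```

If `top_level` comes back empty for a given run, that would be consistent with a summary of 0 succeeded and 0 failed jobs even though thousands of objects were moved.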