Billing - copying historic BQ logs from the old table to a new one. Google Cloud Platform


Google Cloud BigQuery question.

Situation:

  • GCP BQ logs were routed to the "billing" project in the we2 region.
  • A new project, "BillingNew" (EU region), has been created, with a request to route BQ logs to this project and also to move the historical data.

The problem: how to insert the old billing data, with its nested tables, into the new billing project when the source and destination are different projects (and regions).

3

There are 3 best solutions below

0
Mund On BEST ANSWER

Just in case someone else runs into the same riddle:

  • GCP BQ logs were routed to the "billing" project in the we2 region.

  • A new project, "BillingNew" (EU region), has been created, with a request to route BQ logs to this project and also to move the historical data.

Solution:

  1. Create a new dataset in "BillingNew". Re-route all the projects' BQ logs there. Check and make sure the live logs are arriving.

  2. Copy the dataset from the old "billing" project to the new "BillingNew" project using a transfer job (example command below):

bq mk --transfer_config \
  --project_id=TARGET_PROJECT_ID \
  --data_source=cross_region_copy \
  --target_dataset=TARGET_DATASET_NAME \
  --display_name='Transfer job name, type any' \
  --params='{"source_dataset_id":"SOURCE_DATASET_NAME(ID)","source_project_id":"SOURCE_PROJECT_NAME(ID)","overwrite_destination_table":"true"}'
  3. Now you have both datasets, with identical log table schemas, in the same project and, most importantly, in the same region (EU).
  4. Use SQL: INSERT INTO newtable ... SELECT * FROM oldtable ... (a hedged example follows this list).
  5. Done: you have the live and old logs in the same table, partitioning included.
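
For step 4, a minimal sketch of the copy query, assuming hypothetical dataset and table names (billing_logs as the live dataset, billing_logs_old as the one created by the transfer job); substitute your real identifiers:

# Hedged sketch of step 4: append the historical rows to the live log table.
# Project, dataset and table names below are placeholders, not the real ones.
bq query --use_legacy_sql=false \
  'INSERT INTO `BillingNew.billing_logs.cloudaudit_googleapis_com_activity`
   SELECT * FROM `BillingNew.billing_logs_old.cloudaudit_googleapis_com_activity`'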

Another way:

  1. Copy the dataset from the old project to the new one.
  2. Route the BQ logs there. You will lose some records while re-routing the projects, as they will still be recorded in the old table, so you will eventually need to copy that gap data as well (see the sketch after this list).
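
For that gap data in step 2, once the missing rows are available in a dataset in the same region (for example via one more transfer run into a staging dataset), a timestamp-filtered INSERT can append just that window. Everything below, including the "timestamp" column and the cutover time, is an assumption to adapt:

# Hedged sketch: copy only the rows that landed in the old (staged) table
# after the cutover. Names, the column and the timestamp are placeholders.
bq query --use_legacy_sql=false \
  'INSERT INTO `BillingNew.billing_logs.cloudaudit_googleapis_com_activity`
   SELECT * FROM `BillingNew.billing_logs_gap.cloudaudit_googleapis_com_activity`
   WHERE timestamp >= TIMESTAMP("2020-01-07 00:00:00 UTC")'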

Thanks for the suggestions to use transfer jobs and dataset copies, as those GCP-generated structured, partitioned tables are way too complicated to work with by hand.

4
rmesteves On

If you're using partitioned tables, BigQuery doesn't allow you to copy many partitions at once. If you're going to migrate your tables only once, I suggest doing something like this in a shell script:

tables=("20200107"  "20200106"  "20200105"  "20200104")

for val in ${tables[*]}; do
     bq cp project1:dataset1.table1_$val project2:dataset2.table2_$val
done

You should put all your date partition suffixes as strings in the list and then run the loop. I hope it helps.
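
If there are many daily suffixes, the list can also be generated for a date range instead of typed by hand; this sketch assumes GNU date and keeps the same placeholder project, dataset and table names as above:

# Hedged sketch: generate daily suffixes for a date range and copy each shard.
# Requires GNU date; project/dataset/table names are placeholders.
start=20200104
end=20200107
d="$start"
while [ "$d" -le "$end" ]; do
  bq cp "project1:dataset1.table1_${d}" "project2:dataset2.table2_${d}"
  d=$(date -d "${d} + 1 day" +%Y%m%d)
done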

EDIT 1:

As you're having problems copying tables between regions, I suggest the following:

Given that you're going to migrate the tables only once, you could use a cross-region dataset copy:

bq mk --transfer_config \
  --project_id=myproject \
  --data_source=cross_region_copy \
  --target_dataset=123_demo_tokyo \
  --display_name='My Dataset Copy' \
  --params='{"source_dataset_id":"123_demo_eu","source_project_id":"mysourceproject","overwrite_destination_table":"true"}'

You can check the complete reference here. If you copy your old dataset with your new dataset set as the target, all the tables will be created there.
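
One way to confirm the copy configuration was created (assuming your transfers live in the EU location; the project ID below is a placeholder) is to list the transfer configs:

# Hedged example: list transfer configurations in the EU location.
bq ls --transfer_config --transfer_location=eu --project_id=myproject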

Hope it helps

0
Noe Romero On

In order to move your billing data to your new project, you first need to move the data to the same region as your new dataset. You can follow this guide to accomplish this. Please note that the billing table has nested data; therefore, you should use a file format that supports this kind of data, such as JSON or Avro. In addition, please take the location considerations into account, since you can't copy datasets between different regions directly.
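
As a sketch of that export step, Avro preserves the nested and repeated fields; the project, dataset, table and bucket names below are placeholders:

# Hedged example: export one daily log table to Cloud Storage as Avro,
# which keeps nested/repeated fields intact. All names are placeholders.
bq extract \
  --destination_format=AVRO \
  'billing:billing_logs.cloudaudit_googleapis_com_activity_20200107' \
  'gs://my-transfer-bucket/billing/cloudaudit_20200107-*.avro'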

Once you have your data in Google Cloud Storage, you can load it into your new billing table by following the "Loading data into a partitioned table" documentation. Be careful in this step, since there are two write dispositions: "WRITE_APPEND" and "WRITE_TRUNCATE". The type of load depends on whether the partition is new or already has data.
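
For example, loading one exported day back into a specific partition could look like the sketch below (placeholders again); without --replace the load behaves like WRITE_APPEND, with --replace it behaves like WRITE_TRUNCATE for that partition:

# Hedged example: load an exported Avro file into a single partition of the
# new table. Omit --replace to append (WRITE_APPEND); add --replace to
# overwrite the partition (WRITE_TRUNCATE). All names are placeholders.
bq load \
  --source_format=AVRO \
  'BillingNew:billing_logs.cloudaudit_googleapis_com_activity$20200107' \
  'gs://my-transfer-bucket/billing/cloudaudit_20200107-*.avro'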

Another option is using the "Copying datasets" beta feature; however, not all regions are currently supported for dataset copying. You can create dataset copies in regions where the BigQuery Data Transfer Service is currently supported. When using the "Copying datasets" feature, I recommend loading the data into a separate dataset in your new project (in the same location as your destination dataset) and then adding it to your destination dataset using query jobs.

With both options you can use query jobs to load the billing data into your new table, provided you first load your historical data into a separate dataset.
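
As a sketch of that final query job, with the staging and destination names as assumptions:

# Hedged example: append the staged historical rows to the destination table
# via a query job. Dataset and table names are placeholders.
bq query \
  --use_legacy_sql=false \
  --destination_table='BillingNew:billing_logs.cloudaudit_googleapis_com_activity' \
  --append_table \
  'SELECT * FROM `BillingNew.billing_history_staging.cloudaudit_googleapis_com_activity`'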