Clone/Backups for a BigQuery Project

How can one back up a project containing datasets in Google BigQuery to a GCS bucket? Is it feasible to use a Python script for this task, and how can the process be automated?

I'm just trying to gather ideas to implement for disaster recovery: cloning the project into another GCS bucket.

The data structure is as below:

 Project -> dataset -> table
                    -> table
                    -> ...

         -> dataset -> table
                    -> table
                    -> ...
         -> ...

Answered by ShaiD:
  1. Considering disaster recovery is great, but it's important to know that Google BigQuery provides a disaster recovery system out of the box. Any data you store in a region is automatically replicated to another zone within that region, so if one location goes down, the other should keep providing the required service. This is even stronger with their "Multi-Region" approach, where you basically select just "US", so data can be copied to more distant locations (a small sketch follows this list). See more info here
  2. It is however recommended by Google themselves to create some sort of disaster recovery plan. One of the suggested options is actually exporting the data to a GCS bucket in another region - see here
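
As a small illustration of the multi-region point in item 1, this is roughly how a dataset could be pinned to the "US" multi-region with the client library (the project and dataset IDs below are placeholders, not from the question):

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder IDs; a dataset in the "US" multi-region is replicated across
# geographically separate locations within the United States.
dataset = bigquery.Dataset("my-project.my_dataset")
dataset.location = "US"
dataset = client.create_dataset(dataset, exists_ok=True)

print("Created dataset {} in location {}".format(dataset.dataset_id, dataset.location))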

Now for the next part: it is definitely possible to write a Python script that performs this backup. You could even run it as a Google Cloud Function on a schedule, using a Google Cloud service account, and it will automatically copy your entire dataset or project to whatever location you choose.
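
For the scheduling piece, a minimal sketch of an HTTP-triggered Cloud Function entry point could look like the following (the function name and the placeholder comment are assumptions, not part of the original answer); Cloud Scheduler can then call its trigger URL on a cron schedule, using a service account that has BigQuery and Cloud Storage permissions:

from google.cloud import bigquery

def backup_bigquery(request):
    """HTTP entry point; Cloud Scheduler invokes its URL on a cron schedule."""
    client = bigquery.Client()
    # ... submit the extract job(s) here, e.g. the extract_table call shown below ...
    return "Backup started", 200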

You should use the google-cloud-bigquery Python package for that, which lets you connect to BigQuery and create an export (extract) job.

You can refer to this example from Google's documentation to get started:

from google.cloud import bigquery

client = bigquery.Client()

bucket_name = "my-bucket"  # replace with your destination GCS bucket
project = "bigquery-public-data"
dataset_id = "samples"
table_id = "shakespeare"

destination_uri = "gs://{}/{}".format(bucket_name, "shakespeare.csv")
dataset_ref = bigquery.DatasetReference(project, dataset_id)
table_ref = dataset_ref.table(table_id)

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    # Location must match that of the source table.
    location="US",
)  # API request
extract_job.result()  # Waits for job to complete.

print(
    "Exported {}:{}.{} to {}".format(project, dataset_id, table_id, destination_uri)
)
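
The sample above exports a single table. To back up an entire project, as asked, one possible extension (a sketch, assuming an existing destination bucket named my-backup-bucket and credentials that can read every dataset) is to list all datasets and tables with the same client and submit an extract job for each:

from google.cloud import bigquery

client = bigquery.Client()
bucket_name = "my-backup-bucket"  # assumption: an existing GCS bucket

for dataset_item in client.list_datasets():  # all datasets in the client's project
    dataset = client.get_dataset(dataset_item.reference)
    for table_item in client.list_tables(dataset):
        if table_item.table_type != "TABLE":
            continue  # skip views and other non-table objects, which can't be extracted
        # One (possibly sharded) CSV per table: gs://bucket/dataset/table-*.csv
        destination_uri = "gs://{}/{}/{}-*.csv".format(
            bucket_name, dataset.dataset_id, table_item.table_id
        )
        extract_job = client.extract_table(
            table_item.reference,
            destination_uri,
            location=dataset.location,  # job location must match the dataset
        )
        extract_job.result()  # wait for each export to complete
        print("Exported {} to {}".format(table_item.table_id, destination_uri))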