Amazon S3 parquet file - Transferring to GCP / BQ


Good morning everyone. I have a GCS bucket containing files that were transferred from our Amazon S3 bucket. The files are in .gz.parquet format. I am trying to set up a transfer from the GCS bucket to BigQuery using the BigQuery Data Transfer Service, but I am running into issues with the Parquet file format.

When I create a transfer and specify the file format as Parquet, I receive an error stating that the data is not in Parquet format. When I specify CSV instead, the transfer completes but weird values appear in my table, as shown in the linked screenshot: Results 2

I have tried the following URIs:

  • bucket-name/folder-1/folder-2/dt={run_time|"%Y-%m-%d"}/b=1/geo/*.parquet, file format Parquet. Result: error, file not in Parquet format.

  • bucket-name/folder-1/folder-2/dt={run_time|"%Y-%m-%d"}/b=1/geo/*.gz.parquet, file format Parquet. Result: error, file not in Parquet format.

  • bucket-name/folder-1/folder-2/dt={run_time|"%Y-%m-%d"}/b=1/geo/*.gz.parquet, file format CSV. Result: transfer completes, but weird values.

  • bucket-name/folder-1/folder-2/dt={run_time|"%Y-%m-%d"}/b=1/geo/*.parquet, file format CSV. Result: transfer completes, but weird values.
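For reference, a quick way to check what one of the objects actually is would be to read its first bytes: Parquet files begin with the 4-byte magic string PAR1, while a gzip stream begins with the bytes 1f 8b. A sketch assuming gsutil is available, with a hypothetical object name in place of a real one:

# hypothetical object name; substitute one of the real files in the bucket
gsutil cat -r 0-3 gs://bucket-name/folder-1/folder-2/dt=2021-01-01/b=1/geo/part-00000.gz.parquet | xxd

If that prints PAR1, the objects are genuine Parquet (and the .gz in the name likely just refers to Parquet's internal compression codec); if it prints 1f 8b, the whole file is externally gzip-compressed, which BigQuery cannot read as Parquet, and the files would need to be decompressed first.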

Does anyone have any idea how I should proceed? Thank you in advance!


There are 2 best solutions below


There is dedicated documentation explaining how to load Parquet data from a Cloud Storage bucket into BigQuery, linked below. Could you please go through it and let us know if it still does not solve your problem?

https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-parquet
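For example, a minimal Parquet load with the bq tool, following the pattern on that page, would look like the sketch below (the dataset, table, and the concrete date in the URI are placeholders to adapt):

bq load \
  --source_format=PARQUET \
  mydataset.mytable \
  "gs://bucket-name/folder-1/folder-2/dt=2021-01-01/b=1/geo/*.parquet"

Note this loads one run's files; the {run_time|...} templating in your URIs only works inside the Data Transfer Service.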

Regards, Anbu.


Judging by the look of your URIs, the page you are looking for is the BigQuery documentation on loading hive-partitioned Parquet files.

You can try something like the following in Cloud Shell:

# replace dataset.table with your own dataset and table;
# the URI pattern reuses the bucket and folders from the question
bq load --source_format=PARQUET --autodetect \
  --hive_partitioning_mode=STRINGS \
  --hive_partitioning_source_uri_prefix=gs://bucket-name/folder-1/folder-2/ \
  dataset.table \
  "gs://bucket-name/folder-1/folder-2/*"
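With --hive_partitioning_mode=STRINGS, the partition keys in the path (dt and b here) are loaded as STRING columns; AUTO would infer their types instead. The wildcard URI at the end has to fall under the --hive_partitioning_source_uri_prefix so that the keys after the prefix can be detected.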