I am loading data into bigquery, the data should be in order when I upload it from csv, but once loaded it gets shuffled in bigquery, as below,
actual table: This how it should be in bigquery
| id | name | location |
|---|---|---|
| 1 | aaa | bbbb |
| 2 | ccc | dddd |
| 3 | eeee | |
| 4 | fff | gggg |
uploaded table: but in the below format it got updated once I uploaded from csv to bigquery
| id | name | location |
|---|---|---|
| 3 | null | eeee |
| 2 | ccc | dddd |
| 1 | aaa | bbbb |
| 4 | fff | gggg |
Even if I use pandas.sort_values(by='id'), I am not getting the table in correct order, it gets reshuffled, no idea on what basis.
What changes/steps to be done to get the actual table uploaded into the bigquery as it is in the same format and ignoring null values while loading data into the bigquery?
Thanks in advance
To complement
Cylldbyanswer. If you still need that raw order, you can just add a column namedindexand populated with yourindexdata from your dataframe into your table ( or working temp table )Note: if there are multiple files, you can also set the custom index of your dataframe chunks. For more info there is this good answer which explores index reordering.
So, on you BigQuery table you can just order by
index_coland preserver the raw order that got uploaded from your file.