Bigquery - Batch run ARRAY_AGG(STRUCT()) to avoid exceeding rescource limit

70 Views Asked by At

I am trying to create some nested fields in Bigquery using the ARRAY_AGG(STRUCT()) method, but I am exceeding my resource limit whilst doing so. Is there a way of breaking the query down into batches to overcome this problem?

Example query

SELECT cust_id, ARRAY_AGG( STRUCT(status_s_date,status_e_date,status_desc,current_status_flag,active_flag, price, payment_freq, product_group) ) as status FROM table1 GROUP BY cust_id

I need all of these fields in the STRUCT but trying to doing so all at the same time for all of the data does not work. Is there a way of doing any of the following? If so, which method is best?

a) Creating mutiple structs and then joining them under a common name? E.g Run the following script, creating structs 'status1' and 'status2'... SELECT cust_id, ARRAY_AGG( STRUCT(status_s_date,status_e_date,status_desc,current_status_flag,active_flag) ) as status1, ARRAY_AGG( STRUCT(status_s_date,status_e_date, price, payment_freq, product_group) ) as status2, FROM table1 GROUP BY cust_id

...and then join the two structs on status_s_date and status_e_date to create a single struct called 'Status'

b) Partitioning the original table on status_s_date and then running the ARRAY_AGG(STRUCT() step in batches over the partitioned field.

c) Finally, some of the fields I want to put into the struct are string fields, which I understand take up more resource when nesting. Can I nest their numeric value equivalents and then apply a join to a lookup table afterwards to get their plain-English values?

I am very new to this process so I appreciate some of the above may not make sense. Any help gratefully received.

Regards

1

There are 1 best solutions below

0
On

An update from me which I would appreciate if someone could confirm.

My original unnested table contained an 'ORDER BY' command, which I believe leads to increased resource requirements in the subsequent nesting process. I have removed the 'ORDER BY' (I'm not sure why it was there in the first place) and the ARRAY_AGG(STRUCT()) process is now running fine.

Is it the case that querying ORDERED tables is more resource intensive?