I have been running a few BigQuery DML tests to get a better idea of the performance of BigQuery's DML capabilities. So far, here are some initial observations:

1) Slow performance when updating only a few records in a very small table (30K+ records)

SQL:
UPDATE babynames.names_2014
SET name = 'Emma B'
WHERE name = 'Emma';
Output:
- 2 rows affected (number of records in the table: 33176)
- Query complete (4.5s elapsed, 621 KB processed)
2) Very slow performance when deleting only a few records from a small table
SQL:
DELETE FROM babynames.names_2014_copy
WHERE gender <> 'U';
Output:
- 2 rows affected
- Query complete (162.6s elapsed, 1.21 MB processed); ~3 minutes
QUESTIONS:
1) Is this known behavior?
2) Any suggestions on how to improve the performance?
(1) is approximately expected. The primary DML scenarios are large updates/deletes affecting many rows (millions or billions of rows), so latency is less important than throughput.
(2) doesn't look normal. Could you try one more time? Is there anything unusual about the table you are trying to update?
Optimize toward having few DML statements, with each statement updating many rows. For example, you can use joins/semi-joins to specify large sets of affected rows.
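As a sketch of what that could look like with the tables from the question: instead of issuing one DELETE per value, collect the keys to remove into a separate table and delete them in a single statement via a semi-join. The `babynames.names_to_remove` table here is hypothetical, purely for illustration.

```sql
-- One DML statement affecting many rows at once, rather than many
-- single-row statements. names_to_remove is an assumed staging table
-- holding the set of names to delete.
DELETE FROM babynames.names_2014_copy
WHERE name IN (
  SELECT name
  FROM babynames.names_to_remove
);
```

The same pattern applies to UPDATE: drive the set of affected rows from a subquery or joined table so each statement does as much work as possible.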