Google Datastream (beta): issues removing a backfilling table from a stream

I am currently testing Google Datastream to stream data from Cloud SQL into GCS and then on to BigQuery. All is well; however, a 200-million-row table is currently backfilling and I want to stop this, as the table is no longer in use.

Here is what I have tried so far:

  1. Removing the table from the stream. This has worked so far for all tables; however, this is the first time I've tried it whilst the table is backfilling.

  2. Adding the table to the No-Backfill option inside the stream.

  3. Pausing the stream, draining and then restarting the stream.

None of these seem to work. Has anybody come across this issue before when backfilling a table?

Many thanks, Mark.

There are 2 best solutions below

Best answer

Just thought I'd update this ticket with the solution from Google Support.

Once a table has started backfilling, there is currently no way to stop the process until the backfill has finished.

@Prabir, thanks for your reply. I think the 100-million-row limit only applies to tables without a numeric primary key: "Tables that have more than 100 million rows and that don't have a numeric primary key can't be backfilled."
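To make the quoted limitation concrete, here is a minimal Python sketch of the rule exactly as worded above. The `TableInfo` class and `can_backfill` function are hypothetical names used for illustration and are not part of any Datastream API. The check only mirrors the quoted sentence, so the real service may apply other conditions as well.

```python
from dataclasses import dataclass

# Threshold taken from the quoted limitation (100 million rows).
ROW_LIMIT = 100_000_000


@dataclass
class TableInfo:
    """Hypothetical description of a source table (not a Datastream type)."""
    name: str
    row_count: int
    has_numeric_primary_key: bool


def can_backfill(table: TableInfo) -> bool:
    """Return False only when BOTH conditions of the quoted rule hold:
    more than 100 million rows AND no numeric primary key."""
    too_large = table.row_count > ROW_LIMIT
    return not (too_large and not table.has_numeric_primary_key)


# Example tables (names and sizes are made up for illustration).
with_key = TableInfo("orders", 200_000_000, has_numeric_primary_key=True)
without_key = TableInfo("events", 200_000_000, has_numeric_primary_key=False)

print(can_backfill(with_key))     # True
print(can_backfill(without_key))  # False
```

Under this reading, a 200-million-row table with a numeric primary key is still eligible for backfill, which matches what was observed in the question.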

I've asked Google Support to consider adding the ability to remove a table during backfill in a later release, as the product is still only in alpha testing and features can still be added.

Let's see how this goes in future releases...

Answer

The issue seems to be the number of rows in the table you are backfilling. You said everything worked well before, and the problem only appeared when you backfilled a table with 200 million rows. Since you are using Cloud SQL, you must be using MySQL, as it is the only Cloud SQL engine that Datastream supports right now. Please note that Datastream has some known limitations with MySQL sources, one of which states that tables with more than 100 million rows cannot be backfilled. I suggest keeping the number of rows well within 100 million. You can find more about the known limitations of using MySQL as a source for Datastream in this document.