How to change the mapping of a Dataflow job (Datastream to Spanner template)


I managed to create a Datastream stream that fetches data from MySQL and stores it in a Cloud Storage bucket, and finally a Dataflow job (created from the template provided by Google) reads that data and inserts it into Cloud Spanner.

The problem is that for this to work, the tables in Spanner (including column names) must match the ones in MySQL. So if, for example, you have the table "transfer" in MySQL:

id (varchar(36))   amount (decimal(15, 2))
mysql-uuid         1.99

then in Spanner you also need to have the table transfer:

id (STRING(36))   amount (NUMERIC)
mysql-uuid        1.99

Like this it works with no problem. But what I need now is this: suppose I have that transfer table in MySQL, but I have the following transfer table in Spanner (notice the new column):

id (STRING(36))                  amount (NUMERIC)   legacyid (STRING(36))
new-uuids-generated-for-spanner  1.99               mysql-uuid
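For reference, the target table sketched above could be declared in Spanner with DDL along these lines (a sketch; choosing `id` as the primary key is an assumption based on the example):

```sql
-- Hypothetical DDL for the target Spanner table
CREATE TABLE transfer (
  id       STRING(36) NOT NULL,  -- new UUID generated for Spanner
  amount   NUMERIC,
  legacyid STRING(36)            -- original MySQL id
) PRIMARY KEY (id);
```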

How can I tell the job to take the id from the MySQL transfer table and put it into the legacyid column of the Spanner transfer table while it runs? That's it; I appreciate any help.

EDIT: I have been trying to figure out how or where to do this configuration, and the closest option I see when configuring the Dataflow job is this:

[screenshot: the optional parameters section of the Dataflow job configuration], but I cannot find any example.

EDIT 2:

I tried using the following .json file as the transformationContextFilePath parameter, with no success (I uploaded it to my Cloud Storage bucket and set the parameter to that path). The Dataflow job still starts and runs, and it migrates the data, but it does not map it as the transformation file instructs. What am I missing?

{
    "udf": "function process (inJson) {\n  var obj = JSON.parse (inJson);\n  var mutation = Mutation.newInsertBuilder('transfer')\n    .set('id').to(generateUUID())\n    .set('amount').to(obj.amount)\n    .set('legacyid').to(obj.id)\n    .build();\n  return mutation;\n}"
}
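One thing worth noting about the attempt above: in the classic Dataflow templates that accept a JavaScript UDF (exposed through the javascriptTextTransformGcsPath and javascriptTextTransformFunctionName parameters), the UDF receives one record as a JSON string and must return a JSON string. `Mutation.newInsertBuilder` belongs to the Java Spanner client and is not available inside a JavaScript UDF. A minimal sketch, assuming a JSON-in/JSON-out contract and that the template you run actually exposes these UDF parameters (the `generateUUID` helper is defined below, since classic UDFs run on an ES5 engine without `crypto.randomUUID`):

```javascript
/**
 * Hypothetical UDF: moves the MySQL id into legacyid and generates
 * a new UUID for the Spanner primary key. Input and output are both
 * JSON strings, which is the contract for classic-template UDFs.
 */
function process(inJson) {
  var obj = JSON.parse(inJson);
  obj.legacyid = obj.id;      // keep the original MySQL id
  obj.id = generateUUID();    // new id for Spanner
  return JSON.stringify(obj);
}

// Simple RFC 4122 version 4 UUID generator (ES5, no dependencies).
function generateUUID() {
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function (c) {
    var r = Math.random() * 16 | 0;
    var v = c === 'x' ? r : (r & 0x3 | 0x8);
    return v.toString(16);
  });
}
```

Whether this applies depends on the template: the Datastream-to-Spanner template in particular may only support its own transformation parameters rather than a JavaScript UDF, so check the template's documented parameter list first.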