How to set up labels in Google Dataflow jobs using Scio?


I want to set up labels on Google Dataflow jobs for cost allocation purposes. Here is an example of working Java code:

private DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptionsImpl.class); 
options.setLabels(ImmutableMap.of("key", "value"));

setLabels: Method Documentation Link

Can someone please help with a Scio / Scala example? I checked a few Scio + Google Dataflow examples but couldn't find anything helpful.

Another option is to provide the labels argument in the mvn / gradle command, like below:

mvn compile exec:java \
  -Dexec.mainClass=com.example.WordCount \
  -Dexec.args="--project=test-prod \
  --stagingLocation=gs://test-bucket/staging/ \
  --output=gs://test-bucket/output \
  --runner=TestDataflowPipelineRunner \
  --labels=\"{'a':'b'}\" \
  --jobName=dataflow-intro"

I am not sure if passing labels={a:b} is the correct syntax.

Any help? thanks

Best answer

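In Scio you can do:

import com.spotify.scio._
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions
import scala.collection.JavaConverters._

// cmdLineArgs is the Array[String] passed to main
val (sc: ScioContext, args: Args) = ContextAndArgs(cmdLineArgs)
// Set the labels before the pipeline is run
sc.optionsAs[DataflowPipelineOptions].setLabels(Map("foo" -> "bar").asJava)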

From the command line you can pass the labels param as a JSON string. It is a pipeline argument, so with Maven it goes inside -Dexec.args, e.g.:

mvn compile exec:java \
-Dexec.mainClass=com.example.WordCount \
-Dexec.args="--labels='{\"a\":\"b\"}' ..."
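
If the labels are assembled programmatically, for example in a launcher that builds the argument list, the JSON string for `--labels` can be rendered from a map. The following is a minimal sketch using only the Java standard library. `LabelsArg`, `toLabelsArg`, `quote`, and the sample label values are hypothetical names chosen for illustration and are not part of Beam or Scio. It escapes only backslashes and double quotes, which is assumed to be enough for typical label keys and values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Beam or Scio.
final class LabelsArg {

    // Wrap a string in double quotes, escaping backslashes and double quotes
    // so the result is a valid JSON string literal for simple label text.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Render a label map as a single --labels=<json object> argument.
    static String toLabelsArg(Map<String, String> labels) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : labels.entrySet()) {
            json.add(quote(e.getKey()) + ":" + quote(e.getValue()));
        }
        return "--labels=" + json;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is deterministic.
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("team", "data-eng");
        labels.put("env", "prod");
        System.out.println(toLabelsArg(labels));
    }
}
```

The resulting string is one program argument. When it is typed into a shell or embedded in -Dexec.args, it still needs the quoting that the shell and Maven expect.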