I use custom Docker containers to run Dataflow jobs. I want to chain them together with my TPU training job, etc., so I'm considering running a Kubeflow Pipeline on Vertex AI. Is this a sensible idea? (There seem to be many alternatives, like Airflow, etc.)
In particular, must I use DataflowPythonJobOp in the pipeline? It does not seem to support custom worker images. I assume I can just have one small machine that launches the Dataflow pipeline and stays idle (besides writing some logs) until the Dataflow job finishes?
Have you tried passing the custom container flags through the `args` parameter (https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.0.0/api/v1/dataflow.html#v1.dataflow.DataflowPythonJobOp.args)?
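Something along these lines, for example. This is only a rough, untested sketch: the GCS paths, project, and worker image name are placeholders, and it assumes your Beam job runs on Dataflow Runner v2 (required for custom SDK containers). The idea is that DataflowPythonJobOp just submits the job, and WaitGcpResourcesOp blocks until it reaches a terminal state, so the pipeline step itself can stay on a small machine while the actual processing happens on Dataflow workers built from your image.

```python
from kfp import dsl
from google_cloud_pipeline_components.v1.dataflow import DataflowPythonJobOp
from google_cloud_pipeline_components.v1.wait_gcp_resources import WaitGcpResourcesOp


@dsl.pipeline(name="dataflow-then-tpu-train")
def pipeline(project: str, region: str = "us-central1"):
    # Lightweight launcher step: submits the Dataflow job and returns a
    # gcp_resources reference; it does not run the Beam workload itself.
    dataflow_task = DataflowPythonJobOp(
        project=project,
        location=region,
        python_module_path="gs://my-bucket/my_beam_job.py",  # placeholder path
        temp_location="gs://my-bucket/tmp",                   # placeholder path
        args=[
            # Custom worker image passed as ordinary Beam pipeline options
            # (placeholder image name).
            "--sdk_container_image=us-docker.pkg.dev/my-project/my-repo/beam-worker:latest",
            "--experiments=use_runner_v2",
        ],
    )

    # Polls the submitted Dataflow job and only finishes when it does, so
    # downstream steps (e.g. the TPU training job) start after preprocessing.
    wait_task = WaitGcpResourcesOp(
        gcp_resources=dataflow_task.outputs["gcp_resources"]
    )
```

If that works for your setup, the TPU training component can simply be declared `.after(wait_task)` (or consume an output of it), which gives you the chaining you described without keeping a large machine idle during the Dataflow run.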