Running a TFX pipeline with Kubeflow or LocalDagRunner in GCP

  1. I am trying to run a simple TFX pipeline in GCP using KubeflowRunner, following this tutorial: https://www.tensorflow.org/tfx/tutorials/tfx/gcp/vertex_pipelines_bq
  2. I also tried the same code with LocalDagRunner.

Ideally, I would like to get it working with KubeflowRunner.

I have the following TF/KF versions:

TensorFlow version: 2.11.0
TFX version: 1.12.0
KFP version: 1.8.22

I tried to upgrade TFX to version 1.13.0 in my GCP notebook, but the install fails with the error "No matching distribution found for tfx==1.13.0". The latest version I can see available is 1.12.0.
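To narrow down why a newer release is not offered, it can help to print the interpreter version next to the installed package versions, since an installer such as pip only considers releases that are compatible with the running Python. The following is a minimal sketch: the helper name `installed_version` is hypothetical, the distribution names `tensorflow`, `tfx`, and `kfp` are assumed to match what is installed, and the `pkg_resources` fallback exists only because `importlib.metadata` is not available before Python 3.8.

```python
import sys


def installed_version(package):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        from importlib import metadata  # standard library on Python 3.8+
    except ImportError:
        metadata = None

    if metadata is not None:
        try:
            return metadata.version(package)
        except metadata.PackageNotFoundError:
            return None

    # Fallback for Python 3.7, where importlib.metadata does not exist.
    import pkg_resources
    try:
        return pkg_resources.get_distribution(package).version
    except pkg_resources.DistributionNotFound:
        return None


# Print the interpreter version, then each package of interest.
print("Python:", ".".join(str(part) for part in sys.version_info[:3]))
for name in ("tensorflow", "tfx", "kfp"):
    print(name, "->", installed_version(name) or "not installed")
```

Comparing the printed Python version with the Python versions listed in the release notes of the target TFX release would show whether the interpreter itself is what blocks the upgrade.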

When trying to run the pipeline using Kubeflow:

BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS = [
    '--project=' + GOOGLE_CLOUD_PROJECT,
    '--temp_location=' + os.path.join('gs://', GCS_BUCKET_NAME, 'tmp'),
]

PIPELINE_DEFINITION_FILE = PIPELINE_NAME + '_pipeline.json'

runner = tfx.orchestration.experimental.KubeflowV2DagRunner(
    config=tfx.orchestration.experimental.KubeflowV2DagRunnerConfig(),
    output_filename=PIPELINE_DEFINITION_FILE)
_ = runner.run(
    _create_pipeline(
        pipeline_name=PIPELINE_NAME,
        pipeline_root=PIPELINE_ROOT,
        query=QUERY,
        module_file=os.path.join(MODULE_ROOT, _trainer_module_file),
        serving_model_dir=SERVING_MODEL_DIR,
        beam_pipeline_args=BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS))

I get the error "module 'tfx.v1.orchestration.experimental' has no attribute 'KubeflowV2DagRunner'":

AttributeError                            Traceback (most recent call last)
/var/tmp/ipykernel_34085/2.....py in <module>
     11 PIPELINE_DEFINITION_FILE = PIPELINE_NAME + '_pipeline.json'
     12 
---> 13 runner = tfx.orchestration.experimental.KubeflowV2DagRunner(
     14     config=tfx.orchestration.experimental.KubeflowV2DagRunnerConfig(),
     15     output_filename=PIPELINE_DEFINITION_FILE)

AttributeError: module 'tfx.v1.orchestration.experimental' has no attribute 'KubeflowV2DagRunner'
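One possible reason for a missing attribute like this is that a package exposes some names only when an optional dependency imports cleanly, and silently skips them otherwise. Whether that is what happens here is an assumption, but it can be checked by importing the underlying module directly so that any hidden import failure becomes visible. The sketch below uses a hypothetical helper, `try_import`; the module path `tfx.orchestration.kubeflow.v2.kubeflow_v2_dag_runner` is also an assumption about where the runner is defined and should be checked against the installed TFX source.

```python
import importlib
import traceback


def try_import(module_name):
    """Import `module_name`; return (True, None) or (False, formatted traceback)."""
    try:
        importlib.import_module(module_name)
    except Exception:  # report any failure, not only ImportError
        return False, traceback.format_exc()
    return True, None


# Assumed module paths; adjust them to match the installed packages.
for name in ("kfp", "tfx.orchestration.kubeflow.v2.kubeflow_v2_dag_runner"):
    ok, error = try_import(name)
    print(name, "imports cleanly" if ok else "failed to import")
    if error:
        print(error)
```

If either import fails, the printed traceback names the package or symbol that is actually missing, which is more specific than the AttributeError above.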

When trying to run using LocalDagRunner:

BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS = [
    '--project=' + GOOGLE_CLOUD_PROJECT,
    '--temp_location=' + os.path.join('gs://', GCS_BUCKET_NAME, 'tmp'),
]

PIPELINE_DEFINITION_FILE = PIPELINE_NAME + '_pipeline.json'

runner = tfx.orchestration.LocalDagRunner(
    # config=tfx.orchestration.experimental.KubeflowV2DagRunnerConfig(),
    # output_filename=PIPELINE_DEFINITION_FILE
    )
_ = runner.run(
    _create_pipeline(
        pipeline_name=PIPELINE_NAME,
        pipeline_root=PIPELINE_ROOT,
        query=QUERY,
        module_file=os.path.join(MODULE_ROOT, _trainer_module_file),
        serving_model_dir=SERVING_MODEL_DIR,
        beam_pipeline_args=BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS))

I get the error "getattr(): attribute name must be string":

WARNING:absl:metadata_connection_config is not provided by IR.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/tmp/ipykernel_34085/4.....py in <module>
     17         module_file=os.path.join(MODULE_ROOT, _trainer_module_file),
     18         serving_model_dir=SERVING_MODEL_DIR,
---> 19         beam_pipeline_args=BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS))

~/tfx_env/lib/python3.7/site-packages/tfx/orchestration/portable/tfx_runner.py in run(self, pipeline, run_options, **kwargs)
    122     else:
    123       run_options_pb = None
--> 124     return self.run_with_ir(pipeline_pb, run_options=run_options_pb, **kwargs)

~/tfx_env/lib/python3.7/site-packages/tfx/orchestration/local/local_dag_runner.py in run_with_ir(self, pipeline, run_options)
     64         deployment_config.metadata_connection_config,
     65         deployment_config.metadata_connection_config.WhichOneof(
---> 66             'connection_config'))
     67 
     68     logging.info('Using deployment config:\n %s', deployment_config)

TypeError: getattr(): attribute name must be string