Error when trying to use CustomPythonPackageTrainingJobRunOp in VertexAI pipeline


I am using the Google Cloud pipeline component CustomPythonPackageTrainingJobRunOp in a Vertex AI pipeline. I have previously run this same training package successfully as a CustomTrainingJob. The logs show multiple (11) error messages, but the only one that seems meaningful to me is "ValueError: too many values to unpack (expected 2)", and I am unable to figure out the cause. I can add the other error messages too if required. I log some messages at the start of the training code, so I know the errors occur before the training code is executed. I am completely stuck on this. Links to samples where someone has used CustomPythonPackageTrainingJobRunOp in a pipeline would be very helpful as well. Below is the pipeline code that I am trying to execute:

import kfp
from kfp.v2 import compiler
from kfp.v2.google.client import AIPlatformClient
from google_cloud_pipeline_components import aiplatform as gcc_aip

@kfp.dsl.pipeline(name=pipeline_name)
def pipeline(
    project: str = "adsfafs-321118",
    location: str = "us-central1",
    display_name: str = "vertex_pipeline",
    python_package_gcs_uri: str = "gs://vertex/training/training-package-3.0.tar.gz",
    python_module_name: str = "trainer.task",
    container_uri: str = "us-docker.pkg.dev/vertex-ai/training/scikit-learn-cpu.0-23:latest",
    staging_bucket: str = "vertex_bucket",
    base_output_dir: str = "gs://vertex_artifacts/custom_training/"
):
    
    gcc_aip.CustomPythonPackageTrainingJobRunOp(
        display_name=display_name,
        python_package_gcs_uri=python_package_gcs_uri,
        python_module=python_module_name,
        container_uri=container_uri,
        project=project,
        location=location,
        staging_bucket=staging_bucket,
        base_output_dir=base_output_dir,
        args = ["--arg1=val1", "--arg2=val2", ...]
    )



compiler.Compiler().compile(
    pipeline_func=pipeline, package_path=package_path
)

api_client = AIPlatformClient(project_id=project_id, region=region)

response = api_client.create_run_from_job_spec(
    package_path,
    pipeline_root=pipeline_root_path
)

In the documentation for CustomPythonPackageTrainingJobRunOp, the type of the "python_module" argument appears to be google.cloud.aiplatform.training_jobs.CustomPythonPackageTrainingJob rather than a string, which seems odd. I therefore redefined the pipeline, replacing the python_module string in CustomPythonPackageTrainingJobRunOp with a CustomPythonPackageTrainingJob object, as below, but I still get the same error:

def pipeline(
    project: str = "...",
    location: str = "...",
    display_name: str = "...",
    python_package_gcs_uri: str = "...",
    python_module_name: str = "...",
    container_uri: str = "...",
    staging_bucket: str = "...",
    base_output_dir: str = "...",
):

    job = aiplatform.CustomPythonPackageTrainingJob(
        display_name= display_name,
        python_package_gcs_uri=python_package_gcs_uri,
        python_module_name=python_module_name,
        container_uri=container_uri,
        staging_bucket=staging_bucket
    )
    
    gcc_aip.CustomPythonPackageTrainingJobRunOp(
        display_name=display_name,
        python_package_gcs_uri=python_package_gcs_uri,
        python_module=job,
        container_uri=container_uri,
        project=project,
        location=location,
        base_output_dir=base_output_dir,
        args = ["--arg1=val1", "--arg2=val2", ...]
    )

Edit:

Added the args that I was passing, which I had forgotten to include here.

1 Answer
It turns out that the way I was passing the args to the Python module was incorrect. Instead of args = ["--arg1=val1", "--arg2=val2", ...], you need to specify args = ["--arg1", val1, "--arg2", val2, ...], i.e. each flag and each value as a separate list element.
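
For reference, a minimal sketch of the corrected args format (the flag names learning_rate and epochs are hypothetical, not from the original question), plus what an argparse-based training module such as trainer.task would typically do with those args on its command line:

```python
import argparse

# Hypothetical hyperparameter values for illustration.
learning_rate = 0.01
epochs = 20

# Corrected format: each flag and each value is a separate list element,
# and every element is a string (command-line args are always strings).
args = ["--learning_rate", str(learning_rate), "--epochs", str(epochs)]

# Roughly what the training module would do to consume these args.
parser = argparse.ArgumentParser()
parser.add_argument("--learning_rate", type=float)
parser.add_argument("--epochs", type=int)
parsed = parser.parse_args(args)
print(parsed.learning_rate, parsed.epochs)
```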