Kubeflow - YAML error while deploying to GKE cluster


I have created a Kubeflow pipeline using the Kubeflow Pipelines (KFP) DSL. It has the following steps:

1. Read data from a CSV file.
2. Tokenize the text using a model (encoding).
3. Summarize the text data (from the CSV) in a DataFrame using the model (transformers).
4. Save the final results to a CSV file.
5. Define the pipeline and execute the four steps above.

Compiling the pipeline generated a pipeline YAML file. When I try to apply this YAML file to a GKE cluster, it fails with a YAML validation error. I am not sure whether this is the YAML I should use on GKE or whether I need to create a separate deployment YAML. Any pointers or references on running these Kubeflow steps on a GKE cluster would greatly help my experiment.

I ran the Kubeflow steps in a notebook, tried to upload the compiled file to a GKE cluster, and got the following error.

Error: error: error validating "pipeline.yaml": error validating data: [apiVersion not set, kind not set]; if you choose to ignore these errors, turn validation off with --validate=false.

I have tried to find out which values should be set, but I could not find any pattern or template in the official docs. The generated pipeline.yaml is:
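The two fields named in the message are the top-level `apiVersion` and `kind` keys that kubectl validation expects on every Kubernetes manifest. As a minimal sketch of what that check amounts to, the Python snippet below scans YAML text for unindented keys and reports whether it looks like a Kubernetes manifest or like a compiled KFP pipeline spec. The helper names (`top_level_keys`, `classify`), the hint set, and the abbreviated sample are illustrative assumptions, not part of kubectl or the KFP SDK, and the line-based scan is a heuristic rather than a real YAML parser.

```python
import re

# Top-level fields kubectl validation requires on a Kubernetes manifest.
K8S_REQUIRED = {"apiVersion", "kind"}
# Top-level fields that appear in the compiled pipeline file in this question.
KFP_SPEC_HINTS = {"pipelineInfo", "root", "schemaVersion", "sdkVersion"}

# An unindented "key:" at the start of a line (heuristic, not a YAML parser).
_TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w-]*):")


def top_level_keys(yaml_text: str) -> set:
    """Collect the names of unindented keys in a YAML document."""
    keys = set()
    for line in yaml_text.splitlines():
        match = _TOP_LEVEL_KEY.match(line)
        if match:
            keys.add(match.group(1))
    return keys


def classify(yaml_text: str) -> str:
    """Guess what kind of YAML document this is from its top-level keys."""
    keys = top_level_keys(yaml_text)
    if K8S_REQUIRED <= keys:
        return "kubernetes-manifest"
    if KFP_SPEC_HINTS & keys:
        return "kfp-pipeline-spec"
    return "unknown"


# Abbreviated stand-in for the compiled pipeline.yaml (assumption: same top-level layout).
SAMPLE = """\
# PIPELINE DEFINITION
components:
  comp-read-csv-file:
    executorLabel: exec-read-csv-file
pipelineInfo:
  name: pipeline
schemaVersion: 2.1.0
sdkVersion: kfp-2.7.0
"""

print(classify(SAMPLE))                               # kfp-pipeline-spec
print(sorted(K8S_REQUIRED - top_level_keys(SAMPLE)))  # ['apiVersion', 'kind']
```

Run against the full file, the same scan would report both required keys as missing, which matches the two items listed in the validation error.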

# PIPELINE DEFINITION
# Name: pipeline
components:
  comp-preprocess-text:
    executorLabel: exec-preprocess-text
    inputDefinitions:
      parameters:
        dic:
          parameterType: STRUCT
  comp-publish-results:
    executorLabel: exec-publish-results
    inputDefinitions:
      parameters:
        dic:
          parameterType: STRUCT
  comp-read-csv-file:
    executorLabel: exec-read-csv-file
    inputDefinitions:
      parameters:
        file_path:
          parameterType: STRING
  comp-summarize-text:
    executorLabel: exec-summarize-text
    inputDefinitions:
      parameters:
        dic:
          parameterType: STRUCT
deploymentSpec:
  executors:
    exec-preprocess-text:
      container:
        args:
        - --executor_input
        - '{{$}}'
        - --function_to_execute
        - preprocess_text
        command:
        - sh
        - -c
        - "\nif ! [ -x \"$(command -v pip)\" ]; then\n    python3 -m ensurepip ||\
          \ python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1\
          \ python3 -m pip install --quiet --no-warn-script-location 'kfp==2.7.0'\
          \ '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<\"3.9\"'  &&\
          \  python3 -m pip install --quiet --no-warn-script-location 'pandas==1.2.4'\
          \ && \"$0\" \"$@\"\n"
        - sh
        - -ec
        - 'program_path=$(mktemp -d)


          printf "%s" "$0" > "$program_path/ephemeral_component.py"

          _KFP_RUNTIME=true python3 -m kfp.dsl.executor_main                         --component_module_path                         "$program_path/ephemeral_component.py"                         "$@"

          '
        - "\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import\
          \ *\n\ndef preprocess_text(dic:dict):\n    df=pd.DataFrame(dic)\n    # Tokenize\
          \ the text data using AutoTokenizer\n    tokenizer = AutoTokenizer.from_pretrained(model)\n\
          \    df['encoded_text'] = df['text'].apply(lambda text: tokenizer.encode(text,\
          \ max_length=512, truncation=True))\n    dic=df.to_dict()\n    return dic\n\
          \n"
        image: us-central1-docker.pkg.dev/steel-climber-408809/dockerimage
    exec-publish-results:
      container:
        args:
        - --executor_input
        - '{{$}}'
        - --function_to_execute
        - publish_results
        command:
        - sh
        - -c
        - "\nif ! [ -x \"$(command -v pip)\" ]; then\n    python3 -m ensurepip ||\
          \ python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1\
          \ python3 -m pip install --quiet --no-warn-script-location 'kfp==2.7.0'\
          \ '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<\"3.9\"' && \"\
          $0\" \"$@\"\n"
        - sh
        - -ec
        - 'program_path=$(mktemp -d)


          printf "%s" "$0" > "$program_path/ephemeral_component.py"

          _KFP_RUNTIME=true python3 -m kfp.dsl.executor_main                         --component_module_path                         "$program_path/ephemeral_component.py"                         "$@"

          '
        - "\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import\
          \ *\n\ndef publish_results(dic:dict) -> None:\n    df=pd.DataFrame(dic)\n\
          # Example: Save the dataframe as a CSV file\n    df.to_csv('results.csv',\
          \ index=False)\n\n"
        image: python:3.7
    exec-read-csv-file:
      container:
        args:
        - --executor_input
        - '{{$}}'
        - --function_to_execute
        - read_csv_file
        command:
        - sh
        - -c
        - "\nif ! [ -x \"$(command -v pip)\" ]; then\n    python3 -m ensurepip ||\
          \ python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1\
          \ python3 -m pip install --quiet --no-warn-script-location 'kfp==2.7.0'\
          \ '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<\"3.9\"' && \"\
          $0\" \"$@\"\n"
        - sh
        - -ec
        - 'program_path=$(mktemp -d)


          printf "%s" "$0" > "$program_path/ephemeral_component.py"

          _KFP_RUNTIME=true python3 -m kfp.dsl.executor_main                         --component_module_path                         "$program_path/ephemeral_component.py"                         "$@"

          '
        - "\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import\
          \ *\n\ndef read_csv_file(file_path: str):\n    df= pd.read_csv(file_path)\n\
          \    dic=df.to_dict()\n    return dic\n\n"
        image: python:3.7
    exec-summarize-text:
      container:
        args:
        - --executor_input
        - '{{$}}'
        - --function_to_execute
        - summarize_text
        command:
        - sh
        - -c
        - "\nif ! [ -x \"$(command -v pip)\" ]; then\n    python3 -m ensurepip ||\
          \ python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1\
          \ python3 -m pip install --quiet --no-warn-script-location 'kfp==2.7.0'\
          \ '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<\"3.9\"' && \"\
          $0\" \"$@\"\n"
        - sh
        - -ec
        - 'program_path=$(mktemp -d)


          printf "%s" "$0" > "$program_path/ephemeral_component.py"

          _KFP_RUNTIME=true python3 -m kfp.dsl.executor_main                         --component_module_path                         "$program_path/ephemeral_component.py"                         "$@"

          '
        - "\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import\
          \ *\n\ndef summarize_text(dic: dict):\n    df=pd.DataFrame(dic)\n    # Load\
          \ the LLAMA-2 7B model\n    model = Llama2Model.from_pretrained(model)\n\
          \    df['summary'] = df['encoded_text'].apply(lambda encoded_text: model.generate(encoded_text,\
          \ max_length=150))\n    dic=df.to_dict()\n    return dic\n\n"
        image: python:3.7
pipelineInfo:
  name: pipeline
root:
  dag:
    tasks:
      preprocess-text:
        cachingOptions:
          enableCache: true
        componentRef:
          name: comp-preprocess-text
        inputs:
          parameters:
            dic:
              runtimeValue:
                constant: {}
        taskInfo:
          name: preprocess-text
      publish-results:
        cachingOptions:
          enableCache: true
        componentRef:
          name: comp-publish-results
        inputs:
          parameters:
            dic:
              runtimeValue:
                constant: {}
        taskInfo:
          name: publish-results
      read-csv-file:
        cachingOptions:
          enableCache: true
        componentRef:
          name: comp-read-csv-file
        inputs:
          parameters:
            file_path:
              runtimeValue:
                constant: /content/sample_data/kfb_experiment.csv
        taskInfo:
          name: read-csv-file
      summarize-text:
        cachingOptions:
          enableCache: true
        componentRef:
          name: comp-summarize-text
        inputs:
          parameters:
            dic:
              runtimeValue:
                constant: {}
        taskInfo:
          name: summarize-text
schemaVersion: 2.1.0
sdkVersion: kfp-2.7.0
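
A file with this layout (`pipelineInfo`, `root`, `schemaVersion`, `sdkVersion`) is a pipeline specification compiled by the KFP SDK, which is normally handed to a Kubeflow Pipelines API endpoint rather than applied as a Kubernetes manifest. The following is a hedged sketch of that route, assuming Kubeflow Pipelines is already installed on the GKE cluster and its endpoint is reachable. The function name and the host value in the usage comment are placeholders, not values from this setup.

```python
def submit_compiled_pipeline(host: str, package_path: str = "pipeline.yaml"):
    """Submit a compiled KFP pipeline file to a Kubeflow Pipelines endpoint.

    Assumes `host` is the URL of a reachable Kubeflow Pipelines API and that
    the kfp SDK is installed in the calling environment.
    """
    import kfp  # imported lazily so this module loads without the SDK

    client = kfp.Client(host=host)
    # The pipeline in this question declares no root-level inputs,
    # so no run arguments are passed.
    return client.create_run_from_pipeline_package(
        pipeline_file=package_path,
        arguments={},
    )


# Usage (placeholder host, for example behind a port-forward to the KFP UI service):
# run = submit_compiled_pipeline("http://localhost:8080")
```

Whether this applies depends on the cluster: a plain GKE cluster without Kubeflow Pipelines deployed has no endpoint to receive the file.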