Kubeflow Pipelines: "open /var/run/argo/outputs/parameters/tmp/outputs/execution-id: no such file or directory" error


I am trying to run a Docker container from my private GitLab registry and save the results to a txt file as an artifact. My pipeline is as follows:

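from kfp.dsl import ContainerSpec, Dataset, Output, container_component, pipeline

# Path of the compiled pipeline definition.
yaml_path = './container_pipeline_1229.yaml'

@container_component
def container_component_trial(sentence_count: str, output_gcs: Output[Dataset]):
    # Runs the image from the private registry; output_gcs.path is resolved
    # by KFP at runtime to the location of the output artifact.
    return ContainerSpec(
        image='gitex.xx/sentence_generator:latest',
        command=[
            'python3',
            'sentence_generator_app.py',
        ],
        args=[
            '--output_path', output_gcs.path,
            '--sentence_count', sentence_count,
        ],
    )


@pipeline(name='container-running-pipeline')
def container_pipeline():
    # Single-step pipeline: generate 29 sentences.
    component = container_component_trial(sentence_count="29")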

At the end of the pipeline run, I do get the txt file in MinIO.

However, the pipeline run is marked as failed and one pod is in the Error state:


container-running-pipeline-gqgk4-system-dag-driver-3064853463         0/2     Error              0                34m
container-running-pipeline-gqgk4-3064853463                           0/2     Completed          0                34m
container-running-pipeline-gqgk4-system-container-driver-561475989    0/2     Completed          0                34m
container-running-pipeline-gqgk4-561475989                            0/2     Completed          0                34m
container-running-pipeline-gqgk4-2630735187                           0/2     Completed          0                34m
container-running-pipeline-gqgk4-system-container-impl-2630735187     0/2     Completed          0                34m
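
To pick the failing pod out of a listing like this one, here is a minimal sketch that filters on the STATUS column. It assumes the default `kubectl get pods` column order (NAME, READY, STATUS, RESTARTS, AGE); the function name `failed_pods` and the set of statuses treated as healthy are my own choices for illustration.

```python
# Statuses treated as "fine" in this sketch (an assumption, adjust as needed).
HEALTHY = {"Completed", "Running"}


def failed_pods(listing):
    """Return (name, status) for every pod whose STATUS is not in HEALTHY."""
    bad = []
    for line in listing.splitlines():
        cols = line.split()
        # Skip blank lines, short lines, and the optional header row.
        if len(cols) < 5 or cols[0] == "NAME":
            continue
        name, status = cols[0], cols[2]
        if status not in HEALTHY:
            bad.append((name, status))
    return bad


# Two rows copied from the listing above.
sample_listing = """\
container-running-pipeline-gqgk4-system-dag-driver-3064853463         0/2     Error              0                34m
container-running-pipeline-gqgk4-3064853463                           0/2     Completed          0                34m
"""

for pod_name, pod_status in failed_pods(sample_listing):
    print(pod_name, pod_status)
```

With the full listing, only the system-dag-driver pod is reported, which is the pod described next.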

This is the output of the kubectl describe pod container-running-pipeline-gqgk4-system-dag-driver-3064853463 command:

....
Containers:
  wait:
    Container ID:  containerd://d2f48e9a1c0605820e58fd3bd3332be15ba4f99f43ffcdc3b2f48ffed009280e
    Image:         quay.io/argoproj/argoexec:v3.4.11
    Image ID:      quay.io/argoproj/argoexec@sha256:6731738dc79454232937c73128ccf6a9e74ffbdd0a613fb427ae427ee482115b
    Port:          <none>
    Host Port:     <none>
    Command:
      argoexec
      wait
      --loglevel
      info
      --log-format
      text
    State:          Terminated
      Reason:       Error
      Message:      open /var/run/argo/outputs/parameters/tmp/outputs/execution-id: no such file or directory
      Exit Code:    1
      Started:      Fri, 29 Dec 2023 16:06:04 +0100
      Finished:     Fri, 29 Dec 2023 16:06:06 +0100
    Ready:          False
    Restart Count:  0
    Environment:
      ARGO_POD_NAME:                      container-running-pipeline-gqgk4-system-dag-driver-3064853463 (v1:metadata.name)
      ARGO_POD_UID:                        (v1:metadata.uid)
      GODEBUG:                            x509ignoreCN=0
      ARGO_WORKFLOW_NAME:                 container-running-pipeline-gqgk4
      ARGO_CONTAINER_NAME:                wait
      ARGO_TEMPLATE:                      {"name":"system-dag-driver","inputs":{"parameters":[{"name":"component","value":"{\"dag\":{\"tasks\":{\"container-component-trial\":{\"cachingOptions\":{\"enableCache\":true},\"componentRef\":{\"name\":\"comp-container-component-trial\"},\"inputs\":{\"parameters\":{\"sentence_count\":{\"runtimeValue\":{\"constant\":\"29\"}}}},\"taskInfo\":{\"name\":\"container-component-trial\"}}}}}"},{"name":"runtime-config","default":"","value":"{}"},{"name":"task","default":"","value":""},{"name":"parent-dag-id","default":"0","value":"0"},{"name":"iteration-index","default":"-1","value":"-1"},{"name":"driver-type","default":"DAG","value":"ROOT_DAG"}]},"outputs":{"parameters":[{"name":"execution-id","valueFrom":{"path":"/tmp/outputs/execution-id"}},{"name":"iteration-count","valueFrom":{"path":"/tmp/outputs/iteration-count","default":"0"}},{"name":"condition","valueFrom":{"path":"/tmp/outputs/condition","default":"true"}}]},"metadata":{"annotations":{"sidecar.istio.io/inject":"false"}},"container":{"name":"","image":"gcr.io/ml-pipeline/kfp-driver@sha256:8e60086b04d92b657898a310ca9757631d58547e76bbbb8bfc376d654bef1707","command":["driver"],"args":["--type","ROOT_DAG","--pipeline_name","container-running-pipeline","--run_id","abdd4511-10b4-4cb3-85c5-34466d4b4679","--dag_execution_id","0","--component","{\"dag\":{\"tasks\":{\"container-component-trial\":{\"cachingOptions\":{\"enableCache\":true},\"componentRef\":{\"name\":\"comp-container-component-trial\"},\"inputs\":{\"parameters\":{\"sentence_count\":{\"runtimeValue\":{\"constant\":\"29\"}}}},\"taskInfo\":{\"name\":\"container-component-trial\"}}}}}","--task","","--runtime_config","{}","--iteration_index","-1","--execution_id_path","/tmp/outputs/execution-id","--iteration_count_path","/tmp/outputs/iteration-count","--condition_path","/tmp/outputs/condition"],"resources":{"limits":{"cpu":"500m","memory":"512Mi"},"requests":{"cpu":"100m","memory":"64Mi"}}}}
      ARGO_NODE_ID:                       container-running-pipeline-gqgk4-3064853463
      ARGO_INCLUDE_SCRIPT_OUTPUT:         false
      ARGO_DEADLINE:                      0001-01-01T00:00:00Z
      ARGO_PROGRESS_FILE:                 /var/run/argo/progress
      ARGO_PROGRESS_PATCH_TICK_DURATION:  1m0s
      ARGO_PROGRESS_FILE_TICK_DURATION:   3s
    Mounts:
      /tmp from tmp-dir-argo (rw,path="0")
      /var/run/argo from var-run-argo (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-n5zzp (ro)
  main:
    Container ID:  containerd://1115796bdf2edef71b6c85360006a2096265e6c1fc2455e4a48addecd647d660
    Image:         gcr.io/ml-pipeline/kfp-driver@sha256:8e60086b04d92b657898a310ca9757631d58547e76bbbb8bfc376d654bef1707
    Image ID:      gcr.io/ml-pipeline/kfp-driver@sha256:8e60086b04d92b657898a310ca9757631d58547e76bbbb8bfc376d654bef1707
    Port:          <none>
    Host Port:     <none>
    Command:
      /var/run/argo/argoexec
      emissary
      --loglevel
      info
      --log-format
      text
      --
      driver
    Args:
      --type
      ROOT_DAG
      --pipeline_name
      container-running-pipeline
      --run_id
      abdd4511-10b4-4cb3-85c5-34466d4b4679
      --dag_execution_id
      0
      --component
      {"dag":{"tasks":{"container-component-trial":{"cachingOptions":{"enableCache":true},"componentRef":{"name":"comp-container-component-trial"},"inputs":{"parameters":{"sentence_count":{"runtimeValue":{"constant":"29"}}}},"taskInfo":{"name":"container-component-trial"}}}}}
      --task

      --runtime_config
      {}
      --iteration_index
      -1
      --execution_id_path
      /tmp/outputs/execution-id
      --iteration_count_path
      /tmp/outputs/iteration-count
      --condition_path
      /tmp/outputs/condition
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 29 Dec 2023 16:06:04 +0100
      Finished:     Fri, 29 Dec 2023 16:06:05 +0100
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  512Mi
    Requests:
      cpu:     100m
      memory:  64Mi
    Environment:
      ARGO_CONTAINER_NAME:                main
      ARGO_TEMPLATE:                      {"name":"system-dag-driver","inputs":{"parameters":[{"name":"component","value":"{\"dag\":{\"tasks\":{\"container-component-trial\":{\"cachingOptions\":{\"enableCache\":true},\"componentRef\":{\"name\":\"comp-container-component-trial\"},\"inputs\":{\"parameters\":{\"sentence_count\":{\"runtimeValue\":{\"constant\":\"29\"}}}},\"taskInfo\":{\"name\":\"container-component-trial\"}}}}}"},{"name":"runtime-config","default":"","value":"{}"},{"name":"task","default":"","value":""},{"name":"parent-dag-id","default":"0","value":"0"},{"name":"iteration-index","default":"-1","value":"-1"},{"name":"driver-type","default":"DAG","value":"ROOT_DAG"}]},"outputs":{"parameters":[{"name":"execution-id","valueFrom":{"path":"/tmp/outputs/execution-id"}},{"name":"iteration-count","valueFrom":{"path":"/tmp/outputs/iteration-count","default":"0"}},{"name":"condition","valueFrom":{"path":"/tmp/outputs/condition","default":"true"}}]},"metadata":{"annotations":{"sidecar.istio.io/inject":"false"}},"container":{"name":"","image":"gcr.io/ml-pipeline/kfp-driver@sha256:8e60086b04d92b657898a310ca9757631d58547e76bbbb8bfc376d654bef1707","command":["driver"],"args":["--type","ROOT_DAG","--pipeline_name","container-running-pipeline","--run_id","abdd4511-10b4-4cb3-85c5-34466d4b4679","--dag_execution_id","0","--component","{\"dag\":{\"tasks\":{\"container-component-trial\":{\"cachingOptions\":{\"enableCache\":true},\"componentRef\":{\"name\":\"comp-container-component-trial\"},\"inputs\":{\"parameters\":{\"sentence_count\":{\"runtimeValue\":{\"constant\":\"29\"}}}},\"taskInfo\":{\"name\":\"container-component-trial\"}}}}}","--task","","--runtime_config","{}","--iteration_index","-1","--execution_id_path","/tmp/outputs/execution-id","--iteration_count_path","/tmp/outputs/iteration-count","--condition_path","/tmp/outputs/condition"],"resources":{"limits":{"cpu":"500m","memory":"512Mi"},"requests":{"cpu":"100m","memory":"64Mi"}}}}
      ARGO_NODE_ID:                       container-running-pipeline-gqgk4-3064853463
      ARGO_INCLUDE_SCRIPT_OUTPUT:         false
      ARGO_DEADLINE:                      0001-01-01T00:00:00Z
      ARGO_PROGRESS_FILE:                 /var/run/argo/progress
      ARGO_PROGRESS_PATCH_TICK_DURATION:  1m0s
      ARGO_PROGRESS_FILE_TICK_DURATION:   3s
    Mounts:
      /var/run/argo from var-run-argo (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-n5zzp (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  var-run-argo:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  tmp-dir-argo:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  kube-api-access-n5zzp:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
....
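
For reference, the path in the wait container's message looks like the template's output path `/tmp/outputs/execution-id` appended to `/var/run/argo/outputs/parameters`. The sketch below lists the output parameters declared in `ARGO_TEMPLATE` and whether each has a default. The prefix constant is inferred from the error message only, the helper name `output_parameters` is hypothetical, and the sample JSON is trimmed to the `outputs` section of the template shown above. As far as I understand Argo's `valueFrom.default`, a parameter with a default tolerates a missing file, which may be why only `execution-id` is named in the error.

```python
import json

# Prefix inferred from the wait container's error message (an assumption).
ARGO_OUTPUTS_ROOT = "/var/run/argo/outputs/parameters"


def output_parameters(template_json):
    """List the output parameters of an Argo template with their file paths."""
    template = json.loads(template_json)
    rows = []
    for param in template.get("outputs", {}).get("parameters", []):
        value_from = param.get("valueFrom", {})
        path = value_from.get("path")
        rows.append({
            "name": param["name"],
            "path": path,
            # Where the wait container appears to look for the file.
            "wait_path": ARGO_OUTPUTS_ROOT + path if path else None,
            "has_default": "default" in value_from,
        })
    return rows


# Trimmed copy of the "outputs" section of ARGO_TEMPLATE from the pod description.
sample_template = (
    '{"name":"system-dag-driver","outputs":{"parameters":['
    '{"name":"execution-id","valueFrom":{"path":"/tmp/outputs/execution-id"}},'
    '{"name":"iteration-count","valueFrom":{"path":"/tmp/outputs/iteration-count","default":"0"}},'
    '{"name":"condition","valueFrom":{"path":"/tmp/outputs/condition","default":"true"}}]}}'
)

for row in output_parameters(sample_template):
    print(row["name"], row["wait_path"], "default" if row["has_default"] else "no default")
```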

These are the logs of the main container in the pod:

I1229 15:06:04.693757      18 main.go:105] input ComponentSpec:{
  "dag": {
    "tasks": {
      "container-component-trial": {
        "cachingOptions": {
          "enableCache": true
        },
        "componentRef": {
          "name": "comp-container-component-trial"
        },
        "inputs": {
          "parameters": {
            "sentence_count": {
              "runtimeValue": {
                "constant": "29"
              }
            }
          }
        },
        "taskInfo": {
          "name": "container-component-trial"
        }
      }
    }
  }
}
I1229 15:06:04.694565      18 main.go:118] input ContainerSpec:{}
I1229 15:06:04.694662      18 main.go:125] input RuntimeConfig:{}
I1229 15:06:04.694738      18 main.go:133] input kubernetesConfig:{}
I1229 15:06:04.694997      18 cache.go:116] Connecting to cache endpoint 10.xx.xxx.0:8887
I1229 15:06:04.700503      18 driver.go:151] PipelineRoot="minio://mlpipeline/v2/artifacts" from default config
I1229 15:06:04.710936      18 client.go:270] Pipeline Context: id:163 name:"container-running-pipeline" type_id:11 type:"system.Pipeline" create_time_since_epoch:1701780194643 last_update_time_since_epoch:1701780194643
I1229 15:06:04.748145      18 client.go:278] Pipeline Run Context: id:279 name:"abdd4511-10b4-4cb3-85c5-34466d4b4679" type_id:12 type:"system.PipelineRun" custom_properties:{key:"namespace" value:{string_value:"kubeflow"}} custom_properties:{key:"pipeline_root" value:{string_value:"minio://mlpipeline/v2/artifacts/container-running-pipeline/abdd4511-10b4-4cb3-85c5-34466d4b4679"}} custom_properties:{key:"resource_name" value:{string_value:"run-resource"}} create_time_since_epoch:1703862364722 last_update_time_since_epoch:1703862364722
F1229 15:06:04.772436      18 main.go:76] KFP driver: driver.RootDAG(pipelineName=container-running-pipeline, runID=abdd4511-10b4-4cb3-85c5-34466d4b4679, runtimeConfig, componentSpec) failed: rpc error: code = AlreadyExists desc = Given node already exists: type_id: 13
last_known_state: RUNNING
custom_properties {
  key: "display_name"
  value {
    string_value: ""
  }
}
custom_properties {
  key: "task_name"
  value {
    string_value: ""
  }
}
name: "run/abdd4511-10b4-4cb3-85c5-34466d4b4679"
INTERNAL: Cannot create node for type_id: 13 last_known_state: RUNNING custom_properties { key: "display_name" value { string_value: "" } } custom_properties { key: "task_name" value { string_value: "" } } name: "run/abdd4511-10b4-4cb3-85c5-34466d4b4679"mysql_query failed: errno: Duplicate entry '13-run/abdd4511-10b4-4cb3-85c5-34466d4b4679' for key 'Execution.UniqueExecutionTypeName', error: Duplicate entry '13-run/abdd4511-10b4-4cb3-85c5-34466d4b4679' for key 'Execution.UniqueExecutionTypeName'
time="2023-12-29T15:06:05.675Z" level=info msg="sub-process exited" argo=true error="<nil>"
time="2023-12-29T15:06:05.681Z" level=error msg="cannot save parameter /tmp/outputs/execution-id" argo=true error="open /tmp/outputs/execution-id: no such file or directory"
time="2023-12-29T15:06:05.681Z" level=error msg="cannot save parameter /tmp/outputs/iteration-count" argo=true error="open /tmp/outputs/iteration-count: no such file or directory"
time="2023-12-29T15:06:05.681Z" level=error msg="cannot save parameter /tmp/outputs/condition" argo=true error="open /tmp/outputs/condition: no such file or directory"
Error: exit status 1
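
In this log, the first fatal entry is the driver's `AlreadyExists` error; the three "cannot save parameter" lines only appear after the sub-process has exited, so they look like a consequence rather than the cause. A minimal sketch for pulling the first fatal line out of such a log follows. It assumes the glog-style prefix seen above (severity letter, MMDD, time, thread id, `file:line]`); the function name `first_fatal` is hypothetical and the sample log is a shortened copy of the lines above.

```python
import re

# glog-style prefix: severity (I/W/E/F), MMDD, time, thread id, file:line]
GLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<thread>\d+) (?P<source>[^\]]+)\] (?P<message>.*)$"
)


def first_fatal(log_text):
    """Return the parsed fields of the first F-severity line, or None."""
    for line in log_text.splitlines():
        match = GLOG_LINE.match(line)
        if match and match.group("severity") == "F":
            return match.groupdict()
    return None


# Shortened copy of the driver log; the long call arguments are abbreviated.
sample_log = (
    "I1229 15:06:04.694565      18 main.go:118] input ContainerSpec:{}\n"
    "F1229 15:06:04.772436      18 main.go:76] KFP driver: driver.RootDAG(...) failed: "
    "rpc error: code = AlreadyExists desc = Given node already exists: type_id: 13\n"
    'time="2023-12-29T15:06:05.681Z" level=error '
    'msg="cannot save parameter /tmp/outputs/execution-id" argo=true\n'
)

fatal = first_fatal(sample_log)
if fatal is not None:
    print(fatal["source"], fatal["message"])
```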

Additionally, I am using k3s on WSL2 Ubuntu.

Thank you so much for your help in advance!
