AI Platform Pipelines fails intermittently and at random

Asked by oguogura on 19 July 2020 at 13:16

I had been using AI Platform Pipelines (v0.2.5) for several months. After finding a newer version (v0.5.1) in the Cloud Console, I rebuilt the Pipelines instance on it. Since then I have not been able to get pipeline runs to complete.

It's very strange, because there doesn't seem to be any pattern to the failures.

Pods (components) fail at random. Most pods complete successfully, but some fail, and which pods fail varies from run to run.
The failing pods report one of the following two errors, seemingly at random:

google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials. 
Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. 
For more information, please see https://cloud.google.com/docs/authentication/getting-started

File "<string>", line 3, in raise_from
google.auth.exceptions.RefreshError: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Engine metadata service. Status: 500 Response:\nb'Could not recursively fetch uri\n'", <google.auth.transport.requests._Response object at 0x7fe5729c9650>)
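Both errors come from google-auth failing to get credentials from the metadata server. If the failures turn out to be transient (one cause worth ruling out: GKE's Workload Identity documentation notes that the GKE metadata server can need a few seconds before it accepts requests in a newly started Pod), a workaround to try is retrying credential acquisition with backoff. This is a minimal sketch under that assumption; `retry_with_backoff` is a hypothetical helper, and the attempt count and delays are arbitrary.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exception types with exponential backoff.

    Hypothetical helper: the attempt count and delays are illustrative only.
    Exceptions not listed in retry_on propagate immediately.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between tries.
            sleep(base_delay * (2 ** attempt))
    # Only reached when attempts < 1, because the loop never runs.
    raise ValueError("attempts must be >= 1")


# Assumed usage inside a component (requires the google-auth package):
#
#   import google.auth
#   import google.auth.exceptions
#   from google.auth.transport.requests import Request
#
#   def get_fresh_credentials():
#       creds, project = google.auth.default()
#       creds.refresh(Request())  # fetch a token now, not at the first API call
#       return creds, project
#
#   creds, project = retry_with_backoff(
#       get_fresh_credentials,
#       retry_on=(google.auth.exceptions.DefaultCredentialsError,
#                 google.auth.exceptions.RefreshError),
#   )
```

Refreshing eagerly moves a `RefreshError` like the one above to component start-up, where it can be retried, instead of surfacing midway through the component's work.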

Workload Identity is enabled on the GKE cluster. I have double-checked the setup procedure and the settings, and they look correct; in any case, the pods that don't fail do run successfully with Workload Identity. The Google Cloud Credentials API is also enabled.
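To see which identity a failing Pod actually receives, one option is to probe the metadata server directly at the start of each component and log the result. The sketch below uses only the Python standard library. The endpoint path and the required `Metadata-Flavor: Google` header are the standard metadata-server interface. `fetch_metadata` and `wait_for_service_account` are hypothetical names, and the polling parameters are assumptions.

```python
import time
import urllib.request
from typing import Callable, Optional

# Standard metadata-server path for the default service account's email.
METADATA_EMAIL_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/email"
)


def fetch_metadata(url: str, timeout: float = 2.0) -> str:
    """GET a metadata-server URL; the Metadata-Flavor header is mandatory."""
    req = urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def wait_for_service_account(
    fetch: Callable[[str], str] = fetch_metadata,
    attempts: int = 10,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll the metadata server until it returns an email, or give up (None)."""
    for attempt in range(attempts):
        try:
            return fetch(METADATA_EMAIL_URL)
        except OSError:
            # URLError, HTTPError (e.g. an HTTP 500) and socket timeouts
            # are all OSError subclasses.
            if attempt < attempts - 1:
                sleep(delay)
    return None


if __name__ == "__main__":
    # Hypothetical start-up check for a pipeline component.
    email = wait_for_service_account()
    print(f"metadata server reports service account: {email}")
```

With Workload Identity configured, the reported email should be the Google service account bound to the Pod's Kubernetes service account. A `None` result, or a different account, would point at the metadata server rather than at the pipeline code.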

I don't know whether these problems were caused by updating the Pipelines instance.

Any ideas?
