TFX Evaluator does not run in Dataflow so it fails due to lack of memory for the pod

I am running a TFX-based pipeline on AI Platform Pipelines. All components run fine until the Evaluator. It does not run on Dataflow; instead it runs in the Kubeflow pod, where it fails because there is not enough memory.

Apache Beam is configured with Dataflow as the runner, and the other components, such as ExampleGen, StatisticsGen, and ExampleValidator, all run fine on Dataflow.
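For reference, a Dataflow runner configuration of this kind usually comes down to a list of Beam flags. The sketch below is illustrative only: the helper name `dataflow_beam_args` is hypothetical, and the project, region, and bucket values are placeholders, not the ones from my pipeline.

```python
from typing import List


def dataflow_beam_args(project: str, region: str, temp_location: str) -> List[str]:
    """Build the Apache Beam flags that select Dataflow as the runner."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
    ]


# Placeholder values (assumptions); substitute your own project, region, and bucket.
beam_pipeline_args = dataflow_beam_args(
    project="my-gcp-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
)

# In TFX this list is normally handed to the pipeline definition through the
# `beam_pipeline_args` argument of `tfx.orchestration.pipeline.Pipeline`.
print(beam_pipeline_args)
```

A component that honors these flags submits its Beam job to Dataflow; one that ignores them runs Beam locally inside the pod, which matches the behavior described here.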

The Evaluator component fails without even generating a log. The Kubeflow UI reports the following error:

"This step is in a Failed state with this message: The node was low on resource: memory. The container main was using 2093880Ki, which exceeds its request of 0. Container wait was using 13492Ki, which exceeds its request of 0."

There is 1 best solution below

I was able to resolve this issue by setting the TFX version to 0.25.0.
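To confirm which TFX version the pipeline environment actually has installed, a small check like the following can help. This is a minimal sketch using only the standard library; the helper names `installed_version` and `matches_pin` are hypothetical, and it assumes Python 3.8 or newer.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(package: str, expected: str) -> bool:
    """Check whether the installed version equals the expected pinned version."""
    return installed_version(package) == expected


# Prints the installed TFX version (or None) and whether it matches the pin.
print(installed_version("tfx"), matches_pin("tfx", "0.25.0"))
```

Running this inside the same image the pipeline uses shows whether the pin took effect there, since a local environment and the pipeline image can differ.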