Data Fusion: Not enough memory issue and Lost Executor issue


I am processing a file via a Google Data Fusion pipeline, but as the pipeline runs I am getting the warnings and errors below:

09/25/2020 12:31:31 WARN org.apache.spark.storage.memory.MemoryStore#66-Executor task launch worker for task 6 Not enough space to cache rdd_10_6 in memory! (computed 238.5 MB so far)

09/25/2020 12:45:05 ERROR org.apache.spark.scheduler.cluster.YarnClusterScheduler#70-dispatcher-event-loop-1
Lost executor 2 on cdap-soco-crea-99b67b97-fefb-11ea-8ee6-daceb18eb3cf-w-0.c.datalake-dev-rotw-36b8.internal: Container marked as failed: container_1601016787667_0001_01_000003 on host: cdap-soco-crea-99b67b97-fefb-11ea-8ee6-daceb18eb3cf-w-0.c.datalake-dev-rotw-36b8.internal. Exit status: 3. Diagnostics: [2020-09-25 07:15:05.226]Exception from container-launch. Container id: container_1601016787667_0001_01_000003 Exit code: 3

Help please!


There are 2 answers below.


Sudhir, navigate to Data Fusion > SYSTEM ADMIN > Configuration > System Compute Profiles, then increase the memory of your Dataproc compute profile.

By default, a Data Fusion ENTERPRISE instance allocates 8192 MB of memory per worker. You can start by doubling that amount and keep increasing it until the pipeline runs successfully.

Note that Spark executes transformations on RDDs in memory. As the error message [1] shows, one of your workers failed to cache an RDD in memory due to out-of-memory conditions.

Spark needs to cache RDDs in memory before it can take full advantage of in-memory processing.
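
For illustration only, here is a minimal standalone PySpark sketch (not a Data Fusion pipeline; the input path is hypothetical) of the same mechanism: rdd.cache() uses the MEMORY_ONLY storage level, so partitions that do not fit in memory produce exactly the "Not enough space to cache" warning above and are recomputed later, while MEMORY_AND_DISK spills them to local disk instead.

    # Minimal standalone PySpark sketch (not a Data Fusion pipeline).
    # rdd.cache() uses StorageLevel.MEMORY_ONLY: partitions that do not fit are
    # simply not cached ("Not enough space to cache rdd_N_M") and are recomputed
    # later; MEMORY_AND_DISK spills oversized partitions to local disk instead.
    from pyspark import SparkContext, StorageLevel

    sc = SparkContext(appName="cache-sketch")

    rdd = sc.textFile("gs://your-bucket/large-input")   # hypothetical input path
    upper = rdd.map(lambda line: line.upper())

    upper.persist(StorageLevel.MEMORY_AND_DISK)  # safer than upper.cache() under memory pressure
    print(upper.count())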

Hope this helps!

[1] worker for task 6 Not enough space to cache rdd_10_6 in memory


Sudhir,

There are two things you could try to see if they resolve your issue.

Increase the Executor memory. Steps below.

  1. Navigate to the pipeline detail page.
  2. In the Configure menu, click on Resources.
  3. Enter the desired amount under Executor.
  4. In the same Configure menu, click on Compute config.
  5. Click customize on the desired compute profile.
  6. Ensure that the worker memory is a multiple of the executor memory. For example, if the executor memory is 4096 MB, the worker memory should be 4, 8, 12, etc. GB. Also scale the worker cores accordingly. The worker memory does not strictly have to be an exact multiple, but if it is not, cluster capacity is more likely to be wasted (see the sizing sketch after this list).
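
As a rough illustration of that sizing rule, here is a small plain-Python sketch (illustrative numbers only, ignoring YARN and daemon overhead) that counts how many 4096 MB executors fit on workers of different sizes and how much memory is left stranded.

    # Illustrative arithmetic only (ignores YARN and daemon overhead): how many
    # executors of a given size fit on one worker, and how much memory is stranded.
    def packing(executor_mb: int, worker_gb: int) -> None:
        worker_mb = worker_gb * 1024
        fits = worker_mb // executor_mb          # whole executors per worker
        wasted = worker_mb - fits * executor_mb  # memory no executor can use
        print(f"worker {worker_gb} GB: {fits} x {executor_mb} MB executors, {wasted} MB unused")

    packing(4096, 8)    # 8 GB worker:  2 executors, 0 MB unused
    packing(4096, 12)   # 12 GB worker: 3 executors, 0 MB unused
    packing(4096, 10)   # 10 GB worker: 2 executors, 2048 MB unused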

Also try turning off Auto-Caching. Steps below.

  1. Navigate to the pipeline detail page.
  2. In the Configure menu, click on Engine config.
  3. Enter 'spark.cdap.pipeline.autocache.enable' as the key and 'false' as the value.

By default, pipelines cache intermediate data in order to prevent Spark from re-computing it. This requires a substantial amount of memory, so pipelines that process a large amount of data often need to turn this off.
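
As a rough analogy in plain PySpark terms (this is a sketch of general Spark behavior, not the actual Data Fusion internals; the input path is hypothetical), caching an intermediate dataset avoids recomputing it for every downstream branch, at the cost of executor memory:

    # Rough PySpark analogy for pipeline auto-caching (not the Data Fusion internals).
    # Persisting 'parsed' avoids recomputing it for every downstream branch, at the
    # cost of executor memory; removing the cache() call recomputes it per action,
    # which is roughly the trade-off made when autocache is disabled.
    from pyspark import SparkContext

    sc = SparkContext(appName="autocache-analogy")

    raw = sc.textFile("gs://your-bucket/large-input")   # hypothetical input path
    parsed = raw.map(lambda line: line.split(","))
    parsed.cache()  # comparable to autocache on; delete this line to trade memory for CPU

    branch_a = parsed.filter(lambda cols: len(cols) > 3).count()
    branch_b = parsed.map(lambda cols: cols[0]).distinct().count()
    print(branch_a, branch_b)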