Spark - dynamic allocation - shuffle_1_0_0.index (No such file or directory)


I am running into the following error from time to time while executing my Scala job on Spark 2.2.0:

Caused by: java.io.FileNotFoundException: /spark/temporary/spark-927d72b5-154d-4fd5-a18e-4aefc0e05a59/executor-cdd8da76-bb86-4e4c-bf26-55acbcc761bf/blockmgr-fab15d07-2cca-4e9e-af7f-29b0b45565c1/0f/shuffle_1_0_0.index (No such file or directory)

My spark-submit command looks like this:

/spark/bin/spark-submit --verbose \
  --conf spark.local.dir=/spark/temporary \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.executorIdleTimeout=2m \
  --conf spark.shuffle.service.index.cache.entries=4096 \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=3g \
  --conf spark.executor.extraJavaOptions="-XX:ParallelGCThreads=4 -XX:+UseParallelGC" \
  --conf spark.file.transferTo=false \
  --conf spark.shuffle.file.buffer=5MB \
  --conf spark.shuffle.unsafe.file.output.buffer=5MB \
  --conf spark.unsafe.sorter.spill.reader.buffer.size=1MB \
  --conf spark.io.compression.lz4.blockSize=512KB \
  --conf spark.shuffle.registration.timeout=2m \
  --conf spark.shuffle.registration.maxAttempts=5 \
  --conf spark.memory.useLegacyMode=true \
  --conf spark.shuffle.memoryFraction=0.32 \
  --conf spark.storage.memoryFraction=0.18 \
  --conf spark.shuffle.io.maxRetries=10 \
  --conf spark.dynamicAllocation.maxExecutors=3 \
  --conf spark.dynamicAllocation.initialExecutors=3 \
  --conf spark.task.cpus=2 \
  --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 \
  --master spark://spark-master.spark:7077 \
  --deploy-mode client \
  --class control.TimeLensDriver \
  --executor-cores 2 --executor-memory 2g --driver-memory 2g \
  /spark/spark-job.jar /spark/s3Credential.conf 2017-09-08 7 /spark/public-holydays.json /spark/school-holydays.json /spark/de_postal_codes.json prometheus-pushgateway.monitoring-mida:9091
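For completeness, the dynamic allocation part of this configuration can also be expressed programmatically. This is only a minimal sketch using the same values as in the command above; all the other tuning options are left out:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Minimal sketch: only the dynamic allocation / shuffle service settings
// from the spark-submit command above, with the same values.
val conf = new SparkConf()
  .set("spark.local.dir", "/spark/temporary")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true") // requires the external shuffle service on each worker
  .set("spark.dynamicAllocation.executorIdleTimeout", "2m")
  .set("spark.dynamicAllocation.initialExecutors", "3")
  .set("spark.dynamicAllocation.maxExecutors", "3")

val spark = SparkSession.builder()
  .config(conf)
  .getOrCreate()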

I am using a Spark master in standalone mode with 3 workers. On each worker I started the external shuffle service. The Spark job reads data from Ceph S3, converts it, and saves it back to Ceph S3 in Parquet format. The shuffle files from the error message above are stored on the workers themselves, not in Ceph S3.
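In simplified form the job does something like the following. This is only a sketch: the bucket names, paths and the transformation step are hypothetical placeholders, and it assumes the s3a connector is configured with the Ceph endpoint and credentials.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

// Read the raw data from Ceph S3 (bucket and path are placeholders).
val input = spark.read.json("s3a://input-bucket/raw/")

// Hypothetical transformation step standing in for the actual conversion logic.
val converted = input.filter($"date" === "2017-09-08")

// Write the result back to Ceph S3 as Parquet.
converted.write.mode("overwrite").parquet("s3a://output-bucket/converted/")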

It is strange behaviour: in 7 out of 10 runs the job works fine, but in the other cases it fails with the error message mentioned above.

A few things I have tested so far:

  • There is enough disk space and memory on the hosts.
  • A minimal spark-submit configuration (only enabling dynamic allocation and the external shuffle service, without the other tuning options) fails as well; see the sketch after this list.
  • The job only fails when it runs against specific input data, so the error is reproducible.
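The minimal spark-submit mentioned in the second point looked roughly like this (same master, class, jar and arguments as in the full command above, with only dynamic allocation and the shuffle service enabled):

/spark/bin/spark-submit \
  --conf spark.local.dir=/spark/temporary \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --master spark://spark-master.spark:7077 \
  --deploy-mode client \
  --class control.TimeLensDriver \
  --executor-cores 2 --executor-memory 2g --driver-memory 2g \
  /spark/spark-job.jar /spark/s3Credential.conf 2017-09-08 7 /spark/public-holydays.json /spark/school-holydays.json /spark/de_postal_codes.json prometheus-pushgateway.monitoring-mida:9091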

I am not sure whether the implementation of my Spark job matters for solving this problem, because in my opinion it must be a Spark configuration issue.

