How can I reduce the number of tasks when I run a Spark job?


Here are my Spark job's stages: [screenshot of the Spark UI stage list]

The job has 260,000 tasks because it reads more than 200,000 small HDFS files, each about 50 MB and stored in gzip format.
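
For reference, the read is roughly like this (a minimal sketch; the HDFS path and the use of the DataFrame API are my assumptions, the real job may differ):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("read-small-gzip-files")
  .getOrCreate()

// Hypothetical input path; the real job reads ~200,000 .gz files of ~50 MB each.
// Gzip files are not splittable, so Spark creates at least one input split
// (and therefore one task) per file, hence the ~260,000 tasks.
val df = spark.read.text("hdfs:///data/events/*.gz")
println(s"partitions: ${df.rdd.getNumPartitions}")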

I tried the following settings to reduce the number of tasks, but they had no effect:

...
--conf spark.sql.mergeSmallFileSize=10485760 \
--conf spark.hadoopRDD.targetBytesInPartition=134217728 \
--conf spark.hadoopRDD.targetBytesInPartitionInMerge=134217728 \
...

Is it because the files are gzip-compressed that they cannot be merged?
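
For what it's worth, gzip is non-splittable (each .gz file must be read whole by a single task), though as far as I understand that by itself should not prevent packing several whole files into one partition. A quick check that the codec is indeed non-splittable:

import org.apache.hadoop.io.compress.{GzipCodec, SplittableCompressionCodec}

// GzipCodec does not implement SplittableCompressionCodec, so Hadoop
// never splits a .gz file across input splits.
val splittable = classOf[SplittableCompressionCodec]
  .isAssignableFrom(classOf[GzipCodec])
println(s"gzip splittable: $splittable")  // prints: gzip splittable: false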

What can I do to reduce the number of tasks in this job?
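
One workaround I am considering is coalescing right after the read, so that several whole files land in one task (a sketch; the paths and the target count of 2000 are placeholders, not tuned values):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("merge-small-gzip-files")
  .getOrCreate()

// coalesce() narrows the read stage without a full shuffle: each of the
// 2,000 output partitions reads many whole .gz files sequentially.
val merged = spark.read.text("hdfs:///data/events/*.gz").coalesce(2000)
merged.write.parquet("hdfs:///data/events_merged")

Would that be a reasonable approach, or is there a setting that makes the merge configs above work for gzip input?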
