HDFS Excel rows decrease when running the Spark job on YARN


When running the same job locally (from IntelliJ IDEA), the output counts are correct (for example, 55 rows). But when the job is submitted on YARN using spark-submit, only a few of the rows come through (12 rows).

spark2-submit --master yarn --deploy-mode client \
  --num-executors 5 --executor-memory 5G --executor-cores 5 --driver-memory 8G \
  --class com.test.Main \
  --packages com.crealytics:spark-excel_2.11:0.13.1 \
  --driver-class-path /test/ImpalaJDBC41.jar,/test/TCLIServiceClient.jar \
  --jars /test/ImpalaJDBC41.jar,/test/TCLIServiceClient.jar \
  /test/test-1.0-SNAPSHOT.jar

When using --master yarn I get only partial rows. When using --master local I am able to read all the rows, but I get an exception: Caused by: java.sql.SQLFeatureNotSupportedException: [Simba][JDBC](10220) Driver not capable.

It seems the job is not able to read all the blocks from HDFS when running on the cluster.
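For context, the Excel read in the job looks roughly like the sketch below (the path and reader options are placeholders, not the exact values from my code); printing the count right after the read helps narrow down whether rows are already missing at read time:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ExcelRowCountCheck")
  .getOrCreate()

// Placeholder HDFS path and reader options; the real job uses its own values.
val df = spark.read
  .format("com.crealytics.spark.excel")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("hdfs:///test/input.xlsx")

// Check whether rows are already missing right after the read.
println(s"Rows read: ${df.count()}")
println(s"Partitions: ${df.rdd.getNumPartitions}")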

Any help will be much appreciated. Thanks


There is 1 best solution below.


Since you mention that you get all the rows with a single executor (running with --master local), all the partitions are on the driver machine from which you submit the job with spark-submit.

Once your partitions are distributed across the cluster nodes (--master yarn), you lose many partitions and cannot read all the HDFS blocks.
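One way to check this is to count the rows in each partition when the job runs on YARN; a small diagnostic sketch, assuming df is the DataFrame read from the Excel file:

// Count rows per partition; partitions that come back empty on YARN
// (but not in local mode) point to data that is not being read.
val perPartitionCounts = df.rdd
  .mapPartitionsWithIndex { (idx, rows) => Iterator((idx, rows.size)) }
  .collect()

perPartitionCounts.foreach { case (idx, n) =>
  println(s"partition $idx -> $n rows")
}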

  1. Look into your code: are you using nested loops with an if condition, for example while( while() ), or any other loop with an if condition? Generally the outer loop copies the same partition to each node and the combiner combines the results into a single partition. Please check this.

  2. For the JDBC exception, you need to replace all the NULL values with other values, for example using the .na().fill() method on your final DataFrame, as sketched below. Each column value written should have a CHAR length greater than zero (NULL values have zero length, which is not supported when writing over JDBC).
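A minimal sketch of that null replacement before the JDBC write; the fill values, driver class, URL and table name below are placeholders to adapt to your setup:

// Replace nulls so every value written over JDBC has a non-zero length.
// finalDf is assumed to be your final DataFrame; fill values, driver class,
// URL and table name are placeholders.
val cleaned = finalDf
  .na.fill("N/A")   // string columns
  .na.fill(0)       // numeric columns

cleaned.write
  .format("jdbc")
  .option("driver", "com.cloudera.impala.jdbc41.Driver")
  .option("url", "jdbc:impala://impala-host:21050/default")
  .option("dbtable", "target_table")
  .mode("append")
  .save()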

Hope this helps.