I have an older version of Spark set up with YARN that I don't want to wipe out, but I still want to use a newer version. I found a couple of posts referring to how a fat jar can be used for this.
Many SO posts point to either maven (officially supported) or sbt to build a fat jar, since it's not directly available for download. There seem to be multiple Maven plugins for doing it: maven-assembly-plugin, maven-shade-plugin, onejar-maven-plugin, etc.
However, I can't figure out whether I really need a plugin at all and, if so, which one and how exactly to go about it. I tried directly compiling the GitHub source using 'build/mvn' and 'build/sbt', but the resulting 'spark-assembly_2.11-2.0.2.jar' file is just 283 bytes.
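For reference, this is the invocation I used, following Spark's "Building Spark" docs (the Hadoop profile here is an assumption and should match the target cluster):

```
./build/mvn -Pyarn -Phadoop-2.7 -DskipTests clean package
```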
My goal is to run the pyspark shell using the newer version's fat jar, in a similar way as mentioned here.
From Spark version 2.0.0 onwards, creating a fat jar is no longer supported; you can find more information in Do we still have to make a fat jar for submitting jobs in Spark 2.0.0?
The recommended way in your case (running on YARN) is to create a directory on HDFS with the contents of Spark's jars/ directory and add that path to spark-defaults.conf, as sketched below. If you then run the pyspark shell, it will use the previously uploaded libraries and behave exactly like the fat jar from Spark 1.X.
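A minimal sketch of the two steps, assuming Spark 2.0.2 is unpacked at $SPARK_HOME and using a hypothetical HDFS target directory of /user/spark/spark-2.0.2-jars (spark.yarn.jars is the Spark 2.x YARN property for this, and it accepts globs):

```
# 1. Upload the contents of Spark's jars/ directory to HDFS once:
hdfs dfs -mkdir -p /user/spark/spark-2.0.2-jars
hdfs dfs -put "$SPARK_HOME"/jars/*.jar /user/spark/spark-2.0.2-jars/

# 2. Point spark-defaults.conf at that directory:
#    spark.yarn.jars  hdfs:///user/spark/spark-2.0.2-jars/*.jar
```

With this in place, YARN containers fetch the libraries from HDFS instead of shipping a fat jar from the client on every submission.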