I'm trying to run a distributed k-means using the KMeans implementation in Spark MLlib, and I'm getting the following error:
Caused by: java.lang.ClassNotFoundException: breeze.storage.Zero$DoubleZero$
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
I'm using Scala 2.13.0, Spark 3.3.0 and Breeze 2.1.0. Does anyone know how to solve it?
Looks like an issue with dependencies.
In Breeze 1.3 and earlier,
breeze.storage.Zero.DoubleZero was defined as https://github.com/scalanlp/breeze/blob/releases/v1.3/math/src/main/scala/breeze/storage/Zero.scala#L77
and
breeze.storage.Zero.DoubleZero.getClass produced breeze.storage.Zero$DoubleZero$ (the trailing $ is how Scala names the class of an object). But in Breeze 2.0+,
DoubleZero is defined as https://github.com/scalanlp/breeze/blob/releases/v2.0/math/src/main/scala/breeze/storage/Zero.scala#L46
and
breeze.storage.Zero.DoubleZero.getClass produces breeze.storage.Zero$mcD$sp (because of @specialized), while Class.forName("breeze.storage.Zero$DoubleZero$") throws ClassNotFoundException. You should check which dependency still expects Breeze 1.3 or earlier.
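To see which Breeze jar actually ends up on the classpath, a small reflection check can help. This is a hypothetical helper (BreezeClasspathCheck is not part of Spark or Breeze); it only uses Class.forName and the class's code source, so it compiles without Breeze. It is meant to be run with the same classpath as the failing job.

```scala
// Hypothetical diagnostic helper, not part of Spark or Breeze.
object BreezeClasspathCheck {

  /** Loads `name` without initializing it; None if it is not on the classpath. */
  def lookup(name: String): Option[Class[_]] =
    try Some(Class.forName(name, false, getClass.getClassLoader))
    catch {
      case _: ClassNotFoundException | _: NoClassDefFoundError => None
    }

  /** Jar or directory the class was loaded from, if the JVM reports it. */
  def source(cls: Class[_]): String =
    Option(cls.getProtectionDomain.getCodeSource)
      .flatMap(cs => Option(cs.getLocation))
      .map(_.toString)
      .getOrElse("<unknown>")

  def main(args: Array[String]): Unit = {
    val names = Seq(
      "breeze.storage.Zero",             // exists in Breeze 1.x and 2.x; shows which jar is used
      "breeze.storage.Zero$DoubleZero$"  // the DoubleZero object from Breeze 1.3 and earlier
    )
    for (name <- names) {
      lookup(name) match {
        case Some(cls) => println(s"$name -> ${source(cls)}")
        case None      => println(s"$name -> not found")
      }
    }
  }
}
```

If the second line prints "not found" while the first points at a Breeze 2.x jar, the class Spark asks for is simply not on the classpath.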
Update. Thanks for the MCVE.
Debugging shows that NoClassDefFoundError/ClassNotFoundException is thrown here: https://github.com/apache/spark/blob/v3.3.0/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala#L521
A simpler reproduction is:
As I said, one of your dependencies was built against Breeze 1.3 or earlier, even though you think you're using Breeze 2.1.0. Namely,
org.apache.spark.ml.linalg.SparseMatrix is from spark-mllib-local, and spark-mllib-local 3.3.0 uses Breeze 1.2: https://repo1.maven.org/maven2/org/apache/spark/spark-mllib-local_2.13/3.3.0/spark-mllib-local_2.13-3.3.0.pom
So Spark 3.3.0 (and 3.3.2) is incompatible with Breeze 2.0+. Use Breeze 1.3 or earlier.
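With sbt, one way to do that is to keep the Breeze version Spark was built against instead of pulling Breeze 2.x. Below is a minimal build.sbt sketch, assuming a plain sbt project depending on spark-core and spark-mllib 3.3.0; the Scala patch version is an assumption. Since sbt resolves version conflicts by picking the newest requested version, an explicit Breeze 2.1.0 dependency evicts the Breeze 1.2 that spark-mllib-local expects. Dropping the explicit Breeze dependency entirely should have the same effect as pinning it to 1.2.

```scala
// build.sbt sketch; the scalaVersion value is an assumed 2.13.x release.
scalaVersion := "2.13.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "3.3.0",
  "org.apache.spark" %% "spark-mllib" % "3.3.0",
  // Breeze 1.2 is the version spark-mllib-local 3.3.0 depends on.
  // Requesting 2.x here would evict it and drop breeze.storage.Zero$DoubleZero$.
  "org.scalanlp"     %% "breeze"      % "1.2"
)
```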
Then your code runs successfully.
Compatibility issues between different versions of Spark and Breeze are not rare:
https://github.com/scalanlp/breeze/issues/710
Apache Spark - java.lang.NoSuchMethodError: breeze.linalg.Vector$.scalarOf()Lbreeze/linalg/support/ScalarOf
https://github.com/scalanlp/breeze/issues/690
Breeze should be upgraded to 2.0 in Spark 3.4.0
https://issues.apache.org/jira/browse/SPARK-39616
Meanwhile, you can try it with the following build.sbt.
Then your code runs successfully too.