I am trying to use Delta tables in a standalone, local Spark environment.
1a. The delta-spark package was installed:
pip install delta-spark==3.1.0
1b. I am on PySpark 3.5.0:
$ pip show pyspark
Name: pyspark
Version: 3.5.0
Summary: Apache Spark Python API
Home-page: https://github.com/apache/spark/tree/master/python
Author: Spark Developers
Author-email: [email protected]
License: http://www.apache.org/licenses/LICENSE-2.0
Location: /Users/stephenboesch/python/venv/lib/python3.10/site-packages
Requires: py4j
Required-by: delta-spark
1c. Let's run spark-sql with it:
spark-sql --packages io.delta:delta-spark_2.12:3.1.0 \
  --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
2a. Seems to be a library/versioning issue?
The jars for the packages are stored in /Users/stephenboesch/.ivy2/jars:
io.delta#delta-spark_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-e5b920a9-e134-4aa7-9002-64d01564dc90;1.0
confs: [default]
found io.delta#delta-spark_2.12;3.1.0 in central
found io.delta#delta-storage;3.1.0 in central
found org.antlr#antlr4-runtime;4.9.3 in central
:: resolution report :: resolve 268ms :: artifacts dl 24ms
:: modules in use:
io.delta#delta-spark_2.12;3.1.0 from central in [default]
io.delta#delta-storage;3.1.0 from central in [default]
org.antlr#antlr4-runtime;4.9.3 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 3 | 0 | 0 | 0 || 3 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-e5b920a9-e134-4aa7-9002-64d01564dc90
confs: [default]
0 artifacts copied, 3 already retrieved (0kB/23ms)
24/02/23 12:57:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/02/23 12:57:05 WARN SparkSession: Cannot use io.delta.sql.DeltaSparkSessionExtension to configure session extensions.
java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/analysis/UnresolvedLeafNode
- Can we create a table?
CREATE TABLE delta.`/tmp/delta-table`
USING DELTA AS SELECT col1 as id
FROM VALUES 0,1,2,3,4;
- Maybe not?
ANTLR Tool version 4.9.3 used for code generation does not match the current runtime version 4.8
ANTLR Runtime version 4.9.3 used for parser compilation does not match the current runtime version 4.8
ANTLR Tool version 4.9.3 used for code generation does not match the current runtime version 4.8
ANTLR Runtime version 4.9.3 used for parser compilation does not match the current runtime version 4.8
24/02/23 12:45:58 ERROR SparkSQLDriver: Failed in [CREATE TABLE delta.`/tmp/delta-table` USING DELTA AS SELECT col1 as id FROM VALUES 0,1,2,3,4]
java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/analysis/UnresolvedLeafNode
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1022)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
at java.base/java.net.URLClassLoader.defineClass(URLClassLoader.java:555)
at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:458)
..
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.analysis.UnresolvedLeafNode
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
... 95 more
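For what it's worth, the DataFrame-API equivalent (a sketch, reusing the spark session from the PySpark bootstrap above) hits the same missing class in this setup:

# Same table as the CREATE TABLE statement: ids 0..4 in a Delta table.
# In this mismatched environment this also fails with
# NoClassDefFoundError: org/apache/spark/sql/catalyst/analysis/UnresolvedLeafNode.
spark.range(0, 5).write.format("delta").save("/tmp/delta-table")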
So then, what is the correct set of incantations for Delta tables?
You should upgrade your Spark version to 3.5.x. delta-spark 3.1.0 is built against Spark 3.5, and org.apache.spark.sql.catalyst.analysis.UnresolvedLeafNode only exists from Spark 3.5 onward, so the NoClassDefFoundError (together with the ANTLR 4.8 runtime in the warnings, which comes from older Spark releases) means the spark-sql binary on your PATH belongs to an older Spark distribution, even though pip shows pyspark 3.5.0 in your virtualenv. Point SPARK_HOME/PATH at a Spark 3.5.x distribution (or use the spark-sql script that ships with the pip-installed pyspark) and the error goes away.
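Once spark-sql and pyspark both report 3.5.x, the documented way to wire Delta into a local PySpark session is configure_spark_with_delta_pip from the delta-spark package. A minimal sketch (app name is a placeholder; table path as in your example):

from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

builder = (
    SparkSession.builder.appName("delta-quickstart")  # placeholder name
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
# configure_spark_with_delta_pip injects the delta-spark jars that match
# the pip-installed release, so the jar and Python versions cannot drift.
spark = configure_spark_with_delta_pip(builder).getOrCreate()

spark.sql("""
    CREATE TABLE delta.`/tmp/delta-table`
    USING DELTA AS SELECT col1 AS id
    FROM VALUES 0,1,2,3,4
""")
spark.sql("SELECT * FROM delta.`/tmp/delta-table`").show()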