iceberg is not a valid Spark SQL Data Source


I started a Spark session in Python (I already installed pyspark and pyiceberg) like this:

from pyspark.sql import SparkSession

# Create a SparkSession
spark = SparkSession.builder.appName("MySQLRead") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.catalog.icebergcatalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.icebergcatalog.type", "hadoop") \
    .getOrCreate()

Then I created a database and a table using Iceberg:

spark.sql("CREATE DATABASE IF NOT EXISTS classicmodels;")
spark.sql("USE classicmodels;")
spark.sql("""
    CREATE TABLE orders (
    orderNumber bigint COMMENT 'unique id',
    orderDate timestamp,
    requiredDate timestamp,
    shippedDate timestamp,
    status string,
    comments string,
    customerNumber bigint)
    USING ICEBERG
    TBLPROPERTIES (
        'option.format-version'='2'
    );
""")

When I queried the table classicmodels.orders, it returned an error:

spark.sql("select * from classicmodels.orders ;").show()
Py4JJavaError: An error occurred while calling o31.sql.
: java.util.concurrent.ExecutionException: org.apache.spark.sql.AnalysisException: ICEBERG is not a valid Spark SQL Data Source.

My setup follows the documented configuration of Iceberg, Spark, and the Hive catalog.

There is 1 answer below.


The error means Spark cannot find the Iceberg runtime on its classpath. Add the Iceberg Spark runtime JAR to the session, replacing "/path/to/iceberg-spark.jar" with the actual path to the JAR file. You can download it from the Iceberg releases page on GitHub.

Make sure to adjust the Iceberg version and Spark version according to your environment.

from pyspark.sql import SparkSession

# Create a SparkSession with the Iceberg runtime JAR on the classpath.
# Adjust the path below to point at your downloaded JAR.
spark = SparkSession.builder \
    .appName("MySQLRead") \
    .config("spark.jars", "/path/to/iceberg-spark.jar") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.catalog.icebergcatalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.icebergcatalog.type", "hadoop") \
    .getOrCreate()
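
Alternatively, you can let Spark pull the runtime from Maven via `spark.jars.packages` instead of pointing at a local JAR. The sketch below is an assumption-laden example, not part of the original question: the Maven coordinate assumes Spark 3.5 with Scala 2.12 (swap it for your Spark/Scala/Iceberg versions), and the warehouse path is a placeholder. Note also that a `hadoop`-type catalog needs a `warehouse` directory, and tables created in it are addressed through the catalog name.

```python
from pyspark.sql import SparkSession

# Sketch: fetch the Iceberg runtime from Maven at session startup.
# Coordinate assumes Spark 3.5 / Scala 2.12 -- adjust to your environment.
# A hadoop catalog also requires a warehouse directory for table data/metadata.
spark = SparkSession.builder \
    .appName("MySQLRead") \
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2") \
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.catalog.icebergcatalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.icebergcatalog.type", "hadoop") \
    .config("spark.sql.catalog.icebergcatalog.warehouse", "/tmp/iceberg-warehouse") \
    .getOrCreate()

# With this setup the table lives under the named catalog, so query it as
# icebergcatalog.classicmodels.orders rather than classicmodels.orders.
spark.sql("CREATE DATABASE IF NOT EXISTS icebergcatalog.classicmodels")
```

A comment cannot follow a line-continuation backslash in Python, so all comments here sit on their own lines before the builder chain.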