iceberg is not a valid Spark SQL Data Source


I started a Spark session in Python (I already installed pyspark and pyiceberg) like this:

from pyspark.sql import SparkSession

# Create a SparkSession
spark = SparkSession.builder.appName("MySQLRead") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.catalog.icebergcatalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.icebergcatalog.type", "hadoop") \
    .getOrCreate()

Then I created a database and a table using Iceberg:

spark.sql("CREATE DATABASE IF NOT EXISTS classicmodels;")
spark.sql("USE classicmodels;")
spark.sql("""
    CREATE TABLE orders (
    orderNumber bigint COMMENT 'unique id',
    orderDate timestamp,
    requiredDate timestamp,
    shippedDate timestamp,
    status string,
    comments string,
    customerNumber bigint)
    USING ICEBERG
    TBLPROPERTIES (
        'option.format-version'='2'
    );
""")

When I queried the table classicmodels.orders, it returned an error:

spark.sql("select * from classicmodels.orders ;").show()
Py4JJavaError: An error occurred while calling o31.sql.
: java.util.concurrent.ExecutionException: org.apache.spark.sql.AnalysisException: ICEBERG is not a valid Spark SQL Data Source.

My setup follows the documented configuration of Iceberg, Spark, and the Hive catalog.

There is 1 answer below.


The error means Spark cannot find the Iceberg runtime on its classpath. Add the Iceberg Spark runtime JAR to the session, replacing "/path/to/iceberg-spark.jar" with the actual path to the JAR file. You can download it from the Iceberg releases page on GitHub.

Make sure to adjust the Iceberg version and Spark version according to your environment.

from pyspark.sql import SparkSession

# Create a SparkSession with the Iceberg runtime JAR on the classpath.
# Adjust the path below to point at your downloaded JAR.
spark = SparkSession.builder \
    .appName("MySQLRead") \
    .config("spark.jars", "/path/to/iceberg-spark.jar") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.catalog.icebergcatalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.icebergcatalog.type", "hadoop") \
    .getOrCreate()
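
Alternatively, you can let Spark pull the runtime from Maven via `spark.jars.packages` instead of pointing at a local JAR. The sketch below is an assumption-laden example, not part of the original question: the Maven coordinate assumes Spark 3.5 with Scala 2.12 (swap it for your Spark/Scala/Iceberg versions), and the warehouse path is a placeholder. Note also that a `hadoop`-type catalog needs a `warehouse` directory, and tables created in it are addressed through the catalog name.

```python
from pyspark.sql import SparkSession

# Sketch: fetch the Iceberg runtime from Maven at session startup.
# Coordinate assumes Spark 3.5 / Scala 2.12 -- adjust to your environment.
# A hadoop catalog also requires a warehouse directory for table data/metadata.
spark = SparkSession.builder \
    .appName("MySQLRead") \
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2") \
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.catalog.icebergcatalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.icebergcatalog.type", "hadoop") \
    .config("spark.sql.catalog.icebergcatalog.warehouse", "/tmp/iceberg-warehouse") \
    .getOrCreate()

# With this setup the table lives under the named catalog, so query it as
# icebergcatalog.classicmodels.orders rather than classicmodels.orders.
spark.sql("CREATE DATABASE IF NOT EXISTS icebergcatalog.classicmodels")
```

A comment cannot follow a line-continuation backslash in Python, so all comments here sit on their own lines before the builder chain.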