I started a Spark session in Python (I already installed pyspark and pyiceberg) like this:
from pyspark.sql import SparkSession

# Create a SparkSession with the Iceberg extensions and a Hadoop-type catalog
spark = SparkSession.builder.appName("MySQLRead") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.catalog.icebergcatalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.icebergcatalog.type", "hadoop") \
    .getOrCreate()
Then I created a database and a table using Iceberg:
spark.sql("CREATE DATABASE IF NOT EXISTS classicmodels;")
spark.sql("USE classicmodels;")
spark.sql("""
CREATE TABLE orders (
orderNumber bigint COMMENT 'unique id',
orderDate timestamp,
requiredDate timestamp,
shippedDate timestamp,
status string,
comments string,
customerNumber bigint)
USING ICEBERG
TBLPROPERTIES (
'option.format-version'='2'
);
""")
When I queried the table classicmodels.orders, it returned an error:
spark.sql("select * from classicmodels.orders ;").show()
Py4JJavaError: An error occurred while calling o31.sql.
: java.util.concurrent.ExecutionException: org.apache.spark.sql.AnalysisException: ICEBERG is not a valid Spark SQL Data Source.
This comes down to the configuration of Iceberg, Spark, and the catalog: the "ICEBERG is not a valid Spark SQL Data Source" error means Spark cannot find the Iceberg runtime on its classpath, so the session has to be started with the Iceberg Spark JAR. Replace "/path/to/iceberg-spark.jar" with the actual path to the Iceberg Spark runtime JAR file; you can download it from the Iceberg releases page on GitHub. Make sure to adjust the Iceberg version and Spark version according to your environment.
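For reference, here is a minimal sketch of the same session builder with the JAR supplied via spark.jars. The JAR path and the warehouse location are placeholders to adjust; the warehouse setting is an assumption added here because a Hadoop-type catalog needs a filesystem path where it can store table data and metadata.

# Minimal sketch: same session as above, but with the Iceberg runtime JAR on the classpath.
# "/path/to/iceberg-spark.jar" and "/path/to/warehouse" are placeholders -- adjust to your environment.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("MySQLRead") \
    .config("spark.jars", "/path/to/iceberg-spark.jar") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.catalog.icebergcatalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.icebergcatalog.type", "hadoop") \
    .config("spark.sql.catalog.icebergcatalog.warehouse", "/path/to/warehouse") \
    .getOrCreate()

Alternatively, spark.jars.packages can pull the runtime from Maven; the coordinate follows the pattern org.apache.iceberg:iceberg-spark-runtime-<spark-version>_<scala-version>:<iceberg-version>, with the versions chosen to match your Spark and Scala builds.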