Unable to create an Iceberg table on top of Avro files


I have a bunch of Avro files in my HDFS location: 'hdfs://mycluster/poc/testData/outputAvroNew/Customer'

I need to create an Iceberg table on top of these Avro files using Spark.

I have set up the Iceberg catalog as a Hadoop catalog as follows:

spark-sql -v \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.3.1 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
  --conf spark.sql.catalog.spark_catalog.type=hadoop \
  --conf spark.sql.catalog.spark_catalog.warehouse=hdfs://mycluster/poc/iceberg_warehouse

I have created the Iceberg table as follows:

CREATE TABLE spark_catalog.iceberg_test.customer_iceberg_staging
USING iceberg
OPTIONS (
  'format' = 'avro',
  'schema' = 'hdfs://mycluster/poc/avroSchemas_test/Customer.avsc',
  'write' = 'hdfs://mycluster/poc/testData/outputAvroNew/Customer')
LOCATION 'hdfs://mycluster/poc/iceberg_warehouse/iceberg_test/customer_iceberg_staging'
TBLPROPERTIES (
  'current-snapshot-id' = 'none',
  'format-version' = '1');

I do not see any output when I query this table:

spark-sql (iceberg_test)> select * from spark_catalog.iceberg_test.customer_iceberg_staging;
select * from spark_catalog.iceberg_test.customer_iceberg_staging
Time taken: 0.894 seconds
spark-sql (iceberg_test)>
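One way to check whether the new table references any data files at all is to look at its current snapshot. Below is a minimal Python sketch (not an official Iceberg tool) that reads a table-metadata JSON file copied out of HDFS and reports the current snapshot. It assumes the Hadoop catalog layout, where the latest metadata file is 'metadata/vN.metadata.json' under the table location (N being the number stored in 'metadata/version-hint.text'; 'v1' in the usage note is only an example). It also assumes that a table with no data has 'current-snapshot-id' set to -1 (or missing) and that snapshot summaries carry the 'total-data-files' and 'total-records' keys.

```python
import json
import sys
from typing import Any, Dict


def describe_current_snapshot(metadata: Dict[str, Any]) -> str:
    """Summarise the current snapshot recorded in an Iceberg table-metadata dict."""
    # Assumption: -1 (or a missing/null field) means the table has no snapshot yet.
    current_id = metadata.get("current-snapshot-id")
    if current_id is None or current_id == -1:
        return "no current snapshot: the table references no data files"

    # Find the snapshot entry whose id matches the current one.
    for snapshot in metadata.get("snapshots", []):
        if snapshot.get("snapshot-id") == current_id:
            summary = snapshot.get("summary", {})
            # Summary values are stored as strings in the metadata JSON.
            files = summary.get("total-data-files", "unknown")
            records = summary.get("total-records", "unknown")
            return f"snapshot {current_id}: {files} data files, {records} records"

    return f"snapshot {current_id} is not listed in 'snapshots'"


def load_metadata(path: str) -> Dict[str, Any]:
    """Read a table-metadata JSON file that has been copied to the local disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Only runs when invoked as a script with exactly one argument (the JSON path).
if __name__ == "__main__" and len(sys.argv) == 2:
    print(describe_current_snapshot(load_metadata(sys.argv[1])))
```

Assumed usage: copy the file with hdfs dfs -get hdfs://mycluster/poc/iceberg_warehouse/iceberg_test/customer_iceberg_staging/metadata/v1.metadata.json . and run python check_snapshot.py v1.metadata.json (the script name is hypothetical). The same information should also be visible from Spark SQL through Iceberg's snapshots metadata table, e.g. select * from spark_catalog.iceberg_test.customer_iceberg_staging.snapshots;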

What am I doing incorrectly?

I expected the query to return the Avro data.
