How to call avro SchemaConverters in Pyspark


Although PySpark has Avro support, it does not expose SchemaConverters. I may be able to reach it through Py4J, but I have never called a Java package from Python.

This is the code I am using:

# Import SparkSession
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType


def _test():

    # Create SparkSession
    spark = SparkSession.builder \
        .master("local[1]") \
        .appName("sparvro") \
        .getOrCreate()

    avroSchema = spark._jvm.org.apache.spark.sql.avro.SchemaConverters.toAvroType(
        StructType([StructField("firstname", StringType(), True)]))

if __name__ == "__main__":
    _test()

However, I keep getting this error:

AttributeError: 'StructField' object has no attribute '_get_object_id'

1 Answer

This should work by passing toAvroType a Java StructType and not a Python one:

df = <some-data-frame>
spark._jvm.org.apache.spark.sql.avro.SchemaConverters.toAvroType(df._jdf.schema(), False, 'namespace', None)

Here's the documentation for Java: https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/avro/SchemaConverters.html