We use data-type-dependent logic in Spark 3.2. For the interval year data type, the DataFrame methods schema and dtypes don't work.
Without an interval year column, both methods work fine:
df1 = spark.range(1)
df1.printSchema()
# root
# |-- id: long (nullable = false)
print(df1.schema)
# StructType(List(StructField(id,LongType,false)))
print(df1.dtypes)
# [('id', 'bigint')]
But as soon as I add an interval year column, both schema and dtypes throw a parsing error:
from pyspark.sql import functions as F

df2 = df1.withColumn('col_interval_y', F.expr("INTERVAL '2021' YEAR"))
df2.printSchema()
# root
# |-- id: long (nullable = false)
# |-- col_interval_y: interval year (nullable = false)
print(df2.schema)
# ValueError: Unable to parse datatype from schema. Could not parse datatype: interval year
print(df2.dtypes)
# ValueError: Unable to parse datatype from schema. Could not parse datatype: interval year
For our logic to work, we need to access the column data types of a DataFrame. How can we access the interval year type in Spark 3.2? (Spark 3.5 doesn't throw these errors, but we cannot upgrade to it yet.)
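To make the requirement concrete, here is a hypothetical sketch of the kind of data-type-dependent logic meant above (the helper name and usage are illustrative only); it fails on df2 because it calls dtypes:

def columns_of_type(df, type_name):
    # Select the names of columns whose Spark SQL type string matches type_name
    return [name for name, dtype in df.dtypes if dtype == type_name]

print(columns_of_type(df1, 'bigint'))
# ['id']
columns_of_type(df2, 'interval year')
# ValueError: Unable to parse datatype from schema. Could not parse datatype: interval year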
I have found that it's possible to use the underlying _jdf. The following recreates the result of dtypes:
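A minimal sketch of how that can look, assuming the JVM StructType returned by _jdf.schema() is reachable through py4j and that simpleString() yields the type names shown in the comments (the helper name jvm_dtypes is mine):

def jvm_dtypes(df):
    # Read the schema on the JVM side, bypassing the Python-side datatype parser
    jfields = df._jdf.schema().fields()  # Java array of StructField
    return [
        (jfields[i].name(), jfields[i].dataType().simpleString())
        for i in range(len(jfields))
    ]

print(jvm_dtypes(df2))
# [('id', 'bigint'), ('col_interval_y', 'interval year')]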