Is there a way to specify the schema of a PySpark DataFrame returned by a query within df = spark.sql(...)? Specifically, I am looking for a way to specify that some columns must be nullable = false.
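For context, a minimal sketch of the situation (the table and column names here are made up):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# hypothetical query; the result's columns typically come back as nullable = true
df = spark.sql("SELECT id, name FROM some_table")
df.printSchema()
# root
#  |-- id: long (nullable = true)
#  |-- name: string (nullable = true)
```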
This answer shows you can change the schema by creating a new DataFrame with spark.createDataFrame(df.rdd, df.schema), but as a comment there mentions, that round trip through the RDD API is very costly.
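A sketch of that workaround as I understand it, rebuilding the schema with nullable = False for one column (the column name id is illustrative):

```python
from pyspark.sql.types import StructField, StructType

# copy the existing schema, turning nullability off for the target column
new_schema = StructType([
    StructField(f.name, f.dataType, nullable=(f.name != "id"))
    for f in df.schema.fields
])

# the expensive part: the data is round-tripped through the RDD API
df_not_null = spark.createDataFrame(df.rdd, new_schema)
df_not_null.printSchema()  # id now shows nullable = false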
I have also tried mutating the schema in place:

df.schema["column name"].nullable = False

but this does not seem to change the DataFrame's actual schema.
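As far as I can tell (in the PySpark versions I have tried), df.schema is a Python-side StructType, while printSchema() reads the schema from the underlying JVM plan, so the assignment sticks only on the local copy. The column name below is illustrative:

```python
# mutate the Python-side schema object
df.schema["name"].nullable = False

print(df.schema["name"].nullable)  # False: only the local StructType changed
df.printSchema()                   # still reports name: string (nullable = true)
```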