Is there a way to specify the schema of a PySpark DataFrame returned by a query in df = spark.sql(...)? Specifically, I am looking for a way to specify that certain columns must have nullable = false.
This answer shows that you can change the schema by creating a new DataFrame with spark.createDataFrame(df.rdd, df.schema), but as a comment there points out, this is very costly.
Ideally, I would like to flip the flag in place, something like this (note the StructField attribute is nullable, not nullability, and I want it set to False):

df.schema["column name"].nullable = False

As far as I can tell, though, this only mutates the Python-side StructType object and has no effect on the DataFrame itself.
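For concreteness, here is a minimal sketch of the createDataFrame workaround I am trying to avoid. The table name, column names, and the required_cols set are all placeholders:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField

spark = SparkSession.builder.getOrCreate()

# Hypothetical query; "some_table" and the column names are placeholders.
df = spark.sql("SELECT id, name, comment FROM some_table")

# Columns that should end up with nullable = False (example names).
required_cols = {"id", "name"}

# Rebuild the schema, flipping the nullable flag on the chosen columns.
new_schema = StructType([
    StructField(f.name, f.dataType, nullable=(f.name not in required_cols))
    for f in df.schema.fields
])

# The workaround from the linked answer: round-trip through the RDD.
# It works, but serializing every row between the JVM and Python is
# exactly the cost I am hoping to avoid.
fixed_df = spark.createDataFrame(df.rdd, new_schema)
print(fixed_df.schema)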