While doing a PySpark dataframe self-join I got an error message:
Py4JJavaError: An error occurred while calling o1595.join.
: org.apache.spark.sql.AnalysisException: Resolved attribute(s) un_val#5997 missing from day#290,item_listed#281,filename#286 in operator !Project [...]. Attribute(s) with the same name appear in the operation: un_val. Please check if the right attribute(s) are used.;;
The self-join itself is simple and works fine at first, but after a couple of operations on the dataframe, such as adding columns or joining with other dataframes, the following call raises the error above:
df.join(df,on='item_listed')
Using dataframe aliases like below doesn't work either; the same error message is raised:
df.alias('A').join(df.alias('B'), col('A.my_id') == col('B.my_id'))
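For comparison, another way to keep the two sides of a self-join apart is to rename the columns on one side before joining, instead of relying on aliases. The sketch below is only an illustration and has not been checked against this particular error: the helper names `suffixed_columns` and `rename_for_self_join` and the `_b` suffix are hypothetical, and the only DataFrame members assumed are `df.columns` and `df.toDF(*names)`.

```python
def suffixed_columns(columns, suffix, keep=()):
    """Return new column names: every name not in `keep` gets `suffix` appended."""
    keep = set(keep)
    return [c if c in keep else c + suffix for c in columns]


def rename_for_self_join(df, suffix="_b", keep=()):
    """Return a renamed copy of `df` for use as the right side of a self-join.

    Hypothetical helper: it relies only on `df.columns` and `df.toDF(*names)`.
    """
    return df.toDF(*suffixed_columns(df.columns, suffix, keep))


# Usage with an existing DataFrame `df` (commented out because it needs a
# running Spark session; whether it avoids the error is an assumption):
# right = rename_for_self_join(df, suffix="_b", keep=["item_listed"])
# joined = df.join(right, on="item_listed")
```

Keeping the join key unrenamed lets `on='item_listed'` still be used, while every other column from the right side ends up with a distinct name in the result.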
I found a Java workaround in SPARK-14948; the PySpark equivalent looks like this: