I have a problem in some Scala code that I cannot get past. I'm migrating a Scala project on Databricks to Unity Catalog. All is going well except for one process I'm running.
To simplify: some of my columns are defined as calculations. I have a list of DataFrames that I want to join, then compute my columns and aggregate.
Exception: AnalysisException: [GROUP_BY_POS_AGGREGATE] GROUP BY 6 refers to an expression round((sum(CASE WHEN Column OR Column ) THEN Column ELSE 0.0D END)), 7)
Code example (the code is meant to run as a Databricks job in Workflows; columnList contains all the computed columns):
    val dfList = groupKpiByTimeframe.map { case (timeFrame, columnList) =>
      val scopeDf = timeFrame.getDataForTimeFrame(someDf)
      positionDf
        .transform(scopeDf)
        .join(scopeDf, Seq(myColumn), leftJoin)
        .groupBy(getGroupByColumns(timeFrame, valueDate): _*)
        .agg(columnList.head, columnList.tail: _*)
    }

    dfList.reduce(_.unionByName(_, allowMissingColumns = true))
The error is raised by the unionByName operation.
Calling persist() and then show() on each DataFrame before the reduce shows that all the data is fine and evaluates correctly. The only difference in the setup is that the job now uses "spark_version": "13.3.x-scala2.12" instead of "spark_version": "10.4.x-scala2.12" (a Unity Catalog requirement).
Everything ran well with the previous version. I'm using scalaVersion := "2.12.15".
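For reference, the persist() and show() check described above looks roughly like the sketch below. It is a simplified illustration, not the exact job code: the helper name checkBeforeUnion is hypothetical, and it assumes dfList can be treated as a Seq[DataFrame].

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper (not part of the original job): materialize every
// DataFrame and print its schema and a few rows before running the union.
def checkBeforeUnion(dfList: Seq[DataFrame]): DataFrame = {
  // persist() marks each DataFrame for caching; show() below forces evaluation
  val persisted = dfList.map(_.persist())

  persisted.foreach { df =>
    df.printSchema()
    df.show(5, truncate = false)
  }

  // Same reduce as in the job: this is the step where the exception is raised
  persisted.reduce(_.unionByName(_, allowMissingColumns = true))
}
```

With this in place, every show() call succeeds and the failure only appears at the final reduce.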