I get a [GROUP_BY_POS_AGGREGATE] error when aggregating columns defined by formulas


I have a problem in some Scala code that I cannot get past. I'm migrating a Scala project on Databricks to Unity Catalog. Everything is going well except for one process I'm running.

To simplify: some of my columns are defined as calculations. I have a list of DataFrames that I want to join, then compute those columns and aggregate.

Exception: AnalysisException: [GROUP_BY_POS_AGGREGATE] GROUP BY 6 refers to an expression round((sum(CASE WHEN Column OR Column ) THEN Column ELSE 0.0D END)), 7)

Code example (the code is meant to run as a Databricks Job in a workflow; columnList contains all the computed columns):

val dfList = groupKpiByTimeframe.map { case (timeFrame, columnList) =>
  // columnList: the computed (formula) columns for this timeframe
  val scopeDf = timeFrame.getDataForTimeFrame(someDf)
  positionDf
    .transform(scopeDf)
    .join(scopeDf, Seq(myColumn), leftJoin)
    .groupBy(getGroupByColumns(timeFrame, valueDate): _*)
    .agg(columnList.head, columnList.tail: _*)
}
dfList.reduce(_.unionByName(_, allowMissingColumns = true))
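For anyone trying to reproduce this, here is a minimal, self-contained sketch of the same pipeline shape: a left join per timeframe, a groupBy/agg over computed columns, then a reduce with unionByName(allowMissingColumns = true). Everything in it is hypothetical: positionDf and someDf are tiny toy DataFrames, the timeframes "D" and "M", the column names and the KPI formulas are made up, and a simple filter stands in for timeFrame.getDataForTimeFrame. It assumes spark-sql 3.1 or later on the classpath (for allowMissingColumns) and is not claimed to reproduce the error on its own; it is only a starting point for a reproducible example.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object UnionOfAggregatesSketch {
  // Builds per-timeframe aggregates and unions them by name.
  // All data, column names and KPI formulas below are hypothetical.
  def run(): DataFrame = {
    val spark = SparkSession.builder().master("local[*]").appName("union-of-aggregates").getOrCreate()
    import spark.implicits._

    // Toy stand-ins for positionDf and someDf from the question
    val positionDf = Seq(("A", 1), ("B", 2)).toDF("myColumn", "posId")
    val someDf = Seq(("A", "D", 10.0, true), ("A", "M", 5.0, false), ("B", "D", 3.0, true))
      .toDF("myColumn", "period", "amount", "flag")

    // Hypothetical timeframe -> computed-column mapping (plays the role of groupKpiByTimeframe)
    val groupKpiByTimeframe: Seq[(String, Seq[Column])] = Seq(
      "D" -> Seq(round(sum(when($"flag", $"amount").otherwise(0.0)), 7).as("kpiFlagged")),
      "M" -> Seq(round(sum($"amount"), 7).as("kpiTotal"), count(lit(1)).as("nRows"))
    )

    val dfList: Seq[DataFrame] = groupKpiByTimeframe.map { case (timeFrame, columnList) =>
      // Stand-in for timeFrame.getDataForTimeFrame(someDf)
      val scopeDf = someDf.filter($"period" === timeFrame)
      positionDf
        .join(scopeDf, Seq("myColumn"), "left")
        .withColumn("timeFrame", lit(timeFrame))
        .groupBy("myColumn", "timeFrame")
        .agg(columnList.head, columnList.tail: _*)
    }

    // Columns missing on either side are filled with nulls; missing ones are appended at the end
    dfList.reduce(_.unionByName(_, allowMissingColumns = true))
  }

  def main(args: Array[String]): Unit = {
    val result = run()
    result.show()
    result.sparkSession.stop()
  }
}
```

With this toy data each timeframe yields one row per position, so the union has four rows and the columns myColumn, timeFrame, kpiFlagged, kpiTotal, nRows.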

The error is raised in the unionByName operation. Calling persist() and then show() on each DataFrame before the reduce shows that all the data is OK and correctly evaluated. The only difference in the code is that the job now uses "spark_version": "13.3.x-scala2.12" instead of "spark_version": "10.4.x-scala2.12" (a Unity Catalog requirement).

Everything ran fine with the previous version. I'm using scalaVersion := "2.12.15".
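Since each DataFrame looks fine on its own but the exception appears at the unionByName step, one way to narrow it down is to union the frames one at a time and record the first step that throws. The helper below is a sketch under that assumption; UnionDiagnostics and firstFailingUnion are hypothetical names, not part of Spark or Databricks, and it assumes Spark 3.1 or later (for allowMissingColumns).

```scala
import org.apache.spark.sql.DataFrame
import scala.util.{Failure, Success, Try}

object UnionDiagnostics {
  /** Unions `dfs` by name one frame at a time (allowMissingColumns = true) and
    * returns the index of the first frame whose union step throws, with the error.
    * Returns None when every step succeeds. Hypothetical helper. */
  def firstFailingUnion(dfs: Seq[DataFrame]): Option[(Int, Throwable)] = {
    require(dfs.nonEmpty, "expected at least one DataFrame")
    val start: Either[(Int, Throwable), DataFrame] = Right(dfs.head)
    val outcome = dfs.zipWithIndex.tail.foldLeft(start) {
      case (Right(acc), (df, i)) =>
        // Spark analyses a new DataFrame's plan eagerly, so analysis errors are
        // normally thrown by unionByName itself; count() also forces execution.
        Try {
          val next = acc.unionByName(df, allowMissingColumns = true)
          next.count()
          next
        } match {
          case Success(next) => Right(next)
          case Failure(e)    => Left((i, e))
        }
      case (failed, _) => failed // keep the first failure, skip the remaining frames
    }
    outcome.swap.toOption
  }
}
```

For example, UnionDiagnostics.firstFailingUnion(dfList.toSeq) would return the index of the offending frame together with the exception; calling explain(true) on that frame can then help show which grouping expression the "GROUP BY 6" in the message points to.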
