I am using delta live tables for learning.
So, I created tables that look like this:
/* Silver Layer */
CREATE LIVE TABLE SSliverLive
(
RowID INT COMMENT 'Primary key',
OrderDate DATE,
ShipDate DATE,
OrderYear INT GENERATED ALWAYS AS (YEAR (OrderDate)),
ShipYear INT GENERATED ALWAYS AS (YEAR (ShipDate)),
)
USING DELTA
AS
SELECT *
FROM live.SBronzeLive;
/* Gold Layer */
CREATE LIVE TABLE SGoldLive
AS
SELECT rowid
, OrderYear
, COUNT(RowId) AS tt
FROM live.SSliverLive
GROUP BY rowid, orderyear;
But, the DLT pipelines throw the below error:
A generated column cannot use a non-existent column or another generated column.
I followed the SQLBits tutorial online and they can use the generated columns.
Removing the columns works fine but the Databricks website says I can use it.
The documentation states that generated column values are automatically generated based on a user-specified function over other columns in the Delta table.
In your gold table, you mention only
OrderYear, which is evaluated asYEAR(OrderDate) AS OrderYear, but you don't haveOrderDatedefined in the gold table, so it raises an error.You can use
OrderYearonly when you includeOrderDatein the SELECT statement, as shown below:If you include
OrderDatein the SELECT statement, you will need to add it to the GROUP BY clause, which may not produce the desired results.Therefore, you should perform a GROUP BY on
OrderYearwithout selecting it.Here's an example using
MONTH:Silver table:
Gold table:
This produces output with a single column.
Alternatively, you can use the original column:
Outputs: