I've been working on some code that will extract some aggregated data from my Google Analytics 4 export.
Unfortunately I've been running into an error regarding my variable event_timestamp in my CTE SessionsPerUser. I can't solve the error and have been trying to trace back what the problem is. The variable exists in my staging model and works fine for every other CTE. There has to be an issue with the Group by statement or the nested function? Any insights or ideas are greatly appreciated!
I will reference my entire code as well as the code snippet I believe causing the issue.
WITH RECURSIVE DateDimension AS
(
SELECT '2024-01-01'::DATE AS date -- Start date
UNION ALL
SELECT DATEADD(day, 1, date)
FROM DateDimension
WHERE date < CURRENT_DATE() -- Automatically updates to include up to the current date
),
UserInfo AS
(
SELECT
DATE(event_timestamp) AS event_date,
user_pseudo_id,
MAX(IFF(event_name IN ('first_visit', 'first_open'), 1, 0)) AS is_new_user,
MAX(IFF(event_name = 'app_remove', 1, 0)) AS is_deletion,
MAX(IFF(event_name = 'delete_account', 1, 0)) AS is_account_deletion,
MAX(IFF(param_exclusion = TRUE, 1, 0)) AS is_exclusion,
SUM(param_ENGAGEMENT_TIME_MSEC) AS total_engagement_time,
MAX(IFF(event_name = 'first_open', 1, 0)) AS is_download
FROM
{{ ref('stg_google_analytics_events') }}
GROUP BY
DATE(event_timestamp), user_pseudo_id
),
SessionInfo AS
(
SELECT
DATE(event_timestamp) AS event_date,
user_pseudo_id,
param_ga_session_id,
TIMESTAMPDIFF(SECOND, MIN(event_timestamp), MAX(event_timestamp)) AS session_duration
FROM
{{ ref('stg_google_analytics_events') }}
GROUP BY
DATE(event_timestamp), user_pseudo_id, param_ga_session_id
),
TotalSessions AS
(
SELECT
DATE(event_timestamp) AS event_date,
COUNT(DISTINCT param_ga_session_id) AS total_sessions
FROM
{{ ref('stg_google_analytics_events') }}
GROUP BY
DATE(event_timestamp)
),
SessionsPerUser AS
(
SELECT
DATE(event_timestamp) AS event_date,
AVG(session_count) AS sessions_per_user
FROM
(SELECT
DATE(event_timestamp) AS event_date,
user_pseudo_id,
COUNT(DISTINCT param_ga_session_id) AS session_count
FROM
{{ ref('stg_google_analytics_events') }}
GROUP BY
DATE(event_timestamp), user_pseudo_id) AS user_sessions
GROUP BY
DATE(event_timestamp)
),
AggregatedMetrics AS
(
SELECT
ui.event_date,
COUNT(DISTINCT ui.user_pseudo_id) AS active_users,
SUM(ui.is_new_user) AS signups,
SUM(ui.is_deletion) AS deletions,
SUM(ui.is_account_deletion) AS account_deletions,
SUM(ui.is_exclusion) AS exclusions,
SUM(ui.is_download) AS downloads,
AVG(si.session_duration) AS average_session_duration,
SUM(ui.total_engagement_time) / COUNT(DISTINCT si.param_ga_session_id) AS average_engagement_time,
ts.total_sessions,
spu.sessions_per_user
FROM
UserInfo ui
LEFT JOIN
SessionInfo si ON ui.user_pseudo_id = si.user_pseudo_id
AND ui.event_date = si.event_date
LEFT JOIN
TotalSessions ts ON ui.event_date = ts.event_date
LEFT JOIN
SessionsPerUser spu ON ui.event_date = spu.event_date
GROUP BY
ui.event_date, ts.total_sessions, spu.sessions_per_user
)
SELECT
dd.date,
COALESCE(am.downloads, 0) AS downloads,
COALESCE(am.active_users, 0) AS active_users,
COALESCE(am.signups, 0) AS signups,
COALESCE(am.deletions, 0) AS deletions,
COALESCE(am.account_deletions, 0) AS account_deletions,
COALESCE(am.exclusions, 0) AS exclusions,
COALESCE(am.total_sessions, 0) AS total_sessions,
COALESCE(am.sessions_per_user, 0) AS sessions_per_user,
COALESCE(am.average_session_duration, 0) AS average_session_duration,
COALESCE(am.average_engagement_time, 0) AS average_engagement_time
FROM
DateDimension dd
LEFT JOIN
AggregatedMetrics am ON dd.date = am.event_date
ORDER BY
dd.date
Error message:
000904 (42000): SQL compilation error: error line 46 at position 11 invalid identifier 'EVENT_TIMESTAMP'
Code snippet:
SessionsPerUser AS
(
SELECT
DATE(event_timestamp) AS event_date,
AVG(session_count) AS sessions_per_user
FROM (
SELECT
DATE(event_timestamp) AS event_date,
user_pseudo_id,
COUNT(DISTINCT param_ga_session_id) AS session_count
FROM {{ ref('stg_google_analytics_events') }}
GROUP BY DATE(event_timestamp), user_pseudo_id
) AS user_sessions
GROUP BY DATE(event_timestamp)
),
Again I can verify that event_timestamp exists and works for my other CTEs so it is not an 'invalid identifier'
Thanks for the help!
In the
SessionsPerUserCTE, you haveDATE(event_timestamp) AS event_date. This is selecting from a subquery where you already created event_date from event_timestamp. So it should look like: