I have a website with many different products for sale and a table clicks with all the clicks on the different products.
| click_id | product_id |
|---|---|
| 1 | 1234 |
| 2 | 0023 |
| 3 | 1234 |
- My products are not available every day.
- The only way I have to know if a product exists if is someone clicks on it and it appears in the table.
Knowing which products were available at some moment is easy.
SELECT DISTINCT product_id
FROM clicks
But this is a very big query and I can't afford to do it everyday.
What I want is to maintain a list of historical products and for each new day, knowing which one are available today, with the flag is_available=1
I tried using incremental models in dbt but I could not manage to make it work.
- look at the clicked products today,
- add products that I never saw before with the flag is_available=1
- update the availability status of the product depending if they were clicked today. I need to check that hourly.
How can I do this ?
Thank you for your help.
EDIT
{{
config(
materialized='incremental',
unique_key=['product_id'],
)
}}
WITH
online_products AS (
SELECT DISTINCT
product_id,
1 AS is_available,
CURRENT_DATE() AS updated_at
FROM
clicks
WHERE
partition_date >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
)
SELECT DISTINCT
product_id
function_to_update(is_online),
function_to_update(updated_at)
FROM
{{ this }} current
{% if is_incremental() %}
FULL OUTER JOIN
online_products
USING product_id
{% endif %}