Iterate through distinct column value in DBT with Jinja

185 Views Asked by At

I'm new to DBT and Jinja and am currently working on a project where I need to iterate through distinct ID values, filter a table based on each ID, and then either materialize the result as a separate table or export it as a CSV file to S3. Could someone please guide me on how to do this in DBT?

{% set get_id_query %}
  select distinct id from  table_name limit 10
{% endset %}

{% set results = run_query(get_id_query) %}

{% if execute %}
 {# Return the first column #}
 {% set results_list = results.columns[0].values() %}
 {% else %}
 {% set results_list = [] %}
{% endif %}

{% for id in results_list %}
 select * from table_name
  WHERE id = '{{ id }}'
 limit 1 
{% endfor %}
1

There are 1 best solutions below

0
On

I see you already have some code. I guess you still need to know how to use it, right?

I see that in your first question, you talk about the dbt config block. This block is not needed if you want to use the default configurations (the default dbt "table" materialization, or the configurations defined in your dbt_project.yml file). Let's say you want to change the default configurations and want to load your model incrementally (so that I can give you a more complete code example). In this case, the dbt model could look like the following:

Model:

{% set get_id_query %}
  select distinct id from table_name limit 10
{% endset %}

{% if execute %} {# So that your query doesn't run at compilation phase #}
 {# Return the first column #}
 {% set results_list = run_query(get_id_query).columns[0].values() %}
{% endif %}


{{
  config(
    materialized = 'incremental',
   )
}}

{% if results_list|length > 0 %}
 {% for id in results_list %}
  SELECT *
  FROM table_name
  WHERE id = '{{ id }}'
 {% if not loop.last %}UNION ALL {% endif %}
 {% endfor %}
{% else %}
 SELECT * FROM table_name LIMIT 0
{% endif %}

If you want to keep the LIMIT 1 thing, you will have to apply some GROUP BY or DISTINCT if you can, or apply a ROW_NUMBER() inside a CTE and later filter on row_number = 1.

I hope this helps!