How to run a dbt model and pass a variable at run time


We have a dbt application that we run in pods using Apache Airflow on AWS. I have a model that I need to run for a specific appId, so I need to pass appId as a runtime parameter in the CLI command so that the model runs only for that appId.

On our local machine, we run a model with the following dbt command:

dbt -d run --models abc

The Apache Airflow code that we use to run the dbt model is:

abc_task = KubernetesPodOperator(namespace='etl',
                                  image=f'blotout/dbt-analytics:{TAG_DBT_ANALYTICS}',
                                  cmds=["/usr/local/bin/dbt"],
                                  arguments=['run', '--models', 'abc_task'],
                                  env_vars=env_var,
                                  name="abc_task",
                                  configmaps=['awskey'],
                                  task_id="abc_task",
                                  get_logs=True,
                                  dag=dag,
                                  is_delete_operator_pod=True,
                                  )

We need something like this:

{%- set appIdList = ['{{ var("appId") }}'] -%}

The value of appId should be passed from the Airflow task to the CLI command, as shown above.


BEST ANSWER

Have you considered using environment variables to share the information from your Apache Airflow DAG with the Kubernetes Pod running dbt?

In your case, you could declare APP_ID within the env_vars dictionary.
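For instance, the DAG could merge the per-run appId into the existing env_var dictionary before handing it to the operator. A minimal sketch; the key name APP_ID, the value "id-123", and the pre-existing region entry are placeholders:

```python
# Sketch: merge a per-run APP_ID into the env vars handed to the pod.
# "id-123" is a placeholder; in practice it might come from the DAG run conf.
env_var = {"AWS_DEFAULT_REGION": "us-east-1"}  # whatever the DAG already defines
env_var_with_app = {**env_var, "APP_ID": "id-123"}
# env_var_with_app is what you would pass as env_vars=... to KubernetesPodOperator
```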

Inside the dbt model file, you can use the env_var function to incorporate environment variables from the system into the model using Jinja:

{{ env_var('APP_ID') }}

The dbt docs give more details about this feature: https://docs.getdbt.com/reference/dbt-jinja-functions/env_var
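To illustrate what dbt does with that expression: env_var() is a Jinja function that reads from the pod's environment at compile time. A rough sketch using plain Jinja2 to approximate the substitution (the table name and value are placeholders, and passing env_var as a render argument stands in for dbt's built-in):

```python
import os
from jinja2 import Template

os.environ["APP_ID"] = "id-123"  # in the pod, this would come from env_vars

# Approximate dbt's env_var() with a callable handed to plain Jinja2.
template = Template("select * from events where app_id = '{{ env_var(\"APP_ID\") }}'")
sql = template.render(env_var=os.environ.get)
```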

ANOTHER ANSWER

Project variables offer this functionality. They can be defined either in dbt_project.yml or on the command line.

The model should then use this variable to limit execution to the specific appId.
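On the model side, dbt's var() function substitutes the value supplied via --vars at compile time. A rough sketch approximating that substitution with plain Jinja2 (the table name and value are placeholders, and a dict lookup stands in for dbt's built-in var()):

```python
from jinja2 import Template

# Hypothetical model body; in a real project this would live in a .sql file.
model_sql = Template(
    "select * from events where app_id = '{{ var(\"appId\") }}'"
)

# Approximate dbt's var() with a dict lookup for illustration.
run_vars = {"appId": "id-123"}
sql = model_sql.render(var=run_vars.get)
```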

Here's an example of how this is possible by adding --vars to the KubernetesPodOperator arguments for dbt:

abc_task = KubernetesPodOperator(namespace='etl',
                                  image=f'blotout/dbt-analytics:{TAG_DBT_ANALYTICS}',
                                  cmds=["/usr/local/bin/dbt"],
                                  arguments=['run', '--models', 'abc_task', '--vars', '{"appId": "id-123"}'],
                                  env_vars=env_var,
                                  name="abc_task",
                                  configmaps=['awskey'],
                                  task_id="abc_task",
                                  get_logs=True,
                                  dag=dag,
                                  is_delete_operator_pod=True,
                                  )
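Since --vars takes a YAML/JSON string, building it with json.dumps avoids quoting mistakes when the appId comes from the DAG rather than being hard-coded. A sketch; the value "id-123" is a placeholder:

```python
import json

app_id = "id-123"  # placeholder; would come from the DAG's run configuration
arguments = ['run', '--models', 'abc_task', '--vars', json.dumps({"appId": app_id})]
# arguments[-1] == '{"appId": "id-123"}'
```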