How to integrate Great Expectations into an Airflow project


I'm trying to integrate Great Expectations into an Airflow project, without success.

Is there some configuration I need to do?

Here are the steps I followed:

1- I generated the Great Expectations project by following this tutorial: https://docs.greatexpectations.io/docs/tutorials/getting_started/tutorial_setup

2- I copied the great_expectations folder into /include.

The Airflow project looks like this:

[Image: Airflow project structure]

3- I created a DAG:

import os
from datetime import datetime, timedelta
from pathlib import Path

from airflow import DAG
from great_expectations_provider.operators.great_expectations import GreatExpectationsOperator

# Project root: the DAG file sits in dags/, so parents[1] is one level above it
base_path = Path(__file__).parents[1]
# Great Expectations project copied into include/ in step 2
ge_root_dir = os.path.join(base_path, "include", "great_expectations")
# Tutorial sample data (not used by the operator below)
data_file = os.path.join(base_path, "include", "data", "yellow_tripdata_sample_2019-01.csv")


default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2019, 1, 1),
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}

with DAG('example_great_expectations_dag',
         schedule_interval='@once',
         default_args=default_args) as dag:

    # Run the checkpoint created in the getting-started tutorial
    ge_task = GreatExpectationsOperator(
        task_id="ge_task",
        data_context_root_dir=ge_root_dir,
        checkpoint_name="getting_started_checkpoint",
    )
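Before looking at the operator itself, it can help to rule out a path problem by checking that ge_root_dir really points at a Great Expectations project. A minimal sketch in plain Python; the helper name `check_ge_project` is hypothetical, and the expected files (`great_expectations.yml` at the root, checkpoints stored as `checkpoints/<name>.yml`) are assumptions based on the layout the getting-started tutorial generates:

```python
from pathlib import Path


def check_ge_project(ge_root_dir, checkpoint_name):
    """Return a list of problems found in a Great Expectations project folder.

    Assumed layout (not guaranteed for every Great Expectations version):
    great_expectations.yml at the root, checkpoints/<checkpoint_name>.yml.
    An empty list means nothing obvious is missing.
    """
    root = Path(ge_root_dir)
    problems = []
    if not root.is_dir():
        problems.append(f"missing directory: {root}")
        return problems  # nothing else can be checked without the folder
    config_file = root / "great_expectations.yml"
    if not config_file.is_file():
        problems.append(f"missing file: {config_file}")
    checkpoint_file = root / "checkpoints" / f"{checkpoint_name}.yml"
    if not checkpoint_file.is_file():
        problems.append(f"missing checkpoint: {checkpoint_file}")
    return problems
```

For example, calling `check_ge_project(ge_root_dir, "getting_started_checkpoint")` from a Python shell inside the Airflow environment should return an empty list if the folder was copied intact.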

Error:

[2022-04-17, 02:52:54 EDT] {great_expectations.py:122} INFO - Running validation with Great Expectations...
[2022-04-17, 02:52:54 EDT] {great_expectations.py:125} INFO - Ensuring data context is valid...
[2022-04-17, 02:52:54 EDT] {util.py:153} CRITICAL - Error The module: `great_expectations.data_context.store` does not contain the class: `ProfilerStore`.
    - Please verify that the class named `ProfilerStore` exists. occurred while attempting to instantiate a store.
[2022-04-17, 02:52:54 EDT] {taskinstance.py:1718} ERROR - Task failed with exception
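The CRITICAL line suggests that great_expectations.yml references a store class, `ProfilerStore`, that the Great Expectations package installed where Airflow runs does not provide. A minimal sketch for checking this, assuming you have already parsed the `stores:` section of great_expectations.yml into a dict (for example with PyYAML's `yaml.safe_load`) and run it in the Airflow environment; the helper name `missing_store_classes` is hypothetical, and the default module is taken from the error message above:

```python
import importlib

# Module named in the error message; used when a store config has no module_name
DEFAULT_STORE_MODULE = "great_expectations.data_context.store"


def missing_store_classes(stores, default_module=DEFAULT_STORE_MODULE):
    """Return (store_name, module_name, class_name) for every class that cannot be found.

    `stores` is assumed to be the already-parsed `stores:` mapping from
    great_expectations.yml, e.g. {"profiler_store": {"class_name": "ProfilerStore"}}.
    """
    missing = []
    for store_name, store_config in stores.items():
        class_name = store_config.get("class_name")
        if class_name is None:
            continue  # nothing to resolve for this entry
        module_name = store_config.get("module_name", default_module)
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # The whole module is unavailable in this environment
            missing.append((store_name, module_name, class_name))
            continue
        if not hasattr(module, class_name):
            # Module exists but this version does not define the class
            missing.append((store_name, module_name, class_name))
    return missing
```

Under those assumptions, an entry returned with class_name `ProfilerStore` would indicate that the installed Great Expectations is older than the one that generated the project.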
Answer:

This might be a package dependency problem. Please check the following:

Notes on compatibility

=> This operator currently works with the Great Expectations V3 Batch Request API only. If you would like to use the operator in conjunction with the V2 Batch Kwargs API, you must use a provider version below 0.1.0.

=> Make sure that you use the same package versions in both environments: the one where you generated the Great Expectations project and the Airflow environment.
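One way to act on the second point is to record the installed versions in each environment and compare them. A sketch using only the standard library (`importlib.metadata`, Python 3.8+); the package list and the helper names `installed_versions` and `version_differences` are assumptions for illustration:

```python
from importlib import metadata

# Distributions worth comparing; this list is an assumption, adjust as needed
PACKAGES = ("great_expectations", "airflow-provider-great-expectations", "apache-airflow")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def version_differences(env_a, env_b):
    """Return {package: (version_in_a, version_in_b)} for packages whose versions differ."""
    return {
        name: (env_a.get(name), env_b.get(name))
        for name in sorted(set(env_a) | set(env_b))
        if env_a.get(name) != env_b.get(name)
    }
```

Run `installed_versions()` once where the project was generated and once inside the Airflow environment, then pass both dicts to `version_differences`. A differing `great_expectations` entry would be consistent with the `ProfilerStore` error.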

I had the same problem.