Using Great Expectations with streamed data

I am using Great Expectations to test streaming data (I collect a sample into a batch and test the batch). The problem is that I cannot use Data Docs, because that would generate hundreds of thousands of HTML pages. What I would like is for my API to generate the requested page from the JSON result when a specific test result is clicked on (via the index page). Can Great Expectations generate just one HTML page, which can be disposed of when it is closed?

Best answer

If you are using a ValidationOperator / Checkpoint, the UpdateDataDocsAction builds only the resources that were validated in that run. This is the recommended approach.

If you are interacting directly with the DataContext API, the build_data_docs method on DataContext accepts a resource_identifiers option that you can use to request that only a single asset be built. To get the behavior you're looking for (a truly ephemeral build of just that page), I think you'd want to pair that with a site configuration whose files are written to a temporary location, e.g. /tmp.
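A minimal sketch of the ephemeral idea in plain Python: build into a throwaway directory, read back the one page, and let the directory be deleted. The names render_single_page, build_into and page_relpath are hypothetical and not part of the Great Expectations API. build_into stands in for whatever call you make to run build_data_docs (with resource_identifiers) against a site whose base directory is the given path, and page_relpath is wherever that site writes the page.

```python
import tempfile
from pathlib import Path
from typing import Callable


def render_single_page(build_into: Callable[[str], None], page_relpath: str) -> str:
    """Build docs into a throwaway directory and return one page's HTML.

    build_into is a hypothetical callback: for example, a function that runs
    context.build_data_docs(...) restricted via resource_identifiers, for a
    site configured to write into the directory it is given.
    """
    # TemporaryDirectory removes the directory and everything in it when the
    # with-block exits, including when an exception is raised inside it.
    with tempfile.TemporaryDirectory(prefix="ge_docs_") as tmp_dir:
        build_into(tmp_dir)
        page = Path(tmp_dir) / page_relpath
        return page.read_text(encoding="utf-8")
```

Using a fresh temporary directory per request, rather than one fixed /tmp path, keeps concurrent requests from overwriting each other's output, provided build_into can target an arbitrary directory.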

The docs on the build_data_docs method are here: https://docs.greatexpectations.io/en/latest/autoapi/great_expectations/data_context/data_context/index.html#great_expectations.data_context.data_context.BaseDataContext.build_data_docs

Note that the resource_identifiers parameter expects a list of identifier objects, e.g. ValidationResultIdentifier objects, such as:

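from great_expectations.data_context import DataContext
from great_expectations.data_context.types.resource_identifiers import ExpectationSuiteIdentifier, ValidationResultIdentifier

context = DataContext()  # loads the project's great_expectations.yml
context.build_data_docs(site_names=["local_site"], resource_identifiers=[ValidationResultIdentifier(  # build only this result
    run_id="20201203T182816.362147Z",
    expectation_suite_identifier=ExpectationSuiteIdentifier("foo"),
    batch_identifier="b739515cf1c461d67b4e56d27f3bfd02",
)])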
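If the page is to be generated on demand from a stored validation result JSON, as in the question, the suite name and run id could be read from the result's meta section before constructing the identifiers in the snippet above. This is a sketch under assumptions: it assumes the JSON carries meta.expectation_suite_name and meta.run_id (check this against your own result files, since the layout has changed between Great Expectations versions), and it takes the batch identifier from the caller because how that is derived depends on your batch setup. identifier_kwargs_from_result is a hypothetical helper.

```python
import json
from typing import Any, Dict


def identifier_kwargs_from_result(result_json: str, batch_identifier: str) -> Dict[str, Any]:
    """Collect the pieces needed for a ValidationResultIdentifier from a stored result.

    Assumes (not guaranteed across versions) that the result JSON contains
    meta.expectation_suite_name and meta.run_id.
    """
    meta = json.loads(result_json)["meta"]
    return {
        "expectation_suite_name": meta["expectation_suite_name"],
        # run_id may be a plain string or a {"run_name": ..., "run_time": ...} dict,
        # depending on the version that wrote the file; it is passed through unchanged.
        "run_id": meta["run_id"],
        "batch_identifier": batch_identifier,
    }
```

The returned values would then be passed to ValidationResultIdentifier and ExpectationSuiteIdentifier as shown earlier, assuming your version accepts the run_id form found in the file.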