Schema validation and alerting system for firehose data

We have a data lake, and various source teams ingest data into our S3 data lake via Firehose. One of our main issues is that our schema contracts are maintained in documents, and these documents become outdated over time because of frequent schema changes on the source side. We would like to create a schema registry and an alerting mechanism that notifies us of any schema change from upstream. The reason for the alerting system is that our downstream ETL jobs fail when the schema changes without notice. We use AWS and Python.
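
One way to frame the registry is as versioned contracts plus a diff against what actually arrives. The sketch below is a minimal, in-memory illustration, assuming each record is a flat JSON object and that a contract is simply a mapping of top-level field name to a JSON type name. `SchemaRegistry`, `infer_schema` and `json_type` are hypothetical names; in practice the contracts would be stored somewhere durable (for example AWS Glue Schema Registry, which supports JSON Schema, or a DynamoDB table) instead of a dict.

```python
from dataclasses import dataclass, field


def json_type(value):
    """Map a decoded JSON value to a JSON Schema-style type name."""
    # bool must be tested before int because bool is a subclass of int
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return "null"


def infer_schema(record):
    """Return {top-level field name: type name} for one flat record."""
    return {key: json_type(val) for key, val in record.items()}


@dataclass
class SchemaRegistry:
    """Hypothetical in-memory registry: stream name -> list of contract versions."""
    versions: dict = field(default_factory=dict)

    def register(self, stream, contract):
        """Store a new contract version and return its 1-based version number."""
        self.versions.setdefault(stream, []).append(dict(contract))
        return len(self.versions[stream])

    def latest(self, stream):
        return self.versions[stream][-1]

    def diff(self, stream, record):
        """Compare one record against the latest contract for its stream."""
        expected = self.latest(stream)
        observed = infer_schema(record)
        return {
            "added": sorted(observed.keys() - expected.keys()),
            "missing": sorted(expected.keys() - observed.keys()),
            "type_changed": sorted(
                k for k in expected.keys() & observed.keys()
                if expected[k] != observed[k]
            ),
        }


if __name__ == "__main__":
    registry = SchemaRegistry()
    registry.register("orders", {"order_id": "integer", "amount": "number"})
    print(registry.diff("orders", {"order_id": 7, "amount": "12.50", "currency": "EUR"}))
    # {'added': ['currency'], 'missing': [], 'type_changed': ['amount']}
```

This only compares top-level fields, and a nullable field that arrives as `null` is reported as a type change; a real contract would need a set of allowed types per field and recursion into nested objects.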

I tried this: I integrated Firehose with Glue (https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html#record-format-conversion-concepts). With record format conversion, the data written to S3 follows the schema defined in the Glue table, so it acts as schema enforcement on the incoming data. The drawback is that any additional columns are discarded, which means data loss.
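
Because the Glue table already serves as the contract for record format conversion, it could also be the source of truth for drift checks, so that extra columns are at least reported before conversion drops them. A minimal sketch, assuming the schema is read with boto3's `glue.get_table(DatabaseName=..., Name=...)`, whose response lists columns under `Table.StorageDescriptor.Columns` as `{"Name": ..., "Type": ...}` entries. The helper names are hypothetical, partition keys (returned separately under `Table.PartitionKeys`) are ignored, and boto3 is imported only inside `fetch_columns` so the pure functions can be tested without AWS access.

```python
def columns_from_get_table(response):
    """Extract {column name: Glue type} from a glue.get_table() response."""
    cols = response["Table"]["StorageDescriptor"]["Columns"]
    return {c["Name"].lower(): c["Type"] for c in cols}


def fields_dropped_by_conversion(record, columns):
    """Top-level keys of a JSON record that have no matching Glue column.

    Keys are compared lowercased, assuming the catalog treats column names
    case-insensitively (as Hive-style catalogs generally do).
    """
    return sorted(k for k in record if k.lower() not in columns)


def fetch_columns(database, table):
    """Read the table schema from the Glue Data Catalog (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without boto3

    glue = boto3.client("glue")
    return columns_from_get_table(glue.get_table(DatabaseName=database, Name=table))
```

Any non-empty result from `fields_dropped_by_conversion` is exactly the data that the conversion step would silently discard, which makes it a natural trigger for an alert.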

I also tried integrating Firehose with a Lambda function that validates each record with the standard jsonschema library, but I was unable to get the desired result.
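
Since the goal is to detect changes rather than reject data, one option is a Firehose data-transformation Lambda that never alters the payload: it decodes each record, compares its keys with the expected contract, returns every record with `result` set to `"Ok"` and its original `data`, and logs one warning per batch summarising the drift. This is a minimal sketch, assuming each record holds a single JSON object and using a hard-coded `EXPECTED_FIELDS` set (hypothetical; it would normally be loaded from the registry). The event and response shapes (`records`, `recordId`, `data`, `result`) follow the Firehose transformation contract; records that are not valid JSON are marked `"ProcessingFailed"` so Firehose routes them to its error output instead of the main prefix.

```python
import base64
import json
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# Hypothetical contract; in practice load it from the schema registry.
EXPECTED_FIELDS = {"order_id", "amount", "created_at"}


def lambda_handler(event, context):
    output = []
    added, missing = set(), set()
    for rec in event["records"]:
        try:
            payload = json.loads(base64.b64decode(rec["data"]))
        except ValueError:
            # Covers bad base64, bad UTF-8 and bad JSON (all ValueError subclasses).
            output.append({"recordId": rec["recordId"], "result": "ProcessingFailed",
                           "data": rec["data"]})
            continue
        if isinstance(payload, dict):
            keys = set(payload)
            added |= keys - EXPECTED_FIELDS
            missing |= EXPECTED_FIELDS - keys
        # Return the original base64 data untouched so no field is lost.
        output.append({"recordId": rec["recordId"], "result": "Ok", "data": rec["data"]})

    if added or missing:
        # One log line per batch; a CloudWatch Logs metric filter or an SNS
        # publish can turn this into an alert.
        logger.warning("schema drift: %s",
                       json.dumps({"added": sorted(added), "missing": sorted(missing)}))
    return {"records": output}
```

Aggregating per batch rather than alerting per record keeps a single upstream change from producing thousands of notifications.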

To summarize, I want to validate the incoming Firehose data against a standard schema and send an alert whenever it detects a change.
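
For the alerting half, one option is to publish a drift summary to an Amazon SNS topic that the team subscribes to by email or chat. A minimal sketch, assuming the caller passes in a boto3 SNS client (`boto3.client("sns")`) and a topic ARN, and that drift is a dict of lists such as `{"added": [...], "missing": [...]}`. `publish_drift_alert` is a hypothetical helper; it does nothing when no drift was found, and it truncates the subject because SNS requires subjects to be under 100 characters.

```python
import json


def publish_drift_alert(sns_client, topic_arn, stream, drift):
    """Publish a drift summary to SNS; return the MessageId, or None if nothing changed."""
    # Stay silent when the batch matched the contract.
    if not any(drift.values()):
        return None
    subject = f"Schema drift on {stream}"[:99]  # SNS subjects must be under 100 characters
    message = json.dumps({"stream": stream, **drift}, indent=2, sort_keys=True)
    response = sns_client.publish(TopicArn=topic_arn, Subject=subject, Message=message)
    return response["MessageId"]
```

Passing the client in, rather than creating it inside the function, lets the same helper be called from the Lambda and exercised in tests with a stub.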
