How can i see metadata, lineage of data stored in AWS redshift?

Question

How can i see metadata, lineage of data stored in AWS redshift?

5.6k Views Asked by Nik At 03 July 2025 at 16:56

I am using solutions like cloudera navigator, atlas and Wherehows

to get Hadoop, HDFS, HIVE, SQOOP, MAPREDUCE metadata and lineage.

Now we have a data warehouse in AWS redshift as well. Is there a way to extract metadata or lineage or both information out of redshift.

So far i have not found anything on this.

Is there a way to integrate the same to wherehows as a crawled solution?

I found only one post which gives some information about how to get some information from redshift assuming it will be similar to postgresql. I am sure someone would have written some open source solution to this problem. Or is it just matter of writing a simple single script to extract this information? I am looking for a enterprise level solution. I hope someone will point me in right direction.

Original Q&A

There are 2 best solutions below

**Lars Kamp** · Answer 1

You can access metadata by querying the system tables in Redshift:

https://docs.aws.amazon.com/redshift/latest/dg/cm_chap_system-tables.html

The system tables are on the leader node in each cluster (see this guide on the Redshift Architecture that I wrote)

Redshift deletes the content of the system tables on a rolling basis, so you need to store that data in your cluster, or another separate cluster, to get a history. With the data in the system tables, you have a baseline of information about your queries and what tables they are touching.

You can put a dashboard like Kibana or Periscope Data on top of that data to visualize it. Plaid has done a write-up of how they've built an in-house monitoring solution that has some information about data lineage:

https://blog.plaid.com/managing-your-amazon-redshift-performance-how-plaid-uses-periscope-data/

But go get true data lineage, you need to understand how queries relate to your workflows, i.e. for an Airflow DAG. To get that information, you need to "tag" your queries so you can trace them in the context of transformations / workflows, vs. looking at the individual query.

This is something we've built into our product - heads up that it's a commercial solution:

https://www.intermix.io/blog/announcing-query-insights/

Unlike the raw logs from the system tables, we give you the context of what apps / workflows are triggering queries, which users are running them, and what tables they are touching.

Lars

**satish silveri** · Answer 2

AWS Glue Data catalog is a fully managed metadata management service.It has AWS Glue crawler which automatically crawls through your source(for you its redshift) and creates a centralized metadata repository which can be accessed by other AWS services.

Refer:

https://docs.aws.amazon.com/glue/latest/dg/components-overview.html

https://aws.amazon.com/glue/

How can i see metadata, lineage of data stored in AWS redshift?

There are 2 best solutions below

Related Questions in AMAZON-WEB-SERVICES

Related Questions in AMAZON-REDSHIFT

Related Questions in HADOOP2

Related Questions in DATABASE-METADATA

Related Questions in DATA-LINEAGE

Trending Questions

Popular # Hahtags

Popular Questions