REST API: Is there any endpoint that provides dataset lineage info?

I was not able to find an endpoint that provides information about dataset dependencies.

Example: I would like to get the same (but simplified) information that the lineage view shows, e.g. that Dataset A depends on B and C via Pipeline P / code repository X.

Answer by Matija Herceg (best answer)

You can always inspect the network requests that populate the data lineage view, for example in your browser's developer tools.

If you expand the lineage downstream, it calls the following endpoint, which you can call yourself with requests like this:

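import requests

# FOUNDRY_URL, headers (assumed to carry your auth token) and yourDatasetRid must be defined by you.
response = requests.post(
    f'{FOUNDRY_URL}/provenance/api/provenance/containers/graph',
    headers=headers,
    json={  # json= (not data=) so the nested dict is sent as a JSON body
        'type': 'downstreamRequest',
        'downstreamRequest': {
            'downstreamResourceLabels': {},
            'rootContainer': yourDatasetRid,
            'upstreamResourceLabels': {},
        },
    },
)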

And there is the equivalent for the upstream lineage.

Answer by Ismael Serrano

Matija's solution worked. I also tried this endpoint, which reports incoming and outgoing links:

import requests

PalantirHostName = ...
datasetRid = ...
headers = ...  # authentication headers, not shown in the original snippet

url = f"{PalantirHostName}/monocle/api/links"
request_body = {
    "resourceIdentifiers": [datasetRid],
    "branch": "master",
    "fallbacks": [],
    "serviceTypeFilter": ["BUILDGRAPH", "EXTERNALGRAPH", "PROVENANCEGRAPH", "ONTOLOGYGRAPH"],
}
response = requests.post(url, headers=headers, json=request_body)

Response:

{
  "nodes": [
    {
      "resourceIdentifier": "ri.foundry.main.dataset.00000000-0000-0000-0000-000000000001",
      "links": [
        {
          "type": "datasetLink",
          "datasetLink": {
            "linkDirection": "INCOMING",
            "resourceIdentifier": "ri.foundry.main.dataset.00000000-0000-0000-0000-000000000002",
            "name": null,
            "inTrash": false
          }
        },
        {
          "type": "datasetLink",
          "datasetLink": {
            "linkDirection": "OUTGOING",
            "resourceIdentifier": "ri.foundry.main.dataset.00000000-0000-0000-0000-000000000003",
            "name": null,
            "inTrash": false
          }
        }
      ]
    }
  ]
}
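
To get from that response to a simplified "A is linked to B and C" view, something like the following might work. It is only a sketch: the function name split_links is made up for illustration, the sample payload is copied from the response above, and it assumes every link of interest has type "datasetLink" with the fields shown there. Whether INCOMING means upstream or downstream is not stated in the response itself, so the sketch just groups by the reported direction.

```python
import json

# Sample payload copied from the response shown above (not a live API call).
SAMPLE_RESPONSE = json.loads("""
{"nodes": [{"resourceIdentifier": "ri.foundry.main.dataset.00000000-0000-0000-0000-000000000001",
  "links": [
    {"type": "datasetLink", "datasetLink": {"linkDirection": "INCOMING",
      "resourceIdentifier": "ri.foundry.main.dataset.00000000-0000-0000-0000-000000000002",
      "name": null, "inTrash": false}},
    {"type": "datasetLink", "datasetLink": {"linkDirection": "OUTGOING",
      "resourceIdentifier": "ri.foundry.main.dataset.00000000-0000-0000-0000-000000000003",
      "name": null, "inTrash": false}}
  ]}]}
""")


def split_links(payload):
    """Map each node's RID to its linked dataset RIDs, grouped by linkDirection.

    Hypothetical helper: assumes the response shape shown in the sample above.
    """
    result = {}
    for node in payload.get("nodes", []):
        directions = {"INCOMING": [], "OUTGOING": []}
        for link in node.get("links", []):
            if link.get("type") != "datasetLink":
                continue  # other link types do not appear in the sample, so skip them
            details = link["datasetLink"]
            directions.setdefault(details["linkDirection"], []).append(
                details["resourceIdentifier"]
            )
        result[node["resourceIdentifier"]] = directions
    return result


links = split_links(SAMPLE_RESPONSE)
for rid, directions in links.items():
    print(rid)
    for direction, rids in directions.items():
        print(f"  {direction}: {rids}")
```

With a real call you would pass response.json() instead of SAMPLE_RESPONSE.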