Adding multiple fields for the same template in Data Catalog using Python


I have a tag template in Data Catalog named 'data check', and I need to add multiple fields to this template with Python code that reads them from a text file.

The text file contains the following:

name,age,salary
ricky,23,20k
ricky,25,25k
ricky,30,30k
rishab,22,30k
rishab,23,40k
rishab,29,35k

I need to add these fields to the tag template "data check" in Data Catalog, and I need help doing so.

I tried reading the text file and adding these lines to the template, but I get errors such as "Error: 409 Template.ricky already exists" on the second line for ricky, and so on.


There is 1 solution below

Piotr Zalas On

I assume that you have a CSV file whose first line is a header and whose subsequent lines contain the data to be put in tags.

A tag template is a reusable specification of a tag. In your example, you could have a tag template data check with 3 fields: name of type String, age of type Double (Data Catalog has no integer field type), and salary of type String. Here is an example Python script for creating a tag template. Tag template names must be unique within a project and location, which is probably why you get the error when trying to create one: you have multiple lines with the name ricky, and you use that name as the tag template name.
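A minimal sketch of creating such a template, assuming the google-cloud-datacatalog client library is installed and credentials are configured. The template ID data_check, the display names, and the project_id and location arguments are illustrative assumptions, not values from the question.

```python
# Field ID -> (display name, Data Catalog primitive type name).
# Numeric values use DOUBLE; the IDs and display names are assumptions.
FIELD_SPECS = {
    "name": ("Name", "STRING"),
    "age": ("Age", "DOUBLE"),
    "salary": ("Salary", "STRING"),
}


def create_data_check_template(project_id, location, template_id="data_check"):
    """Create the tag template once; it holds the schema only, no row data."""
    # Imported here so FIELD_SPECS can be inspected without the client library.
    from google.cloud import datacatalog_v1

    client = datacatalog_v1.DataCatalogClient()

    template = datacatalog_v1.types.TagTemplate()
    template.display_name = "Data check"
    primitive = datacatalog_v1.types.FieldType.PrimitiveType
    for field_id, (display_name, type_name) in FIELD_SPECS.items():
        template.fields[field_id] = datacatalog_v1.types.TagTemplateField()
        template.fields[field_id].display_name = display_name
        template.fields[field_id].type_.primitive_type = primitive[type_name]

    # A second call with the same template_id fails with 409 Already Exists.
    return client.create_tag_template(
        parent=f"projects/{project_id}/locations/{location}",
        tag_template_id=template_id,
        tag_template=template,
    )
```

The template is created a single time for the whole file, so the loop runs over the header columns rather than over the data lines.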

You can't store data (such as ricky,23,20k) in a tag template. A tag template only defines the schema of a tag. A tag is an actual application of a tag template with data filled in (e.g. name=ricky, age=23, salary=20k). To create a tag, you must have an Entry in Data Catalog on which the tag will be created. On a single Entry, you can create only one tag from a given tag template.
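To illustrate the difference, here is a hedged sketch of filling in one tag from one data line, again assuming the google-cloud-datacatalog client library. The helper names row_to_tag_values and attach_tag are hypothetical, and the field IDs name, age, and salary assume a template with a numeric age field and string name and salary fields.

```python
def row_to_tag_values(row):
    """Convert one CSV row (all strings) into values matching the field types."""
    return {
        "name": row["name"],       # string field
        "age": float(row["age"]),  # numeric (DOUBLE) field
        "salary": row["salary"],   # string field, e.g. "20k"
    }


def attach_tag(entry_name, template_name, values):
    """Create one tag on an existing entry, filled in with the given values."""
    # Imported here so row_to_tag_values works without the client library.
    from google.cloud import datacatalog_v1

    client = datacatalog_v1.DataCatalogClient()

    tag = datacatalog_v1.types.Tag()
    tag.template = template_name
    for field_id, value in values.items():
        tag.fields[field_id] = datacatalog_v1.types.TagField()
        if isinstance(value, float):
            tag.fields[field_id].double_value = value
        else:
            tag.fields[field_id].string_value = value

    # parent is the full resource name of the entry that receives the tag.
    return client.create_tag(parent=entry_name, tag=tag)
```

Both entry_name and template_name are full resource names, of the form projects/{project}/locations/{location}/entryGroups/{group}/entries/{entry} and projects/{project}/locations/{location}/tagTemplates/{template} respectively.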

To sum up, the CSV header corresponds to the tag template, and each subsequent line corresponds to a tag. The file contains no explicit information that could identify an entry (you can't use the name column, because entry names must be unique and the names in the file repeat). You therefore need to create a separate Entry for each data line and its tag.
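The mapping can be sketched in plain Python, using the file contents from the question. The plan_tags helper and the generated data_check_row_N entry IDs are hypothetical; any scheme that yields a unique ID per line would do.

```python
import csv
import io

CSV_TEXT = """name,age,salary
ricky,23,20k
ricky,25,25k
ricky,30,30k
rishab,22,30k
rishab,23,40k
rishab,29,35k
"""


def plan_tags(csv_text):
    """Split the file into template field IDs (header) and one planned tag per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    field_ids = list(reader.fieldnames)
    # The name column repeats, so each row gets a generated, unique entry ID.
    planned = [
        (f"data_check_row_{i}", dict(row))
        for i, row in enumerate(reader, start=1)
    ]
    return field_ids, planned


field_ids, planned = plan_tags(CSV_TEXT)
print("template fields:", field_ids)
for entry_id, row in planned:
    print(entry_id, row)
```

The header is consumed once to define the template's fields, while each data line yields one entry ID and one set of tag values.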