I would like the data to be masked, but in a way that still makes it possible to tell how many people studied at UNIVERSITY_1.
What de-identification transformation can I use to accomplish this kind of information/text masking?
Input:
{
"students": [
{
"name": "John Smith",
"university": "University of Pennsylvania"
},
{
"formattedName": "Mike Miller",
"university": "Harvard University"
},
{
"formattedName": "Elon Musk",
"university": "University of Pennsylvania"
}
]
}
Output:
{
"students": [
{
"name": "John Smith",
"university": "UNIVERSITY_1"
},
{
"formattedName": "Mike Miller",
"university": "UNIVERSITY_2"
},
{
"formattedName": "Elon Musk",
"university": "UNIVERSITY_1"
}
]
}
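You could create one custom infoType per college, each backed by a dictionary containing just that college's name, and then apply a replace-with-infoType transformation. Every occurrence of the same college is then replaced by the same infoType name, so the counts stay recoverable. How many schools are in your data set?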
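A rough sketch of what that configuration could look like in Python, assuming the snake_case field names used by the `google-cloud-dlp` client. The helper name `build_dlp_configs` and the `UNIVERSITY_n` numbering (by order of first appearance) are my own choices for illustration, not something from the DLP docs. The code only builds the two config dictionaries and makes no API call:

```python
import json


def build_dlp_configs(universities):
    """Build inspect/de-identify configs with one custom infoType per school.

    `build_dlp_configs` is a hypothetical helper; only the dictionary keys
    are meant to mirror the DLP request fields.
    """
    # Deduplicate while keeping the order of first appearance.
    distinct = list(dict.fromkeys(universities))

    # One custom infoType per school, each with a single-word dictionary.
    custom_info_types = [
        {
            "info_type": {"name": f"UNIVERSITY_{i}"},
            "dictionary": {"word_list": {"words": [name]}},
        }
        for i, name in enumerate(distinct, start=1)
    ]
    inspect_config = {"custom_info_types": custom_info_types}

    # Replace every finding with the name of the infoType that matched it.
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {"primitive_transformation": {"replace_with_info_type_config": {}}}
            ]
        }
    }
    return inspect_config, deidentify_config


students = [
    {"name": "John Smith", "university": "University of Pennsylvania"},
    {"formattedName": "Mike Miller", "university": "Harvard University"},
    {"formattedName": "Elon Musk", "university": "University of Pennsylvania"},
]

inspect_config, deidentify_config = build_dlp_configs(
    s["university"] for s in students
)
print(json.dumps(inspect_config, indent=2))
```

These two dictionaries would then be passed as `inspect_config` and `deidentify_config` in a `deidentify_content` request, along with your project as `parent` and the text as `item`. As far as I know, the replacement is rendered with square brackets (for example `[UNIVERSITY_1]`), so you may need to strip them if you want the exact output shown in the question. With many schools, generating the infoTypes from the data like this is more practical than writing them by hand.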