I have a DataFrame like the one below, and I want to desensitize it by replacing the unique values of a column. Specifically, I want to replace the last name column with fake last names generated by the 'faker' library.
The code snippet is below.
import pandas as pd
from faker import Faker
fake = Faker()
print(fake.first_name())
print(fake.last_name())
last = ('Meyer', 'Maier', 'Meyer', 'Mayer', 'Meyr', 'Mair')
job = ('data analyst', 'programmer', 'computer scientist',
       'data scientist', 'accountant', 'psychiatrist')
language = ('Python', 'Perl', 'Java', 'Java', 'Cobol', 'Brainfuck')
# index=first is left out here because `first` is never defined in this snippet
df = pd.DataFrame(list(zip(last, job, language)),
                  columns=['last', 'job', 'language'])
The desired output replaces the last name column with fake names, but consistently: for example, Meyer should always be replaced with the same fake last name.
Collect all the unique names, create a dictionary that maps each unique name to a fake name, and map it onto your column:
Output: