Replace unique values with new values in a pandas DataFrame?


I have a dataframe like the one below and want to desensitize it by replacing the unique values of a column. Specifically, I want to replace the values in the last-name column with fake last names generated by the 'faker' library.

The code snippet is below.

import pandas as pd
from faker import Faker
fake = Faker()
print(fake.first_name())
print(fake.last_name())
first = ('Mike', 'Dorothee', 'Tom', 'Bill', 'Pete', 'Kate')  # used as the row index
last = ('Meyer', 'Maier', 'Meyer', 'Mayer', 'Meyr', 'Mair')
job = ('data analyst', 'programmer', 'computer scientist',
       'data scientist', 'accountant', 'psychiatrist')
language = ('Python', 'Perl', 'Java', 'Java', 'Cobol', 'Brainfuck')

df = pd.DataFrame(list(zip(last, job, language)),
                  columns=['last', 'job', 'language'],
                  index=first)

In the desired output, the last-name column holds fake names, and the replacement is consistent: for example, every occurrence of Meyer should be replaced with the same fake last name.
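To make the consistency requirement concrete, here is a minimal sketch of why a naive per-row replacement does not meet it. It assumes a hypothetical `fake_last_name()` helper that stands in for Faker's `fake.last_name()` (so it runs without Faker installed) and returns a new name on every call:

```python
import itertools

import pandas as pd

last = ('Meyer', 'Maier', 'Meyer', 'Mayer', 'Meyr', 'Mair')
df = pd.DataFrame({'last': last})

# Hypothetical stand-in for fake.last_name(): returns a new name on every call
counter = itertools.count()


def fake_last_name():
    return f"Fake{next(counter)}"


# Naive approach: one call per row, so the two 'Meyer' rows (0 and 2)
# receive different fake names
naive = df['last'].apply(lambda _: fake_last_name())
print(naive.tolist())
```

Because the generator is called once per row, identical real names end up with different fakes; generating one fake per unique value and mapping it back is what keeps them consistent.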


Best answer

Collect all unique names, build a dictionary that maps each unique name to a fake name, and map it onto your column:

import pandas as pd
first = ('Mike', 'Dorothee', 'Tom', 'Bill', 'Pete', 'Kate')
last = ('Meyer', 'Maier', 'Meyer', 'Mayer', 'Meyr', 'Mair')
job = ('data analyst', 'programmer', 'computer scientist',
       'data scientist', 'accountant', 'psychiatrist')
language = ('Python', 'Perl', 'Java', 'Java', 'Cobol', 'Brainfuck')

df = pd.DataFrame(list(zip(last, job, language)),
                  columns=['last', 'job', 'language'],
                  index=first)
print(df)

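# get all unique names - this easily handles tens of thousands of names
all_names = set(df["last"])

# create the mapper; with Faker you would use fake.last_name() instead of a number:
# from faker import Faker
# fake = Faker()
# mapper = {k: fake.last_name() for k in all_names}
# (fake.last_name() picks at random, so two real names can occasionally get the same fake)
# Numbers are used here for illustration. Set iteration order varies between runs,
# so which name gets which number can differ from the output below.
mapper = {k: 42 + i for i, k in enumerate(all_names, start=1)}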

# apply it
df["last"] = df["last"].map(mapper)
print(df)

Output:

# before
          last                 job   language
Mike      Meyer        data analyst     Python
Dorothee  Maier          programmer       Perl
Tom       Meyer  computer scientist       Java
Bill      Mayer      data scientist       Java
Pete       Meyr          accountant      Cobol
Kate       Mair        psychiatrist  Brainfuck

# after
          last                 job   language
Mike        44        data analyst     Python
Dorothee    43          programmer       Perl
Tom         44  computer scientist       Java
Bill        45      data scientist       Java
Pete        46          accountant      Cobol
Kate        47        psychiatrist  Brainfuck
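One caveat with `{k: fake.last_name() for k in all_names}`: Faker draws names at random, so two different real names can receive the same fake name and would then be merged after desensitization. Below is a minimal sketch of one way to avoid that, using a hypothetical helper `build_unique_mapper` that retries until it gets an unused name. The `itertools.cycle` pool is only a deterministic stand-in for Faker that deliberately repeats names, and the helper iterates over `dict.fromkeys(...)` instead of `set(...)` so the assignment follows the order of first appearance:

```python
import itertools

import pandas as pd


def build_unique_mapper(values, make_fake):
    """Hypothetical helper: map each distinct value to a distinct fake value.

    `make_fake` is any zero-argument callable returning a candidate name,
    e.g. Faker's fake.last_name (assumed; Faker is not needed for this sketch).
    """
    mapper = {}
    used = set()
    # dict.fromkeys keeps the order of first appearance, unlike set()
    for value in dict.fromkeys(values):
        candidate = make_fake()
        # Retry until the candidate is unused, so two different real names
        # never collapse onto the same fake name
        while candidate in used:
            candidate = make_fake()
        used.add(candidate)
        mapper[value] = candidate
    return mapper


first = ('Mike', 'Dorothee', 'Tom', 'Bill', 'Pete', 'Kate')
last = ('Meyer', 'Maier', 'Meyer', 'Mayer', 'Meyr', 'Mair')
df = pd.DataFrame({'last': last}, index=first)

# Stand-in generator that deliberately repeats names to exercise the retry loop
pool = itertools.cycle(['Smith', 'Smith', 'Jones', 'Brown', 'Jones', 'Lee', 'Khan'])
mapper = build_unique_mapper(df['last'], lambda: next(pool))

df['last'] = df['last'].map(mapper)
print(mapper)
print(df)
```

With Faker installed you would pass `fake.last_name` as `make_fake`; recent Faker versions also offer a `fake.unique` proxy (e.g. `fake.unique.last_name()`) for avoiding repeats on the Faker side. The retry loop never ends if the generator cannot produce enough distinct names, so this sketch assumes a name pool larger than the number of unique values. Keeping `mapper` (or its inverse, `{v: k for k, v in mapper.items()}`) lets you re-identify rows later if your use case allows it.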