Creating a dummy variable and data wrangling


I have a dataframe that looks like this:

[screenshot of the dataframe not preserved]

I need to create a new dataframe in which the student names are the index, the course numbers are the columns, and the values are 0 or 1 depending on whether or not the student took that course.

I tried the pd.get_dummies() function, but the result was too messy to work with, since I still had to collapse the rows so that each student name appears only once.
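For reference, the pd.get_dummies() route can be made to work: the missing step is collapsing the one-row-per-record dummies into one row per student by grouping on the name and taking the maximum. This is a minimal sketch, assuming the columns are named 'Student name' and 'Course number' and using hypothetical sample data, since the original screenshot is not available:

```python
import pandas as pd

# Hypothetical sample data: one row per (student, course) enrolment record
df1 = pd.DataFrame({
    'Student name': ['Bill Mumy', 'Geraldine Ferraro', 'Geraldine Ferraro',
                     'Laura Lippman', 'Laura Lippman', 'Edward Koch', 'Celeste Holm'],
    'Course number': ['ARTS516', 'ARTS516', 'ARTS516', 'ARTS516', 'ARTS516',
                      'ARTS401', 'ARTS401'],
})

# One indicator column per course, but still one row per enrolment record
dummies = pd.get_dummies(df1['Course number'])

# Collapse to one row per student: a course gets 1 if any of that student's rows has it
result = dummies.groupby(df1['Student name']).max().astype(int)
print(result)
```

Taking max rather than sum means a student listed twice for the same course still gets 1, not 2.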

I am running out of ideas on how to achieve the desired dataframe.

Answer:

Let's create the source dataframe:

import pandas as pd

# Source data: one row per (student, course) enrolment record
df1 = pd.DataFrame({
    'Student name': ['Bill Mumy', 'Geraldine Ferraro', 'Geraldine Ferraro',
                     'Laura Lippman', 'Laura Lippman', 'Edward Koch', 'Celeste Holm'],
    'Course number': ['ARTS516', 'ARTS516', 'ARTS516', 'ARTS516', 'ARTS516',
                      'ARTS401', 'ARTS401'],
})
df1.head(10)

Output: (screenshot not preserved)

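To turn student names into rows and course numbers into columns, the "pivot_table" function can be used. With aggfunc='size' it counts how many rows each student/course pair has, and pairs that never occur are left as NaN: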

df2 = df1.pivot_table(index='Student name', columns='Course number', aggfunc='size')
df2.head(10)

Output: (screenshot not preserved)
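A possibly simpler alternative, not the method used in this answer, is pd.crosstab, which counts each (student, course) pair and fills pairs that never occur with 0 directly. The sketch below reuses the same sample data; clip(upper=1) then turns repeated enrolments into a plain 1:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Student name': ['Bill Mumy', 'Geraldine Ferraro', 'Geraldine Ferraro',
                     'Laura Lippman', 'Laura Lippman', 'Edward Koch', 'Celeste Holm'],
    'Course number': ['ARTS516', 'ARTS516', 'ARTS516', 'ARTS516', 'ARTS516',
                      'ARTS401', 'ARTS401'],
})

# Rows = students, columns = courses, values = number of matching records (0 if none)
counts = pd.crosstab(df1['Student name'], df1['Course number'])

# Repeated records (e.g. Geraldine Ferraro in ARTS516 twice) give counts above 1,
# so cap the values at 1 to get a pure 0/1 indicator
df2 = counts.clip(upper=1)
print(df2)
```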

To replace NaN values with zeroes and the counts with ones, the "applymap" function can be used:

df2 = df2.applymap(lambda x: 0 if pd.isna(x) else 1)
df2.head(10)

Output: (screenshot not preserved)
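Note that DataFrame.applymap was deprecated in pandas 2.1 in favor of DataFrame.map, so the applymap line emits a FutureWarning on recent versions. A vectorized alternative that should behave the same on older and newer pandas is to test for non-missing values and cast the booleans to integers. A sketch on the same sample data:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Student name': ['Bill Mumy', 'Geraldine Ferraro', 'Geraldine Ferraro',
                     'Laura Lippman', 'Laura Lippman', 'Edward Koch', 'Celeste Holm'],
    'Course number': ['ARTS516', 'ARTS516', 'ARTS516', 'ARTS516', 'ARTS516',
                      'ARTS401', 'ARTS401'],
})

# Count rows per (student, course) pair; pairs that never occur become NaN
df2 = df1.pivot_table(index='Student name', columns='Course number', aggfunc='size')

# notna() is True wherever a count exists; astype(int) maps True/False to 1/0
df2 = df2.notna().astype(int)
print(df2)
```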