Deleting rows with a missing value in some variable in Python Pandas

I am trying to use Pandas to remove rows that contain missing ethnicity information, but I haven't gotten very far because I am new to Pandas.

Using 'print name[ethnic.isnull() == True]' I can see which people have missing ethnicity information. Ultimately, though, I want to 1) record the indexes of the missing-ethnicity cases by appending them to the 'missing' array, and 2) create a second frame by deleting all the rows whose index matches one in the 'missing' array.

I am currently stuck in the 'for case in frame' loop, where I try to print the names of those with missing ethnicity. My program ends without an error, but it doesn't print anything.

import pandas as pd
from pandas import DataFrame, Series
import numpy as np

### Remove cases with missing name or missing ethnicity information
def RemoveMissing():
    data = pd.read_csv("C:\...\sample.csv")
    frame = DataFrame(data)
    frame.columns = ["Name", "Ethnicity", "Event_Place", "Birth_Place", "URL"]

    missing = []
    name = frame.Name
    ethnic = frame.Ethnicity

    # Filter based on some variable criteria
    #print name[ethnic == "English"]
    #print name[ethnic.isnull() == True] # identify those who don't have ethnicity entry

    # This works
    for case in frame:
        print frame.Name

    # Doesn't work
    for case in frame:
        if frame.Ethnicity.isnull() is True:
            print frame.Name

RemoveMissing()

There is 1 best solution below

This seems to work:

# Boolean mask: True where Ethnicity is missing
index_missEthnic = frame.Ethnicity.isnull()
# Keep only the rows where Ethnicity is present
frame2 = frame[~index_missEthnic]
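
As for why the loop in the question prints nothing: 'for case in frame' walks over the column labels rather than the rows, and 'frame.Ethnicity.isnull()' returns a whole Series of booleans, so 'is True' compares that Series object with True and is always False. Below is a minimal sketch of the two steps from the question (record the indexes, then build the second frame), written for Python 3. The sample data and the names 'is_missing' and 'frame3' are made up for illustration and stand in for the real sample.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for sample.csv (not the asker's real file)
frame = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy", "Dee"],
    "Ethnicity": ["English", np.nan, "Irish", None],
})

# Iterating over a DataFrame yields its column labels, not its rows
print(list(frame))

# Boolean mask with one entry per row: True where Ethnicity is missing
is_missing = frame["Ethnicity"].isnull()

# 1) Record the index labels of the rows with missing ethnicity
missing = frame[is_missing].index.tolist()
print(missing)

# Names of the people with missing ethnicity
print(frame.loc[is_missing, "Name"].tolist())

# 2) Second frame without those rows
frame2 = frame.drop(missing)

# Same result in one step, if the 'missing' list is not needed
frame3 = frame.dropna(subset=["Ethnicity"])
```

If only the cleaned frame matters, the dropna line alone should be enough; the explicit 'missing' list is useful mainly when you want to log or inspect which rows were dropped.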