only output columns with missing values and the percetage of it

139 Views Asked by At

I used this code to find the percentage of missing values in each column:

     df.isnull().mean().round(4).mul(100).sort_values(ascending=False)
     

The result shows all columns, which makes the output looks very long if you have a lot of columns and only need to know the percentage of the missing value ones. I would like to know how to remove all the columns that have no missing values for this code. Any other solutions are welcomed


Thanks for any answer.

2

There are 2 best solutions below

0
mozway On

You can save your nulls in an intermediate dataframe and use it for slicing:

nulls = df.isnull()
nulls.loc[:, nulls.any()].mean().round(4).mul(100).sort_values(ascending=False)

example input:

     a    b  c
0  0.0  1.0  2
1  3.0  NaN  4
2  NaN  5.0  6

output:

b    33.33
a    33.33
3
Daweo On

You might simply filter what you get to only include columns which have more than 0 missing value, consider following simple example

import pandas as pd
df = pd.DataFrame({'a':[1.0,2.0,3.0],'b':[4.0,float("nan"),float("nan")],'c':[float("nan"),float("nan"),float("nan")]})
means = df.isnull().mean()
percentages = means[means>0].round(4).mul(100).sort_values(ascending=False)
print(percentages)

output

c    100.00
b     66.67
dtype: float64