Replace missing values (given as strings) in pandas dataframe by np.NaN

I have a dataframe energy with missing values in some columns. The missing values are represented by the string ... in the dataframe. I want to replace all of these values with np.NaN.

In [3]: import pandas as pd

In [4]: import numpy as np

In [7]: energy = pd.read_excel('test.xls', skiprows = 17, skip_footer = 38, parse_cols = range(2, 6), index_col = None, names = ['Country', 'ES'
   ...: , 'ESC', '% Renewable'])

In [8]: energy[(energy['ES'] == "...") | (energy['ESC'] == "...")]
Out[8]: 
                          Country   ES  ESC  % Renewable
3                  American Samoa  ...  ...     0.641026
86                           Guam  ...  ...     0.000000
150      Northern Mariana Islands  ...  ...     0.000000
210                        Tuvalu  ...  ...     0.000000
217  United States Virgin Islands  ...  ...     0.000000

To replace these values, I tried:

In [9]: energy[(energy['ES'] == "...")]['ES'] = np.NaN
/usr/local/bin/ipython:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  #!/usr/bin/python3

I don't understand the warning, and I don't see any other way to achieve what I want. Any ideas?

There are 2 answers below.

Best answer:

I think you need:

energy['ES'] = energy.loc[energy['ES'] != "...", 'ES'] 
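The warning in the question comes from chained indexing: energy[mask]['ES'] = ... first builds a filtered copy and then assigns into that copy, so the original frame may be left unchanged. Below is a minimal sketch of the single-step .loc assignment that the warning text recommends. The small frame is made up for illustration and only stands in for the Excel data.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the asker's data; '...' marks a missing value.
energy = pd.DataFrame({
    "Country": ["American Samoa", "Aruba", "Guam"],
    "ES": ["...", 12, "..."],
    "ESC": ["...", 120, "..."],
})

# A single .loc call selects rows and column together, so the assignment
# writes into the original frame instead of a temporary copy.
energy.loc[energy["ES"] == "...", "ES"] = np.nan

print(energy["ES"].isna().tolist())  # [True, False, True]
```

The same line would have to be repeated (or looped) for ESC, since each call targets one column here.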

Another solution:

energy['ES'] = energy['ES'].mask(energy['ES'] == "...")

Or:

energy['ES'] = energy['ES'].replace({'...': np.nan})
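As a rough check that the three variants above agree, the sketch below applies each of them to a small made-up column and compares where the NaN values end up. The first variant works through index alignment: the filtered Series is missing the placeholder rows, and assigning it back fills those rows with NaN. The column names aligned, masked and replaced are hypothetical and exist only for the comparison.

```python
import numpy as np
import pandas as pd

# Made-up values; '...' is the placeholder for a missing entry.
df = pd.DataFrame({"ES": ["...", 12, "...", 7]})

# 1) Filter out the placeholder rows; assignment aligns on the index,
#    so the rows that were filtered out become NaN.
df["aligned"] = df.loc[df["ES"] != "...", "ES"]

# 2) mask() replaces values with NaN wherever the condition is True.
df["masked"] = df["ES"].mask(df["ES"] == "...")

# 3) replace() maps the placeholder string to NaN explicitly.
df["replaced"] = df["ES"].replace({"...": np.nan})

nan_positions = {
    name: df[name].isna().tolist()
    for name in ["aligned", "masked", "replaced"]
}
print(nan_positions)
```

Depending on the pandas version, the resulting columns may still have object dtype, so a later numeric conversion can be needed before doing arithmetic.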

But the best is ayhan's comment:

you can pass na_values='...' to pd.read_excel
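A minimal sketch of the na_values idea follows. It uses pd.read_csv on an in-memory string only so that it runs without an Excel file; pd.read_excel accepts the same na_values parameter. The sample rows are made up.

```python
import io

import pandas as pd

# Made-up sample in CSV form; '...' marks a missing value.
csv_text = """Country,ES,ESC
American Samoa,...,...
Aruba,12,120
Guam,...,...
"""

# na_values tells the parser to treat '...' as missing while reading,
# so the columns come out numeric with NaN already in place.
energy = pd.read_csv(io.StringIO(csv_text), na_values="...")

print(energy["ES"].isna().tolist())  # [True, False, True]
print(energy.dtypes)
```

Handling the placeholder at read time avoids a separate cleanup step and lets pandas infer a numeric dtype for the affected columns.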

Second answer:

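If energy is your pandas dataframe, then in your case you can also try:

# Convert only the columns that should be numeric.
for col in ['ES', 'ESC']:
    energy[col] = pd.to_numeric(energy[col], errors='coerce')

This converts every value that cannot be parsed as a number, including the "..." placeholders, to NaN. Restrict it to the numeric columns: looping over energy.columns would also turn text columns such as Country entirely into NaN.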
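To illustrate the pd.to_numeric approach, here is a small sketch on a made-up frame. The list numeric_cols is an assumption about which columns should hold numbers. The sketch also checks what errors='coerce' does to a text column, which is why the conversion is limited to the selected columns.

```python
import pandas as pd

# Made-up stand-in for the asker's data.
energy = pd.DataFrame({
    "Country": ["American Samoa", "Aruba"],
    "ES": ["...", "12"],
    "ESC": ["...", 120],
})

# Assumed list of columns that are meant to be numeric.
numeric_cols = ["ES", "ESC"]

# errors='coerce' turns anything unparseable (here '...') into NaN.
energy[numeric_cols] = energy[numeric_cols].apply(pd.to_numeric, errors="coerce")

# The same call on a text column would coerce every entry to NaN.
country_all_nan = pd.to_numeric(energy["Country"], errors="coerce").isna().all()

print(energy)
print(country_all_nan)  # True
```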