Handle missing values in pandas using dtype to read files

Question

Asked by Luis Ramon Ramirez Rodriguez on 27 July 2025 at 12:41 · 848 views

I'm reading a bunch of CSV files using dtype to specify the type of data of each column:

dict_type = {"columns_1": "int", "column_2": "str"}
pd.read_csv(path, dtype=dict_type)

The problem is that some columns with non-float types (such as int) have missing values in some rows, which raises an error. How can I handle this?

I'd like to use a default value in such cases, such as 0 for numeric values and an empty string for names.
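
For reference, the failure described above can be reproduced with a small in-memory CSV. This is only a minimal sketch: the sample data and the column names `number` and `name` are made up for illustration, and the exact error message may differ between pandas versions.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: the second row has no value in the integer column.
csv_data = "number,name\n1,alice\n,bob\n3,\n"

try:
    pd.read_csv(StringIO(csv_data), dtype={"number": "int", "name": "str"})
    error = None
except ValueError as exc:
    # NaN cannot be stored in a NumPy integer column, so the parse fails.
    error = exc

print(type(error).__name__)
```

The missing value in the `name` column is not what triggers the error; a string column simply ends up with NaN in that row.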

There are 2 best solutions below

hernamesbarbara On 26 December 2016 at 15:25

One way to fill missing values with a placeholder is to do the fill after you've read the data into a DataFrame, like so:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from io import StringIO

import pandas as pd

# csv data with missing data in each of the 2 columns
csv_data = """number,colour
3,blue
12,
2,
2,red
,yellow
6,yellow
14,purple
4,green
18,green
11,orange"""

df = pd.read_csv(StringIO(csv_data))

df.number = df.number.fillna(-999)    # fill missing numbers w/ -999
df.colour = df.colour.fillna('UNK')   # fill missing categorical w/ UNK

print(df)

# In [1]: run test.py
#    number  colour
# 0     3.0    blue
# 1    12.0     UNK
# 2     2.0     UNK
# 3     2.0     red
# 4  -999.0  yellow
# 5     6.0  yellow
# 6    14.0  purple
# 7     4.0   green
# 8    18.0   green
# 9    11.0  orange
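
Because the column held NaN when it was parsed, `number` comes out as float (3.0, 12.0, and so on). If integers are wanted, as in the question, one option is to fill first and cast afterwards. A minimal sketch, assuming a shortened version of the sample data above and the defaults the question asks for (0 and an empty string):

```python
from io import StringIO

import pandas as pd

# Shortened sample data with one gap in each column.
csv_data = """number,colour
3,blue
12,
,yellow
6,yellow"""

df = pd.read_csv(StringIO(csv_data))

# Fill each column with its own default, then cast the numeric column
# to the integer dtype that could not be requested at read time.
df = df.fillna({"number": 0, "colour": ""}).astype({"number": "int64"})

print(df.dtypes)
```

Passing a dict to fillna keeps the per-column defaults in one place, which is convenient when the same cleanup is applied to many files.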

**Parfait** · Accepted Answer

Consider the converters argument, which takes a dictionary mapping columns to user-defined functions that are applied to each value as it is read. The functions below use the built-in str.isdigit(), which returns True only if the string is non-empty and every character is a digit, and str.isalpha() as the counterpart for letters. Adjust as needed, especially for strings, since you may want to allow digits in their content:

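import pandas as pd

# Each converter receives the raw cell text; an empty cell arrives as ''.
cleanFloat = lambda x: float(x) if x.isdigit() else 0.0
cleanString = lambda x: x if x.isalpha() else ''

# Keys can be column names or 0-based column positions.
dict_convert = {"columns_1": cleanFloat, "column_2": cleanString}

# No dtype is passed for these columns: when a column has both a converter
# and a dtype, pandas warns and uses only the converter.
df = pd.read_csv('Input.csv', converters=dict_convert)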
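
To try the converters approach without an input file, the same idea can be run against an in-memory CSV. This is a sketch under assumptions: the sample data and column names are invented, and the hypothetical helpers `to_int` and `to_text` use try/except and strip() instead of isdigit()/isalpha(), so that values such as -5 or names containing spaces are kept rather than replaced by the default.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data with one gap in each column.
csv_data = "number,name\n7,Ann Lee\n,bob\n-5,\n"


def to_int(cell, default=0):
    """Hypothetical helper: parse the cell as an int, else return the default."""
    try:
        return int(cell)
    except ValueError:
        # Raised for '' (a missing cell) and for non-numeric text.
        return default


def to_text(cell, default=""):
    """Hypothetical helper: return stripped text, or the default if empty."""
    cell = cell.strip()
    return cell if cell else default


# Converters are keyed by column name here; dtype is left out for these columns.
df = pd.read_csv(StringIO(csv_data), converters={"number": to_int, "name": to_text})

print(df)
```

Named functions are used instead of lambdas so that each one can carry its own default and be reused across several read_csv calls.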
