Is there an easy way to strip the dtype wrapper from DataFrame.dtypes.to_dict()?


I am reading a large CSV file. To set the data types properly, I first read a sample of just 5 rows and look at the dtypes pandas has inferred. I then want to hand-edit these dtypes and pass them to read_csv when reading the full file.

However, when I call df1.dtypes.to_dict(), pandas produces this:

{'Invoice Date': dtype('O'),
 'Invoice ID': dtype('O'),
 'Item ID': dtype('float64'),
 'Line Amount': dtype('float64'),
 'Line Amount Tax': dtype('float64')
}

I don't want the dtype(...) wrapper, because I get an error when I paste this output into a temporary dict that I then edit. So I copy the output into VSCode and use a regex to get the following:

{'Invoice Date': O,
 'Invoice ID': O,
 'Item ID': float64,
 'Line Amount': float64,
 'Line Amount Tax': float64
}

Is there any way to get this directly in pandas?


There are 2 best solutions below

Best answer

You can convert the dtypes to strings with astype(str), then use Series.replace to turn 'object' into 'O':

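import pandas as pd

# Small example frame: one object column, two int columns, two float columns
df = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
    'D': [1, 3, 5., 7, 1, 0],
    'E': [5, 3, 6, 9, 2, 4.],
})

# Each dtype becomes its string form; 'object' is then shortened to 'O'
print(df.dtypes.astype(str).replace('object', 'O').to_dict())
# {'A': 'O', 'B': 'int64', 'C': 'int64', 'D': 'float64', 'E': 'float64'}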
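
To see why the replace step is needed, it may help to look at a NumPy dtype object on its own. The sketch below assumes only NumPy (the variable names are just for illustration) and shows that the dtype('O') text from the question is the repr of the object, while str() and the name attribute give the plain string 'object'.

```python
import numpy as np

obj_dtype = np.dtype("O")          # the dtype reported for object columns
float_dtype = np.dtype("float64")

# repr() is what shows up when a dict of dtype objects is printed
obj_repr = repr(obj_dtype)         # "dtype('O')"
float_repr = repr(float_dtype)     # "dtype('float64')"

# str() and .name both give a plain string with no wrapper
obj_str = str(obj_dtype)           # "object"
obj_name = obj_dtype.name          # "object"
float_name = float_dtype.name      # "float64"

# .char is the one-letter type code, which is where 'O' comes from
obj_char = obj_dtype.char          # "O"

print(obj_repr, obj_str, obj_name, obj_char)
print(float_repr, float_name)
```

So astype(str) yields 'object', and the replace is only there to match the 'O' spelling asked for in the question.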

You can use the name attribute of each numpy.dtype together with Series.map here.

import pandas as pd

# Thanks to Jezrael for df
df = pd.DataFrame(
    {
        "A": list("abcdef"),
        "B": [4, 5, 4, 5, 5, 4],
        "C": [7, 8, 9, 4, 2, 3],
        "D": [1, 3, 5.0, 7, 1, 0],
        "E": [5, 3, 6, 9, 2, 4.0],
    }
)

df.dtypes.map(lambda x: x.name).to_dict()
# {'A': 'object', 'B': 'int64', 'C': 'int64', 'D': 'float64', 'E': 'float64'}

If you want to avoid the lambda, you can use operator.attrgetter instead:

from operator import attrgetter

dtype_getter = attrgetter('name')
df.dtypes.map(dtype_getter).to_dict()
# {'A': 'object', 'B': 'int64', 'C': 'int64', 'D': 'float64', 'E': 'float64'}
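
For the workflow described in the question, the string dict can be edited in code and passed straight to read_csv through its dtype parameter, so the copy-and-paste step may not be needed at all. This is only a minimal sketch: the CSV text is a made-up stand-in for the real file (read through io.StringIO), and the edited dtypes are arbitrary examples, not a recommendation.

```python
import io

import pandas as pd

# Hypothetical stand-in for the large CSV file; column names follow the question.
csv_text = (
    "Invoice Date,Invoice ID,Item ID,Line Amount\n"
    "2021-01-05,A001,10,99.5\n"
    "2021-01-06,A002,11,12.0\n"
)

# Step 1: sample read of the first rows to see what pandas infers.
sample = pd.read_csv(io.StringIO(csv_text), nrows=5)
inferred = sample.dtypes.map(lambda x: x.name).to_dict()

# Step 2: edit the plain-string dict (example choices only).
dtypes = dict(inferred)
dtypes["Item ID"] = "Int64"        # nullable integer
dtypes["Line Amount"] = "float32"  # smaller float

# Step 3: read the full file with the edited dtypes.
full = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)
print(full.dtypes)
```

Because the values are ordinary strings, printing the inferred dict also gives output that can be pasted back into a script without the error mentioned in the question.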