Convert String With Comma To Number Using Python Pandas

5k Views Asked by At

I am generating a pivot table report using the pandas Python module. The source data includes a lot of readings measured in milliseconds. If the number of milliseconds exceeds 999 then the value in that CSV file will include commas (e.g. 1,234 = 1.234 seconds).

Here's how I'm trying to run the report:

import pandas as pd
import numpy as np

pool_usage = pd.read_csv("c:/foo/ds-dump.csv")

# Add a column to the end that shows you where the data came from
pool_usage["Source File"] = "ds-dump.csv"

report = pool_usage.pivot_table(values=['Average Pool Size', 'Average Usage Time (ms)'], index=['Source File'], aggfunc=np.max)

print(report)

The problem is that the dtype for the Average Usage Time (ms) is an object so the np.max function just treats it like it's NaN. I therefore never see any values greater than 999.

I tried fixing the issue like this:

import pandas as pd
import numpy as np

pool_usage = pd.read_csv("c:/foo/ds-dump.csv")

# Add a column to the end that shows you where the data came from
pool_usage["Source File"] = "ds-dump.csv"

# Convert strings to numbers if possible
pool_usage = pool_usage.convert_objects(convert_numeric=True)

report = pool_usage.pivot_table(values=['Average Pool Size', 'Average Usage Time (ms)'], index=['Source File'], aggfunc=np.max)

print(report)

This did actually change the dtype of the Average Usage Time column to a float but all of the values that are greater than 999 are still treated like NaN's.

How can I convert the Average Usage Time column to a float even though it's possible that some of the values may include commas?

1

There are 1 best solutions below

0
On BEST ANSWER

The read_csv function takes an optional thousands argument. Its default is None so you can change it to "," to have it recognise 1,234 as 1234 when it reads the file:

pd.read_csv("c:/foo/ds-dump.csv", thousands=",")

The column holding the millisecond values should then have the int64 datatype once the file has been read into memory.