Using dateutil, I wrote a function that checks whether each value read from a CSV is a date and, if not, removes that row from the dataframe. The problem is speed: a similar check for strings or ints takes 1-2 seconds for hundreds of thousands of rows, but the date check takes more than a minute. I need some tips on how to speed this up. Maybe there is a function in the pandas library for this?
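    import pandas as pd
    from dateutil.parser import parse, ParserError

    df = pd.read_csv(filename, delimiter='|', dtype=str)
    dataframeIndexToDelete = []  # row labels whose value is not a date
    for i, ColumnValueDate in enumerate(df[column]):
        try:
            # normalize every parseable value to YYYY-MM-DD
            df.loc[i, column[0]] = parse(str(ColumnValueDate)).strftime("%Y-%m-%d")
        except ParserError as ex:
            dataframeIndexToDelete.append(i)
            print(ex)
    df = df.drop(dataframeIndexToDelete)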
You could just use to_datetime on the column, with errors='coerce', and then remove all values which are NaT (not a time). If you don't want to modify the dataframe, use a temporary Series instead.
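The answer's original snippets are not preserved here, so the following is only a sketch of what both variants might look like. The sample data and the column name "when" are made up for illustration; the first variant works on a copy so that both can be run side by side.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV column.
df = pd.DataFrame({"when": ["2021-03-04", "not a date", "2020-12-31"]})
column = "when"

# Variant 1: convert the column itself; unparseable values become NaT
# and those rows are dropped.
converted = df.copy()
converted[column] = pd.to_datetime(converted[column], errors="coerce")
converted = converted.dropna(subset=[column])
converted[column] = converted[column].dt.strftime("%Y-%m-%d")

# Variant 2: leave the column untouched and filter with a temporary Series.
parsed = pd.to_datetime(df[column], errors="coerce")
filtered = df[parsed.notna()]
```

Both variants parse the whole column in one vectorized call instead of calling the parser once per row. One caveat: recent pandas versions infer a single format for the column, so a column that mixes several date formats may not behave exactly like per-value parsing with dateutil.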
Update
Based on comments, df[column] could have NaN or empty ('') values which should not be excluded. This requires making a second mask to ensure they are kept in the output even though the date conversion will fail for them.
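A minimal sketch of the two-mask approach, again with made-up sample data and a hypothetical column name "when". The names valid_date and blank are illustrative and not taken from the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: valid dates, one bad value, a NaN, an empty string.
df = pd.DataFrame({"when": ["2021-03-04", "not a date", np.nan, "", "2020-12-31"]})
column = "when"

# Mask 1: rows whose value parses as a date.
parsed = pd.to_datetime(df[column], errors="coerce")
valid_date = parsed.notna()

# Mask 2: rows that are NaN or empty and should be kept anyway.
blank = df[column].isna() | df[column].eq("")

# Keep a row if either mask is true; only non-empty, non-date values are dropped.
result = df[valid_date | blank]
```

With this sample, only the row containing "not a date" is removed.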