I keep receiving the following error upon trying to run my python code in visual code studio: It seems that there must be some issue with the data type/format of the data I try to convert:
An extraction from the data I work with:
The data types of the columns (file):enter image description here
I thought the conversion I'm trying to make is correct but I keep getting the same error over and over again, somehow the date time format conversion will not be carried out
ERROR Message:
Traceback (most recent call last):
File "c:\Users\ZZ01TE740\Documents\Codecool\superstoreproba\superstore4.py", line 44, in <module>
df_original, grp_years_sales, grp_year_profit, grp_year_orders = load_data()
^^^^^^^^^^^
File "c:\Users\ZZ01TE740\Documents\Codecool\superstoreproba\superstore4.py", line 28, in load_data
df['Ship Date'] = pd.to_datetime(df['Ship Date'], format='%d/%m/%Y %H:%M:%S')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\ZZ01TE740\anaconda3\Lib\site-packages\pandas\core\tools\datetimes.py", line 1046, in to_datetime
cache_array = _maybe_cache(arg, format, cache, convert_listlike)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\ZZ01TE740\anaconda3\Lib\site-packages\pandas\core\tools\datetimes.py", line 250, in _maybe_cache
cache_dates = convert_listlike(unique_dates, format)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\ZZ01TE740\anaconda3\Lib\site-packages\pandas\core\tools\datetimes.py", line 453, in _convert_listlike_datetimes
return _array_strptime_with_fallback(arg, name, utc, format, exact, errors)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\ZZ01TE740\anaconda3\Lib\site-packages\pandas\core\tools\datetimes.py", line 484, in _array_strptime_with_fallback
result, timezones = array_strptime(arg, fmt, exact=exact, errors=errors, utc=utc)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pandas\_libs\tslibs\strptime.pyx", line 530, in pandas._libs.tslibs.strptime.array_strptime
File "pandas\_libs\tslibs\strptime.pyx", line 351, in pandas._libs.tslibs.strptime.array_strptime
ValueError: time data "11/11/2020" doesn't match format "%d/%m/%Y %H:%M:%S", at position 0. You might want to try:
- passing `format` if your strings have a consistent format;
- passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
- passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this
.
**MY CODE:
import streamlit as st # streamlit package
import numpy as np
import pandas as pd
from millify import millify # shortens values (10_000 ---> 10k)
from streamlit_extras.metric_cards import style_metric_cards # beautify metric card with css
import plotly.graph_objects as go
import altair as alt
# this function get the % change for any column by year and the specified aggregate
def get_per_year_change(col,df,metric):
# Group by years and calculate the specified metric
grp_years = df.groupby('year')[col].agg([metric])[metric]
# Calculate the % change
grp_years = grp_years.pct_change() * 100
grp_years.fillna(0, inplace=True)
grp_years = grp_years.apply(lambda x: f"{x:.1f}%" if pd.notnull(x) else 'NaN')
return grp_years
def load_data():
df = pd.read_csv('data.csv')
# Convert 'Order ID' to string
df['Order ID'] = df['Order ID'].astype(str)
# Convert 'Order Date' and 'Ship Date' to datetime with the specified format
df['Order Date'] = pd.to_datetime(df['Order Date'], format='%d/%m/%Y %H:%M:%S')
df['Ship Date'] = pd.to_datetime(df['Ship Date'], format='%d/%m/%Y %H:%M:%S')
# Extract the year and store it as a new column
df['year'] = df['Order Date'].dt.year
# Calculate the difference between Shipped date and order date
df['days to ship'] = (df['Ship Date'] - df['Order Date']).dt.days.abs()
# Calculate the % change of sales, profit and orders over the years
grp_years_sales = get_per_year_change('Sales', df, 'sum')
grp_year_profit = get_per_year_change('Profit', df, 'sum')
grp_year_orders = get_per_year_change('Order ID', df, 'count')
return df, grp_years_sales, grp_year_profit, grp_year_orders
# load cached data
df_original, grp_years_sales, grp_year_profit, grp_year_orders = load_data()
I tried to convert the Order date / Ship date and Order ID of the data set in order to have them displayed on a dashboard as a background calculation of KPI metrics for a project I try to make.
As the data types of the columns are object I wanted to change them to datetime though regardless of the code I provided (pd.to_datetime) I still end up getting error messages.