Python - possible datetime value issue upon organizing data for dashboard creation/defining KPI metrics

20 Views Asked by At

I keep receiving the following error upon trying to run my python code in visual code studio: It seems that there must be some issue with the data type/format of the data I try to convert:

An extraction from the data I work with:

enter image description here

The data types of the columns (file):enter image description here

I thought the conversion I'm trying to make is correct but I keep getting the same error over and over again, somehow the date time format conversion will not be carried out

ERROR Message:

Traceback (most recent call last):
  File "c:\Users\ZZ01TE740\Documents\Codecool\superstoreproba\superstore4.py", line 44, in <module>
    df_original, grp_years_sales, grp_year_profit, grp_year_orders = load_data()
                                                                     ^^^^^^^^^^^
  File "c:\Users\ZZ01TE740\Documents\Codecool\superstoreproba\superstore4.py", line 28, in load_data
    df['Ship Date'] = pd.to_datetime(df['Ship Date'], format='%d/%m/%Y %H:%M:%S')
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ZZ01TE740\anaconda3\Lib\site-packages\pandas\core\tools\datetimes.py", line 1046, in to_datetime
    cache_array = _maybe_cache(arg, format, cache, convert_listlike)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ZZ01TE740\anaconda3\Lib\site-packages\pandas\core\tools\datetimes.py", line 250, in _maybe_cache
    cache_dates = convert_listlike(unique_dates, format)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ZZ01TE740\anaconda3\Lib\site-packages\pandas\core\tools\datetimes.py", line 453, in _convert_listlike_datetimes
    return _array_strptime_with_fallback(arg, name, utc, format, exact, errors)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ZZ01TE740\anaconda3\Lib\site-packages\pandas\core\tools\datetimes.py", line 484, in _array_strptime_with_fallback
    result, timezones = array_strptime(arg, fmt, exact=exact, errors=errors, utc=utc)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas\_libs\tslibs\strptime.pyx", line 530, in pandas._libs.tslibs.strptime.array_strptime
  File "pandas\_libs\tslibs\strptime.pyx", line 351, in pandas._libs.tslibs.strptime.array_strptime
ValueError: time data "11/11/2020" doesn't match format "%d/%m/%Y %H:%M:%S", at position 0. You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this

.

**MY CODE:

import streamlit as st # streamlit package
import numpy as np
import pandas as pd
from millify import millify # shortens values (10_000 ---> 10k)
from streamlit_extras.metric_cards import style_metric_cards # beautify metric card with css
import plotly.graph_objects as go
import altair as alt 

# this function get the % change for any column by year and the specified aggregate
def get_per_year_change(col,df,metric):
    # Group by years and calculate the specified metric
    grp_years = df.groupby('year')[col].agg([metric])[metric]
    # Calculate the % change
    grp_years = grp_years.pct_change() * 100
    grp_years.fillna(0, inplace=True)
    grp_years = grp_years.apply(lambda x: f"{x:.1f}%" if pd.notnull(x) else 'NaN')

    return grp_years

def load_data():
    df = pd.read_csv('data.csv')

    # Convert 'Order ID' to string
    df['Order ID'] = df['Order ID'].astype(str)

    # Convert 'Order Date' and 'Ship Date' to datetime with the specified format
    df['Order Date'] = pd.to_datetime(df['Order Date'], format='%d/%m/%Y %H:%M:%S')
    df['Ship Date'] = pd.to_datetime(df['Ship Date'], format='%d/%m/%Y %H:%M:%S')

    # Extract the year and store it as a new column
    df['year'] = df['Order Date'].dt.year

    # Calculate the difference between Shipped date and order date
    df['days to ship'] = (df['Ship Date'] - df['Order Date']).dt.days.abs()

    # Calculate the % change of sales, profit and orders over the years
    grp_years_sales = get_per_year_change('Sales', df, 'sum')
    grp_year_profit = get_per_year_change('Profit', df, 'sum')
    grp_year_orders = get_per_year_change('Order ID', df, 'count')

    return df, grp_years_sales, grp_year_profit, grp_year_orders

# load cached data
df_original, grp_years_sales, grp_year_profit, grp_year_orders = load_data()

I tried to convert the Order date / Ship date and Order ID of the data set in order to have them displayed on a dashboard as a background calculation of KPI metrics for a project I try to make.

As the data types of the columns are object I wanted to change them to datetime though regardless of the code I provided (pd.to_datetime) I still end up getting error messages.

0

There are 0 best solutions below