TypeError occurs when plotting timezone-aware datetimes, loaded via pd.read_csv, with mpf.plot

I have stock data that I wanted to play with, and I figured I'd plot it with mplfinance. This is what I tried:

daily = pd.read_csv('data/AAPL/history.csv',index_col=0,parse_dates=True)
mpf.plot(daily)

This returns

TypeError: Expect data.index as DatetimeIndex

Then I tried this, following the docs:

daily = pd.read_csv('data/AAPL/history.csv',index_col=0,parse_dates=True)
daily.index.name = 'Date'
mpf.plot(daily)

This gives the same error.

Then I tried setting the index:

daily = pd.read_csv('data/AAPL/history.csv',index_col=0,parse_dates=True)
daily.index = pd.DatetimeIndex(daily['Date'])
mpf.plot(daily)

With the error:

KeyError: 'Date'

Trying the same thing, but without passing index_col or parse_dates to read_csv:

daily = pd.read_csv('data/AAPL/history.csv')
daily.index = pd.DatetimeIndex(daily['Date'])
mpf.plot(daily)

Returns:

TypeError: [datetime.datetime(1980, 12, 12, 0, 0, tzinfo=tzoffset(None, -18000))
 datetime.datetime(1980, 12, 15, 0, 0, tzinfo=tzoffset(None, -18000))
 datetime.datetime(1980, 12, 16, 0, 0, tzinfo=tzoffset(None, -18000)) ...
 datetime.datetime(2023, 11, 10, 0, 0, tzinfo=tzoffset(None, -18000))
 datetime.datetime(2023, 11, 13, 0, 0, tzinfo=tzoffset(None, -18000))
 datetime.datetime(2023, 11, 14, 0, 0, tzinfo=tzoffset(None, -18000))]

This leads me to believe that one of the rows is not formatted correctly. I got this data from yfinance and I'm reading it in the usual way. But since the number of rows is so large (10,823), I'm not sure how to clean the dates or how to find the bad row, if there is one.

Any help would be appreciated. I don't know if it's my code or if it's my data. I'm led to believe it's my data, but this is my first time messing with this stuff so I don't know.

There is 1 best solution below

  • The issue can be resolved by formatting the 'Datetime' column after loading the csv.
    1. df = pd.read_csv('aapl.csv', index_col='Datetime')
    2. df.index = pd.to_datetime(df.index, utc=True)
    • Timestamp('2023-01-03 14:30:00+0000', tz='UTC') is the resulting format, which does work with mpf.plot(df)
  • When the data is downloaded from yfinance, the index values are as such:
    • Timestamp('2023-01-03 09:30:00-0500', tz='America/New_York'), which works with mpf.plot(df).
  • df = pd.read_csv('aapl.csv', index_col='Datetime', parse_dates=['Datetime']) results in:
    • Timestamp('2023-01-03 09:30:00-0500', tz='UTC-05:00'), which does not work with mpf.plot(df).
  • mpf.plot(df) seems to be particular about the timezone, tz, format.
  • Other options for dealing with timezones in pandas.
  • Tested in python v3.12.0, pandas v2.1.2, mplfinance v0.12.9b7, yfinance v0.2.31, matplotlib v3.8.1.
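
As a minimal sketch of the first bullet, the snippet below uses a small in-memory CSV in place of aapl.csv and checks that converting with utc=True produces a real DatetimeIndex. The sample rows and Close values are made up for illustration; only the 'Datetime' column name and the two pandas calls come from the steps above.

```python
import io
import pandas as pd

# hypothetical sample standing in for aapl.csv (values are invented);
# note that the two rows carry different UTC offsets
csv_text = """Datetime,Close
2023-01-03 09:30:00-05:00,100.0
2023-07-03 09:30:00-04:00,101.0
"""

# load the file, but don't parse dates
df = pd.read_csv(io.StringIO(csv_text), index_col='Datetime')

# convert every timestamp to UTC so the whole index shares one timezone
df.index = pd.to_datetime(df.index, utc=True)

print(type(df.index).__name__)  # DatetimeIndex
print(df.index.tz)
```

Checking isinstance(df.index, pd.DatetimeIndex) before calling mpf.plot(df) is a quick way to confirm the index is in the form the error message asks for.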

Load Data from yfinance and Plot - Works

  • Timestamp('2023-01-03 09:30:00-0500', tz='America/New_York')
import pandas as pd
import yfinance as yf  # conda install -c conda-forge yfinance
import mplfinance as mpf  # conda install -c conda-forge mplfinance

# download data from yfinance
tickers = ['aapl']  # this can contain multiple tickers
df = pd.concat((yf.download(ticker, start='2023-01-01', end='2023-11-14', interval='1h').assign(tkr=ticker) for ticker in tickers), ignore_index=False)

# save the dataframe
df.to_csv('aapl.csv')

# plotting works
mpf.plot(df, type='line')

Load csv and Parse Dates - Plotting Doesn't Work

  • Timestamp('2023-01-03 09:30:00-0500', tz='UTC-05:00')
  • Results in TypeError: Expect data.index as DatetimeIndex
# load the dataframe and parse dates
df = pd.read_csv('aapl.csv', index_col='Datetime', parse_dates=['Datetime'])

# plotting does not work
mpf.plot(df, type='line')
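
A likely explanation, offered here as an assumption rather than something the traceback states, is that the file mixes two UTC offsets (-05:00 in winter and -04:00 during daylight saving time), so the parsed values cannot share one timezone. This would also mean there is no single "bad row" to find. The sketch below counts the offsets in the raw strings; the sample rows are invented and stand in for aapl.csv.

```python
import io
import pandas as pd

# hypothetical sample standing in for aapl.csv (values are invented)
csv_text = """Datetime,Close
2023-01-03 09:30:00-05:00,100.0
2023-03-10 09:30:00-05:00,101.0
2023-07-03 09:30:00-04:00,102.0
"""

# read the Datetime column as plain strings, without parsing dates
raw = pd.read_csv(io.StringIO(csv_text), dtype={'Datetime': str})

# the last six characters of each string hold the UTC offset, e.g. "-05:00"
offset_counts = raw['Datetime'].str[-6:].value_counts()
print(offset_counts)
```

If more than one offset shows up, converting with utc=True, as in the next section, gives every row the same timezone.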

Load csv then Parse Dates with utc=True and Plot - Works

  • Timestamp('2023-01-03 14:30:00+0000', tz='UTC')
# load the file, but don't parse dates
df = pd.read_csv('aapl.csv', index_col='Datetime')

# set datetime format as utc
df.index = pd.to_datetime(df.index, utc=True)

# plotting works
mpf.plot(df, type='line')
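
If the exchange's local time is preferred over UTC on the x-axis, one option is to convert the UTC index with tz_convert, which gives the same tz='America/New_York' format that the freshly downloaded data has. This is a sketch on invented sample rows, not a tested mplfinance recipe.

```python
import io
import pandas as pd

# hypothetical sample standing in for aapl.csv (values are invented)
csv_text = """Datetime,Close
2023-01-03 09:30:00-05:00,100.0
2023-07-03 09:30:00-04:00,101.0
"""

# load the file, but don't parse dates
df = pd.read_csv(io.StringIO(csv_text), index_col='Datetime')

# parse as UTC first, so the mixed offsets collapse into one timezone
df.index = pd.to_datetime(df.index, utc=True)

# then convert to the exchange's timezone; the index stays a DatetimeIndex
df.index = df.index.tz_convert('America/New_York')

print(df.index)
```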

  • Note that mpf.plot() sets a particular style through matplotlib rcParams, and that style stays in effect for later plots until the notebook environment is reset (for example, by restarting the kernel).
  • Plotting also works by directly using pandas.DataFrame.plot, which uses matplotlib as the default backend.
    • However, the other mplfinance features can't be used.
# load csv and parse the dates
df = pd.read_csv('aapl.csv', index_col='Datetime', parse_dates=['Datetime'])

# plotting with pandas works with this format, while it doesn't with mpf.plot
ax = df.plot(y='Close')
  • Note the plot style is like mpf.plot

  • Standard pandas.DataFrame.plot format

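
Regarding the note that mpf.plot() changes matplotlib rcParams: one possible way to keep such changes from leaking into later plots is matplotlib's rc_context, which restores rcParams when the block exits. The sketch below uses a manual rcParams change as a stand-in for the mpf.plot call, so it only demonstrates the restore behaviour; wrapping mpf.plot itself this way is an assumption that has not been tested here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib as mpl

before = mpl.rcParams["lines.linewidth"]

with mpl.rc_context():
    # stand-in for mpf.plot(df, type='line'): any rcParams change made in here...
    mpl.rcParams["lines.linewidth"] = before + 5
    inside = mpl.rcParams["lines.linewidth"]

# ...is undone once the with-block exits
after = mpl.rcParams["lines.linewidth"]
print(before, inside, after)
```

Calling mpl.rcdefaults() is another option; it resets rcParams to matplotlib's built-in defaults rather than to the values in effect before the plot.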