I'm having an issue changing a pandas DataFrame index to a datetime from an integer. I want to do it so that I can call reindex and fill in the dates between those listed in the table. Note that I have to use pandas 0.7.3 at the moment because I'm also using qstk, and qstk relies on pandas 0.7.3
First, here's my layout:
(Pdb) df
AAPL GOOG IBM XOM date
1 0 0 4000 0 2011-01-13 16:00:00
2 0 1000 4000 0 2011-01-26 16:00:00
3 0 1000 4000 0 2011-02-02 16:00:00
4 0 1000 4000 4000 2011-02-10 16:00:00
6 0 0 1800 4000 2011-03-03 16:00:00
7 0 0 3300 4000 2011-06-03 16:00:00
8 0 0 0 4000 2011-05-03 16:00:00
9 1200 0 0 4000 2011-06-10 16:00:00
11 1200 0 0 4000 2011-08-01 16:00:00
12 0 0 0 4000 2011-12-20 16:00:00
(Pdb) type(df['date'])
<class 'pandas.core.series.Series'>
(Pdb) df2 = DataFrame(index=df['date'])
(Pdb) df2
Empty DataFrame
Columns: array([], dtype=object)
Index: array([2011-01-13 16:00:00, 2011-01-26 16:00:00, 2011-02-02 16:00:00,
2011-02-10 16:00:00, 2011-03-03 16:00:00, 2011-06-03 16:00:00,
2011-05-03 16:00:00, 2011-06-10 16:00:00, 2011-08-01 16:00:00,
2011-12-20 16:00:00], dtype=object)
(Pdb) df2.merge(df,left_index=True,right_on='date')
AAPL GOOG IBM XOM date
1 0 0 4000 0 2011-01-13 16:00:00
2 0 1000 4000 0 2011-01-26 16:00:00
3 0 1000 4000 0 2011-02-02 16:00:00
4 0 1000 4000 4000 2011-02-10 16:00:00
6 0 0 1800 4000 2011-03-03 16:00:00
8 0 0 0 4000 2011-05-03 16:00:00
7 0 0 3300 4000 2011-06-03 16:00:00
9 1200 0 0 4000 2011-06-10 16:00:00
11 1200 0 0 4000 2011-08-01 16:00:00
12 0 0 0 4000 2011-12-20 16:00:00
I have tried multiple things to get a datetime index:
1.) Using the reindex() method with a list of datetime values. This creates a datetime index, but then fills in NaNs for the data in the DataFrame. I'm guessing that this is because the original values are tied to the integer index and reindexing to datetime tries to fill the new indices with default values (NaNs if no fill method is indicated). Thusly:
(Pdb) df.reindex(index=df['date'])
AAPL GOOG IBM XOM date
date
2011-01-13 16:00:00 NaN NaN NaN NaN NaN
2011-01-26 16:00:00 NaN NaN NaN NaN NaN
2011-02-02 16:00:00 NaN NaN NaN NaN NaN
2011-02-10 16:00:00 NaN NaN NaN NaN NaN
2011-03-03 16:00:00 NaN NaN NaN NaN NaN
2011-06-03 16:00:00 NaN NaN NaN NaN NaN
2011-05-03 16:00:00 NaN NaN NaN NaN NaN
2011-06-10 16:00:00 NaN NaN NaN NaN NaN
2011-08-01 16:00:00 NaN NaN NaN NaN NaN
2011-12-20 16:00:00 NaN NaN NaN NaN NaN
2.) Using DataFrame.merge with my original df and a second dataframe, df2, that is basically just a datetime index with nothing else. So I end up doing something like:
(pdb) df2.merge(df,left_index=True,right_on='date')
AAPL GOOG IBM XOM date
1 0 0 4000 0 2011-01-13 16:00:00
2 0 1000 4000 0 2011-01-26 16:00:00
3 0 1000 4000 0 2011-02-02 16:00:00
4 0 1000 4000 4000 2011-02-10 16:00:00
6 0 0 1800 4000 2011-03-03 16:00:00
8 0 0 0 4000 2011-05-03 16:00:00
7 0 0 3300 4000 2011-06-03 16:00:00
9 1200 0 0 4000 2011-06-10 16:00:00
11 1200 0 0 4000 2011-08-01 16:00:00
(and vice-versa). But I always end up with this kind of thing, with integer indices.
3.) Starting with an empty DataFrame with a datetime index (created from the 'date' field of df) and a bunch of empty columns. Then I attempt to assign each column by setting the columns with the same names to be equal to the columns from df:
(Pdb) df2['GOOG']=0
(Pdb) df2
GOOG
date
2011-01-13 16:00:00 0
2011-01-26 16:00:00 0
2011-02-02 16:00:00 0
2011-02-10 16:00:00 0
2011-03-03 16:00:00 0
2011-06-03 16:00:00 0
2011-05-03 16:00:00 0
2011-06-10 16:00:00 0
2011-08-01 16:00:00 0
2011-12-20 16:00:00 0
(Pdb) df2['GOOG'] = df['GOOG']
(Pdb) df2
GOOG
date
2011-01-13 16:00:00 NaN
2011-01-26 16:00:00 NaN
2011-02-02 16:00:00 NaN
2011-02-10 16:00:00 NaN
2011-03-03 16:00:00 NaN
2011-06-03 16:00:00 NaN
2011-05-03 16:00:00 NaN
2011-06-10 16:00:00 NaN
2011-08-01 16:00:00 NaN
2011-12-20 16:00:00 NaN
So, how in pandas 0.7.3 do I get df to be re-created with an datetime index instead of the integer index? What am I missing?
I think you are looking for
set_index
: