Having an issue changing index from integer to date in pandas

1.6k Views Asked by At

I'm having an issue changing a pandas DataFrame index to a datetime from an integer. I want to do it so that I can call reindex and fill in the dates between those listed in the table. Note that I have to use pandas 0.7.3 at the moment because I'm also using qstk, and qstk relies on pandas 0.7.3

First, here's my layout:

(Pdb) df
    AAPL  GOOG   IBM   XOM                 date
1      0     0  4000     0  2011-01-13 16:00:00
2      0  1000  4000     0  2011-01-26 16:00:00
3      0  1000  4000     0  2011-02-02 16:00:00
4      0  1000  4000  4000  2011-02-10 16:00:00
6      0     0  1800  4000  2011-03-03 16:00:00
7      0     0  3300  4000  2011-06-03 16:00:00
8      0     0     0  4000  2011-05-03 16:00:00
9   1200     0     0  4000  2011-06-10 16:00:00
11  1200     0     0  4000  2011-08-01 16:00:00
12     0     0     0  4000  2011-12-20 16:00:00

(Pdb) type(df['date'])
<class 'pandas.core.series.Series'>

(Pdb) df2 = DataFrame(index=df['date'])
(Pdb) df2
Empty DataFrame
Columns: array([], dtype=object)
Index: array([2011-01-13 16:00:00, 2011-01-26 16:00:00, 2011-02-02 16:00:00,
       2011-02-10 16:00:00, 2011-03-03 16:00:00, 2011-06-03 16:00:00,
       2011-05-03 16:00:00, 2011-06-10 16:00:00, 2011-08-01 16:00:00,
       2011-12-20 16:00:00], dtype=object)

(Pdb) df2.merge(df,left_index=True,right_on='date')
    AAPL  GOOG   IBM   XOM                 date
1      0     0  4000     0  2011-01-13 16:00:00
2      0  1000  4000     0  2011-01-26 16:00:00
3      0  1000  4000     0  2011-02-02 16:00:00
4      0  1000  4000  4000  2011-02-10 16:00:00
6      0     0  1800  4000  2011-03-03 16:00:00
8      0     0     0  4000  2011-05-03 16:00:00
7      0     0  3300  4000  2011-06-03 16:00:00
9   1200     0     0  4000  2011-06-10 16:00:00
11  1200     0     0  4000  2011-08-01 16:00:00
12     0     0     0  4000  2011-12-20 16:00:00

I have tried multiple things to get a datetime index:

1.) Using the reindex() method with a list of datetime values. This creates a datetime index, but then fills in NaNs for the data in the DataFrame. I'm guessing that this is because the original values are tied to the integer index and reindexing to datetime tries to fill the new indices with default values (NaNs if no fill method is indicated). Thusly:

(Pdb) df.reindex(index=df['date'])
                     AAPL  GOOG  IBM  XOM date
date                                          
2011-01-13 16:00:00   NaN   NaN  NaN  NaN  NaN
2011-01-26 16:00:00   NaN   NaN  NaN  NaN  NaN
2011-02-02 16:00:00   NaN   NaN  NaN  NaN  NaN
2011-02-10 16:00:00   NaN   NaN  NaN  NaN  NaN
2011-03-03 16:00:00   NaN   NaN  NaN  NaN  NaN
2011-06-03 16:00:00   NaN   NaN  NaN  NaN  NaN
2011-05-03 16:00:00   NaN   NaN  NaN  NaN  NaN
2011-06-10 16:00:00   NaN   NaN  NaN  NaN  NaN
2011-08-01 16:00:00   NaN   NaN  NaN  NaN  NaN
2011-12-20 16:00:00   NaN   NaN  NaN  NaN  NaN

2.) Using DataFrame.merge with my original df and a second dataframe, df2, that is basically just a datetime index with nothing else. So I end up doing something like:

(pdb) df2.merge(df,left_index=True,right_on='date')
    AAPL  GOOG   IBM   XOM                 date
1      0     0  4000     0  2011-01-13 16:00:00
2      0  1000  4000     0  2011-01-26 16:00:00
3      0  1000  4000     0  2011-02-02 16:00:00
4      0  1000  4000  4000  2011-02-10 16:00:00
6      0     0  1800  4000  2011-03-03 16:00:00
8      0     0     0  4000  2011-05-03 16:00:00
7      0     0  3300  4000  2011-06-03 16:00:00
9   1200     0     0  4000  2011-06-10 16:00:00
11  1200     0     0  4000  2011-08-01 16:00:00

(and vice-versa). But I always end up with this kind of thing, with integer indices.

3.) Starting with an empty DataFrame with a datetime index (created from the 'date' field of df) and a bunch of empty columns. Then I attempt to assign each column by setting the columns with the same names to be equal to the columns from df:

(Pdb) df2['GOOG']=0
(Pdb) df2
                     GOOG
date                     
2011-01-13 16:00:00     0
2011-01-26 16:00:00     0
2011-02-02 16:00:00     0
2011-02-10 16:00:00     0
2011-03-03 16:00:00     0
2011-06-03 16:00:00     0
2011-05-03 16:00:00     0
2011-06-10 16:00:00     0
2011-08-01 16:00:00     0
2011-12-20 16:00:00     0
(Pdb) df2['GOOG'] = df['GOOG']
(Pdb) df2
                     GOOG
date                     
2011-01-13 16:00:00   NaN
2011-01-26 16:00:00   NaN
2011-02-02 16:00:00   NaN
2011-02-10 16:00:00   NaN
2011-03-03 16:00:00   NaN
2011-06-03 16:00:00   NaN
2011-05-03 16:00:00   NaN
2011-06-10 16:00:00   NaN
2011-08-01 16:00:00   NaN
2011-12-20 16:00:00   NaN

So, how in pandas 0.7.3 do I get df to be re-created with an datetime index instead of the integer index? What am I missing?

1

There are 1 best solutions below

1
On BEST ANSWER

I think you are looking for set_index:

In [11]: df.set_index('date')
Out[11]: 
                     AAPL  GOOG   IBM   XOM
date                                  
2011-01-13 16:00:00     0     0  4000     0
2011-01-26 16:00:00     0  1000  4000     0
2011-02-02 16:00:00     0  1000  4000     0
2011-02-10 16:00:00     0  1000  4000  4000
2011-03-03 16:00:00     0     0  1800  4000
2011-06-03 16:00:00     0     0  3300  4000
2011-05-03 16:00:00     0     0     0  4000
2011-06-10 16:00:00  1200     0     0  4000
2011-08-01 16:00:00  1200     0     0  4000
2011-12-20 16:00:00     0     0     0  4000