dataframe: how to get columns of Period objects (calendar+fiscal year and month)from DatetimeIndex?

791 Views Asked by At

I have a dataframe and these are the first 5 index, there are several rows with different datapoint for a date and then it goes to the next day

DatetimeIndex(['2014-01-01', '2014-01-01', '2014-01-01', '2014-01-01',
               '2014-01-01'],
              dtype='datetime64[ns]', name='DayStartedOn', freq=None)

and this is the current column dtypes

country                  object
type                     object
name                     object
injection               float64
withdrawal              float64
cy_month              period[M]

I wish to add a column with calendar year month, and 2 columns with different fiscal years and months. better to separate year and month in different columns like: calendar year, calendar month, fiscal year, fiscal month. The objective is to keep these column values when I perform regroup or resample with other columns

I achieved above cy_month by

df['cy_month']=df.index.to_period('M')

even I don't feel very comfortable with this, as I want the period, not the monthend

I tried to add these 2 columns for calendar year:

pd.Period(df_storage_clean.index.year, freq='A-DEC')  

for another fiscal year:

pd.Period(df_storage_clean.index.year, freq='A-SEP') 

but had Traceback:

ValueError: Value must be Period, string, integer, or datetime

So I started to NOT using pandas by loop row by row and add to a list,

lst_period_cy=[]
for y in lst_cy:
    period_cy=pd.Period(y, freq='A-DEC')
    lst_period_cy.append(period_cy)

then convert the list to a Series or df and add it back to the df

but I suppose it's not efficient (150k rows data) so haven't continued

1

There are 1 best solutions below

1
On

Just in case you haven't found a solution yet ...

You could do the following:

df.reset_index(drop=False, inplace=True)
df['cal_year_month'] = df.DayStartedOn.dt.month
df['cal_year'] = df.DayStartedOn.dt.year
df['fisc_year'] = df.DayStartedOn.apply(pd.Period, freq='A-SEP')
df.set_index('DayStartedOn', drop=True, inplace=True)

My assumption is that, as in your example, the index is named DayStartedOn. If that's not the case then the code has to be adjusted accordingly.