pandas: pd.date_range output error


I have a data set with 3 columns (ID, vrddat, enddat) and 21,000 rows.

ID       vrddat      enddat
1       2015.01.01  2015.01.03    
2       2015.03.01  2015.03.03 

PS: Each ID can have multiple vrddat and enddat values.

I need the output below, with one row per day between vrddat and enddat:

ID      vrddat      enddat       day
1    2015.01.01   2015.01.03  2015.01.01
1    2015.01.01   2015.01.03  2015.01.02
1    2015.01.01   2015.01.03  2015.01.03
2    2015.03.01   2015.03.03  2015.03.01
2    2015.03.01   2015.03.03  2015.03.02
2    2015.03.01   2015.03.03  2015.03.03

I used the following code to try to get the above output:

for index,row in data.iterrows():
    data_2 = pd.DataFrame(pd.date_range(row['vrddat'],row['enddat'], freq ='D'))

With this code I get only 98 rows, but the output should contain far more rows than the input. Could anyone suggest why I am getting this result? Is my code not iterating over every row? And how do I also get the ID, vrddat and enddat columns in the output?

Please suggest.

Best answer

First convert both columns, vrddat and enddat, with to_datetime. Then use itertuples with concat to build a new, expanded DataFrame that has one row per day. Finally, merge it back onto the original df; this step requires ID to be unique in df.

df.vrddat = pd.to_datetime(df.vrddat)
df.enddat = pd.to_datetime(df.enddat)

# Parentheses let the method chain continue onto the next line
df1 = (pd.concat([pd.Series(r.ID, pd.date_range(r.vrddat, r.enddat)) for r in df.itertuples()])
         .reset_index())
df1.columns = ['day','ID']
print (df1)
         day  ID
0 2015-01-01   1
1 2015-01-02   1
2 2015-01-03   1
3 2015-03-01   2
4 2015-03-02   2
5 2015-03-03   2

print (pd.merge(df,df1, on='ID', how='left'))
   ID     vrddat     enddat        day
0   1 2015-01-01 2015-01-03 2015-01-01
1   1 2015-01-01 2015-01-03 2015-01-02
2   1 2015-01-01 2015-01-03 2015-01-03
3   2 2015-03-01 2015-03-03 2015-03-01
4   2 2015-03-01 2015-03-03 2015-03-02
5   2 2015-03-01 2015-03-03 2015-03-03

If ID is not unique, you can merge on a unique index instead:

df.vrddat = pd.to_datetime(df.vrddat)
df.enddat = pd.to_datetime(df.enddat)
df = df.reset_index()

# r.index is the 'index' column created by reset_index() above
df1 = (pd.concat([pd.Series(r.index, pd.date_range(r.vrddat, r.enddat)) for r in df.itertuples()])
         .reset_index())
df1.columns = ['day','index']
print (df1)
         day  index
0 2015-01-01      0
1 2015-01-02      0
2 2015-01-03      0
3 2015-03-01      1
4 2015-03-02      1
5 2015-03-03      1

print (pd.merge(df,df1, on='index', how='left').drop('index', axis=1))
   ID     vrddat     enddat        day
0   1 2015-01-01 2015-01-03 2015-01-01
1   1 2015-01-01 2015-01-03 2015-01-02
2   1 2015-01-01 2015-01-03 2015-01-03
3   2 2015-03-01 2015-03-03 2015-03-01
4   2 2015-03-01 2015-03-03 2015-03-02
5   2 2015-03-01 2015-03-03 2015-03-03