I have a data set with 3 columns (ID, vrddat and enddat) and 21000 rows:
ID vrddat enddat
1 2015.01.01 2015.01.03
2 2015.03.01 2015.03.03
PS: Each ID can have multiple vrddat and enddat values.
I need the output to look like this:
ID vrddat enddat day
1 2015.01.01 2015.01.03 2015.01.01
1 2015.01.01 2015.01.03 2015.01.02
1 2015.01.01 2015.01.03 2015.01.03
2 2015.03.01 2015.03.03 2015.03.01
2 2015.03.01 2015.03.03 2015.03.02
2 2015.03.01 2015.03.03 2015.03.03
I used the following code to try to get the above output:
    for index, row in data.iterrows():
        data_2 = pd.DataFrame(pd.date_range(row['vrddat'], row['enddat'], freq='D'))
Using the above code I get only 98 rows, but ideally the output should contain far more rows than the input. Could anyone suggest why I'm getting this kind of output? Is my code not iterating over each and every row? How do I also get the ID, vrddat and enddat variables in my output?
Please suggest.
You can first cast both columns vrddat and enddat with to_datetime, and then use itertuples with concat to create a new, expanded DataFrame. Finally, merge it back, but this requires that ID in df is unique. If ID is not unique, it is possible to use the unique index for merging instead.
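As a rough sketch of these steps, here is how it could look with the two sample rows from the question. The names pieces, days, out and the helper column row_id are made up for illustration, and the format string assumes all dates look like 2015.01.01. It merges on the index rather than on ID, so it should also cope with repeated IDs. Note that the loop in the question reassigns data_2 on every iteration, so only the date range of the last row survives, which is likely why only 98 rows come out.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "ID": [1, 2],
    "vrddat": ["2015.01.01", "2015.03.01"],
    "enddat": ["2015.01.03", "2015.03.03"],
})

# Cast both columns to datetime (format assumed from the sample data).
df["vrddat"] = pd.to_datetime(df["vrddat"], format="%Y.%m.%d")
df["enddat"] = pd.to_datetime(df["enddat"], format="%Y.%m.%d")

# Build one small frame per input row and keep all of them in a list,
# instead of overwriting a single variable inside the loop.
# row_id (hypothetical helper column) stores the index label of the source row.
pieces = [
    pd.DataFrame({
        "row_id": row.Index,
        "day": pd.date_range(row.vrddat, row.enddat, freq="D"),
    })
    for row in df.itertuples()
]
days = pd.concat(pieces, ignore_index=True)

# Merge on the unique index of df, then drop the helper column.
out = (
    df.merge(days, left_index=True, right_on="row_id")
      .drop(columns="row_id")
      .reset_index(drop=True)
)
print(out)
```

The result has one row per day between vrddat and enddat (both inclusive), with ID, vrddat and enddat repeated on each row.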