I have a data set with 3 columns (ID, vrddat and enddat) and 21000 rows.
ID vrddat enddat
1 2015.01.01 2015.01.03
2 2015.03.01 2015.03.03
PS: Each ID can have multiple vrddat and enddat values.
I need the following output:
ID vrddat enddat day
1 2015.01.01 2015.01.03 2015.01.01
1 2015.01.01 2015.01.03 2015.01.02
1 2015.01.01 2015.01.03 2015.01.03
2 2015.03.01 2015.03.03 2015.03.01
2 2015.03.01 2015.03.03 2015.03.02
2 2015.03.01 2015.03.03 2015.03.03
I used the following code to get the above output:
import pandas as pd

for index, row in data.iterrows():
    data_2 = pd.DataFrame(pd.date_range(row['vrddat'], row['enddat'], freq='D'))
With the above code I get only 98 rows, but the output should contain far more rows than the input. Why am I getting this output? Is my code not iterating over every row? And how do I also get the ID, vrddat and enddat columns in my output?
Please suggest.
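The small row count is consistent with the loop itself: data_2 is reassigned on every iteration, so after the loop it holds only the date range of the last row processed, and its length equals the number of days in that one range. A minimal sketch with two hypothetical rows (the sample values and the "%Y.%m.%d" date format are assumptions) reproduces this, then shows a list-and-concat variant that keeps every row and carries ID, vrddat and enddat along.

```python
import pandas as pd

# Hypothetical sample data in the question's layout.
data = pd.DataFrame({
    "ID": [1, 2],
    "vrddat": ["2015.01.01", "2015.03.01"],
    "enddat": ["2015.01.03", "2015.03.03"],
})
# Parse the "YYYY.MM.DD" strings explicitly (format assumed from the sample).
for col in ["vrddat", "enddat"]:
    data[col] = pd.to_datetime(data[col], format="%Y.%m.%d")

# The loop from the question: data_2 is rebound on every iteration,
# so after the loop it only holds the LAST row's date range.
for index, row in data.iterrows():
    data_2 = pd.DataFrame(pd.date_range(row["vrddat"], row["enddat"], freq="D"))
last_only = data_2
print(len(last_only))  # 3 (only 2015-03-01 .. 2015-03-03)

# Keeping every piece in a list and concatenating once preserves all rows;
# the scalar ID/vrddat/enddat values are broadcast onto each row's days.
pieces = []
for index, row in data.iterrows():
    days = pd.date_range(row["vrddat"], row["enddat"], freq="D")
    pieces.append(pd.DataFrame({
        "ID": row["ID"],
        "vrddat": row["vrddat"],
        "enddat": row["enddat"],
        "day": days,
    }))
expanded = pd.concat(pieces, ignore_index=True)
print(len(expanded))  # 6
```

Calling pd.concat once on the collected list, rather than growing a DataFrame inside the loop, avoids copying the accumulated data on every iteration.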
You can first cast both columns vrddat and enddat with to_datetime, then use itertuples with concat to create a new, expanded DataFrame. Finally, merge it back with the original, but this requires ID in df to be unique. If ID is not unique, you can use the (unique) index for merging instead:
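A minimal sketch of those steps, assuming the question's column names, the "%Y.%m.%d" date format, and a hypothetical df in which ID 1 appears twice. It is not the original answer's code: each expanded range is keyed by the row's index label (r.Index from itertuples), which stays unique even when ID repeats, and the result is then merged on the index.

```python
import pandas as pd

# Hypothetical input: ID 1 has two separate date ranges.
df = pd.DataFrame({
    "ID": [1, 1, 2],
    "vrddat": ["2015.01.01", "2015.02.10", "2015.03.01"],
    "enddat": ["2015.01.03", "2015.02.11", "2015.03.03"],
})

# Step 1: cast both columns to datetime (format assumed from the sample data).
df["vrddat"] = pd.to_datetime(df["vrddat"], format="%Y.%m.%d")
df["enddat"] = pd.to_datetime(df["enddat"], format="%Y.%m.%d")

# Step 2: one Series of days per row, keyed by the row's unique index label.
# concat on a dict yields a MultiIndex (row label, position); drop the position level.
days = pd.concat(
    {r.Index: pd.Series(pd.date_range(r.vrddat, r.enddat, freq="D"))
     for r in df.itertuples()}
).reset_index(level=1, drop=True).rename("day")

# Step 3: merge on the index, which repeats each original row once per day.
out = df.merge(days, left_index=True, right_index=True).reset_index(drop=True)
print(out)
```

If ID is guaranteed to be unique, keying the dict by r.ID and merging with df.merge(days, left_on="ID", right_index=True) is the analogous variant. With repeated IDs, the dict keys would collide and earlier ranges would be silently dropped, which is why the index is the safer key here.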