I gave up after trying to separate a dataframe into different rows depending the days that the period takes place.
This is the dataframe and, as we can see, all rows start in a different day that they ends. 'Tiempo_ini' es when starts, 'Tiempo_fin' when it ends and 'Tiempo_diff' the difference between the previous both.
| Tiempo_ini | Valor | Tiempo_diff | Tiempo_fin | Cambio |
|---|---|---|---|---|
| 2018-01-02 21:51:47 | 1 | 0 days 12:46:30 | 2018-01-03 10:38:17 | True |
| 2018-01-03 22:11:17 | 1 | 0 days 07:46:00 | 2018-01-04 05:57:17 | True |
| 2018-01-04 22:22:17 | 1 | 0 days 08:38:00 | 2018-01-05 07:00:17 | True |
| 2018-01-05 07:01:17 | 1 | 1 days 13:01:30 | 2018-01-06 20:02:47 | True |
| 2018-01-06 20:30:17 | 1 | 0 days 04:27:00 | 2018-01-07 00:57:17 | True |
I would like to generate as many additional rows as required depending on the period specified in 'Tiempo_diff'. And being able to have in each line, the corresponding interval and 'Tiempo_diff' that belongs to its day.
first row should divide into two:
| Tiempo_ini | Valor | Tiempo_fin | Tiempo_diff | Cambio |
|---|---|---|---|---|
| 2018-01-02 21:51:47 | 1 | 2018-01-03 00:00:00 | 0 days 02:08:13 | True |
| 2018-01-03 00:00:00 | 1 | 2018-01-03 10:38:17 | 0 days 10:38:17 | True |
I tried with this code but, the problem I had is that when there is the second iteration, it modifies the values and I lose the changes.
I was trying with this code:
nuevos_registros = []
for index, row in df.iterrows():
if row['Cambio'] == True:
tiempo_ini = row['Tiempo_ini']
tiempo_diff = row['Tiempo_diff']
tiempo_fin = row['Tiempo_fin']
while tiempo_diff > timedelta(seconds=0):
tiempo_fin_original = row['Tiempo_fin']
tiempo_diff_original = tiempo_diff
tiempo_diff = ((tiempo_ini + timedelta(days=1)).replace(hour=0, minute=0, second=0) - tiempo_ini)
nuevo_registro = {
"Tiempo_ini": tiempo_ini,
"Valor": row["Valor"],
"Tiempo_fin": (tiempo_ini + timedelta(days=1)).replace(hour=0, minute=0, second=1),
"Tiempo_diff": ((tiempo_ini + timedelta(days=1)).replace(hour=0, minute=0, second=0) - tiempo_ini),
"Cambio": True
}
nuevos_registros.append(nuevo_registro)
tiempo_diff = tiempo_diff_original - tiempo_diff
tiempo_ini = (tiempo_ini + timedelta(days=1)).replace(hour=0, minute=0, second=0)
tiempo_fin = tiempo_fin_original
else:
nuevo_registro = {
"Tiempo_ini": row['Tiempo_ini'],
"Valor": row["Valor"],
"Tiempo_fin": row["Tiempo_fin"],
"Tiempo_diff": row["Tiempo_diff"],
"Cambio": False
}
nuevos_registros.append(nuevo_registro)
df_mod = pd.DataFrame(nuevos_registros)
df_mod[df_mod['Cambio'] == True].head()
Which gives it to me:
| Tiempo_ini | Valor | Tiempo_fin | Tiempo_diff | Cambio |
|---|---|---|---|---|
| 2018-01-02 21:51:47 | 1 | 2018-01-03 00:00:01 | 0 days 02:08:13 | True |
| 2018-01-03 00:00:00 | 1 | 2018-01-04 00:00:01 | 1 days 00:00:00 | True |
| 2018-01-03 22:11:17 | 1 | 2018-01-04 00:00:01 | 0 days 01:48:43 | True |
| 2018-01-04 00:00:00 | 1 | 2018-01-05 00:00:01 | 1 days 00:00:00 | True |
| 2018-01-04 22:22:17 | 1 | 2018-01-05 00:00:01 | 0 days 01:37:43 | True |
I hope to have been clear. I am sorry for the inconvenience! I have been trying for long and I identified the problem but I do not know how to solve it.
Thank you very much in advance ^^
finally, I found the right path which was also easier:
First of all, divide with as many rows as days.
Determine the differents options:
Get the right dates for each row
Obtain the corresponding charge for each day.
Use groupby to see the charge for each day:
Perhaps there is an easier way. Thank you for your time.