Timestep multiple items dataframe

73 Views Asked by At

I have such dataframe:

product1 product2 product3 product4 product5 time
straws orange melon chair bread 1
melon milk book coffee cake 2
bread bananas juice chair book 3
straws coffee cake milk orange 4

I need to have the time step per items.

Example :

TimesProduct1 TimesProduct2 TimesProduct3 Timesproduct4 Timesproduct5
0 0 0 0 0
1 0 0 0 0
2 0 0 2 1
3 2 2 2 3

df.diff() unfortunatly doesn't work in that way.

Thanks for your help.

1

There are 1 best solutions below

7
Riley On BEST ANSWER

setup

import pandas as pd

df = pd.DataFrame(
    {
        "product1":["straws", "melon", "bread", "straws"],
        "product2":["orange", "milk", "bananas", "coffee"],
        "product3":["melon", "book", "juice", "cake"],
        "product4":["chair", "coffee", "chair", "milk"],
        "product5":["bread", "cake", "book", "orange"],
        "time":[1,2,3,4],
    }
)

solution

result = (
    df.melt(id_vars="time")
    .groupby("value")
    .apply(lambda d: d.sort_values("time").eval("diff = time.diff()"))
    .pivot(index="time", columns="variable", values="diff")
    .fillna(0)
    .reset_index(drop=True)
)

solution 2 (for pandas bug workaround)

def make_diff_column(df):
    df = df.sort_values("time")
    df["diff"] = df["time"].diff()
    return df


result = (
    df.melt(id_vars="time")
    .groupby("value")
    .apply(make_diff_column)
    .pivot(index="time", columns="variable", values="diff")
    .fillna(0)
    .reset_index(drop=True)
)