How to change the view of a dataframe filled with numpy array to better describe a period

42 Views Asked by At

I work in the context of the supply chain. I'm trying to set deliveries. The date of receipt and the quantity delivered of the deliveries can be stochastic (advance, delay in relation to the fixed date and breakage or error in entering the order). To model this stochasticity I simulate several scenarii for example in the first scenario, the order arrives 40 units of item_1 and 45 of item_2 at time 1 (today = 0), in the second scenario 45 units arrive of item_1 and 39 of item_2 at time 2, etc. There can be like hundred scenarii.

All is represented with a dataframe like this one :

item_id reception_date   ordered_qty
0  item_1      [1, 2, 1]  [40, 45, 39]
1  item_2      [1, 2, 1]  [45, 39, 42]

i° coord == i° scenario

I have a fixed horizon (for exemple 3 days). I want to give a more explicite view of the deliveries through horizon. I want to obtain a new dataframe like this one :

item_id      period_date    ordered_qty
0  item_1            0      [0, 0, 0]
1  item_1            1      [40, 0, 39]
2  item_1            2      [0, 45, 0]
3  item_2            0      [0, 0, 0]
4  item_2            1      [45, 0, 42]
5  item_2            2      [0, 39, 0]

In this dataframe, i° scenario is describe by the i° column of the matrix formed by all the scenarii ; we can see the quantity and the reception day looking at the corresponding day in the column period_date.

I've made several tries (merges, concate, for loop, etc.), the best I've obtain is bellow :

import pandas as pd
import numpy as np

trace_size = 3  #number of scenarii
df_initial = pd.DataFrame({
    'item_id': ['item_1', 'item_2'],
    'reception_date': [[1, 2, 1], [1, 2, 1]],
    'ordered_qty': [[40, 45, 39], [45, 39, 42]]
})

df = pd.DataFrame(columns=['item_id', 'reception_date', 'ordered_qty'])
for z in range(trace_size):
    df1 = df_initial.copy()
    df1['reception_date']=df_initial['reception_date'].apply(lambda x: x[z])
    df1['ordered_qty']=df_initial['ordered_qty'].apply(lambda x: x[z])
    df = pd.concat([df, df1])
df = df.groupby(['item_id', 'reception_date'], as_index=False).agg({'ordered_qty': list})


dyn = pd.DataFrame({'item_id': np.repeat(df_initial['item_id'].values, 3),
                    'period_date': np.tile(np.arange(0, 3), 2)})
df_final = dyn.merge(df.rename(columns={'reception_date': 'period_date'}), how='outer').fillna(0)

with this result :

item_id  period_date ordered_qty
0  item_1            0           0
1  item_1            1    [40, 39]
2  item_1            2        [45]
3  item_2            0           0
4  item_2            1    [45, 42]
5  item_2            2        [39]

it's better without for loop to make it more efficient but I don't think it's possible.

2

There are 2 best solutions below

2
mozway On BEST ANSWER

IIUC, you can reshape using:

trace_size = 3

out = (df_initial
 .explode(['reception_date', 'ordered_qty'])
 .assign(n=lambda d: d.groupby(level=0).cumcount())
 # add missing combinations of
 # item_id/reception_date/n
 .pipe(lambda d: d
       .set_index(['item_id', 'reception_date', 'n'])
       .reindex(pd.MultiIndex.from_product([
   d['item_id'].unique(),
   range(trace_size),
   range(trace_size)
  ], names=['item_id', 'reception_date', 'n']), fill_value=0)
       .reset_index()
     )
 # aggregate as lists
 .groupby(['item_id', 'reception_date'], as_index=False)
  ['ordered_qty'].agg(list)
)

Output:

  item_id  reception_date  ordered_qty
0  item_1               0    [0, 0, 0]
1  item_1               1  [40, 0, 39]
2  item_1               2   [0, 45, 0]
3  item_2               0    [0, 0, 0]
4  item_2               1  [45, 0, 42]
5  item_2               2   [0, 39, 0]
0
Pep_8_Guardiola On

You can get very close to your desired output with the following, and it should be significantly faster on larger datasets:

df = df.explode(['reception_date', 'ordered_qty'])
df.groupby(['item_id', 'reception_date'])['ordered_qty'].apply(list).reset_index()

Output:

    item_id     reception_date  ordered_qty
0   item_1      1               [40, 39]
1   item_1      2               [45]
2   item_2      1               [45, 42]
3   item_2      2               [39]

How necessary are the reception_date = 0 rows?