EDITED
I have a pandas dataframe like so:
data = {'ID': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
'column_1': [0, 0, 0, 0, 0.1, 1, 1.5, 2, 3, 4, 4.5, 5, 4.9, 3, 2, 1.8, 1, 0, 0, 1, 3, 0, 1.3, 2, 3, 4.3, 4.8, 5, 4.2, 3.5, 3, 2.6, 2, 1.9, 1, 0, 0, 0, 0, 0, 0.1, 0.2, 0.3, 1, 2, 3, 5, 4, 2, 0.5, 0, 0],
'column_2': [13, 25, 96, 59, 5, 92, 82, 141, 50, 85, 84, 113, 119, 128, 8, 133, 82, 10, 15, 62, 11, 68, 18, 24, 37, 55, 83, 48, 13, 81, 43, 36, 56, 43, 36, 46, 45, 127, 55, 67, 113, 98, 78, 78, 57, 131, 121, 126, 142, 51, 64, 95]}
ID column_1 column_2
0 A 0.0 13
1 A 0.0 25
2 A 0.0 96
3 A 0.0 59
4 A 0.1 5
5 A 1.0 92
6 A 1.5 82
7 A 2.0 141
8 A 3.0 50
9 A 4.0 85
10 A 4.5 84
11 A 5.0 113
12 A 4.9 119
13 A 3.0 128
14 A 2.0 8
15 A 1.8 133
16 A 1.0 82
17 A 0.0 10
18 A 0.0 15
19 A 1.0 62
20 A 3.0 11
21 A 0.0 68
22 A 1.3 18
23 A 2.0 24
24 A 3.0 37
25 A 4.3 55
26 A 4.8 83
27 A 5.0 48
28 A 4.2 13
29 A 3.5 81
30 A 3.0 43
31 A 2.6 36
32 A 2.0 56
33 A 1.9 43
34 A 1.0 36
35 A 0.0 46
36 A 0.0 45
37 A 0.0 127
38 A 0.0 55
39 A 0.0 67
40 A 0.1 113
41 A 0.2 98
42 B 0.3 78
43 B 1.0 78
44 B 2.0 57
45 B 3.0 131
46 B 5.0 121
47 B 4.0 126
48 B 2.0 142
49 B 0.5 51
50 B 0.0 64
51 B 0.0 95
Tracing back from when the value hits 5 in column_1, I want to find the value in column_2 just before the value in column_1 increased from 0 and just after it came back down to 0. So, in the data frame above, the values in column_2 would be 5, 10 and 18, 46.
I want to perform some arithmetic and would like to add 2 columns before & after with those values grouped by the ID column.
The expected output would be:
ID column_1 column_2 Before After
0 A 0.0 13 0 0
1 A 0.0 25 0 0
2 A 0.0 96 0 0
3 A 0.0 59 0 0
4 A 0.1 5 0 0
5 A 1.0 92 0 0
6 A 1.5 82 0 0
7 A 2.0 141 0 0
8 A 3.0 50 0 0
9 A 4.0 85 0 0
10 A 4.5 84 0 0
11 A 5.0 113 5 10
12 A 4.9 119 0 0
13 A 3.0 128 0 0
14 A 2.0 8 0 0
15 A 1.8 133 0 0
16 A 1.0 82 0 0
17 A 0.0 10 0 0
18 A 0.0 15 0 0
19 A 1.0 62 0 0
20 A 3.0 11 0 0
21 A 0.0 68 0 0
22 A 1.3 18 0 0
23 A 2.0 24 0 0
24 A 3.0 37 0 0
25 A 4.3 55 0 0
26 A 4.8 83 0 0
27 A 5.0 48 18 46
28 A 4.2 13 0 0
29 A 3.5 81 0 0
30 A 3.0 43 0 0
31 A 2.6 36 0 0
32 A 2.0 56 0 0
33 A 1.9 43 0 0
34 A 1.0 36 0 0
35 A 0.0 46 0 0
36 A 0.0 45 0 0
37 A 0.0 127 0 0
38 A 0.0 55 0 0
39 A 0.0 67 0 0
40 A 0.1 113 0 0
41 A 0.2 98 0 0
42 B 0.3 78 0 0
43 B 1.0 78 0 0
44 B 2.0 57 0 0
45 B 3.0 131 0 0
46 B 5.0 121 78 64
47 B 4.0 126 0 0
48 B 2.0 142 0 0
49 B 0.5 51 0 0
50 B 0.0 64 0 0
51 B 0.0 95 0 0
For a given ID if column_1 starts with a non zero value, it should give the first value of column_2 for that group.
The rest of the rows in Before and After can be filled with null or zeroes.
Is there an elegant way to achieve this?
With
whereandshiftyou can generate series where only the endpoints of the subsequences are filled and the rest is padded withNaN. Combined withffill(orbackfill, respectively),locating the rows wherecolumn_1equals 5 gives pretty much the desired result:Result:
With groupby()
Result with new data