How to find the number of events or pulses recorded when the value is greater than 0 in Python?

I have a data set that represents hourly rainfall over one day. I want to create a column 'E1' that starts at zero, takes the next event number each time column 'value' becomes greater than zero, keeps that number while the event lasts, and returns to zero when 'value' becomes zero again. When the next event begins, the numbering must continue from the previous event number. My attempt so far, stored in column 'E2':

# True on the first hour of each run: rain now, no rain in the previous hour
condition = (df['value'] > 0) & (df['value'].shift(periods=1) == 0)

df['E2'] = condition.cumsum()
print(df)
    hour  value  E2
0      0    0.0   0
1      1    0.2   1
2      2    0.2   1
3      3    0.2   1
4      4    0.0   1
5      5    0.2   2
6      6    0.2   2
7      7    0.0   2
8      8    NaN   2
9      9    0.2   2
10    10    0.0   2
11    11    0.0   2
12    12    0.2   3
13    13    0.2   3
14    14    0.0   3
15    15    NaN   3
16    16    0.2   3
17    17    0.0   3
18    18    0.2   4
19    19    0.0   4
20    20    0.2   5
21    21    0.2   5
22    22    NaN   5
23    23    0.0   5

E1 represents the event number. An event can last one or several hours, and it should only be counted when the cell immediately before its start is zero and the cell immediately after its last value is zero. A run of rain that borders a NaN on either side is therefore not an event.

I'm stuck trying to number the events. The result should be:

    hour  value  E1
0      0    0.0   0
1      1    0.2   1
2      2    0.2   1
3      3    0.2   1
4      4    0.0   0
5      5    0.2   2
6      6    0.2   2
7      7    0.0   0
8      8    NaN   0
9      9    0.2   0
10    10    0.0   0
11    11    0.0   0
12    12    0.2   3
13    13    0.2   3
14    14    0.0   0
15    15    NaN   0
16    16    0.2   0
17    17    0.0   0
18    18    0.2   4
19    19    0.0   0
20    20    0.2   0
21    21    0.2   0
22    22    NaN   0
23    23    0.0   0

Best answer, by Tim Roberts:

I find this an odd criterion, but here's how to compute your "event" numbers. Because each event depends on both the value before it and the value after it, a plain cumulative sum is not enough, and an explicit loop is the most straightforward way to do it.

import numpy as np
import pandas as pd

data = [
  0.0,
  0.2,
  0.2,
  0.2,
  0.0,
  0.2,
  0.2,
  0.0,
  np.nan,
  0.2,
  0.0,
  0.0,
  0.2,
  0.2,
  0.0,
  np.nan,
  0.2,
  0.0,
  0.2,
  0.0,
  0.2,
  0.2,
  np.nan,
  0.0
]

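df = pd.DataFrame({'data': data})
print(df)

nxt = 1                              # number to assign to the next complete event
nums = np.zeros(len(df), dtype=int)  # event number per row, 0 means "no event"
start = None                         # row right after the latest zero, or None
for ndx, v in enumerate(df['data']):
    if np.isnan(v):
        # A missing reading invalidates the run in progress.
        start = None
    elif v == 0:
        # A zero closes the current run. Number it only if it is
        # non-empty and was itself opened by a zero.
        if start is not None and start < ndx:
            nums[start:ndx] = nxt
            nxt += 1
        start = ndx + 1

df['E1'] = nums
print(df)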

Output:

    data
0    0.0
1    0.2
2    0.2
3    0.2
4    0.0
5    0.2
6    0.2
7    0.0
8    NaN
9    0.2
10   0.0
11   0.0
12   0.2
13   0.2
14   0.0
15   NaN
16   0.2
17   0.0
18   0.2
19   0.0
20   0.2
21   0.2
22   NaN
23   0.0
    data  E1
0    0.0   0
1    0.2   1
2    0.2   1
3    0.2   1
4    0.0   0
5    0.2   2
6    0.2   2
7    0.0   0
8    NaN   0
9    0.2   0
10   0.0   0
11   0.0   0
12   0.2   3
13   0.2   3
14   0.0   0
15   NaN   0
16   0.2   0
17   0.0   0
18   0.2   4
19   0.0   0
20   0.2   0
21   0.2   0
22   NaN   0
23   0.0   0
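
For larger data sets, the same rule can probably also be expressed with pandas operations instead of a Python loop. What follows is only a sketch, not part of the answer above, and it assumes the values are never negative (reasonable for rainfall). The helper names `block`, `wet`, `has_nan`, `opened`, `closed`, `valid` and `starts` are made up for illustration. The idea is that every zero opens a new block, and a block counts as an event only if it contains no NaN, has a zero before it and has a zero after it.

```python
import numpy as np
import pandas as pd

values = [0.0, 0.2, 0.2, 0.2, 0.0, 0.2, 0.2, 0.0, np.nan, 0.2, 0.0, 0.0,
          0.2, 0.2, 0.0, np.nan, 0.2, 0.0, 0.2, 0.0, 0.2, 0.2, np.nan, 0.0]
df = pd.DataFrame({'data': values})
s = df['data']

# Every zero opens a new block; rows up to the next zero share its id.
# Rows before the first zero get id 0.
block = s.eq(0).cumsum()

wet = s.gt(0)                                       # rain rows (NaN compares False)
has_nan = s.isna().groupby(block).transform('any')  # block holds a missing reading
opened = block.gt(0)                                # a zero precedes the block
closed = block.lt(block.iloc[-1])                   # a later zero follows the block

# Rows that belong to a complete event (assumes no negative values).
valid = wet & ~has_nan & opened & closed

# Count events at their first row, then reset every non-event row to 0.
starts = valid & ~valid.shift(fill_value=False)
df['E1'] = starts.cumsum().where(valid, 0)

n_events = int(df['E1'].max())  # total number of complete events
print(df)
print(n_events)
```

On the sample data this should give the same 'E1' column as the loop, and `n_events` answers the question in the title directly.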