Pandas: find nearest higher high (OHLC)

862 Views Asked by At

I'm trying to find the fastest solution to iterate over each row of Open High Low Close data and count out how many rows exist between the current row's high and the next high that’s greater than or equal to the original. Here is a shortened code example which I think explains it well (I'd be trying to find the final value of _next):

for i in df.index:
    while df.loc[i, 'high'] > df.loc[_next, 'high']:
         _next += 1

I think I need to use a lambda function inside apply but I can't seem to get it right. Been trying to figure this out for days and would love any advice!

Using this data, the function should return [2, 0, 0, 3, 1, 0, 0]

d = {'high': [1.2, 1.1, 1.1, 1.4, 1.3, 1.2, 1.3]}
df = pd.DataFrame(data=d)


    high   rows_to_higher_high
0   1.2    2
1   1.1    0
2   1.1    0
3   1.4    3
4   1.3    1
5   1.2    0
6   1.3    0

Edit: itertuples(name=None, index=True) is the fastest way to do it so far.

2

There are 2 best solutions below

1
On BEST ANSWER

using list comprehension and get index to list

In [166]: d = {'high': [1.2, 1.1, 1.1, 1.4, 1.3, 1.2, 1.3]}

In [167]: df = pd.DataFrame(data=d)

In [168]: df['rows_to_higher_high']=[(df['high'].values[i+1:]>=val).tolist().index(True) if True in (df['high'].values[i+1:]>=val).tolist() else le
     ...: n((df['high'].values[i+1:]>=val).tolist())  for i, val in enumerate(df['high'].values)]

In [169]: df
Out[169]:
   high  rows_to_higher_high
0   1.2                    2
1   1.1                    0
2   1.1                    0
3   1.4                    3
4   1.3                    1
5   1.2                    0
6   1.3                    0
1
On

Base on my understanding, we can get the subtraction from current row by all value, then get the upper tri , and get the index max with argmax when the value is more than 0

Sample :

s
0    1
1    2
2    1
3    3
4    1
5    1
6    5
dtype: int64

Solution :

np.clip(np.triu((s.values-s.values[:,None])>0,0).argmax(1)-np.arange(len(s)),a_min=0, a_max=None)
array([1, 2, 1, 3, 2, 1, 0])