How can I get the next row value in a Python dataframe?

14.2k Views Asked by At

I'm a new Python user and I'm trying to learn this so I can complete a research project on cryptocurrencies. What I want to do is retrieve the value right after having found a condition, and retrieve the value 7 rows later in another variable.

I'm working within an Excel spreadsheet which has 2250 rows and 25 columns. By adding 4 columns as detailed just below, I get to 29 columns. It has lots of 0s (where no pattern has been found), and a few 100s (where a pattern has been found). I want my program to get the row right after the one where 100 is present, and return it's Close Price. That way, I can see the difference between the day of the pattern and the day after the pattern. I also want to do this for seven days down the line, to find the performance of the pattern on a week.

Here's a screenshot of the spreadsheet to illustrate this

You can see -100 cells too, those are bearish pattern recognition. For now I just want to work with the "100" cells so I can at least make this work.

I want this to happen:

import pandas as pd
import talib
import csv
import numpy as np

my_data = pd.read_excel('candlesticks-patterns-excel.xlsx')
df = pd.DataFrame(my_data)

df['Next Close'] = np.nan_to_num(0) #adding these next four columns to my dataframe so I can fill them up with the later variables#
df['Variation2'] = np.nan_to_num(0)
df['Next Week Close'] = np.nan_to_num(0)
df['Next Week Variation'] = np.nan_to_num(0)
df['Close'].astype(float)

for row in df.itertuples(index=True):
    str(row[7:23])
    if ((row[7:23]) == 100):
        nextclose = np.where(row[7:23] == row[7:23]+1)[0] #(I Want this to be the next row after having found the condition)#
    if (row.Index + 7 < len(df)):
        nextweekclose = np.where(row[7:23] == row[7:23]+7)[0] #(I want this to be the 7th row after having found the condition)#
    else:
        nextweekclose = 0

The reason I want these values is to later compare them with these variables:

variation2 = (nextclose - row.Close) / row.Close * 100
    nextweekvariation = (nextweekclose - row.Close) / row.Close * 100
    df.append({'Next Close': nextclose, 'Variation2': variation2, 'Next Week Close': nextweekclose, 'Next Week Variation': nextweekvariation}, ignore_index = true)

My errors come from the fact that I do not know how to retrieve the row+1 value, and the row+7 value. I have searched high and low all day online and haven't found a concrete way to do this. Whichever idea I try to come up with gives me either a "can only concatenate tuple (not "int") to tuple" error, or a "AttributeError: 'Series' object has no attribute 'close'". This second one I get when I try:

for row in df.itertuples(index=True):
    str(row[7:23])
    if ((row[7:23]) == 100):
        nextclose = df.iloc[row.Index + 1,:].close
    if (row.Index + 7 < len(df)):
        nextweekclose = df.iloc[row.Index + 7,:].close
    else:
        nextweekclose = 0

I would really love some help on this. Using Jupyter Notebook.

EDIT : FIXED

I have finally succeeded ! As it often seems to be the case with programming (yeah, I'm new here...), the mistakes were because of my inability to think outside the box. I was persuaded a certain part of my code was the problem, when the issues ran deeper than that.

Thanks to BenB and Michael Gardner, I have fixed my code and it is now returning what I wanted. Here it is.

import pandas as pd
import talib
import csv
import numpy as np
        
my_data = pd.read_excel('candlesticks-patterns-excel.xlsx')
df = pd.DataFrame(my_data)
        
        
#Creating my four new columns. In my first message I thought I needed to fill them up
#with 0s (or NaNs) and then fill them up with their respective content later. 
#It is actually much simpler to make the operations right now, keeping in mind 
#that I need to reference df['Column Of Interest'] every time.
    
df['Next Close'] = df['Close'].shift(-1)
df['Variation2'] = (((df['Next Close'] - df['Close']) / df['Close']) * 100)
df['Next Week Close'] = df['Close'].shift(-7)
df['Next Week Variation'] = (((df['Next Week Close'] - df['Close']) / df['Close']) * 100)
    
#The only use of this is for me to have a visual representation of my newly created columns#
print(df)
        
for row in df.itertuples(index=True):
    if 100 or -100 in row[7:23]:
        nextclose = df['Next Close']
            
    if (row.Index + 7 < len(df)) and 100 or -100 in row[7:23]:
            nextweekclose = df['Next Week Close']
        else:
            nextweekclose = 0
                
        variation2 = (nextclose - row.Close) / row.Close * 100
        nextweekvariation = (nextweekclose - row.Close) / row.Close * 100
        df.append({'Next Close': nextclose, 'Variation2': variation2, 'Next Week Close': nextweekclose, 'Next Week Variation': nextweekvariation}, ignore_index = True)
        
df.to_csv('gatherinmahdata3.csv')
2

There are 2 best solutions below

2
On BEST ANSWER

If I understand correctly, you should be able to use shift to move the rows by the amount you want and then do your conditional calculations.

import pandas as pd
import numpy as np

df = pd.DataFrame({'Close': np.arange(8)})

df['Next Close'] = df['Close'].shift(-1)
df['Next Week Close'] = df['Close'].shift(-7)

df.head(10)

   Close  Next Close  Next Week Close
0      0         1.0              7.0
1      1         2.0              NaN
2      2         3.0              NaN
3      3         4.0              NaN
4      4         5.0              NaN
5      5         6.0              NaN
6      6         7.0              NaN
7      7         NaN              NaN

df['Conditional Calculation'] = np.where(df['Close'].mod(2).eq(0), df['Close'] * df['Next Close'], df['Close'])

df.head(10)

   Close  Next Close  Next Week Close  Conditional Calculation
0      0         1.0              7.0                      0.0
1      1         2.0              NaN                      1.0
2      2         3.0              NaN                      6.0
3      3         4.0              NaN                      3.0
4      4         5.0              NaN                     20.0
5      5         6.0              NaN                      5.0
6      6         7.0              NaN                     42.0
7      7         NaN              NaN                      7.0
1
On

From your update it becomes clear that the first if statement checks that there is the value "100" in your row. You would do that with

if 100 in row[7:23]:

This checks whether the integer 100 is in one of the elements of the tuple containing the columns 7 to 23 (23 itself is not included) of the row.

If you look closely at the error messages you get, you see where the problems are:

TypeError: can only concatenate tuple (not "int") to tuple

comes from

nextclose = np.where(row[7:23] == row[7:23]+1)[0]

row is a tuple and slicing it will just give you a shorter tuple to which you are trying to add an integer, as is said in the error message. Maybe have a look at the documentation of numpy.where and see how it works in general, but I think it is not really needed in this case. This brings us to your second error message:

AttributeError: 'Series' object has no attribute 'close'

This is case sensitive and for me it works if I just capitalize the close to "Close" (same reason why Index has to be capitalized):

nextclose = df.iloc[row.Index + 1,:].Close

You could in principle use the shift method mentioned in the other reply and I would suggest it for easiness, but I want to point out another method, because I think understanding them is important for working with dataframes:

nextclose = df.iloc[row[0]+1]["Close"]
nextclose = df.iloc[row[0]+1].Close
nextclose = df.loc[row.Index + 1, "Close"]

All of them work and there are probably even more possibilities. I can't really tell you which ones are the fastest or whether there are any differences, but they are very commonly used when working with dataframes. Therefore, I would recommend to have a closer look at the documentation of the methods you used and especially what kind of data type they return. Hope that helps understanding the topic a bit more.