I'm looking to create a column in a dataframe based on both another column using two rows and the new column itself. I'm trying to replicate and excel function that creates an ongoing sum of values unless a certain condition is met.

if I have the following column:

Column A 0 0 1 2 3 0 0 1 4

The excel for the next column would be column B where if column A cell = 0, then column B cell = 0 but if column A = X column B = X + [value in previous row in Column B]

Using this formula and the above column A, column B would be:

Column B 0 0 1 3 = 1 + 2 6 = 3 + 3 0 0 1 5 = 1 + 4

I'm not really sure if this is possible since the new column is referring back to itself. I've looked at loc and iloc functions, but couldn't find how to make the offset work - I see there is a shift function, but I also have a conditional some something like df["B"] = df.loc[df["A"] == 0 ....

I'm pretty new to programming python and I couldn't even really get started on this. I can write basic formulas to create a new column based on values in another column, but then the reference to more than one row is problematic for me. Perhaps doing it in more than one step would be one approach.

Outside of dataframes it seems straight forward to write a recursive formula to mimic the excel:

in cell B2 = if(A2 = 0, 0, A2+B1) B1 is always going to be 0.

Thank you,

1

There are 1 best solutions below

0
Josh Friedlander On

This is absolutely doable, although it is a little advanced. It falls in the category of what are known as streaks and islands problems, meaning finding groups in a column of data that match a pattern. In this case we want the cumulative sum (cumsum), but resetting every time we get to a 0.

import pandas as pd

column_a = pd.Series([0, 0, 1, 2, 3, 0, 0, 1, 4])

# find rows which equal 0, these begin a new pattern
streaks = column_a == 0

# each pattern beginning is a new group, group by these and cumsum
column_b = column_a.groupby(streaks.cumsum()).cumsum()

Giving us the desired output:

0    0
1    0
2    1
3    3
4    6
5    0
6    0
7    1
8    5
dtype: int64