Summing rows based on cumsum values

155 Views Asked by At

I have a data frame like

index  A B C
0      4 7 9
1      2 6 2
2      6 9 1
3      7 2 4
4      8 5 6

I want to create another data frame out of this based on the sum of C column. But the catch here is if the sum of C reached 10 or higher it should create another row. Something like this.

index  A B C
0      6 13 11
1      21 16 11

Any help will be highly appreciable. Is there a robust way to do this, or iterating is my last resort?

2

There are 2 best solutions below

2
On BEST ANSWER

There is a non-iterative approach. You'll need a groupby based on C % 11.

# Groupby logic - https://stackoverflow.com/a/45959831/4909087
out = df.groupby((df.C.cumsum() % 10).diff().shift().lt(0).cumsum(), as_index=0).agg('sum')
print(out) 
    A   B   C
0   6  13  11
1  21  16  11
0
On

The code would look something like this:

import pandas as pd

lista = [4, 7, 10, 11, 7]
listb= [7, 8, 2, 5, 9]
listc = [9, 2, 1, 4, 6]

df = pd.DataFrame({'A': lista, 'B': listb, 'C': listc})

def sumsc(df):
    suma=0
    sumb=0
    sumc=0
    list_of_sums = []
    for i in range(len(df)):
        suma+=df.iloc[i,0]
        sumb+=df.iloc[i,1]
        sumc+=df.iloc[i,2]
        if sumc > 10:
            list_of_sums.append([suma, sumb, sumc])
            suma=0
            sumb=0
            sumc=0
    return pd.DataFrame(list_of_sums)

sumsc(df)

    0   1   2
0  11  15  11
1  28  16  11