Given a large array of tuples, how to groupby the first element of each tuple in order to sum the last element of each tuple without Pandas dataframe?

Question

273 Views Asked by ericdwkim At 20 March 2022 at 14:56

I have a large list of tuples where each tuple contains 9 string elements:

pdf_results = [
("Kohl's - Dallas", '-', "Kohl's Cafe", '03/18/22', 'RC', '8', '0', '16', '8'),
("Kohl's - Dallas", '-', "Kohl's Cafe", '03/18/22', 'SMI', '5', '0', '10', '5'),
("Kohl's - Dallas", '-', "Kohl's Cafe", '03/19/22', 'RC', '8', '0', '16', '8'),
("Kohl's - Dallas", '-', "Kohl's Cafe", '03/19/22', 'SMI', '5', '0', '10', '5'),
("Kohl's - Dallas", '-', "Kohl's Cafe", '03/20/22', 'RC', '8', '0', '16', '8'),
("Kohl's - Dallas", '-', "Kohl's Cafe", '03/20/22', 'SMI', '5', '0', '10', '5'),
("Kohl's - Dallas", '-', "Kohl's Cafe", '03/21/22', 'RC', '8', '0', '16', '8'),
("Kohl's - Dallas", '-', "Kohl's Cafe", '03/21/22', 'SMI', '5', '0', '10', '5'),
("Kohl's - Dallas", '-', "Kohl's Cafe", '03/23/22', 'SMI', '5', '0', '10', '5'),
("Kohl's - Dallas", '-', "Kohl's Cafe", '03/24/22', 'RC', '8', '0', '16', '8'),
("Kohl's - Dallas", '-', "Kohl's Cafe", '03/24/22', 'SMI', '5', '0', '10', '5'),
('Bronx-Lebanon Hospital Center', '-', 'Patient Trayline ', '03/18/22', 'RC', '8', '0', '16', '8'),
('Bronx-Lebanon Hospital Center', '-', 'Patient Trayline ', '03/18/22', 'SMI', '5', '0', '10', '5'),
('Bronx-Lebanon Hospital Center', '-', 'Patient Trayline ', '03/19/22', 'RC', '8', '0', '16', '8'),
('Bronx-Lebanon Hospital Center', '-', 'Patient Trayline ', '03/19/22', 'SMI', '5', '0', '10', '5')
]

Without using a Pandas DataFrame, what is the best way to group by the first element of each tuple and sum the last element of each tuple? The output should look like this:

desired_output = [
("Kohl's - Dallas", 70),
("Bronx-Lebanon Hospital Center", 26)
]

I've tried using itertools.groupby, which seems to be the most appropriate solution; however, I'm getting stuck on properly iterating, indexing, and summing the last element of each tuple without running into one of the following obstacles:

The last element of each tuple is a string, and after converting it to int, iterating over it raises TypeError: 'int' object is not iterable
A ValueError is raised: invalid literal for int() with base 10: 'b'

Attempt:

from itertools import groupby

def getSiteName(siteChunk):
    return siteChunk[0]

siteNameGroup = groupby(pdf_results, getSiteName)

for key, group in siteNameGroup:
    print(key) # 1st element of tuple as desired
    for pdf_results in group:
        # Raises TypeError: unsupported operand type(s) for +: 'int' and 'str'
        print(sum(pdf_results[8]))
    print()
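Both TypeErrors come from what is passed to sum(), which expects an iterable of numbers. Handing it a single string such as '8' makes it iterate over the string's characters, so it evaluates 0 + '8' and raises the error noted in the comment above; converting first and passing a single int instead raises "'int' object is not iterable". A minimal sketch reproducing both messages on one hypothetical row in the same 9-field shape (type_error_message is a helper name introduced only for illustration):

```python
# One hypothetical row in the same 9-field shape as pdf_results
row = ("Kohl's - Dallas", '-', "Kohl's Cafe", '03/18/22', 'RC', '8', '0', '16', '8')


def type_error_message(func):
    """Call func and return the message of the TypeError it raises, or None."""
    try:
        func()
    except TypeError as exc:
        return str(exc)
    return None


# sum() iterates over the characters of '8', so it evaluates 0 + '8'
string_error = type_error_message(lambda: sum(row[8]))
# sum() tries to iterate over a single int, which is not iterable
int_error = type_error_message(lambda: sum(int(row[8])))

print(string_error)  # unsupported operand type(s) for +: 'int' and 'str'
print(int_error)     # 'int' object is not iterable
```

In both cases the fix is the same: convert each row's last field and pass all of a group's values to a single sum() call.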

Original Q&A

There are 4 best solutions below

TheFaultInOurStars On 20 March 2022 at 15:13

Why not use a simple for loop with an empty dictionary?

resultDict = {}
for value in pdf_results:
    if value[0] not in resultDict:
        resultDict[value[0]] = 0
    resultDict[value[0]] += float(value[-1])
print(resultDict)

Output

{"Kohl's - Dallas": 70.0,
'Bronx-Lebanon Hospital Center': 26.0}

If you need a list of tuples rather than a dictionary, you can use:

list(resultDict.items())

Output

[("Kohl's - Dallas", 70.0), ('Bronx-Lebanon Hospital Center', 26.0)]

Kelly Bundy On 20 March 2022 at 15:53

You're almost there. Just change your

for pdf_results in group:
    print(sum(pdf_results[8]))

to:

print(sum(int(pdf_results[8])
          for pdf_results in group))

(Though I'd also rename to pdf_result, singular.)
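Putting that change into the original loop, with the singular rename applied, gives a complete runnable version. This is a sketch: the three-row pdf_results is a hypothetical sample, and collecting the (site, total) pairs into a list is an addition so the result has the shape of desired_output.

```python
from itertools import groupby

# Hypothetical three-row sample; the real pdf_results has the same shape
pdf_results = [
    ("Kohl's - Dallas", '-', "Kohl's Cafe", '03/18/22', 'RC', '8', '0', '16', '8'),
    ("Kohl's - Dallas", '-', "Kohl's Cafe", '03/18/22', 'SMI', '5', '0', '10', '5'),
    ('Bronx-Lebanon Hospital Center', '-', 'Patient Trayline ', '03/18/22', 'RC', '8', '0', '16', '8'),
]


def getSiteName(siteChunk):
    return siteChunk[0]


totals = []
for key, group in groupby(pdf_results, getSiteName):
    # Convert each row's last field to int, then sum the whole group at once
    total = sum(int(pdf_result[8]) for pdf_result in group)
    print(key, total)
    totals.append((key, total))

print(totals)  # [("Kohl's - Dallas", 13), ('Bronx-Lebanon Hospital Center', 8)]
```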

Prashanth On 20 March 2022 at 16:06

This would also work:

from collections import defaultdict

output = defaultdict(int)

for item in pdf_results:
    output[item[0]] += int(item[-1])

print(list(output.items()))

Output

[("Kohl's - Dallas", 70), ('Bronx-Lebanon Hospital Center', 26)]
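collections.Counter, a dict subclass whose missing keys count as 0, can stand in for defaultdict(int) here. This is a swapped-in alternative rather than part of the answer above, shown on a hypothetical sample; its most_common() method additionally returns the pairs ordered by total, largest first.

```python
from collections import Counter

# Hypothetical sample in the same 9-field shape as pdf_results
pdf_results = [
    ("Kohl's - Dallas", '-', "Kohl's Cafe", '03/18/22', 'SMI', '5', '0', '10', '5'),
    ('Bronx-Lebanon Hospital Center', '-', 'Patient Trayline ', '03/18/22', 'RC', '8', '0', '16', '8'),
    ('Bronx-Lebanon Hospital Center', '-', 'Patient Trayline ', '03/18/22', 'SMI', '5', '0', '10', '5'),
]

output = Counter()
for item in pdf_results:
    # A missing key reads as 0, so += works the first time a site appears
    output[item[0]] += int(item[-1])

print(list(output.items()))  # first-seen order
print(output.most_common())  # ordered by total, largest first
```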

dawg On 20 March 2022 at 15:52 (Accepted Answer)

Assuming rows with the same first element are contiguous (for example, because the list is sorted by the first element), you can do:

from itertools import groupby 

for k,v in groupby(pdf_results, key=lambda t: t[0]):
    print(k, sum(int(x[-1]) for x in v))

Prints:

Kohl's - Dallas 70
Bronx-Lebanon Hospital Center 26
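The contiguity assumption matters because groupby() only merges consecutive items with equal keys; it does not look ahead. A minimal sketch on a hypothetical interleaved sample (trimmed to two fields, since only t[0] and t[-1] are used) shows one site split into two groups, and sorting by the same key first restoring one total per site:

```python
from itertools import groupby

# Hypothetical interleaved sample: the Kohl's rows are not contiguous
rows = [
    ("Kohl's - Dallas", '8'),
    ('Bronx-Lebanon Hospital Center', '8'),
    ("Kohl's - Dallas", '5'),
]


def site(t):
    return t[0]


# Without sorting, Kohl's appears as two separate groups
unsorted_totals = [(k, sum(int(x[-1]) for x in v))
                   for k, v in groupby(rows, key=site)]

# Sorting by the grouping key makes equal keys adjacent
sorted_totals = [(k, sum(int(x[-1]) for x in v))
                 for k, v in groupby(sorted(rows, key=site), key=site)]

print(unsorted_totals)
print(sorted_totals)
```

Sorting costs O(n log n) and reorders the output alphabetically, which the dict version avoids.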

If the list is not sorted, just use a dict to total the values, keyed by the first entry of each tuple:

res = {}

for t in pdf_results:
    res[t[0]] = res.get(t[0], 0) + int(t[-1])

>>> res
{"Kohl's - Dallas": 70, 'Bronx-Lebanon Hospital Center': 26}
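The same dict pattern can be wrapped in a small reusable function. group_sum and its key_index/value_index parameters are hypothetical names introduced for illustration; the function returns a list of tuples in first-seen order (dicts preserve insertion order since Python 3.7), matching the shape of desired_output without requiring sorted input.

```python
def group_sum(rows, key_index=0, value_index=-1):
    """Hypothetical helper: sum int(row[value_index]) per row[key_index]."""
    totals = {}
    for row in rows:
        key = row[key_index]
        totals[key] = totals.get(key, 0) + int(row[value_index])
    # Insertion order is preserved, so sites appear in first-seen order
    return list(totals.items())


# Hypothetical interleaved sample in the 9-field shape of pdf_results
sample = [
    ("Kohl's - Dallas", '-', "Kohl's Cafe", '03/18/22', 'RC', '8', '0', '16', '8'),
    ('Bronx-Lebanon Hospital Center', '-', 'Patient Trayline ', '03/18/22', 'RC', '8', '0', '16', '8'),
    ("Kohl's - Dallas", '-', "Kohl's Cafe", '03/18/22', 'SMI', '5', '0', '10', '5'),
]

print(group_sum(sample))                 # [("Kohl's - Dallas", 13), ('Bronx-Lebanon Hospital Center', 8)]
print(group_sum(sample, value_index=7))  # sums the eighth field instead
```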
