Python itertools.groupby with dictionaries with multiple values

I am trying to use the Python itertools.groupby function to change this list:

items = [
  {'price': 5.0, 'name': 'Strawberries'}, 
  {'price': 5.0, 'name': 'Strawberries'}, 
  {'price': 5.0, 'name': 'Strawberries'}, 
  {'price': 11.23, 'name': 'Coffee'}, 
  {'price': 11.23, 'name': 'Coffee'}, 
  {'price': 3.11, 'name': 'Green Tea'}
]

into this:

{
  'Strawberries': {'price': 5.0, 'quantity': 3}, 
  'Coffee': {'price': 11.23, 'quantity': 2}, 
  'Green Tea': {'price': 3.11, 'quantity': 1}
}

I have tried both:

grouped = { 
  name: {
    'price': list(article)[0]['price'], 
    'quantity': len(list(article))
  } for name, article in groupby(items, key=lambda x: x['name']) 
}

and:

grouped = { 
  name: {
    'quantity': list(article), 
    'price': list(article)[0]['price']
  } for name, article in groupby(items, key=lambda x: x['name']) 
}

with the following results:

{
  'Strawberries': {'price': 5.0, 'quantity': []}, 
  'Coffee': {'price': 11.23, 'quantity': []}, 
  'Green Tea': {'price': 3.11, 'quantity': []}
}

IndexError: list index out of range

I'm not sure why I am only able to access article for one of the values within the sub-dict I am trying to create.

Any suggestions would be much appreciated. Thanks!

There are 2 best solutions below

Best answer

You get an empty list or an IndexError because article is an iterator, and it is fully consumed by the first call to list(article).

When you get the price first, the price is correct but the quantity is an empty list because you already consumed article. By contrast, when you get the quantity first then take the first item's price, the second call to list(article) produces an empty list, which you try to index but cannot because there are no items.
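The one-shot behaviour is easy to see in isolation. Here is a minimal sketch using a shortened copy of the items list from the question; the names groups, first_pass and second_pass are only illustrative:

```python
from itertools import groupby

items = [
    {'price': 5.0, 'name': 'Strawberries'},
    {'price': 5.0, 'name': 'Strawberries'},
    {'price': 11.23, 'name': 'Coffee'},
]

groups = groupby(items, key=lambda x: x['name'])

# Take only the first (name, group) pair; the group is a one-shot iterator.
name, article = next(groups)

first_pass = list(article)   # consumes every item in the group
second_pass = list(article)  # nothing is left to yield

print(name, len(first_pass), second_pass)  # Strawberries 2 []
```

The second call does not raise; it simply returns an empty list, which is why the failure only shows up later as an empty quantity or as an IndexError on [0].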

Here is a solution with groupby where you save the list(article) and use it for both the price and quantity.

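from itertools import groupby

grouped = {}
for name, article in groupby(items, key=lambda itm: itm["name"]):
    # Materialize the group once; the iterator can only be consumed a single time.
    products = list(article)
    grouped[name] = {
        "price": products[0]["price"],
        "quantity": len(products),
    }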

Edit: As mentioned in the comments, this assumes your items list is already ordered so that items with the same name are adjacent. groupby() only groups consecutive items with the same key, so you will often want to sort the iterable by that key first. But perhaps you do want to group only consecutive items, even if the same name occurs again later in your list.
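To illustrate the ordering point, here is a small sketch with a hypothetical input in which 'Coffee' appears in two separate runs. The helper by_name and the variables unsorted_runs and sorted_runs are made up for this example:

```python
from itertools import groupby

# Hypothetical input: 'Coffee' is not contiguous.
items = [
    {'price': 11.23, 'name': 'Coffee'},
    {'price': 5.0, 'name': 'Strawberries'},
    {'price': 11.23, 'name': 'Coffee'},
]

def by_name(item):
    return item['name']

# Unsorted: each consecutive run becomes its own group.
unsorted_runs = [
    (name, len(list(group)))
    for name, group in groupby(items, key=by_name)
]

# Sorted by the same key first: every name forms exactly one group.
sorted_runs = [
    (name, len(list(group)))
    for name, group in groupby(sorted(items, key=by_name), key=by_name)
]

print(unsorted_runs)  # [('Coffee', 1), ('Strawberries', 1), ('Coffee', 1)]
print(sorted_runs)    # [('Coffee', 2), ('Strawberries', 1)]
```

If the unsorted groups were written into a dict keyed by name, the second 'Coffee' run would overwrite the first and report a quantity of 1.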

Another answer

Not the best use case for groupby in my opinion. It's easier to build a (default)dict with a loop over items.

from collections import defaultdict

result = defaultdict(lambda: {'price': None, 'quantity': 0})

for item in items:
    subdict = result[item['name']]
    subdict['quantity'] += 1
    subdict['price'] = item['price']

Output:

>>> result
defaultdict(<function __main__.<lambda>()>,
            {'Strawberries': {'price': 5.0, 'quantity': 3},
             'Coffee': {'price': 11.23, 'quantity': 2},
             'Green Tea': {'price': 3.11, 'quantity': 1}})

(The price is overwritten by the last seen price for an item. This is fine as long as you don't expect different prices for items with the same name.)
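If you are unsure whether prices are consistent, one possible check is to collect the distinct prices per name before aggregating. This is only a sketch on a hypothetical input; prices_by_name and ambiguous are illustrative names, not part of the code above:

```python
from collections import defaultdict

# Hypothetical input with two different prices for 'Coffee'.
items = [
    {'price': 11.23, 'name': 'Coffee'},
    {'price': 12.0, 'name': 'Coffee'},
    {'price': 3.11, 'name': 'Green Tea'},
]

# Collect the set of distinct prices seen for each name.
prices_by_name = defaultdict(set)
for item in items:
    prices_by_name[item['name']].add(item['price'])

# Names that carry more than one price are ambiguous.
ambiguous = {
    name: sorted(prices)
    for name, prices in prices_by_name.items()
    if len(prices) > 1
}

print(ambiguous)  # {'Coffee': [11.23, 12.0]}
```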

Edit: without defaultdict (note that this version keeps the first price seen for each name rather than the last):

result = {}
for item in items:
    result.setdefault(item['name'], {'price': item['price'], 'quantity': 0})['quantity'] += 1