How can I initialize a Field() to contain a nested Python dict?


I have a Field() in my items.py called:

scores = Field()

I want multiple scrapers to append a value to a nested dict inside scores. For example, one of my scrapers:

item['scores']['baseball_score'] = '92'

And another scraper would:

item['scores']['basket_score'] = '21'

So that when I retrieve scores:

> item['scores']
  { 'baseball_score': '92', 'basket_score': '21' }

I do not want to initialize the dict inside my scrapers because they will all be running simultaneously, so there could be race conditions. Is there any way for me to initialize item['scores'] as a nested dict in items.py? Or should I write a script that initializes it before I run my scrapers?

I actually want every field in my Item to be either a nested list or a nested dict. Once my scrapers are done, I plan to aggregate these somehow in my pipelines.py.

This also has me wondering whether I should have a different Item class for each of my scrapers, and then aggregate them into one item once all the scrapers have finished. Thoughts?


There are 2 best solutions below

Best answer

This is achievable using defaultdict:

from collections import defaultdict
item = defaultdict(dict)

Then you can pass item to all your scrapers, and each of them can add data at the appropriate key. Note that the above only creates a two-level dict: the inner values are plain dicts, so a third level is not created automatically.
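To make the two-level limit concrete, here is a minimal sketch. The keys 'tennis' and 'set_1' and the variable third_level_error are made up for illustration; the other keys come from the question.

```python
from collections import defaultdict

# Missing top-level keys are created as empty dicts on first access.
item = defaultdict(dict)

# Each scraper writes under its own second-level key.
item['scores']['baseball_score'] = '92'
item['scores']['basket_score'] = '21'

print(dict(item))

# A third level is not created automatically, because the inner value
# is a plain dict ('tennis' and 'set_1' are hypothetical keys).
third_level_error = None
try:
    item['scores']['tennis']['set_1'] = '6-4'
except KeyError as exc:
    third_level_error = exc
    print('KeyError:', exc)
```

If two levels are all you need, this is probably the simplest option.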

Another answer

The best way to do this with multiple levels is Perl-style autovivification.

There are multiple ways to implement autovivification in Python, involving either a recursive definition of defaultdict or subclassing dict.
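For the recursive defaultdict variant, a minimal sketch might look like this. The names nested_dict and to_plain are hypothetical helpers, not part of the standard library, and the 'baseball' and 'final' keys are made up for the example.

```python
from collections import defaultdict

def nested_dict():
    """Return a defaultdict whose missing keys become further nested_dicts."""
    return defaultdict(nested_dict)

def to_plain(d):
    """Recursively convert nested defaultdicts into ordinary dicts."""
    return {k: to_plain(v) if isinstance(v, dict) else v for k, v in d.items()}

item = nested_dict()
item['scores']['baseball']['final'] = '92'  # any depth works

print(to_plain(item))
```

Converting back to plain dicts before handing the data on is optional, but it keeps later lookups of missing keys from silently creating them.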

Here is a subclass involving __getitem__:

class AutoVivification(dict):
    """Implementation of Perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value

>>> item = AutoVivification()
>>> item['scores']['baseball_score'] = '92'
>>> item
{'scores': {'baseball_score': '92'}}

And here is an alternate method that involves __missing__:

class Autoviv(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

>>> common_name = Autoviv()
>>> common_name['Mammalia']['Primates']['Homo']['H. sapiens'] = 'human being'
>>> common_name
{'Mammalia': {'Primates': {'Homo': {'H. sapiens': 'human being'}}}}

Both methods will work with arbitrarily deep nesting.
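One caveat worth knowing with either subclass: merely reading a missing key with a subscript creates it. The sketch below reuses the Autoviv class from above; the 'hockey' key is made up for illustration.

```python
class Autoviv(dict):
    def __missing__(self, key):
        # Called by dict.__getitem__ only when the key is absent.
        value = self[key] = type(self)()
        return value

item = Autoviv()
item['scores']['baseball_score'] = '92'

# Membership tests and .get() bypass __missing__, so they create nothing.
assert 'hockey' not in item['scores']
assert item['scores'].get('hockey') is None

# A plain subscript read does create the key as a side effect.
_ = item['scores']['hockey']
print(item)
```

If scrapers or pipelines need to check whether a score exists, using `in` or `.get()` avoids leaving empty dicts behind.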