I just came across a scenario that I don't know how to resolve with the existing structure of my documents. As shown below, I can obviously resolve this problem with some refactoring, but I am curious how it could be resolved as efficiently as possible while keeping the same structure.
Please note that this question is different from How to Do An Atomic Update on an EmbeddedDocument in a ListField in MongoEngine?
Let's suppose the following models:
import datetime

import mongoengine

# Peers, Sites and REPEAT_PATTERN are defined elsewhere (not shown).

class Scans(mongoengine.EmbeddedDocument):
    peer = mongoengine.ReferenceField(Peers, required=True)
    site = mongoengine.ReferenceField(Sites, required=True)
    process_name = mongoengine.StringField(default=None)
    documents = mongoengine.ListField(mongoengine.ReferenceField('Documents'))
    is_complete = mongoengine.BooleanField(default=False)
    to_start_at = mongoengine.DateTimeField()
    started = mongoengine.DateTimeField()
    finished = mongoengine.DateTimeField()

class ScanSettings(mongoengine.Document):
    site = mongoengine.ReferenceField(Sites, required=True)
    max_links = mongoengine.IntField(default=100)
    max_size = mongoengine.IntField(default=1024)
    mime_types = mongoengine.ListField(default=['text/html'])
    is_active = mongoengine.BooleanField(default=True)
    created = mongoengine.DateTimeField(default=datetime.datetime.now)
    repeat = mongoengine.StringField(choices=REPEAT_PATTERN)
    scans = mongoengine.EmbeddedDocumentListField(Scans)
What I would like to do is insert a ScanSettings object if and only if the documents referenced by the elements of its scans field (a list of Scans embedded documents) are unique. By unique I mean that each individual element of every documents list is unique at the database level, not that each list as a whole is unique; that would be easy.
In plain English: if, at the time of inserting a ScanSettings, any Scans instance in its scans list has a documents list containing a duplicated document, the insertion should not happen. I mean uniqueness at the database level, taking into account existing records, if any.
Given that Mongo does not support uniqueness across all elements of a list within the same document, I see three options:
Option A
I refactor my "schema" so that Scans inherits from Document rather than EmbeddedDocument, and change the scans field of ScanSettings to a ListField of ReferenceFields to Scans documents. Then it is easy: I just need to save the Scans first, using updates with the add_to_set operator and the upsert=True option. Once those operations have succeeded, I save the ScanSettings. This takes (number of Scans instances to insert) + 1 queries.
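To make Option A concrete, here is a rough sketch of the "upsert each scan first" step. It assumes Scans has already been refactored into a top-level Document, and that `scan_queryset` behaves like MongoEngine's `Scans.objects`; the helper name `upsert_scans` and the dict layout of each scan are made up for illustration.

```python
def upsert_scans(scan_queryset, scans):
    """Upsert every scan using add_to_set; return the number of queries issued.

    `scan_queryset` is assumed to behave like `Scans.objects` in MongoEngine
    (callable with filters, returning something with `update_one`).
    `scans` is an iterable of dicts with 'peer', 'site' and 'documents' keys
    (a hypothetical layout, chosen only for this sketch).
    """
    queries = 0
    for scan in scans:
        # Passing a list to add_to_set__<field> adds each element that is not
        # already present in that document's array.
        scan_queryset(peer=scan["peer"], site=scan["site"]).update_one(
            add_to_set__documents=list(scan["documents"]),
            upsert=True,  # create the Scans document if no match exists
        )
        queries += 1
    return queries
```

In real use this would be called as `upsert_scans(Scans.objects, scans)`, followed by the single ScanSettings save. Note that add_to_set only removes duplicates inside the array of the one document being updated; it does not by itself prevent the same document from appearing in two different Scans documents.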
Option B
I keep the same "schema" but somehow generate unique IDs for the Scans embedded documents. Then, before inserting any ScanSettings with a non-empty scans field, I fetch the existing records and check whether any document ObjectIds are duplicated between the retrieved records and the ones to be inserted. In other words, I would check uniqueness in Python rather than with MongoEngine/MongoDB. This takes 2 x (number of Scans instances to insert) queries (a read plus an update with the add_to_set operator) + 1 ScanSettings save.
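The Python-side check in Option B could look roughly like the sketch below. It is plain Python and leaves out the database reads: `existing_ids` stands for the document ids already fetched from existing records, and `new_scans` for the documents lists of the scans about to be inserted. Both names, and the function itself, are hypothetical.

```python
def find_duplicate_documents(existing_ids, new_scans):
    """Return the set of document ids that would violate uniqueness.

    existing_ids: iterable of ids already stored in the database.
    new_scans: iterable of lists, one list of document ids per new scan.
    """
    seen = set(existing_ids)
    duplicates = set()
    for scan_documents in new_scans:
        for doc_id in scan_documents:
            if doc_id in seen:
                # Already stored, or already used earlier in this batch.
                duplicates.add(doc_id)
            else:
                seen.add(doc_id)
    return duplicates


# Only insert the ScanSettings when nothing is duplicated.
conflicts = find_duplicate_documents({"a", "b"}, [["c", "d"], ["d", "e"], ["a"]])
can_insert = not conflicts
```

A check like this is not atomic: another writer could insert a conflicting record between the read and the save, so it only narrows the window rather than guaranteeing uniqueness.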
Option C
Ignore uniqueness. Given how my model will be structured, I am pretty sure there will be no duplicates, or if there are any, they will be negligible. I would then deal with duplicates at read time. For those like me coming from relational databases, this solution feels uncomfortable.
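For Option C, dealing with duplicates at read time could be as simple as the sketch below, which flattens the documents lists of all scans and keeps only the first occurrence of each id. The function name and the plain-list input are assumptions for illustration; with real data the ids would be the ObjectIds read from each scan.

```python
def unique_documents(scans):
    """Flatten per-scan document lists, dropping repeats but keeping order."""
    seen = set()
    result = []
    for scan_documents in scans:
        for doc_id in scan_documents:
            if doc_id not in seen:
                seen.add(doc_id)
                result.append(doc_id)
    return result


deduplicated = unique_documents([["a", "b"], ["b", "c", "a"]])
```

This keeps the write path simple at the cost of doing the filtering on every read.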
I am a novice in Mongo so I appreciate any comments. Thanks.
PS: I am using the latest MongoEngine and the free MongoDB.
Thanks a lot in advance.
I finally went for Option A, so I refactored my model as follows:
a) Create a mixin class that inherits from Document and adds two methods: an overridden 'save', which only allows saving when the list of unique documents is empty, and 'save_with_uniqueness', which allows saves and/or updates when the list of documents is empty. The idea is to enforce uniqueness.
b) Refactor both Scans and ScanSettings so that the former inherits from Document rather than EmbeddedDocument and the latter redefines the 'scans' field as a ListField of references to Scans.
c) In practice, both Scans and ScanSettings now inherit from the mixin class, as each needs to guarantee uniqueness for its own attribute, 'documents' and 'scans' respectively. Hence the mixin class.
With a) and b) I can guarantee uniqueness: I first save each Scans instance and later add it to ScanSettings.scans in the usual way.
A few points for novices like me:
Finally the code. It's not fully tested but enough to get the main idea right for what I need.