couchdbkit: how to bulk_save with attachments

I'm using couchdbkit (Python 2.7) and need to save about 100 new items at a time in bulk. I want to save the payload (= attachment) together with the metadata (= doc) in the same request.

Right now I'm saving those items very inefficiently, one by one, because couchdbkit only allows put_attachment() once a doc already exists in the database. This forces a very slow implementation: saving a single item takes two round trips, in a fixed order. First save() the item, then put_attachment() the payload.
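
Roughly, the slow path looks like this (a minimal sketch; setOne and the exact doc fields are illustrative, save_doc() and put_attachment() are couchdbkit's standard calls):

    import cPickle

    def setOne(self, key, value):
        # two round trips per item: the doc must exist before we can attach
        pickled = cPickle.dumps(obj=value, protocol=2)
        doc = {"_id": key, "content": {"seeattachment": True, "ispickled": True}}
        self.db.save_doc(doc)  # round trip 1: create the doc (fills in _rev)
        self.db.put_attachment(doc, pickled,
                               name="theattachment",
                               content_type="application/octet-stream")  # round trip 2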

What I want is to locally create all the docs with their _attachments and send everything at once. The following code is not working, because bulk_save does not seem to handle attachments [edit: not true, see my answer]:

import cPickle
import datetime

from couchdbkit import DateTimeProperty

def setInBulk(self, key2value):

    datetimeprop = DateTimeProperty()

    def createItemToSave(thekey, thevalue):
        # pickle the payload; decode so it survives the JSON layer
        pickled = cPickle.dumps(obj=thevalue, protocol=2).decode('latin1')
        now = datetimeprop.to_json(datetime.datetime.utcnow())
        item = {
            "_id": thekey,
            "content": {"seeattachment": True, "ispickled": True},
            "creationtm": now,
            "lastaccesstm": now,
        }
        item["_attachments"] = {
            "theattachment": {
                "content_type": "application/octet-stream",
                "data": pickled.encode('utf-8'),
            }
        }
        return item

    docs = []
    for key, value in key2value.iteritems():
        docs.append(createItemToSave(key, value))

    # this is what I want, but it seems not to work
    self.db.bulk_save(docs, all_or_nothing=True)

How can I circumvent the write-one-at-a-time limitation forced upon me by couchdbkit?

1 Answer

I got it working! It turns out that bulk_save does indeed handle the _attachments field correctly. What I did wrong was the data encoding. Here is my new code:

import base64
import cPickle
import datetime

from couchdbkit import BooleanProperty, DateTimeProperty

def setInBulk(self, key2value):

    datetimeprop = DateTimeProperty()
    boolprop = BooleanProperty()  # added

    def createItemToSave(thekey, thevalue):
        pickled = cPickle.dumps(obj=thevalue, protocol=2).decode('latin1')
        now = datetimeprop.to_json(datetime.datetime.utcnow())
        # modified: use BooleanProperty for booleans
        item = {
            "_id": thekey,
            "content": {
                "seeattachment": boolprop.to_json(True),
                "ispickled": boolprop.to_json(True),
            },
            "creationtm": now,
            "lastaccesstm": now,
        }
        item["_attachments"] = {
            "theattachment": {
                "content_type": "application/octet-stream",
                # modified: base64 encoding needed
                "data": base64.encodestring(pickled.encode('utf-8')),
            }
        }
        return item

    docs = []
    for key, value in key2value.iteritems():
        docs.append(createItemToSave(key, value))

    self.db.bulk_save(docs, all_or_nothing=True)

First of all, I also used BooleanProperty, just to make sure that everything is JSON compatible.

Second of all, I had failed to base64-encode the data. The pure, unfiltered base64 output is what is needed.

Do NOT try to filter the base64 output. I got confused by the CouchDB documentation, which says: "Please note that any base64 data you send has to be on a single line of characters, so pre-process your data to remove any carriage returns and newlines." The JSON-base64 specification suggests similar filtering. That may be true in itself, but bulk_save() already seems to take care of it, and doing it twice only results in "badmatch" errors.
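
For completeness, here is a sketch of how I read an item back and reverse the encoding chain (getOne is just an illustrative name; fetch_attachment is couchdbkit's standard call and returns the raw attachment bytes):

    import cPickle

    def getOne(self, key):
        # fetch_attachment returns the decoded attachment bytes, i.e. the
        # utf-8 encoded form of the latin1-decoded pickle stored above
        raw = self.db.fetch_attachment(key, "theattachment")
        return cPickle.loads(raw.decode('utf-8').encode('latin1'))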