I'm using couchdbkit (python 2.7) and I need to save about 100 new items at a time in bulk. I want to save the payload (=attachment) together with metadata (=doc) at the same time.
Right now I'm saving those items very inefficiently, one by one, because couchdbkit only allows put_attachment() after the doc already exists in the database. This forces me into a very slow implementation: to save a single item I need two round trips to the server, in a fixed order. First save() the item, and only then put_attachment() the payload.
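For context, the slow version boils down to something like this per item (a simplified sketch, not my exact code; setOne is just an illustrative name, and it assumes couchdbkit's Database.save_doc()/put_attachment()):

def setOne(self, key, value):
    # serialize the payload that should become the attachment body
    pickled = cPickle.dumps(obj=value, protocol=2)
    doc = {"_id": key, "content": {"seeattachment": True, "ispickled": True}}
    # round trip 1: the doc has to exist before couchdbkit lets me attach anything
    self.db.save_doc(doc)
    # round trip 2: upload the payload as an attachment to the now-existing doc
    self.db.put_attachment(doc, pickled,
                           name="theattachment",
                           content_type="application/octet-stream")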
What I want is to locally create all the docs with their _attachments and send everything at once. The following code is not working, because bulk_save does not handle attachments [edit: not true, see my answer]:
def setInBulk(self, key2value):
    # inside my storage class; uses cPickle, datetime and couchdbkit's DateTimeProperty
    datetimeprop = DateTimeProperty()

    def createItemToSave(thekey, thevalue):
        # pickle the payload and coerce it into a unicode string for JSON
        pickled = cPickle.dumps(obj=thevalue, protocol=2).decode('latin1')
        item = {"_id": thekey,
                "content": {"seeattachment": True, "ispickled": True},
                "creationtm": datetimeprop.to_json(datetime.datetime.utcnow()),
                "lastaccesstm": datetimeprop.to_json(datetime.datetime.utcnow())}
        item["_attachments"] = {
            "theattachment": {
                "content_type": "application/octet-stream",
                "data": pickled.encode('utf-8')
            }
        }
        return item

    docs = []
    for key, value in key2value.iteritems():
        docs.append(createItemToSave(key, value))
    # this is what I want but it seems not to work
    self.db.bulk_save(docs, all_or_nothing=True)
How can I circumvent the write-one-at-a-time limitation forced upon me by couchdbkit?
I got it working! It turns out that bulk_save does indeed handle the _attachments field correctly. What I did wrong was the data encoding. Two things changed in my code:

First of all, I also added BooleanProperty, just to make sure that everything is JSON compatible.

Second of all, I had failed to base64-encode the data. The pure and unfiltered base64 string is what is needed. Do NOT try to filter the base64 output. I got confused by the CouchDB documentation, which says "Please note that any base64 data you send has to be on a single line of characters, so pre-process your data to remove any carriage returns and newlines." The JSON-base64 specification suggests similar filtering. This may be true in itself, but bulk_save() already seems to take care of that, and doing it twice only results in "badmatch" errors.
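For completeness, here is a sketch of how the fixed bulk version can look (reconstructed from the two points above, not my code verbatim; it assumes base64.b64encode for the attachment data and BooleanProperty().to_json() for the boolean flags):

def setInBulk(self, key2value):
    # uses: import base64, cPickle, datetime
    #        from couchdbkit import BooleanProperty, DateTimeProperty
    datetimeprop = DateTimeProperty()
    boolprop = BooleanProperty()

    def createItemToSave(thekey, thevalue):
        # base64-encode the pickled payload and leave the result untouched:
        # no newline filtering, bulk_save() handles the rest
        pickled = base64.b64encode(cPickle.dumps(obj=thevalue, protocol=2))
        return {"_id": thekey,
                "content": {"seeattachment": boolprop.to_json(True),
                            "ispickled": boolprop.to_json(True)},
                "creationtm": datetimeprop.to_json(datetime.datetime.utcnow()),
                "lastaccesstm": datetimeprop.to_json(datetime.datetime.utcnow()),
                "_attachments": {
                    "theattachment": {
                        "content_type": "application/octet-stream",
                        "data": pickled
                    }
                }}

    docs = [createItemToSave(k, v) for k, v in key2value.iteritems()]
    # one round trip for all ~100 items, attachments included
    self.db.bulk_save(docs, all_or_nothing=True)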