I am trying to load a huge amount of data into memcachedb. I am running some queries on a MySQL database, and I want to store the results of these queries in memcachedb for easy access later.
Currently, I am just using simple set commands to store the results in memcachedb, but since there are billions of these results, storing them one by one in a loop is very inefficient and time-consuming. Is there a better way to load data into memcachedb, like a data import wizard in a traditional RDBMS?
I am using pylibmc to connect to memcachedb.
The pylibmc library has the `set_multi` function, which sends a bunch of commands in one go. This should probably work well enough. If you have billions of keys, you probably want to split them into chunks of a few thousand.
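A rough sketch of the chunking idea, assuming `client` is a pylibmc client (or anything else with a `set_multi(mapping)` method that returns the keys it failed to store). The helper names `chunked` and `load_in_chunks` and the chunk size of 5000 are just illustrative choices:

```python
from itertools import islice


def chunked(pairs, size):
    """Yield dicts of at most `size` items from an iterable of (key, value) pairs."""
    it = iter(pairs)
    while True:
        chunk = dict(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_in_chunks(client, pairs, chunk_size=5000):
    """Store all pairs via client.set_multi and return the keys that failed."""
    failed = []
    for chunk in chunked(pairs, chunk_size):
        # set_multi is expected to return a list of keys it could not store.
        failed.extend(client.set_multi(chunk))
    return failed
```

With pylibmc this could be called as `load_in_chunks(pylibmc.Client(["127.0.0.1:21201"]), rows)`, where the address and `rows` (any iterable of key/value pairs, such as a database cursor) are placeholders. Because `pairs` is consumed lazily, the full result set never has to sit in memory at once.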
You can probably squeeze out a bit more performance if you just send the commands over a socket; the memcache protocol is pretty simple. This has the advantage that you can add the `noreply` flag, so the server won't bother sending a reply. Of course, this means you can't do any error checking, so it only makes sense if losing a few keys for whatever reason is fine.
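A minimal proof-of-concept sketch of the socket approach might look like this. The function names `build_set` and `send_one` are made up for illustration, and the host and port are whatever your server listens on:

```python
import socket


def build_set(key, value, flags=0, exptime=0, noreply=True):
    """Return one memcache text-protocol `set` command as bytes."""
    data = value if isinstance(value, bytes) else str(value).encode("utf-8")
    suffix = " noreply" if noreply else ""
    header = "set %s %d %d %d%s\r\n" % (key, flags, exptime, len(data), suffix)
    return header.encode("ascii") + data + b"\r\n"


def send_one(host, port, key, value):
    """Send a single noreply `set`, then read the key back with `get` as a check."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_set(key, value))
        sock.sendall(b"get " + key.encode("ascii") + b"\r\n")
        # One recv() is enough for a tiny value; larger replies need a read loop.
        return sock.recv(4096)
```

A server that speaks the memcache text protocol answers the `get` with a `VALUE <key> <flags> <bytes>` line, the data, and a final `END` line, so printing the return value of `send_one` is a quick way to confirm the `set` arrived.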
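The format of the `set` command is `set <key> <flags> <exptime> <bytes> [noreply]`, terminated by `\r\n` and followed by the data block and another `\r\n`. Of course, a single command like this is only a proof of concept; a slightly more advanced version might loop over your data and send many commands over the same connection.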
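One way a slightly more advanced version could look, as a sketch only: build the commands from an iterable of key/value pairs and flush them to the socket in batches. The names `encode_set` and `bulk_load` and the batch size of 1000 are assumptions for illustration, and keys are assumed to be plain ASCII without spaces:

```python
def encode_set(key, value, exptime=0):
    """Encode one `set <key> 0 <exptime> <bytes> noreply` command plus its data block."""
    data = value if isinstance(value, bytes) else str(value).encode("utf-8")
    return b"set %s 0 %d %d noreply\r\n%s\r\n" % (
        key.encode("ascii"), exptime, len(data), data)


def bulk_load(sock, pairs, batch_size=1000):
    """Send a noreply `set` for every pair, batch_size commands per sendall()."""
    buf = bytearray()
    pending = 0
    total = 0
    for key, value in pairs:
        buf += encode_set(key, value)
        pending += 1
        if pending >= batch_size:
            sock.sendall(buf)
            total += pending
            buf.clear()
            pending = 0
    if pending:
        # Flush whatever is left over after the last full batch.
        sock.sendall(buf)
        total += pending
    return total
```

You would call it with an open connection, for example `bulk_load(socket.create_connection((host, port)), rows)`. Since every command carries `noreply`, nothing is read back, so a quick spot check with `get` on a few keys afterwards is worthwhile.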
If you're getting data from MySQL, then consider building the `set` command with an SQL query, so that the database itself produces the protocol text. I'm not sure this is actually faster, but I suspect it is.
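As a sketch of what that could look like: the query below lets MySQL assemble each `set` command with `CONCAT` and `LENGTH`, and a small Python loop forwards the rows to the socket. The table `results` and its columns `k` and `v` are hypothetical, as is the helper name `pipe_query`; `cursor` is assumed to be an iterable DB-API cursor from your MySQL driver.

```python
# Hypothetical table `results` with columns `k` (key) and `v` (value).
# The raw string passes \r\n through literally; MySQL then interprets the
# escapes inside its own string literals (with the default SQL mode).
SET_QUERY = r"""
SELECT CONCAT('set ', k, ' 0 0 ', LENGTH(v), ' noreply\r\n', v, '\r\n')
FROM results
"""


def pipe_query(cursor, sock, query=SET_QUERY):
    """Run the query and forward each generated command to the socket."""
    cursor.execute(query)
    sent = 0
    for (command,) in cursor:
        if command is None:
            # CONCAT returns NULL if any argument is NULL; skip those rows.
            continue
        if isinstance(command, str):
            command = command.encode("utf-8")
        sock.sendall(command)
        sent += 1
    return sent
```

One caveat: `LENGTH` counts bytes in the column's character set, so this only lines up if the bytes that reach the socket use the same encoding. For non-ASCII values it is worth checking a few keys by hand.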