I am running a Flask server which loads data into a MongoDB database. Since there is a large amount of data and the load takes a long time, I want to run it as a background job.
I am using Redis as the message broker and Python-rq to implement the job queues. All the code runs on Heroku.
As I understand it, python-rq uses pickle to serialise the function to be executed, including its parameters, and stores this along with other job metadata in a Redis hash.
Since the parameters contain the information to be saved to the database, they are quite large (~50MB), and when they are serialised and saved to Redis, not only does it take a noticeable amount of time but it also consumes a large amount of memory. Redis plans on Heroku cost $30 per month for only 100MB. In fact, I very often get OOM errors like:
OOM command not allowed when used memory > 'maxmemory'.
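Roughly, what I'm doing looks like this (a simplified sketch; `load_into_mongo` and the fake data just stand in for my real import function and payload):

```python
from redis import Redis
from rq import Queue

def load_into_mongo(records):
    ...  # the slow MongoDB import

q = Queue(connection=Redis())

# stands in for my real ~50MB list of key/value pairs
records = [{"key": i, "value": "x" * 100} for i in range(400_000)]

# rq pickles `records` in full and stores the blob in Redis as part of
# the job hash -- this is what consumes the memory quota
q.enqueue(load_into_mongo, records)
```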
I have two questions:
- Is python-rq well suited to this task, or would Celery's JSON serialisation be more appropriate?
- Is there a way to not serialise the parameter itself, but rather pass a reference to it?
Your thoughts on the best solution are much appreciated!
Since you mentioned in your comment that your task input is a large list of key/value pairs, I'm going to recommend the following:

- Write the list of key/value pairs to storage that both your web dyno and your worker dyno can reach, for example a file on S3 or a staging collection in the MongoDB database itself (Heroku dynos don't share a filesystem, so a local temp file won't work).
- Enqueue the job with only a short reference (the S3 key or a batch ID) as its argument (a sketch follows below).
- Have the worker use that reference to fetch the data and load it into MongoDB.

Using the method above, you'll be able to:

- keep the pickled job payload in Redis down to a few bytes instead of ~50MB;
- enqueue jobs almost instantly;
- stay well within your Redis memory quota and avoid the OOM errors.
For use cases like what you're doing, this will be MUCH faster and require much less overhead than sending these items through your queueing system.
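Here's a minimal sketch of that pattern, assuming a staging collection in MongoDB as the shared storage; the collection names (`staging`, `records`), the function names (`import_batch`, `enqueue_import`), and the connection defaults are all illustrative, not from your setup:

```python
import uuid

from pymongo import MongoClient
from redis import Redis
from rq import Queue

# Illustrative connections; on Heroku you'd use your MongoDB URI and REDIS_URL.
db = MongoClient()["mydb"]
q = Queue(connection=Redis())

def import_batch(batch_id):
    """Worker side: look the payload up by reference and do the real load."""
    docs = list(db.staging.find({"batch_id": batch_id}, {"_id": 0, "batch_id": 0}))
    if docs:
        db.records.insert_many(docs)  # the actual slow MongoDB load
    db.staging.delete_many({"batch_id": batch_id})  # clean up the staged copy

def enqueue_import(pairs):
    """Web side: stage the data out of band, then enqueue only a reference."""
    batch_id = uuid.uuid4().hex
    db.staging.insert_many(
        [{"batch_id": batch_id, "key": k, "value": v} for k, v in pairs]
    )
    # rq now pickles just this short string into Redis, not the data itself.
    q.enqueue(import_batch, batch_id)
```

Each pair is staged as its own document, which keeps you under MongoDB's 16MB per-document limit, and the only thing rq stores in Redis is the short `batch_id` string.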
Hope this helps!