Each Job in my system belongs to a specific userid and can be put in rabbitmq from multiple sources. My requirements:
- No more than 1 job should be running per user at any given time.
- Jobs for other users should not experience any delay because of job piling up for a specific user.
- Each Job should be executed at least once. Each Job will have a max retries count and is re-inserted in queue (or probably delayed) with a delay if fails.
- Maintaining Sequence of Jobs (per user) is desirable but not compulsory.
- Jobs should probably be persisted, as I need them executing atleast once. There is no expiry time of jobs.
- Any of the workers should be able to run jobs for any of the user.
With these requirements, I think maintaining a single queue for each individual user makes sense. I would also need all the workers watching all user queues and execute job for user, whose job is currently not running anywhere (ie, no more than 1 job per user)
Would this solution work using RabbitMQ in a cluster setup? Since the number of queues would be large, I am not sure each worker watching every user queue would cause significant overhead or not. Any help is appreciated.
As @dectarin has mentioned, having multiple workers listen to multiple job queues will make it hard to ensure that only one job per user is being executed.
I think it'd work better if the jobs go through a couple steps.
I don't know how the jobs get submitted to the system, so it's hard to tell if actual per-user MessageQueues would be the best way to queue the waiting. If the jobs already sit in a mailbox, that might work as well, for instance. Or store the queued jobs in a database, as a bonus that'd allow you to write a little front end for users to inspect and manage their queued jobs.
Depending on what you choose, you can then find an elegant way to coordinate the single job per user constraint.
For instance, if the jobs sit in a database, the database keeps things synchronised and multiple coordinator workers could go through the following loop:
Or if the waiting jobs sit in per-user message queues, transactions will be easier and a single Coordinator going through the loop won't need to worry about multi-threading.
Making things fully transactional across database and queues may be hard but need not be necessary. Introducing a pending state you should allow you to err on the side of caution making sure no jobs get lost if a step fails.