Manage multiple instances of a process automatically


I have a program that takes a file as input, produces another file as output, and takes about 1 second per run. The problem is that I have to be able to process about 30 files a second. The files to process will be available as a queue (implemented over memcached) and don't have to be processed exactly in order, so basically each instance of the program checks out a file from the queue and processes it. I'm looking for a process manager that automatically launches instances of the program when system resources are available.

At the simple end, "system resources" will simply mean "up to two processes at a time," but if I move to a different machine this could be 2 or 10 or 100 or whatever. I could use a utility to handle at least this. At the complex end, I would like to bring up another process whenever CPU is available, since these machines will be dedicated. CPU time seems to be the constraining resource - the program isn't memory intensive.
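For illustration, the simple end described above (a fixed cap on concurrent instances that can change per machine) could look roughly like the Python sketch below. This is only a sketch under assumptions: `run_pool` is a hypothetical helper name, a plain list stands in for the memcached queue, and the cap defaults to the machine's CPU count since CPU is the constraining resource. It does not watch live CPU load, so it does not cover the complex end.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_pool(jobs, command, max_procs=None):
    """Run `command + [job]` once per job, at most `max_procs` at a time.

    `jobs` is any iterable of file names (a stand-in for the real queue).
    Returns a dict mapping each job to its process exit code.
    """
    # Assumption: one instance per CPU is a reasonable cap for CPU-bound work.
    max_procs = max_procs or os.cpu_count() or 2

    def run_one(job):
        # The thread only waits on the child process; the child does the work.
        return job, subprocess.run(command + [job]).returncode

    with ThreadPoolExecutor(max_workers=max_procs) as pool:
        return dict(pool.map(run_one, jobs))


if __name__ == "__main__":
    # Harmless demo: the "program" is a Python one-liner that just exits.
    demo_cmd = [sys.executable, "-c", "import sys; sys.exit(0)"]
    print(run_pool(["a.dat", "b.dat", "c.dat"], demo_cmd, max_procs=2))
```

Moving to a different machine then only means changing `max_procs` (or leaving it unset to follow the CPU count). In real use `command` would be the actual program, and `jobs` would be replaced by something that checks files out of the queue.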

What tool can accomplish this sort of process management?

There is 1 solution below

Without knowing more details, I would suggest Storm (Backtype Storm, now Apache Storm). But it would probably mean a total rewrite of your current code. :-)

More details are in the tutorial, but it basically takes tuples of work and distributes them through a topology of worker nodes. A "spout" emits work into the topology and a "bolt" is a step/task in the graph where some bit of work takes place. When a bolt finishes its work, it emits the same tuple or a new one back into the topology. Bolts can do work in parallel or in series.
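To make the spout/bolt idea concrete, here is a toy, single-machine model in plain Python. It is not Storm's API and does not distribute anything across nodes; `spout`, `bolt`, `run_topology` and `STOP` are hypothetical names made up for this sketch. It only shows the flow of tuples through bolts wired in series.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the stream (toy convention)


def spout(items, out_q):
    """Emit each work item as a tuple into the topology, then signal the end."""
    for item in items:
        out_q.put((item,))
    out_q.put(STOP)


def bolt(fn, in_q, out_q):
    """Take tuples from in_q, apply fn, and emit the resulting tuple downstream."""
    while True:
        tup = in_q.get()
        if tup is STOP:
            out_q.put(STOP)  # pass the end marker along to the next step
            return
        out_q.put(fn(tup))


def run_topology(items, steps):
    """Wire spout -> bolt -> bolt ... in series and collect the final tuples."""
    queues = [queue.Queue() for _ in range(len(steps) + 1)]
    threads = [threading.Thread(target=spout, args=(items, queues[0]))]
    for i, fn in enumerate(steps):
        threads.append(
            threading.Thread(target=bolt, args=(fn, queues[i], queues[i + 1]))
        )
    for t in threads:
        t.start()
    results = []
    while True:
        tup = queues[-1].get()
        if tup is STOP:
            break
        results.append(tup)
    for t in threads:
        t.join()
    return results
```

For example, `run_topology(["a.txt"], [lambda t: (t[0], t[0].upper())])` pushes one file name through a single bolt that emits a new, longer tuple. Running several copies of the same bolt on one input queue would be the parallel case; that needs one end marker per copy and is left out here to keep the sketch short.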