I want to run 100 networking jobs (not CPU-intensive) in parallel and want to understand the best approach.
Specifically, is it possible to run 100+ jobs using xargs, and what are the drawbacks?
I understand that there is a point where more time is spent on context switching than on actual packet processing. How can I find where that point is, and what is the best way to minimise the overhead?
For example, are there better tools to use other than xargs?
Better will often be a matter of taste.
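Since the question asks about xargs directly: yes, it can run 100 jobs at once. Below is a minimal sketch, assuming an xargs that supports `-P` (the GNU and BSD versions do). The `echo fetched` command is only a stand-in for the real network command.

```shell
# Start up to 100 processes at once, passing one argument to each.
# "echo fetched" is a stand-in for the real network command.
xargs_out=$(seq 100 | xargs -P 100 -n 1 echo fetched)

# Count how many jobs reported back.
printf '%s\n' "$xargs_out" | wc -l
```

One drawback to be aware of: xargs does not keep the output of parallel jobs apart, so with commands that print more than a short line the output of different jobs can get mixed together.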
Using GNU Parallel you can do something like this to fetch 100 images in parallel:
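A sketch of what such a command could look like. The URL prefix and the `fetch_images` helper name are assumptions made for illustration; the images are assumed to be named `image1.jpg` to `image100.jpg`.

```shell
# Hypothetical URL prefix; image1.jpg ... image100.jpg are assumed to exist there.
base_url="https://example.com/image"

fetch_images() {
    # -j0 runs as many jobs in parallel as possible;
    # {} is replaced by each number read from stdin.
    seq 100 | parallel -j0 wget -q "${base_url}{}.jpg"
}

# Uncomment to start the downloads:
# fetch_images
```

With `-j0` the practical limit is set by things like the number of processes and file handles you are allowed, so a fixed number such as `-j100` may be the safer choice on a shared machine.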
If you want data from 100 servers and you get a full line every time:
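A sketch under assumptions: the host names, the `/status` path and the `collect_lines` helper are all hypothetical, and `curl` stands in for whatever command talks to your servers.

```shell
# Hypothetical host list: server1.example.com ... server100.example.com
servers=$(seq -f 'server%g.example.com' 100)

collect_lines() {
    # --line-buffer passes output on as soon as a full line is available,
    # so lines from different servers may alternate but are never cut in half.
    # $servers is left unquoted on purpose so each host becomes one argument.
    parallel -j0 --line-buffer curl -s "https://{}/status" ::: $servers
}

# Uncomment to query all servers:
# collect_lines
```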
Or:
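One possible variant, again only a sketch: adding `--tag` prefixes every output line with the argument that produced it, which helps when lines from many servers arrive mixed. The `tagged_lines` helper, the host names and the `/status` path are hypothetical.

```shell
tagged_lines() {
    # --tag prepends the argument (here the server name) and a TAB to every
    # output line; --line-buffer still prints each line as soon as it is complete.
    parallel -j0 --tag --line-buffer curl -s "https://{}/status" ::: "$@"
}

# Example call with two hypothetical hosts:
# tagged_lines server1.example.com server2.example.com
```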
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:
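The fixed split can be sketched in plain shell. `run_job` and `static_split` are hypothetical names, and `echo` stands in for real work.

```shell
# Stand-in for a real job; prints its job number.
run_job() { echo "job $1"; }

static_split() {
    # 4 workers, each bound to a fixed batch of 8 jobs run one after another.
    for worker in 0 1 2 3; do
        (
            first=$((worker * 8 + 1))
            for job in $(seq "$first" $((first + 7))); do
                run_job "$job"
            done
        ) &
    done
    # A worker that finishes its batch early sits idle until the slowest one ends.
    wait
}

static_out=$(static_split)
printf '%s\n' "$static_out" | wc -l
```

If the jobs differ a lot in run time, the total time is set by whichever batch happens to contain the slowest jobs.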
GNU Parallel instead spawns a new process when one finishes, keeping the CPUs active and thus saving time:
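A sketch of this scheduling with 32 stand-in jobs. The `dynamic_schedule` name is hypothetical and `echo` takes the place of real work.

```shell
dynamic_schedule() {
    # -j4 keeps 4 jobs running at a time: as soon as one finishes,
    # the next of the 32 arguments is started.
    parallel -j4 echo job {} ::: $(seq 32)
}

# Uncomment to run (requires GNU Parallel):
# dynamic_schedule
```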
Installation
For security reasons you should install GNU Parallel with your package manager. If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access and takes about 10 seconds.
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
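For the package manager route, a small sketch that checks whether GNU Parallel is already present and otherwise prints typical install commands. The `check_parallel` helper is hypothetical, and the listed package managers are only examples; the package is assumed to be called `parallel` on your system.

```shell
check_parallel() {
    # GNU Parallel identifies itself on the first line of --version.
    if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
        echo "GNU Parallel is installed"
    else
        echo "GNU Parallel not found; try your package manager, e.g.:"
        echo "  sudo apt install parallel   # Debian/Ubuntu"
        echo "  sudo dnf install parallel   # Fedora"
        echo "  brew install parallel       # macOS with Homebrew"
        return 1
    fi
}

check_parallel || true
```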
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel