Launching parallel network tasks using xargs whilst minimising context switching overhead

147 Views Asked by At

I want to run 100 networking (non cpu intense) jobs in parallel and want to understand the best approach.

Specifically is it possible to run 100+ jobs using xargs and what are the drawbacks?

I understand that there is a point where there is more context switching being done then actual packet processing. How to understand where that point is and what is the best way to minimise it?

For example, are there better tools to use other then xargs, etc?

1

There are 1 best solutions below

3
On

Better will often be a matter of taste.

Using GNU Parallel you can do something like this to fetch 100 images in parallel:

seq 1000 | parallel -j100 wget https://foo.bar/image{}.jpg

If you want data from 100 servers and you get a full line every time:

parallel -a servers.txt -j0 --line-buffer my_connect {}

Or:

parallel -a servers.txt -j0 --line-buffer --tag my_connect {}

GNU Parallel is a general parallelizer and makes is easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.

If you have 32 different jobs you want to run on 4 CPUs, a straight forward way to parallelize is to run 8 jobs on each CPU:

Simple scheduling

GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:

GNU Parallel scheduling

Installation

For security reasons you should install GNU Parallel with your package manager, but if GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel