GNU parallel stderr with --files or sensible --results tree


I've recently discovered GNU parallel, and it is already incredibly useful, but I cannot figure out how to get all my output into any kind of usable structure. Here are my issues:

  • The commands I'm running take several hours or days and produce reams of output on stdout and often on stderr too, so I want to redirect all of the output to files.
  • Sounds like --files should work, right? But unless I'm crazy, those files only contain stdout. Is stderr simply discarded with this option?
  • OK, how about --results? This is maybe a bit better, but has two problems:
    1. The commands are long: /path/to/command -a --blah /path/to/data /another/path {} . This makes a ridiculous directory name and the spaces make trying to do anything a pain (e.g. 'cat `find . -name stdout`' won't work)
    2. stdout and stderr go to separate files, which is usually ok, but in this case error messages are sometimes produced in the middle of other output and trying to piece things back together is a pain.

So: is there any way within parallel (i.e. without having to modify my command) to either capture stderr when using --files or force --results to use sensible directory names?

EDIT: In response to a comment, I've tried:

find controlFiles/ -name "*.txt" | parallel --files --tmpdir logs --tagstr {/.} -j15 --joblog logs/joblog --eta /path/to/command --opt --opt2 /path/to/data /path/to/output {} > logs/logfiles.txt

and

find controlFiles/ -name "*.txt" | parallel --files --results logs --tagstr {/.} -j15 --joblog logs/joblog --eta /path/to/command --opt --opt2 /path/to/data /path/to/output {} > logs/logfiles.txt

where the former loses stderr and the latter produces unusable directory names.

EDIT2: After a bunch more testing, it seems I somehow got things into a really weird state. The directory structure from --results is supposed to be named after the arguments, but somehow mine was using the entire command. When I tried removing the existing logs directory and starting fresh with what I thought was the same command, I got the expected behavior. Still not ideal, but I can certainly live with it.

Best answer

The most obvious solution is to strip the long common part from the directory names after the jobs are done:

cd resultdir/1/
rename 's:long/common/string/to/remove::' */2/*
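If some directory names still contain spaces, the `cat` with backticks from the question breaks because the shell splits the file names on whitespace. A minimal sketch of a space-safe alternative follows. The mock tree and its directory names are hypothetical and only imitate the "whole command line as a directory name" situation described in the question.

```shell
# Build a small mock tree. The directory names are hypothetical and only
# imitate result directories whose names contain spaces.
tree=$(mktemp -d)
mkdir -p "$tree/1/cmd --opt a.txt" "$tree/1/cmd --opt b.txt"
echo "output for a" > "$tree/1/cmd --opt a.txt/stdout"
echo "output for b" > "$tree/1/cmd --opt b.txt/stdout"

# NUL-separated file names survive spaces; sort -z keeps the order stable.
combined=$(find "$tree" -type f -name stdout -print0 | sort -z | xargs -0 cat)
printf '%s\n' "$combined"
```

The same `find ... -print0 | xargs -0` pipeline can be pointed at a real results directory instead of the mock tree.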

Another idea is to use the new CSV output (available from version 20161222):

parallel --results foo.csv ...

which will generate a CSV file with the content from --joblog, the arguments, stdout, and stderr. This is particularly handy if you want to post-process the results in R or LibreOffice Calc.

If you prefer stderr and stdout mixed into one stream, simply make 2>&1 part of your command:

parallel '(echo joe; ls /doesnotexists {}) 2>&1' ::: bar > foo
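What the subshell plus `2>&1` does can be seen without parallel at all. The sketch below uses a hypothetical function, `noisy`, as a stand-in for a command that writes to both streams, and captures everything through stdout only, which is the stream that parallel saves with --files.

```shell
# Hypothetical stand-in for a command that writes to both stdout and stderr.
noisy() {
  echo "step 1"
  echo "warning: something odd" >&2
  echo "step 2"
}

# Wrap the command in a subshell and fold stderr into stdout, as in the
# answer's example; only stdout is captured here.
merged_output=$( (noisy) 2>&1 )
printf '%s\n' "$merged_output"
```

The warning appears between the two steps because `echo` writes immediately. Real programs often buffer stdout when it is not a terminal, so the interleaving of a long-running command may not match the order of events exactly.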

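From version 20170122 you can use a replacement string in the --results name, so the output is named after the input file rather than the whole argument: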

parallel --results out/{/.} mycommand
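The `{/.}` replacement string stands for the basename of the argument with its extension removed. As a rough illustration of the name this produces, the same transformation can be written with shell parameter expansion. The input path below is a hypothetical example, not taken from the question's data.

```shell
# Hypothetical input path, as find would print it for one control file.
path="controlFiles/sample1.txt"

base=${path##*/}   # strip the directory part  -> sample1.txt
name=${base%.*}    # strip the last extension  -> sample1

# The name parallel would substitute for out/{/.} with this argument.
echo "out/$name"
```

Note that two input files with the same basename in different directories would map to the same name, so this form works best when the basenames are unique.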