I am looking to speed up two lines of grep and awk code with the great gnu-parallel tool, but using the simple syntax it breaks down or loops to infinity. Help is greatly appreciated!
Normal code:
for FILENAME in `cat FileList.tmp`
do
echo "Bearbeite $FILENAME ..."
FILE_BASENAME=`echo ${FILENAME##*/}`
grep -v "^t=[0-9]*.[0-9]*\&\-$" ${FILENAME} > ${INPUT}/cleaned/${FILE_BASENAME}.tmp
awk '{ if (gsub("t=|r=|i=|d=|ip=|ua=|uc=|um=|ud=|pc=|la=|lo=|do=|dm=|c=","")) print; else print}' \
${INPUT}/cleaned/${FILE_BASENAME}.tmp > ${INPUT}/cleaned/${FILE_BASENAME}
rm -f ${INPUT}/cleaned/${FILE_BASENAME}.tmp
done
Parallel attempt:
[...]
parallel -j100 --pipe grep -v "^t=[0-9]*.[0-9]*\&\-$" | awk '{s = s + $1} END {print s, s/NR}' ${FILENAME} > ${INPUT}/cleaned/${FILE_BASENAME}.tmp
awk '{ if (gsub("t=|r=|i=|d=|ip=|ua=|uc=|um=|ud=|pc=|la=|lo=|do=|dm=|c=","")) print; else print}' \
${INPUT}/cleaned/${FILE_BASENAME}.tmp > ${INPUT}/cleaned/${FILE_BASENAME}
[...]
My thoughts are that I just piped the parallel commands the wrong way...
Some thoughts:
- Use `while read ... done < file` instead of `cat blabla`.
- Instead of `echo ${FILENAME##*/}` to assign the variable, just do `FILE_BASENAME=${FILENAME##*/}`.
- Explain what you want to accomplish with the `grep`/`awk` pair, because it can probably be improved. For example, the `if (gsub(...)) print; else print` construct does not make much sense, since both branches print the line. You want to perform either of these: replace and then print the line, or print the original line if no replacement was done. You can do this by directly saying `gsub(); print`, because `gsub()` updates the value of `$0` (the line) in case it matches:
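A minimal sketch of that equivalence, assuming input lines shaped roughly like the ones the question implies. The `sample` text and the variable names `pattern`, `original`, and `simplified` are made up for illustration; only the list of keys is taken from the question.

```shell
# Hypothetical sample input; the real log format is an assumption.
sample='t=12.5&r=abc&ip=1.2.3.4&ua=Mozilla
plain line without keys'

# Key list copied from the question's gsub() call.
pattern='t=|r=|i=|d=|ip=|ua=|uc=|um=|ud=|pc=|la=|lo=|do=|dm=|c='

# Form used in the question: both branches of the if print the line.
original=$(printf '%s\n' "$sample" |
  awk -v pat="$pattern" '{ if (gsub(pat, "")) print; else print }')

# Simplified form: gsub() edits $0 in place, so a single print is enough.
simplified=$(printf '%s\n' "$sample" |
  awk -v pat="$pattern" '{ gsub(pat, ""); print }')

# Report whether the two forms agree, then show the cleaned lines.
[ "$original" = "$simplified" ] && echo "same output"
printf '%s\n' "$simplified"
```

Both forms should produce identical output, so the shorter one can replace the `if ... else` version without changing the result.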