Combining multiple cut operations into one

I have the input file:

$ cat bleu.out 
BLEU = 16.67, 54.4/26.8/14.9/8.2 (BP=0.813, ratio=0.828, hyp_len=8982, ref_len=10844)
BLEU = 17.56, 55.1/27.6/15.8/9.4 (BP=0.804, ratio=0.821, hyp_len=8905, ref_len=10844)
BLEU = 17.95, 54.4/27.5/15.6/9.1 (BP=0.837, ratio=0.849, hyp_len=9206, ref_len=10844)
BLEU = 19.10, 54.8/28.1/16.3/9.7 (BP=0.860, ratio=0.869, hyp_len=9423, ref_len=10844)
BLEU = 19.29, 53.0/26.6/15.1/8.9 (BP=0.925, ratio=0.928, hyp_len=10058, ref_len=10844)
BLEU = 18.70, 55.7/28.7/16.4/9.4 (BP=0.839, ratio=0.851, hyp_len=9223, ref_len=10844)
BLEU = 18.63, 55.2/28.1/16.3/9.8 (BP=0.834, ratio=0.846, hyp_len=9178, ref_len=10844)
BLEU = 18.41, 54.2/27.4/15.5/9.2 (BP=0.857, ratio=0.867, hyp_len=9398, ref_len=10844)
BLEU = 18.70, 53.7/26.9/15.7/9.3 (BP=0.871, ratio=0.878, hyp_len=9526, ref_len=10844)

When I need to extract a certain column, say the value that sits before the first comma, I have to use multiple instances of cut, e.g.:

$ cat bleu.out | cut -f1 -d',' | cut -f3 -d ' '
16.67
17.56
17.95
19.10
19.29
18.70
18.63
18.41
18.70

Is there a way to apply multiple cut criteria sequentially in one cut instance? E.g. something like cut-multi.sh -f1 -d',' -f3 -d' '?

If not, what other methods would perform the same operation as cut -f1 -d',' | cut -f3 -d' '? Solutions using awk, sed, or the like are also welcome.

There are 5 best solutions below

0
On BEST ANSWER

You can specify multiple field separators in awk:

$ awk -F'= *|,' '{print $2}' bleu.out
16.67
17.56
17.95
19.10
19.29
18.70
18.63
18.41
18.70
  • -F'= *|,' sets the field separator to either = followed by zero or more spaces, or a comma
  • {print $2} prints the second field
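To see how that separator splits a line, a small sketch can print every field next to its index. The sample line is copied from bleu.out in the question; the function name show_fields is made up for illustration.

```shell
# Hypothetical helper: print each field of every input line with its index,
# using the same separator as above ("=" plus optional spaces, or a comma).
show_fields() {
    awk -F'= *|,' '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }'
}

line='BLEU = 16.67, 54.4/26.8/14.9/8.2 (BP=0.813, ratio=0.828, hyp_len=8982, ref_len=10844)'
printf '%s\n' "$line" | show_fields
```

The first two lines of output should be 1:[BLEU ] (note the trailing space) and 2:[16.67], which is why $2 holds the score. The later = signs inside the parentheses also act as separators, so the remaining fields are split further.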
0
On

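The following solution uses grep with Perl-compatible regular expressions (-P, which needs a grep that supports it, such as GNU grep). \K drops the "= " prefix from the match and the lookahead (?=,) stops before the first comma, so this prints the text between = and the first , .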

grep -oP '= \K.*?(?=,)' input
16.67
17.56
17.95
19.10
19.29
18.70
18.63
18.41
18.70

Or, as suggested by Sundeep:

grep -oP '= \K[^,]+' input
0
On
awk -F'[ = ,]' '{print $4}' file
16.67
17.56
17.95
19.10
19.29
18.70
18.63
18.41
18.70
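Here the bracket expression makes every single space, = or comma its own separator, so adjacent separators produce empty fields, and the score lands in $4 rather than $2. A sketch that counts this out, assuming the same line layout as in the question (the function name first_five_fields is only for illustration):

```shell
# Hypothetical helper: show the first five fields when each space, "=" or
# comma is a separator on its own (the repeated space in the bracket
# expression is harmless).
first_five_fields() {
    awk -F'[ = ,]' '{ for (i = 1; i <= 5; i++) printf "%d:[%s]\n", i, $i }'
}

line='BLEU = 16.67, 54.4/26.8/14.9/8.2 (BP=0.813, ratio=0.828, hyp_len=8982, ref_len=10844)'
printf '%s\n' "$line" | first_five_fields
```

Fields 2, 3 and 5 come out empty, because " = " is three separators in a row and ", " is two.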
0
On

With sed:

$ sed 's/^[^=]*= \([^,]*\).*/\1/' bleu.out
16.67
17.56
17.95
19.10
19.29
18.70
18.63
18.41
18.70

This matches everything up to and including the first = and the space after it (^[^=]*= ), captures the run of non-comma characters that follows (\([^,]*\)), and replaces the whole line with that capture group (\1).
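The same substitution can also be written with extended regular expressions, which removes the backslashes around the group. A sketch, assuming a sed that accepts -E (GNU and BSD sed do); the function name bleu_score is made up for illustration:

```shell
# Hypothetical wrapper around the ERE form of the substitution above.
# Reads the named files, or stdin when no file is given.
bleu_score() {
    sed -E 's/^[^=]*= ([^,]*).*/\1/' "$@"
}

printf 'BLEU = 16.67, 54.4/26.8/14.9/8.2 (BP=0.813, ratio=0.828)\n' | bleu_score
```

Called as bleu_score bleu.out, it should behave like the basic-regex command above.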

0
On

Another solution with awk:

awk '{sub(/,$/, "", $3); print $3}' bleu.out

This removes the trailing , from the 3rd whitespace-separated field and prints it.
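A quick way to check what sub changes is to print the third field before and after the call. This is only a sketch on one sample line from the question; the function name show_sub and the variable name before are made up for illustration.

```shell
# Hypothetical helper: show the 3rd whitespace-separated field before and
# after stripping its trailing comma.
show_sub() {
    awk '{ before = $3; sub(/,$/, "", $3); print before " -> " $3 }'
}

printf 'BLEU = 16.67, 54.4/26.8/14.9/8.2 (BP=0.813, ratio=0.828)\n' | show_sub
```

With default whitespace splitting the third field is "16.67," including the comma, so the line printed is 16.67, -> 16.67.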