
Multiline matching: extract separate lines


I have command output like

var="""
Name=<Some name>
    key1=value1
    key2=value2 key3=value3
    key4=value4

Name=<Some other name>
    key1=val1
    key2=val2 key3=val3
    key4=val4

"""

See e.g. the output of scontrol show partition.

From these multiline strings, how do I extract a certain key value pair?

I can e.g. use sed to match a block via

echo "${var}" | sed -nE '/^Name=/,/^\s+key4=/p'

This only gives me the entire block, so nothing is gained here.

How do I get output like

<Some name>: key4=value4
<Some other name>: key4=val4

or

<Some name>: key3=value3
<Some other name>: key3=val3

EDIT in response to comments: the goal is to print the Name of the block together with the value of the specified key.

Moreover one can assume:

  • Key and value are separated by =
  • Key value pairs are separated by white space (spaces or newlines)
  • Keys or values do not contain white spaces or new lines
  • Keys do not appear in values
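Under these assumptions, every whitespace-separated token below a Name line is exactly one key=value pair, so a lookup can compare the key exactly instead of pattern-matching it. A minimal sketch of that idea, using the sample data from above (lookup_key is a hypothetical helper name, not an existing tool):

```shell
# Sample input from the question.
var='
Name=<Some name>
    key1=value1
    key2=value2 key3=value3
    key4=value4

Name=<Some other name>
    key1=val1
    key2=val2 key3=val3
    key4=val4
'

# lookup_key is a hypothetical helper: print "<name>: key=value" per block.
lookup_key() {
    awk -v key="$1" '
        # The Name value may contain spaces, so keep everything after "Name=".
        /^Name=/ { name = substr($0, 6); next }
        {
            # On all other lines each field is assumed to be one key=value pair.
            for (i = 1; i <= NF; i++) {
                sep = index($i, "=")
                if (sep > 0 && substr($i, 1, sep - 1) == key)
                    print name ": " $i
            }
        }
    '
}

printf '%s\n' "$var" | lookup_key key3
```

Because the key is compared with ==, a longer key that merely ends in the target key would not be picked up by mistake.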

There are 4 best solutions below

Nahuel Fouilleul On

Some explanation of how the command from my comment works:

perl -00ne 'print"$1: $2\n" if /^Name=(.*)(?s:.)*^\s+key4=(.*)/m'

From command line help perl -h

-0[octal]         specify record separator (\0, if no argument)

Here -00 is a special case: "paragraph mode" (two or more consecutive newline characters act as the record separator).

-n                assume "while (<>) { ... }" loop around program
-e program        one line of program (several -e's allowed, omit programfile)

About the expression: the if statement modifier makes for a shorter one-liner in this case, but it is the same as

if (/^Name=(.*)(?s:.)*^\s+key4=(.*)/m) {
   print "$1: $2\n";
}

About the regex: the capturing groups (.*) match any characters except newline. The s flag inside the non-capturing group (?s:.) changes the meaning of . so that it also matches newline. The m flag allows ^ to also match at the position after a newline.

About backtracking: because of the multiple * quantifiers, the regular expression may backtrack catastrophically. Using possessive quantifiers (*+) to prevent backtracking, the regex can be improved:

/^Name=(.*+)(?:.*+\n)*?\s+key4=(.*+)/

(?:..) : non-capturing group

Walter A On

I would go with this solution:

echo "${var}" | grep -Eo "^Name=.*| key4=.*" | paste -d ':' - -

If you are looking for a sed solution, you will still need the additional paste command:

echo "${var}" | sed -rn 's/^Name=(.*)/\1:/p; s/^\s+(key4=)/\1/p' | paste -d ' ' - -

With the given input, awk can handle it without the help of paste:

echo "${var}" | awk -F= -v lookup="key4" '
  /^Name=/ {printf("%s: ", $2)}
  $1 ~ lookup {print lookup "=" $2}
  '

This fails when the value of key4 contains an =, because the value is truncated at that character.
To fix this, use the sub() function to remove the first part of the line instead:

echo "${var}" | awk -v lookup="key4=" '
  /^Name=/ {sub(/[^=]*=/,""); printf("%s: ", $0)}
  $1 ~ lookup {sub(/ */,""); print}
  '
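The difference between the two awk variants shows up with a value that contains an =. A small sketch, assuming a made-up input line key4=a=b (lookup_fs and lookup_sub are hypothetical wrapper names around the two commands above):

```shell
# Hypothetical input whose key4 value contains "=" (not from the question).
var='Name=<Some name>
    key4=a=b
'

# -F= version: $2 ends at the second "=", so the value is cut short.
lookup_fs() {
    awk -F= -v lookup="key4" '
      /^Name=/ {printf("%s: ", $2)}
      $1 ~ lookup {print lookup "=" $2}
      '
}

# sub() version: only the leading spaces are removed, the rest is kept.
lookup_sub() {
    awk -v lookup="key4=" '
      /^Name=/ {sub(/[^=]*=/,""); printf("%s: ", $0)}
      $1 ~ lookup {sub(/ */,""); print}
      '
}

printf '%s' "$var" | lookup_fs
printf '%s' "$var" | lookup_sub
```

The first call should print the value truncated to a, the second the full a=b.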
Ed Morton On

Using any awk, this will produce the output you asked for from the input you provided:

$ echo "$var" |
awk -v OFS=': ' 'sub(/^Name=/,""){name=$0} sub(/.*key4=/,""){print name, $1}'
<Some name>: value4
<Some other name>: val4

$ echo "$var" |
awk -v OFS=': ' 'sub(/^Name=/,""){name=$0} sub(/.*key3=/,""){print name, $1}'
<Some name>: value3
<Some other name>: val3

and it'll even work for key2, which your other answers so far don't handle because key3 appears after it on the same line:

$ echo "$var" |
awk -v OFS=': ' 'sub(/^Name=/,""){name=$0} sub(/.*key2=/,""){print name, $1}'
<Some name>: value2
<Some other name>: val2

But, like every other answer you have so far, there are many possible inputs that it won't work for. To get a robust solution that works for all cases, you'd have to provide more details about your input format, plus sample input/output that covers all cases. For example, here's a far more robust script, using any POSIX awk, that can handle:

  • the target key being a substring of some other key
  • the target value containing any white space, even newlines (but not an empty line mid-string)
  • the target value containing escaped (by doubling) double quotes

$ cat tst.sh
#!/usr/bin/env bash

var='
Name="Some name"
    key1=value1
    key2="some value2" key3=value3
    dummykey4="fake4"
    key4="A value4 that has
        a newline in it"

Name="Some other name"
    key1=val1
    key2=val2 key3="some ""big"" val3"
    key4=val4

'

for tgt in key2 key3 key4; do
    echo "===== $tgt ====="
    echo "$var" |
    awk -v RS= -v tgt="$tgt" '
        {
            tail = $0
            while ( match(tail,/[^[:space:]=]+=([^[:space:]]+|("[^"]*"|"")+)/) ) {
                key_val = substr(tail,RSTART,RLENGTH)
                sep = index(key_val,"=")
                key = substr(key_val,1,sep-1)
                val = substr(key_val,sep+1)
                k2v[key] = val
                tail = substr(tail,RSTART+RLENGTH)
            }
            print k2v["Name"], k2v[tgt]
        }
    '
done
$ ./tst.sh
===== key2 =====
"Some name" "some value2"
"Some other name" val2
===== key3 =====
"Some name" value3
"Some other name" "some ""big"" val3"
===== key4 =====
"Some name" "A value4 that has
        a newline in it"
"Some other name" val4

but whether that's overkill or not robust enough or making wrong assumptions, I don't know.

Note that with the second approach, creating an array of values indexed by keys, you can do far more than just print values: you can compare them, reorder them, or do anything else you want with them. For example, you could write:

if ( (k2v["Name"] == "John Smith") && (k2v["key2"] > k2v["key3"]) ) {
    print k2v["key4"], k2v["key1"]
}
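For completeness, here is a sketch of how such a condition could sit inside the per-block loop. The sample data, the helper name filter_blocks, the simplified value regex and the quote stripping are all assumptions for illustration. It also clears the array for each block and adds +0 to force a numeric comparison, since values extracted with substr() are compared as strings otherwise.

```shell
# Made-up sample data with numeric values (not from the question).
var='
Name="John Smith"
    key1=1 key2=20 key3=3
    key4=four

Name="Jane Doe"
    key1=5 key2=6 key3=70
    key4=eight
'

# filter_blocks is a hypothetical helper name.
filter_blocks() {
    awk -v RS= '
        {
            split("", k2v)   # clear values left over from the previous block
            tail = $0
            # Simplified pattern: a quoted value without embedded quotes, or a bare word.
            while ( match(tail, /[^ \t\n=]+=("[^"]*"|[^" \t\n][^ \t\n]*)/) ) {
                key_val = substr(tail, RSTART, RLENGTH)
                sep = index(key_val, "=")
                key = substr(key_val, 1, sep-1)
                val = substr(key_val, sep+1)
                # Assumption: strip one pair of surrounding double quotes.
                if ( val ~ /^".*"$/ ) val = substr(val, 2, length(val)-2)
                k2v[key] = val
                tail = substr(tail, RSTART+RLENGTH)
            }
            # +0 makes this a numeric comparison rather than a string one.
            if ( (k2v["Name"] == "John Smith") && (k2v["key2"]+0 > k2v["key3"]+0) ) {
                print k2v["key4"], k2v["key1"]
            }
        }
    '
}

printf '%s\n' "$var" | filter_blocks
```

With this data only the first block satisfies the condition. Without the +0, "20" > "3" would be evaluated as a string comparison and come out false.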
potong On

This might work for you (GNU sed):

sed -nE 's/^Name=(.*)/\1:/;T;h
         :a;n;s/^$//;s/.*(key4=\S+).*/ \1/;Ta;H;x;s/\n(.+)/\1/p' file

Turn off implicit printing (-n) and turn on extended regular expressions (-E).

Store the name following Name= in the hold space.

Then fetch the next line until it is either an empty line or a line containing the required key (in this case key4). Append it to the hold space, swap to the hold space, remove the introduced newline and print the result.

N.B. This relies on the value following the key containing no white space. If the key is not found, nothing is printed.
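If the key should be chosen at call time, the same script can be wrapped in a shell function. This is only a sketch: it assumes GNU sed (for T and \S), that the key contains no regex metacharacters or slashes, and name_and_key is a hypothetical function name.

```shell
# Sample input from the question.
var='
Name=<Some name>
    key1=value1
    key2=value2 key3=value3
    key4=value4

Name=<Some other name>
    key1=val1
    key2=val2 key3=val3
    key4=val4
'

# name_and_key is a hypothetical wrapper; $1 is spliced into the script unescaped.
name_and_key() {
    sed -nE 's/^Name=(.*)/\1:/;T;h
         :a;n;s/^$//;s/.*('"$1"'=\S+).*/ \1/;Ta;H;x;s/\n(.+)/\1/p'
}

printf '%s\n' "$var" | name_and_key key2
```

The key is inserted by closing the single-quoted script, adding "$1" in double quotes, and reopening the single quotes.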