Extract multiple items on a single line using grep/sed/perl


I have a massive text file a bit like this:

=?accession=P12345;=?position=999;
=?accession=Q19283;=?position=777;
=?accession=A918282;=?position=888;

I would like to extract the text between "accession=" and ";", and also the text between "position=" and ";", so that I get:

P12345 999
Q19283 777
A918282 888

The strings I need to grep between get more complicated in the real file, so I imagine the solution will need them hardcoded.

I know I can take the "grep between two strings" approach:

grep -Po 'accession=\K.*?(?=;)'

but I don't know how to make the later extractions from the same input line appear on the same output line.

I really don't mind how this is done as long as I can call it from a linux command line.

Thanks.

Best answer:
  1. You can update your grep expression like this:

    grep -oP "(accession=\K\w+)|(position=\K\d+)" file
    

    Output:

    P12345
    999
    Q19283
    777
    A918282
    888
    

    To format it the way you want, use paste:

    grep -oP "(accession=\K\w+)|(position=\K\d+)" file | paste -d ' ' - -
    

    Output:

    P12345 999
    Q19283 777
    A918282 888
    
  2. Another simple awk solution:

    awk -F"=|;" '{print $3, $6}' file
    

    Output:

    P12345 999
    Q19283 777
    A918282 888
    
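One caveat with the grep | paste pipeline: paste just joins the output lines two at a time, so if any input line is missing one of the keys, every pair after it is misaligned. A minimal sketch of a per-line alternative in POSIX awk, assuming values always end at ";" and using a hypothetical helper name, extract_pairs, with "-" as a made-up placeholder for a missing key:

```shell
# Hypothetical helper: for every input line, print the accession and position
# values separated by a space, or "-" when a key is missing on that line.
# Assumes each value runs from "key=" up to the next ";" (or end of line).
extract_pairs() {
    awk '
    # get_value: text between "key=" and the next ";", or "-" if key is absent
    function get_value(line, key,    start) {
        if (match(line, key "=[^;]*")) {
            start = RSTART + length(key) + 1
            return substr(line, start, RLENGTH - length(key) - 1)
        }
        return "-"
    }
    { print get_value($0, "accession"), get_value($0, "position") }
    ' "$@"
}

# Demo: the second line deliberately lacks a position field.
printf '%s\n' \
    '=?accession=P12345;=?position=999;' \
    '=?accession=Q19283;' \
    '=?accession=A918282;=?position=888;' | extract_pairs
```

On those three lines it should print P12345 999, Q19283 - and A918282 888, one pair per line, so a gap stays on the line where it occurred instead of shifting later pairs.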
Another answer:

This awk should work:

awk -F ';' '{gsub(/=[^=]*=/, ""); $1=$1} 1' file

P12345 999
Q19283 777
A918282 888
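How this one works: gsub deletes every "=...=" run such as "=?accession=", which leaves "P12345;999;", and $1=$1 forces awk to rebuild the record joined by the output separator (a space). Because each line ends in ";", the rebuilt record also carries an empty last field and therefore ends in a trailing space. A small variant that drops the final ";" first, wrapped in a hypothetical shell function named strip_keys:

```shell
# Hypothetical wrapper around the gsub-based awk answer.
# gsub removes each "=<non-= chars>=" run, e.g. "=?accession=" -> "",
# leaving "P12345;999;". sub then strips the final ";" so no empty
# trailing field remains, and $1 = $1 rebuilds the record with OFS (a space).
strip_keys() {
    awk -F ';' '{ gsub(/=[^=]*=/, ""); sub(/;$/, ""); $1 = $1 } 1' "$@"
}

printf '%s\n' '=?accession=P12345;=?position=999;' | strip_keys
```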
Another answer:
sed -r 's/.*accession=([^;]*);.*position=([^;]*).*/\1 \2/' textfile
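This sed command only rewrites lines that contain both keys with accession before position; sed prints every other line unchanged. A sketch of a variant that prints only the lines where the substitution succeeded, assuming a sed that accepts -E (GNU and BSD sed both do; -r is the older GNU spelling). The function name extract_sed is hypothetical:

```shell
# Hypothetical helper around the sed answer.
# -E : extended regular expressions (same meaning as GNU -r)
# -n : disable automatic printing of every line
# p  : print the pattern space only when the s command made a substitution,
#      so lines missing either key are dropped instead of echoed verbatim.
extract_sed() {
    sed -n -E 's/.*accession=([^;]*);.*position=([^;]*).*/\1 \2/p' "$@"
}

printf '%s\n' \
    '=?accession=P12345;=?position=999;' \
    '=?accession=Q19283;' | extract_sed
```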
Another answer:

This perl one-liner

perl -wnE'say join " ", /(?:accession|position)=([^;]+)/g' input.txt

prints the desired output.