How to use map and grep in Perl for the following data


How can I display only the chain lists (such as A, C, E, G) from the CHAIN: records, each of which ends with a semicolon (;)?

Data

COMPND    MOL_ID: 1;                                                            
COMPND   2 MOLECULE: JACALIN;                                                   
COMPND   3 CHAIN: A, C, E, G;                                                   
COMPND   4 SYNONYM: JACKFRUIT AGGLUTININ;
COMPND   5 MOL_ID: 2;                                                           
COMPND   6 MOLECULE: JACALIN;                                                   
COMPND   7 CHAIN: B, D, F, H;                                                   
COMPND   8 SYNONYM: JACKFRUIT AGGLUTININ  

I tried the code below:

#!/usr/local/bin/perl

open(FILE, "/home/httpd/cgi-bin/r/1JAC.pdb");

while ( $line = <FILE> ) {

    if ( $line =~ /^COMPND/ ) {

        #$line = substr $line,4,21;

        my $line =~ m(/\$:^\w+\$\;/g);
        print $line;
    }
}

perl -nle'print $1 if /^COMPND\s+\S*\s*CHAIN:(.+);/' /home/httpd/cgi-bin/r/1JAC.pdb

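This is a fairly simple way of "grepping" part of a line to standard output. The parentheses capture everything between CHAIN: and the last semicolon on the line into $1, which is then printed.

  • -n wraps the code in a while (<>) loop that reads your file line by line
  • -l chomps the newline from each input line and adds one back to every print
  • -e takes the code to run from the command line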
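
For readers less familiar with these switches, the one-liner can be spelled out as an ordinary script. This is only a rough sketch of what the switches do, not the exact code perl generates; the sub name chain_of and the inline sample lines are made up for illustration, and a real script would read the file instead.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured chain list, or undef if the
# line is not a COMPND ... CHAIN: record containing a semicolon.
sub chain_of {
    my ($line) = @_;
    return $1 if $line =~ /^COMPND\s+\S*\s*CHAIN:(.+);/;
    return;
}

# Sample lines copied from the question's data.
my @lines = (
    "COMPND   2 MOLECULE: JACALIN;\n",
    "COMPND   3 CHAIN: A, C, E, G;\n",
);

for my $line (@lines) {                  # -n: loop over every input line
    chomp $line;                         # -l: strip the trailing newline
    my $chain = chain_of($line);
    print "$chain\n" if defined $chain;  # -l: add a newline back on print
}
```

Note that the capture keeps the space after the colon; writing CHAIN:\s*(.+); instead would drop it.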

Try this

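use strict;
use warnings;

# Join all COMPND records so a CHAIN list that wraps onto the next record still matches.
open my $fh, '<', '1jac.pdb' or die "Cannot open 1jac.pdb: $!";
my $compnd = join '', grep { /^COMPND/ } <$fh>;
close $fh;

# Capture each CHAIN record up to its terminating semicolon (/s lets . cross newlines).
my @records = $compnd =~ /(COMPND\s+\d+\s+CHAIN:.+?;)/gs;
# Strip record prefixes, newlines and semicolons, leaving e.g. "A, C, E, G".
s/COMPND\s+\d+\s+(?:CHAIN:\s+)?|\n|;//g for @records;

# Split each list on commas and print all chain IDs on one line.
my @chains = map { split(/,\s*/, $_) } @records;
print join(', ', @chains), "\n";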

Output

A, C, E, G, B, D, F, H

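You may like this one-line solution. In list context a match returns its captures, so map yields the text between CHAIN: and the semicolon for every line that matches: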

perl -le 'print for map /CHAIN:\s*([^;]+)/, <>' /home/httpd/cgi-bin/r/1JAC.pdb

Output

A, C, E, G
B, D, F, H
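
Since the title asks about map and grep specifically, here is a hedged sketch that chains the two and goes one step further by splitting each list into individual chain IDs. The sub name chain_ids and the sample lines are assumptions for illustration; the pattern is the one from the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical helper: takes raw lines, returns the individual chain IDs.
# Read the pipeline bottom-up: grep filters, the maps transform.
sub chain_ids {
    my @lines = @_;
    return map  { split(/\s*,\s*/, $_) }    # "A, C, E, G" becomes A C E G
           map  { /CHAIN:\s*([^;]+)/ }      # capture; empty list if no match
           grep { /^COMPND/ }               # keep COMPND records only
           @lines;
}

# Sample lines copied from the question's data.
my @sample = (
    "COMPND   3 CHAIN: A, C, E, G;\n",
    "COMPND   4 SYNONYM: JACKFRUIT AGGLUTININ;\n",
    "COMPND   7 CHAIN: B, D, F, H;\n",
);

print join(' ', chain_ids(@sample)), "\n";   # prints: A C E G B D F H
```

The grep stage is not strictly needed here, because the map already drops lines without CHAIN:, but it keeps the match from firing on non-COMPND records.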

You can use a single regular expression like the following:

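open(my $fh, '<', '/home/httpd/cgi-bin/r/1JAC.pdb') or die "Cannot open file: $!";
while (my $line = <$fh>) {
    # Capture the text between "CHAIN:" and the trailing semicolon.
    if ($line =~ /^COMPND.+?CHAIN:\s*(.*?)\s*;\s*$/) {
        my $chain = $1;
        print "$chain\n";
    }
}
close($fh);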

The pattern matches lines that start with COMPND, contain CHAIN:, and end with a semicolon. The \s*$ at the end allows trailing whitespace after the semicolon. The text between CHAIN: and ;, without leading or trailing spaces, is captured in $1 and assigned to $chain.
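
To make the pieces of that pattern easier to follow, the same regular expression can be written with the /x modifier, which allows whitespace and comments inside it. This is a sketch for illustration; the variable name $chain_re and the inline sample lines are assumptions, not part of the original answer.

```perl
use strict;
use warnings;

# Same pattern as above, spread out with /x so each part can be commented.
my $chain_re = qr{
    ^COMPND     # record type at the start of the line
    .+?         # continuation number and spacing, non-greedy
    CHAIN:      # the field we are interested in
    \s*         # skip spaces after the colon
    (.*?)       # capture group 1: the chain list itself
    \s* ;       # optional spaces, then the terminating semicolon
    \s*$        # only trailing whitespace may follow
}x;

# Sample lines taken from the question's data, one with trailing padding.
for my $line ("COMPND   3 CHAIN: A, C, E, G;      \n",
              "COMPND   4 SYNONYM: JACKFRUIT AGGLUTININ;\n") {
    print "$1\n" if $line =~ $chain_re;   # prints only: A, C, E, G
}
```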

More information: perldoc perlre (Perl regular expressions).


Using GNU grep with Perl-compatible regular expressions (-P), print only the matched part (-o), that is, the text between "CHAIN: " and the semicolon:

$ grep -oP '(?<=CHAIN: ).*?(?=;)' filename
A, C, E, G
B, D, F, H
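
The same lookbehind and lookahead also work in Perl itself. The following is a minimal sketch of the -o behaviour, assuming the data is already in a string; the sub name only_matching and the sample text are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring grep -o: returns every matched substring.
sub only_matching {
    my ($text) = @_;
    # (?<=CHAIN: ) and (?=;) assert the surrounding context without
    # consuming it, so each match is just the chain list. With /g in
    # list context and no capture groups, the matches are returned.
    return $text =~ /(?<=CHAIN: ).*?(?=;)/g;
}

# Two records copied from the question's data.
my $sample = "COMPND   3 CHAIN: A, C, E, G;\nCOMPND   7 CHAIN: B, D, F, H;\n";
print "$_\n" for only_matching($sample);
```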