How to use map and grep in Perl on the following data


How can I display only the chain IDs (such as A, C, E, G) from the CHAIN: records, i.e. the list that ends with a semicolon (;)?

Data

COMPND    MOL_ID: 1;                                                            
COMPND   2 MOLECULE: JACALIN;                                                   
COMPND   3 CHAIN: A, C, E, G;                                                   
COMPND   4 SYNONYM: JACKFRUIT AGGLUTININ;
COMPND   5 MOL_ID: 2;                                                           
COMPND   6 MOLECULE: JACALIN;                                                   
COMPND   7 CHAIN: B, D, F, H;                                                   
COMPND   8 SYNONYM: JACKFRUIT AGGLUTININ  

I tried the following code:

#!/usr/local/bin/perl

open(FILE, "/home/httpd/cgi-bin/r/1JAC.pdb");

while ( $line = <FILE> ) {
    if ( $line =~ /^COMPND/ ) {
        #$line = substr $line,4,21;
        my $line =~ m(/\$:^\w+\$\;/g);
        print $line;
    }
}

Answer by Grokify

You can use a single regular expression like the following:

while (my $line = <FILE>) {
    if ($line =~ /^COMPND.+?CHAIN:\s*(.*?)\s*;\s*$/) {
        my $chain = $1;
        print "$chain\n";
    }
}

This uses a single regular expression to match COMPND, CHAIN: and the closing ;. The \s* at the end of the pattern allows trailing spaces after the semicolon. The text between CHAIN: and ;, without its leading and trailing spaces, is captured in $1, which is then assigned to $chain.
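A self-contained sketch of this loop, assuming the sample records from the question are stored in a heredoc and read through an in-memory filehandle in place of the real 1JAC.pdb file:

```perl
use strict;
use warnings;

# Sample records from the question (assumption: stands in for 1JAC.pdb)
my $data = <<'END';
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: JACALIN;
COMPND   3 CHAIN: A, C, E, G;
COMPND   4 SYNONYM: JACKFRUIT AGGLUTININ;
COMPND   5 MOL_ID: 2;
COMPND   6 MOLECULE: JACALIN;
COMPND   7 CHAIN: B, D, F, H;
COMPND   8 SYNONYM: JACKFRUIT AGGLUTININ
END

# Open the string as a read-only in-memory filehandle
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @chains;
while (my $line = <$fh>) {
    # Capture the text between "CHAIN:" and the trailing ";", minus surrounding spaces
    if ($line =~ /^COMPND.+?CHAIN:\s*(.*?)\s*;\s*$/) {
        push @chains, $1;
    }
}
close $fh;

print "$_\n" for @chains;
```

This prints one line per CHAIN record: A, C, E, G and then B, D, F, H.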

For more information, see perlre (Perl regular expressions) on Perldoc.

Answer by TLP
perl -nle'print $1 if /^COMPND\s+\S*\s*CHAIN:(.+);/' /home/httpd/cgi-bin/r/1JAC.pdb

This is a simple way to "grep" part of a line to standard output: it prints whatever the parentheses capture, i.e. everything between CHAIN: and the last ; on each matching line.

  • -n wraps the code in a while (<>) loop, so it runs once for each line of the file
  • -l chomps each input line and adds a newline to every print
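To make the two switches concrete, here is a rough long-hand equivalent of the one-liner, following the expansion described in perlrun. As an assumption so that it runs on its own, it reads a few sample lines from an in-memory filehandle instead of <> and the real 1JAC.pdb, and it also collects the captures in an array:

```perl
use strict;
use warnings;

# Stand-in input (assumption): a few COMPND lines in place of 1JAC.pdb
my $data = <<'END';
COMPND   2 MOLECULE: JACALIN;
COMPND   3 CHAIN: A, C, E, G;
COMPND   7 CHAIN: B, D, F, H;
END
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @captured;
{
    local $\ = "\n";                      # -l: append "\n" to every print
    LINE: while (defined($_ = <$fh>)) {   # -n: loop over the input lines (<> in the real one-liner)
        chomp;                            # -l: remove each line's trailing newline
        if (/^COMPND\s+\S*\s*CHAIN:(.+);/) {
            push @captured, $1;           # kept only so the result can be checked
            print $1;
        }
    }
}
```

Note that the captured text keeps the space after CHAIN: (" A, C, E, G"), because the pattern has no \s* before the group.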
Answer by mkHun

Try this:

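use warnings;
use strict;

open my $nis, '<', '1jac.pdb' or die "Cannot open 1jac.pdb: $!";

# Keep only the COMPND records and join them into one string
my @ar = grep { /^COMPND/ } <$nis>;
my $s  = join '', @ar;

# Capture each "COMPND n CHAIN: ...;" record; (?:.|\n) lets it continue onto the next line
my @dav = $s =~ m/(COMPND\s+\d+\s+CHAIN:.+?(?:.|\n)+?;)/g;

# Strip the "COMPND n CHAIN:" prefix, newlines and ";" in place, then split on commas
my @mp2 = map { split(/,\s|,/, $_) } grep { s/(COMPND\s+\d+\s+(CHAIN:\s+)?)|(\n|;)//g } @dav;

$, = ", ";
print @mp2;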

Output

A, C, E, G, B, D, F, H
Answer by Borodin

You may like this one-line solution:

perl -le 'print for map /CHAIN:\s*([^;]+)/, <>' /home/httpd/cgi-bin/r/1JAC.pdb

Output

A, C, E, G
B, D, F, H
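This one-liner works because a pattern match in list context returns its capture groups, and an empty list when it fails, so map simply produces nothing for lines without CHAIN:. The sketch below applies the same idea to an array of sample lines (an assumption standing in for <>), adds a grep for the COMPND records as in the question, and then uses split to get the individual chain IDs:

```perl
use strict;
use warnings;

# Sample lines (assumption) standing in for the contents of 1JAC.pdb
my @lines = (
    "COMPND    MOL_ID: 1;\n",
    "COMPND   3 CHAIN: A, C, E, G;\n",
    "COMPND   4 SYNONYM: JACKFRUIT AGGLUTININ;\n",
    "COMPND   7 CHAIN: B, D, F, H;\n",
);

# grep keeps the COMPND records; the match in map returns the captured
# chain list for CHAIN lines and an empty list for every other line
my @chains = map /CHAIN:\s*([^;]+)/, grep { /^COMPND/ } @lines;

# split each chain list on commas to get the individual chain IDs
my @ids = map { split(/\s*,\s*/, $_) } @chains;

print "$_\n" for @chains;
print join(' ', @ids), "\n";
```

This prints the two chain lists, followed by A C E G B D F H.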
Answer by glenn jackman

Using GNU grep with Perl-compatible regular expressions (-P), and -o so that only the matched text is printed, find the text between "CHAIN: " and the semicolon:

$ grep -oP '(?<=CHAIN: ).*?(?=;)' filename
A, C, E, G
B, D, F, H
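The same lookaround pattern also works inside Perl itself. In list context, a //g match with no capture groups returns every matched substring, much like grep -o. A sketch, assuming the lines are already in an array:

```perl
use strict;
use warnings;

# Sample lines (assumption) standing in for the file contents
my @lines = (
    "COMPND   2 MOLECULE: JACALIN;\n",
    "COMPND   3 CHAIN: A, C, E, G;\n",
    "COMPND   7 CHAIN: B, D, F, H;\n",
);

my @found;
for my $line (@lines) {
    # (?<=CHAIN: ) and (?=;) are zero-width, so only the text between them is matched;
    # with /g and no capture groups, list context returns every match on the line
    push @found, $line =~ /(?<=CHAIN: ).*?(?=;)/g;
}

print "$_\n" for @found;
```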