Perl code for reverse complement of DNA sequence

747 Views Asked by At

I was trying to make a perl code to get the reverse complement of fasta sequences of DNA in a .fna file format. the sequence02C.fna file contains 100s of DNA sequences :

>adbca3e
TGCTCCCCACGCTTGCCTCTCCAGTACTCAACCAAAGCAGTCTCTAGAAAAACAGTTTCCAACGCAATACGATGGAATTCCACTTCCCAAATATCTC
>4c2a958
TCCCCACGCTTTCGCGCTTCAGCGTCAGTATCTGTCCAGTGAGCTGACTTCTCCATCGGCATTCCTACACAGTACTCTAGAAAAACAGTTTCTGCTC
>0639b5b
TCGCGCCTCAGTGTCCAACGCAATACGAGTTGCAGACCAGGACACATGGAATTCCACTTCCCTCTCCAGTACTCAACCAAAGCAGTCTCTAGAAAAG

I have used the following command which can open the file and make reverse but does not show the sequence ID (eg: >adbca3e) in the output.

The code is:

#!/usr/local/perl

open (NS, "sequence02C.fna");
while (<NS>) {
    if ($_ =~ tr/ATGC/TACG/) {print $_;}
}

output file is only the complementary of the sequence but not reverse. Additionally, it does not contain the sequence IDs ">adbca3e"

Could anyone please suggest the appropriate code to do this reverse complementary of this sequence at once and getting the result into an output file?

2

There are 2 best solutions below

0
On

You only print the lines that contained a A, T, G or C. You want to print every line, so the print shouldn't be conditional.

#!/usr/local/perl

use strict;               # Always
use warnings;             # Always

while (<>) {
    if (/^>/) {           # Only modify lines starting with ">".
       tr/ATGC/TACG/;
       $_ = reverse($_);  # You didn't reverse.
    }

    print;                # Print undconditionally.
}

(tr/// and print use $_ by default.)

Note I didn't open the file. You can use the program as follows:

perl program.pl sequence02C.fna >sequence02C_revcomp.fna

or

perl -i~ program.pl sequence02C.fna

The latter modifies the file in place. (Be careful! Test it first. It does make a backup, though.)

3
On

You say you have a program to "make reverse", but it only gives complementary. Maybe that is a very obvious description for you, but it is not very clear to me.

If by "reverse" you mean to print the string backwards, just use the reverse function. Complementary I assume is taking the corresponding nucleobases, which is what your transliteration is meant to do tr/ATGC/TACG/.

To fix not printing ids, just remove the if condition on the print-statement.

What I would do is just use the diamond operator for a small program like this:

use strict;
use warnings;
use feature 'say';

while (<>) {
    chomp;
    unless (/^>/) {
        tr/ATGC/TACG/;            # transliterate non-ids
        my $reverse = reverse;    # reverse $_
        say $reverse;             # do something with $reverse
    }
    say;          # print current line
}

Then you can use this program like this:

$ perl program.pl sequence02C.fna > output.txt