Problem with Reversing DNA/mRNA Sequences in Python


I'm having trouble getting my program to work under all circumstances, and anyone with expertise in both biology and coding should be able to tell me where I'm going wrong. I am trying to write a program that asks a few questions about a biological molecule. First, it asks whether the DNA/mRNA strand is given in the 5' to 3' direction. Then it asks whether the molecule is DNA or RNA. If it is DNA, it asks whether the strand is the template or the coding strand, so that it can work out the resulting mRNA. The program then reads the mRNA in the 5' to 3' direction and determines the amino acid sequence. The problem is that the program works for mRNA regardless of direction, but breaks when reading DNA in certain directions. I have attached an image showing a few examples so you can see where it fails. I am trying to get the correct MetPheIle amino acid sequence in all 6 conditions. I will also attach a rough picture, drawn in Paint, of the overview in case the code is confusing.

Here is my code:

#rules for converting any DNA strand to its complementary RNA strand

def complement_base(base):
    if base == 'A':
        return 'U'
    elif base == 'T':
        return 'A'
    elif base == 'C':
        return 'G'
    elif base == 'G':
        return 'C'
    else:
        return ''


#converts DNA strands to mRNA so they can be translated

def convert_to_mrna(dna_strand, is_template_strand, reverse_sequence=False):
    if is_template_strand:
        mrna_strand = ''.join(complement_base(base) for base in dna_strand)
    else:
        mrna_strand = dna_strand.replace('T', 'U')
    return mrna_strand


#takes the mRNA sequence and sets rules for start and stop points and how to read the strand 

def translate_mrna_to_amino_acid(mrna_strand, codon_table):
    start_codon = "AUG"
    stop_codons = {"UAA", "UAG", "UGA"}

    amino_acid_sequence = ""
    translating = False

    index = 0
    while index < len(mrna_strand):
        codon = mrna_strand[index:index + 3]

        if codon == start_codon:
            translating = True

        if translating:
            if codon in stop_codons:
                break

            amino_acid = codon_table.get(codon, "-")
            amino_acid_sequence += amino_acid

        index += 3

    return amino_acid_sequence


#function for getting the mRNA sequence by input type, and reversing it if it is initially in the 3' to 5' direction because the RNA is translated in the 5' to 3' direction

def get_mrna_sequence(reverse_sequence=False):
    valid_bases = {'A', 'U', 'C', 'G'}

    while True:
        option = input("Enter '1' to input mRNA sequence manually, '2' to upload a file: ")

        if option == "1":
            mrna_sequence = input("Enter the mRNA sequence (only A, U, C, G): ").upper()
            if all(base in valid_bases for base in mrna_sequence):
                if reverse_sequence:
                    mrna_sequence = mrna_sequence[::-1]  # Reverse the sequence
                return mrna_sequence
            else:
                print("Invalid sequence! Please use only A, U, C, and G.")

        elif option == "2":
            file_name = input("Enter the file name with the mRNA sequence: ")
            try:
                with open(file_name, "r") as file:
                    mrna_sequence = file.read().replace("\n", "").upper()
                    if all(base in valid_bases for base in mrna_sequence):
                        if reverse_sequence:
                            mrna_sequence = mrna_sequence[::-1]  # Reverse the sequence
                        return mrna_sequence
                    else:
                        print("Invalid sequence in file! Please use only A, U, C, and G.")
            except FileNotFoundError:
                print("File not found!")

        else:
            print("Invalid option!")

    return mrna_sequence


#similar to the function above, but for the DNA sequence that will later be converted to mRNA (e.g. by replacing thymine with uracil)

def get_dna_sequence():
    valid_bases = {'A', 'T', 'C', 'G'}

    while True:
        option = input("Enter '1' to input DNA sequence manually, '2' to upload a file: ")

        if option == "1":
            dna_sequence = input("Enter the DNA sequence (only A, T, C, G): ").upper()
            if all(base in valid_bases for base in dna_sequence):
                return dna_sequence
            else:
                print("Invalid sequence! Please use only A, T, C, and G.")

        elif option == "2":
            file_name = input("Enter the file name with the DNA sequence: ")
            try:
                with open(file_name, "r") as file:
                    dna_sequence = file.read().replace("\n", "").upper()
                    if all(base in valid_bases for base in dna_sequence):
                        return dna_sequence
                    else:
                        print("Invalid sequence in file! Please use only A, T, C, and G.")
            except FileNotFoundError:
                print("File not found!")

        else:
            print("Invalid option!")


#the main function and the codon table used to translate the sequence. It asks: 1) Is the molecule in the 5' to 3' direction? 2) Is it RNA or DNA? 3) If DNA, is it the coding or template strand? 4) Will the sequence be entered manually or read from a text file? 5) What is the sequence? It then attempts to give the amino acid sequence.

def main():
    direction = input("Is the molecule in the 5' to 3' direction? (yes/no): ").lower()
    molecule_type = input("Is the molecule DNA or RNA? ").lower()

    if molecule_type == 'dna':
        sequence_type = input("Is it the template strand or the coding strand? ").lower()
        if sequence_type == 'template strand':
            dna_sequence = get_dna_sequence()
            mrna_sequence = convert_to_mrna(dna_sequence, is_template_strand=True, reverse_sequence=(direction == 'no'))
        elif sequence_type == 'coding strand':
            dna_sequence = get_dna_sequence()
            mrna_sequence = convert_to_mrna(dna_sequence, is_template_strand=False)
        else:
            print("Invalid input!")

    elif molecule_type == 'rna':
        mrna_sequence = get_mrna_sequence(reverse_sequence=(direction == 'no'))

    else:
        print("Invalid input!")

    # Example codon table mapping
    codon_table = {
        "UUU": "Phe", "UUC": "Phe", "UUA": "Leu", "UUG": "Leu",
        "CUU": "Leu", "CUC": "Leu", "CUA": "Leu", "CUG": "Leu",
        "AUU": "Ile", "AUC": "Ile", "AUA": "Ile", "AUG": "Met",
        # ... other codons and their respective amino acids
    }

    if mrna_sequence:
        resulting_amino_acids = translate_mrna_to_amino_acid(mrna_sequence, codon_table)
        print("Resulting amino acid sequence:", resulting_amino_acids)

if __name__ == "__main__":
    main()


Tests:

5' to 3' template  AAATCAGATAAACAT -> MetPheIle  FAIL
3' to 5' template  TACAAATAGACTAAA -> MetPheIle  PASS

5' to 3' coding    ATGTTTATCTGATTT -> MetPheIle  PASS
3' to 5' coding    TTTAGTCTATTTGTA -> MetPheIle  FAIL

5' to 3' mRNA      AUGUUUAUCUGAUUU -> MetPheIle  PASS
3' to 5' mRNA      UUUAGUCUAUUUGUA -> MetPheIle  PASS

The mRNA tests work, so there must be some problem with reversing the DNA sequences. The 5' to 3' coding strand and the 5' to 3' mRNA strand should be the same, with T replaced by U. The 3' to 5' coding strand should be reversed, with T replaced by U, and something isn't right in my code: either I'm not reversing the strand correctly or I'm calling the wrong function at the wrong time. I am new to this, so I may be misunderstanding how to reverse and translate. The 5' to 3' template gives an mRNA molecule in the 3' to 5' direction, so I should have to reverse the resulting mRNA strand, and you can see this one failed too. The 3' to 5' template should give a 5' to 3' mRNA strand, and this one passed, so I have deduced that the problem is with the reversing step, but I'm not sure where to put it. I have tried reversing inside the get_mrna_sequence function, but that failed. I know this is a lot, but help would be greatly appreciated. If there is any problem with my understanding of DNA or RNA, pointing that out would be appreciated too. Thank you!

Answered by dROOOze

The problem is in both the convert_to_mrna function and main function.


In the main function, these two conditions here

        if sequence_type == 'template strand':
            dna_sequence = get_dna_sequence()
            mrna_sequence = convert_to_mrna(dna_sequence, is_template_strand=True, reverse_sequence=(direction == 'no'))
        elif sequence_type == 'coding strand':
            dna_sequence = get_dna_sequence()
            mrna_sequence = convert_to_mrna(dna_sequence, is_template_strand=False)

can't tell the difference between a coding strand in the 5' -> 3' direction and a coding strand in the 3' -> 5' direction. To differentiate between the two, you can rewrite these two conditions into one:

        if sequence_type in ("template strand", "coding strand"):
            dna_sequence = get_dna_sequence()
            mrna_sequence = convert_to_mrna(
                dna_sequence,
                is_template_strand=sequence_type == "template strand",
                reverse_sequence=(direction == "no"),
            )

In the convert_to_mrna function, you currently aren't using the reverse_sequence argument at all:

def convert_to_mrna(dna_strand, is_template_strand, reverse_sequence=False):
    if is_template_strand:
        mrna_strand = ''.join(complement_base(base) for base in dna_strand)
    else:
        mrna_strand = dna_strand.replace('T', 'U')
    return mrna_strand

For 3' -> 5' coding strands and 5' -> 3' template strands, you need to reverse the sequence to get a 5' -> 3' mRNA sequence ready for translation. For a template strand this amounts to a reverse complement; for a coding strand it is a reversal plus replacing T with U. The reversal can happen either before or after the base substitution. The easiest way to decide whether to reverse is an exclusive or (XOR) of the two facts you know about the given DNA sequence: [whether to reverse the sequence] = [DNA is a template strand] XOR [DNA is in the 3' -> 5' direction].
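To make the rule concrete, the four DNA cases can be enumerated as a small truth table. This is only a minimal sketch: the helper name needs_reversal and the parameter name is_3_to_5 are made up for illustration and do not appear in the original code.

```python
from itertools import product


def needs_reversal(is_template_strand: bool, is_3_to_5: bool) -> bool:
    """Hypothetical helper: True when the DNA must be reversed before conversion."""
    # For two bools, ^ is exclusive or: True when exactly one operand is True.
    return is_template_strand ^ is_3_to_5


# Walk through all four combinations of strand type and direction.
for is_template_strand, is_3_to_5 in product((False, True), repeat=2):
    strand = "template" if is_template_strand else "coding"
    direction = "3' -> 5'" if is_3_to_5 else "5' -> 3'"
    reverse = needs_reversal(is_template_strand, is_3_to_5)
    print(f"{strand:8} {direction}: reverse = {reverse}")
```

The two combinations that report a reversal are the 5' -> 3' template strand and the 3' -> 5' coding strand, which are exactly the two DNA tests that fail in the question.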

For two boolean operands, Python's ^ operator computes XOR, so you only need to add two lines. In the following, the sequence is reversed before converting to mRNA (you could instead reverse it after the conversion):

def convert_to_mrna(dna_strand, is_template_strand, reverse_sequence=False):
    # `[whether to reverse the sequence] = [DNA is a template strand] XOR [DNA is in the 3' -> 5' direction]`
    if is_template_strand ^ reverse_sequence:
        dna_strand = dna_strand[::-1]
    if is_template_strand:
        mrna_strand = "".join(complement_base(base) for base in dna_strand)
    else:
        mrna_strand = dna_strand.replace("T", "U")
    return mrna_strand
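As a quick sanity check, the six test sequences from the question can be run through this version of convert_to_mrna without any interactive input. The sketch below is an assumption-laden stand-in rather than the original program: CODON_TABLE holds only the codons these sequences need, and the names COMPLEMENT, translate, CASES and results are introduced here for illustration.

```python
# Only the codons needed for the question's test sequences (not a full table).
CODON_TABLE = {"AUG": "Met", "UUU": "Phe", "AUC": "Ile"}
STOP_CODONS = {"UAA", "UAG", "UGA"}
COMPLEMENT = {"A": "U", "T": "A", "C": "G", "G": "C"}


def convert_to_mrna(dna_strand, is_template_strand, reverse_sequence=False):
    # Reverse when exactly one of "template strand" / "3' -> 5'" holds.
    if is_template_strand ^ reverse_sequence:
        dna_strand = dna_strand[::-1]
    if is_template_strand:
        return "".join(COMPLEMENT.get(base, "") for base in dna_strand)
    return dna_strand.replace("T", "U")


def translate(mrna_strand):
    """Same reading logic as the question's translate_mrna_to_amino_acid."""
    amino_acids = ""
    translating = False
    for index in range(0, len(mrna_strand), 3):
        codon = mrna_strand[index:index + 3]
        if codon == "AUG":
            translating = True
        if translating:
            if codon in STOP_CODONS:
                break
            amino_acids += CODON_TABLE.get(codon, "-")
    return amino_acids


# (molecule, is_template_strand, is_3_to_5, sequence) for the six tests.
CASES = [
    ("dna", True, False, "AAATCAGATAAACAT"),
    ("dna", True, True, "TACAAATAGACTAAA"),
    ("dna", False, False, "ATGTTTATCTGATTT"),
    ("dna", False, True, "TTTAGTCTATTTGTA"),
    ("rna", False, False, "AUGUUUAUCUGAUUU"),
    ("rna", False, True, "UUUAGUCUAUUUGUA"),
]

results = []
for molecule, is_template, is_3_to_5, sequence in CASES:
    if molecule == "dna":
        mrna = convert_to_mrna(sequence, is_template, reverse_sequence=is_3_to_5)
    else:
        # mRNA only needs reversing when it was given 3' -> 5'.
        mrna = sequence[::-1] if is_3_to_5 else sequence
    results.append(translate(mrna))
    print(sequence, "->", mrna, "->", results[-1])
```

Every case should come out as MetPheIle, including the 5' -> 3' template and 3' -> 5' coding strands that failed before.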

Anyway, I would re-think how you name some of these parameters and variables, as naming variables properly helps a lot when you want to focus on developing a correct algorithm.

For example, I wouldn't name that convert_to_mrna parameter reverse_sequence; whether you actually reverse the sequence also depends on whether the DNA sequence is a template or coding strand. You should name it something like is_3_to_5 instead.
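A rough sketch of how the function and its call site might read after that rename is shown below. The COMPLEMENT dictionary stands in for the question's complement_base function, and the direction variable is a placeholder for the user's yes/no answer; both are assumptions made so the snippet runs on its own.

```python
# Stand-in for complement_base from the question (DNA base -> RNA complement).
COMPLEMENT = {"A": "U", "T": "A", "C": "G", "G": "C"}


def convert_to_mrna(dna_strand, is_template_strand, is_3_to_5=False):
    # The parameter now states a fact about the input; the decision to
    # reverse is derived from it inside the function.
    should_reverse = is_template_strand ^ is_3_to_5
    if should_reverse:
        dna_strand = dna_strand[::-1]
    if is_template_strand:
        return "".join(COMPLEMENT.get(base, "") for base in dna_strand)
    return dna_strand.replace("T", "U")


# Placeholder for the answer to "Is the molecule in the 5' to 3' direction?"
direction = "no"
mrna_sequence = convert_to_mrna(
    "TTTAGTCTATTTGTA",
    is_template_strand=False,
    is_3_to_5=(direction == "no"),
)
print(mrna_sequence)
```

With this naming, the call site only describes the input, and the reversal rule lives in one place.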