I have a txt file containing multiple FASTA sequences, and I would like to parse the sequences together with their gene names. Can you please help me select the sequences by a specific name in the header? Thank you.
Original data in the txt file:
>lcl|NC_045512.2_gene_6 [gene=ORF6] [locus_tag=GU280_gp06] [db_xref=GeneID:43740572] [location=27202..27387] [gbkey=Gene]
ATGTTTCATCTCGTTGACTTTCAGGTTACTATAGCAGAGATATTACTAATTATTATGAGGACTTTTAAAG
Expected output after parsing in Python:
ORF6 ATGTTTCATCTCGTTGACTTTCAGGTTACTATAGCAGAGATATTACTAATTATTATGAGGACTTTTAAAG
I used the following code:
    from Bio import SeqIO

    # Print the ID and the sequence of every record in the file
    for record in SeqIO.parse("mytext.txt", "fasta"):
        print(record.name)
        print(record.seq)
The output I got looks like this:
lcl|NC_045512.2_gene_6
ATGTTTCATCTCGTTGACTTTCAGGTTACTATAGCAGAGATATTACTAATTATTATGAGGACTTTTAAAG
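One possible way to get the gene name while staying with Biopython: `record.name` only holds the first word of the header, whereas `record.description` holds the whole header line, so the `[gene=...]` tag can be pulled out of it. This is a minimal sketch, assuming every header of interest carries a `[gene=...]` tag like the one shown above. The helper names `gene_name` and `print_genes` and the `"unknown"` fallback are made up for illustration.

```python
import re

# Matches the "[gene=...]" tag found in the header shown above.
GENE_TAG = re.compile(r"\[gene=([^\]]+)\]")


def gene_name(description, default="unknown"):
    """Return the value of the [gene=...] tag, or `default` if it is absent."""
    match = GENE_TAG.search(description)
    return match.group(1) if match else default


def print_genes(path):
    """Print the gene name and the sequence of every record in a FASTA file."""
    # Imported here so gene_name() can be used without Biopython installed.
    from Bio import SeqIO

    for record in SeqIO.parse(path, "fasta"):
        # record.description is the full header line, not just the first word.
        print(gene_name(record.description))
        print(record.seq)
```

Calling `print_genes("mytext.txt")` should then print the gene name in place of the `lcl|...` identifier. To keep only specific genes, the result of `gene_name(record.description)` can be compared against the names you want before printing.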
Here I tried it with a Python regular expression....
Here I grouped the gene and the sequence for two sequences.....
The output will be....
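A regular-expression approach along these lines could look like the sketch below. It is not the code from this answer, only an illustration. It assumes each record starts with a `>` header line that contains a `[gene=...]` tag, followed by one or more sequence lines. The function name `parse_genes`, the second sample record, and its gene name `GENE2` are invented for the example.

```python
import re

# One record: a ">" header line containing [gene=NAME], followed by the
# sequence lines up to the next ">" or the end of the text.
RECORD = re.compile(r"^>[^\n]*\[gene=([^\]]+)\][^\n]*\n([^>]*)", re.MULTILINE)


def parse_genes(text):
    """Return a list of (gene, sequence) tuples found in FASTA-formatted text."""
    results = []
    for gene, body in RECORD.findall(text):
        # Join wrapped sequence lines by dropping all whitespace.
        results.append((gene, "".join(body.split())))
    return results


# Sample input: the record from the question plus a made-up second record.
sample = (
    ">lcl|NC_045512.2_gene_6 [gene=ORF6] [locus_tag=GU280_gp06] [gbkey=Gene]\n"
    "ATGTTTCATCTCGTTGACTTTCAGGTTACTATAGCAGAGATATTACTAATTATTATGAGGACTTTTAAAG\n"
    ">hypothetical_record [gene=GENE2] [gbkey=Gene]\n"
    "ATGAAA\n"
    "CTGTAA\n"
)

for gene, seq in parse_genes(sample):
    print(gene, seq)
```

Records whose header has no `[gene=...]` tag are skipped by this pattern. For a real file, read its contents with `open("mytext.txt").read()` and pass the string to `parse_genes`.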