I am trying to extract protein sequences from an .xml file obtained via entrez_fetch for use in a multiple sequence alignment. The alignment step requires the sequences to be in .fasta format, and I am unsure how to parse the XML into that format.
I have used the following code to obtain my .xml file, but I am unable to use this file with the msa package.
library(rentrez)
# Search NCBI for genome IDs
species <- c("flaviviridae")
species_ids <- entrez_search(db = "genome", term = species, retmax = 999, use_history = FALSE)
species_ids$ids
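# Fetch protein records as parsed XML
# (entrez_fetch() has no dbfrom argument; that belongs to entrez_link(),
# so the IDs passed here are still the genome IDs from the search above)
search <- entrez_fetch(db = "protein", id = species_ids$ids, rettype = "xml", parsed = TRUE)
search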
#Parse .xml into .fasta
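One possible way to fill in this step, as a sketch under stated assumptions rather than a definitive solution: if the records are requested as TinySeq XML (which, as far as I know, is what NCBI returns for rettype = "fasta" with retmode = "xml"), each record is a TSeq element with TSeq_accver, TSeq_defline and TSeq_sequence children. The helper name tseq_to_fasta, the toy records and the variable protein_ids are all made up for illustration and are not part of rentrez or msa.

```r
library(XML)  # XML package, used here to parse and query the records

# Hypothetical helper (not part of rentrez or msa): turn a TinySeq XML
# document into a character vector of FASTA lines, one header line and
# one sequence line per <TSeq> record.
tseq_to_fasta <- function(doc) {
  records <- getNodeSet(doc, "//TSeq")
  unlist(lapply(records, function(node) {
    acc      <- xpathSApply(node, "./TSeq_accver", xmlValue)
    defline  <- xpathSApply(node, "./TSeq_defline", xmlValue)
    sequence <- xpathSApply(node, "./TSeq_sequence", xmlValue)
    c(paste0(">", acc, " ", defline), sequence)
  }), use.names = FALSE)
}

# Offline toy input with made-up records, so the helper can be tried
# without network access.
toy_xml <- paste0(
  "<TSeqSet>",
  "<TSeq><TSeq_accver>TOY00001.1</TSeq_accver>",
  "<TSeq_defline>toy protein A</TSeq_defline>",
  "<TSeq_sequence>MKVLA</TSeq_sequence></TSeq>",
  "<TSeq><TSeq_accver>TOY00002.1</TSeq_accver>",
  "<TSeq_defline>toy protein B</TSeq_defline>",
  "<TSeq_sequence>MRTLS</TSeq_sequence></TSeq>",
  "</TSeqSet>"
)
toy_doc <- xmlParse(toy_xml, asText = TRUE)
fasta_lines <- tseq_to_fasta(toy_doc)

# Write the FASTA lines to a file that alignment tools can read.
fasta_file <- tempfile(fileext = ".fasta")
writeLines(fasta_lines, fasta_file)

# With real data (needs network access; protein_ids is assumed to hold
# protein UIDs, not the genome IDs from the search above):
# xml_text <- rentrez::entrez_fetch(db = "protein", id = protein_ids,
#                                   rettype = "fasta", retmode = "xml")
# writeLines(tseq_to_fasta(xmlParse(xml_text, asText = TRUE)),
#            "flaviviridae.fasta")
# seqs <- Biostrings::readAAStringSet("flaviviridae.fasta")
# alignment <- msa::msa(seqs)
```

If the XML itself is not needed, it may be simpler to skip parsing altogether: calling entrez_fetch with rettype = "fasta" and the default retmode should return the records as FASTA text, which can be written to a file with write() or writeLines().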