I am trying to create a simple pipeline that takes a file containing various short sequences and runs them through BLAST. I have an example sequence:
>some_random_sequence
AAGGTTCGGTCCAAATTGAA
which returns a hit for FUT3 (using the RefSeq_Gene DB and using grch38 as an entrez_query). However, when I try to run this using NCBIWWW (Biopython) I don't get any results and I would like to figure out why.
The snippet below contains the call I am using:
from Bio.Blast import NCBIWWW
from io import StringIO
sequence=['>some_random_sequence\n','AAGGTTCGGTCCAAATTGAA']
lines = "".join(sequence)
file = StringIO(lines)
results = NCBIWWW.qblast(
    program="blastn",
    database="RefSeq_Gene",
    entrez_query="grch38",
    sequence=file.read(),
    # Manually setting the parameters because short_query checks the length
    # of the sequence, but for some reason includes the name (resulting in
    # a length of 42), thus defeating the entire purpose of using
    # short_query (which does an `if len(sequence) < 31`).
    expect=1000,
    word_size=7,
    nucl_reward=1,
    filter=None,
    lcase_mask=None,
    # short_query=True,
)
with open('test_output.xml', 'w') as save_file:
    blast_results = results.read()
    save_file.write(blast_results)
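The length issue mentioned in the comments can be checked without contacting NCBI. This is only a minimal sketch, assuming a single-record FASTA string whose header lines start with `>`; `strip_fasta_header` is a hypothetical helper, not part of Biopython:

```python
def strip_fasta_header(fasta_text):
    """Return only the residues of a single-record FASTA string."""
    lines = fasta_text.strip().splitlines()
    # Drop header lines (those starting with '>') and join the rest.
    return "".join(line.strip() for line in lines if not line.startswith(">"))


fasta = ">some_random_sequence\nAAGGTTCGGTCCAAATTGAA"
bare = strip_fasta_header(fasta)

print(len(fasta))  # 42: header + newline + residues
print(len(bare))   # 20: residues only
```

Passing `bare` as `sequence` should bring the length under the 31-character threshold described above, so `short_query=True` could presumably be tried instead of setting each parameter by hand.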
Try this. I have included some actions that will help you follow the status of your submission and of the operation.
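A minimal sketch of what such status reporting might look like, not a definitive implementation. `run_with_status` is a hypothetical wrapper; it assumes the function it is given (for example `NCBIWWW.qblast`) blocks until the search is done and returns a handle with a `read()` method, and that the result is BLAST XML in which each alignment appears as a `<Hit>` element:

```python
import time


def run_with_status(submit, **params):
    """Call submit(**params), print progress messages, return the result text."""
    print("Submitting query to", params.get("database", "<no database>"), "...")
    start = time.time()
    handle = submit(**params)  # assumed to block until the search has finished
    text = handle.read()
    elapsed = time.time() - start
    print(f"Finished in {elapsed:.1f} s, received {len(text)} characters")
    # Assumption: XML output, where each reported alignment opens with <Hit>.
    print("Hits reported:", text.count("<Hit>"))
    return text


# Usage with Biopython (requires network access):
# from Bio.Blast import NCBIWWW
# xml_text = run_with_status(
#     NCBIWWW.qblast,
#     program="blastn",
#     database="RefSeq_Gene",
#     sequence="AAGGTTCGGTCCAAATTGAA",
# )
```

A hit count of zero with a non-empty response would suggest the search itself ran and the problem lies in the query parameters rather than in the submission.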