Retrieving a BLAST request using a query_id

  1. Is there any way to retrieve a BLAST request made with the Biopython library using the request's query_id?
  2. Alternatively, is it possible to retrieve a query_id before the BLAST result is complete and check on it later?

I'm working on a script that processes a BLAST result, and having to re-request the search on every run is quite time-consuming since I'm still learning the library. Currently I'm using: blastResult = NCBIWWW.qblast("blastp", "swissprot", AAsequence)

There are 1 best solutions below

from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML

# Define the query: here an NCBI protein accession (a raw amino acid sequence also works)
query_id = "KAF8065798.1"

# Display message indicating connection
print("Connecting to NCBI BLAST server...")

# Perform the BLAST search
print("Submitting BLAST request...")
result_handle = NCBIWWW.qblast("blastp", "swissprot", query_id)

# qblast() blocks until the search has finished, so the results are already available here
print("BLAST request submitted successfully.")

# Parse the BLAST result; blast_record.query holds the query's description line
print("Processing BLAST results...")
blast_record = NCBIXML.read(result_handle)
query_id = blast_record.query

# Display message indicating completion of result processing
print("BLAST results processed.")

# Note: this value only describes the query sequence. It cannot be used to look
# the search up again later; NCBI identifies a submitted search by its Request ID (RID)

# Close the result handle
result_handle.close()

# Print the query description read from the result
print("Query ID:", query_id)

Connecting to NCBI BLAST server...
Submitting BLAST request...
BLAST request submitted successfully.
Processing BLAST results...
BLAST results processed.
Query ID: Lipase [Scenedesmus sp. PABB004]
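
On the original question: the value printed above is the description of the query sequence, not a handle to the search. NCBI tracks each submitted search by a Request ID (RID), and as far as I know qblast() does not hand that RID back through its public interface. A minimal sketch of talking to the NCBI BLAST URL API directly, which lets you submit, keep the RID, and come back later, could look like the following. The helper names (parse_rid, parse_status, submit, status, fetch_xml) are my own, and the regular expressions assume the "RID = ..." and "Status=..." lines that the service embeds in its responses, so treat this as an illustration rather than a tested client.

```python
import re
import urllib.parse
import urllib.request

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def parse_rid(put_response):
    """Extract the Request ID from the text returned by a CMD=Put call."""
    match = re.search(r"^\s*RID = (\S+)", put_response, re.MULTILINE)
    return match.group(1) if match else None


def parse_status(info_response):
    """Extract the status word (e.g. WAITING, READY) from a SearchInfo reply."""
    match = re.search(r"Status=(\w+)", info_response)
    return match.group(1) if match else None


def _call(params):
    """POST the given parameters to the BLAST URL API and return the reply text."""
    data = urllib.parse.urlencode(params).encode()
    with urllib.request.urlopen(BLAST_URL, data=data) as response:
        return response.read().decode()


def submit(program, database, sequence):
    """Submit a search and return its RID without waiting for the results."""
    reply = _call({"CMD": "Put", "PROGRAM": program,
                   "DATABASE": database, "QUERY": sequence})
    return parse_rid(reply)


def status(rid):
    """Ask NCBI whether the search identified by rid has finished."""
    return parse_status(_call({"CMD": "Get", "FORMAT_OBJECT": "SearchInfo",
                               "RID": rid}))


def fetch_xml(rid):
    """Download the finished result as XML text."""
    return _call({"CMD": "Get", "FORMAT_TYPE": "XML", "RID": rid})
```

A typical flow would be rid = submit("blastp", "swissprot", AAsequence), then polling status(rid) no more often than NCBI's usage guidelines allow, and once it reports READY, parsing with NCBIXML.read(io.StringIO(fetch_xml(rid))). NCBI keeps results under an RID only for a limited time, so this does not replace saving the XML yourself.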

To print the BLAST result, parse it with the read() function from the NCBIXML module and then print the hits or any other information you need. The example also keeps results in a dictionary keyed by query_id, so the search is not repeated within the same run. Here's how you can do it:

import io
import time
from tqdm import tqdm

from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML

cache = {}  # Store BLAST result XML text using query_id as key

def get_blast_result(query_id, AAsequence):
    # A handle can only be read once, so cache the XML text itself
    # and return a fresh handle over it on every call
    if query_id not in cache:
        # Display connecting message
        print("Connecting to NCBI...")

        # Query search (qblast blocks until the search has finished)
        print("Querying NCBI database...")

        blast_result = NCBIWWW.qblast("blastp", "swissprot", AAsequence)

        # Store the result XML in the cache (assumes the handle yields text)
        cache[query_id] = blast_result.read()
        blast_result.close()

        # Display research message
        print("Processing BLAST results...")

        # Purely cosmetic progress bar: the search is already complete here,
        # so this only simulates progress and can be removed
        for i in tqdm(range(5), desc="Progress"):
            time.sleep(2)  # Simulate processing time

        # Display completion message
        print("BLAST search completed successfully.")

    # Return a new readable handle over the cached XML
    return io.StringIO(cache[query_id])

# Example usage:
query_id = "KAF8065798.1"
AAsequence = "maaggrsaallallwlgcslmlllpparaarapgelssqlllssgmhllsdqsslapegllglsaaqraspllanpatrsvfaadsalartatardeawrrggdaaagrpaaagggraarrrgrgrsaalrggfdldvsatlaalqsisycanlsdvaawnctrcaripn"

blast_result = get_blast_result(query_id, AAsequence)

# Parse the BLAST result
blast_record = NCBIXML.read(blast_result)

# Print out the hits or any relevant information
for alignment in blast_record.alignments:
    for hsp in alignment.hsps:
        print("****Alignment****")
        print("sequence:", alignment.title)
        print("length:", alignment.length)
        print("e value:", hsp.expect)
        print(hsp.query)
        print(hsp.match)
        print(hsp.sbjct)

The output will be:

Connecting to NCBI...
Querying NCBI database...
Processing BLAST results...
Progress: 100%|███████████████████████████████████| 5/5 [00:10<00:00,  2.00s/it]
BLAST search completed successfully.
****Alignment****
sequence: sp|B4M693.1| RecName: Full=Eukaryotic translation initiation factor 3 subunit A; Short=eIF3a; AltName: Full=Eukaryotic translation initiation factor 3 subunit 10 [Drosophila virilis]
length: 1138
e value: 2.38094
TARDEAWRRGGDAAAGRPAAAGGGR---AARRRGRGRSAALRGGFDLD
+ RD+ WRRGGD    R    GG R   + RR    R    RGGF  D
SGRDDKWRRGGD----RSERLGGDRDRDSFRRNDGPRRDDDRGGFRRD

You can also use the time module alone instead of tqdm. I used tqdm in the code above only because it gives a more visual and informative representation of progress. In both cases the progress is simulated and does not reflect the actual state of the BLAST search.

import time
# Replace the tqdm loop above with this if you use time alone
# (indent it to match the surrounding function body)
# Simulate research progress (can be replaced with actual checking of BLAST status)
for i in range(5):
    print("Progress:", i + 1, "of 5")
    time.sleep(2)  # Simulate processing time
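
Since the goal is to avoid re-requesting the search every time the script runs, an in-memory dictionary is not enough on its own: it is empty again on the next run. One option is to save the result XML to a file and reuse it. Below is a minimal sketch, assuming a hypothetical helper named cached_blast_xml and a run_search callable that you supply; neither is part of Biopython.

```python
from pathlib import Path


def cached_blast_xml(cache_path, run_search):
    """Return BLAST XML text, running the search only if no saved copy exists.

    cache_path: file in which the XML is kept between runs.
    run_search: zero-argument callable returning the XML as a string
                (hypothetical hook; it is only called on a cache miss).
    """
    path = Path(cache_path)
    if path.exists():
        # A previous run already saved the result, so reuse it
        return path.read_text()
    xml_text = run_search()
    path.write_text(xml_text)
    return xml_text


# Intended use with Biopython (not executed here, it needs network access):
#   xml_text = cached_blast_xml(
#       "blast_result.xml",
#       lambda: NCBIWWW.qblast("blastp", "swissprot", AAsequence).read(),
#   )
#   blast_record = NCBIXML.read(io.StringIO(xml_text))
```

Delete the file, or use a different file name per query, whenever you want a fresh search.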