How can I compare protein sequences to find the closest match?

How could I build a tool to help with this scenario:

I work in a lab where we use plasmids to express recombinant proteins. We have a database containing all the plasmid identifiers and the sequence of the protein that they code for.

When a new protein is requested, I would like to input the desired protein sequence and search our database for the plasmid whose encoded protein is the closest match, i.e. the one with the highest identity score. The goal is then to use that existing plasmid as a cloning template for the new plasmid.

In other words, I want to build a tool similar to NCBI BLAST that works locally with proprietary sequences stored in an SQL database.

Would Python be able to achieve that?

Thanks!

Best answer

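How about creating your own local BLAST database with makeblastdb? Export the protein sequences from your SQL database to a FASTA file, build a protein database from it (makeblastdb with -dbtype prot), and then query it from Python with Biopython. Something like this: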

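from Bio.Blast.Applications import NcbiblastpCommandline

# blastp rather than blastn: both the query and the database are protein sequences
run_command = NcbiblastpCommandline(query=YOUR_SEQUENCE_FASTA_PATH,  # FASTA file with the new protein
                                    db=DATABASE_PATH,                # database built with makeblastdb
                                    out=RESULT_PATH,                 # file the results are written to
                                    outfmt=5,                        # 5 = XML output
                                    # [… other parameters …]
                                    evalue=1e-10                     # discard weak hits
                                   )
# Runs the blastp executable, which must be installed and on the PATH
stdout, stderr = run_command()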
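
To connect this to the SQL database from the question, here is a minimal sketch of the whole round trip: export the sequences to FASTA, build the database, run blastp, and pick the top hit. Several things in it are assumptions, not taken from the question: the database is SQLite, the table is called `plasmids` with columns `plasmid_id` and `protein_seq`, and the function names are made up for illustration. It calls the BLAST+ executables through `subprocess` instead of the Biopython wrapper, since, as far as I know, recent Biopython releases have deprecated `Bio.Blast.Applications`.

```python
import sqlite3
import subprocess


def export_fasta(conn, fasta_path):
    """Write every plasmid's protein sequence to a FASTA file; return the record count."""
    # Hypothetical schema: table `plasmids` with columns `plasmid_id`, `protein_seq`.
    rows = conn.execute(
        "SELECT plasmid_id, protein_seq FROM plasmids ORDER BY plasmid_id"
    ).fetchall()
    with open(fasta_path, "w") as handle:
        for plasmid_id, seq in rows:
            handle.write(f">{plasmid_id}\n{seq}\n")
    return len(rows)


def build_db(fasta_path, db_path):
    """Build a protein BLAST database (needs BLAST+ installed and on the PATH)."""
    subprocess.run(
        ["makeblastdb", "-in", fasta_path, "-dbtype", "prot", "-out", db_path],
        check=True,
    )


def run_blastp(query_fasta, db_path):
    """Run blastp and return its tabular (outfmt 6) output as text."""
    result = subprocess.run(
        ["blastp", "-query", query_fasta, "-db", db_path,
         "-outfmt", "6", "-evalue", "1e-10"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def best_hit(tabular_text):
    """Return (subject_id, percent_identity, alignment_length) of the top hit, or None."""
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # Default outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
        # qstart qend sstart send evalue bitscore
        hits.append((fields[1], float(fields[2]), int(fields[3]), float(fields[11])))
    if not hits:
        return None
    # Rank by bit score, then report the identity of that alignment.
    top = max(hits, key=lambda hit: hit[3])
    return top[:3]
```

Typical use would be `export_fasta(conn, "plasmids.fasta")`, `build_db("plasmids.fasta", "plasmids_db")`, then `best_hit(run_blastp("query.fasta", "plasmids_db"))`; those file names are placeholders. The top hit is chosen by bit score rather than raw percent identity because a very short local alignment can show 100% identity while covering only a fragment of the protein, so check the alignment length against the query length before picking a cloning template. Depending on the BLAST+ version and makeblastdb options, the subject ID may not be exactly the FASTA header text, so verify that it maps back to your plasmid identifiers.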