I'm trying to retrieve sequence information with Biopython. I have to use efetch, and I'm writing the results to a Feather file for speed and to save disk space. I know NCBI's API usage limits and policies, that is, up to 10 requests per second are acceptable with an API key, but I'm not sure I'm doing things right. Has NCBI banned my IP? Do I have to email them? This is the code.
from Bio import Entrez, SeqIO
import feather
# Set up the email address and API key for NCBI
Entrez.email = '*********@domain.com'
Entrez.api_key = '**************************'
# Search for the amylase gene sequence
search_term = 'amylase'
handle = Entrez.esearch(db='nucleotide', term=search_term)
record = Entrez.read(handle)
handle.close()
# Retrieve the sequence record for the first result
id_list = record['IdList']
handle = Entrez.efetch(db='nucleotide', id=id_list[0], rettype='gb', retmode='text')
seq_record = SeqIO.read(handle, 'genbank')
handle.close()
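# Extract relevant information from the sequence record
gene_name = seq_record.name
accession_number = seq_record.id
nucleotide_length = len(seq_record.seq)
# Note: this translates the whole record from its first base, not just a CDS
amino_acid_length = len(seq_record.translate())
organism_name = seq_record.annotations['organism']
# 'db_source' is not present on every record, so fall back to an empty string
organism_accession_id = seq_record.annotations.get('db_source', '')
tax_id = ''
location = ''
strand = ''
# seq_record.features is a list of SeqFeature objects, so find 'source' by type
for feature in seq_record.features:
    if feature.type == 'source':
        location = str(feature.location)
        strand = feature.location.strand
        # The NCBI taxonomy ID is stored as a 'taxon:<id>' db_xref qualifier
        for xref in feature.qualifiers.get('db_xref', []):
            if xref.startswith('taxon:'):
                tax_id = xref.split(':', 1)[1]
        break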
# Save the information to a feather file
import pandas as pd  # write_dataframe expects a DataFrame, not a plain dict
data = {
    'gene_name': [gene_name],
    'accession_number': [accession_number],
    'sequence_length_(NUC)': [nucleotide_length],
    'sequence_length_(AA)': [amino_acid_length],
    'organism_name': [organism_name],
    'Tax_ID': [tax_id],
    'organism_accession_ID': [organism_accession_id],
    # Store location and strand as text so the columns serialize cleanly
    'sequence_location': [str(location)],
    'strand': [str(strand)]
}
feather.write_dataframe(pd.DataFrame(data), 'amylase_info.feather')
I get the error below when I run this in PyCharm. I know HTTP 400 is a client-side error, but if any of you have run the code above without hitting it, please let me know. If you have hit this HTTP Error 400: Bad Request and solved it, I would like to know how. Thanks.
Traceback (most recent call last):
File "........\Code to retrieve amylase sequence info.py", line 11, in <module>
handle = Entrez.esearch(db='nucleotide', term=search_term)
File "........\venv\lib\site-packages\Bio\Entrez\__init__.py", line 230, in esearch
return _open(request)
File "........\venv\lib\site-packages\Bio\Entrez\__init__.py", line 594, in _open
handle = urlopen(request)
File "C:\Users\Asus\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 214, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\Asus\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 523, in open
response = meth(req, response)
File "C:\Users\Asus\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 632, in http_response
response = self.parent.error(
File "C:\Users\Asus\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 561, in error
return self._call_chain(*args)
File "C:\Users\Asus\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 494, in _call_chain
result = func(*args)
File "C:\Users\Asus\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 641, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
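One way to narrow this down, as a minimal sketch: catch the HTTPError and print the response body, since the server may include a short explanation there (for instance about a rejected parameter or API key) that the traceback above does not show. The helper names describe_http_error and call_with_diagnostics are hypothetical and not part of Biopython; the sketch only relies on urllib.error.HTTPError from the standard library, which is the exception raised in the traceback.

```python
import urllib.error


def describe_http_error(err):
    """Return the status line plus whatever text the server sent in the body."""
    # HTTPError is file-like, so the response body can be read from it.
    # Guard against a missing body just in case.
    if err.fp is not None:
        body = err.read().decode("utf-8", errors="replace").strip()
    else:
        body = ""
    return f"HTTP {err.code} {err.reason}: {body}"


def call_with_diagnostics(func, *args, **kwargs):
    """Call func; on an HTTP error, print the server's message, then re-raise."""
    try:
        return func(*args, **kwargs)
    except urllib.error.HTTPError as err:
        print(describe_http_error(err))
        raise
```

With the code from the question, the search call would become handle = call_with_diagnostics(Entrez.esearch, db='nucleotide', term=search_term). The original exception is still raised, so the traceback stays the same; the only difference is that the body of the 400 response gets printed first.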