I'm working on some tasks using RDKit and have run into a problem.
I'm trying to clean my dataset with the SaltRemover() class, but an ArgumentError occurs and I cannot figure out why.
The code I used is this:
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.PandasTools import LoadSDF
A1 = LoadSDF('finaldata_A1.sdf', smilesName='SMILES')
A1 = A1['SMILES']
for mol in A1:
    A1_mol = Chem.MolFromSmiles(mol)
    if mol is None: continue
from rdkit.Chem import SaltRemover
remover = SaltRemover.SaltRemover(defnFormat='smiles')
A1_mol_SR = remover.StripMol(A1_mol)
The error message after running the code is:
ArgumentError: Python argument types in rdkit.Chem.rdmolops.DeleteSubstructs(Mol, NoneType, bool) did not match C++ signature: DeleteSubstructs(class RDKit::ROMol mol, class RDKit::ROMol query, bool onlyFrags=False, bool useChirality=False)
I think there are a few things you are confused about here.

As for the SaltRemover, what are you trying to achieve with the defnFormat argument? When you use this argument you should also provide defnData, defining the salts that you want to remove.

If you initialise the SaltRemover without these arguments, the salt definitions are read from a file, which is parsed as a set of SMARTS queries. When you set defnFormat to 'smiles', you are telling the remover to read that file as a series of SMILES strings. Of course, this file cannot be read as SMILES strings because of its comments and formatting. As a result, the list of defined salts contains None objects, which is why you receive the ArgumentError. Internally, RDKit uses the function DeleteSubstructs, which is passed your query molecule and the salt to be removed, and that salt may now be None.

You may not need to define your own salts. If not, just use the default arguments.
Your other problem seems to be with how you are handling the data. It looks like you are building the Mol objects twice. When you use the LoadSDF function, the Mols are already added to a column named 'ROMol', unless a different name is given in the molColName argument, so there is no need to convert the SMILES column back into molecules. You are also only stripping one molecule, the last one produced by the loop. You should probably apply your remover to the molecule column of the A1 dataframe instead.
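
A rough sketch of that approach is shown here. To keep it runnable without your SDF file, a small hand-built dataframe with a 'ROMol' column stands in for the output of LoadSDF; the two molecules and the 'ROMol_stripped' column name are assumptions made for illustration.

```python
import pandas as pd
from rdkit import Chem
from rdkit.Chem.SaltRemover import SaltRemover

# Stand-in for A1 = LoadSDF('finaldata_A1.sdf'): LoadSDF would already
# provide a 'ROMol' column of Mol objects like this one.
A1 = pd.DataFrame(
    {'ROMol': [Chem.MolFromSmiles(s) for s in ('CN(C)C.Cl', 'CCO')]}
)

remover = SaltRemover()  # default salt definitions

# Strip every molecule in the column, skipping entries that failed to parse.
A1['ROMol_stripped'] = A1['ROMol'].apply(
    lambda m: remover.StripMol(m) if m is not None else None
)

for m in A1['ROMol_stripped']:
    print(Chem.MolToSmiles(m))
```

With your own data, the hand-built dataframe would be replaced by the LoadSDF call, and the rest should work unchanged as long as the molecule column is named 'ROMol'.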