I'm working on some tasks using RDKit and have run into a problem. I'm trying to sanitize my dataset with SaltRemover(), but an ArgumentError occurs and I cannot figure out why.

The code used is this:

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.PandasTools import LoadSDF

A1 = LoadSDF('finaldata_A1.sdf', smilesName='SMILES')
A1 = A1['SMILES']

for mol in A1:
    A1_mol = Chem.MolFromSmiles(mol)
    if mol is None: continue

from rdkit.Chem import SaltRemover
remover = SaltRemover.SaltRemover(defnFormat='smiles')
A1_mol_SR = remover.StripMol(A1_mol)

The error message after running the code is:

ArgumentError: Python argument types in rdkit.Chem.rdmolops.DeleteSubstructs(Mol, NoneType, bool) did not match C++ signature: DeleteSubstructs(class RDKit::ROMol mol, class RDKit::ROMol query, bool onlyFrags=False, bool useChirality=False)

There are 1 best solutions below

On BEST ANSWER

I think there are a few things you are confused about here.

As for the SaltRemover, what are you trying to achieve with the defnFormat argument? When you use this argument you should also provide defnData, which defines the salt that you want to remove, for example:

from rdkit import Chem
from rdkit.Chem import SaltRemover

# SaltRemover is imported as a module here, so the class is SaltRemover.SaltRemover
remover = SaltRemover.SaltRemover(defnFormat='smarts', defnData="[Cl]")
mol = Chem.MolFromSmiles('CN(C)C.Cl')
res = remover.StripMol(mol)

# We have stripped the Cl
res.GetNumAtoms()
[Out]: 4

If you initialise the SaltRemover without these arguments, the salt definitions are read from a default file, which is parsed as a set of SMARTS queries. When you set defnFormat to 'smiles' without providing defnData, you are telling the remover to parse that same file as a series of SMILES strings. The file cannot be read that way because of its comments and SMARTS-specific syntax, so the list of salts ends up containing None objects, which is why you receive the ArgumentError. Internally, RDKit calls DeleteSubstructs with your molecule and the salt pattern to be removed, and that pattern may now be None.
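To see why a SMARTS definition cannot be reused as SMILES, here is a minimal sketch that parses one pattern both ways. The pattern '[Cl,Br,I]' is used purely as an illustration, and silencing the RDKit error log is an optional convenience, not something the SaltRemover does itself.

```python
from rdkit import Chem, RDLogger

# Optional: hide the parse error RDKit logs for the invalid SMILES below.
RDLogger.DisableLog('rdApp.error')

pattern = '[Cl,Br,I]'  # illustrative SMARTS atom list

as_smarts = Chem.MolFromSmarts(pattern)  # valid SMARTS -> query Mol
as_smiles = Chem.MolFromSmiles(pattern)  # not valid SMILES -> None

print('parsed as SMARTS:', as_smarts is not None)
print('parsed as SMILES:', as_smiles is not None)
```

A None produced this way is the kind of object that ends up being passed to DeleteSubstructs in place of a Mol.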

You may not need to define your own salts. If not, just use the default arguments:

remover = SaltRemover.SaltRemover()

# We can have a look at some of the salts
Chem.MolToSmarts(remover.salts[0])

[Out]: '[Cl,Br,I]'

# Use to strip salts
mol = Chem.MolFromSmiles('CN(C)C.Cl')
res = remover.StripMol(mol)
Chem.MolToSmiles(res)

[Out]: 'CN(C)C'

Your other problem seems to be with how you are handling the data. It looks like you are building the Mol objects twice: when you use the LoadSDF function, the Mols are already added to a column named 'ROMol' (unless you specify a different name with the molColName argument), so there is no need to convert the SMILES back with MolFromSmiles. It also looks like you are only stripping one molecule. You should probably apply your remover to the molecule column of the A1 dataframe instead.
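One way this could look is sketched below. It assumes the default 'ROMol' column name. To stay self-contained it writes a tiny stand-in SDF to a temporary directory rather than using finaldata_A1.sdf. The example molecules and the column names 'ROMol_stripped' and 'SMILES_stripped' are made up for illustration.

```python
import os
import tempfile

from rdkit import Chem
from rdkit.Chem import PandasTools, SaltRemover

# Write a small stand-in SDF (hypothetical data) so the sketch runs on its own.
sdf_path = os.path.join(tempfile.mkdtemp(), 'example.sdf')
writer = Chem.SDWriter(sdf_path)
for smi in ['CN(C)C.Cl', 'CCO']:
    writer.write(Chem.MolFromSmiles(smi))
writer.close()

# LoadSDF already builds the Mol objects and stores them in the 'ROMol' column.
df = PandasTools.LoadSDF(sdf_path)

# Default salt definitions, parsed as SMARTS.
remover = SaltRemover.SaltRemover()

# Drop rows without a molecule, then strip salts from every remaining row.
df = df[df['ROMol'].notnull()].copy()
df['ROMol_stripped'] = df['ROMol'].apply(remover.StripMol)
df['SMILES_stripped'] = df['ROMol_stripped'].apply(Chem.MolToSmiles)

stripped_smiles = df['SMILES_stripped'].tolist()
print(stripped_smiles)
```

With your own file, you would replace the stand-in SDF with LoadSDF('finaldata_A1.sdf') and keep the rest unchanged.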