How to convert a large SDF file to a DataFrame in RDKit

196 Views Asked by jacobdavis At 31 October 2025 at 07:57

This is a cheminformatics question. I have a large SDF file (over 700 MB, 200,000 molecules) that I want to convert to a DataFrame for analysis. I use the code below:

from rdkit.Chem import PandasTools
df = PandasTools.LoadSDF('Datatest/0 chemdiv_PPI.sdf')

As a result, memory (RAM) usage skyrocketed to nearly 100%. My question is: is there an easy way to convert a large SDF file to a DataFrame and handle it with pandas without using too much RAM?


There is 1 best solution below

rwalroth On 07 February 2024 at 19:55

I think your best bet is to handle this with RDKit's SDMolSupplier, which does not read the entire file into memory at once: https://www.rdkit.org/docs/source/rdkit.Chem.rdmolfiles.html#rdkit.Chem.rdmolfiles.SDMolSupplier

From there you have a couple of options. You can iterate over the supplier and write chunks of molecules to smaller SDF files, then load those one at a time into pandas with PandasTools.LoadSDF. Otherwise, it depends on what you are trying to accomplish with the resulting DataFrame.
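As a rough illustration of the chunked approach, here is a minimal sketch of a variant that skips the intermediate SDF files and builds one small DataFrame per chunk straight from the supplier. It assumes you only need the SD data fields plus a SMILES string rather than full molecule objects. The helper names (iter_chunks, mol_to_row, sdf_to_frames) and the default chunk size are made up for this example and are not part of RDKit or pandas.

```python
from itertools import islice


def iter_chunks(items, chunk_size):
    """Yield lists of at most chunk_size items from any iterable."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def mol_to_row(mol):
    """Flatten one RDKit molecule into a plain dict (SD fields + SMILES)."""
    from rdkit import Chem  # imported here so iter_chunks works without RDKit

    row = mol.GetPropsAsDict()  # SD data fields stored on the molecule
    row["SMILES"] = Chem.MolToSmiles(mol)
    return row


def sdf_to_frames(path, chunk_size=10_000):
    """Yield one pandas DataFrame per chunk of molecules read from an SDF file."""
    import pandas as pd
    from rdkit import Chem

    supplier = Chem.SDMolSupplier(path)
    # The supplier yields None for records RDKit cannot parse, so skip those.
    valid_mols = (mol for mol in supplier if mol is not None)
    for chunk in iter_chunks(valid_mols, chunk_size):
        yield pd.DataFrame([mol_to_row(mol) for mol in chunk])
```

Looping over sdf_to_frames('Datatest/0 chemdiv_PPI.sdf') would then give you one frame at a time, which you could filter, aggregate, or append to a CSV file before the next chunk is read. Storing SMILES strings instead of molecule objects should also keep each frame lighter, though how much memory that saves will depend on your data.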
