This is a cheminformatics question. I have a large SDF file (more than 700 MB, ~200,000 molecules) that I want to convert to a DataFrame for analysis. I am using the code below:
df = PandasTools.LoadSDF('Datatest/0 chemdiv_PPI.sdf')
As a result, RAM usage skyrocketed to nearly 100%. My question is: is there an easy way to convert a large SDF file to a DataFrame and work with it in pandas without exhausting memory?
I think your best bet is to handle this with `SDMolSupplier`, which does not read the entire file into memory at once: https://www.rdkit.org/docs/source/rdkit.Chem.rdmolfiles.html#rdkit.Chem.rdmolfiles.SDMolSupplier
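For example, if all you need from each molecule is its SDF property fields (and maybe a SMILES string), you can stream through the supplier and build a DataFrame of plain values, never holding the Mol objects themselves. This is a sketch; the idea of keeping a SMILES column instead of the ROMol column is my suggestion, not something required by RDKit:

```python
# Stream an SDF with SDMolSupplier and collect only lightweight data
# (SDF tag/value pairs plus a SMILES string) into a pandas DataFrame,
# instead of keeping full RDKit Mol objects in memory.
import pandas as pd
from rdkit import Chem

def sdf_to_property_df(path):
    records = []
    supplier = Chem.SDMolSupplier(path)
    for mol in supplier:
        if mol is None:  # skip entries RDKit could not parse
            continue
        row = mol.GetPropsAsDict()             # the SDF tag/value pairs
        row["SMILES"] = Chem.MolToSmiles(mol)  # lightweight structure column
        records.append(row)
    return pd.DataFrame(records)
```

The peak memory here is roughly the size of the final DataFrame of strings and numbers, which is far smaller than 200,000 live Mol objects.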
From there you have a couple of options. You can iterate over the supplier and write chunks of molecules out to smaller SDF files, then load those one at a time into pandas with the PandasTools function. Beyond that, it depends on what you are trying to accomplish with the resulting DataFrame.
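The chunking approach could look like the sketch below. The chunk size, the output file naming, and the `process_chunks` helper are arbitrary choices for illustration, not part of the RDKit API:

```python
# Split a large SDF into smaller chunk files on disk, then load and
# process one chunk at a time with PandasTools.LoadSDF so that only one
# chunk's DataFrame is in memory at any moment.
from rdkit import Chem
from rdkit.Chem import PandasTools

def split_sdf(path, chunk_size=10000, prefix="chunk"):
    """Write molecules from `path` into numbered SDF chunk files."""
    paths = []
    writer = None
    supplier = Chem.SDMolSupplier(path)
    for i, mol in enumerate(supplier):
        if i % chunk_size == 0:           # start a new chunk file
            if writer is not None:
                writer.close()
            out = f"{prefix}_{i // chunk_size}.sdf"
            paths.append(out)
            writer = Chem.SDWriter(out)
        if mol is not None:
            writer.write(mol)
    if writer is not None:
        writer.close()
    return paths

def process_chunks(paths):
    """Yield one DataFrame per chunk; analyze and discard each in turn."""
    for p in paths:
        df = PandasTools.LoadSDF(p)
        # ... run your per-chunk analysis here ...
        yield df
```

If your analysis reduces each chunk to summary statistics (counts, descriptor means, filtered subsets), you can aggregate those across chunks and never hold the full 700 MB in RAM.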