Saving/Storing pymatgen Structures


I'm currently working with a materials science dataset that contains various kinds of information.

In particular, I have a 'Structure' column holding several pymatgen.core.Structure objects.

I would like to save this dataset as a .csv file or something similar. The problem is that after saving and reopening it, the pymatgen structures lose their type and become plain formatted strings, and I cannot convert them back to pymatgen.core.Structure objects.

Any hints on how to do that? I've been searching the pymatgen documentation but haven't had any luck so far.

Thanks in advance!


There are 2 best solutions below


From the docs:

Side-note : as_dict / from_dict

As you explore the code, you may notice that many of the objects have an as_dict method and a from_dict static method implemented. For most of the non-basic objects, we have designed pymatgen such that it is easy to save objects for subsequent use. While Python does provide pickling functionality, pickle tends to be extremely fragile with respect to code changes. Pymatgen's as_dict provides a means to save your work in a more robust manner, which also has the added benefit of being more readable. The dict representation is also particularly useful for entering such objects into certain databases, such as MongoDB. This as_dict specification is provided in the monty library, which is a general Python supplementary library arising from pymatgen.

The output from an as_dict method is always json/yaml serializable. So if you want to save a structure, you may do the following:

import json
# structure: an existing pymatgen.core.Structure object
with open('structure.json', 'w') as f:
    json.dump(structure.as_dict(), f)

Similarly, you can restore the structure (or any object with an as_dict method) from the JSON file as follows:

import json
from pymatgen.core import Structure
with open('structure.json', 'r') as f:
    d = json.load(f)
    structure = Structure.from_dict(d)

You may replace any of the above json commands with yaml in the PyYAML package to create a yaml file instead. There are certain tradeoffs between the two choices. JSON is much more efficient as a format, with extremely fast read/write speed, but is much less readable. YAML is an order of magnitude or more slower in terms of parsing, but is more human readable.

See also https://pymatgen.org/usage.html#montyencoder-decoder and https://pymatgen.org/usage.html#reading-and-writing-structures-molecules
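
For the dataset in the question, one way to apply this is to keep the CSV and store each structure as a JSON string in its cell, then rebuild the objects with from_dict after reading. The following is only a minimal sketch under some assumptions: each row is a plain dict, and the objects in the 'Structure' column expose as_dict/from_dict. The helper names dump_rows and load_rows and the FakeStructure class are hypothetical, not part of pymatgen; FakeStructure is there only so the sketch runs without pymatgen installed.

```python
import csv
import io
import json


def dump_rows(fh, rows, column="Structure"):
    """Write rows to CSV, storing row[column].as_dict() as a JSON string."""
    writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
    writer.writeheader()
    for row in rows:
        encoded = dict(row)
        encoded[column] = json.dumps(row[column].as_dict())
        writer.writerow(encoded)


def load_rows(fh, cls, column="Structure"):
    """Read the CSV back, rebuilding each object with cls.from_dict()."""
    rows = []
    for row in csv.DictReader(fh):
        row[column] = cls.from_dict(json.loads(row[column]))
        rows.append(row)
    return rows


class FakeStructure:
    """Hypothetical stand-in for an object with as_dict/from_dict."""

    def __init__(self, species, coords):
        self.species = species
        self.coords = coords

    def as_dict(self):
        return {"species": self.species, "coords": self.coords}

    @classmethod
    def from_dict(cls, d):
        return cls(d["species"], d["coords"])


# io.StringIO stands in for open("dataset.csv", "w", newline="")
buffer = io.StringIO()
rows = [
    {
        "formula": "NaCl",
        "Structure": FakeStructure(["Na", "Cl"], [[0, 0, 0], [0.5, 0.5, 0.5]]),
    }
]
dump_rows(buffer, rows)

buffer.seek(0)
restored = load_rows(buffer, FakeStructure)
```

With real data you would presumably pass pymatgen's Structure as cls and use a real file instead of the in-memory buffer. Keep in mind that every other column comes back as a string, as is usual for CSV.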

0
On

A pymatgen.core.Structure object can be written out in a fixed file format, for example CIF, VASP, or XYZ. So you may first need to save your structure information as CIF or VASP, then open that file and preprocess its contents into "csv" form with Python (hint: use Python's string-handling functions).