So I am running a artificial chemistry simulation and I'm using networkx's undirected graphs to represent the structure of molecules. I want to be able to determine if two molecules have the same structure. So determining if two molecular graphs are isomorphic is an appropriate way of determining if they have the same structure. The atoms can be different chemical elements in the simulation. So nodes/atoms have a label which is effectively the atomic number of the element. However there are specific bonding sites on the atoms as well which also have a integer label. For example, consider the molecular structure 1-1-3 (dashes are edges and numbers are labelled nodes/atoms). Let's say there are two molecules with this structure but with a different bonding site configuration:
1(2)-(1)1(2)-(1)3 and 1(1)-(2)1(1)-(1)3
where the numbers in brackets are the bonding sites of the atoms. If I compare these molecules using the following code, is_isomorphic would incorrectly return true:
node_match = iso.numerical_node_match('element', -1)
nx.is_isomorphic(molecule1, molecule2, node_match=node_match))
Additionally if I made it so each edge contains a bonding_sites attribute, which is a sorted list of bonding sites (e.g. bonding_sites=[1,2]), and tried to add an edge matcher, the following code would also incorrectly return true:
edge_match = iso.categorical_edge_match('bonding_sites', None)
node_match = iso.numerical_node_match('element', -1)
nx.is_isomorphic(molecule1, molecule2, node_match=node_match, edge_match=edge_match))
The only way I can think to ensure is_isomorphic returns false is to introduce some kind of label matching based on node-edge tuples. But I cannot find any facility for this in networkx. Can anyone think of a solution to this that uses networkx (rest of my code heavily relies on it)? Also, is there a name for the type of graph I'm trying to compare here?
I can think of a slightly complicated solution which is to have each atom be represented by a complete subgraph with each node of the subgraph representing each bonding site. But I would like to avoid this if possible.
I think what you are actually looking for is a canonicalization method rather than isomorphism, though they are very similar concepts.
https://en.wikipedia.org/wiki/Canonicalization (In general)
https://en.wikipedia.org/wiki/Graph_canonization (For graphs specifically)
Canonicalization means transforming the inputs into a standard form that allows for comparison to determine if they are equivalent.
Keep in mind, molecules may be represented as graphs, but they are not exactly graphs in a purely mathmatical sense, they do have a physical structure and that structure has it's own rules which may not accounted for in general graph isomorphism. As such I would be wary of using a standard graph isomorphism algorithm, but rather write an algorithm to search for them based on actual possible arrangements. Sadly, my Chemistry is a bit too rusty to be able to give solid examples without a large number of examples of different arrangements that a molecule could look like vs what it actually is. You may also want to look into subgraph isomorphism.
https://en.wikipedia.org/wiki/Subgraph_isomorphism_problem https://en.wikipedia.org/wiki/Induced_subgraph_isomorphism_problem
You may also be able to use Canonicalizationm, as mentioned above, to convert the representation of the molecule into a "standard" representation of the structure of the molecule. In theory, this is at least as hard if not as harder than isomorphism, but it may be possible that structure of a molecule (due to physical constraints) can be determined easier.
It would help if you provided a few examples of ways a molecule could be defined such that the molecules are equivalent, while also defining a number of ways the molecule could be defined such that they are similar, but not the same.
For example, list all the ways the two molecules you provided could be defined that are equivilent.
Another option might be to look into weighted graph isomorphism.