When I ran StellarGraph's demo on graph classification using DGCNNs, I got the same result as in the demo.
However, I then tested what happens when I first shuffle the data, using the following code:
import random

shuffler = list(zip(graphs, graph_labels))
random.shuffle(shuffler)
graphs, graph_labels = zip(*shuffler)
The model didn't learn at all (accuracy of around 50%, which simply matches the class distribution of the data).
Does anyone know why this happens? Maybe I shuffled the wrong way? Or is it that the data should be left unshuffled in the first place (and if so, why? That doesn't make any sense)? Or is it a bug in StellarGraph's implementation?
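One way to rule out the shuffle itself is to check that every item still carries its own label afterwards. The following is a minimal sketch with toy stand-ins: the string "graphs", the alternating labels, and the fixed seed are assumptions made for illustration, not the demo's data.

```python
import random

# Toy stand-ins (hypothetical): each "graph" is a unique string, so the
# original graph -> label pairing can be looked up after shuffling.
graphs = [f"graph_{i}" for i in range(10)]
graph_labels = [i % 2 for i in range(10)]
original = dict(zip(graphs, graph_labels))

random.seed(0)  # fixed seed only to make the sketch reproducible

# Same shuffling pattern as in the question.
shuffler = list(zip(graphs, graph_labels))
random.shuffle(shuffler)
graphs, graph_labels = zip(*shuffler)

# Every graph should still be paired with its original label.
pairs_intact = all(original[g] == y for g, y in zip(graphs, graph_labels))
print(pairs_intact)

# Note: zip(*...) returns tuples, not lists.
print(type(graphs).__name__, type(graph_labels).__name__)
```

If the check passes, the shuffle keeps the pairs aligned, and the cause has to be somewhere later in the pipeline. Note that graphs and graph_labels are tuples after this step, which may matter if later code expects a list or a pandas object.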
I found the problem. It had nothing to do with the shuffling algorithm or with StellarGraph's implementation. The problem was in the demo, at the following lines:
Specifically, the problem was caused by train_graphs.index - 1 and test_graphs.index - 1. The indices are already in the range 0 to n, so subtracting one from them shifts the graph data one position backwards, which gives each data point the label of a different data point.

To fix this, simply change them to train_graphs.index and test_graphs.index, without the - 1 at the end.
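To see the effect in isolation, here is a minimal sketch that uses plain Python lists as hypothetical stand-ins for the graphs and labels. It does not use StellarGraph or pandas; the toy data and the pair_up helper are made up for illustration.

```python
# Toy data (hypothetical): graph i has the true label labels[i].
graphs = ["g0", "g1", "g2", "g3", "g4"]
labels = [0, 1, 0, 1, 1]

def pair_up(indices, targets, offset):
    """Pair each target with the graph found at position index + offset."""
    return [(graphs[i + offset], t) for i, t in zip(indices, targets)]

indices = list(range(len(graphs)))  # already 0-based, as after the shuffle

correct = pair_up(indices, labels, offset=0)   # like using .index as-is
shifted = pair_up(indices, labels, offset=-1)  # like using .index - 1

# Count how many graphs ended up with a label that is not their own.
true_label = dict(zip(graphs, labels))
mismatches = sum(1 for g, t in shifted if true_label[g] != t)

print(correct)
print(shifted)
print(mismatches, "of", len(graphs), "graphs got the wrong label")
```

In this toy version the index -1 wraps around to the last list element, which is a property of Python lists and not necessarily what the generator in the demo does. The point is only that a one-position shift pairs most graphs with a neighbour's label, which is enough to keep a model at chance-level accuracy.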