When I ran StellarGraph's demo on graph classification using DGCNNs, I got the same results as in the demo.
However, I then tried shuffling the data first, using the following code:
import random
shuffler = list(zip(graphs, graph_labels))
random.shuffle(shuffler)
graphs, graph_labels = zip(*shuffler)
With the shuffled data, the model didn't learn at all: accuracy stayed around 50%, which simply matches the class distribution of the data.
Does anyone know why this happens? Did I shuffle incorrectly? Should the data be left unshuffled in the first place (and if so, why? That doesn't make any sense)? Or is it a bug in StellarGraph's implementation?
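One way to rule out the shuffle itself is to check that every graph still carries its own label afterwards. The sketch below uses hypothetical stand-in graphs (plain strings) and labels rather than real StellarGraph objects. It also shows that zip(*...) returns tuples, so after this shuffle both graphs and graph_labels are plain tuples; if graph_labels was a pandas Series before, any index it carried is gone.

```python
import random

# Hypothetical stand-ins: each "graph" is just a name, with a known label,
# so the pairing can be checked after shuffling.
graphs = [f"g{i}" for i in range(10)]
graph_labels = [i % 2 for i in range(10)]
expected = dict(zip(graphs, graph_labels))

# Shuffle graphs and labels together, as in the question.
random.seed(0)  # fixed seed only to make the run reproducible
shuffler = list(zip(graphs, graph_labels))
random.shuffle(shuffler)
graphs, graph_labels = zip(*shuffler)

# zip(*...) yields tuples, so both results are tuples now (no pandas index).
print(type(graphs).__name__, type(graph_labels).__name__)

# Every graph is still paired with its original label.
assert all(expected[g] == y for g, y in zip(graphs, graph_labels))
print("pairs preserved")
```

If the assertion passes, the zip-based shuffle kept each (graph, label) pair intact, so the shuffle is not what breaks training.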
I found the problem. It had nothing to do with the shuffling algorithm or with StellarGraph's implementation. The problem was in the demo itself.
The problem was caused specifically by the expressions train_graphs.index - 1 and test_graphs.index - 1. The indices are already zero-based (they start at 0), so subtracting one from them shifts the graph data one position backwards, and each data point ends up with the label of a different data point. To fix this, simply change them to train_graphs.index and test_graphs.index, without the - 1 at the end.
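The effect of the stray - 1 can be reproduced without StellarGraph. In this plain-Python sketch (hypothetical data, not the demo's code), each index i selects a graph at position i + offset while the label is looked up at i, mirroring how the demo pairs train_graphs.index - 1 with train_graphs.values. Note that in plain Python, 0 - 1 = -1 wraps around to the last graph.

```python
# Hypothetical zero-based data: the label for graph i is labels_by_index[i].
graphs = ["g0", "g1", "g2", "g3"]
labels_by_index = {0: "A", 1: "B", 2: "A", 3: "B"}  # index -> label
true_label = {"g0": "A", "g1": "B", "g2": "A", "g3": "B"}


def pair_up(indices, offset):
    """Pair graphs[i + offset] with labels_by_index[i] for each index i."""
    return [(graphs[i + offset], labels_by_index[i]) for i in indices]


def accuracy(pairs):
    """Fraction of pairs whose label is the graph's true label."""
    return sum(true_label[g] == y for g, y in pairs) / len(pairs)


train_index = [0, 1, 2, 3]
buggy = pair_up(train_index, offset=-1)  # mirrors `train_graphs.index - 1`
fixed = pair_up(train_index, offset=0)   # mirrors `train_graphs.index`

print(buggy)
print(accuracy(buggy), accuracy(fixed))
```

With these alternating labels every shifted pair is wrong and the fixed pairing is fully correct. On real data the shifted labels are essentially unrelated to the graphs, which is consistent with the chance-level accuracy described in the question.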