StellarGraph model fails to learn after shuffling the data


When I ran StellarGraph's demo on graph classification using DGCNNs, I got the same result as in the demo.

However, when I first shuffled the data with the following code:

import random

# Shuffle graphs and labels together so each pair stays matched
shuffler = list(zip(graphs, graph_labels))
random.shuffle(shuffler)
graphs, graph_labels = zip(*shuffler)

The model didn't learn at all (accuracy stayed around 50%, which simply matches the class distribution of the data).

Does anyone know why this happens? Am I shuffling the wrong way? Does the data need to stay unshuffled in the first place (and if so, why? That doesn't make any sense)? Or is it a bug in StellarGraph's implementation?

Best answer

I found the problem. It wasn't anything to do with the shuffling algorithm, nor with StellarGraph's implementation. The problem was in the demo, at the following lines:

train_gen = gen.flow(
    list(train_graphs.index - 1),
    targets=train_graphs.values,
    batch_size=50,
    symmetric_normalization=False,
)

test_gen = gen.flow(
    list(test_graphs.index - 1),
    targets=test_graphs.values,
    batch_size=1,
    symmetric_normalization=False,
)

The problem was caused specifically by train_graphs.index - 1 and test_graphs.index - 1. The indices already start at 0, so subtracting one from them shifts the graph data back by one position, and each data point ends up with the label of a different data point.

To fix this, simply change them to train_graphs.index and test_graphs.index without the -1 at the end.
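
The shift can be reproduced without StellarGraph. The following is only a toy sketch: the names graphs, labels, and pair_up are made up for illustration, and plain Python lists stand in for the real graph generator. It assumes that a position of -1 selects the last item, as it does for Python lists; how the generator itself treats such a position is not checked here.

```python
# Toy stand-ins (hypothetical): each "graph" is a name, labels are in matching order.
graphs = ["g0", "g1", "g2", "g3"]
labels = [0, 1, 1, 0]

# Zero-based positions, like an index that already starts at 0.
index = list(range(len(graphs)))


def pair_up(positions):
    """Pair the graph picked by each position with the label in the same slot."""
    return [(graphs[p], labels[i]) for i, p in enumerate(positions)]


shifted = pair_up([i - 1 for i in index])  # mimics `index - 1`
aligned = pair_up(index)                   # mimics plain `index`

print("shifted:", shifted)  # every graph is paired with a neighbour's label
print("aligned:", aligned)  # every graph keeps its own label
```

With the shifted positions, g0 is paired with the label that belongs to g1, and so on down the list, which is consistent with a model that cannot do better than the class distribution.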