Memory requirement of GNN SageConv


Sorry for my simple question. In the GraphSAGE paper, the authors describe the model as inductive. Does GraphSAGE need all of its training-time samples in memory during testing? If not, how does it perform the test-time forward pass? In Algorithm 1 of the paper, the authors propose AGGREGATE and CONCAT operators over neighboring nodes, which would have to be in memory during testing. Otherwise there must be a separate forward-propagation algorithm for testing, which I did not find in the paper. Thank you for any comments or answers.

In this colab notebook link, they passed all of the training, validation, and test data to the model for testing. Do we always have to keep all of the information that was given to the model?


Answer by arnoegw:

The fundamental idea behind GraphSAGE [Hamilton et al., 2017] is to use a subsampled neighborhood [ibid., p. 4] for each node of interest.
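This is exactly why the model is inductive: at test time you only need the features of the node of interest and of a sampled set of its neighbors, not the training graph. A minimal NumPy sketch of one GraphSAGE layer applied to an unseen node (the mean aggregator, the weight matrix `W`, and all shapes here are illustrative assumptions, not the paper's exact implementation or any library API):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_neighbors(neighbor_feats, k, rng):
    """Uniformly subsample k neighbors (with replacement if fewer than k)."""
    idx = rng.choice(len(neighbor_feats), size=k,
                     replace=len(neighbor_feats) < k)
    return [neighbor_feats[i] for i in idx]

def sage_layer(h_self, h_neigh, W):
    """One GraphSAGE-style layer: CONCAT(self, AGGREGATE(neighbors)) -> linear -> ReLU."""
    agg = np.mean(h_neigh, axis=0)          # AGGREGATE: mean over sampled neighbors
    z = np.concatenate([h_self, agg]) @ W   # CONCAT, then learned linear map
    return np.maximum(z, 0.0)               # nonlinearity

d_in, d_out, k = 4, 3, 2
W = rng.standard_normal((2 * d_in, d_out))  # stands in for trained weights

# An unseen test node and its neighbors' features -- no training nodes involved.
x_new = rng.standard_normal(d_in)
neigh_feats = [rng.standard_normal(d_in) for _ in range(5)]

sampled = sample_neighbors(neigh_feats, k, rng)
h_new = sage_layer(x_new, sampled, W)
print(h_new.shape)  # embedding for the unseen node
```

The same layer function is reused at training and test time; only the input subgraph changes, so nothing forces the training samples to stay in memory.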

The preferred method for sampling from the input graph depends on its size. Here are examples of two methods from the TF-GNN library, applied to the popular OGBN-MAG benchmark:

  • If the input graph is small enough, subgraph sampling can happen from an in-memory representation, as in this live Colab demo.

  • If the input graph is large, sampled subgraphs can be created with large-scale dataflow systems like Apache Beam, as discussed in this blogpost.
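For the small-graph case, in-memory subgraph sampling can be sketched in a few lines. This is a generic fan-out sampler over an adjacency list, not the TF-GNN API (the function name, the `fanouts` parameter, and the toy graph are all made up for illustration):

```python
import random

def sample_subgraph(adj, seed, fanouts, rng=random.Random(0)):
    """Starting from `seed`, sample up to fanouts[i] neighbors at hop i.
    Returns the set of sampled node ids."""
    frontier, visited = [seed], {seed}
    for k in fanouts:
        nxt = []
        for u in frontier:
            neigh = adj.get(u, [])
            for v in rng.sample(neigh, min(k, len(neigh))):
                if v not in visited:
                    visited.add(v)
                    nxt.append(v)
        frontier = nxt
    return visited

# Toy undirected graph as an adjacency list.
adj = {0: [1, 2, 3], 1: [0, 4], 2: [0], 3: [0, 5], 4: [1], 5: [3]}
print(sample_subgraph(adj, seed=0, fanouts=[2, 1]))
```

Each test node gets its own small sampled subgraph like this, which is what the model's forward pass consumes.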