VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections
Graph transformers have been proven effective for graph learning, owing to an attention mechanism that captures expressive representations from the complex topological and feature information of graphs. A graph transformer conventionally performs dense (global) attention over every pair of nodes to learn node representation vectors, resulting in quadratic computational costs that are unaffordable for large-scale graph data. Mini-batch training for graph transformers is therefore a promising direction, but the limited samples in each mini-batch cannot support effective dense attention for encoding informative representations. Facing this bottleneck, (1) we start by assigning each node a token list sampled by personalized PageRank (PPR) and then apply standard multi-head self-attention only on this list to compute its node representations. This PPR tokenization decouples model training from complex graph topological information and moves heavy feature engineering offline and independent of training, so that mini-batch training of graph transformers becomes possible by loading each node's token list in batches. We further prove that this PPR tokenization is viable as a graph convolutional network with a fixed polynomial filter and jumping knowledge. However, using personalized PageRank alone may limit the information carried by a token list, which cannot support different graph inductive biases for model training. To this end, (2) we rewire graphs by introducing multiple types of virtual connections through structure- and content-based super nodes, which enable PPR tokenization to encode local and global contexts, long-range interaction, and heterophilous information into each node's token list, and then formalize our Virtual Connection Ranking based Graph Transformer (VCR-Graphormer).
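As a rough illustration of the first idea, below is a minimal Python sketch (not the paper's implementation) of PPR tokenization followed by standard multi-head self-attention over each node's token list. The toy graph, the `alpha`, `topk`, and `iters` parameters, and the helper names `ppr_token_list` and `TokenListAttention` are illustrative assumptions rather than the authors' code.

```python
# Minimal sketch: PPR token lists + multi-head self-attention per node.
# All names and parameters here are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn

def ppr_token_list(adj, node, alpha=0.15, topk=8, iters=50):
    """Approximate personalized PageRank for `node` by power iteration and
    return the top-k scoring nodes (center node first) as its token list."""
    n = adj.shape[0]
    deg = adj.sum(axis=1, keepdims=True).clip(min=1.0)
    P = adj / deg                              # row-stochastic transition matrix
    e = np.zeros(n)
    e[node] = 1.0                              # restart distribution on the center node
    pi = e.copy()
    for _ in range(iters):
        pi = alpha * e + (1.0 - alpha) * pi @ P
    order = np.argsort(-pi)
    order = order[order != node][: topk - 1]   # keep the k-1 highest-ranked other nodes
    return np.concatenate(([node], order))     # center node always leads the list

class TokenListAttention(nn.Module):
    """Standard multi-head self-attention applied only to a node's token list."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, token_feats):            # token_feats: (batch, topk, dim)
        out, _ = self.attn(token_feats, token_feats, token_feats)
        return out[:, 0]                       # representation read at the center node

# Toy usage: token lists are built offline, so training only loads them per batch.
n, dim = 20, 16
rng = np.random.default_rng(0)
adj = (rng.random((n, n)) < 0.2).astype(float)
adj = np.maximum(adj, adj.T)
np.fill_diagonal(adj, 0.0)
feats = torch.randn(n, dim)
token_lists = [ppr_token_list(adj, v) for v in range(n)]   # offline tokenization

model = TokenListAttention(dim)
batch_nodes = [0, 1, 2]                                    # one mini-batch of nodes
batch = torch.stack([feats[torch.as_tensor(token_lists[v])] for v in batch_nodes])
reps = model(batch)                                        # (3, dim) node representations
```

Because the token lists depend only on the graph topology, they can be precomputed once and stored, so each training step only needs to load the token lists of the nodes in the current mini-batch; the virtual connections in (2) would be added to `adj` before this offline tokenization step.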
Further reading
- Access Paper on arXiv.org