Hi! I am Shengjie Luo (罗胜杰), a third-year Ph.D. student at the School of Intelligence Science and Technology, Peking University, advised by Prof. Liwei Wang and Prof. Di He. Before that, I completed my undergraduate studies at Shenyuan Honors College, Beihang University, majoring in Computer Science. I have also been a research intern at Microsoft Research Asia.

My main research area is machine learning, with a special interest in models and algorithms inspired by theoretical insights. Recently, my work has focused on representation learning for structured data (e.g., graphs and sequences): analyzing base architectures (Transformers, GNNs) in terms of their expressiveness, efficiency, and effectiveness, and applying them to AI for Science, graph learning, and natural language processing. I have published several papers at, and served as a reviewer for, top machine learning and artificial intelligence conferences such as NeurIPS, ICML, and ICLR.

If you are interested in collaborating with me or would like to have a chat, please feel free to contact me via e-mail or WeChat :)

🔥 News

  • 2023.03: Our paper “Rethinking the Expressive Power of GNNs via Graph Biconnectivity” received the
    ICLR 2023 Outstanding Paper Award (top 4/4966)!
  • 2023.01: Two papers accepted at ICLR 2023!
  • 2022.11: Transformer-M has been used by all top-3 winners of the PCQM4Mv2 Track, 2nd OGB Large-Scale Challenge, NeurIPS 2022!
  • 2022.09: One paper accepted at NeurIPS 2022!
  • 2021.09: Two papers accepted at NeurIPS 2021!
  • 2021.06: Graphormer won 1st place in the PCQM4M Track, OGB Large-Scale Challenge, KDD CUP 2021!
  • 2021.05: One paper accepted at ICML 2021!

📝 Selected Publications


[ICLR 2023 Outstanding Paper Award] Rethinking the Expressive Power of GNNs via Graph Biconnectivity

Bohang Zhang*, Shengjie Luo*, Liwei Wang, Di He

[Project] [Code]

  • Going beyond the WL test, we propose a fundamentally different perspective for studying the expressive power of GNNs: a novel class of expressivity metrics based on 🚀Graph Biconnectivity🚀.
  • Through the lens of graph biconnectivity, we systematically investigate popular GNNs, including classic MPNNs, Graph Substructure Networks (GSN) and its variant, GNNs with lifting transformations (MPSN and CWN), GraphSNN, and Subgraph GNNs. This thorough analysis provides a fine-grained understanding of the expressive power of existing GNNs.
  • Based on the above theoretical analysis, we develop a principled and more efficient approach, called Generalized Distance Weisfeiler-Lehman (GD-WL), which is provably expressive for all biconnectivity metrics (a minimal sketch follows this list).
  • We further develop Graphormer-GD, a Transformer-like architecture that implements GD-WL while preserving its expressiveness and enjoying full parallelizability.
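
A minimal, illustrative sketch of the GD-WL color refinement, using shortest-path distance as one instance of a generalized distance (the function names are my own, not the official code):

```python
import networkx as nx

def gd_wl_colors(G, num_iters=3):
    """GD-WL sketch: each node's color is refined from the multiset of
    (generalized distance, color) pairs over *all* nodes, not just its neighbors."""
    dist = dict(nx.all_pairs_shortest_path_length(G))
    colors = {v: 0 for v in G.nodes}                     # uniform initial coloring
    for _ in range(num_iters):
        # signature = multiset of (distance to u, current color of u) over every node u
        sigs = {v: tuple(sorted((dist[v].get(u, float("inf")), colors[u]) for u in G.nodes))
                for v in G.nodes}
        relabel = {s: i for i, s in enumerate(sorted(set(sigs.values())))}  # injective hash
        colors = {v: relabel[sigs[v]] for v in G.nodes}
    return colors

def graph_signature(G, num_iters=3):
    # two graphs are distinguished if their final color multisets differ
    return sorted(gd_wl_colors(G, num_iters).values())
```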

[ICLR 2023] One Transformer Can Understand Both 2D & 3D Molecular Data

Shengjie Luo, Tianlang Chen*, Yixian Xu*, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, Di He

[Project] [Code] [PapersWithCode]

  • We develop a novel Transformer-based molecular model called Transformer-M, which can take molecular data in either 2D or 3D format as input and generate meaningful semantic representations.
  • Using the standard Transformer as the backbone architecture, Transformer-M has two separate channels that encode 2D and 3D structural information and incorporate it with the atom features in the network modules. When the input data is in a particular format, the corresponding channel is activated and the other is disabled (see the sketch after this list).
  • We conduct extensive experiments showing that Transformer-M simultaneously achieves strong performance on 2D and 3D tasks (PCQM4Mv2 (2D), PDBBind (2D+3D), QM9 (3D)), suggesting its broad applicability.
  • 🚀 Transformer-M has been used by all top-3 winners of the PCQM4Mv2 Track, 2nd OGB Large-Scale Challenge, NeurIPS 2022!
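
A toy sketch of the two-channel idea (illustrative module and parameter names, not the released Transformer-M code): each channel turns the available structural information into an attention bias, and a missing channel is simply skipped.

```python
import torch
import torch.nn as nn

class TwoChannelBias(nn.Module):
    """Toy two-channel structural encoder: produces an attention bias from
    whichever input format (2D graph or 3D geometry) is available."""
    def __init__(self, num_spatial=32, num_rbf=16, num_heads=8):
        super().__init__()
        self.spd_embed = nn.Embedding(num_spatial, num_heads)   # 2D: shortest-path distances
        self.rbf_proj = nn.Linear(num_rbf, num_heads)           # 3D: Euclidean distance features

    def forward(self, spd=None, rbf=None):
        # spd: (B, N, N) integer shortest-path distances, or None if no 2D graph
        # rbf: (B, N, N, num_rbf) radial-basis features of 3D distances, or None
        bias = 0.0
        if spd is not None:                                      # 2D channel activated
            bias = bias + self.spd_embed(spd).permute(0, 3, 1, 2)
        if rbf is not None:                                      # 3D channel activated
            bias = bias + self.rbf_proj(rbf).permute(0, 3, 1, 2)
        return bias            # (B, num_heads, N, N), added to the attention logits
```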

[NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect

Shengjie Luo*, Shanda Li*, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, Di He

[Project] [Code]

  • We mathematically analyze the expressive power of RPE-based Transformers, and show that they are not universal approximators of continuous sequence-to-sequence functions.
  • We then present sufficient conditions for RPE-based Transformers to achieve universal function approximation. Guided by this theory, we develop Universal RPE-based (URPE) Attention, which is easy to implement and parameter-efficient (sketched below).
  • Our URPE-based Transformers are verified to be universal approximators through both theoretical analysis and extensive experiments, including synthetic tasks, language modeling, and graph learning.
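
A single-head sketch of the URPE idea (illustrative names, not the official code): the softmax attention matrix is multiplied element-wise by a learnable Toeplitz matrix whose entries depend only on relative positions.

```python
import torch
import torch.nn as nn

class URPEAttention(nn.Module):
    """Single-head sketch: attn_urpe = softmax(Q K^T / sqrt(d)) * C,
    where C[i, j] = c[i - j] is a learnable Toeplitz matrix."""
    def __init__(self, dim, max_len=512):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # one learnable scalar per relative position in [-(max_len-1), max_len-1]
        self.c = nn.Parameter(torch.ones(2 * max_len - 1))
        self.max_len = max_len

    def forward(self, x):                                  # x: (B, N, dim)
        B, N, d = x.shape
        attn = torch.softmax(self.q(x) @ self.k(x).transpose(-1, -2) / d ** 0.5, dim=-1)
        idx = torch.arange(N, device=x.device)
        C = self.c[idx[:, None] - idx[None, :] + self.max_len - 1]   # (N, N) Toeplitz
        return (attn * C) @ self.v(x)
```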

[NeurIPS 2021] Do Transformers Really Perform Badly for Graph Representation?

Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu

[Project] [Code] [Technical Report] [Slides] [Video]


[NeurIPS 2021] Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding

Shengjie Luo*, Shanda Li*, Tianle Cai, Dinglan Peng, Di He, Shuxin Zheng, Guolin Ke, Liwei Wang, Tie-Yan Liu

[Project] [Code] [Slides & Video]

  • We propose a novel way to accelerate attention computation to O(n log n) for Transformers with RPE, built on top of kernelized attention.
  • We mathematically show that kernelized attention with RPE can be computed efficiently using the Fast Fourier Transform (FFT), based on the Toeplitz matrix form of RPE (see the sketch below).
  • We further demonstrate that properly using RPE can mitigate the training instability of vanilla kernelized attention.
  • Extensive experiments covering language pre-training, language modeling, image classification, and machine translation demonstrate the efficiency and effectiveness of our model.
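
The computational core is that a Toeplitz matrix-vector product can be computed in O(n log n) by circulant embedding and the FFT. A small self-contained NumPy sketch of that standard trick (illustrative names, independent of the released code):

```python
import numpy as np

def toeplitz_matvec(c, r, x):
    """Compute T @ x in O(n log n), where T is the Toeplitz matrix with
    first column c and first row r (c[0] == r[0]), via circulant embedding."""
    n = len(x)
    col = np.concatenate([c, r[:0:-1]])            # first column of the circulant embedding
    fx = np.fft.fft(np.concatenate([x, np.zeros(n - 1)]))
    y = np.fft.ifft(np.fft.fft(col) * fx)          # circular convolution via FFT
    return y[:n].real                              # first n entries equal T @ x

# quick check against the dense O(n^2) product
n = 8
c, r = np.random.randn(n), np.random.randn(n)
r[0] = c[0]
T = np.array([[c[i - j] if i >= j else r[j - i] for j in range(n)] for i in range(n)])
x = np.random.randn(n)
assert np.allclose(T @ x, toeplitz_matvec(c, r, x))
```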

[ICML 2021] GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training

Tianle Cai*, Shengjie Luo*, Keyulu Xu, Di He, Tie-Yan Liu, Liwei Wang

[Project] [Slides] [Video]

[Code (Official)] [Code (PyG)] [Code (Microsoft ptgnn)]

  • We theoretically study the preconditioning effect of normalization methods on GNN training, and empirically observe that the batch noise of graph data is larger than that of data from other domains, e.g., image data.
  • We further show that the shift operation in InstanceNorm can degrade the expressiveness of GNNs on highly regular graphs.
  • Based on these findings, we propose a principled normalization scheme, GraphNorm, and demonstrate its acceleration effect on graph learning benchmarks (a minimal sketch follows).
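
A minimal single-graph sketch of GraphNorm (the released code handles batches of graphs; names here are illustrative): it normalizes node features over the graph like InstanceNorm, but learns a scale α on the mean shift.

```python
import torch
import torch.nn as nn

class GraphNorm(nn.Module):
    """Sketch for a single graph: x has shape (num_nodes, dim). The learnable
    alpha controls how much of the per-graph mean is subtracted."""
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(dim))    # learnable shift scale
        self.gamma = nn.Parameter(torch.ones(dim))    # affine weight
        self.beta = nn.Parameter(torch.zeros(dim))    # affine bias
        self.eps = eps

    def forward(self, x):                             # x: (num_nodes, dim)
        mean = x.mean(dim=0, keepdim=True)
        shifted = x - self.alpha * mean               # partial mean removal
        std = shifted.pow(2).mean(dim=0, keepdim=True).sqrt()
        return self.gamma * shifted / (std + self.eps) + self.beta
```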

🎖 Honors and Awards

  • 2023.03, ICLR 2023 Outstanding Paper Award (top 4/4966) [Link].
  • 2021.06, 1st Place Winner of the PCQM4M Track, OGB Large-Scale Challenge, KDD CUP 2021.
  • 2018.12, National Scholarship (Top 1%).

📖 Education

  • 2022.09 - 2025.07 (expected), Ph.D. Student, School of Intelligence Science and Technology, Peking University.
  • 2020.09 - 2022.07, Master Student, Academy for Advanced Interdisciplinary Studies, Peking University.
  • 2016.09 - 2020.07, Undergraduate Student, Shenyuan Honors College, Beihang University.

💻 Internships

  • 2021.12 - now, Machine Learning Group, Microsoft Research Asia, China.
  • 2020.10 - 2021.06, Machine Learning Group, Microsoft Research Asia, China.
  • 2019.10 - 2020.06, Natural Language Computing Group, Microsoft Research Asia, China.

💬 Invited Talks

🏫 Professional Services

  • Reviewer for ICLR 2022, NeurIPS 2022, LOG 2022, ICML 2023, NeurIPS 2023.
