Biography

I’m a researcher at Zhipu AI, currently specializing in architectures and training techniques for large language models. Previously, I spent a year as a research assistant with Prof. Yu Cheng at Shanghai Artificial Intelligence Laboratory, working on Mixture of Experts (MoE) for large language models. During that time, I also collaborated with Prof. Stan Z. Li at Westlake University on AI-driven molecule and protein design. Before that, I received my BS in Computer Science & Mathematics from the University of Electronic Science and Technology of China (UESTC), where I worked closely with Prof. Wen Li.

My research spans domains such as ML, NLP, and CV, and I have a strong passion for uncovering the intrinsic properties of neural networks with theoretical guarantees. My primary research interests include, but are not limited to:

  1. Representation Learning: Enhancing abstract data representations to improve generalizability and interpretability and to expand model capacity, thereby avoiding degradation.
  2. Neural Network Architecture: Discovering general structures to enhance model efficiency or achieve mathematical completeness (e.g., MoE, GNN).
  3. AI for Biology / Psychology: Leveraging AI to advance scientific progress for humanity.

Publications

  1. Zhangyang Gao*, Daize Dong*, Cheng Tan, Jun Xia, Bozhen Hu, Stan Z. Li, A Graph is Worth K Words: Euclideanizing Graph using Pure Transformer, The 41st International Conference on Machine Learning (ICML 2024). [Paper]
  2. Jiacheng Ruan, Jingsheng Gao, Mingye Xie, Daize Dong, Suncheng Xiang, Ting Liu, Yuzhuo Fu, iDAT: inverse Distillation Adapter-Tuning, The IEEE International Conference on Multimedia and Expo (ICME 2024) (Oral Presentation). [Paper]
  3. Shwai He, Liang Ding, Daize Dong, Boan Liu, Fuqiang Yu, Dacheng Tao, PAD-Net: An Efficient Framework for Dynamic Networks, The 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). [Paper]
  4. Shwai He, Liang Ding, Daize Dong, Miao Zhang, Dacheng Tao, SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters, Findings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). [Paper]
  5. Shwai He, Chenbo Jiang, Daize Dong, Liang Ding, SD-Conv: Towards the Parameter-Efficiency of Dynamic Convolution, The IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2023). [Paper]

Projects

  • Tong Zhu, Xiaoye Qu, Daize Dong, Jiacheng Ruan, Jingqi Tong, Conghui He, Yu Cheng, LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training, The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). [Code] [Paper]