Biography
I’m a researcher at Zhipu AI, specializing in the architectures and training algorithms of large language models. Previously, I spent a year as a research assistant with Prof. Yu Cheng at Shanghai Artificial Intelligence Laboratory, working on Mixture-of-Experts (MoE) architectures for large language models. During that time, I also collaborated with Prof. Stan Z. Li at Westlake University on AI-driven molecule and protein design. Before that, I received my BS in Computer Science & Mathematics from the University of Electronic Science and Technology of China (UESTC), where I worked closely with Prof. Wen Li.
My research spans domains such as ML, NLP, and CV, and I have a strong passion for uncovering the intrinsic properties of neural networks with theoretical guarantees. My primary research interests include, but are not limited to:
- Representation Learning: Enhancing abstract data representations to improve generalizability and interpretability, thereby expanding model capacity and avoiding degradation.
- Neural Network Architecture: Discovering general structures to enhance model efficiency and achieve mathematical completeness (e.g., MoE, GNN).
- AI for Biology: Leveraging AI to advance scientific discovery in biology (e.g., molecule and protein design).
News
- [2024/11] Open-sourced project LLaMA-MoE v2 is released!
- [2024/09] One paper (LLaMA-MoE) is accepted by EMNLP 2024.
- [2024/05] One paper (GraphsGPT) is accepted by ICML 2024.
- [2024/04] One paper (iDAT) is accepted by ICME 2024 as oral.
- [2023/12] Open-sourced project LLaMA-MoE is released!
- [2023/05] One paper (PAD-Net) is accepted by ACL 2023.
- [2022/10] One paper (SparseAdapter) is accepted by EMNLP 2022.
- [2022/08] One paper (SD-Conv) is accepted by WACV 2023.
Publications
- Zhangyang Gao*, Daize Dong*, Cheng Tan, Jun Xia, Bozhen Hu, Stan Z. Li, A Graph is Worth K Words: Euclideanizing Graph using Pure Transformer, The 41st International Conference on Machine Learning (ICML 2024). [Paper]
- Jiacheng Ruan, Jingsheng Gao, Mingye Xie, Daize Dong, Suncheng Xiang, Ting Liu, Yuzhuo Fu, iDAT: inverse Distillation Adapter-Tuning, The 2024 IEEE International Conference on Multimedia and Expo (ICME 2024) (Oral Presentation). [Paper]
- Shwai He, Liang Ding, Daize Dong, Boan Liu, Fuqiang Yu, Dacheng Tao, PAD-Net: An Efficient Framework for Dynamic Networks, Proceedings of The 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). [Paper]
- Shwai He, Liang Ding, Daize Dong, Miao Zhang, Dacheng Tao, SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters, Findings of The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). [Paper]
- Shwai He, Chenbo Jiang, Daize Dong, Liang Ding, SD-Conv: Towards the Parameter-Efficiency of Dynamic Convolution, IEEE/CVF Winter Conference on Applications of Computer Vision, 2023 (WACV 2023). [Paper]
Projects
- Tong Zhu, Xiaoye Qu, Daize Dong, Jiacheng Ruan, Jingqi Tong, Conghui He, Yu Cheng, LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training, The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). [Code] [Paper]
- Xiaoye Qu, Daize Dong, Xuyang Hu, Tong Zhu, Weigao Sun, Yu Cheng, LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training. [Code] [Paper]