Biography
I am an incoming Ph.D. student in Computer Science at Rutgers University, New Brunswick, advised by Prof. Hongyi Wang. My research focuses on improving the efficiency of neural networks at both the algorithm and system levels. Specifically, I am interested in the following areas:
- Conditional Computation: Mixture of Experts, Sparse Activation Models
- Model Compression: Pruning, Quantization, Knowledge Distillation
- System Optimization: Accelerated Training, Low-Latency Inference, Efficient Kernel Design
News
- [2025/04] Will join Rutgers University, New Brunswick, as a Ph.D. student!
Publications
- Shwai He*, Daize Dong*, Liang Ding, Ang Li, Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques, TMLR. [Paper]
- Zhen Tan*, Daize Dong*, Xinyu Zhao, Jie Peng, Yu Cheng, Tianlong Chen, DLO: Dynamic Layer Operation for Efficient Vertical Scaling of LLMs, ICLR 2025 SCOPE Workshop. [Paper]
- Tong Zhu, Daize Dong, Xiaoye Qu, Jiacheng Ruan, Wenliang Chen, Yu Cheng, Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts, NAACL 2025. [Paper]
- Zhangyang Gao*, Daize Dong*, Cheng Tan, Jun Xia, Bozhen Hu, Stan Z. Li, A Graph is Worth K Words: Euclideanizing Graph using Pure Transformer, ICML 2024. [Paper]
- Jiacheng Ruan, Jingsheng Gao, Mingye Xie, Daize Dong, Suncheng Xiang, Ting Liu, Yuzhuo Fu, iDAT: inverse Distillation Adapter-Tuning, ICME 2024 (Oral). [Paper]
- Shwai He, Liang Ding, Daize Dong, Boan Liu, Fuqiang Yu, Dacheng Tao, PAD-Net: An Efficient Framework for Dynamic Networks, ACL 2023. [Paper]
- Shwai He, Chenbo Jiang, Daize Dong, Liang Ding, SD-Conv: Towards the Parameter-Efficiency of Dynamic Convolution, WACV 2023. [Paper]
- Shwai He, Liang Ding, Daize Dong, Miao Zhang, Dacheng Tao, SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters, EMNLP 2022. [Paper]
Projects
- Tong Zhu, Xiaoye Qu, Daize Dong, Jiacheng Ruan, Jingqi Tong, Conghui He, Yu Cheng, LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training, EMNLP 2024. [Code] [Paper]
- Xiaoye Qu, Daize Dong, Xuyang Hu, Tong Zhu, Weigao Sun, Yu Cheng, LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training. [Code] [Paper]