Personalized Daily Arxiv Papers 01/23/2025

	Prompt	Completion	Total
Token	47901	4076	51977
Cost	$1.19752	$0.4076	$1.60512

Total relevant papers: 17

Table of contents with paper titles:

1. Autonomy-of-Experts Models
   Authors: Ang Lv, Ruobing Xie, Yining Qian, Songhao Wu, Xingwu Sun, Zhanhui Kang, Di Wang, Rui Yan
2. Irrational Complex Rotations Empower Low-bit Optimizers
   Authors: Zhen Tian, Wayne Xin Zhao, Ji-Rong Wen
3. GANQ: GPU-Adaptive Non-Uniform Quantization for Large Language Models
   Authors: Pengxiang Zhao, Xiaoming Yuan
4. Human-like conceptual representations emerge from language prediction
   Authors: Ningyu Xu, Qi Zhang, Chao Du, Qiang Luo, Xipeng Qiu, Xuanjing Huang, Menghan Zhang
5. A Rate-Distortion Framework for Summarization
   Authors: Enes Arda, Aylin Yener
6. NExtLong: Toward Effective Long-Context Training without Long Documents
   Authors: Chaochen Gao, Xing Wu, Zijia Lin, Debing Zhang, Songlin Hu
7. EchoLM: Accelerating LLM Serving with Real-time Knowledge Distillation
   Authors: Yifan Yu, Yu Gan, Lily Tasi, Nikhil Sarda, Jiaming Shen, Yanqi Zhou, Arvind Krishnamurthy, Fan Lai, Henry M. Levy, David Culler
8. Stability and Generalization of Quantum Neural Networks
   Authors: Jiaqi Yang, Wei Xie, Xiaohua Xu
9. HierPromptLM: A Pure PLM-based Framework for Representation Learning on Heterogeneous Text-rich Networks
   Authors: Qiuyu Zhu, Liang Zhang, Qianxiong Xu, Cheng Long
10. Machine Learning Modeling for Multi-order Human Visual Motion Processing
   Authors: Zitang Sun, Yen-Ju Chen, Yung-Hao Yang, Yuan Li, Shin'ya Nishida
11. On Tradeoffs in Learning-Augmented Algorithms
   Authors: Ziyad Benomar, Vianney Perchet
12. Manifold learning and optimization using tangent space proxies
   Authors: Ryan A. Robinett, Lorenzo Orecchia, Samantha J. Riesenfeld
13. R2D2: Remembering, Reflecting and Dynamic Decision Making for Web Agents
   Authors: Tenghao Huang, Kinjal Basu, Ibrahim Abdelaziz, Pavan Kapanipathi, Jonathan May, Muhao Chen
14. Generalization Performance of Hypergraph Neural Networks
   Authors: Yifan Wang, Gonzalo R. Arce, Guangmo Tong
15. Multiscale Training of Convolutional Neural Networks
   Authors: Niloufar Zakariaei, Shadab Ahamed, Eldad Haber, Moshe Eliasof
16. Modality Interactive Mixture-of-Experts for Fake News Detection
   Authors: Yifan Liu, Yaokun Liu, Zelin Li, Ruichen Yao, Yang Zhang, Dong Wang
17. Hybrid Losses for Hierarchical Embedding Learning
   Authors: Haokun Tian, Stefan Lattner, Brian McFee, Charalampos Saitis

1. Autonomy-of-Experts Models

ArXiv ID: 2501.13074

Authors: Ang Lv, Ruobing Xie, Yining Qian, Songhao Wu, Xingwu Sun, Zhanhui Kang, Di Wang, Rui Yan

Abstract: Mixture-of-Experts (MoE) models mostly use a router to assign tokens to specific expert modules, activating only a subset of their parameters and often outperforming dense models. We argue that the separation between the router's decision-making and the experts' execution is a critical yet overlooked issue, leading to suboptimal expert selection and ineffective learning. To address this, we propose Autonomy-of-Experts (AoE), a novel MoE paradigm in which experts autonomously select themselves to process inputs. AoE is based on the insight that an expert is aware of its own capacity to effectively process a token, an awareness reflected in the scale of its internal activations. In AoE, routers are removed; instead, experts pre-compute internal activations for inputs and are ranked based on their activation norms. Only the top-ranking experts proceed with the forward pass, while the others abort. The overhead of pre-computing activations is reduced through a low-rank weight factorization. This self-evaluating-then-partner-comparing approach ensures improved expert selection and effective learning. We pre-train language models with 700M to 4B parameters, demonstrating that AoE outperforms traditional MoE models with comparable efficiency.
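A minimal sketch of the router-free selection step described in the abstract, in plain Python so it has no dependencies. Several details are assumptions, not taken from the abstract: each expert is a gated FFN whose gate projection is factorized into hypothetical matrices `A` (d x r) and `B` (r x hidden); an expert's self-score is the L2 norm of the low-rank activation `x @ A`; and the selected experts' outputs are mixed with a softmax over their scores, as in conventional top-k MoE. `Expert` and `aoe_forward` are illustrative names, not the authors' code.

```python
import math
import random

def matvec(v, m):
    """Row vector v (length n) times matrix m (n x p), returning a list of length p."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def l2_norm(v):
    return math.sqrt(sum(a * a for a in v))

def silu(z):
    # z * sigmoid(z), split by sign so math.exp never overflows
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

class Expert:
    """Hypothetical gated-FFN expert whose gate projection is factorized as A @ B."""

    def __init__(self, d, r, hidden, rng):
        self.A = rand_matrix(d, r, rng)        # d x r: cheap low-rank "self-evaluation" step
        self.B = rand_matrix(r, hidden, rng)   # r x hidden: completes the gate projection
        self.W_up = rand_matrix(d, hidden, rng)
        self.W_out = rand_matrix(hidden, d, rng)

    def score(self, x):
        """Pre-compute the low-rank activation x @ A; return (its norm, the cache)."""
        cache = matvec(x, self.A)
        return l2_norm(cache), cache

    def finish(self, x, cache):
        """Complete the forward pass, reusing the cached x @ A instead of recomputing it."""
        gate = matvec(cache, self.B)
        up = matvec(x, self.W_up)
        h = [silu(g) * u for g, u in zip(gate, up)]
        return matvec(h, self.W_out)

def aoe_forward(x, experts, k):
    # 1. Every expert scores itself; there is no router.
    scores, caches = zip(*(e.score(x) for e in experts))
    # 2. Rank by activation norm; only the top-k continue, the rest abort here.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # 3. Assumption: mix the chosen outputs with a softmax over their scores.
    top = max(scores[i] for i in chosen)
    weights = [math.exp(scores[i] - top) for i in chosen]
    total = sum(weights)
    y = [0.0] * len(x)
    for w, i in zip(weights, chosen):
        out = experts[i].finish(x, caches[i])
        y = [acc + (w / total) * o for acc, o in zip(y, out)]
    return y, chosen

rng = random.Random(0)
d, r, hidden, n_experts = 8, 2, 16, 4
experts = [Expert(d, r, hidden, rng) for _ in range(n_experts)]
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
y, chosen = aoe_forward(x, experts, k=2)
print("selected experts:", chosen, "output length:", len(y))
```

The point of the factorization in this sketch is that the unselected experts only pay for the d x r product `x @ A`; the selected experts reuse that cache, so self-evaluation adds little work beyond a standard gate projection.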

Comment: The paper proposes a novel Mixture-of-Experts variant in which experts select themselves without a router, directly challenging a foundational component of MoE architectures. It is highly relevant to core architectural innovations.

Relevance: 10 Novelty: 9

2. Irrational Complex Rotations Empower Low-bit Optimizers

ArXiv ID: 2501.12896

Authors: Zhen Tian, Wayne Xin Zhao, Ji-Rong Wen

Abstract: In this paper, we propose a novel optimizer state compression algorithm, namely $\pi$-Quant, which leverages the properties of irrational numbers (e.g., $\pi$) for memory-efficient training. The core idea is based on our mathematical findings, which show that a pair of parameters can be represented by a single rotation angle using the complex rotation scheme. Building on this insight, we map the parameters into a complex space and perform quantization using the corresponding rotation angles. To efficiently integrate it into the optimization process, we develop an efficient system of geometric equations that computes the precise rotation angles with linear complexity. We evaluate $\pi$-Quant on a wide range of tasks. Our experiments show that it can reduce the bit-width of parameters to 3.32-bit, achieving a 75% reduction in parameter scale and a 40% decrease in GPU memory usage, all while maintaining full accuracy.
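The abstract does not spell out its rotation scheme, so the sketch below is only a loosely related illustration of why irrational numbers make "one code per pair of values" possible; it is not the paper's $\pi$-Quant algorithm and does not reproduce its linear-complexity angle solver. Because 1, 1/π and π are linearly independent over the rationals, Kronecker's theorem says the points (n/π mod 1, nπ mod 1) for integers n are dense in the unit square, so a single integer code n can approximate a pair of normalized values. The frequencies `ALPHA` and `BETA`, the min/max normalization, the helper names, and the brute-force search in `encode` (O(2**bits) per pair) are all hypothetical choices made for clarity.

```python
import math

# Hypothetical frequencies. 1, 1/pi and pi are linearly independent over the
# rationals (pi is transcendental), so by Kronecker's theorem the points
# (n/pi mod 1, n*pi mod 1) for integers n are dense in the unit square.
ALPHA = 1.0 / math.pi
BETA = math.pi

def frac(t):
    return t - math.floor(t)

def decode(n):
    """Map one integer code to a pair of values in [0, 1)."""
    return (frac(n * ALPHA), frac(n * BETA))

def pair_error(p, q):
    return max(abs(a - b) for a, b in zip(p, q))

def encode(pair, bits):
    """Brute-force search for the code in [0, 2**bits) whose decoded pair is closest."""
    return min(range(2 ** bits), key=lambda n: pair_error(decode(n), pair))

def quantize(params, bits):
    """Min/max-normalize to [0, 1], then store each pair of values as one `bits`-bit code."""
    lo, hi = min(params), max(params)
    scale = (hi - lo) or 1.0
    unit = [(p - lo) / scale for p in params]
    if len(unit) % 2:
        unit.append(0.0)  # pad to an even length; the padding is dropped on decode
    codes = [encode((unit[i], unit[i + 1]), bits) for i in range(0, len(unit), 2)]
    return codes, lo, scale, len(params)

def dequantize(codes, lo, scale, length):
    out = []
    for n in codes:
        out.extend(lo + v * scale for v in decode(n))
    return out[:length]

params = [0.12, -0.40, 0.33, 0.05, -0.21]
codes, lo, scale, length = quantize(params, bits=10)  # 10 bits per pair = 5 bits per value
restored = dequantize(codes, lo, scale, length)
err = max(abs(a - b) for a, b in zip(params, restored))
print("codes:", codes, "max abs error:", round(err, 4))
```

With `bits` bits per code each parameter costs `bits / 2` bits plus one shared `lo`/`scale` pair per block; raising `bits` can only lower each pair's reconstruction error, since the search range grows to include every previous code.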

Comment: The paper presents a novel optimizer state compression algorithm leveraging properties of irrational numbers for memory-efficient training. This directly relates to model compression, focusing on bit-width reduction and parameter quantization, which matches the core interest in sparsity, quantization, and low-rank approaches.