Personalized Daily ArXiv Papers 2025-05-07

[gpt-4o]	Prompt	Completion	Total
Token	48833	6718	55551
Cost	$0.12	$0.07	$0.19

Total arXiv papers: 526

Total scanned papers: 312

Total relevant papers: 28

Table of contents with paper titles:

1. Contextures: Representations from Contexts (Authors: Runtian Zhai, Kai Yang, Che-Ping Tsai, Burak Varici, Zico Kolter, Pradeep Ravikumar)
2. Absolute Zero: Reinforced Self-play Reasoning with Zero Data (Authors: Andrew Zhao, Yiran Wu, Yang Yue, Tong Wu, Quentin Xu, Yang Yue, Matthieu Lin, Shenzhi Wang, Qingyun Wu, Zilong Zheng, Gao Huang)
3. Agentic Neurodivergence as a Contingent Solution to the AI Alignment Problem (Authors: Alberto Hernández-Espinosa, Felipe S. Abrahão, Olaf Witkowski, Hector Zenil)
4. Binding threshold units with artificial oscillatory neurons (Authors: Vladimir Fanaskov, Ivan Oseledets)
5. Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights (Authors: Zhaiming Shen, Alex Havrilla, Rongjie Lai, Alexander Cloninger, Wenjing Liao)
6. SPAP: Structured Pruning via Alternating Optimization and Penalty Methods (Authors: Hanyu Hu, Xiaoming Yuan)
7. RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale (Authors: Daniel Goldstein, Eric Alcaide, Janna Lu, Eugene Cheah)
8. Intra-Layer Recurrence in Transformers for Language Modeling (Authors: Anthony Nguyen, Wenjun Lin)
9. Nonnegative Low-rank Matrix Recovery Can Have Spurious Local Minima (Authors: Richard Y. Zhang)
10. RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference (Authors: Yaoqi Chen, Jinkai Zhang, Baotong Lu, Qianxi Zhang, Chengruidong Zhang, Jingjia Luo, Di Liu, Huiqiang Jiang, Qi Chen, Jing Liu, Bailu Ding, Xiao Yan, Jiawei Jiang, Chen Chen, Mingxing Zhang, Yuqing Yang, Fan Yang, Mao Yang)
11. What do Language Model Probabilities Represent? From Distribution Estimation to Response Prediction (Authors: Eitan Wagner, Omri Abend)
12. Faster MoE LLM Inference for Extremely Large Models (Authors: Haoqi Yang, Luohe Shi, Qiwei Li, Zuchao Li, Ping Wang, Bo Du, Mengjia Shen, Hai Zhao)
13. Sharpness-Aware Minimization with Z-Score Gradient Filtering for Neural Networks (Authors: Juyoung Yun)
14. MoxE: Mixture of xLSTM Experts with Entropy-Aware Routing for Efficient Language Modeling (Authors: Abdoul Majid O. Thiombiano, Brahim Hnich, Ali Ben Mrad, Mohamed Wiem Mkaouer)
15. Physics-inspired Energy Transition Neural Network for Sequence Learning (Authors: Zhou Wu, Junyi An, Baile Xu, Furao Shen, Jian Zhao)
16. seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models (Authors: Hafez Ghaemi, Eilif Muller, Shahab Bakhtiari)
17. GeoERM: Geometry-Aware Multi-Task Representation Learning on Riemannian Manifolds (Authors: Aoran Chen, Yang Feng)
18. Unlearning vs. Obfuscation: Are We Truly Removing Knowledge? (Authors: Guangzhi Sun, Potsawee Manakul, Xiao Zhan, Mark Gales)
19. HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking (Authors: Runquan Gui, Zhihai Wang, Jie Wang, Chi Ma, Huiling Zhen, Mingxuan Yuan, Jianye Hao, Defu Lian, Enhong Chen, Feng Wu)
20. Teaching Models to Understand (but not Generate) High-risk Data (Authors: Ryan Wang, Matthew Finlayson, Luca Soldaini, Swabha Swayamdipta, Robin Jia)
21. Don't be lazy: CompleteP enables compute-efficient deep transformers (Authors: Nolan Dey, Bin Claire Zhang, Lorenzo Noci, Mufan Li, Blake Bordelon, Shane Bergsma, Cengiz Pehlevan, Boris Hanin, Joel Hestness)
22. Robustly Invertible Nonlinear Dynamics and the BiLipREN: Contracting Neural Models with Contracting Inverses (Authors: Yurui Zhang, Ruigang Wang, Ian R. Manchester)
23. Advancing Constrained Monotonic Neural Networks: Achieving Universal Approximation Beyond Bounded Activations (Authors: Davide Sartor, Alberto Sinigaglia, Gian Antonio Susto)
24. Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach (Authors: Jiancong Xiao, Bojian Hou, Zhanliang Wang, Ruochen Jin, Qi Long, Weijie J. Su, Li Shen)
25. Attention Mechanisms Perspective: Exploring LLM Processing of Graph-Structured Data (Authors: Zhong Guan, Likang Wu, Hongke Zhao, Ming He, Jianpin Fan)
26. Knowing You Don't Know: Learning When to Continue Search in Multi-round RAG through Self-Practicing (Authors: Diji Yang, Linda Zeng, Jinmeng Rao, Yi Zhang)
27. Large Language Model Partitioning for Low-Latency Inference at the Edge (Authors: Dimitrios Kafetzis, Ramin Khalili, Iordanis Koutsopoulos)
28. Quantitative Analysis of Performance Drop in DeepSeek Model Quantization (Authors: Enbo Zhao, Yi Shen, Shuming Shi, Jieyun Huang, Zhihao Chen, Ning Wang, Siqi Xiao, Jian Zhang, Kai Wang, Shiguo Lian)

1. Contextures: Representations from Contexts

ArXiv ID: 2505.01557

Authors: Runtian Zhai, Kai Yang, Che-Ping Tsai, Burak Varici, Zico Kolter, Pradeep Ravikumar

Abstract: Despite the empirical success of foundation models, we do not have a systematic characterization of the representations that these models learn. In this paper, we establish the contexture theory. It shows that a large class of representation learning methods can be characterized as learning from the association between the input and a context variable. Specifically, we show that many popular methods aim to approximate the top-d singular functions of the expectation operator induced by the context, in which case we say that the representation learns the contexture. We demonstrate the generality of the contexture theory by proving that representation learning within various learning paradigms -- supervised, self-supervised, and manifold learning -- can all be studied from such a perspective. We also prove that the representations that learn the contexture are optimal on those tasks that are compatible with the context. One important implication of the contexture theory is that once the model is large enough to approximate the top singular functions, further scaling up the model size yields diminishing returns. Therefore, scaling is not all we need, and further improvement requires better contexts. To this end, we study how to evaluate the usefulness of a context without knowing the downstream tasks. We propose a metric and show by experiments that it correlates well with the actual performance of the encoder on many real datasets.

Comment: The paper introduces a novel theoretical framework for representation learning, directly addressing the 'Representation Learning' criterion with a focus on foundational insights.

Relevance: 10 Novelty: 9
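
To make the phrase "top-d singular functions of the expectation operator induced by the context" more concrete, here is a minimal sketch for the finite, discrete case. It is an illustration under assumptions, not the authors' code: the input X and context A are assumed to take finitely many values, the joint probability table is made up, and the function name `contexture_features` is hypothetical. The normalisation P(x, a) / sqrt(P(x) P(a)) is the usual way to write the conditional-expectation operator as a matrix whose singular values lie in [0, 1].

```python
import numpy as np

def contexture_features(joint, d):
    """Top-d non-constant singular functions of g -> E[g(A) | X] for a finite joint P(x, a).

    Assumes every row and column of `joint` has positive total mass.
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()          # normalise to a probability table
    p_x = joint.sum(axis=1)              # marginal of the input X
    p_a = joint.sum(axis=0)              # marginal of the context A
    # Matrix of the expectation operator in the P_X- and P_A-weighted bases.
    q = joint / np.sqrt(np.outer(p_x, p_a))
    u, s, _ = np.linalg.svd(q, full_matrices=False)
    # The leading singular value is 1 with a constant singular function; skip it.
    phi = u[:, 1:d + 1] / np.sqrt(p_x)[:, None]
    return phi, s[1:d + 1]

# Made-up joint distribution over 4 input values and 3 context values.
joint = np.array([
    [0.20, 0.05, 0.00],
    [0.05, 0.20, 0.05],
    [0.00, 0.05, 0.20],
    [0.05, 0.05, 0.10],
])
phi, sigma = contexture_features(joint, d=2)
print(phi.shape, sigma)
```

Each row of `phi` is a d-dimensional embedding of one input value. In this toy setting the columns are orthonormal and mean-zero under the input marginal, which is the discrete analogue of a representation that "learns the contexture".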

2. Absolute Zero: Reinforced Self-play Reasoning with Zero Data

ArXiv ID: 2505.03335

Authors: Andrew Zhao, Yiran Wu, Yang Yue, Tong Wu, Quentin Xu, Yang Yue, Matthieu Lin, Shenzhi Wang, Qingyun Wu, Zilong Zheng, Gao Huang

Abstract: Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning capabilities of large language models by learning directly from outcome-based rewards. Recent RLVR works that operate under the zero setting avoid supervision in labeling the reasoning process, but still depend on manually curated collections of questions and answers for training. The scarcity of high-quality, human-produced examples raises concerns about the long-term scalability of relying on human supervision, a challenge already evident in the domain of language model pretraining. Furthermore, in a hypothetical future where AI surpasses human intelligence, tasks provided by humans may offer limited learning potential for a superintelligent system. To address these concerns, we propose a new RLVR paradigm called Absolute Zero, in which a single model learns to propose tasks that maximize its own learning progress and improves reasoning by solving them, without relying on any external data. Under this paradigm, we introduce the Absolute Zero Reasoner (AZR), a system that self-evolves its training curriculum and reasoning ability by using a code executor to both validate proposed code reasoning tasks and verify answers, serving as a unified source of verifiable reward to guide open-ended yet grounded learning. Despite being trained entirely without external data, AZR achieves overall SOTA performance on coding and mathematical reasoning tasks, outperforming existing zero-setting models that rely on tens of thousands of in-domain human-curated examples. Furthermore, we demonstrate that AZR can be effectively applied across different model scales and is compatible with various model classes.

Comment: The paper introduces a self-evolving reasoning paradigm for LLMs, which aligns with the 'Large Language Models' criterion for foundational innovations in reasoning capabilities.

Relevance: 9 Novelty: 9
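
The abstract describes a code executor that both validates proposed tasks and verifies answers. The following is a toy sketch of that loop, not the AZR implementation. All names (`run_program`, `validate_task`, `verify_answer`) are hypothetical, the convention that a task is a Python source string defining `f(x)` is an assumption, and so is the determinism check. There is no sandboxing, so it should only be run on trusted input.

```python
def run_program(source, arg):
    """Execute `source`, which is assumed to define f(x), and return f(arg)."""
    namespace = {}
    exec(source, namespace)  # no sandboxing: toy example for trusted input only
    return namespace["f"](arg)

def validate_task(source, arg):
    """Treat a proposed task as valid if it runs without error and repeats its output."""
    try:
        first = run_program(source, arg)
        second = run_program(source, arg)
    except Exception:
        return False, None
    return first == second, first

def verify_answer(expected, answer):
    """Binary verifiable reward: 1.0 for an exact match, 0.0 otherwise."""
    return 1.0 if answer == expected else 0.0

# A proposed task: predict the output of this program on the given input.
task_source = "def f(x):\n    return sorted(x)[::-1]"
task_input = [3, 1, 2]

is_valid, gold = validate_task(task_source, task_input)
reward_correct = verify_answer(gold, [3, 2, 1])
reward_wrong = verify_answer(gold, [1, 2, 3])

# A proposal that crashes is rejected by the same executor.
broken_valid, _ = validate_task("def f(x):\n    return x / 0", 1)
```

The point of the sketch is that a single executor supplies both signals: whether a proposed task is usable at all, and whether a solver's answer earns reward.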

3. Agentic Neurodivergence as a Contingent Solution to the AI Alignment Problem

ArXiv ID: 2505.02581

Authors: Alberto Hernández-Espinosa, Felipe S. Abrahão, Olaf Witkowski, Hector Zenil

Abstract: The AI alignment problem, which focusses on ensuring that artificial intelligence (AI) systems, including AGI and ASI, act according to human values, presents profound challenges. With the progression from narrow AI to Artificial General Intelligence (AGI) and Superintelligence, fears about control and existential risk have escalated. This paper demonstrates that achieving complete alignment is inherently unattainable due to mathematical principles rooted in the foundations of predicate logic and computability, in particular Turing's computational universality, Gödel's incompleteness and Chaitin's randomness. Instead, we argue that embracing AI misalignment or agent's 'neurodivergence' as a contingent strategy, defined as fostering a dynamic ecosystem of competing, partially aligned agents, is possibly the only viable path to mitigate risks. Through mathematical proofs and an experimental design, we explore how misalignment may serve and should be promoted as a counterbalancing mechanism to team up with whichever agents are most aligned with human values, ensuring that no single system dominates destructively. The main premise of our contribution is that misalignment is inevitable because full AI-human alignment is a mathematical impossibility for Turing-complete systems, which we also prove in this paper, a feature then inherited by AGI and ASI systems. We introduce and test 'change-of-opinion' attacks based on this kind of perturbation and intervention analysis to study how agents may neutralise friendly or unfriendly AIs through cooperation, competition or malice.

Comment: The paper provides a theoretical argument about AI alignment and introduces a novel perspective on misalignment as a strategy, which aligns with emerging trends in foundational AI research.

Relevance: 9 Novelty: 9

4. Binding threshold units with artificial oscillatory neurons

ArXiv ID: 2505.03648

Authors: Vladimir Fanaskov, Ivan Oseledets

Abstract: Artificial Kuramoto oscillatory neurons were recently introduced as an alternative to threshold units. Empirical evidence suggests that oscillatory units outperform threshold units in several tasks including unsupervised object discovery and certain reasoning problems. The proposed coupling mechanism for these oscillatory neurons is heterogeneous, combining a generalized Kuramoto equation with standard coupling methods used for threshold units. In this research note, we present a theoretical framework that clearly distinguishes oscillatory neurons from threshold units and establishes a coupling mechanism between them. We argue that, from a biological standpoint, oscillatory and threshold units realise distinct aspects of neural coding: roughly, threshold units model intensity of neuron firing, while oscillatory units facilitate information exchange by frequency modulation. To derive interaction between these two types of units, we constrain their dynamics by focusing on dynamical systems that admit Lyapunov functions. For threshold units, this leads to the Hopfield associative memory model, and for oscillatory units it yields a specific form of the generalized Kuramoto model. The resulting dynamical systems can be naturally coupled to form a Hopfield-Kuramoto associative memory model, which also admits a Lyapunov function. Various forms of coupling are possible. Notably, oscillatory neurons can be employed to implement a low-rank correction to the weight matrix of a Hopfield network. This correction can be viewed either as a form of Hebbian learning or as the popular LoRA method used for fine-tuning of large language models. We demonstrate the practical realization of this particular coupling through illustrative toy experiments.

Comment: The paper introduces a theoretical framework combining oscillatory and threshold units, which aligns with foundational research on neural coding and architecture-level innovations.

Relevance: 9 Novelty: 9
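
The remark that a low-rank correction to a Hopfield weight matrix can be read either as Hebbian learning or as a LoRA-style update can be checked on a toy example. The sketch below is an assumption-laden illustration and does not model the oscillatory (Kuramoto) dynamics from the paper. It only shows that adding a rank-one term `B @ A` built from a new pattern equals Hebbian storage of that pattern. The function names and the choice of orthogonal patterns are hypothetical and made for illustration.

```python
import numpy as np

def hebbian_weights(patterns):
    """Hebbian rule W = (1/n) * sum_k xi_k xi_k^T (diagonal kept for simplicity)."""
    n = patterns.shape[1]
    return patterns.T @ patterns / n

def recall(w, state, steps=10):
    """Synchronous sign updates of a threshold (Hopfield) network."""
    state = state.copy()
    for _ in range(steps):
        state = np.where(w @ state >= 0, 1.0, -1.0)
    return state

# Mutually orthogonal +/-1 patterns taken from an 8x8 Hadamard matrix.
h2 = np.array([[1.0, 1.0], [1.0, -1.0]])
h8 = np.kron(np.kron(h2, h2), h2)
stored = h8[1:3]          # patterns already in memory
new = h8[3]               # pattern to add through a low-rank correction

w = hebbian_weights(stored)

# LoRA-style rank-one factors: W' = W + B @ A.
n = new.size
b = new[:, None] / n      # shape (n, 1)
a = new[None, :]          # shape (1, n)
w_corr = w + b @ a

# Corrupt one bit of the new pattern and try to recall it.
noisy = new.copy()
noisy[0] *= -1
recovered = recall(w_corr, noisy)
print(np.array_equal(recovered, new))
```

Here `w_corr` coincides with the Hebbian weights computed over all three patterns, which is the sense in which the two readings of the correction agree in this toy case.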

5. Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights

ArXiv ID: 2505.03205

Authors: Zhaiming Shen, Alex Havrilla, Rongjie Lai, Alexander Cloninger, Wenjing Liao

Abstract: Transformers serve as the foundational architecture for large language and video generation models, such as GPT, BERT, SORA and their successors. Empirical studies have demonstrated that real-world data and learning tasks exhibit low-dimensional structures, along with some noise or measurement error. The performance of transformers tends to depend on the intrinsic dimension of the data/tasks, though theoretical understandings remain largely unexplored for transformers. This work establishes a theoretical foundation by analyzing the performance of transformers for regression tasks involving noisy input data on a manifold. Specifically, the input data are in a tubular neighborhood of a manifold, while the ground truth function depends on the projection of the noisy data onto the manifold. We prove approximation and generalization errors which crucially depend on the intrinsic dimension of the manifold. Our results demonstrate that transformers can leverage low-complexity structures in learning tasks even when the input data are perturbed by high-dimensional noise. Our novel proof technique constructs representations of basic arithmetic operations by transformers, which may hold independent interest.

Comment: The paper provides theoretical insights into how transformers leverage low-dimensional structures in noisy data, aligning with the 'Model Architecture' criterion for foundational analysis of transformers.
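
The data model in the abstract (inputs in a tubular neighborhood of a manifold, with a target that depends only on the projection onto the manifold) can be sketched on a simple example. Everything below is an assumption made for illustration and is not the paper's experimental setup: the manifold is the unit circle embedded in the first two coordinates of a 16-dimensional space, the noise is bounded by a tube radius of 0.3, the target is sin(3θ), and the function names are hypothetical.

```python
import numpy as np

def sample_tube(n, ambient_dim, radius, rng):
    """Sample points within distance `radius` of the unit circle in R^ambient_dim."""
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    clean = np.zeros((n, ambient_dim))
    clean[:, 0] = np.cos(theta)
    clean[:, 1] = np.sin(theta)
    # Random direction in the full ambient space, with norm at most `radius`.
    noise = rng.normal(size=(n, ambient_dim))
    noise *= radius * rng.uniform(size=(n, 1)) / np.linalg.norm(noise, axis=1, keepdims=True)
    return clean + noise

def project_to_circle(x):
    """Nearest point on the circle; well defined while the tube radius is below 1."""
    proj = np.zeros_like(x)
    proj[:, :2] = x[:, :2] / np.linalg.norm(x[:, :2], axis=1, keepdims=True)
    return proj

def target(x):
    """Ground truth that depends on x only through its projection onto the circle."""
    p = project_to_circle(x)
    return np.sin(3.0 * np.arctan2(p[:, 1], p[:, 0]))

rng = np.random.default_rng(0)
x = sample_tube(200, ambient_dim=16, radius=0.3, rng=rng)
y = target(x)
print(x.shape, y.shape)
```

The regression problem is then to predict `y` from the 16-dimensional `x`, although the target varies only along a one-dimensional manifold. This is the kind of low intrinsic dimension the stated error bounds depend on.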