Personalized Daily ArXiv Papers 2025-04-08

[gpt-4o]	Prompt	Completion	Total
Token	53257	7591	60848
Cost	$0.13	$0.08	$0.21
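The cost row matches the token row if we assume gpt-4o prices of $2.50 per million prompt tokens and $10.00 per million completion tokens. The digest does not state these rates, so treat them as an assumption. The sketch below only checks that arithmetic.

```python
# ASSUMPTION: per-million-token gpt-4o prices in USD (not stated in this digest).
PROMPT_PRICE_PER_M = 2.50
COMPLETION_PRICE_PER_M = 10.00


def run_cost(prompt_tokens: int, completion_tokens: int) -> tuple[float, float, float]:
    """Return (prompt, completion, total) cost in USD under the assumed prices."""
    prompt_cost = prompt_tokens * PROMPT_PRICE_PER_M / 1_000_000
    completion_cost = completion_tokens * COMPLETION_PRICE_PER_M / 1_000_000
    return prompt_cost, completion_cost, prompt_cost + completion_cost


prompt_cost, completion_cost, total_cost = run_cost(53257, 7591)
print(f"${prompt_cost:.2f} ${completion_cost:.2f} ${total_cost:.2f}")
```

Rounded to cents, this reproduces the $0.13 / $0.08 / $0.21 row. The Total token column is simply 53257 + 7591.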

Total arXiv papers: 856

Total scanned papers: 572

Total relevant papers: 38

Table of contents with paper titles:

Language Models Are Implicitly Continuous Authors: Samuele Marro, Davide Evangelista, X. Angelo Huang, Emanuele La Malfa, Michele Lombardi, Michael Wooldridge
On the Spatial Structure of Mixture-of-Experts in Transformers Authors: Daniel Bershatsky, Ivan Oseledets
Generalising from Self-Produced Data: Model Training Beyond Human Constraints Authors: Alfath Daryl Alhajir, Jennifer Dodgson, Joseph Lim, Truong Ma Phi, Julian Peh, Akira Rafhael Janson Pattirane, Lokesh Poovaragan
Have Large Language Models Learned to Reason? A Characterization via 3-SAT Phase Transition Authors: Rishi Hazra, Gabriele Venturato, Pedro Zuidberg Dos Martires, Luc De Raedt
Exact Unlearning of Finetuning Data via Model Merging at Scale Authors: Kevin Kuo, Amrith Setlur, Kartik Srinivas, Aditi Raghunathan, Virginia Smith
Variational Self-Supervised Learning Authors: Mehmet Can Yavuz, Berrin Yanikoglu
Scalable Robust Bayesian Co-Clustering with Compositional ELBOs Authors: Ashwin Vinod, Chandrajit Bajaj
Saliency-driven Dynamic Token Pruning for Large Language Models Authors: Yao Tao, Yehui Tang, Yun Wang, Mingjian Zhu, Hailin Hu, Yunhe Wang
HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs Authors: Yongji Wu, Xueshen Liu, Shuowei Jin, Ceyu Xu, Feng Qian, Z. Morley Mao, Matthew Lentz, Danyang Zhuo, Ion Stoica
Towards Symmetric Low-Rank Adapters Authors: Tales Panoutsos, Rodrygo L. T. Santos, Flavio Figueiredo
Capturing AI's Attention: Physics of Repetition, Hallucination, Bias and Beyond Authors: Frank Yingjie Huo, Neil F. Johnson
RaanA: A Fast, Flexible, and Data-Efficient Post-Training Quantization Algorithm Authors: Yongyi Yang, Jianyang Gao, Wei Hu
Using Attention Sinks to Identify and Evaluate Dormant Heads in Pretrained LLMs Authors: Pedro Sandoval-Segura, Xijun Wang, Ashwinee Panda, Micah Goldblum, Ronen Basri, Tom Goldstein, David Jacobs
Entropy-Based Block Pruning for Efficient Large Language Models Authors: Liangwei Yang, Yuhui Xu, Juntao Tan, Doyen Sahoo, Silvio Savarese, Caiming Xiong, Huan Wang, Shelby Heinecke
AI in a vat: Fundamental limits of efficient world modelling for agent sandboxing and interpretability Authors: Fernando Rosas, Alexander Boyd, Manuel Baltieri
Gating is Weighting: Understanding Gated Linear Attention through In-context Learning Authors: Yingcong Li, Davoud Ataee Tarzanagh, Ankit Singh Rawat, Maryam Fazel, Samet Oymak
Directional Sign Loss: A Topology-Preserving Loss Function that Approximates the Sign of Finite Differences Authors: Harvey Dam, Tripti Agarwal, Ganesh Gopalakrishnan
Nonlocal techniques for the analysis of deep ReLU neural network approximations Authors: Cornelia Schneider, Mario Ullrich, Jan Vybiral
Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning Authors: Ximing Lu, Seungju Han, David Acuna, Hyunwoo Kim, Jaehun Jung, Shrimai Prabhumoye, Niklas Muennighoff, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Yejin Choi
Sensitivity Meets Sparsity: The Impact of Extremely Sparse Parameter Patterns on Theory-of-Mind of Large Language Models Authors: Yuheng Wu, Wentao Guo, Zirui Liu, Heng Ji, Zhaozhuo Xu, Denghui Zhang
Rethinking Reflection in Pre-Training Authors: Essential AI: Darsh J Shah, Peter Rushton, Somanshu Singla, Mohit Parmar, Kurt Smith, Yash Vanjani, Ashish Vaswani, Adarsh Chaluvaraju, Andrew Hojel, Andrew Ma, Anil Thomas, Anthony Polloreno, Ashish Tanwer, Burhan Drak Sibai, Divya S Mansingka, Divya Shivaprasad, Ishaan Shah, Karl Stratos, Khoi Nguyen, Michael Callahan, Michael Pust, Mrinal Iyer, Philip Monk, Platon Mazarakis, Ritvik Kapila, Saurabh Srivastava, Tim Romanski
Contrastive and Variational Approaches in Self-Supervised Learning for Complex Data Mining Authors: Yingbin Liang, Lu Dai, Shuo Shi, Minghao Dai, Junliang Du, Haige Wang
Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes Authors: Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett
Adversarial KA Authors: Sviatoslav Dzhenzher, Michael H. Freedman
Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models Authors: Adrián Bazaga, Rexhina Blloshmi, Bill Byrne, Adrià de Gispert
Adaptive Elicitation of Latent Information Using Natural Language Authors: Jimmy Wang, Thomas Zollo, Richard Zemel, Hongseok Namkoong
Better Rates for Random Task Orderings in Continual Linear Models Authors: Itay Evron, Ran Levinstein, Matan Schliserman, Uri Sherman, Tomer Koren, Daniel Soudry, Nathan Srebro
asKAN: Active Subspace embedded Kolmogorov-Arnold Network Authors: Zhiteng Zhou, Zhaoyue Xu, Yi Liu, Shizhao Wang
Learning symmetries in datasets Authors: Veronica Sanz
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models Authors: Ruikang Liu, Yuxuan Sun, Manyi Zhang, Haoli Bai, Xianzhi Yu, Tiezheng Yu, Chun Yuan, Lu Hou
Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning Authors: Xuerui Su, Shufang Xie, Guoqing Liu, Yingce Xia, Renqian Luo, Peiran Jin, Zhiming Ma, Yue Wang, Zun Wang, Yuting Liu
Memory and Bandwidth are All You Need for Fully Sharded Data Parallel Authors: Jiangtao Wang, Jan Ebert, Oleg Filatov, Stefan Kesselheim
LOGLO-FNO: Efficient Learning of Local and Global Features in Fourier Neural Operators Authors: Marimuthu Kalimuthu, David Holzmüller, Mathias Niepert
The Effects of Grouped Structural Global Pruning of Vision Transformers on Domain Generalisation Authors: Hamza Riaz, Alan F. Smeaton
PINNverse: Accurate parameter estimation in differential equations from noisy data with constrained physics-informed neural networks Authors: Marius Almanstötter, Roman Vetter, Dagmar Iber
Cramer-Rao Bounds for Laplacian Matrix Estimation Authors: Morad Halihal, Tirza Routtenberg, H. Vincent Poor
A Simultaneous Approach for Training Neural Differential-Algebraic Systems of Equations Authors: Laurens R. Lueg, Victor Alves, Daniel Schicksnus, John R. Kitchin, Carl D. Laird, Lorenz T. Biegler
SnapPix: Efficient-Coding--Inspired In-Sensor Compression for Edge Vision Authors: Weikai Lin, Tianrui Ma, Adith Boloor, Yu Feng, Ruofan Xing, Xuan Zhang, Yuhao Zhu

1. Language Models Are Implicitly Continuous

ArXiv ID: 2504.03933

Authors: Samuele Marro, Davide Evangelista, X. Angelo Huang, Emanuele La Malfa, Michele Lombardi, Michael Wooldridge

Abstract: Language is typically modelled with discrete sequences. However, the most successful approaches to language modelling, namely neural networks, are continuous and smooth function approximators. In this work, we show that Transformer-based language models implicitly learn to represent sentences as continuous-time functions defined over a continuous input space. This phenomenon occurs in most state-of-the-art Large Language Models (LLMs), including Llama2, Llama3, Phi3, Gemma, Gemma2, and Mistral, and suggests that LLMs reason about language in ways that fundamentally differ from humans. Our work formally extends Transformers to capture the nuances of time and space continuity in both input and output space. Our results challenge the traditional interpretation of how LLMs understand language, with several linguistic and engineering implications.

Comment: Explores the implicit continuous nature of LLMs, providing theoretical insights into their behavior, which is highly relevant to foundational research on LLMs.

Relevance: 10 Novelty: 9

2. On the Spatial Structure of Mixture-of-Experts in Transformers

ArXiv ID: 2504.04444

Authors: Daniel Bershatsky, Ivan Oseledets

Abstract: A common assumption is that MoE routers primarily leverage semantic features for expert selection. However, our study challenges this notion by demonstrating that positional token information also plays a crucial role in routing decisions. Through extensive empirical analysis, we provide evidence supporting this hypothesis, develop a phenomenological explanation of the observed behavior, and discuss practical implications for MoE-based architectures.
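The toy router below makes the routing question concrete. It follows the common MoE pattern: a linear projection, a softmax, and top-k selection with the kept gates renormalized. The router's input is a token embedding plus a sinusoidal positional encoding, so the same token can be sent to different experts depending on its position. This is a hypothetical illustration with random weights, and names such as `top_k_route` are made up for it. It does not reproduce the paper's models or analysis. In LLMs that use rotary embeddings, position reaches the router only indirectly through attention, not as an additive term.

```python
import math
import random


def sinusoidal_position(pos: int, dim: int) -> list[float]:
    """Standard sinusoidal positional encoding (Vaswani et al., 2017)."""
    enc = []
    for i in range(dim):
        angle = pos / (10000 ** (2 * (i // 2) / dim))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(hidden: list[float], router_weights: list[list[float]], k: int = 2):
    """Hypothetical router: linear logits -> softmax -> top-k, gates renormalized to sum to 1."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in router_weights]
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:k]
    norm = sum(probs[e] for e in top)
    return top, [probs[e] / norm for e in top]


DIM, NUM_EXPERTS = 16, 8
rng = random.Random(0)
router = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
token = [rng.gauss(0, 1) for _ in range(DIM)]  # one fixed token embedding, reused at every position

routes = {}
for pos in (0, 1, 50, 500):
    # Only the positional term changes between iterations.
    hidden = [t + p for t, p in zip(token, sinusoidal_position(pos, DIM))]
    routes[pos] = top_k_route(hidden, router)
    experts, gates = routes[pos]
    print(pos, experts, [round(g, 3) for g in gates])
```

If the printed expert ids differ across positions, the routing decision depended on position alone, because the token embedding never changed.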

Comment: The paper analyzes the spatial structure of Mixture-of-Experts (MoE) in Transformers, which directly aligns with the model architecture criterion, particularly MoE behavior.

Relevance: 10 Novelty: 8

3. Generalising from Self-Produced Data: Model Training Beyond Human Constraints

ArXiv ID: 2504.04711

Authors: Alfath Daryl Alhajir, Jennifer Dodgson, Joseph Lim, Truong Ma Phi, Julian Peh, Akira Rafhael Janson Pattirane, Lokesh Poovaragan

Abstract: Current large language models (LLMs) are constrained by human-derived training data and limited by a single level of abstraction that impedes definitive truth judgments. This paper introduces a novel framework in which AI models autonomously generate and validate new knowledge through direct interaction with their environment. Central to this approach is an unbounded, ungamable numeric reward (such as annexed disk space or follower count) that guides learning without requiring human benchmarks. AI agents iteratively generate strategies and executable code to maximize this metric, with successful outcomes forming the basis for self-retraining and incremental generalisation. To mitigate model collapse and the warm start problem, the framework emphasizes empirical validation over textual similarity and supports fine-tuning via GRPO. The system architecture employs modular agents for environment analysis, strategy generation, and code synthesis, enabling scalable experimentation. This work outlines a pathway toward self-improving AI systems capable of advancing beyond human-imposed constraints toward autonomous general intelligence.
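The abstract says fine-tuning is supported via GRPO. In GRPO (introduced in the DeepSeekMath work), each sampled output's reward is normalized against the other outputs sampled for the same prompt. This removes the need for a learned value model. The sketch below covers only that advantage step and omits the clipped policy-gradient loss. The reward values and the "strategies" framing are hypothetical, and this is not the paper's code. Implementations also differ on details such as sample vs. population standard deviation.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: standardize each reward against its own sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards: the numeric metric (e.g., followers gained) reached by
# four strategies an agent generated for the same task.
rewards = [120.0, 80.0, 80.0, 0.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

The advantages are standardized within each group, so they do not depend on the reward's scale. An unbounded metric such as follower count therefore does not inflate the size of the update. A group in which every strategy scores the same contributes zero advantage.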

Comment: The paper introduces a novel framework for AI models to autonomously generate and validate knowledge, which aligns with foundational research in LLMs and emerging trends. The focus on self-improving AI systems is highly relevant.

Relevance: 9 Novelty: 9

4. Have Large Language Models Learned to Reason? A Characterization via 3-SAT Phase Transition

ArXiv ID: 2504.03930

Authors: Rishi Hazra, Gabriele Venturato, Pedro Zuidberg Dos Martires, Luc De Raedt

Abstract: Large Language Models (LLMs) have been touted as AI models possessing advanced reasoning abilities. In theory, autoregressive LLMs with Chain-of-Thought (CoT) can perform more serial computations to solve complex reasoning tasks. However, recent studies suggest that, despite this capacity, LLMs do not truly learn to reason but instead fit on statistical features. To study the reasoning capabilities in a principled fashion, we adopt a computational theory perspective and propose an experimental protocol centered on 3-SAT, the prototypical NP-complete problem lying at the core of logical reasoning and constraint satisfaction tasks. Specifically, we examine the phase transitions in random 3-SAT and characterize the reasoning abilities of state-of-the-art LLMs by varying the inherent hardness of the problem instances. By comparing DeepSeek R1 with other LLMs, our findings reveal two key insights: (1) LLM accuracy drops significantly on harder instances, suggesting all current models struggle when statistical shortcuts are unavailable; (2) unlike other LLMs, R1 shows signs of having learned the underlying reasoning. Following a principled experimental protocol, our study moves beyond the benchmark-driven evidence often found in LLM reasoning research. Our findings highlight important gaps and suggest clear directions for future research.
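In random 3-SAT, instance hardness is usually controlled by the clause-to-variable ratio α = m/n. The sketch below covers the data side of such a protocol. It generates uniform random 3-SAT instances and estimates the fraction that are satisfiable as α grows. For large n, the satisfiability threshold sits close to α ≈ 4.26, which is also where instances tend to be hardest for solvers. This sketch uses n = 10 so that brute force stays cheap, and at that size the transition is visibly smeared. The generator and checker are illustrative, not the authors' code, and no LLM is queried.

```python
import itertools
import random


def random_3sat(num_vars: int, num_clauses: int, rng: random.Random) -> list[tuple[int, ...]]:
    """Uniform random 3-SAT: each clause picks 3 distinct variables, each negated with prob. 1/2.
    Literals are signed ints: +v means x_v, -v means NOT x_v (variables are 1-indexed)."""
    clauses = []
    for _ in range(num_clauses):
        chosen = rng.sample(range(1, num_vars + 1), 3)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in chosen))
    return clauses


def is_satisfiable(num_vars: int, clauses: list[tuple[int, ...]]) -> bool:
    """Brute-force check over all 2^n assignments (only sensible for small n)."""
    for bits in itertools.product((False, True), repeat=num_vars):
        # A literal is true when its variable's value matches the literal's polarity.
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in clauses):
            return True
    return False


def fraction_satisfiable(num_vars: int, ratio: float, trials: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    num_clauses = round(ratio * num_vars)
    sat = sum(is_satisfiable(num_vars, random_3sat(num_vars, num_clauses, rng)) for _ in range(trials))
    return sat / trials


for alpha in (2.0, 3.0, 4.0, 4.3, 5.0, 6.0, 8.0):
    print(f"alpha={alpha:.1f}  P(sat)={fraction_satisfiable(10, alpha, 50):.2f}")
```

The printed probabilities should fall from near 1 to near 0 as α crosses the 4 to 5 range. In a study like the one above, a model's accuracy would be measured on instances sampled at each α.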

Comment: The paper explores reasoning capabilities of LLMs using a principled experimental protocol, which aligns with foundational research on LLM behavior and interpretability.