Personalized Daily ArXiv Papers 2025-04-04

[gpt-4o]	Prompt	Completion	Total
Token	37570	4808	42378
Cost	$0.09	$0.05	$0.14

Total arXiv papers: 406

Total scanned papers: 239

Total relevant papers: 23

Table of contents with paper titles:

TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining Authors: Jeffrey Li, Mohammadreza Armandpour, Iman Mirzadeh, Sachin Mehta, Vaishaal Shankar, Raviteja Vemulapalli, Samy Bengio, Oncel Tuzel, Mehrdad Farajtabar, Hadi Pouransari, Fartash Faghri
MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators Authors: Beichen Huang, Yueming Yuan, Zelei Shao, Minjia Zhang
Epistemic Closure and the Irreversibility of Misalignment: Modeling Systemic Barriers to Alignment Innovation Authors: Andy Williams
Example-Free Learning of Regular Languages with Prefix Queries Authors: Eve Fernando, Sasha Rubin, Rahul Gopinath
A Physics-Informed Meta-Learning Framework for the Continuous Solution of Parametric PDEs on Arbitrary Geometries Authors: Reza Najian Asl, Yusuke Yamazaki, Kianoosh Taghikhani, Mayu Muramatsu, Markus Apel, Shahed Rezaei
Do Two AI Scientists Agree? Authors: Xinghong Fu, Ziming Liu, Max Tegmark
Analytical Discovery of Manifold with Machine Learning Authors: Yafei Shen, Huan-Fei Ma, Ling Yang
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models Authors: Mateusz Pach, Shyamgopal Karthik, Quentin Bouniot, Serge Belongie, Zeynep Akata
MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism Authors: Ruidong Zhu, Ziheng Jiang, Chao Jin, Peng Wu, Cesar A. Stuardo, Dongyang Wang, Xinlei Zhang, Huaping Zhou, Haoran Wei, Yang Cheng, Jianzhe Xiao, Xinyi Zhang, Lingjun Liu, Haibin Lin, Li-Wen Chang, Jianxi Ye, Xiao Yu, Xuanzhe Liu, Xin Jin, Xin Liu
MDP: Multidimensional Vision Model Pruning with Latency Constraint Authors: Xinglong Sun, Barath Lakshmanan, Maying Shen, Shiyi Lan, Jingde Chen, Jose M. Alvarez
GPTQv2: Efficient Finetuning-Free Quantization for Asymmetric Calibration Authors: Yuhang Li, Ruokai Yin, Donghyun Lee, Shiting Xiao, Priyadarshini Panda
Large (Vision) Language Models are Unsupervised In-Context Learners Authors: Artyom Gadetsky, Andrei Atanov, Yulun Jiang, Zhitong Gao, Ghazal Hosseini Mighan, Amir Zamir, Maria Brbic
Variational Online Mirror Descent for Robust Learning in Schrödinger Bridge Authors: Dong-Sig Han, Jaein Kim, Hee Bin Yoo, Byoung-Tak Zhang
When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks Authors: Nan Zhang, Yusen Zhang, Prasenjit Mitra, Rui Zhang
Implicit Neural Differential Model for Spatiotemporal Dynamics Authors: Deepak Akhare, Pan Du, Tengfei Luo, Jian-Xun Wang
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Authors: Bang Liu, Xinfeng Li, Jiayi Zhang, Jinlin Wang, Tanjin He, Sirui Hong, Hongzhang Liu, Shaokun Zhang, Kaitao Song, Kunlun Zhu, Yuheng Cheng, Suyuchen Wang, Xiaoqiang Wang, Yuyu Luo, Haibo Jin, Peiyan Zhang, Ollie Liu, Jiaqi Chen, Huan Zhang, Zhaoyang Yu, Haochen Shi, Boyan Li, Dekun Wu, Fengwei Teng, Xiaojun Jia, Jiawei Xu, Jinyu Xiang, Yizhang Lin, Tianming Liu, Tongliang Liu, Yu Su, Huan Sun, Glen Berseth, Jianyun Nie, Ian Foster, Logan Ward, Qingyun Wu, Yu Gu, Mingchen Zhuge, Xiangru Tang, Haohan Wang, Jiaxuan You, Chi Wang, Jian Pei, Qiang Yang, Xiaoliang Qi, Chenglin Wu
FT-Transformer: Resilient and Reliable Transformer with End-to-End Fault Tolerant Attention Authors: Huangliang Dai, Shixun Wu, Hairui Zhao, Jiajun Huang, Zizhe Jian, Yue Zhu, Haiyang Hu, Zizhong Chen
GMR-Conv: An Efficient Rotation and Reflection Equivariant Convolution Kernel Using Gaussian Mixture Rings Authors: Yuexi Du, Jiazhen Zhang, Nicha C. Dvornek, John A. Onofrey
Towards Interpretable Soft Prompts Authors: Oam Patel, Jason Wang, Nikhil Shivakumar Nayak, Suraj Srinivas, Himabindu Lakkaraju
State-Space Model Inspired Multiple-Input Multiple-Output Spiking Neurons Authors: Sanja Karilanova, Subhrakanti Dey, Ayça Özçelikkale
Learning Geometrically-Informed Lyapunov Functions with Deep Diffeomorphic RBF Networks Authors: Samuel Tesfazgi, Leonhard Sprandl, Sandra Hirche
Fourier Feature Attribution: A New Efficiency Attribution Method Authors: Zechen Liu, Feiyang Zhang, Wei Song, Xiang Li, Wei Wei
Efficient Model Editing with Task-Localized Sparse Fine-tuning Authors: Leonardo Iurada, Marco Ciccone, Tatiana Tommasi

1. TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining

ArXiv ID: 2504.02107

Authors: Jeffrey Li, Mohammadreza Armandpour, Iman Mirzadeh, Sachin Mehta, Vaishaal Shankar, Raviteja Vemulapalli, Samy Bengio, Oncel Tuzel, Mehrdad Farajtabar, Hadi Pouransari, Fartash Faghri

Abstract: Large Language Models (LLMs) trained on historical web data inevitably become outdated. We investigate evaluation strategies and update methods for LLMs as new data becomes available. We introduce a web-scale dataset for time-continual pretraining of LLMs derived from 114 dumps of Common Crawl (CC) - orders of magnitude larger than previous continual language modeling benchmarks. We also design time-stratified evaluations across both general CC data and specific domains (Wikipedia, StackExchange, and code documentation) to assess how well various continual learning methods adapt to new data while retaining past knowledge. Our findings demonstrate that, on general CC data, autoregressive meta-schedules combined with a fixed-ratio replay of older data can achieve comparable held-out loss to re-training from scratch, while requiring significantly less computation (2.6x). However, the optimal balance between incorporating new data and replaying old data differs as replay is crucial to avoid forgetting on generic web data but less so on specific domains.

Comment: Author match
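
To make the "fixed-ratio replay" idea in the abstract above concrete, here is a minimal Python sketch of how a training batch might be assembled with a constant share of older documents. The function name `mix_with_replay`, the `replay_ratio` parameter, and the toy document lists are all hypothetical; the paper's actual data pipeline and schedules are not reproduced here.

```python
import random


def mix_with_replay(new_docs, old_docs, replay_ratio, batch_size, seed=0):
    """Build one batch in which a fixed fraction comes from older data.

    replay_ratio is the share of the batch drawn from old_docs
    (an illustrative parameter, not the paper's notation).
    """
    rng = random.Random(seed)
    n_old = round(batch_size * replay_ratio)
    n_new = batch_size - n_old
    # Sample with replacement from each pool, then shuffle the combined batch.
    batch = rng.choices(old_docs, k=n_old) + rng.choices(new_docs, k=n_new)
    rng.shuffle(batch)
    return batch


# Toy stand-ins for an older and a newer Common Crawl dump.
new_docs = [f"2024-doc-{i}" for i in range(100)]
old_docs = [f"2019-doc-{i}" for i in range(100)]

batch = mix_with_replay(new_docs, old_docs, replay_ratio=0.25, batch_size=8)
n_replayed = sum(doc.startswith("2019") for doc in batch)
```

With `replay_ratio=0.25` and a batch of 8, two documents come from the older pool. Per the abstract, the best value of such a ratio depends on the evaluation domain, so it is a knob to tune rather than a constant.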

2. MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators

ArXiv ID: 2504.02658

Authors: Beichen Huang, Yueming Yuan, Zelei Shao, Minjia Zhang

Abstract: A critical approach for efficiently deploying Mixture-of-Experts (MoE) models with massive parameters is quantization. However, state-of-the-art MoE models suffer from non-negligible accuracy loss with extreme quantization, such as under 4 bits. To address this, we introduce MiLo, a novel method that augments highly quantized MoEs with a mixture of low-rank compensators. These compensators consume only a small amount of additional memory but significantly recover accuracy loss from extreme quantization. MiLo also identifies that MoE models exhibit distinctive characteristics across weights due to their hybrid dense-sparse architectures, and employs adaptive rank selection policies along with iterative optimizations to close the accuracy gap. MiLo does not rely on calibration data, allowing it to generalize to different MoE models and datasets without overfitting to a calibration set. To avoid the hardware inefficiencies of extreme quantization, such as 3-bit, MiLo develops Tensor Core-friendly 3-bit kernels, enabling measured latency speedups on 3-bit quantized MoE models. Our evaluation shows that MiLo outperforms existing methods on SoTA MoE models across various tasks.

Comment: The paper addresses efficient quantization in Mixture-of-Experts (MoE) models, aligning closely with the 'Model Compression' and 'Model Architecture' criteria. It introduces low-rank compensators and adaptive rank selection policies, which are novel contributions.

Relevance: 10 Novelty: 9
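
As a rough illustration of the low-rank compensation idea in this entry, the sketch below quantizes a small random matrix to 3 bits and then adds a rank-1 correction fitted to the quantization residual. This is a simplified stand-in, not MiLo's algorithm: it uses a single rank-1 term found by power iteration, a plain uniform quantizer, and no adaptive rank selection. All function names are hypothetical.

```python
import random


def quantize(w, bits=3):
    """Uniform round-to-nearest quantization of a matrix (list of rows)."""
    flat = [x for row in w for x in row]
    lo, hi = min(flat), max(flat)
    scale = (hi - lo) / (2 ** bits - 1)
    return [[lo + round((x - lo) / scale) * scale for x in row] for row in w]


def rank1_compensator(residual, iters=50):
    """Approximate the top singular pair of `residual` by power iteration.

    Returns (u, v) so that the outer product u v^T approximates the residual.
    """
    rows, cols = len(residual), len(residual[0])
    v = [1.0] * cols
    u = [0.0] * rows
    for _ in range(iters):
        u = [sum(r[j] * v[j] for j in range(cols)) for r in residual]
        norm = sum(x * x for x in u) ** 0.5
        if norm == 0.0:  # degenerate residual: nothing to compensate
            return [0.0] * rows, [0.0] * cols
        u = [x / norm for x in u]
        v = [sum(residual[i][j] * u[i] for i in range(rows)) for j in range(cols)]
    return u, v


def frob_error(a, b):
    """Frobenius norm of the difference between two matrices."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) ** 0.5


rng = random.Random(0)
w = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(8)]  # toy weight matrix
w_q = quantize(w, bits=3)
residual = [[x - q for x, q in zip(rw, rq)] for rw, rq in zip(w, w_q)]

u, v = rank1_compensator(residual)
# Compensated weights: quantized matrix plus the rank-1 correction.
w_hat = [[q + u[i] * v[j] for j, q in enumerate(row)] for i, row in enumerate(w_q)]

err_before = frob_error(w, w_q)
err_after = frob_error(w, w_hat)
```

The correction stores only 16 extra values for a 64-entry matrix here, which mirrors the abstract's point that compensators add little memory while recovering part of the quantization error.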

3. Epistemic Closure and the Irreversibility of Misalignment: Modeling Systemic Barriers to Alignment Innovation

ArXiv ID: 2504.02058

Authors: Andy Williams

Abstract: Efforts to ensure the safe development of artificial general intelligence (AGI) often rely on consensus-based alignment approaches grounded in axiomatic formalism, interpretability, and empirical validation. However, these methods may be structurally unable to recognize or incorporate novel solutions that fall outside their accepted epistemic frameworks. This paper introduces a functional model of epistemic closure, in which cognitive, institutional, social, and infrastructural filters combine to make many alignment proposals illegible to existing evaluation systems. We present a weighted closure model supported by both theoretical and empirical sources, including a meta-analysis performed by an AI system on patterns of rejection and non-engagement with a framework for decentralized collective intelligence (DCI). We argue that the recursive failure to assess models like DCI is not just a sociological oversight but a structural attractor, mirroring the very risks of misalignment we aim to avoid in AGI. Without the adoption of DCI or a similarly recursive model of epistemic correction, we may be on a predictable path toward irreversible misalignment. The development and acceptance of this paper, first through simulated review and then through formal channels, provide a case study supporting its central claim: that epistemic closure can only be overcome by recursive modeling of the constraints that sustain it.

Comment: The paper introduces a theoretical model of epistemic closure and systemic barriers to alignment innovation, aligning with the 'Emerging Trends' criterion. It challenges established assumptions and proposes recursive modeling as a novel paradigm.

Relevance: 10 Novelty: 9

4. Example-Free Learning of Regular Languages with Prefix Queries

ArXiv ID: 2504.02170

Authors: Eve Fernando, Sasha Rubin, Rahul Gopinath

Abstract: Language learning refers to the problem of inferring a mathematical model which accurately represents a formal language. Many language learning algorithms learn by asking certain types of queries about the language being modeled. Language learning is of practical interest in the field of cybersecurity, where it is used to model the language accepted by a program's input parser (also known as its input processor). In this setting, a learner can only query a string of its choice by executing the parser on it, which limits the language learning algorithms that can be used. Most practical parsers can indicate not only whether the string is valid or not, but also where the parsing failed. This extra information can be leveraged into producing a type of query we call the prefix query. Notably, no existing language learning algorithms make use of prefix queries, though some ask membership queries, i.e., they ask whether or not a given string is valid. When these approaches are used to learn the language of a parser, the prefix information provided by the parser remains unused. In this work, we present PL*, the first known language learning algorithm to make use of the prefix query, and a novel modification of the classical L* algorithm. We show both theoretically and empirically that PL* is able to learn more efficiently than L* due to its ability to exploit the additional information given by prefix queries over membership queries. Furthermore, we show how PL* can be used to learn the language of a parser, by adapting it to a more practical setting in which prefix queries are the only source of information available to it; that is, it does not have access to any labelled examples or any other types of queries. We demonstrate empirically that, even in this more constrained setting, PL* is still capable of accurately learning a range of languages of practical interest.

Comment: The paper introduces a novel algorithm (PL*) for learning regular languages using prefix queries, which is a cutting-edge theoretical contribution to representation learning and language modeling.

Relevance: 9 Novelty: 9
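
The difference between a membership query and a prefix query in this entry can be sketched with a toy parser. The code below assumes one plausible reading of the abstract: the oracle reports whether the string is accepted and how many leading characters the parser consumed before rejecting. The paper's formal definition may differ, and the tiny automaton for the language `a b* c` is purely illustrative.

```python
# Toy deterministic parser for the regular language  a b* c .
# A missing transition means the parser rejects at that character.
TRANSITIONS = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2}
ACCEPTING = {2}


def prefix_query(s):
    """Return (accepted, consumed).

    `consumed` is the number of leading characters read before the parser
    rejected, or len(s) if no character was rejected.
    """
    state = 0
    for i, ch in enumerate(s):
        nxt = TRANSITIONS.get((state, ch))
        if nxt is None:
            return False, i  # parsing failed at index i
        state = nxt
    return state in ACCEPTING, len(s)


def membership_query(s):
    """A membership query keeps only the yes/no part of the answer."""
    return prefix_query(s)[0]


valid = prefix_query("abbc")       # (True, 4)
bad_char = prefix_query("abxc")    # (False, 2): fails at the 'x'
incomplete = prefix_query("ab")    # (False, 2): rejected only for being unfinished
```

Both "abxc" and "ab" get the same answer from `membership_query`, but the prefix query shows that "ab" was read in full while "abxc" failed at a specific position. This is the kind of additional information the abstract says PL* exploits.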

5. A Physics-Informed Meta-Learning Framework for the Continuous Solution of Parametric PDEs on Arbitrary Geometries

ArXiv ID: 2504.02459

Authors: Reza Najian Asl, Yusuke Yamazaki, Kianoosh Taghikhani, Mayu Muramatsu, Markus Apel, Shahed Rezaei

Abstract: In this work, we introduce implicit Finite Operator Learning (iFOL) for the continuous and parametric solution of partial differential equations (PDEs) on arbitrary geometries. We propose a physics-informed encoder-decoder network to establish the mapping between continuous parameter and solution spaces. The decoder constructs the parametric solution field by leveraging an implicit neural field network conditioned on a latent or feature code. Instance-specific codes are derived through a PDE encoding process based on the second-order meta-learning technique. In training and inference, a physics-informed loss function is minimized during the PDE encoding and decoding. iFOL expresses the loss function in an energy or weighted residual form and evaluates it using discrete residuals derived from standard numerical PDE methods. This approach results in the backpropagation of discrete residuals during both training and inference. iFOL features several key properties: (1) its unique loss formulation eliminates the need for the conventional encode-process-decode pipeline previously used in operator learning with conditional neural fields for PDEs; (2) it not only provides accurate parametric and continuous fields but also delivers solution-to-parameter gradients without requiring additional loss terms or sensitivity analysis; (3) it can effectively capture sharp discontinuities in the solution; and (4) it removes constraints on the geometry and mesh, making it applicable to arbitrary geometries and spatial sampling (zero-shot super-resolution capability). We critically assess these features and analyze the network's ability to generalize to unseen samples across both stationary and transient PDEs. The overall performance of the proposed method is promising, demonstrating its applicability to a range of challenging problems in computational mechanics.

Comment: The paper introduces a physics-informed meta-learning framework for solving parametric PDEs, which aligns with 'AI for Science' due to its foundational contributions to computational mechanics and operator learning.