Personalized Daily ArXiv Papers 2025-06-26

[gpt-4o]	Prompt	Completion	Total
Token	29100	3520	32620
Cost	$0.07	$0.04	$0.11

Total arXiv papers: 416

Total scanned papers: 251

Total relevant papers: 24

Table of contents with paper titles:

1. Engineering Sentience
    Authors: Konstantin Demin, Taylor Webb, Eric Elmoznino, Hakwan Lau
2. DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs
    Authors: Ruokai Yin, Yuhang Li, Donghyun Lee, Priyadarshini Panda
3. A foundation model with multi-variate parallel attention to generate neuronal activity
    Authors: Francesco Carzaniga, Michael Hersche, Abu Sebastian, Kaspar Schindler, Abbas Rahimi
4. Global Convergence of Iteratively Reweighted Least Squares for Robust Subspace Recovery
    Authors: Gilad Lerman, Kang Li, Tyler Maunu, Teng Zhang
5. Language Modeling by Language Models
    Authors: Junyan Cheng, Peter Clark, Kyle Richardson
6. An ab initio foundation model of wavefunctions that accurately describes chemical bond breaking
    Authors: Adam Foster, Zeno Schätzle, P. Bernát Szabó, Lixue Cheng, Jonas Köhler, Gino Cassella, Nicholas Gao, Jiawei Li, Frank Noé, Jan Hermann
7. Cross-Layer Discrete Concept Discovery for Interpreting Language Models
    Authors: Ankur Garg, Xuemin Yu, Hassan Sajjad, Samira Ebrahimi Kahou
8. DipSVD: Dual-importance Protected SVD for Efficient LLM Compression
    Authors: Xuan Ding, Rui Sun, Yunjian Zhang, Xiu Yan, Yueqi Zhou, Kaihao Huang, Suzhong Fu, Chuanlong Xie, Yao Zhu
9. Orthogonal Soft Pruning for Efficient Class Unlearning
    Authors: Qinghui Gong, Xue Yang, Xiaohu Tang
10. Disentangled representations of microscopy images
    Authors: Jacopo Dapueto, Vito Paolo Pastore, Nicoletta Noceti, Francesca Odone
11. Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs
    Authors: Sonia K. Murthy, Rosie Zhao, Jennifer Hu, Sham Kakade, Markus Wulfmeier, Peng Qian, Tomer Ullman
12. Enhancing Large Language Models through Structured Reasoning
    Authors: Yubo Dong, Hehe Fan
13. Prover Agent: An Agent-based Framework for Formal Mathematical Proofs
    Authors: Kaito Baba, Chaoran Liu, Shuhei Kurita, Akiyoshi Sannai
14. Mixtures of Neural Cellular Automata: A Stochastic Framework for Growth Modelling and Self-Organization
    Authors: Salvatore Milite, Giulio Caravagna, Andrea Sottoriva
15. DualEquiNet: A Dual-Space Hierarchical Equivariant Network for Large Biomolecules
    Authors: Junjie Xu, Jiahao Zhang, Mangal Prakash, Xiang Zhang, Suhang Wang
16. SEED: A Structural Encoder for Embedding-Driven Decoding in Time Series Prediction with LLMs
    Authors: Fengze Li, Yue Wang, Yangle Liu, Ming Huang, Dou Hong, Jieming Ma
17. COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees
    Authors: Zhiyuan Wang, Jinhao Duan, Qingni Wang, Xiaofeng Zhu, Tianlong Chen, Xiaoshuang Shi, Kaidi Xu
18. Mastering Multiple-Expert Routing: Realizable H-Consistency and Strong Guarantees for Learning to Defer
    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong
19. Self-Supervised Graph Learning via Spectral Bootstrapping and Laplacian-Based Augmentations
    Authors: Lorenzo Bini, Stephane Marchand-Maillet
20. Any-Order GPT as Masked Diffusion Model: Decoupling Formulation and Architecture
    Authors: Shuchen Xue, Tianyu Xie, Tianyang Hu, Zijin Feng, Jiacheng Sun, Kenji Kawaguchi, Zhenguo Li, Zhi-Ming Ma
21. A Complete Loss Landscape Analysis of Regularized Deep Matrix Factorization
    Authors: Po Chen, Rujun Jiang, Peng Wang
22. Q-resafe: Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models
    Authors: Kejia Chen, Jiawen Zhang, Jiacong Hu, Yu Wang, Jian Lou, Zunlei Feng, Mingli Song
23. Lost in Retraining: Roaming the Parameter Space of Exponential Families Under Closed-Loop Learning
    Authors: Fariba Jangjoo, Matteo Marsili, Yasser Roudi
24. Argumentative Ensembling for Robust Recourse under Model Multiplicity
    Authors: Junqi Jiang, Antonio Rago, Francesco Leofante, Francesca Toni

1. Engineering Sentience

ArXiv ID: 2506.20504

Authors: Konstantin Demin, Taylor Webb, Eric Elmoznino, Hakwan Lau

Abstract: We spell out a definition of sentience that may be useful for designing and building it in machines. We propose that for sentience to be meaningful for AI, it must be fleshed out in functional, computational terms, in enough detail to allow for implementation. Yet, this notion of sentience must also reflect something essentially 'subjective', beyond just having the general capacity to encode perceptual content. For this specific functional notion of sentience to occur, we propose that certain sensory signals need to be both assertoric (persistent) and qualitative. To illustrate the definition in more concrete terms, we sketch out some ways for potential implementation, given current technology. Understanding what it takes for artificial agents to be functionally sentient can also help us avoid creating them inadvertently, or at least, realize that we have created them in a timely manner.

Comment: The paper discusses engineering sentience in AI, an emerging topic that challenges established assumptions about AI capabilities.

Relevance: 9 Novelty: 9

2. DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs

ArXiv ID: 2506.20194

Authors: Ruokai Yin, Yuhang Li, Donghyun Lee, Priyadarshini Panda

Abstract: Large language models (LLMs) deliver strong performance but are difficult to deploy due to high memory and compute costs. While pruning reduces these demands, most methods ignore activation sparsity observed at runtime. We reinterpret activation sparsity as dynamic structured weight sparsity and propose DuoGPT, a unified framework that constructs dual-sparse (spMspV) workloads by combining unstructured weight pruning with activation sparsity. To preserve accuracy, we extend the Optimal Brain Compression (OBC) framework with activation-aware calibration and introduce output residuals from the dense model as correction terms. We further optimize the solution for efficient GPU execution, enabling scalability to billion-parameter LLMs. Evaluations on LLaMA-2 and LLaMA-3 show that DuoGPT outperforms state-of-the-art structured pruning methods by up to 9.17% accuracy at an iso-speedup of 1.39× compared to the baseline dense model.
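
The dual-sparse (spMspV) idea can be illustrated with a minimal NumPy sketch, assuming the pruned weight matrix is stored column by column with zeros dropped. A zero activation skips an entire weight column, which is why activation sparsity behaves like dynamic structured weight sparsity, while unstructured pruning shrinks each stored column. This is not DuoGPT's GPU kernel or its OBC-based calibration; the magnitude threshold and the ReLU-style activations are illustrative assumptions.

```python
import numpy as np

def to_columns(W):
    """Store each column of a pruned weight matrix as (row_indices, values), dropping zeros."""
    columns = []
    for j in range(W.shape[1]):
        rows = np.nonzero(W[:, j])[0]
        columns.append((rows, W[rows, j]))
    return columns

def spmspv(columns, x, n_rows):
    """Compute y = W @ x using only nonzero activations and unpruned weights."""
    y = np.zeros(n_rows)
    for j in np.nonzero(x)[0]:      # a zero activation skips its whole weight column
        rows, vals = columns[j]
        y[rows] += vals * x[j]      # only stored (unpruned) weights are multiplied
    return y

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
W[np.abs(W) < 0.8] = 0.0                        # illustrative unstructured magnitude pruning
x = np.maximum(rng.standard_normal(16), 0.0)    # ReLU-style clipping creates activation zeros
y = spmspv(to_columns(W), x, W.shape[0])
```

The result matches the dense product W @ x; the saving is in the number of multiply-accumulates, which scales with the unpruned weights in the columns of nonzero activations rather than with the full matrix size.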

Comment: The paper introduces a novel framework for dual sparsity in LLMs, focusing on pruning and activation sparsity, which aligns with the model compression criterion.

Relevance: 9 Novelty: 8

3. A foundation model with multi-variate parallel attention to generate neuronal activity

ArXiv ID: 2506.20354

Authors: Francesco Carzaniga, Michael Hersche, Abu Sebastian, Kaspar Schindler, Abbas Rahimi

Abstract: Learning from multi-variate time-series with heterogeneous channel configurations remains a fundamental challenge for deep neural networks (DNNs), particularly in clinical domains such as intracranial electroencephalography (iEEG), where channel setups vary widely across subjects. In this work, we introduce multi-variate parallel attention (MVPA), a novel self-attention mechanism that disentangles content, temporal, and spatial attention, enabling flexible, generalizable, and efficient modeling of time-series data with varying channel counts and configurations. We use MVPA to build MVPFormer, a generative foundation model for human electrophysiology, trained to predict the evolution of iEEG signals across diverse subjects. To support this and future efforts by the community, we release the SWEC iEEG dataset, the largest publicly available iEEG dataset to date, comprising nearly 10,000 hours of recordings from heterogeneous clinical sources. MVPFormer leverages MVPA to achieve strong generalization across subjects, demonstrating expert-level performance in seizure detection and outperforming state-of-the-art Transformer baselines on the SWEC, MAYO, and FNUSA datasets. We further validate MVPA on standard time-series forecasting and classification tasks, where it matches or exceeds existing attention-based models. Together, our contributions establish MVPA as a general-purpose attention mechanism for heterogeneous time-series and MVPFormer as the first open-source, open-weights, and open-data iEEG foundation model with state-of-the-art clinical performance. The code is available at https://github.com/IBM/multi-variate-parallel-transformer. The SWEC iEEG dataset is available at https://mb-neuro.medical-blocks.ch/public_access/databases/ieeg/swec_ieeg.

Comment: The paper introduces a novel self-attention mechanism and a generative foundation model, relevant to model architecture and foundational model research.

Relevance: 9 Novelty: 8

4. Global Convergence of Iteratively Reweighted Least Squares for Robust Subspace Recovery

ArXiv ID: 2506.20533

Authors: Gilad Lerman, Kang Li, Tyler Maunu, Teng Zhang

Abstract: Robust subspace estimation is fundamental to many machine learning and data analysis tasks. Iteratively Reweighted Least Squares (IRLS) is an elegant and empirically effective approach to this problem, yet its theoretical properties remain poorly understood. This paper establishes that, under deterministic conditions, a variant of IRLS with dynamic smoothing regularization converges linearly to the underlying subspace from any initialization. We extend these guarantees to affine subspace estimation, a setting that lacks prior recovery theory. Additionally, we illustrate the practical benefits of IRLS through an application to low-dimensional neural network training. Our results provide the first global convergence guarantees for IRLS in robust subspace recovery and, more broadly, for nonconvex IRLS on a Riemannian manifold.
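
A minimal IRLS sketch for robust subspace recovery, assuming the common least-absolute-deviations objective (minimize the sum of Euclidean distances from the points to a d-dimensional linear subspace). Each iteration weights points by the inverse of their distance to the current estimate, floored by a smoothing parameter delta, and takes the top-d eigenvectors of the weighted covariance. The geometric shrinking of delta is an assumed stand-in for the paper's dynamic smoothing regularization; the exact schedule, the affine extension, and the deterministic conditions behind the global guarantee are not reproduced here.

```python
import numpy as np

def irls_subspace(X, d, n_iter=100, delta0=1.0, shrink=0.5, delta_min=1e-10):
    """IRLS sketch for minimising sum_i dist(x_i, L) over d-dimensional linear subspaces L.

    X holds one point per row; returns a (D, d) orthonormal basis of the estimate.
    """
    # PCA is used only as a convenient starting point.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    U = Vt[:d].T
    delta = delta0
    for _ in range(n_iter):
        residual = X - (X @ U) @ U.T              # part of each point outside the subspace
        dist = np.linalg.norm(residual, axis=1)
        w = 1.0 / np.maximum(dist, delta)         # smoothed inverse-distance weights
        C = (X * w[:, None]).T @ X                # weighted covariance: sum_i w_i x_i x_i^T
        _, eigvecs = np.linalg.eigh(C)            # eigenvalues come in ascending order
        U = eigvecs[:, -d:]                       # top-d eigenvectors span the new estimate
        delta = max(delta * shrink, delta_min)    # assumed geometric smoothing schedule
    return U

rng = np.random.default_rng(0)
B, _ = np.linalg.qr(rng.standard_normal((5, 2)))  # ground-truth 2-D subspace of R^5
inliers = rng.standard_normal((200, 2)) @ B.T
outliers = rng.standard_normal((50, 5))
X = np.vstack([inliers, outliers])
U = irls_subspace(X, d=2)
error = np.linalg.norm(U @ U.T - B @ B.T)        # distance between projection matrices
```

With exact inliers, their weights grow as the estimate approaches the true subspace, so the outliers' relative influence keeps shrinking; plain PCA on the same data has no such mechanism.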

Comment: The paper provides theoretical insights into Iteratively Reweighted Least Squares (IRLS) for robust subspace recovery, which is relevant to representation learning.

Relevance: 9 Novelty: 8

5. Language Modeling by Language Models

ArXiv ID: 2506.20249

Authors: Junyan Cheng, Peter Clark, Kyle Richardson

Abstract: Can we leverage LLMs to model the process of discovering novel language model (LM) architectures? Inspired by real research, we propose a multi-agent LLM approach that simulates the conventional stages of research, from ideation and literature search (proposal stage) to design implementation (code generation), generative pre-training, and downstream evaluation (verification). Using ideas from scaling laws, our system, Genesys, employs a Ladder of Scales approach; new designs are proposed, adversarially reviewed, implemented, and selectively verified at increasingly large model scales (14M to 350M parameters) with a narrowing budget (the number of models we can train at each scale). To help make discovery efficient and factorizable, Genesys uses a novel genetic programming backbone, which we show has empirical advantages over commonly used direct prompt generation workflows (e.g., a roughly 86 percentage point improvement in successful design generation, a key bottleneck). We report experiments involving 1,162 newly discovered designs (1,062 fully verified through pre-training) and find the best designs to be highly competitive with known architectures (e.g., they outperform GPT2, Mamba2, etc., on 6/9 common benchmarks). We couple these results with comprehensive system-level ablations and formal results, which give broader insights into the design of effective autonomous discovery systems.
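
The Ladder of Scales can be pictured as a successive-halving-style filter: every design is tried at the smallest scale, and only the best-ranked ones are carried to each larger scale as the per-scale training budget shrinks. The sketch assumes a hypothetical evaluate(design, scale) score in place of pre-training and downstream evaluation; the intermediate scale label and the budgets are illustrative, and Genesys' proposal, adversarial review, and genetic-programming stages are omitted.

```python
def ladder_of_scales(designs, scales, budgets, evaluate):
    """Carry only the best-ranked designs to each larger scale.

    budgets[i] is how many models can be trained at scales[i] (non-increasing).
    evaluate(design, scale) is a hypothetical stand-in for pre-training plus
    downstream evaluation; higher scores are better.
    """
    survivors = list(designs)
    history = []
    for scale, budget in zip(scales, budgets):
        trained = survivors[:budget]                  # the budget caps models at this scale
        scores = {d: evaluate(d, scale) for d in trained}
        survivors = sorted(trained, key=lambda d: scores[d], reverse=True)
        history.append((scale, len(trained)))
    return survivors, history

# Toy run: 20 candidate designs whose (hidden) quality peaks at design 7.
final, history = ladder_of_scales(
    designs=range(20),
    scales=["14M", "125M", "350M"],
    budgets=[20, 8, 3],
    evaluate=lambda d, scale: -abs(d - 7),
)
print(final)    # [7, 6, 8]
print(history)  # [('14M', 20), ('125M', 8), ('350M', 3)]
```

Because Python's sort is stable, ties keep their earlier ranking, so design 6 stays ahead of design 8 at every rung.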

Comment: The paper discusses a novel approach to discovering language model architectures using a multi-agent LLM system, which aligns with foundational research in LLMs.

Relevance: 9 Novelty: 8

6. An ab initio foundation model of wavefunctions that accurately describes chemical bond breaking

ArXiv ID: 2506.19960

Authors: Adam Foster, Zeno Schätzle, P. Bernát Szabó, Lixue Cheng, Jonas Köhler, Gino Cassella, Nicholas Gao, Jiawei Li, Frank Noé, Jan Hermann

Abstract: Reliable description of bond breaking remains a major challenge for quantum chemistry due to the multireferential character of the electronic structure in dissociating species. Multireferential methods in particular suffer from large computational cost, which under the normal paradigm has to be paid anew, at full price, for each system, ignoring commonalities in electronic structure across molecules. Quantum Monte Carlo with deep neural networks (deep QMC) uniquely offers to exploit such commonalities by pretraining transferable wavefunction models, but all such attempts have so far been limited in scope. Here, we bring this new paradigm to fruition with Orbformer, a novel transferable wavefunction model pretrained on 22,000 equilibrium and dissociating structures that can be fine-tuned on unseen molecules, reaching an accuracy-cost ratio rivalling classical multireferential methods. On established benchmarks as well as more challenging bond dissociations and Diels-Alder reactions, Orbformer is the only method that consistently converges to chemical accuracy (1 kcal/mol). This work turns the idea of amortizing the cost of solving the Schrödinger equation over many molecules into a practical approach in quantum chemistry.

Comment: The paper presents Orbformer, a novel transferable wavefunction model; this is foundational research in molecular modeling that aligns with AI for Science.

Relevance: 9 Novelty: 8

7. Cross-Layer Discrete Concept Discovery for Interpreting Language Models

ArXiv ID: 2506.20040

Authors: Ankur Garg, Xuemin Yu, Hassan Sajjad, Samira Ebrahimi Kahou

Abstract: Uncovering emergent concepts across transformer layers remains a significant challenge because the residual stream linearly mixes and duplicates information, obscuring how features evolve within large language models. Current research efforts primarily inspect neural representations at single layers, thereby overlooking this cross-layer superposition and the redundancy it introduces. These representations are typically either analyzed directly for activation patterns or passed to probing classifiers that map them to a limited set of predefined concepts. To address these limitations, we propose CLVQ-VAE, a framework that uses vector quantization to map representations across layers and, in the process, collapse duplicated residual-stream features into compact, interpretable concept vectors. Our approach uniquely combines top-k temperature-based sampling during quantization with EMA codebook updates, providing controlled exploration of the discrete latent space while maintaining codebook diversity. We further enhance the framework with scaled-spherical k-means++ for codebook initialization, which clusters by directional similarity rather than magnitude, better aligning with semantic structure in word embedding space.
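
The quantization ingredients named above can be sketched in NumPy under several assumptions: codes are compared by cosine similarity; top-k temperature-based sampling is read as drawing a code from a softmax over the k most similar codes; the EMA update follows the standard VQ-VAE recipe (running counts and sums with Laplace smoothing); and the seeding is plain spherical k-means++, since the paper's "scaled" variant is not reproduced. Function names are hypothetical, and the random features stand in for cross-layer residual-stream representations.

```python
import numpy as np

def normalize(v):
    """Scale rows to unit length so that comparisons depend on direction only."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def spherical_kmeanspp_init(X, K, rng):
    """k-means++ seeding with cosine distance (1 - cos) instead of Euclidean distance."""
    Xn = normalize(X)
    centers = [Xn[rng.integers(len(Xn))]]
    for _ in range(K - 1):
        sims = Xn @ np.stack(centers).T
        d = np.clip(1.0 - sims.max(axis=1), 0.0, None)   # distance to nearest chosen center
        p = d**2 / (d**2).sum()
        centers.append(Xn[rng.choice(len(Xn), p=p)])
    return np.stack(centers)

def quantize_topk(Z, codebook, k, temperature, rng):
    """Sample each vector's code among its k most similar codes via a temperature softmax."""
    sims = normalize(Z) @ normalize(codebook).T          # (n, K) cosine similarities
    topk = np.argsort(-sims, axis=1)[:, :k]
    idx = np.empty(len(Z), dtype=int)
    for i in range(len(Z)):
        logits = sims[i, topk[i]] / temperature
        p = np.exp(logits - logits.max())
        idx[i] = rng.choice(topk[i], p=p / p.sum())
    return idx

def ema_update(codebook, counts, sums, Z, idx, decay=0.99, eps=1e-5):
    """VQ-VAE-style EMA update: running per-code counts and sums, code = sum / count."""
    K = len(codebook)
    onehot = np.eye(K)[idx]                                # (n, K) assignment matrix
    counts = decay * counts + (1 - decay) * onehot.sum(axis=0)
    sums = decay * sums + (1 - decay) * onehot.T @ Z
    n = counts.sum()
    smoothed = (counts + eps) / (n + K * eps) * n          # Laplace smoothing avoids division by zero
    return sums / smoothed[:, None], counts, sums

rng = np.random.default_rng(0)
Z = rng.standard_normal((64, 8))      # stand-in for features pooled from several layers
codebook = spherical_kmeanspp_init(Z, K=16, rng=rng)
counts, sums = np.ones(16), codebook.copy()
idx = quantize_topk(Z, codebook, k=3, temperature=0.5, rng=rng)
codebook, counts, sums = ema_update(codebook, counts, sums, Z, idx)
```

With k=1 the assignment reduces to greedy nearest-code quantization; larger k and temperature spread assignments over more codes, which is one way to keep the codebook from collapsing onto a few entries.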

Comment: The paper introduces a framework for interpreting language models using vector quantization, relevant to representation learning and model architecture analysis.

Relevance: 9 Novelty: 8

8. DipSVD: Dual-importance Protected SVD for Efficient LLM Compression

ArXiv ID: 2506.20353

Authors: Xuan Ding, Rui Sun, Yunjian Zhang, Xiu Yan, Yueqi Zhou, Kaihao Huang, Suzhong Fu, Chuanlong Xie, Yao Zhu

Abstract: The ever-increasing computational demands and deployment costs of large language models (LLMs) have spurred numerous compression methods. Compared to quantization and unstructured pruning, SVD compression offers superior hardware compatibility and theoretical guarantees. However, existing SVD-based methods focus on the overall discrepancy between the original and compressed matrices while overlooking the protection of critical components within the matrix, which leads to inferior performance in the compressed models. This paper proposes a dual-level importance protection mechanism to enhance SVD-based compression methods: (1) local importance protection: preserving the most critical singular vectors within each weight matrix through channel-weighted data whitening; and (2) global importance protection: enabling less important layers to bear a greater portion of the compression burden through either a heuristic or optimization-based approach, thereby minimizing the impact of compression on critical layers. Extensive experiments demonstrate that DipSVD outperforms existing SVD-based compression approaches across multiple benchmarks, achieving superior model performance especially at high model compression ratios.
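
To make the SVD compression setting concrete, the sketch below compares plain truncated SVD with an activation-aware variant that whitens with a Cholesky factor S of the calibration Gram matrix X X^T, a technique used in earlier SVD-based LLM compression work. It is an assumption-laden stand-in: DipSVD's channel-weighted whitening and its layer-wise (global) rank allocation are not reproduced. Because ||A X||_F = ||A S||_F when X X^T = S S^T, truncating the SVD of W S gives the rank-r factorization with the smallest output error on the calibration data.

```python
import numpy as np

def svd_compress(W, r):
    """Plain rank-r truncated SVD: W is approximated by A @ B, A (m, r), B (r, n)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :r] * s[:r], Vt[:r]

def whitened_svd_compress(W, X, r):
    """Rank-r factorisation minimising ||(W - A @ B) @ X||_F on calibration inputs X (n, N)."""
    S = np.linalg.cholesky(X @ X.T)                 # X X^T = S S^T (needs X of full row rank)
    U, s, Vt = np.linalg.svd(W @ S, full_matrices=False)
    A = U[:, :r] * s[:r]
    B = np.linalg.solve(S.T, Vt[:r].T).T           # B = Vt_r @ inv(S), undoing the whitening
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 64))
X = rng.standard_normal((64, 256)) * np.linspace(0.1, 3.0, 64)[:, None]  # uneven channel scales
r = 8

def output_error(A, B):
    return np.linalg.norm((W - A @ B) @ X)

err_plain = output_error(*svd_compress(W, r))
err_white = output_error(*whitened_svd_compress(W, X, r))
n_params = r * (W.shape[0] + W.shape[1])   # 768 stored values instead of 32 * 64 = 2048
```

A rank-8 factorization of a 32 × 64 matrix stores 8 × (32 + 64) = 768 values instead of 2,048; how that rank budget is split across layers is exactly what the abstract's global importance protection addresses.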

Comment: The paper proposes a dual-level importance protection mechanism for SVD-based compression, which is relevant to model compression through low-rank approaches.

Relevance: 9 Novelty: 7

9. Orthogonal Soft Pruning for Efficient Class Unlearning

ArXiv ID: 2506.19891

Authors: Qinghui Gong, Xue Yang, Xiaohu Tang

Abstract: Machine unlearning aims to selectively remove class-specific knowledge from pretrained neural networks to satisfy privacy regulations such as the GDPR. Existing methods typically face a trade-off between unlearning speed and preservation of predictive accuracy, often incurring either high computational overhead or significant performance degradation on retained classes. In this paper, we propose a novel class-aware soft pruning framework leveraging orthogonal convolutional kernel regularization to achieve rapid and precise forgetting with millisecond-level response times. By enforcing orthogonality constraints during training, our method decorrelates convolutional filters and disentangles feature representations, while efficiently identifying class-specific channels through activation difference analysis. Extensive evaluations across multiple architectures and datasets demonstrate stable pruning with near-instant execution, complete forgetting of targeted classes, and minimal accuracy loss on retained data. Experiments on CIFAR-10, CIFAR-100, and TinyImageNet confirm that our approach substantially reduces membership inference attack risks and accelerates unlearning by orders of magnitude compared to state-of-the-art baselines. This framework provides an efficient, practical solution for real-time machine unlearning in Machine Learning as a Service (MLaaS) scenarios.
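
The two ingredients described in the abstract, an orthogonality constraint on convolution kernels and activation-difference analysis for finding class-specific channels, can be sketched as follows. Assumptions: the penalty is the common ||W W^T - I||_F^2 on kernels flattened to (C_out, C_in·kh·kw); channel activations are mean-pooled; "soft" pruning is modelled as a reversible zero mask on channel outputs. The helper names and synthetic activations are hypothetical, not the authors' code.

```python
import numpy as np

def orthogonality_penalty(kernels):
    """||W W^T - I||_F^2 with W the conv kernels flattened to (C_out, C_in * kh * kw)."""
    W = kernels.reshape(kernels.shape[0], -1)
    G = W @ W.T
    return float(np.sum((G - np.eye(len(G))) ** 2))

def class_specific_channels(acts_forget, acts_retain, n_prune):
    """Rank channels by how much more they fire on the forget class than on retained data.

    acts_*: (N, C) mean-pooled channel activations.
    """
    diff = acts_forget.mean(axis=0) - acts_retain.mean(axis=0)
    return np.argsort(-diff)[:n_prune]

def soft_prune_mask(n_channels, channels):
    """Soft pruning: zero the selected channels' outputs instead of deleting them (reversible)."""
    mask = np.ones(n_channels)
    mask[channels] = 0.0
    return mask

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((27, 16)))   # 16 orthonormal columns in R^27
kernels = Q.T.reshape(16, 3, 3, 3)                   # 16 filters of shape 3x3x3 with orthonormal rows
acts_retain = rng.random((100, 16))
acts_forget = rng.random((100, 16))
acts_forget[:, [3, 11]] += 2.0                       # synthetic: channels 3 and 11 respond to the forget class
pruned = class_specific_channels(acts_forget, acts_retain, n_prune=2)
mask = soft_prune_mask(16, pruned)
```

Decorrelated (orthogonal) filters make it more plausible that a class is carried by a few channels, so masking those channels removes the class with little collateral damage; the mask here only illustrates the selection step.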

Comment: The paper presents a novel class-aware soft pruning framework, which aligns with the model compression criterion, focusing on pruning techniques.

Relevance: 9 Novelty: 7

10. Disentangled representations of microscopy images

ArXiv ID: 2506.20649

Authors: Jacopo Dapueto, Vito Paolo Pastore, Nicoletta Noceti, Francesca Odone

Abstract: Microscopy image analysis is fundamental to many applications, from diagnosis to synthetic engineering and environmental monitoring. Modern acquisition systems make it possible to acquire ever-growing numbers of images, which has driven the development of a large collection of deep-learning-based automatic image analysis methods. Although deep neural networks have demonstrated great performance in this field, interpretability, an essential requirement for microscopy image analysis, remains an open challenge. This work proposes a Disentangled Representation Learning (DRL) methodology to enhance model interpretability for microscopy image classification. Exploiting benchmark datasets from three different microscopic image domains (plankton, yeast vacuoles, and human cells), we show how a DRL framework, based on transferring a representation learnt from synthetic data, can provide a good trade-off between accuracy and interpretability in this domain.

Comment: The paper proposes a disentangled representation learning methodology, which is relevant to representation learning.

Relevance: 9 Novelty: 7

11. Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs

ArXiv ID: 2506.20666

Authors: Sonia K. Murthy, Rosie Zhao, Jennifer Hu, Sham Kakade, Markus Wulfmeier, Peng Qian, Tomer Ullman

Abstract: Navigating everyday social situations often requires juggling conflicting goals, such as conveying a harsh truth and maintaining trust, all while still being mindful of another person's feelings. These value trade-offs are an integral part of human decision-making and language use; however, current tools for interpreting such dynamic and multi-faceted notions of values in LLMs are limited. In cognitive science, so-called "cognitive models" provide formal accounts of these trade-offs in humans, by modeling the weighting of a speaker's competing utility functions in choosing an action or utterance. In this work, we use a leading cognitive model of polite speech to interpret the extent to which LLMs represent human-like trade-offs. We apply this lens to systematically evaluate value trade-offs in two encompassing model settings: degrees of reasoning "effort" in frontier black-box models, and RL post-training dynamics of open-source models. Our results highlight patterns of higher informational utility than social utility in reasoning models, and in open-source models shown to be stronger in mathematical reasoning. Our findings from LLMs' training dynamics suggest large shifts in utility values early on in training, with persistent effects of the choice of base model and pretraining data compared to the feedback dataset or alignment method. We show that our method is responsive to diverse aspects of the rapidly evolving LLM landscape, with insights for forming hypotheses about other high-level behaviors, shaping training regimes for reasoning models, and better controlling trade-offs between values during model training.
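
The idea of weighting a speaker's competing utilities can be sketched with a toy Rational Speech Act (RSA) style speaker whose utility mixes an informational term (how well a literal listener recovers the true state) and a social term (how good the listener is made to feel), traded off by a weight phi_inf. The utterances, their literal semantics, the rationality parameter alpha, and the two-term utility are hypothetical simplifications; the polite-speech model used in the paper may include further terms.

```python
import numpy as np

UTTERANCES = ["terrible", "bad", "okay", "good", "amazing"]
STATES = np.arange(1, 6)    # e.g. 1-5 "hearts" rating how good a performance was
# Hypothetical literal semantics: MEANING[u, s] = 1 if utterance u is literally true of state s.
MEANING = np.array([
    [1, 0, 0, 0, 0],   # terrible
    [1, 1, 0, 0, 0],   # bad
    [0, 0, 1, 0, 0],   # okay
    [0, 0, 0, 1, 1],   # good
    [0, 0, 0, 0, 1],   # amazing
], dtype=float)

def literal_listener():
    """L0(s | u): uniform prior over states, conditioned on the utterance being literally true."""
    return MEANING / MEANING.sum(axis=1, keepdims=True)

def speaker(true_state, phi_inf, alpha=3.0):
    """S1(u | s) proportional to exp(alpha * [phi_inf * U_inf + (1 - phi_inf) * U_soc]).

    U_inf = log L0(s | u) (informational utility); U_soc = E_L0[s | u] (social utility).
    """
    L0 = literal_listener()
    u_inf = np.log(L0[:, true_state - 1] + 1e-12)   # floor keeps false utterances finite but very costly
    u_soc = L0 @ STATES                              # expected state the listener infers from each utterance
    logits = alpha * (phi_inf * u_inf + (1 - phi_inf) * u_soc)
    p = np.exp(logits - logits.max())
    return p / p.sum()

print(UTTERANCES[int(np.argmax(speaker(1, phi_inf=1.0)))])  # terrible
print(UTTERANCES[int(np.argmax(speaker(1, phi_inf=0.0)))])  # amazing
```

Fitting phi_inf to a model's choices, rather than fixing it, is what turns such a speaker into an interpretive lens: a higher fitted weight means the observed utterances are better explained by informational than by social utility.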

Comment: The paper uses cognitive models to interpret value trade-offs in LLMs, providing insights into LLM behavior and interpretability.

Relevance: 9 Novelty: 7

12. Enhancing Large Language Models through Structured Reasoning

ArXiv ID: 2506.20241

Authors: Yubo Dong, Hehe Fan

Abstract: Recent Large Language Models (LLMs) have significantly advanced natural language processing and automated decision-making. However, these models still encounter difficulties when performing complex reasoning tasks involving logical deduction and systematic planning, primarily due to their reliance on implicit statistical relationships without structured knowledge representation. Inspired by cognitive science and neurosymbolic AI, we introduce a novel approach to enhance LLMs through explicit structured reasoning. First, we convert unstructured data into structured formats by explicitly annotating reasoning steps. We then employ this structured dataset to train LLMs through Supervised Fine-Tuning (SFT). Additionally, we enhance the structured reasoning capabilities of LLMs using Group Relative Policy Optimization (GRPO), incorporating two innovative algorithms, MAX-Flow and Longest Common Subsequence (LCS), which notably improve reasoning effectiveness and reduce computational complexity. Experimental results from fine-tuning a DeepSeek-R1-Distill-Qwen-1.5B model demonstrate concise reasoning, robust performance across various scenarios, and improved compatibility with optimization techniques, validating the efficacy of structured reasoning integration in LLMs.
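
Two building blocks mentioned above can be sketched briefly: the group-relative advantage used by GRPO (each sampled completion's reward normalized by its group's mean and standard deviation) and the classic longest-common-subsequence dynamic program. How the paper actually uses LCS (and MAX-Flow) is not specified in the abstract, so the step-matching reward below is a hypothetical illustration, and the reasoning-step labels are invented for the example.

```python
import numpy as np

def lcs_length(a, b):
    """Classic O(len(a) * len(b)) dynamic program for the longest common subsequence."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def lcs_step_reward(generated_steps, reference_steps):
    """Hypothetical reward: fraction of reference reasoning steps recovered in order."""
    return lcs_length(generated_steps, reference_steps) / max(len(reference_steps), 1)

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each sampled completion's reward within its group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

reference = ["parse", "define", "derive", "check", "answer"]
group = [
    ["parse", "define", "derive", "check", "answer"],
    ["parse", "derive", "answer"],
    ["guess", "answer"],
    ["define", "parse", "derive", "answer"],
]
rewards = [lcs_step_reward(g, reference) for g in group]
print(rewards)   # [1.0, 0.6, 0.2, 0.6]
advantages = group_relative_advantages(rewards)
```

Completions above the group mean receive positive advantages and those below receive negative ones, so no separate value network is needed; this sketch uses the population standard deviation, and implementations differ on that detail.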

Comment: The paper proposes structured reasoning to enhance LLMs, which is relevant to foundational research in LLM architecture and theoretical insights.

Relevance: 9 Novelty: 7

13. Prover Agent: An Agent-based Framework for Formal Mathematical Proofs

ArXiv ID: 2506.19923

Authors: Kaito Baba, Chaoran Liu, Shuhei Kurita, Akiyoshi Sannai

Abstract: We present Prover Agent, a novel AI agent for automated theorem proving that integrates large language models (LLMs) with a formal proof assistant, Lean. Prover Agent coordinates an informal reasoning LLM, a formal prover model, and feedback from Lean while also generating auxiliary lemmas to assist in discovering the overall proof strategy. It achieves an 86.1% success rate on the MiniF2F benchmark, establishing a new state-of-the-art among methods using small language models (SLMs) with a much lower sample budget than previous approaches. We also present case studies illustrating how these generated lemmas contribute to solving challenging problems.

Comment: The paper presents a novel AI agent for automated theorem proving, integrating LLMs with a formal proof assistant, relevant to LLM theoretical insights.

Relevance: 8 Novelty: 8

14. Mixtures of Neural Cellular Automata: A Stochastic Framework for Growth Modelling and Self-Organization

ArXiv ID: 2506.20486

Authors: Salvatore Milite, Giulio Caravagna, Andrea Sottoriva

Abstract: Neural Cellular Automata (NCAs) are a promising new approach to model self-organizing processes, with potential applications in life science. However, their deterministic nature limits their ability to capture the stochasticity of real-world biological and physical systems. We propose the Mixture of Neural Cellular Automata (MNCA), a novel framework incorporating the idea of mixture models into the NCA paradigm. By combining probabilistic rule assignments with intrinsic noise, MNCAs can model diverse local behaviors and reproduce the stochastic dynamics observed in biological processes. We evaluate the effectiveness of MNCAs in three key domains: (1) synthetic simulations of tissue growth and differentiation, (2) image morphogenesis robustness, and (3) microscopy image segmentation. Results show that MNCAs achieve superior robustness to perturbations, better recapitulate real biological growth patterns, and provide interpretable rule segmentation. These findings position MNCAs as a promising tool for modeling stochastic dynamical systems and studying self-growth processes.
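
The combination of probabilistic rule assignment with intrinsic noise can be sketched as a single stochastic update step: each cell perceives its own state and its neighbours' mean, samples one of K update rules from a categorical distribution, applies it, and then receives Gaussian noise. In MNCA the rules and assignment are learned networks; here the random linear rules, the linear assignment logits, and the toroidal 8-neighbour perception are hypothetical stand-ins.

```python
import numpy as np

def neighbour_mean(grid):
    """Mean of the 8 neighbours of every cell on a toroidal H x W x C grid."""
    total = np.zeros_like(grid)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy or dx:
                total += np.roll(grid, shift=(dy, dx), axis=(0, 1))
    return total / 8.0

def mnca_step(grid, rule_weights, assign_logits, noise_std, rng):
    """One stochastic update: each cell samples one of K rules, applies it, then adds noise.

    rule_weights: (K, 2C, C) hypothetical linear rules acting on [own state, neighbour mean].
    assign_logits: (2C, K) hypothetical map from a cell's perception to mixture logits.
    """
    H, W, C = grid.shape
    perception = np.concatenate([grid, neighbour_mean(grid)], axis=-1).reshape(H * W, 2 * C)
    logits = perception @ assign_logits                      # (H*W, K)
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    # Sample one rule per cell from its categorical mixture weights (inverse-CDF sampling).
    u = rng.random((H * W, 1))
    rule = (probs.cumsum(axis=1) < u).sum(axis=1)
    rule = np.minimum(rule, probs.shape[1] - 1)              # guard against rounding at the top end
    delta = np.einsum("nc,ncd->nd", perception, rule_weights[rule])
    new = grid.reshape(H * W, C) + delta + noise_std * rng.standard_normal((H * W, C))
    return new.reshape(H, W, C), rule.reshape(H, W)

rng = np.random.default_rng(0)
C, K = 4, 3
grid = np.zeros((16, 16, C))
grid[8, 8] = 1.0                                          # a single seed cell
rule_weights = 0.1 * rng.standard_normal((K, 2 * C, C))   # hypothetical stand-ins for learned rules
assign_logits = rng.standard_normal((2 * C, K))
for _ in range(10):
    grid, rule_map = mnca_step(grid, rule_weights, assign_logits, noise_std=0.01, rng=rng)
```

The returned rule_map records which rule each cell used, which is the kind of per-cell rule segmentation the abstract describes as interpretable; with K = 1 and noise_std = 0 the step reduces to a deterministic NCA update.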

Comment: The paper introduces Mixtures of Neural Cellular Automata, a novel framework for modeling stochastic dynamical systems, which is relevant to emerging trends in model architecture.

Relevance: 8 Novelty: 8

15. DualEquiNet: A Dual-Space Hierarchical Equivariant Network for Large Biomolecules

ArXiv ID: 2506.19862

Authors: Junjie Xu, Jiahao Zhang, Mangal Prakash, Xiang Zhang, Suhang Wang

Abstract: Geometric graph neural networks (GNNs) that respect E(3) symmetries have achieved strong performance on small molecule modeling, but they face scalability and expressiveness challenges when applied to large biomolecules such as RNA and proteins. These systems require models that can simultaneously capture fine-grained atomic interactions, long-range dependencies across spatially distant components, and biologically relevant hierarchical structure, such as atoms forming residues, which in turn form higher-order domains. Existing geometric GNNs, which typically operate exclusively in either Euclidean or Spherical Harmonics space, are limited in their ability to capture both the fine-scale atomic details and the long-range, symmetry-aware dependencies required for modeling the multi-scale structure of large biomolecules. We introduce DualEquiNet, a Dual-Space Hierarchical Equivariant Network that constructs complementary representations in both Euclidean and Spherical Harmonics spaces to capture local geometry and global symmetry-aware features. DualEquiNet employs bidirectional cross-space message passing and a novel Cross-Space Interaction Pooling mechanism to hierarchically aggregate atomic features into biologically meaningful units, such as residues, enabling efficient and expressive multi-scale modeling for large biomolecular systems. DualEquiNet achieves state-of-the-art performance on multiple existing benchmarks for RNA property prediction and protein modeling, and outperforms prior methods on two newly introduced 3D structural benchmarks, demonstrating its broad effectiveness across a range of large biomolecule modeling tasks.

Comment: The paper introduces a novel network architecture for biomolecular modeling, which aligns with the AI for Science criterion, focusing on foundational research in molecular modeling.