Personalized Daily ArXiv Papers 2025-09-05

[gpt-4o]	Prompt	Completion	Total
Token	32487	3966	36453
Cost	$0.08	$0.04	$0.12

Total arXiv papers: 417

Total scanned papers: 244

Total relevant papers: 21

Table of contents with paper titles:

Differentiable Entropy Regularization for Geometry and Neural Networks Authors: Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma
Towards a Unified View of Large Language Model Post-Training Authors: Xingtai Lv, Yuxin Zuo, Youbang Sun, Hongyi Liu, Yuntian Wei, Zhekai Chen, Lixuan He, Xuekai Zhu, Kaiyan Zhang, Bingning Wang, Ning Ding, Bowen Zhou
Natural Latents: Latent Variables Stable Across Ontologies Authors: John Wentworth, David Lorell
Sparse Autoencoder Neural Operators: Model Recovery in Function Spaces Authors: Bahareh Tolooshams, Ailsa Shen, Anima Anandkumar
Improving Factuality in LLMs via Inference-Time Knowledge Graph Construction Authors: Shanglin Wu, Lihui Liu, Jinho D. Choi, Kai Shu
PagedEviction: Structured Block-wise KV Cache Pruning for Efficient Large Language Model Inference Authors: Krishna Teja Chitty-Venkata, Jie Ye, Xian-He Sun, Anthony Kougkas, Murali Emani, Venkatram Vishwanath, Bogdan Nicolae
IPA: An Information-Preserving Input Projection Framework for Efficient Foundation Model Adaptation Authors: Yuan Yin, Shashanka Venkataramanan, Tuan-Hung Vu, Andrei Bursuc, Matthieu Cord
Rethinking the long-range dependency in Mamba/SSM and transformer models Authors: Cong Ma, Kayvan Najarian
The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs Authors: Pengrui Han, Rafal Kocielnik, Peiyang Song, Ramit Debnath, Dean Mobbs, Anima Anandkumar, R. Michael Alvarez
MEPG:Multi-Expert Planning and Generation for Compositionally-Rich Image Generation Authors: Yuan Zhao, Liu Lin
Transition Models: Rethinking the Generative Learning Objective Authors: Zidong Wang, Yiyuan Zhang, Xiaoyu Yue, Xiangyu Yue, Yangguang Li, Wanli Ouyang, Lei Bai
Nonnegative matrix factorization and the principle of the common cause Authors: E. Khalafyan, A. E. Allahverdyan, A. Hovhannisyan
Breaking the Mirror: Activation-Based Mitigation of Self-Preference in LLM Evaluators Authors: Dani Roytburg, Matthew Bozoukov, Matthew Nguyen, Jou Barzdukas, Simon Fu, Narmeen Oozeer
Detecting Regional Spurious Correlations in Vision Transformers via Token Discarding Authors: Solha Kang, Esla Timothy Anzaku, Wesley De Neve, Arnout Van Messem, Joris Vankerschaver, Francois Rameau, Utku Ozbulak
Intermediate Languages Matter: Formal Languages and LLMs affect Neurosymbolic Reasoning Authors: Alexander Beiser, David Penz, Nysret Musliu
Hierarchical Federated Foundation Models over Wireless Networks for Multi-Modal Multi-Task Intelligence: Integration of Edge Learning with D2D/P2P-Enabled Fog Learning Architectures Authors: Payam Abdisarabshali, Fardis Nadimi, Kasra Borazjani, Naji Khosravan, Minghui Liwang, Wei Ni, Dusit Niyato, Michael Langberg, Seyyedali Hosseinalipour
Interpretable Clustering with Adaptive Heterogeneous Causal Structure Learning in Mixed Observational Data Authors: Wenrui Li, Qinghao Zhang, Xiaowo Wang
ArcMemo: Abstract Reasoning Composition with Lifelong LLM Memory Authors: Matthew Ho, Chen Si, Zhaoxiang Feng, Fangxu Yu, Zhijian Liu, Zhiting Hu, Lianhui Qin
Delta Activations: A Representation for Finetuned Large Language Models Authors: Zhiqiu Xu, Amish Sethi, Mayur Naik, Ser-Nam Lim
MTQA:Matrix of Thought for Enhanced Reasoning in Complex Question Answering Authors: Fengxiao Tang, Yufeng Li, Zongzong Wu, Ming Zhao
CEHR-GPT: A Scalable Multi-Task Foundation Model for Electronic Health Records Authors: Chao Pang, Jiheum Park, Xinzhuo Jiang, Nishanth Parameshwar Pavinkurve, Krishna S. Kalluri, Shalmali Joshi, Noémie Elhadad, Karthik Natarajan

1. Differentiable Entropy Regularization for Geometry and Neural Networks

ArXiv ID: 2509.03733

Authors: Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma

Abstract: We introduce a differentiable estimator of range-partition entropy, a recent concept from computational geometry that enables algorithms to adapt to the "sortedness" of their input. While range-partition entropy provides strong guarantees in algorithm design, it has not yet been made accessible to deep learning. In this work, we (i) propose the first differentiable approximation of range-partition entropy, enabling its use as a trainable loss or regularizer; (ii) design EntropyNet, a neural module that restructures data into low-entropy forms to accelerate downstream instance-optimal algorithms; and (iii) extend this principle beyond geometry by applying entropy regularization directly to Transformer attention. Across tasks, we demonstrate that differentiable entropy improves efficiency without degrading correctness: in geometry, our method achieves up to $4.1\times$ runtime speedups with negligible error ($<0.2\%$); in deep learning, it induces structured attention patterns that yield 6% higher accuracy at 80% sparsity compared to L1 baselines. Our theoretical analysis provides approximation bounds for the estimator, and extensive ablations validate design choices. These results suggest that entropy-bounded computation is not only theoretically elegant but also a practical mechanism for adaptive learning, efficiency, and structured representation.
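
Illustration (not from the paper): the abstract mentions applying entropy regularization to Transformer attention. The sketch below shows the general idea with ordinary Shannon entropy over attention rows, which is a stand-in and not the paper's range-partition entropy estimator. The function names and the weight `lam` are assumptions made for this example.

```python
import numpy as np

def softmax(scores, axis=-1):
    """Numerically stable softmax."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def attention_entropy(attn, eps=1e-12):
    """Mean Shannon entropy (in nats) over the rows of an attention matrix."""
    return float(-(attn * np.log(attn + eps)).sum(axis=-1).mean())

def regularized_loss(task_loss, attn, lam=0.01):
    """Task loss plus an entropy penalty; `lam` is a hypothetical weight."""
    return task_loss + lam * attention_entropy(attn)

# Uniform attention has maximal entropy; peaked attention has low entropy.
uniform = softmax(np.zeros((4, 4)))
peaked = softmax(np.eye(4) * 10.0)
print(attention_entropy(uniform), attention_entropy(peaked))
```

Penalizing this quantity pushes each query toward attending to fewer keys, which is one plausible route to the structured, sparse attention patterns the abstract reports.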

Comment: The paper introduces a differentiable estimator of range-partition entropy and applies it to neural networks, contributing to representation learning and model efficiency.

Relevance: 9 Novelty: 8

2. Towards a Unified View of Large Language Model Post-Training

ArXiv ID: 2509.04419

Authors: Xingtai Lv, Yuxin Zuo, Youbang Sun, Hongyi Liu, Yuntian Wei, Zhekai Chen, Lixuan He, Xuekai Zhu, Kaiyan Zhang, Bingning Wang, Ning Ding, Bowen Zhou

Abstract: Two major sources of training data exist for post-training modern language models: online (model-generated rollouts) data, and offline (human or other-model demonstrations) data. These two types of data are typically used by approaches like Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT), respectively. In this paper, we show that these approaches are not in contradiction, but are instances of a single optimization process. We derive a Unified Policy Gradient Estimator, and present the calculations of a wide spectrum of post-training approaches as the gradient of a common objective under different data distribution assumptions and various bias-variance tradeoffs. The gradient estimator is constructed with four interchangeable parts: stabilization mask, reference policy denominator, advantage estimate, and likelihood gradient. Motivated by our theoretical findings, we propose Hybrid Post-Training (HPT), an algorithm that dynamically selects different training signals. HPT is designed to yield both effective exploitation of demonstration and stable exploration without sacrificing learned reasoning patterns. We provide extensive experiments and ablation studies to verify the effectiveness of our unified theoretical framework and HPT. Across six mathematical reasoning benchmarks and two out-of-distribution suites, HPT consistently surpasses strong baselines across models of varying scales and families.
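
Illustration (not from the paper): one schematic way to read the four interchangeable parts named in the abstract is as factors of a single gradient expression. The multiplicative form and the symbols below are assumptions made for this note; the paper's exact definition may differ.

```latex
\nabla_\theta \mathcal{J}(\theta) \;\approx\;
\underbrace{\mathbb{1}_{\text{stable}}}_{\text{stabilization mask}}
\cdot
\underbrace{\frac{1}{\pi_{\text{ref}}}}_{\text{reference policy denominator}}
\cdot
\underbrace{\hat{A}}_{\text{advantage estimate}}
\cdot
\underbrace{\nabla_\theta \pi_\theta}_{\text{likelihood gradient}}
```

Under this reading, setting $\pi_{\text{ref}} = \pi_\theta$, $\hat{A} = 1$, and a mask that is always 1 gives $\nabla_\theta \pi_\theta / \pi_\theta = \nabla_\theta \log \pi_\theta$, the familiar log-likelihood gradient used in supervised fine-tuning.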

Comment: The paper provides a unified theoretical framework for post-training LLMs, which is a significant contribution to foundational research on LLMs.

Relevance: 9 Novelty: 8

3. Natural Latents: Latent Variables Stable Across Ontologies

ArXiv ID: 2509.03780

Authors: John Wentworth, David Lorell

Abstract: Suppose two Bayesian agents each learn a generative model of the same environment. We will assume the two have converged on the predictive distribution, i.e. distribution over some observables in the environment, but may have different generative models containing different latent variables. Under what conditions can one agent guarantee that their latents are a function of the other agent's latents? We give simple conditions under which such translation is guaranteed to be possible: the natural latent conditions. We also show that, absent further constraints, these are the most general conditions under which translatability is guaranteed. Crucially for practical application, our theorems are robust to approximation error in the natural latent conditions.

Comment: The paper discusses conditions under which latent variables can be translated between different generative models, contributing to foundational research in representation learning.

Relevance: 9 Novelty: 8

4. Sparse Autoencoder Neural Operators: Model Recovery in Function Spaces

ArXiv ID: 2509.03738

Authors: Bahareh Tolooshams, Ailsa Shen, Anima Anandkumar

Abstract: We frame the problem of unifying representations in neural models as one of sparse model recovery and introduce a framework that extends sparse autoencoders (SAEs) to lifted spaces and infinite-dimensional function spaces, enabling mechanistic interpretability of large neural operators (NO). While the Platonic Representation Hypothesis suggests that neural networks converge to similar representations across architectures, the representational properties of neural operators remain underexplored despite their growing importance in scientific computing. We compare the inference and training dynamics of SAEs, lifted-SAE, and SAE neural operators. We highlight how lifting and operator modules introduce beneficial inductive biases, enabling faster recovery, improved recovery of smooth concepts, and robust inference across varying resolutions, a property unique to neural operators.

Comment: The paper extends sparse autoencoders to function spaces, contributing to foundational research in representation learning and model recovery.

Relevance: 9 Novelty: 8

5. Improving Factuality in LLMs via Inference-Time Knowledge Graph Construction

ArXiv ID: 2509.03540

Authors: Shanglin Wu, Lihui Liu, Jinho D. Choi, Kai Shu

Abstract: Large Language Models (LLMs) often struggle with producing factually consistent answers due to limitations in their parametric memory. Retrieval-Augmented Generation (RAG) methods address this issue by incorporating external knowledge from trusted sources at inference time. However, such methods typically treat knowledge as unstructured text, which limits their ability to support compositional reasoning and identify factual inconsistencies. To overcome these limitations, we propose a novel framework that dynamically constructs and expands knowledge graphs (KGs) during inference, integrating both internal knowledge extracted from LLMs and external information retrieved from external sources. Our method begins by extracting a seed KG from the question via prompting, followed by iterative expansion using the LLM's latent knowledge. The graph is then selectively refined through external retrieval, enhancing factual coverage and correcting inaccuracies. We evaluate our approach on three diverse factual QA benchmarks, demonstrating consistent improvements in factual accuracy, answer precision, and interpretability over baseline prompting and static KG-augmented methods. Our findings suggest that inference-time KG construction is a promising direction for enhancing LLM factuality in a structured, interpretable, and scalable manner.

Comment: The paper introduces a novel framework for improving factuality in LLMs by constructing knowledge graphs during inference, which aligns with foundational research in LLM behavior and interpretability.

Relevance: 9 Novelty: 8

6. PagedEviction: Structured Block-wise KV Cache Pruning for Efficient Large Language Model Inference

ArXiv ID: 2509.04377

Authors: Krishna Teja Chitty-Venkata, Jie Ye, Xian-He Sun, Anthony Kougkas, Murali Emani, Venkatram Vishwanath, Bogdan Nicolae

Abstract: KV caching significantly improves the efficiency of Large Language Model (LLM) inference by storing attention states from previously processed tokens, enabling faster generation of subsequent tokens. However, as sequence length increases, the KV cache quickly becomes a major memory bottleneck. To address this, we propose PagedEviction, a novel fine-grained, structured KV cache pruning strategy that enhances the memory efficiency of vLLM's PagedAttention. Unlike existing approaches that rely on attention-based token importance or evict tokens across different vLLM pages, PagedEviction introduces an efficient block-wise eviction algorithm tailored for paged memory layouts. Our method integrates seamlessly with PagedAttention without requiring any modifications to its CUDA attention kernels. We evaluate PagedEviction across Llama-3.1-8B-Instruct, Llama-3.2-1B-Instruct, and Llama-3.2-3B-Instruct models on the LongBench benchmark suite, demonstrating improved memory usage with better accuracy than baselines on long context tasks.
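
Illustration (not from the paper): the abstract describes evicting whole blocks so that the cache stays aligned with a paged memory layout. The sketch below shows that general shape in plain Python. The per-token scores are supplied by the caller and are a placeholder; the paper's actual importance criterion and eviction schedule are not reproduced here, and the function names are hypothetical.

```python
from typing import List, Sequence

def block_scores(token_scores: Sequence[float], block_size: int) -> List[float]:
    """Average the per-token scores inside each fixed-size block.

    The last block may be partially filled.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    scores = []
    for start in range(0, len(token_scores), block_size):
        chunk = token_scores[start:start + block_size]
        scores.append(sum(chunk) / len(chunk))
    return scores

def blocks_to_keep(scores: Sequence[float], max_blocks: int) -> List[int]:
    """Return indices of the highest-scoring blocks, in original order.

    Whole blocks are dropped, so no page is left partially evicted.
    """
    if max_blocks < 0:
        raise ValueError("max_blocks must be non-negative")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:max_blocks])

# Seven tokens in blocks of two; keep at most three blocks.
scores = block_scores([0.9, 0.8, 0.1, 0.2, 0.5, 0.6, 0.7], block_size=2)
print(blocks_to_keep(scores, max_blocks=3))
```

Dropping a whole block frees a whole page at once, which is why a block-level decision fits a paged allocator better than scattered per-token eviction.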

Comment: The paper introduces PagedEviction, a novel KV cache pruning strategy for LLM inference, which aligns with the model compression criterion focusing on efficiency breakthroughs.

Relevance: 9 Novelty: 8

7. IPA: An Information-Preserving Input Projection Framework for Efficient Foundation Model Adaptation

ArXiv ID: 2509.04398

Authors: Yuan Yin, Shashanka Venkataramanan, Tuan-Hung Vu, Andrei Bursuc, Matthieu Cord

Abstract: Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, reduce adaptation cost by injecting low-rank updates into pretrained weights. However, LoRA's down-projection is randomly initialized and data-agnostic, discarding potentially useful information. Prior analyses show that this projection changes little during training, while the up-projection carries most of the adaptation, making the random input compression a performance bottleneck. We propose IPA, a feature-aware projection framework that explicitly preserves information in the reduced hidden space. In the linear case, we instantiate IPA with algorithms approximating top principal components, enabling efficient projector pretraining with negligible inference overhead. Across language and vision benchmarks, IPA consistently improves over LoRA and DoRA, achieving on average 1.5 points higher accuracy on commonsense reasoning and 2.3 points on VTAB-1k, while matching full LoRA performance with roughly half the trainable parameters when the projection is frozen.
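
Illustration (not from the paper): the abstract says the linear case uses algorithms approximating top principal components for the down-projection. The sketch below uses an exact SVD on a batch of input features instead, as a simple stand-in. All names, shapes, and the zero-initialized up-projection are assumptions made for this example.

```python
import numpy as np

def pca_down_projection(features, rank):
    """Top-`rank` principal directions of the features, shape (rank, d_in)."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are orthonormal right singular vectors, ordered by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:rank]

def adapted_forward(x, w, down, up):
    """Frozen base layer plus a low-rank update: x W^T + (x D^T) U^T."""
    return x @ w.T + (x @ down.T) @ up.T

rng = np.random.default_rng(0)
features = rng.normal(size=(256, 16))   # hypothetical layer inputs
w = rng.normal(size=(8, 16))            # frozen pretrained weight
down = pca_down_projection(features, rank=4)
up = np.zeros((8, 4))                   # zero init: no change before training

x = features[:5]
print(np.allclose(adapted_forward(x, w, down, up), x @ w.T))
```

With the down-projection fixed to data-derived directions, only `up` would need to be trained, which matches the abstract's remark about freezing the projection to reduce trainable parameters.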

Comment: The paper proposes a new framework for efficient foundation model adaptation, which aligns with model compression and efficiency improvements.

Relevance: 9 Novelty: 8

8. Rethinking the long-range dependency in Mamba/SSM and transformer models

ArXiv ID: 2509.04226

Authors: Cong Ma, Kayvan Najarian

Abstract: Long-range dependency is one of the most desired properties of recent sequence models such as state-space models (particularly Mamba) and transformer models. New model architectures are being actively developed and benchmarked for prediction tasks requiring long-range dependency. However, the capability of modeling long-range dependencies of these models has not been investigated from a theoretical perspective, which hinders a systematic improvement on this aspect. In this work, we mathematically define long-range dependency using the derivative of hidden states with respect to past inputs and compare the capability of SSM and transformer models of modeling long-range dependency based on this definition. We showed that the long-range dependency of SSM decays exponentially with the sequence length, which aligns with the exponential decay of memory function in RNN. But the attention mechanism used in transformers is more flexible and is not constrained to exponential decay, which could in theory perform better at modeling long-range dependency with sufficient training data, computing resources, and proper training. To combine the flexibility of long-range dependency of attention mechanism and computation efficiency of SSM, we propose a new formulation for hidden state update in SSM and prove its stability under a standard Gaussian distribution of the input data.
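
Illustration (not from the paper): for a simplified linear, time-invariant state update, the derivative-based definition in the abstract can be worked out directly. The recurrence below is an assumption for this note; Mamba's input-dependent parameters are not modeled.

```latex
h_t = A h_{t-1} + B x_t
\quad\Longrightarrow\quad
\frac{\partial h_t}{\partial x_s} = A^{\,t-s} B \quad (s \le t),
\qquad
\left\lVert \frac{\partial h_t}{\partial x_s} \right\rVert
\le \lVert A \rVert^{\,t-s} \, \lVert B \rVert .
```

If $\lVert A \rVert < 1$ in an induced matrix norm, the bound shrinks geometrically in the lag $t-s$, which is the exponential decay the abstract refers to.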

Comment: The paper provides a theoretical analysis of long-range dependency in SSM and transformer models, which is relevant to model architecture and offers insights into the behavior of these models.

Relevance: 9 Novelty: 8

9. The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs

ArXiv ID: 2509.03730

Authors: Pengrui Han, Rafal Kocielnik, Peiyang Song, Ramit Debnath, Dean Mobbs, Anima Anandkumar, R. Michael Alvarez

Abstract: Personality traits have long been studied as predictors of human behavior. Recent advances in Large Language Models (LLMs) suggest similar patterns may emerge in artificial systems, with advanced LLMs displaying consistent behavioral tendencies resembling human traits like agreeableness and self-regulation. Understanding these patterns is crucial, yet prior work primarily relied on simplified self-reports and heuristic prompting, with little behavioral validation. In this study, we systematically characterize LLM personality across three dimensions: (1) the dynamic emergence and evolution of trait profiles throughout training stages; (2) the predictive validity of self-reported traits in behavioral tasks; and (3) the impact of targeted interventions, such as persona injection, on both self-reports and behavior. Our findings reveal that instructional alignment (e.g., RLHF, instruction tuning) significantly stabilizes trait expression and strengthens trait correlations in ways that mirror human data. However, these self-reported traits do not reliably predict behavior, and observed associations often diverge from human patterns. While persona injection successfully steers self-reports in the intended direction, it exerts little or inconsistent effect on actual behavior. By distinguishing surface-level trait expression from behavioral consistency, our findings challenge assumptions about LLM personality and underscore the need for deeper evaluation in alignment and interpretability.

Comment: The paper provides insights into LLM behavior and interpretability, which aligns with the foundational research on LLMs.

Relevance: 9 Novelty: 7

10. MEPG:Multi-Expert Planning and Generation for Compositionally-Rich Image Generation

ArXiv ID: 2509.04126

Authors: Yuan Zhao, Liu Lin

Abstract: Text-to-image diffusion models have achieved remarkable image quality, but they still struggle with complex, multi-element prompts, and limited stylistic diversity. To address these limitations, we propose a Multi-Expert Planning and Generation Framework (MEPG) that synergistically integrates position- and style-aware large language models (LLMs) with spatial-semantic expert modules. The framework comprises two core components: (1) a Position-Style-Aware (PSA) module that utilizes a supervised fine-tuned LLM to decompose input prompts into precise spatial coordinates and style-encoded semantic instructions; and (2) a Multi-Expert Diffusion (MED) module that implements cross-region generation through dynamic expert routing across both local regions and global areas. During the generation process for each local region, specialized models (e.g., realism experts, stylization specialists) are selectively activated for each spatial partition via attention-based gating mechanisms. The architecture supports lightweight integration and replacement of expert models, providing strong extensibility. Additionally, an interactive interface enables real-time spatial layout editing and per-region style selection from a portfolio of experts. Experiments show that MEPG significantly outperforms baseline models with the same backbone in both image quality and style diversity.

Comment: The paper proposes a Multi-Expert Planning and Generation Framework, which involves mixture-of-experts, relevant to model architecture.

Relevance: 9 Novelty: 7

11. Transition Models: Rethinking the Generative Learning Objective

ArXiv ID: 2509.04394

Authors: Zidong Wang, Yiyuan Zhang, Xiaoyu Yue, Xiangyu Yue, Yangguang Li, Wanli Ouyang, Lei Bai

Abstract: A fundamental dilemma in generative modeling persists: iterative diffusion models achieve outstanding fidelity, but at a significant computational cost, while efficient few-step alternatives are constrained by a hard quality ceiling. This conflict between generation steps and output quality arises from restrictive training objectives that focus exclusively on either infinitesimal dynamics (PF-ODEs) or direct endpoint prediction. We address this challenge by introducing an exact, continuous-time dynamics equation that analytically defines state transitions across any finite time interval. This leads to a novel generative paradigm, Transition Models (TiM), which adapt to arbitrary-step transitions, seamlessly traversing the generative trajectory from single leaps to fine-grained refinement with more steps. Despite having only 865M parameters, TiM achieves state-of-the-art performance, surpassing leading models such as SD3.5 (8B parameters) and FLUX.1 (12B parameters) across all evaluated step counts. Importantly, unlike previous few-step generators, TiM demonstrates monotonic quality improvement as the sampling budget increases. Additionally, when employing our native-resolution strategy, TiM delivers exceptional fidelity at resolutions up to 4096x4096.

Comment: The paper introduces a novel generative paradigm, Transition Models, which is a significant contribution to generative modeling.

Relevance: 8 Novelty: 8

12. Nonnegative matrix factorization and the principle of the common cause

ArXiv ID: 2509.03652

Authors: E. Khalafyan, A. E. Allahverdyan, A. Hovhannisyan

Abstract: Nonnegative matrix factorization (NMF) is a known unsupervised data-reduction method. The principle of the common cause (PCC) is a basic methodological approach in probabilistic causality, which seeks an independent mixture model for the joint probability of two dependent random variables. It turns out that these two concepts are closely related. This relationship is explored reciprocally for several datasets of gray-scale images, which are conveniently mapped into probability models. On one hand, PCC provides a predictability tool that leads to a robust estimation of the effective rank of NMF. Unlike other estimates (e.g., those based on the Bayesian Information Criteria), our estimate of the rank is stable against weak noise. We show that NMF implemented around this rank produces features (basis images) that are also stable against noise and against seeds of local optimization, thereby effectively resolving the NMF nonidentifiability problem. On the other hand, NMF provides an interesting possibility of implementing PCC in an approximate way, where larger and positively correlated joint probabilities tend to be explained better via the independent mixture model. We work out a clustering method, where data points with the same common cause are grouped into the same cluster. We also show how NMF can be employed for data denoising.
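
Illustration (not from the paper): the link the abstract draws between NMF and an independent mixture model can be seen by rescaling the factors of a joint probability matrix. The sketch below uses standard multiplicative updates for the Frobenius objective and then normalizes the factors into P(c), P(x|c), and P(y|c). The paper's rank-estimation procedure is not reproduced, and the function names are assumptions made for this example.

```python
import numpy as np

def nmf(v, rank, iters=500, seed=0, eps=1e-12):
    """Multiplicative-update NMF: v is approximated by w @ h, all nonnegative."""
    rng = np.random.default_rng(seed)
    w = rng.random((v.shape[0], rank)) + 0.1
    h = rng.random((rank, v.shape[1])) + 0.1
    for _ in range(iters):
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h

def to_common_cause(w, h):
    """Rescale factors so that w @ h = sum_c P(c) P(x|c) P(y|c)."""
    col = w.sum(axis=0)            # mass of each column of w
    row = h.sum(axis=1)            # mass of each row of h
    p_c = col * row                # sums to 1 only if w @ h does
    return p_c, w / col, h / row[:, None]

# A joint distribution built from a known two-cause mixture.
true_c = np.array([0.4, 0.6])
true_x = np.array([[0.7, 0.1], [0.1, 0.2], [0.1, 0.3], [0.1, 0.4]])
true_y = np.array([[0.5, 0.2, 0.1, 0.1, 0.1], [0.1, 0.1, 0.2, 0.3, 0.3]])
joint = (true_x * true_c) @ true_y

w, h = nmf(joint, rank=2)
p_c, p_x_given_c, p_y_given_c = to_common_cause(w, h)
print(np.abs(joint - w @ h).max())
```

The rescaling is exact for any nonnegative factor pair, so each NMF component can be read as one candidate common cause with its own conditional distributions over the two variables.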

Comment: The paper explores the relationship between nonnegative matrix factorization and the principle of the common cause, contributing to representation learning.