Personalized Daily ArXiv Papers 2025-08-13

[gpt-4o]	Prompt	Completion	Total
Token	34045	4029	38074
Cost	$0.09	$0.04	$0.13

Total arXiv papers: 447

Total scanned papers: 281

Total relevant papers: 23

Table of contents with paper titles:

Understanding Transformers through the Lens of Pavlovian Conditioning Authors: Mu Qiao
Topos Causal Models Authors: Sridhar Mahadevan
Entangled in Representations: Mechanistic Investigation of Cultural Biases in Large Language Models Authors: Haeun Yu, Seogyeong Jeong, Siddhesh Pawar, Jisu Shin, Jiho Jin, Junho Myung, Alice Oh, Isabelle Augenstein
$\text{M}^{2}$LLM: Multi-view Molecular Representation Learning with Large Language Models Authors: Jiaxin Ju, Yizhen Zheng, Huan Yee Koh, Can Wang, Shirui Pan
Dynamic Rank Adjustment for Accurate and Efficient Neural Network Training Authors: Hyuntak Shin, Aecheon Jung, Sunwoo Lee, Sungeun Hong
Topos Theory for Generative AI and LLMs Authors: Sridhar Mahadevan
Towards Scalable Lottery Ticket Networks using Genetic Algorithms Authors: Julian Sch\"onberger, Maximilian Zorn, Jonas N\"u{\ss}lein, Thomas Gabor, Philipp Altmann
Selective KV-Cache Sharing to Mitigate Timing Side-Channels in LLM Inference Authors: Kexin Chu, Zecheng Lin, Dawei Xiang, Zixu Shen, Jianchang Su, Cheng Chu, Yiwei Yang, Wenhui Zhang, Wenfei Wu, Wei Zhang
Fast weight programming and linear transformers: from machine learning to neurobiology Authors: Kazuki Irie, Samuel J. Gershman
Evaluating Contrast Localizer for Identifying Causal Unitsin Social & Mathematical Tasks in Language Models Authors: Yassine Jamaa, Badr AlKhamissi, Satrajit Ghosh, Martin Schrimpf
Intrinsic Memory Agents: Heterogeneous Multi-Agent LLM Systems through Structured Contextual Memory Authors: Sizhe Yuen, Francisco Gomez Medina, Ting Su, Yali Du, Adam J. Sobey
Language Models Can Understand Spectra: A Multimodal Model for Molecular Structure Elucidation Authors: Yunyue Su, Jiahui Chen, Zao Jiang, Zhenyi Zhong, Liang Wang, Qiang Liu
Interpretable Reward Model via Sparse Autoencoder Authors: Shuyi Zhang, Wei Shi, Sihang Li, Jiayi Liao, Tao Liang, Hengxing Cai, Xiang Wang
Elucidating Rectified Flow with Deterministic Sampler: Polynomial Discretization Complexity for Multi and One-step Models Authors: Ruofeng Yang, Zhaoyu Zhu, Bo Jiang, Cheng Chen, Shuai Li
Towards Universal Neural Inference Authors: Shreyas Bhat Brahmavar, Yang Li, Junier Oliva
Deep Neural Network Calibration by Reducing Classifier Shift with Stochastic Masking Authors: Jiani Ni, He Zhao, Yibo Yang, Dandan Guo
EditMF: Drawing an Invisible Fingerprint for Your Large Language Models Authors: Jiaxuan Wu, Yinghan Zhou, Wanli Peng, Yiming Xue, Juan Wen, Ping Zhong
Bio-Inspired Artificial Neural Networks based on Predictive Coding Authors: Davide Casnici, Charlotte Frenkel, Justin Dauwels
ASPD: Unlocking Adaptive Serial-Parallel Decoding by Exploring Intrinsic Parallelism in LLMs Authors: Keyu Chen, Zhifeng Shen, Daohai Yu, Haoqian Wu, Wei Wen, Jianfeng He, Ruizhi Qiao, Xing Sun
Superclass-Guided Representation Disentanglement for Spurious Correlation Mitigation Authors: Chenruo Liu, Hongjun Liu, Zeyu Lai, Yiqiu Shen, Chen Zhao, Qi Lei
Train Long, Think Short: Curriculum Learning for Efficient Reasoning Authors: Hasan Abed Al Kader Hammoud, Kumail Alhamoud, Abed Hammoud, Elie Bou-Zeid, Marzyeh Ghassemi, Bernard Ghanem
Wavelet Mixture of Experts for Time Series Forecasting Authors: Zheng Zhou, Yu-Jie Xiong, Jia-Chen Zhang, Chun-Ming Xia, Xi-Jiong Xie
OverFill: Two-Stage Models for Efficient Language Model Decoding Authors: Woojeong Kim, Junxiong Wang, Jing Nathan Yan, Mohamed Abdelfattah, Alexander M. Rush

1. Understanding Transformers through the Lens of Pavlovian Conditioning

ArXiv ID: 2508.08289

Authors: Mu Qiao

Abstract: Transformer architectures have revolutionized artificial intelligence (AI) through their attention mechanisms, yet the computational principles underlying their success remain opaque. We present a novel theoretical framework that reinterprets the core computation of attention as Pavlovian conditioning. Our model finds a direct mathematical analogue in linear attention, which simplifies the analysis of the underlying associative process. We demonstrate that attention's queries, keys, and values can be mapped to the three elements of classical conditioning: test stimuli that probe associations, conditional stimuli (CS) that serve as retrieval cues, and unconditional stimuli (US) that contain response information. Through this lens, we suggest that each attention operation constructs a transient associative memory via a Hebbian rule, where CS-US pairs form dynamic associations that test stimuli can later retrieve. Our framework yields several theoretical insights grounded in this linearized model: (1) a capacity theorem showing that attention heads can store O($\sqrt{d_k}$) associations before interference degrades retrieval; (2) an error propagation analysis revealing fundamental architectural trade-offs of balancing model depth, width, and head redundancy to maintain reliability; and (3) an understanding of how biologically plausible learning rules could enhance transformer architectures. By establishing this deep connection, we suggest that the success of modern AI may stem not from architectural novelty alone, but from implementing computational principles that biology optimized over millions of years of evolution.

Comment: The paper provides a novel theoretical framework for understanding transformers, which is relevant to model architecture.

Relevance: 10 Novelty: 9

2. Topos Causal Models

ArXiv ID: 2508.08295

Authors: Sridhar Mahadevan

Abstract: We propose topos causal models (TCMs), a novel class of causal models that exploit the key properties of a topos category: they are (co)complete, meaning all (co)limits exist, they admit a subobject classifier, and allow exponential objects. The main goal of this paper is to show that these properties are central to many applications in causal inference. For example, subobject classifiers allow a categorical formulation of causal intervention, which creates sub-models. Limits and colimits allow causal diagrams of arbitrary complexity to be solved", using a novel interpretation of causal approximation. Exponential objects enable reasoning about equivalence classes of operations on causal models, such as covered edge reversal and causal homotopy. Analogous to structural causal models (SCMs), TCMs are defined by a collection of functions, each defining alocal autonomous" causal mechanism that assemble to induce a unique global function from exogenous to endogenous variables. Since the category of TCMs is (co)complete, which we prove in this paper, every causal diagram has a solution" in the form of a (co)limit: this implies that any arbitrary causal model can beapproximated" by some global function with respect to the morphisms going into or out of the diagram. Natural transformations are crucial in measuring the quality of approximation. In addition, we show that causal interventions are modeled by subobject classifiers: any sub-model is defined by a monic arrow into its parent model. Exponential objects permit reasoning about entire classes of causal equivalences and interventions. Finally, as TCMs form a topos, they admit an internal logic defined as a Mitchell-Benabou language with an associated Kripke-Joyal semantics. We show how to reason about causal models in TCMs using this internal logic.

Comment: The introduction of topos causal models represents a novel theoretical framework for causal inference, aligning with emerging trends in foundational research.

Relevance: 9 Novelty: 9

3. Entangled in Representations: Mechanistic Investigation of Cultural Biases in Large Language Models

ArXiv ID: 2508.08879

Authors: Haeun Yu, Seogyeong Jeong, Siddhesh Pawar, Jisu Shin, Jiho Jin, Junho Myung, Alice Oh, Isabelle Augenstein

Abstract: The growing deployment of large language models (LLMs) across diverse cultural contexts necessitates a better understanding of how the overgeneralization of less documented cultures within LLMs' representations impacts their cultural understanding. Prior work only performs extrinsic evaluation of LLMs' cultural competence, without accounting for how LLMs' internal mechanisms lead to cultural (mis)representation. To bridge this gap, we propose Culturescope, the first mechanistic interpretability-based method that probes the internal representations of LLMs to elicit the underlying cultural knowledge space. CultureScope utilizes a patching method to extract the cultural knowledge. We introduce a cultural flattening score as a measure of the intrinsic cultural biases. Additionally, we study how LLMs internalize Western-dominance bias and cultural flattening, which allows us to trace how cultural biases emerge within LLMs. Our experimental results reveal that LLMs encode Western-dominance bias and cultural flattening in their cultural knowledge space. We find that low-resource cultures are less susceptible to cultural biases, likely due to their limited training resources. Our work provides a foundation for future research on mitigating cultural biases and enhancing LLMs' cultural understanding. Our codes and data used for experiments are publicly available.

Comment: The paper investigates cultural biases in LLMs using a mechanistic interpretability-based method, providing insights into LLM behavior and interpretability.

Relevance: 9 Novelty: 8

4. $\text{M}^{2}$LLM: Multi-view Molecular Representation Learning with Large Language Models

ArXiv ID: 2508.08657

Authors: Jiaxin Ju, Yizhen Zheng, Huan Yee Koh, Can Wang, Shirui Pan

Abstract: Accurate molecular property prediction is a critical challenge with wide-ranging applications in chemistry, materials science, and drug discovery. Molecular representation methods, including fingerprints and graph neural networks (GNNs), achieve state-of-the-art results by effectively deriving features from molecular structures. However, these methods often overlook decades of accumulated semantic and contextual knowledge. Recent advancements in large language models (LLMs) demonstrate remarkable reasoning abilities and prior knowledge across scientific domains, leading us to hypothesize that LLMs can generate rich molecular representations when guided to reason in multiple perspectives. To address these gaps, we propose $\text{M}^{2}$LLM, a multi-view framework that integrates three perspectives: the molecular structure view, the molecular task view, and the molecular rules view. These views are fused dynamically to adapt to task requirements, and experiments demonstrate that $\text{M}^{2}$LLM achieves state-of-the-art performance on multiple benchmarks across classification and regression tasks. Moreover, we demonstrate that representation derived from LLM achieves exceptional performance by leveraging two core functionalities: the generation of molecular embeddings through their encoding capabilities and the curation of molecular features through advanced reasoning processes.

Comment: The paper proposes a multi-view framework for molecular representation learning using LLMs, which is relevant to foundational research in representation learning and AI for Science.

Relevance: 9 Novelty: 8

5. Dynamic Rank Adjustment for Accurate and Efficient Neural Network Training

ArXiv ID: 2508.08625

Authors: Hyuntak Shin, Aecheon Jung, Sunwoo Lee, Sungeun Hong

Abstract: Low-rank training methods reduce the number of trainable parameters by re-parameterizing the weights with matrix decompositions (e.g., singular value decomposition). However, enforcing a fixed low-rank structure caps the rank of the weight matrices and can hinder the model's ability to learn complex patterns. Furthermore, the effective rank of the model's weights tends to decline during training, and this drop is accelerated when the model is reparameterized into a low-rank structure. In this study, we argue that strategically interleaving full-rank training epochs within low-rank training epochs can effectively restore the rank of the model's weights. Based on our findings, we propose a general dynamic-rank training framework that is readily applicable to a wide range of neural-network tasks. We first describe how to adjust the rank of weight matrix to alleviate the inevitable rank collapse that arises during training, and then present extensive empirical results that validate our claims and demonstrate the efficacy of the proposed framework. Our empirical study shows that the proposed method achieves almost the same computational cost as SVD-based low-rank training while achieving a comparable accuracy to full-rank training across various benchmarks.

Comment: The paper proposes a dynamic-rank training framework, relevant to model compression and efficiency.

Relevance: 9 Novelty: 8

6. Topos Theory for Generative AI and LLMs

ArXiv ID: 2508.08293

Authors: Sridhar Mahadevan

Abstract: We propose the design of novel categorical generative AI architectures (GAIAs) using topos theory, a type of category that is set-like": a topos has all (co)limits, is Cartesian closed, and has a subobject classifier. Previous theoretical results on the Transformer model have shown that it is a universal sequence-to-sequence function approximator, and dense in the space of all continuous functions with compact support on the Euclidean space of embeddings of tokens. Building on this theoretical result, we explore novel architectures for LLMs that exploit the property that the category of LLMs, viewed as functions, forms a topos. Previous studies of large language models (LLMs) have focused on daisy-chained linear architectures or mixture-of-experts. In this paper, we use universal constructions in category theory to construct novel LLM architectures based on new types of compositional structures. In particular, these new compositional structures are derived from universal properties of LLM categories, and include pullback, pushout, (co) equalizers, exponential objects, and subobject classifiers. We theoretically validate these new compositional structures by showing that the category of LLMs is (co)complete, meaning that all diagrams have solutions in the form of (co)limits. Building on this completeness result, we then show that the category of LLMs forms a topos, aset-like" category, which requires showing the existence of exponential objects as well as subobject classifiers. We use a functorial characterization of backpropagation to define a potential implementation of an LLM topos architecture.

Comment: The paper introduces novel LLM architectures using topos theory, which is a significant theoretical insight into LLM behavior and architecture.

Relevance: 9 Novelty: 8

7. Towards Scalable Lottery Ticket Networks using Genetic Algorithms

ArXiv ID: 2508.08877

Authors: Julian Sch\"onberger, Maximilian Zorn, Jonas N\"u{\ss}lein, Thomas Gabor, Philipp Altmann

Abstract: Building modern deep learning systems that are not just effective but also efficient requires rethinking established paradigms for model training and neural architecture design. Instead of adapting highly overparameterized networks and subsequently applying model compression techniques to reduce resource consumption, a new class of high-performing networks skips the need for expensive parameter updates, while requiring only a fraction of parameters, making them highly scalable. The Strong Lottery Ticket Hypothesis posits that within randomly initialized, sufficiently overparameterized neural networks, there exist subnetworks that can match the accuracy of the trained original model-without any training. This work explores the usage of genetic algorithms for identifying these strong lottery ticket subnetworks. We find that for instances of binary and multi-class classification tasks, our approach achieves better accuracies and sparsity levels than the current state-of-the-art without requiring any gradient information. In addition, we provide justification for the need for appropriate evaluation metrics when scaling to more complex network architectures and learning tasks.

Comment: The paper explores using genetic algorithms to identify strong lottery ticket subnetworks, which is relevant to model compression and efficiency.

Relevance: 9 Novelty: 8

ArXiv ID: 2508.08438

Authors: Kexin Chu, Zecheng Lin, Dawei Xiang, Zixu Shen, Jianchang Su, Cheng Chu, Yiwei Yang, Wenhui Zhang, Wenfei Wu, Wei Zhang

Abstract: Global KV-cache sharing has emerged as a key optimization for accelerating large language model (LLM) inference. However, it exposes a new class of timing side-channel attacks, enabling adversaries to infer sensitive user inputs via shared cache entries. Existing defenses, such as per-user isolation, eliminate leakage but degrade performance by up to 38.9% in time-to-first-token (TTFT), making them impractical for high-throughput deployment. To address this gap, we introduce SafeKV (Secure and Flexible KV Cache Sharing), a privacy-aware KV-cache management framework that selectively shares non-sensitive entries while confining sensitive content to private caches. SafeKV comprises three components: (i) a hybrid, multi-tier detection pipeline that integrates rule-based pattern matching, a general-purpose privacy detector, and context-aware validation; (ii) a unified radix-tree index that manages public and private entries across heterogeneous memory tiers (HBM, DRAM, SSD); and (iii) entropy-based access monitoring to detect and mitigate residual information leakage. Our evaluation shows that SafeKV mitigates 94% - 97% of timing-based side-channel attacks. Compared to per-user isolation method, SafeKV improves TTFT by up to 40.58% and throughput by up to 2.66X across diverse LLMs and workloads. SafeKV reduces cache-induced TTFT overhead from 50.41% to 11.74% on Qwen3-235B. By combining fine-grained privacy control with high cache reuse efficiency, SafeKV reclaims the performance advantages of global sharing while providing robust runtime privacy guarantees for LLM inference.

Comment: The paper discusses KV-cache sharing, which is relevant to model compression and efficiency in LLMs.

Relevance: 9 Novelty: 7

9. Fast weight programming and linear transformers: from machine learning to neurobiology

ArXiv ID: 2508.08435

Authors: Kazuki Irie, Samuel J. Gershman

Abstract: Recent advances in artificial neural networks for machine learning, and language modeling in particular, have established a family of recurrent neural network (RNN) architectures that, unlike conventional RNNs with vector-form hidden states, use two-dimensional (2D) matrix-form hidden states. Such 2D-state RNNs, known as Fast Weight Programmers (FWPs), can be interpreted as a neural network whose synaptic weights (called fast weights) dynamically change over time as a function of input observations, and serve as short-term memory storage; corresponding synaptic weight modifications are controlled or programmed by another network (the programmer) whose parameters are trained (e.g., by gradient descent). In this Primer, we review the technical foundations of FWPs, their computational characteristics, and their connections to transformers and state space models. We also discuss connections between FWPs and models of synaptic plasticity in the brain, suggesting a convergence of natural and artificial intelligence.

Comment: The paper reviews Fast Weight Programmers and their connections to transformers, relevant to model architecture innovations.

Relevance: 9 Novelty: 7

ArXiv ID: 2508.08276

Authors: Yassine Jamaa, Badr AlKhamissi, Satrajit Ghosh, Martin Schrimpf

Abstract: This work adapts a neuroscientific contrast localizer to pinpoint causally relevant units for Theory of Mind (ToM) and mathematical reasoning tasks in large language models (LLMs) and vision-language models (VLMs). Across 11 LLMs and 5 VLMs ranging in size from 3B to 90B parameters, we localize top-activated units using contrastive stimulus sets and assess their causal role via targeted ablations. We compare the effect of lesioning functionally selected units against low-activation and randomly selected units on downstream accuracy across established ToM and mathematical benchmarks. Contrary to expectations, low-activation units sometimes produced larger performance drops than the highly activated ones, and units derived from the mathematical localizer often impaired ToM performance more than those from the ToM localizer. These findings call into question the causal relevance of contrast-based localizers and highlight the need for broader stimulus sets and more accurately capture task-specific units.

Comment: The paper explores the causal relevance of units in LLMs using a neuroscientific approach, which aligns with the interest in understanding LLM behavior and interpretability.

Relevance: 9 Novelty: 7

11. Intrinsic Memory Agents: Heterogeneous Multi-Agent LLM Systems through Structured Contextual Memory

ArXiv ID: 2508.08997

Authors: Sizhe Yuen, Francisco Gomez Medina, Ting Su, Yali Du, Adam J. Sobey

Abstract: Multi-agent systems built on Large Language Models (LLMs) show exceptional promise for complex collaborative problem-solving, yet they face fundamental challenges stemming from context window limitations that impair memory consistency, role adherence, and procedural integrity. This paper introduces Intrinsic Memory Agents, a novel framework that addresses these limitations through structured agent-specific memories that evolve intrinsically with agent outputs. Specifically, our method maintains role-aligned memory templates that preserve specialized perspectives while focusing on task-relevant information. We benchmark our approach on the PDDL dataset, comparing its performance to existing state-of-the-art multi-agentic memory approaches and showing an improvement of 38.6\% with the highest token efficiency. An additional evaluation is performed on a complex data pipeline design task, we demonstrate that our approach produces higher quality designs when comparing 5 metrics: scalability, reliability, usability, cost-effectiveness and documentation with additional qualitative evidence of the improvements. Our findings suggest that addressing memory limitations through structured, intrinsic approaches can improve the capabilities of multi-agent LLM systems on structured planning tasks.

Comment: The paper proposes a novel framework for addressing memory limitations in multi-agent LLM systems, which aligns with foundational research in LLM architecture.

Relevance: 8 Novelty: 8

12. Language Models Can Understand Spectra: A Multimodal Model for Molecular Structure Elucidation

ArXiv ID: 2508.08441

Authors: Yunyue Su, Jiahui Chen, Zao Jiang, Zhenyi Zhong, Liang Wang, Qiang Liu

Abstract: Structure elucidation is a fundamental technique for understanding the microscopic composition of matter and is widely applied across various disciplines in the natural sciences and engineering. However, existing methods often rely heavily on prior databases or known structural information, making it difficult to resolve unknown structures. In addition, complex structures typically require the joint analysis of multiple spectroscopic modalities. This process heavily depends on expert domain knowledge and is often accompanied by high costs in terms of both time and instrumentation. To address these challenges, we propose SpectraLLM, the first large language model designed to support multi-modal spectroscopic joint reasoning. SpectraLLM is capable of processing either single or multiple spectroscopic inputs and performing end-to-end structure elucidation. By integrating continuous and discrete spectroscopic modalities into a shared semantic space, SpectraLLM learns to uncover substructural patterns that are consistent and complementary across spectra, enabling precise molecular structure elucidation. We pretrain and fine-tune SpectraLLM in the domain of small molecules, and evaluate it on six standardized, publicly available chemical datasets. The model achieves state-of-the-art performance, significantly outperforming existing approaches trained on single modalities. Notably, SpectraLLM demonstrates strong robustness and generalization even for single-spectrum inference, while its multi-modal reasoning capability further improves the accuracy of structural prediction.

Comment: The paper introduces a multimodal model for molecular structure elucidation, which is relevant to AI for Science with a focus on foundational research in molecular modeling.

Relevance: 8 Novelty: 8

13. Interpretable Reward Model via Sparse Autoencoder

ArXiv ID: 2508.08746

Authors: Shuyi Zhang, Wei Shi, Sihang Li, Jiayi Liao, Tao Liang, Hengxing Cai, Xiang Wang

Abstract: Large language models (LLMs) have been widely deployed across numerous fields. Reinforcement Learning from Human Feedback (RLHF) leverages reward models (RMs) as proxies for human preferences to align LLM behaviors with human values, making the accuracy, reliability, and interpretability of RMs critical for effective alignment. However, traditional RMs lack interpretability, offer limited insight into the reasoning behind reward assignments, and are inflexible toward user preference shifts. While recent multidimensional RMs aim for improved interpretability, they often fail to provide feature-level attribution and require costly annotations. To overcome these limitations, we introduce the Sparse Autoencoder-enhanced Reward Model (\textbf{SARM}), a novel architecture that integrates a pretrained Sparse Autoencoder (SAE) into a reward model. SARM maps the hidden activations of LLM-based RM into an interpretable, sparse, and monosemantic feature space, from which a scalar head aggregates feature activations to produce transparent and conceptually meaningful reward scores. Empirical evaluations demonstrate that SARM facilitates direct feature-level attribution of reward assignments, allows dynamic adjustment to preference shifts, and achieves superior alignment performance compared to conventional reward models. Our code is available at https://github.com/schrieffer-z/sarm.

Comment: The paper introduces a Sparse Autoencoder-enhanced Reward Model, which is relevant to representation learning and model architecture due to its novel integration of sparse autoencoders for interpretability.

Relevance: 8 Novelty: 8

14. Elucidating Rectified Flow with Deterministic Sampler: Polynomial Discretization Complexity for Multi and One-step Models

ArXiv ID: 2508.08735

Authors: Ruofeng Yang, Zhaoyu Zhu, Bo Jiang, Cheng Chen, Shuai Li

Abstract: Recently, rectified flow (RF)-based models have achieved state-of-the-art performance in many areas for both the multi-step and one-step generation. However, only a few theoretical works analyze the discretization complexity of RF-based models. Existing works either focus on flow-based models with stochastic samplers or establish complexity results that exhibit exponential dependence on problem parameters. In this work, under the realistic bounded support assumption, we prove the first polynomial discretization complexity for multi-step and one-step RF-based models with a deterministic sampler simultaneously. For the multi-step setting, inspired by the predictor-corrector framework of diffusion models, we introduce a Langevin process as a corrector and show that RF-based models can achieve better polynomial discretization complexity than diffusion models. To achieve this result, we conduct a detailed analysis of the RF-based model and explain why it is better than previous popular models, such as variance preserving (VP) and variance exploding (VE)-based models. Based on the observation of multi-step RF-based models, we further provide the first polynomial discretization complexity result for one-step RF-based models, improving upon prior results for one-step diffusion-based models. These findings mark the first step toward theoretically understanding the impressive empirical performance of RF-based models in both multi-step and one-step generation.

Comment: The paper provides theoretical insights into rectified flow models, which is relevant to emerging trends in model architecture and theoretical understanding.

Relevance: 8 Novelty: 8

15. Towards Universal Neural Inference

ArXiv ID: 2508.09100

Authors: Shreyas Bhat Brahmavar, Yang Li, Junier Oliva

Abstract: Real-world data often appears in diverse, disjoint forms -- with varying schemas, inconsistent semantics, and no fixed feature ordering -- making it challenging to build general-purpose models that can leverage information across datasets. We introduce ASPIRE, Arbitrary Set-based Permutation-Invariant Reasoning Engine, a Universal Neural Inference model for semantic reasoning and prediction over heterogeneous structured data. ASPIRE combines a permutation-invariant, set-based Transformer with a semantic grounding module that incorporates natural language descriptions, dataset metadata, and in-context examples to learn cross-dataset feature dependencies. This architecture allows ASPIRE to ingest arbitrary sets of feature--value pairs and support examples, align semantics across disjoint tables, and make predictions for any specified target. Once trained, ASPIRE generalizes to new inference tasks without additional tuning. In addition to delivering strong results across diverse benchmarks, ASPIRE naturally supports cost-aware active feature acquisition in an open-world setting, selecting informative features under test-time budget constraints for an arbitrary unseen dataset. These capabilities position ASPIRE as a step toward truly universal, semantics-aware inference over structured data.

Comment: The paper introduces a universal neural inference model, which is relevant to representation learning and emerging trends in model architecture.

Relevance: 8 Novelty: 8

16. Deep Neural Network Calibration by Reducing Classifier Shift with Stochastic Masking

ArXiv ID: 2508.09116

Authors: Jiani Ni, He Zhao, Yibo Yang, Dandan Guo

Abstract: In recent years, deep neural networks (DNNs) have shown competitive results in many fields. Despite this success, they often suffer from poor calibration, especially in safety-critical scenarios such as autonomous driving and healthcare, where unreliable confidence estimates can lead to serious consequences. Recent studies have focused on improving calibration by modifying the classifier, yet such efforts remain limited. Moreover, most existing approaches overlook calibration errors caused by underconfidence, which can be equally detrimental. To address these challenges, we propose MaC-Cal, a novel mask-based classifier calibration method that leverages stochastic sparsity to enhance the alignment between confidence and accuracy. MaC-Cal adopts a two-stage training scheme with adaptive sparsity, dynamically adjusting mask retention rates based on the deviation between confidence and accuracy. Extensive experiments show that MaC-Cal achieves superior calibration performance and robustness under data corruption, offering a practical and effective solution for reliable confidence estimation in DNNs.

Comment: The paper introduces a novel mask-based classifier calibration method leveraging stochastic sparsity, which aligns with representation learning through insights into training dynamics and sparsity.