Previous Day 2025-07-15
Monthly Overview 2025-07
Next Day 2025-07-17

Personalized Daily ArXiv Papers 2025-07-16

[gpt-4o] Prompt Completion Total
Token 40215 4685 44900
Cost $0.1 $0.05 $0.15

Total arXiv papers: 526

Total scanned papers: 304

Total relevant papers: 18

Table of contents with paper titles:

  1. Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety Authors: Tomek Korbak, Mikita Balesni, Elizabeth Barnes, Yoshua Bengio, Joe Benton, Joseph Bloom, Mark Chen, Alan Cooney, Allan Dafoe, Anca Dragan, Scott Emmons, Owain Evans, David Farhi, Ryan Greenblatt, Dan Hendrycks, Marius Hobbhahn, Evan Hubinger, Geoffrey Irving, Erik Jenner, Daniel Kokotajlo, Victoria Krakovna, Shane Legg, David Lindner, David Luan, Aleksander M\k{a}dry, Julian Michael, Neel Nanda, Dave Orr, Jakub Pachocki, Ethan Perez, Mary Phuong, Fabien Roger, Joshua Saxe, Buck Shlegeris, Mart\'in Soto, Eric Steinberger, Jasmine Wang, Wojciech Zaremba, Bowen Baker, Rohin Shah, Vlad Mikulik

  2. Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning Authors: Zheng Zhang

  3. AI Mother Tongue: Self-Emergent Communication in MARL via Endogenous Symbol Systems Authors: Hung Ming Liu

  4. Functional Neural Wavefunction Optimization Authors: Victor Armegioiu, Juan Carrasquilla, Siddhartha Mishra, Johannes M\"uller, Jannes Nys, Marius Zeinhofer, Hang Zhang

  5. KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? Authors: Soumadeep Saha, Akshay Chaturvedi, Saptarshi Saha, Utpal Garain, Nicholas Asher

  6. Neurosymbolic Reasoning Shortcuts under the Independence Assumption Authors: Emile van Krieken, Pasquale Minervini, Edoardo Ponti, Antonio Vergari

  7. SystolicAttention: Fusing FlashAttention within a Single Systolic Array Authors: Jiawei Lin, Guokai Chen, Yuanlong Li, Thomas Bourgeat

  8. First-Order Error Matters: Accurate Compensation for Quantized Large Language Models Authors: Xingyu Zheng, Haotong Qin, Yuye Li, Jiakai Wang, Jinyang Guo, Michele Magno, Xianglong Liu

  9. Defining neurosymbolic AI Authors: Lennert De Smet, Luc De Raedt

  10. Multi-Trigger Poisoning Amplifies Backdoor Vulnerabilities in LLMs Authors: Sanhanat Sivapiromrat, Caiqi Zhang, Marco Basaldella, Nigel Collier

  11. How does Labeling Error Impact Contrastive Learning? A Perspective from Data Dimensionality Reduction Authors: Jun Chen, Hong Chen, Yonghua Yu, Yiming Ying

  12. BioScore: A Foundational Scoring Function For Diverse Biomolecular Complexes Authors: Yuchen Zhu, Jihong Chen, Yitong Li, Xiaomin Fang, Xianbin Ye, Jingzhou He, Xujun Zhang, Jingxuan Ge, Chao Shen, Xiaonan Zhang, Tingjun Hou, Chang-Yu Hsieh

  13. Emergence of Hierarchical Emotion Organization in Large Language Models Authors: Bo Zhao, Maya Okawa, Eric J. Bigelow, Rose Yu, Tomer Ullman, Ekdeep Singh Lubana, Hidenori Tanaka

  14. Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime Authors: Amit Attia, Matan Schliserman, Uri Sherman, Tomer Koren

  15. MMOne: Representing Multiple Modalities in One Scene Authors: Zhifeng Gu, Bing Wang

  16. A Group Theoretic Analysis of the Symmetries Underlying Base Addition and Their Learnability by Neural Networks Authors: Cutter Dawes, Simon Segert, Kamesh Krishnamurthy, Jonathan D. Cohen

  17. Langevin Flows for Modeling Neural Latent Dynamics Authors: Yue Song, T. Anderson Keller, Yisong Yue, Pietro Perona, Max Welling

  18. Elk: Exploring the Efficiency of Inter-core Connected AI Chips with Deep Learning Compiler Techniques Authors: Yiqi Liu, Yuqi Xue, Noelle Crawford, Jilong Xue, Jian Huang


1. Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

ArXiv ID: 2507.11473

Authors: Tomek Korbak, Mikita Balesni, Elizabeth Barnes, Yoshua Bengio, Joe Benton, Joseph Bloom, Mark Chen, Alan Cooney, Allan Dafoe, Anca Dragan, Scott Emmons, Owain Evans, David Farhi, Ryan Greenblatt, Dan Hendrycks, Marius Hobbhahn, Evan Hubinger, Geoffrey Irving, Erik Jenner, Daniel Kokotajlo, Victoria Krakovna, Shane Legg, David Lindner, David Luan, Aleksander M\k{a}dry, Julian Michael, Neel Nanda, Dave Orr, Jakub Pachocki, Ethan Perez, Mary Phuong, Fabien Roger, Joshua Saxe, Buck Shlegeris, Mart\'in Soto, Eric Steinberger, Jasmine Wang, Wojciech Zaremba, Bowen Baker, Rohin Shah, Vlad Mikulik

Abstract: AI systems that "think" in human language offer a unique opportunity for AI safety: we can monitor their chains of thought (CoT) for the intent to misbehave. Like all other known AI oversight methods, CoT monitoring is imperfect and allows some misbehavior to go unnoticed. Nevertheless, it shows promise and we recommend further research into CoT monitorability and investment in CoT monitoring alongside existing safety methods. Because CoT monitorability may be fragile, we recommend that frontier model developers consider the impact of development decisions on CoT monitorability.

Comment: Author match


2. Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning

ArXiv ID: 2507.10624

Authors: Zheng Zhang

Abstract: Large Language Models (LLMs) display striking surface fluency yet systematically fail at tasks requiring symbolic reasoning, arithmetic accuracy, and logical consistency. This paper offers a structural diagnosis of such failures, revealing a persistent gap between \textit{comprehension} and \textit{competence}. Through controlled experiments and architectural analysis, we demonstrate that LLMs often articulate correct principles without reliably applying them--a failure rooted not in knowledge access, but in computational execution. We term this phenomenon the computational \textit{split-brain syndrome}, where instruction and action pathways are geometrically and functionally dissociated. This core limitation recurs across domains, from mathematical operations to relational inferences, and explains why model behavior remains brittle even under idealized prompting. We argue that LLMs function as powerful pattern completion engines, but lack the architectural scaffolding for principled, compositional reasoning. Our findings delineate the boundary of current LLM capabilities and motivate future models with metacognitive control, principle lifting, and structurally grounded execution. This diagnosis also clarifies why mechanistic interpretability findings may reflect training-specific pattern coordination rather than universal computational principles, and why the geometric separation between instruction and execution pathways suggests limitations in neural introspection and mechanistic analysis.

Comment: The paper provides a structural diagnosis of LLMs' limitations in symbolic reasoning, offering theoretical insights into their architectural limits. This aligns with the criteria for foundational research in LLMs.

Relevance: 10 Novelty: 9


3. AI Mother Tongue: Self-Emergent Communication in MARL via Endogenous Symbol Systems

ArXiv ID: 2507.10566

Authors: Hung Ming Liu

Abstract: In Decentralized Multi-Agent Reinforcement Learning (MARL), the development of Emergent Communication has long been constrained by the Joint Exploration Dilemma'', leading agents to fall into aCommunication Vacuum Equilibrium'' . Traditional methods address this by introducing inductive biases to facilitate communication emergence . This study fundamentally questions whether such artificial inductive biases are, in fact, over-engineering. Through experiments with the AI Mother Tongue'' (AIM) framework, based on a Vector Quantized Variational Autoencoder (VQ-VAE), we demonstrate that when agents possess an endogenous symbol system, their neural representations naturally exhibit spontaneous semantic compression and Nash equilibrium-driven semantic convergence, achieving effective symbolic communication without external inductive biases. This aligns with recent neuroscience findings suggesting that the human brain does not directly use human language for internal thought , and resonates with research onsoft thinking'' capabilities in Large Language Models (LLMs) . Compared to traditional explicit communication methods, AIM demonstrates stronger generality and efficiency. The interpretable analysis toolkit developed in this study confirms that symbol usage exhibits a significant power-law distribution, leading to three major theoretical insights: the Neural Communication Hypothesis'', theTool-First Principle'', and the Semantic Interpretability Paradigm''. Future research will explore the integration of Hierarchical Quantized Variational Autoencoders (HQ-VAE) to enhance AIM's complex expressive capabilities and investigate the potential forReinforcement Learning (RL) Low-Level Pre-training''. This discovery offers new avenues for bridging symbolism and connectionism.

Comment: The paper introduces a novel framework for emergent communication in multi-agent systems, which challenges existing assumptions and introduces new paradigms, aligning with emerging trends.

Relevance: 9 Novelty: 9


4. Functional Neural Wavefunction Optimization

ArXiv ID: 2507.10835

Authors: Victor Armegioiu, Juan Carrasquilla, Siddhartha Mishra, Johannes M\"uller, Jannes Nys, Marius Zeinhofer, Hang Zhang

Abstract: We propose a framework for the design and analysis of optimization algorithms in variational quantum Monte Carlo, drawing on geometric insights into the corresponding function space. The framework translates infinite-dimensional optimization dynamics into tractable parameter-space algorithms through a Galerkin projection onto the tangent space of the variational ansatz. This perspective unifies existing methods such as stochastic reconfiguration and Rayleigh-Gauss-Newton, provides connections to classic function-space algorithms, and motivates the derivation of novel algorithms with geometrically principled hyperparameter choices. We validate our framework with numerical experiments demonstrating its practical relevance through the accurate estimation of ground-state energies for several prototypical models in condensed matter physics modeled with neural network wavefunctions.

Comment: The paper introduces a novel framework for optimization in variational quantum Monte Carlo, which involves neural network wavefunctions. This aligns with foundational research in AI for Science, focusing on theoretical insights rather than applications.

Relevance: 9 Novelty: 8


5. KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning?

ArXiv ID: 2507.11408

Authors: Soumadeep Saha, Akshay Chaturvedi, Saptarshi Saha, Utpal Garain, Nicholas Asher

Abstract: Chain-of-thought traces have been shown to improve performance of large language models in a plethora of reasoning tasks, yet there is no consensus on the mechanism through which this performance boost is achieved. To shed more light on this, we introduce Causal CoT Graphs (CCGs), which are directed acyclic graphs automatically extracted from reasoning traces that model fine-grained causal dependencies in the language model output. A collection of $1671$ mathematical reasoning problems from MATH500, GSM8K and AIME, and their associated CCGs are compiled into our dataset -- \textbf{KisMATH}. Our detailed empirical analysis with 15 open-weight LLMs shows that (i) reasoning nodes in the CCG are mediators for the final answer, a condition necessary for reasoning; and (ii) LLMs emphasise reasoning paths given by the CCG, indicating that models internally realise structures akin to our graphs. KisMATH enables controlled, graph-aligned interventions and opens up avenues for further investigation into the role of chain-of-thought in LLM reasoning.

Comment: The paper investigates the role of chain-of-thought in LLM reasoning, providing insights into the implicit structures in mathematical reasoning. This aligns with foundational research in LLMs.

Relevance: 9 Novelty: 8


6. Neurosymbolic Reasoning Shortcuts under the Independence Assumption

ArXiv ID: 2507.11357

Authors: Emile van Krieken, Pasquale Minervini, Edoardo Ponti, Antonio Vergari

Abstract: The ubiquitous independence assumption among symbolic concepts in neurosymbolic (NeSy) predictors is a convenient simplification: NeSy predictors use it to speed up probabilistic reasoning. Recent works like van Krieken et al. (2024) and Marconato et al. (2024) argued that the independence assumption can hinder learning of NeSy predictors and, more crucially, prevent them from correctly modelling uncertainty. There is, however, scepticism in the NeSy community around the scenarios in which the independence assumption actually limits NeSy systems (Faronius and Dos Martires, 2025). In this work, we settle this question by formally showing that assuming independence among symbolic concepts entails that a model can never represent uncertainty over certain concept combinations. Thus, the model fails to be aware of reasoning shortcuts, i.e., the pathological behaviour of NeSy predictors that predict correct downstream tasks but for the wrong reasons.

Comment: The paper discusses the limitations of the independence assumption in neurosymbolic predictors, providing theoretical insights into model behavior. This aligns with emerging trends in foundational research.

Relevance: 9 Novelty: 8


7. SystolicAttention: Fusing FlashAttention within a Single Systolic Array

ArXiv ID: 2507.11331

Authors: Jiawei Lin, Guokai Chen, Yuanlong Li, Thomas Bourgeat

Abstract: Transformer models rely heavily on scaled dot-product attention (SDPA), typically implemented using the FlashAttention algorithm. However, current systolic-array-based accelerators face significant challenges when executing FlashAttention. Systolic arrays can only achieve high utilization for consecutive and large matrix multiplications. In contrast, FlashAttention requires frequently interleaved matrix multiplications and softmax operations. The frequent data swaps between the systolic array and external vector units result in low systolic array utilization. This is further exacerbated by the fact that softmax involves numerous non-matrix operations, which are not well-suited for systolic arrays. Moreover, the concurrent execution of matrix multiplication on systolic arrays and softmax on vector units leads to register file and SRAM port contention, further degrading performance. To overcome these limitations, we propose FSA, an enhanced systolic array architecture that enables the entire FlashAttention algorithm to run entirely within a single systolic array, eliminating the need for external vector units. At the core of FSA is SystolicAttention, a novel scheduling algorithm that maps FlashAttention operations onto systolic arrays with fine-grained, element-wise overlap. This significantly improves array utilization while preserving the original floating-point operation order to maintain numerical stability. We implement FSA in synthesizable RTL and evaluate its performance against state-of-the-art commercial accelerators. Our results show that FSA achieves 1.77x and 4.83x higher attention FLOPs/s utilization compared to AWS NeuronCore-v2 and Google TPUv5e, respectively, with only about 10% area overhead.

Comment: The paper proposes a novel systolic array architecture for executing FlashAttention, focusing on architectural innovations for efficiency. This aligns with the model architecture and compression criteria.

Relevance: 9 Novelty: 8


8. First-Order Error Matters: Accurate Compensation for Quantized Large Language Models

ArXiv ID: 2507.11017

Authors: Xingyu Zheng, Haotong Qin, Yuye Li, Jiakai Wang, Jinyang Guo, Michele Magno, Xianglong Liu

Abstract: Post-training quantization (PTQ) offers an efficient approach to compressing large language models (LLMs), significantly reducing memory access and computational costs. Existing compensation-based weight calibration methods often rely on a second-order Taylor expansion to model quantization error, under the assumption that the first-order term is negligible in well-trained full-precision models. However, we reveal that the progressive compensation process introduces accumulated first-order deviations between latent weights and their full-precision counterparts, making this assumption fundamentally flawed. To address this, we propose FOEM, a novel PTQ method that explicitly incorporates first-order gradient terms to improve quantization error compensation. FOEM approximates gradients by directly computing the difference between latent and full-precision weights, avoiding the high cost and limited generalization of backpropagation-based gradient computation. This approach introduces minimal additional computational overhead. Moreover, FOEM leverages precomputed Cholesky factors to efficiently recover the inverse of Hessian submatrices in real time. Extensive experiments across a wide range of models and benchmarks demonstrate that FOEM consistently outperforms the classical GPTQ method. In 3-bit weight-only quantization, FOEM reduces the perplexity of Llama3-8B by 89.6%, and improves the 5-shot MMLU accuracy of Llama3-70B from 51.7% to 74.9%, approaching the full-precision performance of 78.6%. Furthermore, FOEM can be seamlessly integrated with advanced techniques such as GPTAQ and SpinQuant, yielding additional improvements under the challenging W4A4KV4 setting, and further narrowing the accuracy gap with full-precision baselines beyond what current state-of-the-art methods achieve. The code is available at https://github.com/Xingyu-Zheng/FOEM.

Comment: The paper introduces FOEM, a novel post-training quantization method that incorporates first-order gradient terms, which is relevant to model compression and efficiency breakthroughs.

Relevance: 9 Novelty: 8


9. Defining neurosymbolic AI

ArXiv ID: 2507.11127

Authors: Lennert De Smet, Luc De Raedt

Abstract: Neurosymbolic AI focuses on integrating learning and reasoning, in particular, on unifying logical and neural representations. Despite the existence of an alphabet soup of neurosymbolic AI systems, the field is lacking a generally accepted formal definition of what neurosymbolic models and inference really are. We introduce a formal definition for neurosymbolic AI that makes abstraction of its key ingredients. More specifically, we define neurosymbolic inference as the computation of an integral over a product of a logical and a belief function. We show that our neurosymbolic AI definition makes abstraction of key representative neurosymbolic AI systems.

Comment: The paper provides a formal definition of neurosymbolic AI, which is foundational research in AI, potentially introducing new paradigms.

Relevance: 9 Novelty: 8


10. Multi-Trigger Poisoning Amplifies Backdoor Vulnerabilities in LLMs

ArXiv ID: 2507.11112

Authors: Sanhanat Sivapiromrat, Caiqi Zhang, Marco Basaldella, Nigel Collier

Abstract: Recent studies have shown that Large Language Models (LLMs) are vulnerable to data poisoning attacks, where malicious training examples embed hidden behaviours triggered by specific input patterns. However, most existing works assume a phrase and focus on the attack's effectiveness, offering limited understanding of trigger mechanisms and how multiple triggers interact within the model. In this paper, we present a framework for studying poisoning in LLMs. We show that multiple distinct backdoor triggers can coexist within a single model without interfering with each other, enabling adversaries to embed several triggers concurrently. Using multiple triggers with high embedding similarity, we demonstrate that poisoned triggers can achieve robust activation even when tokens are substituted or separated by long token spans. Our findings expose a broader and more persistent vulnerability surface in LLMs. To mitigate this threat, we propose a post hoc recovery method that selectively retrains specific model components based on a layer-wise weight difference analysis. Our method effectively removes the trigger behaviour with minimal parameter updates, presenting a practical and efficient defence against multi-trigger poisoning.

Comment: The paper explores vulnerabilities in LLMs through multi-trigger poisoning, which provides insights into LLM behavior and security, aligning with foundational research in LLMs.

Relevance: 9 Novelty: 8


11. How does Labeling Error Impact Contrastive Learning? A Perspective from Data Dimensionality Reduction

ArXiv ID: 2507.11161

Authors: Jun Chen, Hong Chen, Yonghua Yu, Yiming Ying

Abstract: In recent years, contrastive learning has achieved state-of-the-art performance in the territory of self-supervised representation learning. Many previous works have attempted to provide the theoretical understanding underlying the success of contrastive learning. Almost all of them rely on a default assumption, i.e., the label consistency assumption, which may not hold in practice (the probability of failure is called labeling error) due to the strength and randomness of common augmentation strategies, such as random resized crop (RRC). This paper investigates the theoretical impact of labeling error on the downstream classification performance of contrastive learning. We first reveal several significant negative impacts of labeling error on downstream classification risk. To mitigate these impacts, data dimensionality reduction method (e.g., singular value decomposition, SVD) is applied on original data to reduce false positive samples, and establish both theoretical and empirical evaluations. Moreover, it is also found that SVD acts as a double-edged sword, which may lead to the deterioration of downstream classification accuracy due to the reduced connectivity of the augmentation graph. Based on the above observations, we give the augmentation suggestion that we should use some moderate embedding dimension (such as $512, 1024$ in our experiments), data inflation, weak augmentation, and SVD to ensure large graph connectivity and small labeling error to improve model performance.

Comment: The paper investigates the impact of labeling error on contrastive learning, which is relevant to representation learning and provides theoretical insights.

Relevance: 9 Novelty: 7


12. BioScore: A Foundational Scoring Function For Diverse Biomolecular Complexes

ArXiv ID: 2507.10877

Authors: Yuchen Zhu, Jihong Chen, Yitong Li, Xiaomin Fang, Xianbin Ye, Jingzhou He, Xujun Zhang, Jingxuan Ge, Chao Shen, Xiaonan Zhang, Tingjun Hou, Chang-Yu Hsieh

Abstract: Structural assessment of biomolecular complexes is vital for translating molecular models into functional insights, shaping our understanding of biology and aiding drug discovery. However, current structure-based scoring functions often lack generalizability across diverse biomolecular systems. We present BioScore, a foundational scoring function that addresses key challenges -- data sparsity, cross-system representation, and task compatibility -- through a dual-scale geometric graph learning framework with tailored modules for structure assessment and affinity prediction. BioScore supports a wide range of tasks, including affinity prediction, conformation ranking, and structure-based virtual screening. Evaluated on 16 benchmarks spanning proteins, nucleic acids, small molecules, and carbohydrates, BioScore consistently outperforms or matches 70 traditional and deep learning methods. Our newly proposed PPI Benchmark further enables comprehensive evaluation of protein-protein complex scoring. BioScore demonstrates broad applicability: (1) pretraining on mixed-structure data boosts protein-protein affinity prediction by up to 40% and antigen-antibody binding correlation by over 90%; (2) cross-system generalizability enables zero- and few-shot prediction with up to 71% correlation gain; and (3) its unified representation captures chemically challenging systems such as cyclic peptides, improving affinity prediction by over 60%. BioScore establishes a robust and generalizable framework for structural assessment across complex biomolecular landscapes.

Comment: The paper introduces BioScore, a foundational scoring function for biomolecular complexes, which aligns with AI for Science and representation learning.

Relevance: 8 Novelty: 8


13. Emergence of Hierarchical Emotion Organization in Large Language Models

ArXiv ID: 2507.10599

Authors: Bo Zhao, Maya Okawa, Eric J. Bigelow, Rose Yu, Tomer Ullman, Ekdeep Singh Lubana, Hidenori Tanaka

Abstract: As large language models (LLMs) increasingly power conversational agents, understanding how they model users' emotional states is critical for ethical deployment. Inspired by emotion wheels -- a psychological framework that argues emotions organize hierarchically -- we analyze probabilistic dependencies between emotional states in model outputs. We find that LLMs naturally form hierarchical emotion trees that align with human psychological models, and larger models develop more complex hierarchies. We also uncover systematic biases in emotion recognition across socioeconomic personas, with compounding misclassifications for intersectional, underrepresented groups. Human studies reveal striking parallels, suggesting that LLMs internalize aspects of social perception. Beyond highlighting emergent emotional reasoning in LLMs, our results hint at the potential of using cognitively-grounded theories for developing better model evaluations.

Comment: The paper analyzes hierarchical emotion organization in LLMs, which provides insights into LLM behavior and interpretability, aligning with the core topic of large language models.

Relevance: 8 Novelty: 7


14. Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime

ArXiv ID: 2507.11274

Authors: Amit Attia, Matan Schliserman, Uri Sherman, Tomer Koren

Abstract: We study population convergence guarantees of stochastic gradient descent (SGD) for smooth convex objectives in the interpolation regime, where the noise at optimum is zero or near zero. The behavior of the last iterate of SGD in this setting -- particularly with large (constant) stepsizes -- has received growing attention in recent years due to implications for the training of over-parameterized models, as well as to analyzing forgetting in continual learning and to understanding the convergence of the randomized Kaczmarz method for solving linear systems. We establish that after $T$ steps of SGD on $\beta$-smooth convex loss functions with stepsize $\eta \leq 1/\beta$, the last iterate exhibits expected excess risk $\widetilde{O}(1/(\eta T^{1-\beta\eta/2}) + \eta T^{\beta\eta/2} \sigma_\star^2)$, where $\sigma_\star^2$ denotes the variance of the stochastic gradients at the optimum. In particular, for a well-tuned stepsize we obtain a near optimal $\widetilde{O}(1/T + \sigma_\star/\sqrt{T})$ rate for the last iterate, extending the results of Varre et al. (2021) beyond least squares regression; and when $\sigma_\star=0$ we obtain a rate of $O(1/\sqrt{T})$ with $\eta=1/\beta$, improving upon the best-known $O(T^{-1/4})$ rate recently established by Evron et al. (2025) in the special case of realizable linear regression.

Comment: The paper provides insights into the convergence of SGD in the smooth interpolation regime, which is relevant to training dynamics in neural networks.

Relevance: 8 Novelty: 7


15. MMOne: Representing Multiple Modalities in One Scene

ArXiv ID: 2507.11129

Authors: Zhifeng Gu, Bing Wang

Abstract: Humans perceive the world through multimodal cues to understand and interact with the environment. Learning a scene representation for multiple modalities enhances comprehension of the physical world. However, modality conflicts, arising from inherent distinctions among different modalities, present two critical challenges: property disparity and granularity disparity. To address these challenges, we propose a general framework, MMOne, to represent multiple modalities in one scene, which can be readily extended to additional modalities. Specifically, a modality modeling module with a novel modality indicator is proposed to capture the unique properties of each modality. Additionally, we design a multimodal decomposition mechanism to separate multi-modal Gaussians into single-modal Gaussians based on modality differences. We address the essential distinctions among modalities by disentangling multimodal information into shared and modality-specific components, resulting in a more compact and efficient multimodal scene representation. Extensive experiments demonstrate that our method consistently enhances the representation capability for each modality and is scalable to additional modalities. The code is available at https://github.com/Neal2020GitHub/MMOne.

Comment: The paper proposes a framework for multimodal scene representation, focusing on architectural innovations to handle modality conflicts. This aligns with the model architecture criterion.

Relevance: 8 Novelty: 7


16. A Group Theoretic Analysis of the Symmetries Underlying Base Addition and Their Learnability by Neural Networks

ArXiv ID: 2507.10678

Authors: Cutter Dawes, Simon Segert, Kamesh Krishnamurthy, Jonathan D. Cohen

Abstract: A major challenge in the use of neural networks both for modeling human cognitive function and for artificial intelligence is the design of systems with the capacity to efficiently learn functions that support radical generalization. At the roots of this is the capacity to discover and implement symmetry functions. In this paper, we investigate a paradigmatic example of radical generalization through the use of symmetry: base addition. We present a group theoretic analysis of base addition, a fundamental and defining characteristic of which is the carry function -- the transfer of the remainder, when a sum exceeds the base modulus, to the next significant place. Our analysis exposes a range of alternative carry functions for a given base, and we introduce quantitative measures to characterize these. We then exploit differences in carry functions to probe the inductive biases of neural networks in symmetry learning, by training neural networks to carry out base addition using different carries, and comparing efficacy and rate of learning as a function of their structure. We find that even simple neural networks can achieve radical generalization with the right input format and carry function, and that learning speed is closely correlated with carry function structure. We then discuss the relevance this has for cognitive science and machine learning.

Comment: The paper provides a group theoretic analysis of symmetries in base addition and explores neural networks' ability to learn these symmetries, which is relevant to representation learning and theoretical insights into neural network behavior.

Relevance: 8 Novelty: 7


17. Langevin Flows for Modeling Neural Latent Dynamics

ArXiv ID: 2507.11531

Authors: Yue Song, T. Anderson Keller, Yisong Yue, Pietro Perona, Max Welling

Abstract: Neural populations exhibit latent dynamical structures that drive time-evolving spiking activities, motivating the search for models that capture both intrinsic network dynamics and external unobserved influences. In this work, we introduce LangevinFlow, a sequential Variational Auto-Encoder where the time evolution of latent variables is governed by the underdamped Langevin equation. Our approach incorporates physical priors -- such as inertia, damping, a learned potential function, and stochastic forces -- to represent both autonomous and non-autonomous processes in neural systems. Crucially, the potential function is parameterized as a network of locally coupled oscillators, biasing the model toward oscillatory and flow-like behaviors observed in biological neural populations. Our model features a recurrent encoder, a one-layer Transformer decoder, and Langevin dynamics in the latent space. Empirically, our method outperforms state-of-the-art baselines on synthetic neural populations generated by a Lorenz attractor, closely matching ground-truth firing rates. On the Neural Latents Benchmark (NLB), the model achieves superior held-out neuron likelihoods (bits per spike) and forward prediction accuracy across four challenging datasets. It also matches or surpasses alternative methods in decoding behavioral metrics such as hand velocity. Overall, this work introduces a flexible, physics-inspired, high-performing framework for modeling complex neural population dynamics and their unobserved influences.

Comment: The paper introduces LangevinFlow, a sequential Variational Auto-Encoder with a novel approach to modeling neural latent dynamics, relevant to model architecture.

Relevance: 8 Novelty: 7


18. Elk: Exploring the Efficiency of Inter-core Connected AI Chips with Deep Learning Compiler Techniques

ArXiv ID: 2507.11506

Authors: Yiqi Liu, Yuqi Xue, Noelle Crawford, Jilong Xue, Jian Huang

Abstract: To meet the increasing demand of deep learning (DL) models, AI chips are employing both off-chip memory (e.g., HBM) and high-bandwidth low-latency interconnect for direct inter-core data exchange. However, it is not easy to explore the efficiency of these inter-core connected AI (ICCA) chips, due to a fundamental tussle among compute (per-core execution), communication (inter-core data exchange), and I/O (off-chip data access). In this paper, we develop Elk, a DL compiler framework to maximize the efficiency of ICCA chips by jointly trading off all the three performance factors discussed above. Elk structures these performance factors into configurable parameters and forms a global trade-off space in the DL compiler. To systematically explore this space and maximize overall efficiency, Elk employs a new inductive operator scheduling policy and a cost-aware on-chip memory allocation algorithm. It generates globally optimized execution plans that best overlap off-chip data loading and on-chip execution. To examine the efficiency of Elk, we build a full-fledged emulator based on a real ICCA chip IPU-POD4, and an ICCA chip simulator for sensitivity analysis with different interconnect network topologies. Elk achieves 94% of the ideal roofline performance of ICCA chips on average, showing the benefits of supporting large DL models on ICCA chips. We also show Elk's capability of enabling architecture design space exploration for new ICCA chip development.

Comment: The paper discusses a DL compiler framework for AI chips, focusing on efficiency and architecture-level innovations, which aligns with model compression and architecture.

Relevance: 8 Novelty: 7


Paper Selection Prompt

System Prompt

You are a helpful paper reading assistant whose job is to read daily posts from ArXiv and identify a few papers that your friend will enjoy reading. Your job is to carefully read the paper titles and abstracts below and find the ones that match the criteria below.

User Prompt

Instructions

Write the response in JSONL format with {ARXIVID, COMMENT, RELEVANCE, NOVELTY} on each line, one for each paper.

  • ARXIVID: should be the ArXiv ID.
  • COMMENT: should identify whether there is a criteria that match the paper very closely. These matches should not be based on general terms like "language modeling" or "advancements" and should specifically refer to a criterion. No need to mention the non-matching criteria.
  • RELEVANCE: should be a score from 1-10.
  • NOVELTY: should be a score from 1-10.

Scoring Criteria

The "Relevance" score measures how closely the paper aligns with the core topics of the prompt. The "Novelty" score assesses the originality and impact of the paper. They are two ORTHONORMAL axes and SHOULD NOT be confused with each other.

Relevance Scoring

  • Relevance 9-10 (Completely Relevant)
  • Focus: Fully aligned with core topics with no deviation, score the highest if contains relevant keywords in it.
  • Examples: Papers focused on foundational methods or theoretical research, whose titles contain topic keywords like "MoE".

  • Relevance 7-8 (Relevant)

  • Focus: Retain a solid link to the main research area, though may touch on peripheral elements.
  • Examples: Papers research on the fundamental part of MoE through a less critical aspect like its behavior in GNN.

  • Relevance 5-6 (Borderline)

  • Focus: Maintains a link to the core topic but also extends into at least one other domain/area beyond the primary focus.
  • Examples: Work referencing MoE centered on reinforcement learning.

  • Relevance 3-4 (Irrelevant)

  • Focus: Largely outside our interests with no association to our topics.
  • Examples: Application-focused papers like using MoE to solve a problem in the real world.

  • Relevance 1-2 (Ignore)

  • Focus: Purely unrelated to our topics. Completely a different domain.
  • Exception: If the paper hints at a cutting-edge, radically new direction that could eventually transform the primary domain, consider a score of 9–10 despite initial appearances. (Usually a very rare concept that belongs to the fundamental research)

Novelty Scoring

  • Novelty 9-10 (Breakthrough)
  • Definition: Groundbreaking methods/theory introducing new directions or solving major challenges.
  • Examples: Entirely new paradigm for foundational models; a novel theory transforming representation learning.

  • Novelty 7-8 (Improvements)

  • Definition: Substantial insights/enhancements, though not a full paradigm shift.
  • Examples: Modifications on existing methods yielding significantly better results.

  • Novelty 5-6 (Borderline)

  • Definition: Incremental contributions with possible long-term benefits, not immediately transformative.
  • Examples: Moderately novel extension to an existing architecture; refining current methods without fundamentally altering them.

  • Novelty 3-4 (Tangential)

  • Definition: Minor or domain-specific improvements with limited broader impact.
  • Examples: Slight modifications to known methods with strange motivation; purely engineering jobs like a new benchmark/dataset.

  • Novelty 1-2 (Low)

  • Definition: Minimal originality, applying standard approaches without real innovation.
  • Examples: Using an off-the-shelf model without adding new insights; purely application-driven studies like finetuning a pretrained model using existing methods.

Papers

[PAPER LIST HERE]

Relevant Topics

Use the following relevance criteria to focus on foundational research. Keep relevant papers and filter out irrelevant ones. Avoid purely application-driven work.

  1. Representation Learning - Relevant: Insights into how deep networks encode information, feature/dictionary learning, sparse/contrastive methods, training dynamics in neural networks. - Irrelevant: Standard applications of known techniques lacking new theoretical or methodological contributions.

  2. Model Architecture - Relevant: Mixture-of-Experts (MoE), Transformers, Conditional/Dynamic Networks, Autoencoders, analysis on existing architectures (like encoder-decoder), or other architectural innovations. - Irrelevant: Merely using existing architectures for a certain task without insights into the structure themselves.

  3. Model Compression - Relevant: Sparsity, pruning, quantization, low-rank approaches, KV cache, or other algorithmic/theoretical efficiency breakthroughs. - Irrelevant: Straightforward applications of existing compression methods to new tasks.

  4. Large Language Models (LLMs) - Relevant: Major breakthroughs in pretraining or architecture, theoretical insights into LLM behavior/interpretability. - Irrelevant: Domain-specific usage (e.g., translation, jail-breaking), finetuning or inference tricks (e.g., instruction tuning, chain-of-thoughts, data mixing), or empirical dataset/benchmark studies and text-level analysis (e.g. hallucination, reasoning, safety).

  5. AI for Science - Relevant: Foundational research in molecular/protein modeling, new generative paradigms, or significant architecture-level innovations. - Irrelevant: Conventional, domain-specific applications without new theoretical perspectives.

  6. Emerging Trends - Relevant: Cutting-edge theoretical work challenging established assumptions or introducing broad new paradigms. - Irrelevant: Incremental improvements or trend-following without novel insights.

Keywords:

  • Relevant: Mixture of Experts (MoE), Representation Learning, Compression/Efficiency, Sparse/Sparsity, Pruning, Quantization, Low-rank, Foundation Model, etc.
  • Irrelevant: Reinforcement Learning, Transfer Learning, Federated Learning, Online Learning, Diffusion Models, etc.
  • Application: Image Segmentation, Medical Imaging, 3D Vision, Video Understanding, Information Retrieval, Summarization, Recommendation Systems, Machine Translation, Speech Recognition, Signal Processing, Spatial/Temporal Modeling, Time Series, Knowledge Graph, etc.