Personalized Daily ArXiv Papers 2025-07-16

[gpt-4o]	Prompt	Completion	Total
Token	40215	4685	44900
Cost	$0.1	$0.05	$0.15

Total arXiv papers: 526

Total scanned papers: 304

Total relevant papers: 18

Table of contents with paper titles:

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety Authors: Tomek Korbak, Mikita Balesni, Elizabeth Barnes, Yoshua Bengio, Joe Benton, Joseph Bloom, Mark Chen, Alan Cooney, Allan Dafoe, Anca Dragan, Scott Emmons, Owain Evans, David Farhi, Ryan Greenblatt, Dan Hendrycks, Marius Hobbhahn, Evan Hubinger, Geoffrey Irving, Erik Jenner, Daniel Kokotajlo, Victoria Krakovna, Shane Legg, David Lindner, David Luan, Aleksander Mądry, Julian Michael, Neel Nanda, Dave Orr, Jakub Pachocki, Ethan Perez, Mary Phuong, Fabien Roger, Joshua Saxe, Buck Shlegeris, Martín Soto, Eric Steinberger, Jasmine Wang, Wojciech Zaremba, Bowen Baker, Rohin Shah, Vlad Mikulik
Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning Authors: Zheng Zhang
AI Mother Tongue: Self-Emergent Communication in MARL via Endogenous Symbol Systems Authors: Hung Ming Liu
Functional Neural Wavefunction Optimization Authors: Victor Armegioiu, Juan Carrasquilla, Siddhartha Mishra, Johannes Müller, Jannes Nys, Marius Zeinhofer, Hang Zhang
KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? Authors: Soumadeep Saha, Akshay Chaturvedi, Saptarshi Saha, Utpal Garain, Nicholas Asher
Neurosymbolic Reasoning Shortcuts under the Independence Assumption Authors: Emile van Krieken, Pasquale Minervini, Edoardo Ponti, Antonio Vergari
SystolicAttention: Fusing FlashAttention within a Single Systolic Array Authors: Jiawei Lin, Guokai Chen, Yuanlong Li, Thomas Bourgeat
First-Order Error Matters: Accurate Compensation for Quantized Large Language Models Authors: Xingyu Zheng, Haotong Qin, Yuye Li, Jiakai Wang, Jinyang Guo, Michele Magno, Xianglong Liu
Defining neurosymbolic AI Authors: Lennert De Smet, Luc De Raedt
Multi-Trigger Poisoning Amplifies Backdoor Vulnerabilities in LLMs Authors: Sanhanat Sivapiromrat, Caiqi Zhang, Marco Basaldella, Nigel Collier
How does Labeling Error Impact Contrastive Learning? A Perspective from Data Dimensionality Reduction Authors: Jun Chen, Hong Chen, Yonghua Yu, Yiming Ying
BioScore: A Foundational Scoring Function For Diverse Biomolecular Complexes Authors: Yuchen Zhu, Jihong Chen, Yitong Li, Xiaomin Fang, Xianbin Ye, Jingzhou He, Xujun Zhang, Jingxuan Ge, Chao Shen, Xiaonan Zhang, Tingjun Hou, Chang-Yu Hsieh
Emergence of Hierarchical Emotion Organization in Large Language Models Authors: Bo Zhao, Maya Okawa, Eric J. Bigelow, Rose Yu, Tomer Ullman, Ekdeep Singh Lubana, Hidenori Tanaka
Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime Authors: Amit Attia, Matan Schliserman, Uri Sherman, Tomer Koren
MMOne: Representing Multiple Modalities in One Scene Authors: Zhifeng Gu, Bing Wang
A Group Theoretic Analysis of the Symmetries Underlying Base Addition and Their Learnability by Neural Networks Authors: Cutter Dawes, Simon Segert, Kamesh Krishnamurthy, Jonathan D. Cohen
Langevin Flows for Modeling Neural Latent Dynamics Authors: Yue Song, T. Anderson Keller, Yisong Yue, Pietro Perona, Max Welling
Elk: Exploring the Efficiency of Inter-core Connected AI Chips with Deep Learning Compiler Techniques Authors: Yiqi Liu, Yuqi Xue, Noelle Crawford, Jilong Xue, Jian Huang

1. Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

ArXiv ID: 2507.11473

Authors: Tomek Korbak, Mikita Balesni, Elizabeth Barnes, Yoshua Bengio, Joe Benton, Joseph Bloom, Mark Chen, Alan Cooney, Allan Dafoe, Anca Dragan, Scott Emmons, Owain Evans, David Farhi, Ryan Greenblatt, Dan Hendrycks, Marius Hobbhahn, Evan Hubinger, Geoffrey Irving, Erik Jenner, Daniel Kokotajlo, Victoria Krakovna, Shane Legg, David Lindner, David Luan, Aleksander Mądry, Julian Michael, Neel Nanda, Dave Orr, Jakub Pachocki, Ethan Perez, Mary Phuong, Fabien Roger, Joshua Saxe, Buck Shlegeris, Martín Soto, Eric Steinberger, Jasmine Wang, Wojciech Zaremba, Bowen Baker, Rohin Shah, Vlad Mikulik

Abstract: AI systems that "think" in human language offer a unique opportunity for AI safety: we can monitor their chains of thought (CoT) for the intent to misbehave. Like all other known AI oversight methods, CoT monitoring is imperfect and allows some misbehavior to go unnoticed. Nevertheless, it shows promise and we recommend further research into CoT monitorability and investment in CoT monitoring alongside existing safety methods. Because CoT monitorability may be fragile, we recommend that frontier model developers consider the impact of development decisions on CoT monitorability.

Comment: Author match

2. Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning

ArXiv ID: 2507.10624

Authors: Zheng Zhang

Abstract: Large Language Models (LLMs) display striking surface fluency yet systematically fail at tasks requiring symbolic reasoning, arithmetic accuracy, and logical consistency. This paper offers a structural diagnosis of such failures, revealing a persistent gap between comprehension and competence. Through controlled experiments and architectural analysis, we demonstrate that LLMs often articulate correct principles without reliably applying them--a failure rooted not in knowledge access, but in computational execution. We term this phenomenon the computational split-brain syndrome, where instruction and action pathways are geometrically and functionally dissociated. This core limitation recurs across domains, from mathematical operations to relational inferences, and explains why model behavior remains brittle even under idealized prompting. We argue that LLMs function as powerful pattern completion engines, but lack the architectural scaffolding for principled, compositional reasoning. Our findings delineate the boundary of current LLM capabilities and motivate future models with metacognitive control, principle lifting, and structurally grounded execution. This diagnosis also clarifies why mechanistic interpretability findings may reflect training-specific pattern coordination rather than universal computational principles, and why the geometric separation between instruction and execution pathways suggests limitations in neural introspection and mechanistic analysis.

Comment: The paper provides a structural diagnosis of LLMs' limitations in symbolic reasoning, offering theoretical insights into their architectural limits. This aligns with the criteria for foundational research in LLMs.

Relevance: 10 Novelty: 9

3. AI Mother Tongue: Self-Emergent Communication in MARL via Endogenous Symbol Systems

ArXiv ID: 2507.10566

Authors: Hung Ming Liu

Abstract: In Decentralized Multi-Agent Reinforcement Learning (MARL), the development of Emergent Communication has long been constrained by the "Joint Exploration Dilemma", leading agents to fall into a "Communication Vacuum Equilibrium". Traditional methods address this by introducing inductive biases to facilitate communication emergence. This study fundamentally questions whether such artificial inductive biases are, in fact, over-engineering. Through experiments with the "AI Mother Tongue" (AIM) framework, based on a Vector Quantized Variational Autoencoder (VQ-VAE), we demonstrate that when agents possess an endogenous symbol system, their neural representations naturally exhibit spontaneous semantic compression and Nash equilibrium-driven semantic convergence, achieving effective symbolic communication without external inductive biases. This aligns with recent neuroscience findings suggesting that the human brain does not directly use human language for internal thought, and resonates with research on "soft thinking" capabilities in Large Language Models (LLMs). Compared to traditional explicit communication methods, AIM demonstrates stronger generality and efficiency. The interpretable analysis toolkit developed in this study confirms that symbol usage exhibits a significant power-law distribution, leading to three major theoretical insights: the "Neural Communication Hypothesis", the "Tool-First Principle", and the "Semantic Interpretability Paradigm". Future research will explore the integration of Hierarchical Quantized Variational Autoencoders (HQ-VAE) to enhance AIM's complex expressive capabilities and investigate the potential for "Reinforcement Learning (RL) Low-Level Pre-training". This discovery offers new avenues for bridging symbolism and connectionism.

Comment: The paper introduces a novel framework for emergent communication in multi-agent systems, which challenges existing assumptions and introduces new paradigms, aligning with emerging trends.

Relevance: 9 Novelty: 9
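
Note on the VQ-VAE basis mentioned in the abstract above: the step that turns a continuous latent into a discrete symbol is, in a generic VQ-VAE, a nearest-neighbour lookup in a learned codebook. The following is a minimal sketch of that lookup only, not the AIM framework itself. The codebook values and the names `quantize`, `encode_message`, and `CODEBOOK` are hypothetical and chosen purely for illustration; a real codebook would be learned during training.

```python
from typing import List, Sequence


def quantize(vector: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry nearest to `vector`
    under squared Euclidean distance."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


# Hypothetical 2-D codebook with three discrete symbols (illustrative values only).
CODEBOOK: List[List[float]] = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]


def encode_message(latents: Sequence[Sequence[float]],
                   codebook: Sequence[Sequence[float]] = CODEBOOK) -> List[int]:
    """Map each continuous latent vector to a discrete symbol index."""
    return [quantize(z, codebook) for z in latents]
```

Under these assumptions, each agent's message would be the sequence of symbol indices produced by `encode_message`; how the paper trains the codebook and what agents do with the symbols is not specified here.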

4. Functional Neural Wavefunction Optimization

ArXiv ID: 2507.10835

Authors: Victor Armegioiu, Juan Carrasquilla, Siddhartha Mishra, Johannes Müller, Jannes Nys, Marius Zeinhofer, Hang Zhang

Abstract: We propose a framework for the design and analysis of optimization algorithms in variational quantum Monte Carlo, drawing on geometric insights into the corresponding function space. The framework translates infinite-dimensional optimization dynamics into tractable parameter-space algorithms through a Galerkin projection onto the tangent space of the variational ansatz. This perspective unifies existing methods such as stochastic reconfiguration and Rayleigh-Gauss-Newton, provides connections to classic function-space algorithms, and motivates the derivation of novel algorithms with geometrically principled hyperparameter choices. We validate our framework with numerical experiments demonstrating its practical relevance through the accurate estimation of ground-state energies for several prototypical models in condensed matter physics modeled with neural network wavefunctions.

Comment: The paper introduces a novel framework for optimization in variational quantum Monte Carlo, which involves neural network wavefunctions. This aligns with foundational research in AI for Science, focusing on theoretical insights rather than applications.