Personalized Daily ArXiv Papers 2025-08-01

[gpt-4o]	Prompt	Completion	Total
Token	24470	3003	27473
Cost	$0.06	$0.03	$0.09

Total arXiv papers: 400

Total scanned papers: 240

Total relevant papers: 14

Table of contents with paper titles:

Invisible Architectures of Thought: Toward a New Science of AI as Cognitive Infrastructure Authors: Giuseppe Riva
How does Chain of Thought Think? Mechanistic Interpretability of Chain-of-Thought Reasoning with Sparse Autoencoding Authors: Xi Chen, Aske Plaat, Niki van Stein
Model Directions, Not Words: Mechanistic Topic Models Using Sparse Autoencoders Authors: Carolina Zheng, Nicolas Beltran-Velez, Sweta Karlekar, Claudia Shi, Achille Nazaret, Asif Mallik, Amir Feder, David M. Blei
Semantic Convergence: Investigating Shared Representations Across Scaled LLMs Authors: Daniel Son, Sanjana Rathore, Andrew Rufail, Adrian Simon, Daniel Zhang, Soham Dave, Cole Blondin, Kevin Zhu, Sean O'Brien
BAR Conjecture: the Feasibility of Inference Budget-Constrained LLM Services with Authenticity and Reasoning Authors: Jinan Zhou, Rajat Ghosh, Vaishnavi Bhargava, Debojyoti Dutta, Aryan Singhal
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving Authors: Luoxin Chen, Jinming Gu, Liankai Huang, Wenhao Huang, Zhicheng Jiang, Allan Jie, Xiaoran Jin, Xing Jin, Chenggang Li, Kaijing Ma, Cheng Ren, Jiawei Shen, Wenlei Shi, Tong Sun, He Sun, Jiahui Wang, Siran Wang, Zhihong Wang, Chenrui Wei, Shufa Wei, Yonghui Wu, Yuchen Wu, Yihang Xia, Huajian Xin, Fan Yang, Huaiyuan Ying, Hongyi Yuan, Zheng Yuan, Tianyang Zhan, Chi Zhang, Yue Zhang, Ge Zhang, Tianyun Zhao, Jianqiu Zhao, Yichi Zhou, Thomas Hanwen Zhu
A Verifier Hierarchy Authors: Maurits Kaptein
AI paradigm for solving differential equations: first-principles data generation and scale-dilation operator AI solver Authors: Xiangshu Gong, Zhiqiang Xie, Xiaowei Jin, Chen Wang, Yanling Qu, Wangmeng Zuo, Hui Li
Differentially Private Clipped-SGD: High-Probability Convergence with Arbitrary Clipping Level Authors: Saleh Vatan Khah, Savelii Chezhegov, Shahrokh Farahmand, Samuel Horváth, Eduard Gorbunov
Coflex: Enhancing HW-NAS with Sparse Gaussian Processes for Efficient and Scalable DNN Accelerator Design Authors: Yinhui Ma, Tomomasa Yamasaki, Zhehui Wang, Tao Luo, Bo Wang
Efficient Machine Unlearning via Influence Approximation Authors: Jiawei Liu, Chenwang Wu, Defu Lian, Enhong Chen
Solution-aware vs global ReLU selection: partial MILP strikes back for DNN verification Authors: Yuke Liao, Blaise Genest, Kuldeep Meel, Shaan Aryaman
SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model Authors: Mingkai Deng, Jinyu Hou, Yilin Shen, Hongxia Jin, Graham Neubig, Zhiting Hu, Eric Xing
Improved Algorithms for Kernel Matrix-Vector Multiplication Under Sparsity Assumptions Authors: Piotr Indyk, Michael Kapralov, Kshiteej Sheth, Tal Wagner

1. Invisible Architectures of Thought: Toward a New Science of AI as Cognitive Infrastructure

ArXiv ID: 2507.22893

Authors: Giuseppe Riva

Abstract: Contemporary human-AI interaction research overlooks how AI systems fundamentally reshape human cognition pre-consciously, a critical blind spot for understanding distributed cognition. This paper introduces "Cognitive Infrastructure Studies" (CIS) as a new interdisciplinary domain to reconceptualize AI as "cognitive infrastructures": foundational, often invisible systems conditioning what is knowable and actionable in digital societies. These semantic infrastructures transport meaning, operate through anticipatory personalization, and exhibit adaptive invisibility, making their influence difficult to detect. Critically, they automate "relevance judgment," shifting the "locus of epistemic agency" to non-human systems. Through narrative scenarios spanning individual (cognitive dependency), collective (democratic deliberation), and societal (governance) scales, we describe how cognitive infrastructures reshape human cognition, public reasoning, and social epistemologies. CIS aims to address how AI preprocessing reshapes distributed cognition across individual, collective, and cultural scales, requiring unprecedented integration of diverse disciplinary methods. The framework also addresses critical gaps across disciplines: cognitive science lacks population-scale preprocessing analysis capabilities, digital sociology cannot access individual cognitive mechanisms, and computational approaches miss cultural transmission dynamics. To achieve this goal CIS also provides methodological innovations for studying invisible algorithmic influence: "infrastructure breakdown methodologies", experimental approaches that reveal cognitive dependencies by systematically withdrawing AI preprocessing after periods of habituation.

Comment: The paper introduces 'Cognitive Infrastructure Studies' as a new interdisciplinary domain, which could be considered an emerging trend challenging established assumptions.

Relevance: 9 Novelty: 9

2. How does Chain of Thought Think? Mechanistic Interpretability of Chain-of-Thought Reasoning with Sparse Autoencoding

ArXiv ID: 2507.22928

Authors: Xi Chen, Aske Plaat, Niki van Stein

Abstract: Chain-of-thought (CoT) prompting boosts the accuracy of Large Language Models on multi-step tasks, yet whether the generated "thoughts" reflect the true internal reasoning process is unresolved. We present the first feature-level causal study of CoT faithfulness. Combining sparse autoencoders with activation patching, we extract monosemantic features from Pythia-70M and Pythia-2.8B while they tackle GSM8K math problems under CoT and plain (noCoT) prompting. Swapping a small set of CoT-reasoning features into a noCoT run raises answer log-probabilities significantly in the 2.8B model, but has no reliable effect in 70M, revealing a clear scale threshold. CoT also leads to significantly higher activation sparsity and feature interpretability scores in the larger model, signalling more modular internal computation. For example, the model's confidence in generating correct answers improves from 1.2 to 4.3. We introduce patch-curves and random-feature patching baselines, showing that useful CoT information is not only present in the top-K patches but widely distributed. Overall, our results indicate that CoT can induce more interpretable internal structures in high-capacity LLMs, validating its role as a structured prompting method.
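
The patching experiment described in the abstract can be pictured with a toy sketch. This is not the paper's code: the feature vectors, the linear readout `readout_w`, and all helper names are hypothetical, and a real study would read activations from a sparse autoencoder attached to a Pythia model. The sketch copies the K features that differ most between a CoT run and a noCoT run into the noCoT run, records the answer score for each K (a patch curve), and compares against patching randomly chosen features.

```python
import random

def answer_score(features, readout_w):
    """Toy stand-in for an answer log-probability: a linear readout of features."""
    return sum(f * w for f, w in zip(features, readout_w))

def patch(base, donor, indices):
    """Copy the donor's activations into the base run at the given feature indices."""
    patched = list(base)
    for i in indices:
        patched[i] = donor[i]
    return patched

def top_k_indices(base, donor, k):
    """Indices of the k features whose activations differ most between the runs."""
    order = sorted(range(len(base)), key=lambda i: abs(donor[i] - base[i]), reverse=True)
    return order[:k]

def patch_curve(base, donor, readout_w, ks):
    """Answer score after patching the top-k features, for each k in ks."""
    return [answer_score(patch(base, donor, top_k_indices(base, donor, k)), readout_w)
            for k in ks]

def random_baseline(base, donor, readout_w, k, trials=200, seed=0):
    """Mean answer score after patching k randomly chosen features."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        idx = rng.sample(range(len(base)), k)
        total += answer_score(patch(base, donor, idx), readout_w)
    return total / trials

# Hypothetical sparse feature activations for one prompt, without and with CoT.
no_cot = [0.0, 0.2, 0.0, 0.1, 0.0, 0.0]
cot = [1.5, 0.2, 0.9, 0.1, 0.4, 0.0]
readout_w = [1.0, 0.5, 1.0, 0.5, 1.0, 0.0]

curve = patch_curve(no_cot, cot, readout_w, ks=[0, 1, 2, 3])
baseline = random_baseline(no_cot, cot, readout_w, k=2)
```

Comparing `curve` with `baseline` at the same K separates the effect of the selected features from the effect of patching any K features at all.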

Comment: The paper investigates the mechanistic interpretability of Chain-of-Thought reasoning using sparse autoencoding, which aligns with representation learning and LLM interpretability.

Relevance: 9 Novelty: 8

3. Model Directions, Not Words: Mechanistic Topic Models Using Sparse Autoencoders

ArXiv ID: 2507.23220

Authors: Carolina Zheng, Nicolas Beltran-Velez, Sweta Karlekar, Claudia Shi, Achille Nazaret, Asif Mallik, Amir Feder, David M. Blei

Abstract: Traditional topic models are effective at uncovering latent themes in large text collections. However, due to their reliance on bag-of-words representations, they struggle to capture semantically abstract features. While some neural variants use richer representations, they are similarly constrained by expressing topics as word lists, which limits their ability to articulate complex topics. We introduce Mechanistic Topic Models (MTMs), a class of topic models that operate on interpretable features learned by sparse autoencoders (SAEs). By defining topics over this semantically rich space, MTMs can reveal deeper conceptual themes with expressive feature descriptions. Moreover, uniquely among topic models, MTMs enable controllable text generation using topic-based steering vectors. To properly evaluate MTM topics against word-list-based approaches, we propose "topic judge", an LLM-based pairwise comparison evaluation framework. Across five datasets, MTMs match or exceed traditional and neural baselines on coherence metrics, are consistently preferred by topic judge, and enable effective steering of LLM outputs.

Comment: The paper introduces Mechanistic Topic Models using sparse autoencoders, aligning with representation learning and offering novel insights into topic modeling.

Relevance: 9 Novelty: 8

4. Semantic Convergence: Investigating Shared Representations Across Scaled LLMs

ArXiv ID: 2507.22918

Authors: Daniel Son, Sanjana Rathore, Andrew Rufail, Adrian Simon, Daniel Zhang, Soham Dave, Cole Blondin, Kevin Zhu, Sean O'Brien

Abstract: We investigate feature universality in Gemma-2 language models (Gemma-2-2B and Gemma-2-9B), asking whether models with a four-fold difference in scale still converge on comparable internal concepts. Using the Sparse Autoencoder (SAE) dictionary-learning pipeline, we utilize SAEs on each model's residual-stream activations, align the resulting monosemantic features via activation correlation, and compare the matched feature spaces with SVCCA and RSA. Middle layers yield the strongest overlap, while early and late layers show far less similarity. Preliminary experiments extend the analysis from single tokens to multi-token subspaces, showing that semantically similar subspaces interact similarly with language models. These results strengthen the case that large language models carve the world into broadly similar, interpretable features despite size differences, reinforcing universality as a foundation for cross-model interpretability.
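
The alignment step mentioned in the abstract, matching features across two models by activation correlation, can be sketched as follows. This is an illustration under assumptions rather than the authors' pipeline: the activation values and the function names are hypothetical, and the sketch uses a greedy best-match per feature, which may differ from the matching rule used in the paper. The SVCCA and RSA comparisons are not shown.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length activation sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant feature carries no correlation signal
    return cov / math.sqrt(vx * vy)

def match_features(acts_a, acts_b):
    """For each feature in model A, return (best index in model B, correlation)."""
    matches = []
    for fa in acts_a:
        scores = [pearson(fa, fb) for fb in acts_b]
        best = max(range(len(scores)), key=scores.__getitem__)
        matches.append((best, scores[best]))
    return matches

# Hypothetical activations: rows are SAE features, columns are the same five tokens.
small_model = [[0.0, 1.0, 0.0, 2.0, 0.0],
               [3.0, 0.0, 1.0, 0.0, 0.0]]
large_model = [[2.9, 0.1, 1.2, 0.0, 0.1],
               [0.0, 0.5, 0.5, 0.5, 0.0],
               [0.1, 2.0, 0.0, 4.1, 0.0]]

matches = match_features(small_model, large_model)
```

Both models must be run on the same token sequence for the correlation to be meaningful, since each column has to refer to the same input position.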

Comment: The paper investigates feature universality in LLMs using Sparse Autoencoders, relevant to representation learning and LLM behavior analysis.

Relevance: 9 Novelty: 8

5. BAR Conjecture: the Feasibility of Inference Budget-Constrained LLM Services with Authenticity and Reasoning

ArXiv ID: 2507.23170

Authors: Jinan Zhou, Rajat Ghosh, Vaishnavi Bhargava, Debojyoti Dutta, Aryan Singhal

Abstract: When designing LLM services, practitioners care about three key properties: inference-time budget, factual authenticity, and reasoning capacity. However, our analysis shows that no model can simultaneously optimize for all three. We formally prove this trade-off and propose a principled framework named The BAR Theorem for LLM-application design.

Comment: The paper introduces the BAR Theorem, which provides a theoretical framework for understanding trade-offs in LLM services, aligning with the interest in theoretical insights into LLM behavior.

Relevance: 9 Novelty: 8

6. Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

ArXiv ID: 2507.23726

Authors: Luoxin Chen, Jinming Gu, Liankai Huang, Wenhao Huang, Zhicheng Jiang, Allan Jie, Xiaoran Jin, Xing Jin, Chenggang Li, Kaijing Ma, Cheng Ren, Jiawei Shen, Wenlei Shi, Tong Sun, He Sun, Jiahui Wang, Siran Wang, Zhihong Wang, Chenrui Wei, Shufa Wei, Yonghui Wu, Yuchen Wu, Yihang Xia, Huajian Xin, Fan Yang, Huaiyuan Ying, Hongyi Yuan, Zheng Yuan, Tianyang Zhan, Chi Zhang, Yue Zhang, Ge Zhang, Tianyun Zhao, Jianqiu Zhao, Yichi Zhou, Thomas Hanwen Zhu

Abstract: LLMs have demonstrated strong mathematical reasoning abilities by leveraging reinforcement learning with long chain-of-thought, yet they continue to struggle with theorem proving due to the lack of clear supervision signals when solely using natural language. Dedicated domain-specific languages like Lean provide clear supervision via formal verification of proofs, enabling effective training through reinforcement learning. In this work, we propose Seed-Prover, a lemma-style whole-proof reasoning model. Seed-Prover can iteratively refine its proof based on Lean feedback, proved lemmas, and self-summarization. To solve IMO-level contest problems, we design three test-time inference strategies that enable both deep and broad reasoning. Seed-Prover proves 78.1% of formalized past IMO problems, saturates MiniF2F, and achieves over 50% on PutnamBench, outperforming the previous state-of-the-art by a large margin. To address the lack of geometry support in Lean, we introduce a geometry reasoning engine Seed-Geometry, which outperforms previous formal geometry engines. We use these two systems to participate in IMO 2025 and fully prove 5 out of 6 problems. This work represents a significant advancement in automated mathematical reasoning, demonstrating the effectiveness of formal verification with long chain-of-thought reasoning.

Comment: The paper presents Seed-Prover, a model for automated theorem proving with architectural innovations, aligning with the AI for Science criterion.

Relevance: 8 Novelty: 8

7. A Verifier Hierarchy

ArXiv ID: 2507.23504

Authors: Maurits Kaptein

Abstract: We investigate the trade-off between certificate length and verifier runtime. We prove a Verifier Trade-off Theorem showing that reducing the inherent verification time of a language from $f(n)$ to $g(n)$, where $f(n) \ge g(n)$, requires certificates of length at least $\Omega(\log(f(n) / g(n)))$. This theorem induces a natural hierarchy based on certificate complexity. We demonstrate its applicability to analyzing conjectured separations between complexity classes (e.g., NP and EXPTIME) and to studying natural problems such as string periodicity and rotation detection. Additionally, we provide perspectives on the P vs. NP problem by relating it to the existence of sub-linear certificates.
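
As a worked instance of the stated bound, consider two speed-ups: from exponential to polynomial verification time, and from quadratic to linear. The function choices, the base-2 logarithm, and the fixed constant $k \ge 1$ are assumptions made for illustration and are not taken from the paper.

```latex
\begin{align*}
f(n) = 2^{n},\; g(n) = n^{k}: &\quad \log\frac{f(n)}{g(n)} = n - k\log n = \Omega(n),\\
f(n) = n^{2},\; g(n) = n: &\quad \log\frac{f(n)}{g(n)} = \log n .
\end{align*}
```

Taking the theorem as stated in the abstract, the first speed-up would need certificates of length at least linear in $n$, and the second at least logarithmic in $n$.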

Comment: The paper presents a Verifier Trade-off Theorem, contributing to theoretical insights in complexity theory, which is relevant to emerging trends.

Relevance: 8 Novelty: 8

8. AI paradigm for solving differential equations: first-principles data generation and scale-dilation operator AI solver

ArXiv ID: 2507.23141

Authors: Xiangshu Gong, Zhiqiang Xie, Xiaowei Jin, Chen Wang, Yanling Qu, Wangmeng Zuo, Hui Li

Abstract: Many problems are governed by differential equations (DEs). Artificial intelligence (AI) is a new path for solving DEs. However, data is very scarce and existing AI solvers struggle with approximation of high frequency components (AHFC). We propose an AI paradigm for solving diverse DEs, comprising a DE-ruled first-principles data generation methodology and a scale-dilation operator (SDO) AI solver. Using either prior knowledge or random fields, we generate solutions and then substitute them into the DEs to derive the sources and initial/boundary conditions through balancing DEs, thus producing an arbitrarily large amount of first-principles-consistent training data at extremely low computational cost. We introduce a reversible SDO that leverages the Fourier transform of the multiscale solutions to fix AHFC, and design a spatiotemporally coupled, attention-based Transformer AI solver of DEs with SDO. An upper bound on the Hessian condition number of the loss function is proven to be proportional to the squared 2-norm of the solution gradient, revealing that SDO yields a smoother loss landscape, consequently fixing AHFC with efficient training. Extensive tests on diverse DEs demonstrate that our AI paradigm achieves consistently superior accuracy over state-of-the-art methods. This work makes AI solvers of DEs truly usable across a broad range of natural and engineering fields.
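
The data-generation idea, choosing a solution first and deriving the source term by substituting it into the equation, can be sketched on a toy problem. The example below is an illustration under assumptions, not the paper's pipeline: it uses the 1D Poisson equation $-u''(x) = f(x)$ on $[0, \pi]$ with zero boundary values, draws a random sine series as the solution, and differentiates it analytically to get the matching source. The choice of equation and all function names are hypothetical.

```python
import math
import random

def sample_solution(num_modes=4, seed=0):
    """Draw random sine-series coefficients a_k for u(x) = sum_k a_k * sin(k x)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(num_modes)]

def u(coeffs, x):
    """Solution u(x); it vanishes at x = 0 and x = pi by construction."""
    return sum(a * math.sin((k + 1) * x) for k, a in enumerate(coeffs))

def source(coeffs, x):
    """Source f(x) = -u''(x), from differentiating the sine series term by term."""
    return sum(a * (k + 1) ** 2 * math.sin((k + 1) * x) for k, a in enumerate(coeffs))

def make_pair(grid, seed):
    """One training pair: the source on the grid (input) and the solution (target)."""
    coeffs = sample_solution(seed=seed)
    return [source(coeffs, x) for x in grid], [u(coeffs, x) for x in grid]

grid = [math.pi * i / 64 for i in range(65)]
dataset = [make_pair(grid, seed) for seed in range(8)]
```

No numerical solver is called at any point, so each pair costs only a few function evaluations per grid point and satisfies the equation exactly up to floating-point error.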

Comment: The paper introduces a novel AI paradigm for solving differential equations using a Transformer-based AI solver, which aligns with the Model Architecture criterion. It also provides theoretical insights into the training dynamics, relevant to Representation Learning.

Relevance: 8 Novelty: 8

9. Differentially Private Clipped-SGD: High-Probability Convergence with Arbitrary Clipping Level

ArXiv ID: 2507.23512

Authors: Saleh Vatan Khah, Savelii Chezhegov, Shahrokh Farahmand, Samuel Horváth, Eduard Gorbunov

Abstract: Gradient clipping is a fundamental tool in Deep Learning, improving the high-probability convergence of stochastic first-order methods like SGD, AdaGrad, and Adam under heavy-tailed noise, which is common in training large language models. It is also a crucial component of Differential Privacy (DP) mechanisms. However, existing high-probability convergence analyses typically require the clipping threshold to increase with the number of optimization steps, which is incompatible with standard DP mechanisms like the Gaussian mechanism. In this work, we close this gap by providing the first high-probability convergence analysis for DP-Clipped-SGD with a fixed clipping level, applicable to both convex and non-convex smooth optimization under heavy-tailed noise, characterized by a bounded central $\alpha$-th moment assumption, $\alpha \in (1,2]$. Our results show that, with a fixed clipping level, the method converges to a neighborhood of the optimal solution with a faster rate than the existing ones. The neighborhood can be balanced against the noise introduced by DP, providing a refined trade-off between convergence speed and privacy guarantees.
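
A minimal sketch of one common form of clipped SGD with Gaussian noise and a fixed clipping level is given below. It is an illustration, not the algorithm analyzed in the paper: the step size `lr`, clipping level `clip`, noise scale `sigma`, and the toy quadratic objective with Pareto-distributed gradient noise are hypothetical choices, and calibrating `sigma` to a target privacy budget is omitted.

```python
import math
import random

def clip_vector(g, clip):
    """Scale g so its Euclidean norm is at most clip (direction unchanged)."""
    norm = math.sqrt(sum(v * v for v in g))
    scale = 1.0 if norm <= clip else clip / norm
    return [v * scale for v in g]

def dp_clipped_sgd(grad_fn, x0, steps, lr, clip, sigma, seed=0):
    """SGD where each stochastic gradient is clipped to a fixed level and then
    perturbed with Gaussian noise of standard deviation sigma per coordinate."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        g = clip_vector(grad_fn(x, rng), clip)
        noisy = [v + rng.gauss(0.0, sigma) for v in g]
        x = [xi - lr * ni for xi, ni in zip(x, noisy)]
    return x

def noisy_quadratic_grad(x, rng):
    """Gradient of 0.5 * ||x||^2 plus symmetric heavy-tailed (Pareto) noise."""
    return [xi + rng.choice((-1.0, 1.0)) * (rng.paretovariate(1.5) - 1.0) for xi in x]

x_final = dp_clipped_sgd(noisy_quadratic_grad, x0=[5.0, -3.0], steps=2000,
                         lr=0.05, clip=1.0, sigma=0.1)
```

Because `clip` stays constant across steps, every clipped gradient has norm at most `clip`, which is what lets the Gaussian noise scale be fixed in advance.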

Comment: The paper provides a high-probability convergence analysis for DP-Clipped-SGD with a fixed clipping level, relevant to model compression and efficiency.