Personalized Daily ArXiv Papers 2025-07-30

[gpt-4o]   Prompt   Completion   Total
Tokens     32402    3989         36391
Cost       $0.08    $0.04        $0.12

Total arXiv papers: 519

Total scanned papers: 339

Total relevant papers: 19

Table of contents with paper titles:

1. EvoSLD: Automated Neural Scaling Law Discovery With Large Language Models
   Authors: Haowei Lin, Xiangyu Wang, Jianzhu Ma, Yitao Liang
2. Hyperbolic Genome Embeddings
   Authors: Raiyan R. Khan, Philippe Chlenski, Itsik Pe'er
3. The Geometry of Harmfulness in LLMs through Subconcept Probing
   Authors: McNair Shah, Saleena Angeline, Adhitya Rajendra Kumar, Naitik Chheda, Kevin Zhu, Vasu Sharma, Sean O'Brien, Will Cai
4. MemShare: Memory Efficient Inference for Large Reasoning Models through KV Cache Reuse
   Authors: Kaiwen Chen, Xin Tan, Minchen Yu, Hong Xu
5. Shapley Uncertainty in Natural Language Generation
   Authors: Meilin Zhu, Gaojie Jin, Xiaowei Huang, Lijun Zhang
6. What Does it Mean for a Neural Network to Learn a "World Model"?
   Authors: Kenneth Li, Fernanda Viégas, Martin Wattenberg
7. Reviving Your MNEME: Predicting The Side Effects of LLM Unlearning and Fine-Tuning via Sparse Model Diffing
   Authors: Aly M. Kassem, Zhuan Shi, Negar Rostamzadeh, Golnoosh Farnadi
8. Quantum Geometry of Data
   Authors: Alexander G. Abanov, Luca Candelori, Harold C. Steinacker, Martin T. Wells, Jerome R. Busemeyer, Cameron J. Hogan, Vahagn Kirakosyan, Nicola Marzari, Sunil Pinnamaneni, Dario Villani, Mengjia Xu, Kharen Musaelian
9. Higher-Order Kuramoto Oscillator Network for Dense Associative Memory
   Authors: Jona Nagerl, Natalia G. Berloff
10. Multi-state Protein Design with DynamicMPNN
   Authors: Alex Abrudan, Sebastian Pujalte Ojeda, Chaitanya K. Joshi, Matthew Greenig, Felipe Engelberger, Alena Khmelinskaia, Jens Meiler, Michele Vendruscolo, Tuomas P. J. Knowles
11. Detection Transformers Under the Knife: A Neuroscience-Inspired Approach to Ablations
   Authors: Nils Hütten, Florian Hölken, Hasan Tercan, Tobias Meisen
12. Reservoir Computation with Networks of Differentiating Neuron Ring Oscillators
   Authors: Alexander Yeung, Peter DelMastro, Arjun Karuvally, Hava Siegelmann, Edward Rietman, Hananel Hazan
13. DEM-NeRF: A Neuro-Symbolic Method for Scientific Discovery through Physics-Informed Simulation
   Authors: Wenkai Tan, Alvaro Velasquez, Houbing Song
14. Contrast-CAT: Contrasting Activations for Enhanced Interpretability in Transformer-based Text Classifiers
   Authors: Sungmin Han, Jeonghyun Lee, Sangkyun Lee
15. Torque-based Graph Surgery: Enhancing Graph Neural Networks with Hierarchical Rewiring
   Authors: Sujia Huang, Lele Fu, Zhen Cui, Tong Zhang, Na Song, Bo Huang
16. Weight-Parameterization in Continuous Time Deep Neural Networks for Surrogate Modeling
   Authors: Haley Rosso, Lars Ruthotto, Khachik Sargsyan
17. Deep Polynomial Chaos Expansion
   Authors: Johannes Exenberger, Sascha Ranftl, Robert Peharz
18. Hierarchical Stochastic Differential Equation Models for Latent Manifold Learning in Neural Time Series
   Authors: Pedram Rajaei, Maryam Ostadsharif Memar, Navid Ziaei, Behzad Nazari, Ali Yousefi
19. Unlocking Interpretability for RF Sensing: A Complex-Valued White-Box Transformer
   Authors: Xie Zhang, Yina Wang, Chenshu Wu

1. EvoSLD: Automated Neural Scaling Law Discovery With Large Language Models

ArXiv ID: 2507.21184

Authors: Haowei Lin, Xiangyu Wang, Jianzhu Ma, Yitao Liang

Abstract: Scaling laws are fundamental mathematical relationships that predict how neural network performance evolves with changes in variables such as model size, dataset size, and computational resources. Traditionally, discovering these laws requires extensive human expertise and manual experimentation. We introduce EvoSLD, an automated framework for Scaling Law Discovery (SLD) that leverages evolutionary algorithms guided by Large Language Models (LLMs) to co-evolve symbolic expressions and their optimization routines. Formulated to handle scaling variables, control variables, and response metrics across diverse experimental settings, EvoSLD searches for parsimonious, universal functional forms that minimize fitting errors on grouped data subsets. Evaluated on five real-world scenarios from recent literature, EvoSLD rediscovers exact human-derived laws in two cases and surpasses them in others, achieving up to orders-of-magnitude reductions in normalized mean squared error on held-out test sets. Compared to baselines like symbolic regression and ablated variants, EvoSLD demonstrates superior accuracy, interpretability, and efficiency, highlighting its potential to accelerate AI research. Code is available at https://github.com/linhaowei1/SLD.
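
For readers unfamiliar with what "fitting a functional form and scoring it on held-out data" involves, here is a minimal sketch. It fits a single candidate law, loss = a * N^(-alpha), by least squares in log-log space, then scores it with a normalized mean squared error, taken here as MSE divided by the variance of the targets (an assumed normalization). EvoSLD's evolved expressions and optimization routines are far richer than this. The function names and the synthetic data are hypothetical.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss = a * size**(-alpha) by ordinary least squares on log-log data."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(loss) for loss in losses]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, alpha)


def nmse(y_true, y_pred):
    """Mean squared error divided by the variance of y_true (assumed normalization)."""
    n = len(y_true)
    mean = sum(y_true) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    variance = sum((t - mean) ** 2 for t in y_true) / n
    return mse / variance


def true_law(n):
    """Hypothetical ground-truth law used to generate synthetic losses."""
    return 5.0 * n ** -0.3


# Fit on small models, then test on larger held-out ones.
train_sizes = [1e6, 1e7, 1e8]
test_sizes = [1e9, 1e10]
a, alpha = fit_power_law(train_sizes, [true_law(n) for n in train_sizes])
predictions = [a * n ** -alpha for n in test_sizes]
score = nmse([true_law(n) for n in test_sizes], predictions)
```

Because the synthetic data follow the law exactly, the fit recovers a = 5 and alpha = 0.3 and the held-out NMSE is essentially zero. On real measurements, the interesting comparisons are between competing candidate forms.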

Comment: The paper introduces EvoSLD, an automated framework for discovering scaling laws using evolutionary algorithms and LLMs, which aligns with foundational research in LLMs and representation learning.

Relevance: 9 Novelty: 8

2. Hyperbolic Genome Embeddings

ArXiv ID: 2507.21648

Authors: Raiyan R. Khan, Philippe Chlenski, Itsik Pe'er

Abstract: Current approaches to genomic sequence modeling often struggle to align the inductive biases of machine learning models with the evolutionarily informed structure of biological systems. To this end, we formulate a novel application of hyperbolic CNNs that exploits this structure, enabling more expressive DNA sequence representations. Our strategy circumvents the need for explicit phylogenetic mapping while discerning key properties of sequences pertaining to core functional and regulatory behavior. Across 37 out of 42 genome interpretation benchmark datasets, our hyperbolic models outperform their Euclidean equivalents. Notably, our approach even surpasses state-of-the-art performance on seven GUE benchmark datasets, consistently outperforming many DNA language models while using orders of magnitude fewer parameters and avoiding pretraining. Our results include a novel set of benchmark datasets, the Transposable Elements Benchmark, which explores a major but understudied component of the genome with deep evolutionary significance. We further motivate our work by exploring how our hyperbolic models recognize genomic signal under various data-generating conditions and by constructing an empirical method for interpreting the hyperbolicity of dataset embeddings. Throughout these assessments, we find persistent evidence highlighting the potential of our hyperbolic framework as a robust paradigm for genome representation learning. Our code and benchmark datasets are available at https://github.com/rrkhan/HGE.
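
Hyperbolic representation learning places embeddings in a negatively curved space, where volume grows exponentially with radius. That growth is a natural fit for tree-like structure such as phylogenies. Below is a minimal sketch of the primitive every such model relies on, the geodesic distance, assuming the Poincaré ball model. The abstract does not say which model of hyperbolic space the paper's CNNs use, so that choice is an assumption.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball."""
    sq_norm_u = sum(c * c for c in u)
    sq_norm_v = sum(c * c for c in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


# Two pairs with the same Euclidean gap (0.1): the pair near the boundary
# is much farther apart in hyperbolic distance.
near_center = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.85, 0.0], [0.95, 0.0])
```

Distances stretch near the boundary of the ball, so a model can push fine-grained leaves outward while keeping shared ancestors near the center.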

Comment: The paper introduces hyperbolic CNNs for genomic sequence modeling, contributing to representation learning with a novel approach.

Relevance: 9 Novelty: 8

3. The Geometry of Harmfulness in LLMs through Subconcept Probing

ArXiv ID: 2507.21141

Authors: McNair Shah, Saleena Angeline, Adhitya Rajendra Kumar, Naitik Chheda, Kevin Zhu, Vasu Sharma, Sean O'Brien, Will Cai

Abstract: Recent advances in large language models (LLMs) have intensified the need to understand and reliably curb their harmful behaviours. We introduce a multidimensional framework for probing and steering harmful content in model internals. For each of 55 distinct harmfulness subconcepts (e.g., racial hate, employment scams, weapons), we learn a linear probe, yielding 55 interpretable directions in activation space. Collectively, these directions span a harmfulness subspace that we show is strikingly low-rank. We then test ablation of the entire subspace from model internals, as well as steering and ablation in the subspace's dominant direction. We find that dominant direction steering allows for near elimination of harmfulness with a low decrease in utility. Our findings advance the emerging view that concept subspaces provide a scalable lens on LLM behaviour and offer practical tools for the community to audit and harden future generations of language models.
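
The two interventions the abstract names, removing a subspace and moving along a direction, reduce to a few lines of linear algebra. This minimal sketch works on plain Python lists and assumes that activations are vectors and the probe directions are already given. How the paper picks the dominant direction is not described here, so that direction is taken as an input. The helper names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(directions, eps=1e-10):
    """Gram-Schmidt; nearly dependent directions are dropped, so len(result) is the rank."""
    basis = []
    for d in directions:
        w = list(d)
        for b in basis:
            c = dot(w, b)
            w = [x - c * y for x, y in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:
            basis.append([x / norm for x in w])
    return basis


def ablate_subspace(h, directions):
    """Project activation h onto the orthogonal complement of span(directions)."""
    out = list(h)
    for b in orthonormal_basis(directions):
        c = dot(out, b)
        out = [x - c * y for x, y in zip(out, b)]
    return out


def steer(h, direction, alpha):
    """Add alpha times the unit direction; a negative alpha pushes away from it."""
    norm = math.sqrt(dot(direction, direction))
    return [x + alpha * d / norm for x, d in zip(h, direction)]
```

The size of `orthonormal_basis(probe_directions)` gives a crude rank estimate. A "strikingly low-rank" subspace would show up as many probe directions collapsing onto a few basis vectors, although a real analysis would use singular values rather than a hard tolerance.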

Comment: The paper explores the geometry of harmfulness in LLMs through subconcept probing, providing theoretical insights into LLM behavior and interpretability.

Relevance: 9 Novelty: 8

4. MemShare: Memory Efficient Inference for Large Reasoning Models through KV Cache Reuse

ArXiv ID: 2507.21433

Authors: Kaiwen Chen, Xin Tan, Minchen Yu, Hong Xu

Abstract: Large Reasoning Models (LRMs) have achieved significant advances in mathematical reasoning and formal logic tasks. However, their tendency to generate lengthy chain-of-thought sequences leads to substantial memory overhead during inference. We observe that LRMs frequently produce highly similar intermediate reasoning steps, which correspond to similar KV cache states across layers. Motivated by this observation, we propose MemShare, a novel KV cache management approach that effectively reduces memory overhead. MemShare employs a collaborative filtering algorithm to efficiently identify reusable KV cache blocks and enables zero-copy cache reuse, which significantly reduces memory overhead and improves throughput while maintaining accuracy. Experimental results demonstrate that MemShare delivers up to 84.79% improvement in throughput while achieving better accuracy than existing KV cache management methods.
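
The memory saving comes from letting several logical KV blocks point at one physical block instead of storing near-duplicates. This minimal sketch of that bookkeeping assumes each block is summarized by a vector and treats a block as "reusable" when its cosine similarity to a stored block exceeds a fixed threshold. That similarity test is a stand-in, not MemShare's collaborative-filtering algorithm. `SharedBlockPool` is a hypothetical name.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


@dataclass
class SharedBlockPool:
    """Hypothetical block pool: similar logical blocks map to one physical block."""
    threshold: float = 0.95
    summaries: list = field(default_factory=list)  # one vector per physical block
    refcounts: list = field(default_factory=list)  # logical blocks sharing each one

    def allocate(self, summary):
        """Return a physical block id, reusing an existing block when similar enough."""
        for block_id, stored in enumerate(self.summaries):
            if cosine(summary, stored) >= self.threshold:
                self.refcounts[block_id] += 1  # share the block instead of copying it
                return block_id
        self.summaries.append(list(summary))
        self.refcounts.append(1)
        return len(self.summaries) - 1


pool = SharedBlockPool()
block_table = [pool.allocate(v) for v in ([1.0, 0.0], [0.99, 0.05], [0.0, 1.0])]
```

Three logical blocks end up backed by two physical ones. The linear scan keeps the sketch short; a real allocator would need an index over candidate blocks, which is the search problem the paper's filtering step addresses.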

Comment: The paper introduces MemShare, a novel KV cache management approach for memory efficiency in large reasoning models, which aligns with model compression through KV cache reuse.

Relevance: 9 Novelty: 8

5. Shapley Uncertainty in Natural Language Generation

ArXiv ID: 2507.21406

Authors: Meilin Zhu, Gaojie Jin, Xiaowei Huang, Lijun Zhang

Abstract: In question-answering tasks, determining when to trust the outputs is crucial to the alignment of large language models (LLMs). Kuhn et al. (2023) introduced semantic entropy as a measure of uncertainty by incorporating linguistic invariances arising from shared meaning. It relies primarily on setting a threshold to determine semantic equivalence. We propose a more nuanced framework that extends beyond such thresholding by developing a Shapley-based uncertainty metric that captures the continuous nature of semantic relationships. We establish three fundamental properties that characterize valid uncertainty metrics and prove that our Shapley uncertainty satisfies these criteria. Through extensive experiments, we demonstrate that our Shapley uncertainty predicts LLM performance on question-answering and other datasets more accurately than similar baseline measures.
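
Shapley values divide the value of a whole group among its members by averaging each member's marginal contribution over all possible orderings. Below is a minimal exact implementation. Its cost is exponential in the number of players, so it is only practical for small sets. The abstract does not say how the paper defines the players or the value function, so `coverage` is purely hypothetical: it counts distinct meanings among sampled answers using hand-written meaning labels.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; value maps a frozenset of players to a number."""
    n = len(players)
    result = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            # Probability that exactly this size-k coalition precedes i in a random order.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (value(s | {i}) - value(s))
        result[i] = total
    return result


def coverage(answers):
    """Hypothetical value: answers "a" and "b" share a meaning, "c" adds a new one."""
    meanings = {"a": "m1", "b": "m1", "c": "m2"}
    return len({meanings[x] for x in answers})


phi = shapley_values(["a", "b", "c"], coverage)
```

The two redundant answers split their shared credit (0.5 each), while the distinct answer keeps full credit (1.0). The values sum to coverage of the full set, which is the efficiency property of Shapley values.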

Comment: The paper introduces a Shapley-based uncertainty metric for LLMs, which provides theoretical insights into LLM behavior and interpretability.

Relevance: 9 Novelty: 8

6. What Does it Mean for a Neural Network to Learn a "World Model"?

ArXiv ID: 2507.21513

Authors: Kenneth Li, Fernanda Viégas, Martin Wattenberg

Abstract: We propose a set of precise criteria for saying a neural net learns and uses a "world model." The goal is to give an operational meaning to terms that are often used informally, in order to provide a common language for experimental investigation. We focus specifically on the idea of representing a latent "state space" of the world, leaving modeling the effect of actions to future work. Our definition is based on ideas from the linear probing literature, and formalizes the notion of a computation that factors through a representation of the data generation process. An essential addition to the definition is a set of conditions to check that such a "world model" is not a trivial consequence of the neural net's data or task.

Comment: The paper proposes precise criteria for when a neural network can be said to learn and use a 'world model', offering theoretical insights into representation learning.

Relevance: 9 Novelty: 8

7. Reviving Your MNEME: Predicting The Side Effects of LLM Unlearning and Fine-Tuning via Sparse Model Diffing

ArXiv ID: 2507.21084

Authors: Aly M. Kassem, Zhuan Shi, Negar Rostamzadeh, Golnoosh Farnadi

Abstract: Large language models (LLMs) are frequently fine-tuned or unlearned to adapt to new tasks or eliminate undesirable behaviors. While existing evaluation methods assess performance after such interventions, there remains no general approach for detecting unintended side effects, such as unlearning biology content degrading performance on chemistry tasks, particularly when these effects are unpredictable or emergent. To address this issue, we introduce MNEME, Model diffiNg for Evaluating Mechanistic Effects, a lightweight framework for identifying these side effects using sparse model diffing. MNEME compares base and fine-tuned models on task-agnostic data (for example, The Pile, LMSYS-Chat-1M) without access to fine-tuning data to isolate behavioral shifts. Applied to five LLMs across three scenarios (WMDP knowledge unlearning, emergent misalignment, and benign fine-tuning), MNEME achieves up to 95 percent accuracy in predicting side effects, aligning with known benchmarks and requiring no custom heuristics. Furthermore, we show that retraining on high-activation samples can partially reverse these effects. Our results demonstrate that sparse probing and diffing offer a scalable and automated lens into fine-tuning-induced model changes, providing practical tools for understanding and managing LLM behavior.
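
This is a minimal sketch of the diffing step. It assumes that each model's hidden states on the same task-agnostic prompts have already been encoded into sparse feature activations, for example by a sparse autoencoder. Ranking features by the change in mean activation is a simple stand-in for MNEME's actual scoring. Both helper names and the activation values are hypothetical.

```python
def rank_feature_shifts(base_acts, tuned_acts, top_k=3):
    """Rank features by |mean activation change| between two models.

    base_acts and tuned_acts are lists of per-prompt feature-activation vectors,
    computed on the same task-agnostic prompts for both models.
    """
    n_prompts = len(base_acts)
    n_features = len(base_acts[0])
    shifts = []
    for f in range(n_features):
        base_mean = sum(row[f] for row in base_acts) / n_prompts
        tuned_mean = sum(row[f] for row in tuned_acts) / n_prompts
        shifts.append((f, tuned_mean - base_mean))
    shifts.sort(key=lambda item: abs(item[1]), reverse=True)
    return shifts[:top_k]


def top_activating_prompts(acts, feature, k=2):
    """Indices of the prompts that activate a feature most (candidates for retraining)."""
    order = sorted(range(len(acts)), key=lambda i: acts[i][feature], reverse=True)
    return order[:k]


# Hypothetical activations: 2 prompts x 3 features.
base = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
tuned = [[0.0, 1.0, 2.0], [0.0, 0.0, 2.0]]
shifts = rank_feature_shifts(base, tuned, top_k=2)
```

Feature 2 switches on after fine-tuning and feature 1 weakens, so both surface as candidate side effects without any access to the fine-tuning data.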

Comment: The paper presents MNEME, a framework for detecting side effects in LLMs using sparse model diffing, contributing to theoretical insights into LLM behavior.

Relevance: 9 Novelty: 7

8. Quantum Geometry of Data

ArXiv ID: 2507.21135

Authors: Alexander G. Abanov, Luca Candelori, Harold C. Steinacker, Martin T. Wells, Jerome R. Busemeyer, Cameron J. Hogan, Vahagn Kirakosyan, Nicola Marzari, Sunil Pinnamaneni, Dario Villani, Mengjia Xu, Kharen Musaelian

Abstract: We demonstrate how Quantum Cognition Machine Learning (QCML) encodes data as quantum geometry. In QCML, features of the data are represented by learned Hermitian matrices, and data points are mapped to states in Hilbert space. The quantum geometry description endows the dataset with rich geometric and topological structure - including intrinsic dimension, quantum metric, and Berry curvature - derived directly from the data. QCML captures global properties of data, while avoiding the curse of dimensionality inherent in local methods. We illustrate this on a number of synthetic and real-world examples. Quantum geometric representation of QCML could advance our understanding of cognitive phenomena within the framework of quantum cognition.
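
To make "features as Hermitian matrices, data points as states" concrete, here is a minimal sketch. It assumes an "error Hamiltonian" H(x) = 1/2 * sum_k (A_k - x_k I)^2, whose ground state is the state assigned to point x. The expectation values <A_k> in that state then give the point's position in the learned geometry. This construction is our assumption about how QCML maps points to states; the abstract does not spell it out. Using Pauli matrices as fixed (not learned) A_k is also an assumption, chosen so that 2x2 ground states have a closed form.

```python
import math

# Pauli matrices, used here as fixed stand-ins for learned Hermitian feature matrices.
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def error_hamiltonian(x, mats):
    """Assumed QCML-style H(x) = 1/2 * sum_k (A_k - x_k * I)^2."""
    h = [[0j, 0j], [0j, 0j]]
    for xk, a in zip(x, mats):
        shifted = [[a[i][j] - (xk if i == j else 0) for j in range(2)] for i in range(2)]
        sq = matmul(shifted, shifted)
        h = [[h[i][j] + 0.5 * sq[i][j] for j in range(2)] for i in range(2)]
    return h


def ground_state(h):
    """Normalized eigenvector of the smallest eigenvalue of a 2x2 Hermitian matrix."""
    a, b, d = h[0][0].real, h[0][1], h[1][1].real
    lam = (a + d) / 2 - math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    if abs(b) < 1e-12:
        v = [1.0, 0.0] if a <= d else [0.0, 1.0]
    else:
        v = [b, lam - a]  # solves (H - lam * I) v = 0 when b != 0
    norm = math.sqrt(sum(abs(c) ** 2 for c in v))
    return [c / norm for c in v]


def expectation(psi, a):
    """<psi|A|psi> for Hermitian A (real up to rounding)."""
    a_psi = [sum(a[i][j] * psi[j] for j in range(2)) for i in range(2)]
    return sum(psi[i].conjugate() * a_psi[i] for i in range(2)).real


def embed(x, mats=SIGMA):
    """Map a data point to the expectation values of the feature matrices."""
    psi = ground_state(error_hamiltonian(x, mats))
    return [expectation(psi, a) for a in mats]
```

With Pauli matrices, H(x) reduces to a constant minus x . sigma, so every nonzero point lands on the unit sphere at x / |x|. This is a toy case of data acquiring a curved geometry from the operators. In QCML proper, the A_k are learned and typically larger.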

Comment: The paper explores quantum geometry in data representation, which is relevant to emerging trends in representation learning.

Relevance: 8 Novelty: 8

9. Higher-Order Kuramoto Oscillator Network for Dense Associative Memory

ArXiv ID: 2507.21984

Authors: Jona Nagerl, Natalia G. Berloff

Abstract: Networks of phase oscillators can serve as dense associative memories if they incorporate higher-order coupling beyond the classical Kuramoto model's pairwise interactions. Here we introduce a generalized Kuramoto model with combined second-harmonic (pairwise) and fourth-harmonic (quartic) coupling, inspired by dense Hopfield memory theory. Using mean-field theory and its dynamical approximation, we obtain a phase diagram for the dense associative memory model that exhibits a tricritical point at which the continuous onset of memory retrieval is supplanted by a discontinuous, hysteretic transition. In the quartic-dominated regime, the system supports bistable phase-locked states corresponding to stored memory patterns, with a sizable energy barrier between memory and incoherent states. We analytically determine this bistable region and show that the escape time from a memory state (due to noise) grows exponentially with network size, indicating robust storage. Extending the theory to finite memory load, we show that higher-order couplings achieve superlinear scaling of memory capacity with system size, far exceeding the limit of pairwise-only oscillators. Large-scale simulations of the oscillator network confirm our theoretical predictions, demonstrating rapid pattern retrieval and robust storage of many phase patterns. These results bridge Kuramoto synchronization and modern Hopfield memories, pointing toward experimental realization of high-capacity, analog associative memory in oscillator systems.
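
Even the classical pairwise interactions that the paper generalizes let an oscillator network act as a small associative memory. Below is a minimal sketch of that pairwise baseline. It assumes first-harmonic coupling sin(theta_j - theta_i) with Hebbian weights J_ij = (1/N) sum_mu xi_i^mu xi_j^mu, and encodes binary patterns as phases 0 and pi. It deliberately omits the second- and fourth-harmonic terms the paper introduces, so it says nothing about their capacity results. Retrieval is measured by the overlap m = |(1/N) sum_i xi_i exp(i theta_i)|.

```python
import cmath
import math


def hebbian_couplings(patterns):
    """J_ij = (1/N) * sum_mu xi_i^mu * xi_j^mu, with no self-coupling."""
    n = len(patterns[0])
    return [[0.0 if i == j else sum(p[i] * p[j] for p in patterns) / n
             for j in range(n)] for i in range(n)]


def euler_step(theta, coupling, dt):
    """One Euler step of d(theta_i)/dt = sum_j J_ij * sin(theta_j - theta_i)."""
    n = len(theta)
    return [theta[i] + dt * sum(coupling[i][j] * math.sin(theta[j] - theta[i])
                                for j in range(n))
            for i in range(n)]


def overlap(theta, pattern):
    """|(1/N) * sum_i xi_i * exp(i * theta_i)|; equals 1 at perfect retrieval."""
    return abs(sum(x * cmath.exp(1j * t) for x, t in zip(pattern, theta))) / len(theta)


pattern = [1, 1, 1, 1, -1, -1, -1, -1]
coupling = hebbian_couplings([pattern])
# Encode +1 as phase 0 and -1 as phase pi, then corrupt two oscillators.
theta = [0.0 if x == 1 else math.pi for x in pattern]
theta[0] = math.pi - 0.3
theta[4] = 0.3
start = overlap(theta, pattern)
for _ in range(2000):
    theta = euler_step(theta, coupling, dt=0.05)
end = overlap(theta, pattern)
```

Starting from a pattern with two corrupted oscillators, the phases relax back to the stored pattern and the overlap rises toward 1. With many stored patterns, pairwise coupling alone runs into the capacity limit that the abstract contrasts with higher-order coupling.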

Comment: The paper introduces a higher-order Kuramoto oscillator network for dense associative memory, which is relevant to emerging trends in foundational research.

Relevance: 8 Novelty: 8

10. Multi-state Protein Design with DynamicMPNN

ArXiv ID: 2507.21938

Authors: Alex Abrudan, Sebastian Pujalte Ojeda, Chaitanya K. Joshi, Matthew Greenig, Felipe Engelberger, Alena Khmelinskaia, Jens Meiler, Michele Vendruscolo, Tuomas P. J. Knowles

Abstract: Structural biology has long been dominated by the one sequence, one structure, one function paradigm, yet many critical biological processes - from enzyme catalysis to membrane transport - depend on proteins that adopt multiple conformational states. Existing multi-state design approaches rely on post-hoc aggregation of single-state predictions, achieving poor experimental success rates compared to single-state design. We introduce DynamicMPNN, an inverse folding model explicitly trained to generate sequences compatible with multiple conformations through joint learning across conformational ensembles. Trained on 46,033 conformational pairs covering 75% of CATH superfamilies and evaluated using AlphaFold initial guess, DynamicMPNN outperforms ProteinMPNN by up to 13% on structure-normalized RMSD across our challenging multi-state protein benchmark.
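
RMSD, the metric behind the reported gains, is the square root of the mean squared distance between corresponding atoms after superposition. This minimal sketch assumes the structures are already superposed, so there is no Kabsch alignment. It also leaves out the paper's structure normalization, which the abstract does not define. `per_state_rmsd` is a hypothetical helper that scores a design against each target conformation separately.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of 3D points that are already superposed."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of atoms")
    total = sum(
        sum((p - q) ** 2 for p, q in zip(a, b)) for a, b in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def per_state_rmsd(predicted_states, target_states):
    """One RMSD per conformation; a multi-state design should fit every state."""
    return [rmsd(p, t) for p, t in zip(predicted_states, target_states)]
```

Reporting one value per conformation, rather than a single average, makes it visible when a sequence folds well into one state but misses another.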

Comment: The paper introduces DynamicMPNN, a model for multi-state protein design, which is relevant to foundational research in AI for Science.