Personalized Daily ArXiv Papers 2025-07-31

[gpt-4o]	Prompt	Completion	Total
Token	25146	2372	27518
Cost	$0.06	$0.02	$0.09

Total arXiv papers: 377

Total scanned papers: 224

Total relevant papers: 17

Table of contents with paper titles:

1. Teaching the Teacher: Improving Neural Network Distillability for Symbolic Regression via Jacobian Regularization Authors: Soumyadeep Dhar, Kei Sen Fong, Mehul Motani
2. FGFP: A Fractional Gaussian Filter and Pruning for Deep Neural Networks Compression Authors: Kuan-Ting Tu, Po-Hsien Yu, Yu-Syuan Tseng, Shao-Yi Chien
3. Better Together: Cross and Joint Covariances Enhance Signal Detectability in Undersampled Data Authors: Arabind Swain, Sean Alexander Ridout, Ilya Nemenman
4. LLM-Crowdsourced: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models Authors: Qianhong Guo, Wei Xie, Xiaofang Cai, Enze Wang, Shuoyoucheng Ma, Kai Chen, Xiaofeng Wang, Baosheng Wang
5. Amorphous Solid Model of Vectorial Hopfield Neural Networks Authors: F. Gallavotti, A. Zaccone
6. DO-EM: Density Operator Expectation Maximization Authors: Adit Vishnu, Abhay Shastry, Dhruva Kashyap, Chiranjib Bhattacharyya
7. MSQ: Memory-Efficient Bit Sparsification Quantization Authors: Seokho Han, Seoyeon Yoon, Jinhee Kim, Dongwei Wang, Kang Eun Jeon, Huanrui Yang, Jong Hwan Ko
8. MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention Authors: Yuqi Pang, Bowen Yang, Yun Cao, Fan Rong, Xiaoyu Li, Chen He
9. Representation biases: will we achieve complete understanding by analyzing representations? Authors: Andrew Kyle Lampinen, Stephanie C. Y. Chan, Yuxuan Li, Katherine Hermann
10. Synchronization of mean-field models on the circle Authors: Yury Polyanskiy, Philippe Rigollet, Andrew Yao
11. Stacked SVD or SVD stacked? A Random Matrix Theory perspective on data integration Authors: Tavor Z. Baharav, Phillip B. Nicol, Rafael A. Irizarry, Rong Ma
12. When Truthful Representations Flip Under Deceptive Instructions? Authors: Xianxuan Long, Yao Fu, Runchao Li, Mu Sheng, Haotian Yu, Xiaotian Han, Pan Li
13. Consistency of Feature Attribution in Deep Learning Architectures for Multi-Omics Authors: Daniel Claborne, Javier Flores, Samantha Erwin, Luke Durell, Rachel Richardson, Ruby Fore, Lisa Bramer
14. What is an "Abstract Reasoner"? Revisiting Experiments and Arguments about Large Language Models Authors: Tian Yun, Chen Sun, Ellie Pavlick
15. Meaning-infused grammar: Gradient Acceptability Shapes the Geometric Representations of Constructions in LLMs Authors: Supantho Rakshit, Adele Goldberg
16. RCR-AF: Enhancing Model Generalization via Rademacher Complexity Reduction Activation Function Authors: Yunrui Yu, Kafeng Wang, Hang Su, Jun Zhu
17. Subgrid BoostCNN: Efficient Boosting of Convolutional Networks via Gradient-Guided Feature Selection Authors: Biyi Fang, Jean Utke, Truong Vo, Diego Klabjan

1. Teaching the Teacher: Improving Neural Network Distillability for Symbolic Regression via Jacobian Regularization

ArXiv ID: 2507.22767

Authors: Soumyadeep Dhar, Kei Sen Fong, Mehul Motani

Abstract: Distilling large neural networks into simple, human-readable symbolic formulas is a promising path toward trustworthy and interpretable AI. However, this process is often brittle, as the complex functions learned by standard networks are poor targets for symbolic discovery, resulting in low-fidelity student models. In this work, we propose a novel training paradigm to address this challenge. Instead of passively distilling a pre-trained network, we introduce a Jacobian-based regularizer that actively encourages the "teacher" network to learn functions that are not only accurate but also inherently smoother and more amenable to distillation. We demonstrate through extensive experiments on a suite of real-world regression benchmarks that our method is highly effective. By optimizing the regularization strength for each problem, we improve the $R^2$ score of the final distilled symbolic model by an average of 120% (relative) compared to the standard distillation pipeline, all while maintaining the teacher's predictive accuracy. Our work presents a practical and principled method for significantly improving the fidelity of interpretable models extracted from complex neural networks.

Comment: The paper introduces a novel training paradigm using Jacobian regularization to improve neural network distillability, which is relevant to representation learning and model compression.

Relevance: 9 Novelty: 8
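
The Jacobian-based regularizer mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact form of the penalty, so the code below assumes a generic one: the mean squared norm of the input gradient, estimated by central finite differences, added to a mean-squared-error loss. The names `jacobian_penalty`, `regularized_loss`, and `lam` are hypothetical and not taken from the paper.

```python
import math

def jacobian_penalty(f, xs, eps=1e-5):
    """Mean squared norm of df/dx over the batch, via central differences.

    Assumption: a generic smoothness penalty, not the paper's exact regularizer.
    """
    total = 0.0
    for x in xs:
        for i in range(len(x)):
            hi = list(x)
            lo = list(x)
            hi[i] += eps
            lo[i] -= eps
            grad_i = (f(hi) - f(lo)) / (2 * eps)
            total += grad_i ** 2
    return total / len(xs)

def regularized_loss(f, xs, ys, lam):
    """Mean squared error plus lam times the Jacobian penalty."""
    mse = sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return mse + lam * jacobian_penalty(f, xs)

# Two scalar models with similar values but different smoothness.
smooth = lambda x: 2.0 * x[0]
wiggly = lambda x: 2.0 * x[0] + 0.1 * math.sin(50.0 * x[0])

xs = [[k / 10.0] for k in range(11)]
ys = [2.0 * x[0] for x in xs]
print(regularized_loss(smooth, xs, ys, lam=0.01))
print(regularized_loss(wiggly, xs, ys, lam=0.01))
```

Under this assumed penalty, the oscillating model is charged for its large input gradients even though its predictions stay close to the targets, which is the kind of pressure toward smoother teachers that the abstract describes.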

2. FGFP: A Fractional Gaussian Filter and Pruning for Deep Neural Networks Compression

ArXiv ID: 2507.22527

Authors: Kuan-Ting Tu, Po-Hsien Yu, Yu-Syuan Tseng, Shao-Yi Chien

Abstract: Network compression techniques have become increasingly important in recent years because the loads of Deep Neural Networks (DNNs) are heavy for edge devices in real-world applications. While many methods compress neural network parameters, deploying these models on edge devices remains challenging. To address this, we propose the fractional Gaussian filter and pruning (FGFP) framework, which integrates fractional-order differential calculus and the Gaussian function to construct fractional Gaussian filters (FGFs). To reduce the computational complexity of fractional-order differential operations, we introduce Grünwald-Letnikov fractional derivatives to approximate the fractional-order differential equation. The number of parameters for each kernel in FGF is minimized to only seven. Beyond the architecture of Fractional Gaussian Filters, our FGFP framework also incorporates Adaptive Unstructured Pruning (AUP) to achieve higher compression ratios. Experiments on various architectures and benchmarks show that our FGFP framework outperforms recent methods in accuracy and compression. On CIFAR-10, ResNet-20 achieves only a 1.52% drop in accuracy while reducing the model size by 85.2%. On ImageNet2012, ResNet-50 achieves only a 1.63% drop in accuracy while reducing the model size by 69.1%.

Comment: The paper introduces a novel framework for model compression using fractional Gaussian filters and pruning, which aligns with interests in model compression and efficiency.

Relevance: 9 Novelty: 8
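
The Grünwald-Letnikov approximation named in the abstract above has a standard textbook form: the order-alpha derivative is a weighted sum of backward samples, with weights w_k = (-1)^k * binom(alpha, k). The sketch below computes those weights with the usual recurrence. How FGFP combines them with a Gaussian kernel, and which seven parameters it keeps per kernel, is not stated in the abstract, so none of that is modeled here; the function names are hypothetical.

```python
def gl_coefficients(alpha, n):
    """First n+1 Grünwald-Letnikov weights w_k = (-1)^k * binom(alpha, k)."""
    w = [1.0]
    for k in range(1, n + 1):
        # Recurrence: w_k = w_{k-1} * (1 - (alpha + 1) / k)
        w.append(w[-1] * (1.0 - (alpha + 1.0) / k))
    return w

def gl_derivative(f, x, alpha, h=1e-3, n=1000):
    """Truncated GL approximation of the order-alpha derivative of f at x."""
    w = gl_coefficients(alpha, n)
    return sum(w[k] * f(x - k * h) for k in range(n + 1)) / h ** alpha

# alpha = 1 recovers the ordinary backward difference (weights 1, -1, 0, ...).
print(gl_coefficients(1.0, 3))
# A fractional order gives slowly decaying weights over the whole history.
print(gl_coefficients(0.5, 3))
print(gl_derivative(lambda t: t * t, 1.0, alpha=1.0))
```

For integer orders the weights vanish after a few terms, so the sum reduces to a familiar finite difference; for fractional orders they never vanish, which is why a truncation length `n` is needed.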

3. Better Together: Cross and Joint Covariances Enhance Signal Detectability in Undersampled Data

ArXiv ID: 2507.22207

Authors: Arabind Swain, Sean Alexander Ridout, Ilya Nemenman

Abstract: Many data-science applications involve detecting a shared signal between two high-dimensional variables. Using random matrix theory methods, we determine when such a signal can be detected and reconstructed from sample correlations, despite the background of correlations induced by sampling noise. We consider three different covariance matrices constructed from two high-dimensional variables: their individual self covariance, their cross covariance, and the self covariance of the concatenated (joint) variable, which incorporates the self and the cross correlation blocks. We observe the expected Baik, Ben Arous, and Péché detectability phase transition in all these covariance matrices, and we show that joint and cross covariance matrices always reconstruct the shared signal earlier than the self covariances. Whether the joint or the cross approach is better depends on the mismatch of dimensionalities between the variables. We discuss what these observations mean for choosing the right method for detecting linear correlations in data and how these findings may generalize to nonlinear statistical dependencies.

Comment: The paper discusses the use of covariance matrices to detect shared signals in high-dimensional data, which aligns with representation learning and foundational research in understanding data encoding.

Relevance: 9 Novelty: 8
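
For readers unfamiliar with the Baik, Ben Arous, and Péché (BBP) transition referenced in the abstract above, the sketch below evaluates the classical asymptotic formulas for a single spiked covariance matrix I + theta * v v^T with aspect ratio gamma = p/n. This is the textbook self-covariance case only; the thresholds the paper derives for cross and joint covariances are not given in the abstract and are not reproduced here. The function names are hypothetical.

```python
import math

def bbp_top_eigenvalue(theta, gamma):
    """Limiting top sample eigenvalue for spike strength theta, ratio gamma = p/n."""
    if theta <= math.sqrt(gamma):
        # Below threshold the top eigenvalue sticks to the bulk edge.
        return (1.0 + math.sqrt(gamma)) ** 2
    return (1.0 + theta) * (1.0 + gamma / theta)

def bbp_overlap(theta, gamma):
    """Limiting squared overlap between the top sample and true eigenvectors."""
    if theta <= math.sqrt(gamma):
        return 0.0  # signal undetectable from the top eigenvector
    return (1.0 - gamma / theta ** 2) / (1.0 + gamma / theta)

gamma = 0.25  # e.g. p = 250 features, n = 1000 samples
for theta in (0.25, 0.5, 1.0, 2.0):
    print(theta, bbp_top_eigenvalue(theta, gamma), bbp_overlap(theta, gamma))
```

The detectability threshold in this classical setting is theta = sqrt(gamma): below it the overlap is zero, and above it the top eigenvalue separates from the bulk and the eigenvector carries information about the signal.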

4. LLM-Crowdsourced: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models

ArXiv ID: 2507.22359

Authors: Qianhong Guo, Wei Xie, Xiaofang Cai, Enze Wang, Shuoyoucheng Ma, Kai Chen, Xiaofeng Wang, Baosheng Wang

Abstract: Although large language models (LLMs) demonstrate remarkable capabilities across various tasks, evaluating their capabilities remains a challenging task. Existing evaluation methods suffer from issues such as data contamination, black-box operation, and subjective preference. These issues make it difficult to evaluate the LLMs' true capabilities comprehensively. To tackle these challenges, we propose a novel benchmark-free evaluation paradigm, LLM-Crowdsourced. It utilizes LLMs to generate questions, answer independently, and evaluate mutually. This method integrates four key evaluation criteria: dynamic, transparent, objective, and professional, which existing evaluation methods cannot satisfy simultaneously. Experiments on eight mainstream LLMs across mathematics and programming verify the advantages of our method in distinguishing LLM performance. Furthermore, our study reveals several novel findings that are difficult for traditional methods to detect, including but not limited to: (1) Gemini demonstrates the highest original and professional question-design capabilities among others; (2) Some LLMs exhibit "memorization-based answering" by misrecognizing questions as familiar ones with a similar structure; (3) LLM evaluation results demonstrate high consistency (robustness).

Comment: The paper proposes a novel benchmark-free evaluation paradigm for LLMs, which aligns with foundational research in evaluating LLM capabilities and introduces a new evaluation method.

Relevance: 9 Novelty: 8

5. Amorphous Solid Model of Vectorial Hopfield Neural Networks

ArXiv ID: 2507.22787

Authors: F. Gallavotti, A. Zaccone

Abstract: We present a vectorial extension of the Hopfield associative memory model inspired by the theory of amorphous solids, where binary neural states are replaced by unit vectors $\mathbf{s}_i \in \mathbb{R}^3$ on the sphere $S^2$. The generalized Hebbian learning rule creates a block-structured weight matrix through outer products of stored pattern vectors, analogous to the Hessian matrix structure in amorphous solids. We demonstrate that this model exhibits quantifiable structural properties characteristic of disordered materials: energy landscapes with deep minima for stored patterns versus random configurations (energy gaps $\sim 7$ units), strongly anisotropic correlations encoded in the weight matrix (anisotropy ratios $\sim 10^2$), and order-disorder transitions controlled by the pattern density $\gamma = P/(N \cdot d)$. The enhanced memory capacity ($\gamma_c \approx 0.55$ for a fully-connected network) compared to binary networks ($\gamma_c \approx 0.138$) and the emergence of orientational correlations establish connections between associative memory mechanisms and amorphous solid physics, particularly in systems with continuous orientational degrees of freedom. We also unveil the scaling with the coordination number $Z$ of the memory capacity: $\gamma_c \sim (Z-6)$ from the isostatic point $Z_c =6$ of the 3D elastic network, which closely mirrors the scaling of the shear modulus $G \sim (Z-6)$ in 3D central-force spring networks.

Comment: The paper introduces a novel extension of the Hopfield model with connections to amorphous solid physics, which is relevant to representation learning and emerging trends.

Relevance: 9 Novelty: 8
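
The generalized Hebbian rule and pattern density from the abstract above can be sketched in a few lines. The abstract states only that the weight matrix is block-structured and built from outer products of stored unit vectors; the 1/N normalization, the zeroed diagonal blocks, and the quadratic energy used below are assumptions carried over from the standard Hopfield model, and all function names are hypothetical.

```python
import math
import random

def random_unit_vector(rng):
    """Uniform point on the sphere S^2 from a normalized 3D Gaussian sample."""
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def hebbian_blocks(patterns):
    """W[i][j] = (1/N) * sum over patterns of outer(s_i, s_j); W[i][i] = 0 (assumed)."""
    n = len(patterns[0])
    W = [[[[0.0] * 3 for _ in range(3)] for _ in range(n)] for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                for a in range(3):
                    for b in range(3):
                        W[i][j][a][b] += p[i][a] * p[j][b] / n
    return W

def energy(W, state):
    """Assumed Hopfield-style energy E = -1/2 * sum_{i != j} s_i^T W_ij s_j."""
    n = len(state)
    e = 0.0
    for i in range(n):
        for j in range(n):
            for a in range(3):
                for b in range(3):
                    e -= 0.5 * state[i][a] * W[i][j][a][b] * state[j][b]
    return e

def pattern_density(P, N, d=3):
    """gamma = P / (N * d), as defined in the abstract."""
    return P / (N * d)

rng = random.Random(0)
N = 20
stored = [[random_unit_vector(rng) for _ in range(N)] for _ in range(2)]
W = hebbian_blocks(stored)
probe = [random_unit_vector(rng) for _ in range(N)]
print("stored pattern energy:", energy(W, stored[0]))
print("random state energy:  ", energy(W, probe))
print("pattern density:      ", pattern_density(len(stored), N))
```

Comparing the two printed energies gives a small-scale view of the gap between stored patterns and random configurations that the abstract reports.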

6. DO-EM: Density Operator Expectation Maximization

ArXiv ID: 2507.22786

Authors: Adit Vishnu, Abhay Shastry, Dhruva Kashyap, Chiranjib Bhattacharyya

Abstract: Density operators, quantum generalizations of probability distributions, are gaining prominence in machine learning due to their foundational role in quantum computing. Generative modeling based on density operator models (DOMs) is an emerging field, but existing training algorithms -- such as those for the Quantum Boltzmann Machine -- do not scale to real-world data, such as the MNIST dataset. The Expectation-Maximization algorithm has played a fundamental role in enabling scalable training of probabilistic latent variable models on real-world datasets. In this paper, we develop an Expectation-Maximization framework to learn latent variable models defined through DOMs on classical hardware, with resources comparable to those used for probabilistic models, while scaling to real-world data. However, designing such an algorithm is nontrivial due to the absence of a well-defined quantum analogue to conditional probability, which complicates the Expectation step. To overcome this, we reformulate the Expectation step as a quantum information projection (QIP) problem and show that the Petz Recovery Map provides a solution under sufficient conditions. Using this formulation, we introduce the Density Operator Expectation Maximization (DO-EM) algorithm -- an iterative Minorant-Maximization procedure that optimizes a quantum evidence lower bound. We show that the DO-EM algorithm ensures non-decreasing log-likelihood across iterations for a broad class of models. Finally, we present Quantum Interleaved Deep Boltzmann Machines (QiDBMs), a DOM that can be trained with the same resources as a DBM. When trained with DO-EM under Contrastive Divergence, a QiDBM outperforms larger classical DBMs in image generation on the MNIST dataset, achieving a 40-60% reduction in the Fréchet Inception Distance.

Comment: The paper presents a novel Expectation-Maximization framework for density operator models, which is foundational research in quantum generative modeling and aligns with emerging trends.

Relevance: 9 Novelty: 8

7. MSQ: Memory-Efficient Bit Sparsification Quantization

ArXiv ID: 2507.22349

Authors: Seokho Han, Seoyeon Yoon, Jinhee Kim, Dongwei Wang, Kang Eun Jeon, Huanrui Yang, Jong Hwan Ko

Abstract: As deep neural networks (DNNs) see increased deployment on mobile and edge devices, optimizing model efficiency has become crucial. Mixed-precision quantization is widely favored, as it offers a superior balance between efficiency and accuracy compared to uniform quantization. However, finding the optimal precision for each layer is challenging. Recent studies utilizing bit-level sparsity have shown promise, yet they often introduce substantial training complexity and high GPU memory requirements. In this paper, we propose Memory-Efficient Bit Sparsification Quantization (MSQ), a novel approach that addresses these limitations. MSQ applies a round-clamp quantizer to enable differentiable computation of the least significant bits (LSBs) from model weights. It further employs regularization to induce sparsity in these LSBs, enabling effective precision reduction without explicit bit-level parameter splitting. Additionally, MSQ incorporates Hessian information, allowing the simultaneous pruning of multiple LSBs to further enhance training efficiency. Experimental results show that MSQ achieves up to 8.00x reduction in trainable parameters and up to 86% reduction in training time compared to previous bit-level quantization, while maintaining competitive accuracy and compression rates. This makes it a practical solution for training efficient DNNs on resource-constrained devices.

Comment: The paper proposes a novel quantization method, MSQ, which addresses memory efficiency and training complexity, aligning with the model compression criterion.

Relevance: 9 Novelty: 8
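
The link between sparse least significant bits (LSBs) and precision reduction in the abstract above can be shown with a toy example. This is not MSQ's quantizer: the abstract does not specify its round-clamp form, its regularizer, or how Hessian information is used. The sketch assumes a plain unsigned uniform quantizer on [0, 1] and shows only why a layer whose codes have zero LSBs can be stored at lower precision without loss. All names are hypothetical.

```python
import math

def round_clamp(w, bits):
    """Map w (nominally in [0, 1]) to an integer code in [0, 2**bits - 1]."""
    levels = 2 ** bits - 1
    code = math.floor(w * levels + 0.5)  # round half up
    return max(0, min(levels, code))     # clamp to the representable range

def lsb_value(code, k):
    """Value held in the k least significant bits of an integer code."""
    return code % (2 ** k)

def lsb_l1_penalty(codes, k):
    """Toy sparsity penalty: total magnitude stored in the k LSBs."""
    return sum(lsb_value(c, k) for c in codes)

def drop_lsbs(codes, k):
    """Shift out k LSBs; lossless only when every code has zeros there."""
    if lsb_l1_penalty(codes, k) != 0:
        raise ValueError("nonzero LSBs: dropping them would change the weights")
    return [c >> k for c in codes]

codes_4bit = [12, 8, 4, 0]             # binary 1100, 1000, 0100, 0000
print(lsb_l1_penalty(codes_4bit, 2))   # penalty is zero: the 2 LSBs are unused
codes_2bit = drop_lsbs(codes_4bit, 2)  # the same weights at 2-bit precision
print(codes_2bit, [c << 2 for c in codes_2bit])
```

A regularizer that drives such a penalty toward zero during training would, in this simplified picture, make the final bit-width reduction a pure re-encoding step.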

8. MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention

ArXiv ID: 2507.22805

Authors: Yuqi Pang, Bowen Yang, Yun Cao, Fan Rong, Xiaoyu Li, Chen He

Abstract: Vision large language models (VLLMs) are focusing primarily on handling complex and fine-grained visual information by incorporating advanced vision encoders and scaling up visual models. However, these approaches face high training and inference costs, as well as challenges in extracting visual details and effectively bridging modalities. In this work, we propose a novel visual framework, MoCHA, to address these issues. Our framework integrates four vision backbones (i.e., CLIP, SigLIP, DINOv2 and ConvNeXt) to extract complementary visual features and is equipped with a sparse Mixture of Experts Connectors (MoECs) module to dynamically select experts tailored to different visual dimensions. To mitigate redundant or insufficient use of the visual information encoded by the MoECs module, we further design a Hierarchical Group Attention (HGA) with intra- and inter-group operations and an adaptive gating strategy for encoded visual features. We train MoCHA on two mainstream LLMs (Phi2-2.7B and Vicuna-7B) and evaluate their performance across various benchmarks. Notably, MoCHA outperforms state-of-the-art open-weight models on various tasks. For example, compared to CuMo (Mistral-7B), our MoCHA (Phi2-2.7B) presents outstanding abilities to mitigate hallucination by showing improvements of 3.25% in POPE and to follow visual instructions by raising 153 points on MME. Finally, ablation studies further confirm the effectiveness and robustness of the proposed MoECs and HGA in improving the overall performance of MoCHA.

Comment: The paper presents MoCHA, which uses a sparse Mixture of Experts Connectors module, aligning with the model architecture criterion.

Relevance: 9 Novelty: 8
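
The sparse expert selection described in the abstract above is commonly implemented with top-k gating. The sketch below shows that generic technique, not MoCHA's actual MoECs module: the router, the expert architecture, and the Hierarchical Group Attention are not specified in the abstract. The names `top_k_gate` and `moe_connector`, and the requirement that each expert returns a vector of the same length as its input, are assumptions made for illustration.

```python
import math

def top_k_gate(logits, k):
    """Keep the k largest router logits, softmax over them, zero the rest."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return [exps[i] / z if i in exps else 0.0 for i in range(len(logits))]

def moe_connector(features, experts, router_logits, k=2):
    """Weighted sum of the outputs of the k selected experts only."""
    gates = top_k_gate(router_logits, k)
    out = [0.0] * len(features)
    for gate, expert in zip(gates, experts):
        if gate == 0.0:
            continue  # unselected experts are never evaluated
        y = expert(features)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out

# Three toy experts acting on a 2-dimensional feature vector.
experts = [
    lambda x: list(x),              # identity
    lambda x: [2.0 * v for v in x], # scale by 2
    lambda x: [0.0 for _ in x],     # zero map
]
print(top_k_gate([0.0, 0.0, -10.0], k=2))
print(moe_connector([1.0, 2.0], experts, [0.0, 0.0, -10.0], k=2))
```

Skipping the unselected experts is what makes the module sparse: compute scales with k rather than with the total number of experts.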

9. Representation biases: will we achieve complete understanding by analyzing representations?

ArXiv ID: 2507.22216

Authors: Andrew Kyle Lampinen, Stephanie C. Y. Chan, Yuxuan Li, Katherine Hermann

Abstract: A common approach in neuroscience is to study neural representations as a means to understand a system -- increasingly, by relating the neural representations to the internal representations learned by computational models. However, recent work in machine learning (Lampinen, 2024) shows that learned feature representations may be biased to over-represent certain features, and represent others more weakly and less consistently. For example, simple (linear) features may be more strongly and more consistently represented than complex (highly nonlinear) features. These biases could pose challenges for achieving full understanding of a system through representational analysis. In this perspective, we illustrate these challenges -- showing how feature representation biases can lead to strongly biased inferences from common analyses like PCA, regression, and RSA. We also present homomorphic encryption as a simple case study of the potential for strong dissociation between patterns of representation and computation. We discuss the implications of these results for representational comparisons between systems, and for neuroscience more generally.

Comment: The paper discusses representation biases in neural networks, which aligns with the topic of representation learning by providing insights into how deep networks encode information.

Relevance: 9 Novelty: 7

10. Synchronization of mean-field models on the circle

ArXiv ID: 2507.22857

Authors: Yury Polyanskiy, Philippe Rigollet, Andrew Yao

Abstract: This paper considers a mean-field model of $n$ interacting particles whose state space is the unit circle, a generalization of the classical Kuramoto model. Global synchronization is said to occur if, after starting from almost any initial state, all particles coalesce to a common point on the circle. We propose a general synchronization criterion in terms of the $L_1$-norm of the third derivative of the particle interaction function. As an application, we resolve a conjecture for the so-called self-attention dynamics (a stylized model of transformers), by showing synchronization for all $\beta \ge -0.16$, which significantly extends the previous bound of $0\le \beta \le 1$ from Criscitiello, Rebjock, McRae, and Boumal (2024). We also show that global synchronization does not occur when $\beta < -2/3$.

Comment: The paper provides theoretical insights into synchronization in mean-field models, with an application to self-attention dynamics in transformers, which aligns with the interest in model architecture analysis.
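
The synchronization behavior discussed in the abstract above can be explored numerically with a small simulation. The interaction used below, a drift of (1/n) * sum_j exp(beta * cos(theta_j - theta_i)) * sin(theta_j - theta_i), is an assumed form of the self-attention dynamics on the circle; the abstract does not write the model out, so treat it as illustrative. With beta = 0 it reduces to the Kuramoto model with identical frequencies. The explicit Euler integrator, step size, and particle count are arbitrary choices.

```python
import cmath
import math
import random

def step(thetas, beta, dt):
    """One explicit Euler step of the assumed mean-field dynamics."""
    n = len(thetas)
    new = []
    for ti in thetas:
        drift = sum(
            math.exp(beta * math.cos(tj - ti)) * math.sin(tj - ti)
            for tj in thetas
        ) / n
        new.append(ti + dt * drift)
    return new

def order_parameter(thetas):
    """|mean of exp(i * theta)|: 1 means all particles share one angle."""
    return abs(sum(cmath.exp(1j * t) for t in thetas)) / len(thetas)

def simulate(n=8, beta=1.0, dt=0.05, steps=4000, seed=0):
    """Run from a random initial state and return the final order parameter."""
    rng = random.Random(seed)
    thetas = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    for _ in range(steps):
        thetas = step(thetas, beta, dt)
    return order_parameter(thetas)

for beta in (0.0, 1.0):
    print(beta, simulate(beta=beta))
```

An order parameter close to 1 at the end of a run indicates that the particles have coalesced. A finite simulation like this can only suggest behavior for a given beta; it does not substitute for the almost-every-initial-state guarantee the paper proves.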