Personalized Daily ArXiv Papers 2025-07-24

[gpt-4o]	Prompt	Completion	Total
Token	25620	3048	28668
Cost	$0.06	$0.03	$0.09

Total arXiv papers: 434

Total scanned papers: 269

Total relevant papers: 15

Table of contents with paper titles:

1. Principled Multimodal Representation Learning Authors: Xiaohao Liu, Xiaobo Xia, See-Kiong Ng, Tat-Seng Chua
2. Navigation through Non-Compact Symmetric Spaces: a mathematical perspective on Cartan Neural Networks Authors: Pietro Giuseppe Fré, Federico Milanesio, Guido Sanguinetti, Matteo Santoro
3. SiLQ: Simple Large Language Model Quantization-Aware Training Authors: Steven K. Esser, Jeffrey L. McKinstry, Deepika Bablani, Rathinakumar Appuswamy, Dharmendra S. Modha
4. Dataset Distillation as Data Compression: A Rate-Utility Perspective Authors: Youneng Bao, Yiping Liu, Zhuo Chen, Yongsheng Liang, Mu Li, Kede Ma
5. CompLeak: Deep Learning Model Compression Exacerbates Privacy Leakage Authors: Na Li, Yansong Gao, Hongsheng Hu, Boyu Kuang, Anmin Fu
6. Should Bias Always be Eliminated? A Principled Framework to Use Data Bias for OOD Generation Authors: Yan Li, Guangyi Chen, Yunlong Deng, Zijian Li, Zeyu Tang, Anpeng Wu, Kun Zhang
7. On the Interaction of Compressibility and Adversarial Robustness Authors: Melih Barsbey, Antônio H. Ribeiro, Umut Şimşekli, Tolga Birdal
8. Demonstration of Efficient Predictive Surrogates for Large-scale Quantum Processors Authors: Wei-You Liao, Yuxuan Du, Xinbiao Wang, Tian-Ci Tian, Yong Luo, Bo Du, Dacheng Tao, He-Liang Huang
9. Mammo-Mamba: A Hybrid State-Space and Transformer Architecture with Sequential Mixture of Experts for Multi-View Mammography Authors: Farnoush Bayatmakou, Reza Taleei, Nicole Simone, Arash Mohammadi
10. From Black Box to Biomarker: Sparse Autoencoders for Interpreting Speech Models of Parkinson's Disease Authors: Peter Plantinga, Jen-Kai Chen, Roozbeh Sattari, Mirco Ravanelli, Denise Klein
11. Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility Authors: Melih Barsbey, Lucas Prieto, Stefanos Zafeiriou, Tolga Birdal
12. DNT: a Deeply Normalized Transformer that can be trained by Momentum SGD Authors: Xianbiao Qi, Marco Chen, Wenjie Xiao, Jiaquan Ye, Yelin He, Chun-Guang Li, Zhouchen Lin
13. Improving LLMs' Generalized Reasoning Abilities by Graph Problems Authors: Qifan Zhang, Nuo Chen, Zehua Li, Miao Peng, Jing Tang, Jia Li
14. HydraOpt: Navigating the Efficiency-Performance Trade-off of Adapter Merging Authors: Taha Ceritli, Ondrej Bohdal, Mete Ozay, Jijoong Moon, Kyeng-Hun Lee, Hyeonmok Ko, Umberto Michieli
15. C3RL: Rethinking the Combination of Channel-independence and Channel-mixing from Representation Learning Authors: Shusen Ma, Yun-Bo Zhao, Yu Kang

1. Principled Multimodal Representation Learning

ArXiv ID: 2507.17343

Authors: Xiaohao Liu, Xiaobo Xia, See-Kiong Ng, Tat-Seng Chua

Abstract: Multimodal representation learning seeks to create a unified representation space by integrating diverse data modalities to improve multimodal understanding. Traditional methods often depend on pairwise contrastive learning, which relies on a predefined anchor modality, restricting alignment across all modalities. Recent advances have investigated the simultaneous alignment of multiple modalities, yet several challenges remain, such as limitations imposed by fixed anchor points and instability arising from optimizing the product of singular values. To address the challenges, in this paper, we propose Principled Multimodal Representation Learning (PMRL), a novel framework that achieves simultaneous alignment of multiple modalities without anchor dependency in a more stable manner. Specifically, grounded in the theoretical insight that full alignment corresponds to a rank-1 Gram matrix, PMRL optimizes the dominant singular value of the representation matrix to align modalities along a shared leading direction. We propose a softmax-based loss function that treats singular values as logits to prioritize the largest singular value. Besides, instance-wise contrastive regularization on the leading eigenvectors maintains inter-instance separability and prevents representation collapse. Extensive experiments across diverse tasks demonstrate PMRL's superiority compared to baseline methods. The source code will be publicly available.

Comment: The paper proposes a novel framework for multimodal representation learning, addressing challenges in simultaneous alignment of multiple modalities, which aligns with representation learning.

Relevance: 9 Novelty: 8
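
As a rough illustration of the loss described in the abstract above, the sketch below treats singular values as softmax logits and penalizes probability mass that falls outside the largest one. It is not the authors' implementation: the two-modality setup, the closed-form singular values, the temperature of 0.1, and both function names are assumptions made for illustration.

```python
import math


def singular_values_two_modalities(cosine):
    """Singular values of a 2-row matrix whose rows are unit vectors.

    Assumption: only two modalities, each with a unit-norm representation.
    Their Gram matrix [[1, c], [c, 1]] has eigenvalues 1 + c and 1 - c,
    and the singular values are the square roots of those eigenvalues.
    """
    values = [
        math.sqrt(max(0.0, 1.0 + cosine)),
        math.sqrt(max(0.0, 1.0 - cosine)),
    ]
    return sorted(values, reverse=True)


def top_singular_value_loss(singular_values, temperature=0.1):
    """Negative log-softmax of the largest singular value (hypothetical name)."""
    logits = [s / temperature for s in singular_values]
    top = max(logits)
    # Log-sum-exp with the maximum subtracted for numerical stability.
    log_partition = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_partition - top


# Loss for orthogonal, partially aligned, and fully aligned representations.
losses = {
    c: top_singular_value_loss(singular_values_two_modalities(c))
    for c in (0.0, 0.5, 1.0)
}
```

With orthogonal representations (cosine 0) the two singular values tie and the loss equals log 2. As the cosine approaches 1, the Gram matrix approaches rank 1 and the loss approaches 0.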

2. Navigation through Non-Compact Symmetric Spaces: a mathematical perspective on Cartan Neural Networks

ArXiv ID: 2507.16871

Authors: Pietro Giuseppe Fré, Federico Milanesio, Guido Sanguinetti, Matteo Santoro

Abstract: Recent work has identified non-compact symmetric spaces U/H as a promising class of homogeneous manifolds to develop a geometrically consistent theory of neural networks. An initial implementation of these concepts has been presented in a twin paper under the moniker of Cartan Neural Networks, showing both the feasibility and the performance of these geometric concepts in a machine learning context. The current paper expands on the mathematical structures underpinning Cartan Neural Networks, detailing the geometric properties of the layers and how the maps between layers interact with such structures to make Cartan Neural Networks covariant and geometrically interpretable. Together, these twin papers constitute a first step towards a fully geometrically interpretable theory of neural networks exploiting group-theoretic structures.

Comment: The paper expands on the mathematical structures of Cartan Neural Networks, providing insights into their geometric properties, which aligns with emerging trends in theoretical work.

Relevance: 9 Novelty: 8

3. SiLQ: Simple Large Language Model Quantization-Aware Training

ArXiv ID: 2507.16933

Authors: Steven K. Esser, Jeffrey L. McKinstry, Deepika Bablani, Rathinakumar Appuswamy, Dharmendra S. Modha

Abstract: Large language models can be quantized to reduce inference time latency, model size, and energy consumption, thereby delivering a better user experience at lower cost. A challenge exists to deliver quantized models with minimal loss of accuracy in reasonable time, and in particular to do so without requiring mechanisms incompatible with specialized inference accelerators. Here, we demonstrate a simple, end-to-end quantization-aware training approach that, with an increase in total model training budget of less than 0.1%, outperforms the leading published quantization methods by large margins on several modern benchmarks, with both base and instruct model variants. The approach easily generalizes across different model architectures, can be applied to activations, cache, and weights, and requires the introduction of no additional operations to the model other than the quantization itself.

Comment: The paper presents a quantization-aware training approach for large language models, which is relevant to model compression and efficiency.

Relevance: 9 Novelty: 8
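
Quantization-aware training typically simulates low-precision arithmetic during training by rounding values onto an integer grid and mapping them back to floats. The sketch below shows that quantize-dequantize step for a symmetric, per-tensor scale in plain Python. It is a generic illustration, not SiLQ's procedure; the function name, the 4-bit default, and the max-abs scale choice are assumptions.

```python
def fake_quantize(values, num_bits=4, scale=None):
    """Round values onto a signed integer grid, then map them back to floats.

    Assumptions: symmetric quantization with one scale for the whole list,
    and a scale derived from the largest absolute value when none is given.
    """
    qmax = 2 ** (num_bits - 1) - 1   # e.g. 7 for 4 bits
    qmin = -qmax - 1                 # e.g. -8 for 4 bits
    if scale is None:
        max_abs = max(abs(v) for v in values)
        scale = max_abs / qmax if max_abs > 0 else 1.0
    dequantized = []
    for v in values:
        q = round(v / scale)          # nearest integer level
        q = max(qmin, min(qmax, q))   # clip to the representable range
        dequantized.append(q * scale)
    return dequantized, scale


weights = [0.8, -0.3, 0.05, 0.0]
dequantized, scale = fake_quantize(weights, num_bits=4)
```

In an actual training loop the rounding step has zero gradient almost everywhere, so frameworks usually pair it with a gradient approximation such as the straight-through estimator; that part is omitted here.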

4. Dataset Distillation as Data Compression: A Rate-Utility Perspective

ArXiv ID: 2507.17221

Authors: Youneng Bao, Yiping Liu, Zhuo Chen, Yongsheng Liang, Mu Li, Kede Ma

Abstract: Driven by the "scale-is-everything" paradigm, modern machine learning increasingly demands ever-larger datasets and models, yielding prohibitive computational and storage requirements. Dataset distillation mitigates this by compressing an original dataset into a small set of synthetic samples, while preserving its full utility. Yet, existing methods either maximize performance under fixed storage budgets or pursue suitable synthetic data representations for redundancy removal, without jointly optimizing both objectives. In this work, we propose a joint rate-utility optimization method for dataset distillation. We parameterize synthetic samples as optimizable latent codes decoded by extremely lightweight networks. We estimate the Shannon entropy of quantized latents as the rate measure and plug any existing distillation loss as the utility measure, trading them off via a Lagrange multiplier. To enable fair, cross-method comparisons, we introduce bits per class (bpc), a precise storage metric that accounts for sample, label, and decoder parameter costs. On CIFAR-10, CIFAR-100, and ImageNet-128, our method achieves up to $170\times$ greater compression than standard distillation at comparable accuracy. Across diverse bpc budgets, distillation losses, and backbone architectures, our approach consistently establishes better rate-utility trade-offs.

Comment: The paper proposes a joint rate-utility optimization method for dataset distillation, relevant to model compression and efficiency.

Relevance: 9 Novelty: 8
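
The rate-utility trade-off in the abstract above can be sketched as a single scalar objective: a distillation loss plus a Lagrange multiplier times an entropy-based bit count. The version below is a minimal sketch, not the paper's method. It assumes rounding to the nearest integer as the quantizer and an empirical histogram as the entropy estimate, and all function and parameter names are hypothetical.

```python
import math
from collections import Counter


def empirical_entropy_bits(symbols):
    """Shannon entropy (bits per symbol) of the empirical symbol distribution."""
    counts = Counter(symbols)
    total = len(symbols)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def rate_utility_objective(latents, distillation_loss, lagrange_multiplier):
    """Return (objective, rate_bits) for one set of latent codes.

    Assumptions: latents are quantized by rounding, and the rate is the
    empirical entropy times the number of latents.
    """
    quantized = [round(z) for z in latents]
    rate_bits = empirical_entropy_bits(quantized) * len(quantized)
    objective = distillation_loss + lagrange_multiplier * rate_bits
    return objective, rate_bits


# Four latents that quantize to four distinct symbols: 2 bits each, 8 bits total.
objective, rate_bits = rate_utility_objective(
    [0.1, 0.9, 2.2, 2.8], distillation_loss=0.5, lagrange_multiplier=0.01
)
```

A larger multiplier pushes the optimizer toward latents that quantize to fewer distinct symbols (a lower rate), at the cost of a higher distillation loss.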

5. CompLeak: Deep Learning Model Compression Exacerbates Privacy Leakage

ArXiv ID: 2507.16872

Authors: Na Li, Yansong Gao, Hongsheng Hu, Boyu Kuang, Anmin Fu

Abstract: Model compression is crucial for minimizing memory storage and accelerating inference in deep learning (DL) models, including recent foundation models like large language models (LLMs). Users can access different compressed model versions according to their resources and budget. However, while existing compression operations primarily focus on optimizing the trade-off between resource efficiency and model performance, the privacy risks introduced by compression remain overlooked and insufficiently understood. In this work, through the lens of membership inference attack (MIA), we propose CompLeak, the first privacy risk evaluation framework examining three widely used compression configurations (pruning, quantization, and weight clustering) supported by the commercial model compression frameworks of Google's TensorFlow-Lite (TF-Lite) and Facebook's PyTorch Mobile. CompLeak has three variants, depending on how many compressed models are available and whether the original model is accessible. CompLeakNR starts by adopting existing MIA methods to attack a single compressed model, and identifies that different compressed models influence members and non-members differently. When the original model and one compressed model are available, CompLeakSR leverages the compressed model as a reference to the original model and uncovers more privacy by combining meta information (e.g., confidence vector) from both models. When multiple compressed models are available with/without accessing the original model, CompLeakMR innovatively exploits privacy leakage information from multiple compressed versions to substantially signify the overall privacy leakage. We conduct extensive experiments on seven diverse model architectures (from ResNet to foundation models of BERT and GPT-2), and six image and textual benchmark datasets.

Comment: The paper evaluates the privacy risks introduced by model compression techniques such as pruning, quantization, and weight clustering, which is relevant to model compression and efficiency breakthroughs.

Relevance: 9 Novelty: 7
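
To make the attack setting in the abstract above concrete, the sketch below shows a basic confidence-threshold membership inference baseline, plus one way to combine confidence vectors from an original and a compressed model into a single feature vector. Neither function is CompLeak's actual procedure. The threshold of 0.9, the feature layout, and both function names are assumptions for illustration.

```python
def confidence_threshold_attack(true_label_confidences, threshold=0.9):
    """Guess 'member' when the model is very confident on the true label.

    This is a generic baseline: models tend to be more confident on samples
    they were trained on. The threshold is an assumed value.
    """
    return [c >= threshold for c in true_label_confidences]


def paired_features(original_conf, compressed_conf):
    """Concatenate two confidence vectors and their elementwise gap.

    Hypothetical feature layout for an attack classifier that sees both the
    original and one compressed model.
    """
    gap = [o - c for o, c in zip(original_conf, compressed_conf)]
    return list(original_conf) + list(compressed_conf) + gap


guesses = confidence_threshold_attack([0.99, 0.42])
features = paired_features([0.9, 0.1], [0.7, 0.3])
```

The gap term captures how much compression shifts a sample's confidence, the kind of member versus non-member difference the abstract reports.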

6. Should Bias Always be Eliminated? A Principled Framework to Use Data Bias for OOD Generation

ArXiv ID: 2507.17001

Authors: Yan Li, Guangyi Chen, Yunlong Deng, Zijian Li, Zeyu Tang, Anpeng Wu, Kun Zhang

Abstract: Most existing methods for adapting models to out-of-distribution (OOD) domains rely on invariant representation learning to eliminate the influence of biased features. However, should bias always be eliminated -- and if not, when should it be retained, and how can it be leveraged? To address these questions, we first present a theoretical analysis that explores the conditions under which biased features can be identified and effectively utilized. Building on this theoretical foundation, we introduce a novel framework that strategically leverages bias to complement invariant representations during inference. The framework comprises two key components that leverage bias in both direct and indirect ways: (1) using invariance as guidance to extract predictive ingredients from bias, and (2) exploiting identified bias to estimate the environmental condition and then use it to explore appropriate bias-aware predictors to alleviate environment gaps. We validate our approach through experiments on both synthetic datasets and standard domain generalization benchmarks. Results consistently demonstrate that our method outperforms existing approaches, underscoring its robustness and adaptability.

Comment: The paper presents a framework that leverages data bias for out-of-distribution generation, providing theoretical insights, which aligns with emerging trends in challenging established assumptions.

Relevance: 8 Novelty: 8

7. On the Interaction of Compressibility and Adversarial Robustness

ArXiv ID: 2507.17725

Authors: Melih Barsbey, Antônio H. Ribeiro, Umut Şimşekli, Tolga Birdal

Abstract: Modern neural networks are expected to simultaneously satisfy a host of desirable properties: accurate fitting to training data, generalization to unseen inputs, parameter and computational efficiency, and robustness to adversarial perturbations. While compressibility and robustness have each been studied extensively, a unified understanding of their interaction still remains elusive. In this work, we develop a principled framework to analyze how different forms of compressibility - such as neuron-level sparsity and spectral compressibility - affect adversarial robustness. We show that these forms of compression can induce a small number of highly sensitive directions in the representation space, which adversaries can exploit to construct effective perturbations. Our analysis yields a simple yet instructive robustness bound, revealing how neuron and spectral compressibility impact $L_\infty$ and $L_2$ robustness via their effects on the learned representations. Crucially, the vulnerabilities we identify arise irrespective of how compression is achieved - whether via regularization, architectural bias, or implicit learning dynamics. Through empirical evaluations across synthetic and realistic tasks, we confirm our theoretical predictions, and further demonstrate that these vulnerabilities persist under adversarial training and transfer learning, and contribute to the emergence of universal adversarial perturbations. Our findings show a fundamental tension between structured compressibility and robustness, and suggest new pathways for designing models that are both efficient and secure.

Comment: The paper explores the interaction between compressibility and adversarial robustness, providing insights into representation learning and model compression.

Relevance: 8 Novelty: 8

8. Demonstration of Efficient Predictive Surrogates for Large-scale Quantum Processors

ArXiv ID: 2507.17470

Authors: Wei-You Liao, Yuxuan Du, Xinbiao Wang, Tian-Ci Tian, Yong Luo, Bo Du, Dacheng Tao, He-Liang Huang

Abstract: The ongoing development of quantum processors is driving breakthroughs in scientific discovery. Despite this progress, the formidable cost of fabricating large-scale quantum processors means they will remain rare for the foreseeable future, limiting their widespread application. To address this bottleneck, we introduce the concept of predictive surrogates, which are classical learning models designed to emulate the mean-value behavior of a given quantum processor with provable computational efficiency. In particular, we propose two predictive surrogates that can substantially reduce the need for quantum processor access in diverse practical scenarios. To demonstrate their potential in advancing digital quantum simulation, we use these surrogates to emulate a quantum processor with up to 20 programmable superconducting qubits, enabling efficient pre-training of variational quantum eigensolvers for families of transverse-field Ising models and identification of non-equilibrium Floquet symmetry-protected topological phases. Experimental results reveal that the predictive surrogates not only reduce measurement overhead by orders of magnitude, but can also surpass the performance of conventional, quantum-resource-intensive approaches. Collectively, these findings establish predictive surrogates as a practical pathway to broadening the impact of advanced quantum processors.

Comment: The paper introduces predictive surrogates for quantum processors, which is relevant to AI for Science with a focus on foundational research in quantum modeling.

Relevance: 8 Novelty: 8

9. Mammo-Mamba: A Hybrid State-Space and Transformer Architecture with Sequential Mixture of Experts for Multi-View Mammography

ArXiv ID: 2507.17662

Authors: Farnoush Bayatmakou, Reza Taleei, Nicole Simone, Arash Mohammadi

Abstract: Breast cancer (BC) remains one of the leading causes of cancer-related mortality among women, despite recent advances in Computer-Aided Diagnosis (CAD) systems. Accurate and efficient interpretation of multi-view mammograms is essential for early detection, driving a surge of interest in Artificial Intelligence (AI)-powered CAD models. While state-of-the-art multi-view mammogram classification models are largely based on Transformer architectures, their computational complexity scales quadratically with the number of image patches, highlighting the need for more efficient alternatives. To address this challenge, we propose Mammo-Mamba, a novel framework that integrates Selective State-Space Models (SSMs), transformer-based attention, and expert-driven feature refinement into a unified architecture. Mammo-Mamba extends the MambaVision backbone by introducing the Sequential Mixture of Experts (SeqMoE) mechanism through its customized SecMamba block. The SecMamba is a modified MambaVision block that enhances representation learning in high-resolution mammographic images by enabling content-adaptive feature refinement. These blocks are integrated into the deeper stages of MambaVision, allowing the model to progressively adjust feature emphasis through dynamic expert gating, effectively mitigating the limitations of traditional Transformer models. Evaluated on the CBIS-DDSM benchmark dataset, Mammo-Mamba achieves superior classification performance across all key metrics while maintaining computational efficiency.

Comment: The paper introduces a novel architecture combining state-space models, transformers, and a sequential mixture of experts, which aligns with model architecture innovations.