Personalized Daily ArXiv Papers 2025-05-08

[gpt-4o]  Prompt  Completion  Total
Token     28086   3939        32025
Cost      $0.07   $0.04       $0.11
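
The cost row can be reproduced from the token counts with simple per-token arithmetic. The sketch below assumes rates of $2.50 per million prompt tokens and $10.00 per million completion tokens; these rates are an assumption and are not stated on this page, although they are consistent with the rounded figures shown.

```python
# Assumed gpt-4o rates in USD per one million tokens (not stated on this page).
PROMPT_RATE = 2.50
COMPLETION_RATE = 10.00


def token_cost(tokens: int, rate_per_million: float) -> float:
    """Return the USD cost of `tokens` at the given per-million-token rate."""
    return tokens * rate_per_million / 1_000_000


# Token counts taken from the table above.
prompt_tokens = 28086
completion_tokens = 3939

prompt_cost = token_cost(prompt_tokens, PROMPT_RATE)
completion_cost = token_cost(completion_tokens, COMPLETION_RATE)
total_cost = prompt_cost + completion_cost

print(f"tokens: {prompt_tokens + completion_tokens}")
print(f"prompt ${prompt_cost:.2f}, completion ${completion_cost:.2f}, total ${total_cost:.2f}")
```

With the assumed rates, the rounded values agree with the Cost row ($0.07, $0.04, $0.11).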

Total arXiv papers: 417

Total scanned papers: 292

Total relevant papers: 12

Table of contents with paper titles:

  1. Large Language Model Compression with Global Rank and Sparsity Optimization
     Authors: Changhai Zhou, Qian Qiao, Weizhong Zhang, Cheng Jin

  2. Quiet Feature Learning in Algorithmic Tasks
     Authors: Prudhviraj Naidu, Zixian Wang, Leon Bergen, Ramamohan Paturi

  3. Position: Foundation Models Need Digital Twin Representations
     Authors: Yiqing Shen, Hao Ding, Lalithkumar Seenivasan, Tianmin Shu, Mathias Unberath

  4. Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth
     Authors: Changhai Zhou, Yuhua Zhou, Qian Qiao, Weizhong Zhang, Cheng Jin

  5. LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection
     Authors: Xinyue Zeng, Haohui Wang, Junhong Lin, Jun Wu, Tyler Cody, Dawei Zhou

  6. ABKD: Pursuing a Proper Allocation of the Probability Mass in Knowledge Distillation via $\alpha$-$\beta$-Divergence
     Authors: Guanghui Wang, Zhiyong Yang, Zitai Wang, Shi Wang, Qianqian Xu, Qingming Huang

  7. APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design
     Authors: Yonghao Tan, Pingcheng Dong, Yongkun Wu, Yu Liu, Xuejiao Liu, Peng Luo, Shih-Yang Liu, Xijie Huang, Dong Zhang, Luhong Liang, Kwang-Ting Cheng

  8. AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design
     Authors: Yanbiao Liang, Huihong Shi, Haikuo Shao, Zhongfeng Wang

  9. Grouped Sequency-arranged Rotation: Optimizing Rotation Transformation for Quantization for Free
     Authors: Euntae Choi, Sumin Song, Woosang Lim, Sungjoo Yoo

  10. Is the end of Insight in Sight ?
     Authors: Jean-Michel Tucny, Mihir Durve, Sauro Succi

  11. Sparsity is All You Need: Rethinking Biological Pathway-Informed Approaches in Deep Learning
     Authors: Isabella Caranzano, Corrado Pancotti, Cesare Rollo, Flavio Sartori, Pietro Liò, Piero Fariselli, Tiziana Sanavia

  12. Information Filtering Networks: Theoretical Foundations, Generative Methodologies, and Real-World Applications
     Authors: Tomaso Aste


1. Large Language Model Compression with Global Rank and Sparsity Optimization

ArXiv ID: 2505.03801

Authors: Changhai Zhou, Qian Qiao, Weizhong Zhang, Cheng Jin

Abstract: Low-rank and sparse composite approximation is a natural idea to compress Large Language Models (LLMs). However, such an idea faces two primary challenges that adversely affect the performance of existing methods. The first challenge relates to the interaction and cooperation between low-rank and sparse matrices, while the second involves determining weight allocation across different layers, as redundancy varies considerably among them. To address these challenges, we propose a novel two-stage LLM compression method with the capability of global rank and sparsity optimization. It is noteworthy that the overall optimization space is vast, making comprehensive optimization computationally prohibitive. Therefore, to reduce the optimization space, our first stage utilizes robust principal component analysis to decompose the weight matrices of LLMs into low-rank and sparse components, which span the low dimensional and sparse spaces containing the resultant low-rank and sparse matrices, respectively. In the second stage, we propose a probabilistic global optimization technique to jointly identify the low-rank and sparse structures within the above two spaces. The appealing feature of our approach is its ability to automatically detect the redundancy across different layers and to manage the interaction between the sparse and low-rank components. Extensive experimental results indicate that our method significantly surpasses state-of-the-art techniques for sparsification and composite approximation.

Comment: Proposes a two-stage LLM compression method combining low-rank and sparse approximations with global optimization. This directly addresses foundational challenges in model compression and sparsity.

Relevance: 10 Novelty: 8


2. Quiet Feature Learning in Algorithmic Tasks

ArXiv ID: 2505.03997

Authors: Prudhviraj Naidu, Zixian Wang, Leon Bergen, Ramamohan Paturi

Abstract: We train Transformer-based language models on ten foundational algorithmic tasks and observe pronounced phase transitions in their loss curves that deviate from established power-law scaling trends. Over large ranges of compute, the validation loss barely improves, then abruptly decreases. Probing the models' internal representations reveals the learning of quiet features during the stagnant phase, followed by sudden acquisition of loud features that coincide with the sharp drop in loss. Our ablation experiments show that disrupting a single learned feature can dramatically degrade performance, providing evidence of their causal role in task performance. These findings challenge the prevailing assumption that next-token predictive loss reliably tracks incremental progress; instead, key internal features may be developing below the surface until they coalesce, triggering a rapid performance gain.

Comment: The paper provides insights into representation learning by analyzing how features are encoded and emerge in Transformer-based models during training. This aligns closely with the 'Representation Learning' criterion, particularly in understanding training dynamics and feature learning.

Relevance: 10 Novelty: 8


3. Position: Foundation Models Need Digital Twin Representations

ArXiv ID: 2505.03798

Authors: Yiqing Shen, Hao Ding, Lalithkumar Seenivasan, Tianmin Shu, Mathias Unberath

Abstract: Current foundation models (FMs) rely on token representations that directly fragment continuous real-world multimodal data into discrete tokens. They limit FMs to learning real-world knowledge and relationships purely through statistical correlation rather than leveraging explicit domain knowledge. Consequently, current FMs struggle with maintaining semantic coherence across modalities, capturing fine-grained spatial-temporal dynamics, and performing causal reasoning. These limitations cannot be overcome by simply scaling up model size or expanding datasets. This position paper argues that the machine learning community should consider digital twin (DT) representations, which are outcome-driven digital representations that serve as building blocks for creating virtual replicas of physical processes, as an alternative to the token representation for building FMs. Finally, we discuss how DT representations can address these challenges by providing physically grounded representations that explicitly encode domain knowledge and preserve the continuous nature of real-world processes.

Comment: The position paper argues for digital twin representations as an alternative to token-based representations in foundation models. This aligns with emerging trends and challenges established assumptions in representation learning.

Relevance: 9 Novelty: 8


4. Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth

ArXiv ID: 2505.03802

Authors: Changhai Zhou, Yuhua Zhou, Qian Qiao, Weizhong Zhang, Cheng Jin

Abstract: QLoRA effectively combines low-bit quantization and LoRA to achieve memory-friendly fine-tuning for large language models (LLMs). Recently, methods based on SVD for continuous update iterations to initialize LoRA matrices to accommodate quantization errors have generally failed to consistently improve performance. Dynamic mixed precision is a natural idea for continuously improving the fine-tuning performance of quantized models, but previous methods often optimize low-rank subspaces or quantization components separately, without considering their synergy. To address this, we propose QR-Adaptor, a unified, gradient-free strategy that uses partial calibration data to jointly search the quantization components and the rank of low-rank spaces for each layer, thereby continuously improving model performance. QR-Adaptor does not minimize quantization error but treats precision and rank allocation as a discrete optimization problem guided by actual downstream performance and memory usage. Compared to state-of-the-art (SOTA) quantized LoRA fine-tuning methods, our approach achieves a 4.89% accuracy improvement on GSM8K, and in some cases even outperforms the 16-bit fine-tuned model while maintaining the memory footprint of the 4-bit setting.

Comment: The paper introduces QR-Adaptor, a method for fine-tuning quantized models by jointly optimizing quantization and low-rank components. This aligns well with the model compression criterion, particularly in low-rank approaches and quantization.

Relevance: 9 Novelty: 8


5. LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection

ArXiv ID: 2505.03793

Authors: Xinyue Zeng, Haohui Wang, Junhong Lin, Jun Wu, Tyler Cody, Dawei Zhou

Abstract: The proliferation of open-sourced Large Language Models (LLMs) and diverse downstream tasks necessitates efficient model selection, given the impracticality of fine-tuning all candidates due to computational constraints. Despite the recent advances in LLM selection, a fundamental research question largely remains nascent: how can we model the dynamic behaviors of LLMs during fine-tuning, thereby enhancing our understanding of their generalization performance across diverse downstream tasks? In this work, we propose a novel theoretical framework that provides a proper lens to assess the generalization capabilities of LLMs, thereby enabling accurate and efficient LLM selection for downstream applications. In particular, we first derive a Hessian-based PAC-Bayes generalization bound that unveils fine-tuning dynamics of LLMs and then introduce LENSLLM, a Neural Tangent Kernel (NTK)-based Rectified Scaling Model that enables accurate performance predictions across diverse tasks while maintaining computational efficiency. Extensive empirical results on 3 large-scale benchmarks demonstrate that our model achieves up to 91.1% accuracy and reduces up to 88.5% computational cost in LLM selection, outperforming 5 state-of-the-art methods. We open-source our proposed LENSLLM model and corresponding results at the GitHub link: https://github.com/Susan571/LENSLLM.git.

Comment: Proposes a theoretical framework for understanding fine-tuning dynamics in LLMs, leveraging PAC-Bayes bounds and NTK-based models. This aligns with foundational research in LLM behavior and generalization.

Relevance: 9 Novelty: 8


6. ABKD: Pursuing a Proper Allocation of the Probability Mass in Knowledge Distillation via $\alpha$-$\beta$-Divergence

ArXiv ID: 2505.04560

Authors: Guanghui Wang, Zhiyong Yang, Zitai Wang, Shi Wang, Qianqian Xu, Qingming Huang

Abstract: Knowledge Distillation (KD) transfers knowledge from a large teacher model to a smaller student model by minimizing the divergence between their output distributions, typically using forward Kullback-Leibler divergence (FKLD) or reverse KLD (RKLD). It has become an effective training paradigm due to the broader supervision information provided by the teacher distribution compared to one-hot labels. We identify that the core challenge in KD lies in balancing two mode-concentration effects: the Hardness-Concentration effect, which refers to focusing on modes with large errors, and the Confidence-Concentration effect, which refers to focusing on modes with high student confidence. Through an analysis of how probabilities are reassigned during gradient updates, we observe that these two effects are entangled in FKLD and RKLD, but in extreme forms. Specifically, both are too weak in FKLD, causing the student to fail to concentrate on the target class. In contrast, both are too strong in RKLD, causing the student to overly emphasize the target class while ignoring the broader distributional information from the teacher. To address this imbalance, we propose ABKD, a generic framework with $\alpha$-$\beta$-divergence. Our theoretical results show that ABKD offers a smooth interpolation between FKLD and RKLD, achieving an effective trade-off between these effects. Extensive experiments on 17 language/vision datasets with 12 teacher-student settings confirm its efficacy. The code is available at https://github.com/ghwang-s/abkd.

Comment: The paper proposes a novel framework for knowledge distillation using alpha-beta divergence, addressing challenges in balancing concentration effects. This aligns with the 'Model Compression' criterion, particularly in advancing theoretical understanding of distillation.

Relevance: 9 Novelty: 8
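
To make the interpolation claim in the ABKD abstract concrete, here is a minimal sketch that evaluates one common definition of the $\alpha$-$\beta$-divergence on two small discrete distributions. The formula, the argument order (teacher `p`, student `q`), and the example probabilities are assumptions for illustration; the paper's exact parameterization and training loss may differ.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for strictly positive distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def ab_divergence(p, q, alpha, beta):
    """Alpha-beta divergence, valid when alpha, beta and alpha + beta are all nonzero.

    Assumed form: -1/(alpha*beta) * sum(p^alpha * q^beta
                  - alpha/(alpha+beta) * p^(alpha+beta)
                  - beta/(alpha+beta) * q^(alpha+beta)).
    """
    s = alpha + beta
    if alpha == 0 or beta == 0 or s == 0:
        raise ValueError("alpha, beta and alpha + beta must be nonzero")
    total = 0.0
    for pi, qi in zip(p, q):
        total += pi**alpha * qi**beta - (alpha / s) * pi**s - (beta / s) * qi**s
    return -total / (alpha * beta)


# Hypothetical teacher and student output distributions over three classes.
teacher = [0.7, 0.2, 0.1]
student = [0.5, 0.3, 0.2]

eps = 1e-4  # small value standing in for the limit toward zero
print("forward KL       :", kl(teacher, student))
print("AB(1, eps)       :", ab_divergence(teacher, student, 1.0, eps))
print("reverse KL       :", kl(student, teacher))
print("AB(eps, 1)       :", ab_divergence(teacher, student, eps, 1.0))
print("AB(0.5, 0.5)     :", ab_divergence(teacher, student, 0.5, 0.5))
```

Under this definition, taking beta toward zero with alpha = 1 approaches the forward KL divergence, and taking alpha toward zero with beta = 1 approaches the reverse KL divergence, so intermediate settings sit between the two.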


7. APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design

ArXiv ID: 2505.03748

Authors: Yonghao Tan, Pingcheng Dong, Yongkun Wu, Yu Liu, Xuejiao Liu, Peng Luo, Shih-Yang Liu, Xijie Huang, Dong Zhang, Luhong Liang, Kwang-Ting Cheng

Abstract: DNN accelerators, significantly advanced by model compression and specialized dataflow techniques, have marked considerable progress. However, the frequent access of high-precision partial sums (PSUMs) leads to excessive memory demands in architectures utilizing input/weight stationary dataflows. Traditional compression strategies have typically overlooked PSUM quantization, which may account for 69% of power consumption. This study introduces a novel Additive Partial Sum Quantization (APSQ) method, seamlessly integrating PSUM accumulation into the quantization framework. A grouping strategy that combines APSQ with PSUM quantization enhanced by a reconfigurable architecture is further proposed. The APSQ performs nearly lossless on NLP and CV tasks across BERT, Segformer, and EfficientViT models while compressing PSUMs to INT8. This leads to a notable reduction in energy costs by 28-87%. Extended experiments on LLaMA2-7B demonstrate the potential of APSQ for large language models. Code is available at https://github.com/Yonghao-Tan/APSQ.

Comment: The paper introduces a novel quantization method (APSQ) for partial sums in DNN accelerators, which aligns with the model compression criterion, particularly in energy efficiency and quantization innovations.

Relevance: 9 Novelty: 8


8. AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design

ArXiv ID: 2505.03745

Authors: Yanbiao Liang, Huihong Shi, Haikuo Shao, Zhongfeng Wang

Abstract: Recently, large language models (LLMs) have achieved huge success in the natural language processing (NLP) field, driving a growing demand to extend their deployment from the cloud to edge devices. However, deploying LLMs on resource-constrained edge devices poses significant challenges, including (1) intensive computations and huge model sizes, (2) great memory and bandwidth demands introduced by the autoregressive generation process, and (3) limited scalability for handling long sequences. To address these challenges, we propose AccLLM, a comprehensive acceleration framework that enables efficient and fast long-context LLM inference through algorithm and hardware co-design. At the algorithmic level, we integrate (1) pruning, (2) $\Lambda$-shaped attention, and (3) an innovative W2A8KV4 (2-bit weights, 8-bit activations, and 4-bit KV cache) quantization scheme, thus effectively reducing memory and bandwidth requirements while facilitating LLMs' long-sequence generation. At the hardware level, we design a dedicated FPGA-based accelerator with a reconfigurable computing engine to effectively and flexibly accommodate diverse operations arising from our compression algorithm, thereby fully translating the algorithmic innovations into tangible hardware efficiency. We validate AccLLM on the Xilinx Alveo U280 FPGA, demonstrating a 4.07x energy efficiency and a 2.98x throughput compared to the state-of-the-art work FlightLLM.

Comment: The paper proposes AccLLM, a framework for accelerating LLM inference with algorithm-hardware co-design, including pruning and quantization innovations, which aligns with model compression and efficiency breakthroughs.

Relevance: 9 Novelty: 8


9. Grouped Sequency-arranged Rotation: Optimizing Rotation Transformation for Quantization for Free

ArXiv ID: 2505.03810

Authors: Euntae Choi, Sumin Song, Woosang Lim, Sungjoo Yoo

Abstract: Large Language Models (LLMs) face deployment challenges due to high computational costs, and while Post-Training Quantization (PTQ) offers a solution, existing rotation-based methods struggle at very low bit-widths like 2-bit. We introduce a novel, training-free approach to construct an improved rotation matrix, addressing the limitations of current methods. The key contributions include leveraging the Walsh-Hadamard transform with sequency ordering, which clusters similar frequency components to reduce quantization error compared to standard Hadamard matrices, significantly improving performance. Furthermore, we propose a Grouped Sequency-arranged Rotation (GSR) using block-diagonal matrices with smaller Walsh blocks, effectively isolating outlier impacts and achieving performance comparable to optimization-based methods without requiring any training. Our method demonstrates robust performance on reasoning tasks and Perplexity (PPL) score on WikiText-2. Our method also enhances results even when applied over existing learned rotation techniques.

Comment: This paper proposes a novel quantization method leveraging the Walsh-Hadamard transform and introduces Grouped Sequency-arranged Rotation (GSR), which directly addresses model compression and efficiency. The method is innovative and relevant to foundational research in model compression.

Relevance: 9 Novelty: 8
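
The two ideas named in the GSR abstract, sequency ordering of Walsh-Hadamard rows and a block-diagonal arrangement of smaller Walsh blocks, can be sketched in plain Python. This is an illustration under assumptions, not the authors' implementation: the function names, the Sylvester construction, and the sizes (`dim=8`, `group=4`) are hypothetical choices.

```python
import math


def sylvester_hadamard(n):
    """Return the n x n Hadamard matrix (+1/-1 entries) in natural order; n is a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # Sylvester step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def sign_changes(row):
    """Sequency of a row: the number of sign changes between neighbouring entries."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)


def walsh_matrix(n):
    """Hadamard rows reordered by increasing sequency."""
    return sorted(sylvester_hadamard(n), key=sign_changes)


def grouped_rotation(dim, group):
    """Block-diagonal orthogonal matrix built from dim // group normalized Walsh blocks."""
    if dim % group:
        raise ValueError("dim must be a multiple of group")
    block = walsh_matrix(group)
    scale = 1.0 / math.sqrt(group)  # makes each block orthonormal
    rot = [[0.0] * dim for _ in range(dim)]
    for g in range(dim // group):
        off = g * group
        for i in range(group):
            for j in range(group):
                rot[off + i][off + j] = block[i][j] * scale
    return rot


print([sign_changes(r) for r in sylvester_hadamard(8)])  # natural order
print([sign_changes(r) for r in walsh_matrix(8)])        # sequency order
rotation = grouped_rotation(dim=8, group=4)
print(rotation[0])
```

Because each block only mixes the coordinates inside its own group, a large value in one group cannot spread into another, which is one way to read the abstract's remark about isolating outlier impacts.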


10. Is the end of Insight in Sight ?

ArXiv ID: 2505.04627

Authors: Jean-Michel Tucny, Mihir Durve, Sauro Succi

Abstract: It is shown that the weight matrices of a Physics-informed neural network (PINN)-based deep learning application to a rarefied gas dynamics problem described by the Boltzmann equation bear no evident link to the mathematical structure of the physical problem. Instead, the weights appear close to Gaussian distributed random matrices. Although significantly more work is needed to support a robust assessment in this direction, these results suggest that deep-learning and the numerical solution of the Boltzmann equation represent two equivalent, but largely distinct paths to the same physical knowledge. If so, Explainable AI might be an unrealistic target and possibly even an ill-posed one.

Comment: The paper questions the explainability of AI by analyzing weight matrices in PINNs, suggesting a potential challenge to the goal of Explainable AI. This aligns with emerging trends challenging established assumptions.

Relevance: 8 Novelty: 7


11. Sparsity is All You Need: Rethinking Biological Pathway-Informed Approaches in Deep Learning

ArXiv ID: 2505.04300

Authors: Isabella Caranzano, Corrado Pancotti, Cesare Rollo, Flavio Sartori, Pietro Liò, Piero Fariselli, Tiziana Sanavia

Abstract: Biologically-informed neural networks typically leverage pathway annotations to enhance performance in biomedical applications. We hypothesized that the benefits of pathway integration do not arise from its biological relevance, but rather from the sparsity it introduces. We conducted a comprehensive analysis of all relevant pathway-based neural network models for predictive tasks, critically evaluating each study's contributions. From this review, we curated a subset of methods for which the source code was publicly available. The comparison of the biologically informed state-of-the-art deep learning models and their randomized counterparts showed that models based on randomized information performed equally well as biologically informed ones across different metrics and datasets. Notably, in 3 out of the 15 analyzed models, the randomized versions even outperformed their biologically informed counterparts. Moreover, pathway-informed models did not show any clear advantage in interpretability, as randomized models were still able to identify relevant disease biomarkers despite lacking explicit pathway information. Our findings suggest that pathway annotations may be too noisy or inadequately explored by current methods. Therefore, we propose a methodology that can be applied to different domains and can serve as a robust benchmark for systematically comparing novel pathway-informed models against their randomized counterparts. This approach enables researchers to rigorously determine whether observed performance improvements can be attributed to biological insights.

Comment: The paper critiques biologically-informed neural networks and highlights the role of sparsity, which aligns with the 'Representation Learning' and 'Model Compression' criteria, particularly in understanding sparsity's role in neural networks.

Relevance: 8 Novelty: 7


12. Information Filtering Networks: Theoretical Foundations, Generative Methodologies, and Real-World Applications

ArXiv ID: 2505.03812

Authors: Tomaso Aste

Abstract: Information Filtering Networks (IFNs) provide a powerful framework for modeling complex systems through globally sparse yet locally dense and interpretable structures that capture multivariate dependencies. This review offers a comprehensive account of IFNs, covering their theoretical foundations, construction methodologies, and diverse applications. Tracing their origins from early network-based models to advanced formulations such as the Triangulated Maximally Filtered Graph (TMFG) and the Maximally Filtered Clique Forest (MFCF), the paper highlights how IFNs address key challenges in high-dimensional data-driven modeling. IFNs and their construction methodologies are intrinsically higher-order networks that generate simplicial complexes, structures that are only now becoming popular in the broader literature. Applications span fields including finance, biology, psychology, and artificial intelligence, where IFNs improve interpretability, computational efficiency, and predictive performance. Special attention is given to their role in graphical modeling, where IFNs enable the estimation of sparse inverse covariance matrices with greater accuracy and scalability than traditional approaches like Graphical LASSO. Finally, the review discusses recent developments that integrate IFNs with machine learning and deep learning, underscoring their potential not only to bridge classical network theory with contemporary data-driven paradigms, but also to shape the architectures of deep learning models themselves.

Comment: The paper provides a comprehensive review of Information Filtering Networks (IFNs), discussing their theoretical foundations and potential integration with deep learning architectures, which aligns with emerging trends and foundational research.

Relevance: 8 Novelty: 7


Paper Selection Prompt

You are a helpful paper reading assistant whose job is to read daily posts from ArXiv and identify a few papers that your friend will enjoy reading. Your job is to carefully read the paper titles and abstracts below and find the ones that match the criteria below.

Instructions

Write the response in JSONL format with {ARXIVID, COMMENT, RELEVANCE, NOVELTY} on each line, one for each paper.
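
A response in this format can be consumed line by line. The following is a minimal sketch, assuming each line is a JSON object with exactly the four keys named above and that the two scores are integers; the function name `parse_selection` and the abbreviated comments are hypothetical, while the IDs and scores are taken from entries on this page.

```python
import json


def parse_selection(jsonl_text):
    """Parse one JSON object per non-empty line and return the rows sorted best-first."""
    rows = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        rows.append(json.loads(line))
    # Sort by relevance first, then novelty, both descending.
    rows.sort(key=lambda r: (r["RELEVANCE"], r["NOVELTY"]), reverse=True)
    return rows


# Two example records; the COMMENT values are shortened for illustration.
sample = "\n".join([
    json.dumps({"ARXIVID": "2505.04627",
                "COMMENT": "Questions explainability via PINN weight matrices.",
                "RELEVANCE": 8, "NOVELTY": 7}),
    json.dumps({"ARXIVID": "2505.03801",
                "COMMENT": "Two-stage low-rank plus sparse LLM compression.",
                "RELEVANCE": 10, "NOVELTY": 8}),
])

for row in parse_selection(sample):
    print(row["ARXIVID"], row["RELEVANCE"], row["NOVELTY"])
```

Sorting on the pair (relevance, novelty) reproduces the ordering used in the paper list on this page, where higher-scored entries appear first.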

Scoring Criteria

The "Relevance" score measures how closely the paper aligns with the core topics of the prompt. The "Novelty" score assesses the originality and impact of the paper. They are two ORTHONORMAL axes and SHOULD NOT be confused with each other.

Relevance Scoring

Novelty Scoring

Papers

[PAPER LIST HERE]

Relevant Topics

Use the following relevance criteria to focus on foundational research. Keep relevant papers and filter out irrelevant ones. Avoid purely application-driven work.

  1. Representation Learning - Relevant: Insights into how deep networks encode information, feature/dictionary learning, sparse/contrastive methods, training dynamics in neural networks. - Irrelevant: Standard applications of known techniques lacking new theoretical or methodological contributions.

  2. Model Architecture - Relevant: Mixture-of-Experts (MoE), Transformers, Conditional/Dynamic Networks, Autoencoders, analysis on existing architectures (like encoder-decoder), or other architectural innovations. - Irrelevant: Merely using existing architectures for a certain task without insights into the structure themselves.

  3. Model Compression - Relevant: Sparsity, pruning, quantization, low-rank approaches, KV cache, or other algorithmic/theoretical efficiency breakthroughs. - Irrelevant: Straightforward applications of existing compression methods to new tasks.

  4. Large Language Models (LLMs) - Relevant: Major breakthroughs in pretraining or architecture, theoretical insights into LLM behavior/interpretability. - Irrelevant: Domain-specific usage (e.g., translation, jail-breaking), finetuning or inference tricks (e.g., instruction tuning, chain-of-thoughts, data mixing), or empirical dataset/benchmark studies and text-level analysis (e.g. hallucination, reasoning, safety).

  5. AI for Science - Relevant: Foundational research in molecular/protein modeling, new generative paradigms, or significant architecture-level innovations. - Irrelevant: Conventional, domain-specific applications without new theoretical perspectives.

  6. Emerging Trends - Relevant: Cutting-edge theoretical work challenging established assumptions or introducing broad new paradigms. - Irrelevant: Incremental improvements or trend-following without novel insights.

Keywords: