Previous Day 2025-04-01
Monthly Overview 2025-04
Next Day 2025-04-03

Personalized Daily Arxiv Papers 4/02/2025

[gpt-4o] Prompt Completion Total
Token 29076 3861 32937
Cost $0.07 $0.04 $0.11

Total arXiv papers: 468

Total scanned papers: 277

Total relevant papers: 15

Table of contents with paper titles:

  1. DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism Authors: Dengchun Li, Naizheng Wang, Zihao Zhang, Haoyang Yin, Lei Duan, Meng Xiao, Mingjie Tang

  2. Minimum Description Length of a Spectrum Variational Autoencoder: A Theory Authors: Canlin Zhang, Xiuwen Liu

  3. Deep Generative Models: Complexity, Dimensionality, and Approximation Authors: Kevin Wang, Hongqian Niu, Yixin Wang, Didong Li

  4. MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization Authors: Siyuan Li, Luyuan Zhang, Zedong Wang, Juanxi Tian, Cheng Tan, Zicheng Liu, Chang Yu, Qingsong Xie, Haonan Lu, Haoqian Wang, Zhen Lei

  5. Spectral Architecture Search for Neural Networks Authors: Gianluca Peri, Lorenzo Giambagli, Lorenzo Chicchi, Duccio Fanelli

  6. ObscuraCoder: Powering Efficient Code LM Pre-Training Via Obfuscation Grounding Authors: Indraneil Paul, Haoyi Yang, Goran Glava\v{s}, Kristian Kersting, Iryna Gurevych

  7. Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations Authors: Chongjie Si, Zhiyi Shi, Xuehui Wang, Yichen Xiao, Xiaokang Yang, Wei Shen

  8. Logical perspectives on learning statistical objects Authors: Aaron Anderson, Michael Benedikt

  9. ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning Authors: Huandong Chang, Zicheng Ma, Mingyuan Ma, Zhenting Qi, Andrew Sabot, Hong Jiang, H. T. Kung

  10. When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning Authors: Nishad Singhi, Hritik Bansal, Arian Hosseini, Aditya Grover, Kai-Wei Chang, Marcus Rohrbach, Anna Rohrbach

  11. Less is More: Efficient Black-box Attribution via Minimal Interpretable Subset Selection Authors: Ruoyu Chen, Siyuan Liang, Jingzhi Li, Shiming Liu, Li Liu, Hua Zhang, Xiaochun Cao

  12. NeuraLUT-Assemble: Hardware-aware Assembling of Sub-Neural Networks for Efficient LUT Inference Authors: Marta Andronic, George A. Constantinides

  13. Self-Evolving Visual Concept Library using Vision-Language Critics Authors: Atharva Sehgal, Patrick Yuan, Ziniu Hu, Yisong Yue, Jennifer J. Sun, Swarat Chaudhuri

  14. Geometric Median Matching for Robust k-Subset Selection from Noisy Data Authors: Anish Acharya, Sujay Sanghavi, Alexandros G Dimakis, Inderjit S Dhillon

  15. Hawkeye:Efficient Reasoning with Model Collaboration Authors: Jianshu She, Zhuohao Li, Zhemin Huang, Qi Li, Peiran Xu, Haonan Li, Qirong Ho


1. DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism

ArXiv ID: 2504.00661

Authors: Dengchun Li, Naizheng Wang, Zihao Zhang, Haoyang Yin, Lei Duan, Meng Xiao, Mingjie Tang

Abstract: Instruction-based fine-tuning of large language models (LLMs) has achieved remarkable success in various natural language processing (NLP) tasks. Parameter-efficient fine-tuning (PEFT) methods, such as Mixture of LoRA Experts (MoLE), combine the efficiency of Low-Rank Adaptation (LoRA) with the versatility of Mixture of Experts (MoE) models, demonstrating significant potential for handling multiple downstream tasks. However, the existing routing mechanisms for MoLE often involve a trade-off between computational efficiency and predictive accuracy, and they fail to fully address the diverse expert selection demands across different transformer layers. In this work, we propose DynMoLE, a hybrid routing strategy that dynamically adjusts expert selection based on the Tsallis entropy of the router's probability distribution. This approach mitigates router uncertainty, enhances stability, and promotes more equitable expert participation, leading to faster convergence and improved model performance. Additionally, we introduce an auxiliary loss based on Tsallis entropy to further guide the model toward convergence with reduced uncertainty, thereby improving training stability and performance. Our extensive experiments on commonsense reasoning benchmarks demonstrate that DynMoLE achieves substantial performance improvements, outperforming LoRA by 9.6% and surpassing the state-of-the-art MoLE method, MoLA, by 2.3%. We also conduct a comprehensive ablation study to evaluate the contributions of DynMoLE's key components.

Comment: DynMoLE proposes a hybrid routing mechanism for MoE models, which directly aligns with foundational research in Mixture-of-Experts and efficiency improvements.

Relevance: 10 Novelty: 8


2. Minimum Description Length of a Spectrum Variational Autoencoder: A Theory

ArXiv ID: 2504.00395

Authors: Canlin Zhang, Xiuwen Liu

Abstract: Deep neural networks (DNNs) trained through end-to-end learning have achieved remarkable success across diverse machine learning tasks, yet they are not explicitly designed to adhere to the Minimum Description Length (MDL) principle, which posits that the best model provides the shortest description of the data. In this paper, we argue that MDL is essential to deep learning and propose a further generalized principle: Understanding is the use of a small amount of information to represent a large amount of information. To this end, we introduce a novel theoretical framework for designing and evaluating deep Variational Autoencoders (VAEs) based on MDL. In our theory, we designed the Spectrum VAE, a specific VAE architecture whose MDL can be rigorously evaluated under given conditions. Additionally, we introduce the concept of latent dimension combination, or pattern of spectrum, and provide the first theoretical analysis of their role in achieving MDL. We claim that a Spectrum VAE understands the data distribution in the most appropriate way when the MDL is achieved. This work is entirely theoretical and lays the foundation for future research on designing deep learning systems that explicitly adhere to information-theoretic principles.

Comment: The paper introduces a theoretical framework for Variational Autoencoders (VAEs) based on the Minimum Description Length (MDL) principle, which is highly relevant to representation learning and foundational research.

Relevance: 9 Novelty: 9


3. Deep Generative Models: Complexity, Dimensionality, and Approximation

ArXiv ID: 2504.00820

Authors: Kevin Wang, Hongqian Niu, Yixin Wang, Didong Li

Abstract: Generative networks have shown remarkable success in learning complex data distributions, particularly in generating high-dimensional data from lower-dimensional inputs. While this capability is well-documented empirically, its theoretical underpinning remains unclear. One common theoretical explanation appeals to the widely accepted manifold hypothesis, which suggests that many real-world datasets, such as images and signals, often possess intrinsic low-dimensional geometric structures. Under this manifold hypothesis, it is widely believed that to approximate a distribution on a $d$-dimensional Riemannian manifold, the latent dimension needs to be at least $d$ or $d+1$. In this work, we show that this requirement on the latent dimension is not necessary by demonstrating that generative networks can approximate distributions on $d$-dimensional Riemannian manifolds from inputs of any arbitrary dimension, even lower than $d$, taking inspiration from the concept of space-filling curves. This approach, in turn, leads to a super-exponential complexity bound of the deep neural networks through expanded neurons. Our findings thus challenge the conventional belief on the relationship between input dimensionality and the ability of generative networks to model data distributions. This novel insight not only corroborates the practical effectiveness of generative networks in handling complex data structures, but also underscores a critical trade-off between approximation error, dimensionality, and model complexity.

Comment: The paper provides a theoretical insight into generative networks and challenges the conventional belief about the relationship between input dimensionality and data distribution modeling. This aligns with foundational research in representation learning.

Relevance: 9 Novelty: 8


4. MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization

ArXiv ID: 2504.00999

Authors: Siyuan Li, Luyuan Zhang, Zedong Wang, Juanxi Tian, Cheng Tan, Zicheng Liu, Chang Yu, Qingsong Xie, Haonan Lu, Haoqian Wang, Zhen Lei

Abstract: Masked Image Modeling (MIM) with Vector Quantization (VQ) has achieved great success in both self-supervised pre-training and image generation. However, most existing methods struggle to address the trade-off in shared latent space for generation quality vs. representation learning and efficiency. To push the limits of this paradigm, we propose MergeVQ, which incorporates token merging techniques into VQ-based generative models to bridge the gap between image generation and visual representation learning in a unified architecture. During pre-training, MergeVQ decouples top-k semantics from latent space with the token merge module after self-attention blocks in the encoder for subsequent Look-up Free Quantization (LFQ) and global alignment and recovers their fine-grained details through cross-attention in the decoder for reconstruction. As for the second-stage generation, we introduce MergeAR, which performs KV Cache compression for efficient raster-order prediction. Extensive experiments on ImageNet verify that MergeVQ as an AR generative model achieves competitive performance in both visual representation learning and image generation tasks while maintaining favorable token efficiency and inference speed. The code and model will be available at https://apexgen-x.github.io/MergeVQ.

Comment: MergeVQ introduces a unified framework combining token merging and quantization, which is relevant to representation learning and efficiency improvements in model architecture.

Relevance: 9 Novelty: 8


5. Spectral Architecture Search for Neural Networks

ArXiv ID: 2504.00885

Authors: Gianluca Peri, Lorenzo Giambagli, Lorenzo Chicchi, Duccio Fanelli

Abstract: Architecture design and optimization are challenging problems in the field of artificial neural networks. Working in this context, we here present SPARCS (SPectral ARchiteCture Search), a novel architecture search protocol which exploits the spectral attributes of the inter-layer transfer matrices. SPARCS allows one to explore the space of possible architectures by spanning continuous and differentiable manifolds, thus enabling for gradient-based optimization algorithms to be eventually employed. With reference to simple benchmark models, we show that the newly proposed method yields a self-emerging architecture with a minimal degree of expressivity to handle the task under investigation and with a reduced parameter count as compared to other viable alternatives.

Comment: SPARCS introduces a novel spectral-based architecture search method, which aligns with foundational research in model architecture optimization.

Relevance: 9 Novelty: 8


6. ObscuraCoder: Powering Efficient Code LM Pre-Training Via Obfuscation Grounding

ArXiv ID: 2504.00019

Authors: Indraneil Paul, Haoyi Yang, Goran Glava\v{s}, Kristian Kersting, Iryna Gurevych

Abstract: Language models (LMs) have become a staple of the code-writing toolbox. Their pre-training recipe has, however, remained stagnant over recent years, barring the occasional changes in data sourcing and filtering strategies. In particular, research exploring modifications to Code-LMs' pre-training objectives, geared towards improving data efficiency and better disentangling between syntax and semantics, has been noticeably sparse, especially compared with corresponding efforts in natural language LMs. In this work, we examine grounding on obfuscated code as a means of helping Code-LMs look beyond the surface-form syntax and enhance their pre-training sample efficiency. To this end, we compile ObscuraX, a dataset of approximately 55M source and obfuscated code pairs in seven languages. Subsequently, we pre-train ObscuraCoder models, ranging in size from 255M to 2.8B parameters, on a 272B-token corpus that includes ObscuraX and demonstrate that our obfuscation-based pre-training recipe leads to consistent improvements in Code-LMs' abilities compared to both vanilla autoregressive pre-training as well as existing de-obfuscation (DOBF) objectives. ObscuraCoder demonstrates sizeable gains across multiple tests of syntactic and semantic code understanding, along with improved capabilities in multilingual code completion, multilingual code commit summarization, and multi-purpose library-oriented code generation.

Comment: The paper proposes a novel pretraining objective for Code-LMs using obfuscation grounding, which aligns with foundational research in representation learning and LLM pretraining. The approach introduces methodological innovations.

Relevance: 9 Novelty: 8


7. Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations

ArXiv ID: 2504.00851

Authors: Chongjie Si, Zhiyi Shi, Xuehui Wang, Yichen Xiao, Xiaokang Yang, Wei Shen

Abstract: Adapting pre-trained foundation models for diverse downstream tasks is a core practice in artificial intelligence. However, the wide range of tasks and high computational costs make full fine-tuning impractical. To overcome this, parameter-efficient fine-tuning (PEFT) methods like LoRA have emerged and are becoming a growing research focus. Despite the success of these methods, they are primarily designed for linear layers, focusing on two-dimensional matrices while largely ignoring higher-dimensional parameter spaces like convolutional kernels. Moreover, directly applying these methods to higher-dimensional parameter spaces often disrupts their structural relationships. Given the rapid advancements in matrix-based PEFT methods, rather than designing a specialized strategy, we propose a generalization that extends matrix-based PEFT methods to higher-dimensional parameter spaces without compromising their structural properties. Specifically, we treat parameters as elements of a Lie group, with updates modeled as perturbations in the corresponding Lie algebra. These perturbations are mapped back to the Lie group through the exponential map, ensuring smooth, consistent updates that preserve the inherent structure of the parameter space. Extensive experiments on computer vision and natural language processing validate the effectiveness and versatility of our approach, demonstrating clear improvements over existing methods.

Comment: The paper generalizes parameter-efficient fine-tuning methods to higher-dimensional parameter spaces using Lie group transformations, which aligns with foundational research in model compression and efficiency.

Relevance: 9 Novelty: 8


8. Logical perspectives on learning statistical objects

ArXiv ID: 2504.00847

Authors: Aaron Anderson, Michael Benedikt

Abstract: We consider the relationship between learnability of a base class'' of functions on a set X and learnability of a class of statistical functions derived from the base class. For example, we refine results showing that learnability of a family of functions implies learnability of the family of functions mapping a function in the class to its expectation under a distribution. We will look at both Probably Approximately Correct (PAC) learning, where example inputs and outputs are chosen at random, and online learning, where the examples are chosen adversarially. We establish improved bounds on the sample complexity of learning for statistical classes, stated in terms of combinatorial dimensions of the base class. We do this by adapting techniques introduced in model theory forrandomizing a structure''. We give particular attention to classes derived from logical formulas, and relate learnability of the statistical classes to properties of the formula. Finally, we provide bounds on the complexity of learning the statistical classes built on top of a logic-based hypothesis class.

Comment: The paper provides theoretical insights into learnability and sample complexity, which aligns with foundational research in representation learning. It also explores connections to logical formulas, adding a novel perspective.

Relevance: 9 Novelty: 8


9. ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning

ArXiv ID: 2504.00254

Authors: Huandong Chang, Zicheng Ma, Mingyuan Ma, Zhenting Qi, Andrew Sabot, Hong Jiang, H. T. Kung

Abstract: Low-Rank Adaptation (LoRA) has become a widely adopted technique for fine-tuning large-scale pre-trained models with minimal parameter updates. However, existing methods rely on fixed ranks or focus solely on either rank pruning or expansion, failing to adapt ranks dynamically to match the importance of different layers during training. In this work, we propose ElaLoRA, an adaptive low-rank adaptation framework that dynamically prunes and expands ranks based on gradient-derived importance scores. To the best of our knowledge, ElaLoRA is the first method that enables both rank pruning and expansion during fine-tuning. Experiments across multiple benchmarks demonstrate that ElaLoRA consistently outperforms existing PEFT methods across different parameter budgets. Furthermore, our studies validate that layers receiving higher rank allocations contribute more significantly to model performance, providing theoretical justification for our adaptive strategy. By introducing a principled and adaptive rank allocation mechanism, ElaLoRA offers a scalable and efficient fine-tuning solution, particularly suited for resource-constrained environments.

Comment: The paper introduces an adaptive low-rank adaptation framework (ElaLoRA) for fine-tuning, which is highly relevant to model compression and efficiency. The dynamic rank allocation mechanism adds a novel theoretical contribution.

Relevance: 9 Novelty: 8


10. When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning

ArXiv ID: 2504.01005

Authors: Nishad Singhi, Hritik Bansal, Arian Hosseini, Aditya Grover, Kai-Wei Chang, Marcus Rohrbach, Anna Rohrbach

Abstract: Scaling test-time compute has emerged as a key strategy for enhancing the reasoning capabilities of large language models (LLMs), particularly in tasks like mathematical problem-solving. A traditional approach, Self-Consistency (SC), generates multiple solutions to a problem and selects the most common answer via majority voting. Another common method involves scoring each solution with a reward model (verifier) and choosing the best one. Recent advancements in Generative Reward Models (GenRM) reframe verification as a next-token prediction task, enabling inference-time scaling along a new axis. Specifically, GenRM generates multiple verification chains-of-thought to score each solution. Under a limited inference budget, this introduces a fundamental trade-off: should you spend the budget on scaling solutions via SC or generate fewer solutions and allocate compute to verification via GenRM? To address this, we evaluate GenRM against SC under a fixed inference budget. Interestingly, we find that SC is more compute-efficient than GenRM for most practical inference budgets across diverse models and datasets. For instance, GenRM first matches SC after consuming up to 8x the inference compute and requires significantly more compute to outperform it. Furthermore, we derive inference scaling laws for the GenRM paradigm, revealing that compute-optimal inference favors scaling solution generation more aggressively than scaling the number of verifications. Our work provides practical guidance on optimizing test-time scaling by balancing solution generation and verification. The code is available at https://github.com/nishadsinghi/sc-genrm-scaling.

Comment: The paper provides theoretical insights into compute-optimal strategies for reasoning in LLMs, addressing trade-offs in solution generation and verification. This contributes to foundational understanding of LLM behavior.

Relevance: 9 Novelty: 8


11. Less is More: Efficient Black-box Attribution via Minimal Interpretable Subset Selection

ArXiv ID: 2504.00470

Authors: Ruoyu Chen, Siyuan Liang, Jingzhi Li, Shiming Liu, Li Liu, Hua Zhang, Xiaochun Cao

Abstract: To develop a trustworthy AI system, which aim to identify the input regions that most influence the models decisions. The primary task of existing attribution methods lies in efficiently and accurately identifying the relationships among input-prediction interactions. Particularly when the input data is discrete, such as images, analyzing the relationship between inputs and outputs poses a significant challenge due to the combinatorial explosion. In this paper, we propose a novel and efficient black-box attribution mechanism, LiMA (Less input is More faithful for Attribution), which reformulates the attribution of important regions as an optimization problem for submodular subset selection. First, to accurately assess interactions, we design a submodular function that quantifies subset importance and effectively captures their impact on decision outcomes. Then, efficiently ranking input sub-regions by their importance for attribution, we improve optimization efficiency through a novel bidirectional greedy search algorithm. LiMA identifies both the most and least important samples while ensuring an optimal attribution boundary that minimizes errors. Extensive experiments on eight foundation models demonstrate that our method provides faithful interpretations with fewer regions and exhibits strong generalization, shows an average improvement of 36.3% in Insertion and 39.6% in Deletion. Our method also outperforms the naive greedy search in attribution efficiency, being 1.6 times faster. Furthermore, when explaining the reasons behind model prediction errors, the average highest confidence achieved by our method is, on average, 86.1% higher than that of state-of-the-art attribution algorithms. The code is available at https://github.com/RuoyuChen10/LIMA.

Comment: The paper introduces a black-box attribution method (LiMA) for interpretability, which aligns with representation learning by focusing on input-prediction interactions. The novel optimization approach and submodular function design are significant contributions.

Relevance: 8 Novelty: 8


12. NeuraLUT-Assemble: Hardware-aware Assembling of Sub-Neural Networks for Efficient LUT Inference

ArXiv ID: 2504.00592

Authors: Marta Andronic, George A. Constantinides

Abstract: Efficient neural networks (NNs) leveraging lookup tables (LUTs) have demonstrated significant potential for emerging AI applications, particularly when deployed on field-programmable gate arrays (FPGAs) for edge computing. These architectures promise ultra-low latency and reduced resource utilization, broadening neural network adoption in fields such as particle physics. However, existing LUT-based designs suffer from accuracy degradation due to the large fan-in required by neurons being limited by the exponential scaling of LUT resources with input width. In practice, in prior work this tension has resulted in the reliance on extremely sparse models. We present NeuraLUT-Assemble, a novel framework that addresses these limitations by combining mixed-precision techniques with the assembly of larger neurons from smaller units, thereby increasing connectivity while keeping the number of inputs of any given LUT manageable. Additionally, we introduce skip-connections across entire LUT structures to improve gradient flow. NeuraLUT-Assemble closes the accuracy gap between LUT-based methods and (fully-connected) MLP-based models, achieving competitive accuracy on tasks such as network intrusion detection, digit classification, and jet classification, demonstrating up to $8.42\times$ reduction in the area-delay product compared to the state-of-the-art at the time of the publication.

Comment: The paper proposes a hardware-aware framework for LUT-based neural networks, focusing on efficiency and sparsity. It aligns with model compression topics, particularly in addressing sparsity and resource utilization.

Relevance: 8 Novelty: 7


13. Self-Evolving Visual Concept Library using Vision-Language Critics

ArXiv ID: 2504.00185

Authors: Atharva Sehgal, Patrick Yuan, Ziniu Hu, Yisong Yue, Jennifer J. Sun, Swarat Chaudhuri

Abstract: We study the problem of building a visual concept library for visual recognition. Building effective visual concept libraries is challenging, as manual definition is labor-intensive, while relying solely on LLMs for concept generation can result in concepts that lack discriminative power or fail to account for the complex interactions between them. Our approach, ESCHER, takes a library learning perspective to iteratively discover and improve visual concepts. ESCHER uses a vision-language model (VLM) as a critic to iteratively refine the concept library, including accounting for interactions between concepts and how they affect downstream classifiers. By leveraging the in-context learning abilities of LLMs and the history of performance using various concepts, ESCHER dynamically improves its concept generation strategy based on the VLM critic's feedback. Finally, ESCHER does not require any human annotations, and is thus an automated plug-and-play framework. We empirically demonstrate the ability of ESCHER to learn a concept library for zero-shot, few-shot, and fine-tuning visual classification tasks. This work represents, to our knowledge, the first application of concept library learning to real-world visual tasks.

Comment: The paper introduces a novel approach to iteratively refine a visual concept library using vision-language models, which aligns with representation learning and foundational insights into training dynamics.

Relevance: 8 Novelty: 7


14. Geometric Median Matching for Robust k-Subset Selection from Noisy Data

ArXiv ID: 2504.00564

Authors: Anish Acharya, Sujay Sanghavi, Alexandros G Dimakis, Inderjit S Dhillon

Abstract: Data pruning -- the combinatorial task of selecting a small and representative subset from a large dataset, is crucial for mitigating the enormous computational costs associated with training data-hungry modern deep learning models at scale. Since large scale data collections are invariably noisy, developing data pruning strategies that remain robust even in the presence of corruption is critical in practice. However, existing data pruning methods often fail under high corruption rates due to their reliance on empirical mean estimation, which is highly sensitive to outliers. In response, we propose Geometric Median (GM) Matching, a novel k-subset selection strategy that leverages Geometric Median -- a robust estimator with an optimal breakdown point of 1/2; to enhance resilience against noisy data. Our method iteratively selects a k-subset such that the mean of the subset approximates the GM of the (potentially) noisy dataset, ensuring robustness even under arbitrary corruption. We provide theoretical guarantees, showing that GM Matching enjoys an improved O(1/k) convergence rate -- a quadratic improvement over random sampling, even under arbitrary corruption. Extensive experiments across image classification and image generation tasks demonstrate that GM Matching consistently outperforms existing pruning approaches, particularly in high-corruption settings and at high pruning rates; making it a strong baseline for robust data pruning.

Comment: The paper introduces a novel k-subset selection strategy using the Geometric Median for robust data pruning, which aligns with model compression and efficiency topics. The theoretical guarantees and robustness improvements add to its relevance.

Relevance: 8 Novelty: 7


15. Hawkeye:Efficient Reasoning with Model Collaboration

ArXiv ID: 2504.00424

Authors: Jianshu She, Zhuohao Li, Zhemin Huang, Qi Li, Peiran Xu, Haonan Li, Qirong Ho

Abstract: Chain-of-Thought (CoT) reasoning has demonstrated remarkable effectiveness in enhancing the reasoning abilities of large language models (LLMs). However, its efficiency remains a challenge due to the generation of excessive intermediate reasoning tokens, which introduce semantic redundancy and overly detailed reasoning steps. Moreover, computational expense and latency are significant concerns, as the cost scales with the number of output tokens, including those intermediate steps. In this work, we observe that most CoT tokens are unnecessary, and retaining only a small portion of them is sufficient for producing high-quality responses. Inspired by this, we propose HAWKEYE, a novel post-training and inference framework where a large model produces concise CoT instructions to guide a smaller model in response generation. HAWKEYE quantifies redundancy in CoT reasoning and distills high-density information via reinforcement learning. By leveraging these concise CoTs, HAWKEYE is able to expand responses while reducing token usage and computational cost significantly. Our evaluation shows that HAWKEYE can achieve comparable response quality using only 35% of the full CoTs, while improving clarity, coherence, and conciseness by approximately 10%. Furthermore, HAWKEYE can accelerate end-to-end reasoning by up to 3.4x on complex math tasks while reducing inference cost by up to 60%. HAWKEYE will be open-sourced and the models will be available soon.

Comment: The paper proposes HAWKEYE, a framework for efficient reasoning in LLMs by reducing redundancy in Chain-of-Thought reasoning. This aligns with model compression and efficiency breakthroughs, making it relevant.

Relevance: 8 Novelty: 7


Paper Selection Prompt

You are a helpful paper reading assistant whose job is to read daily posts from ArXiv and identify a few papers that your friend will enjoy reading. Your job is to carefully read the paper titles and abstracts below and find the ones that match the criteria below.

Instructions

Write the response in JSONL format with {ARXIVID, COMMENT, RELEVANCE, NOVELTY} on each line, one for each paper.

Scoring Criteria

The "Relevance" score measures how closely the paper aligns with the core topics of the prompt. The "Novelty" score assesses the originality and impact of the paper. They are two ORTHONORMAL axes and SHOULD NOT be confused with each other.

Relevance Scoring

Novelty Scoring

Papers

[PAPER LIST HERE]

Relevant Topics

Use the following relevance criteria to focus on foundational research. Keep relevant papers and filter out irrelevant ones. Avoid purely application-driven work.

  1. Representation Learning - Relevant: Insights into how deep networks encode information, feature/dictionary learning, sparse/contrastive methods, training dynamics in neural networks. - Irrelevant: Standard applications of known techniques lacking new theoretical or methodological contributions.

  2. Model Architecture - Relevant: Mixture-of-Experts (MoE), Transformers, Conditional/Dynamic Networks, Autoencoders, analysis on existing architectures (like encoder-decoder), or other architectural innovations. - Irrelevant: Merely using existing architectures for a certain task without insights into the structure themselves.

  3. Model Compression - Relevant: Sparsity, pruning, quantization, low-rank approaches, KV cache, or other algorithmic/theoretical efficiency breakthroughs. - Irrelevant: Straightforward applications of existing compression methods to new tasks.

  4. Large Language Models (LLMs) - Relevant: Major breakthroughs in pretraining or architecture, theoretical insights into LLM behavior/interpretability. - Irrelevant: Domain-specific usage (e.g., translation, jail-breaking), finetuning or inference tricks (e.g., instruction tuning, chain-of-thoughts, data mixing), or empirical dataset/benchmark studies and text-level analysis (e.g. hallucination, reasoning, safety).

  5. AI for Science - Relevant: Foundational research in molecular/protein modeling, new generative paradigms, or significant architecture-level innovations. - Irrelevant: Conventional, domain-specific applications without new theoretical perspectives.

  6. Emerging Trends - Relevant: Cutting-edge theoretical work challenging established assumptions or introducing broad new paradigms. - Irrelevant: Incremental improvements or trend-following without novel insights.

Keywords: