Personalized Daily ArXiv Papers 2025-05-14

[gpt-4o]   Prompt   Completion   Total
Token      38723    5195         43918
Cost       $0.1     $0.05        $0.15

Total arXiv papers: 475

Total scanned papers: 312

Total relevant papers: 17

Table of contents with paper titles:

  1. Iteratively reweighted kernel machines efficiently learn sparse functions Authors: Libin Zhu, Damek Davis, Dmitriy Drusvyatskiy, Maryam Fazel

  2. Super-fast rates of convergence for Neural Networks Classifiers under the Hard Margin Condition Authors: Nathanael Tepakbong, Ding-Xuan Zhou, Xiang Zhou

  3. Learning Advanced Self-Attention for Linear Transformers in the Singular Value Domain Authors: Hyowon Wi, Jeongwhan Choi, Noseong Park

  4. Lost in Transmission: When and Why LLMs Fail to Reason Globally Authors: Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan, Jennifer Neville

  5. Recovering Event Probabilities from Large Language Model Embeddings via Axiomatic Constraints Authors: Jian-Qiao Zhu, Haijiang Yan, Thomas L. Griffiths

  6. InfoPO: On Mutual Information Maximization for Large Language Model Alignment Authors: Teng Xiao, Zhen Ge, Sujay Sanghavi, Tian Wang, Julian Katz-Samuels, Marc Versage, Qingjun Cui, Trishul Chilimbi

  7. Blockbuster, Part 1: Block-level AI Operator Fusion Authors: Ofer Dekel

  8. PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts Authors: Yang Su, Na Yan, Yansha Deng, Robert Schober

  9. Efficient Unstructured Pruning of Mamba State-Space Models for Resource-Constrained Environments Authors: Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma

  10. Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders Authors: Dong Shu, Xuansheng Wu, Haiyan Zhao, Mengnan Du, Ninghao Liu

  11. Rapid Overfitting of Multi-Pass Stochastic Gradient Descent in Stochastic Convex Optimization Authors: Shira Vansover-Hager, Tomer Koren, Roi Livni

  12. Mirror Mirror on the Wall, Have I Forgotten it All? A New Framework for Evaluating Machine Unlearning Authors: Brennon Brimhall, Philip Mathew, Neil Fendley, Yinzhi Cao, Matthew Green

  13. SPAT: Sensitivity-based Multihead-attention Pruning on Time Series Forecasting Models Authors: Suhan Guo, Jiahong Deng, Mengjun Yi, Furao Shen, Jian Zhao

  14. Manifold Learning with Normalizing Flows: Towards Regularity, Expressivity and Iso-Riemannian Geometry Authors: Willem Diepeveen, Deanna Needell

  15. Behind the Noise: Conformal Quantile Regression Reveals Emergent Representations Authors: Petrus H. Zwart, Tamas Varga, Odeta Qafoku, James A. Sethian

  16. Scaling Laws for Speculative Decoding Authors: Siyuan Yan, Mo Zhu, Guo-qing Jiang, Jianfei Wang, Jiaxing Chen, Wentai Zhang, Xiang Liao, Xiao Cui, Chen Zhang, Zhuoran Song, Ran Zhu

  17. The Correspondence Between Bounded Graph Neural Networks and Fragments of First-Order Logic Authors: Bernardo Cuenca Grau, Przemysław A. Wałęga


1. Iteratively reweighted kernel machines efficiently learn sparse functions

ArXiv ID: 2505.08277

Authors: Libin Zhu, Damek Davis, Dmitriy Drusvyatskiy, Maryam Fazel

Abstract: The impressive practical performance of neural networks is often attributed to their ability to learn low-dimensional data representations and hierarchical structure directly from data. In this work, we argue that these two phenomena are not unique to neural networks, and can be elicited from classical kernel methods. Namely, we show that the derivative of the kernel predictor can detect the influential coordinates with low sample complexity. Moreover, by iteratively using the derivatives to reweight the data and retrain kernel machines, one is able to efficiently learn hierarchical polynomials with finite leap complexity. Numerical experiments illustrate the developed theory.

Comment: The paper explores sparse function learning using kernel machines, which aligns with representation learning through sparse methods. It provides theoretical insights into kernel methods, making it relevant to foundational research.

Relevance: 9 Novelty: 8


2. Super-fast rates of convergence for Neural Networks Classifiers under the Hard Margin Condition

ArXiv ID: 2505.08262

Authors: Nathanael Tepakbong, Ding-Xuan Zhou, Xiang Zhou

Abstract: We study the classical binary classification problem for hypothesis spaces of Deep Neural Networks (DNNs) with ReLU activation under Tsybakov's low-noise condition with exponent $q>0$, and its limit-case $q\to\infty$ which we refer to as the "hard-margin condition". We show that DNNs which minimize the empirical risk with square loss surrogate and $\ell_p$ penalty can achieve finite-sample excess risk bounds of order $\mathcal{O}\left(n^{-\alpha}\right)$ for arbitrarily large $\alpha>0$ under the hard-margin condition, provided that the regression function $\eta$ is sufficiently smooth. The proof relies on a novel decomposition of the excess risk which might be of independent interest.

Comment: The paper provides theoretical insights into the performance of deep neural networks under specific conditions, which aligns with foundational research in representation learning and training dynamics.

Relevance: 9 Novelty: 8


3. Learning Advanced Self-Attention for Linear Transformers in the Singular Value Domain

ArXiv ID: 2505.08516

Authors: Hyowon Wi, Jeongwhan Choi, Noseong Park

Abstract: Transformers have demonstrated remarkable performance across diverse domains. The key component of Transformers is self-attention, which learns the relationship between any two tokens in the input sequence. Recent studies have revealed that the self-attention can be understood as a normalized adjacency matrix of a graph. Notably, from the perspective of graph signal processing (GSP), the self-attention can be equivalently defined as a simple graph filter, applying GSP using the value vector as the signal. However, the self-attention is a graph filter defined with only the first order of the polynomial matrix, and acts as a low-pass filter preventing the effective leverage of various frequency information. Consequently, existing self-attention mechanisms are designed in a rather simplified manner. Therefore, we propose a novel method, called Attentive Graph Filter (AGF), interpreting the self-attention as learning the graph filter in the singular value domain from the perspective of graph signal processing for directed graphs with the linear complexity w.r.t. the input length $n$, i.e., $\mathcal{O}(nd^2)$. In our experiments, we demonstrate that AGF achieves state-of-the-art performance on various tasks, including Long Range Arena benchmark and time series classification.

Comment: The paper proposes a novel self-attention mechanism interpreted as a graph filter in the singular value domain, which aligns with architectural innovations in Transformers.

Relevance: 9 Novelty: 8
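
The graph-filter reading of self-attention described in this abstract can be illustrated with a small sketch. It is not the AGF method. Under toy assumptions (hand-picked 3-token queries, keys, and values, and the hypothetical helper names `softmax_attention` and `apply_filter`), it only shows that a softmax attention matrix is row-stochastic, like a normalized adjacency matrix, and that the output is that matrix applied once to the value signal.

```python
import math

def softmax_attention(Q, K):
    """Row-wise softmax of Q K^T / sqrt(d): an n x n matrix of attention weights."""
    d = len(Q[0])
    A = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        A.append([e / total for e in exps])
    return A

def apply_filter(A, V):
    """Apply the attention matrix once to the value signal, i.e. compute A @ V."""
    n_cols = len(V[0])
    return [[sum(a * v[j] for a, v in zip(row, V)) for j in range(n_cols)] for row in A]

# Toy inputs, chosen only for illustration.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
V = [[1.0], [-1.0], [3.0]]

A = softmax_attention(Q, K)
out = apply_filter(A, V)
row_sums = [sum(row) for row in A]  # each row of A sums to 1
```

Because every row of `A` is a set of positive weights summing to 1, each output is a weighted average of the values, which is one way to see the smoothing (low-pass) behavior the abstract mentions.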


4. Lost in Transmission: When and Why LLMs Fail to Reason Globally

ArXiv ID: 2505.08140

Authors: Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan, Jennifer Neville

Abstract: Despite their many successes, transformer-based large language models (LLMs) continue to struggle with tasks that require complex reasoning over large parts of their input. We argue that these failures arise due to capacity limits on the accurate flow of information within LLMs. To formalize this issue, we introduce the bounded attention prefix oracle (BAPO) model, a new computational framework that models bandwidth constraints on attention heads, the mechanism for internal communication in LLMs. We show that several important reasoning problems like graph reachability require high communication bandwidth for BAPOs to solve; we call these problems BAPO-hard. Our experiments corroborate our theoretical predictions: GPT-4, Claude, and Gemini succeed on BAPO-easy tasks and fail even on relatively small BAPO-hard tasks. BAPOs also reveal another benefit of chain of thought (CoT): we prove that breaking down a task using CoT can turn any BAPO-hard problem into a BAPO-easy one. Our results offer principled explanations for key LLM failures and suggest directions for architectures and inference methods that mitigate bandwidth limits.

Comment: The paper introduces the BAPO model to analyze reasoning failures in LLMs, providing theoretical insights into LLM behavior and interpretability, which aligns with the LLM criterion.

Relevance: 9 Novelty: 8


5. Recovering Event Probabilities from Large Language Model Embeddings via Axiomatic Constraints

ArXiv ID: 2505.07883

Authors: Jian-Qiao Zhu, Haijiang Yan, Thomas L. Griffiths

Abstract: Rational decision-making under uncertainty requires coherent degrees of belief in events. However, event probabilities generated by Large Language Models (LLMs) have been shown to exhibit incoherence, violating the axioms of probability theory. This raises the question of whether coherent event probabilities can be recovered from the embeddings used by the models. If so, those derived probabilities could be used as more accurate estimates in events involving uncertainty. To explore this question, we propose enforcing axiomatic constraints, such as the additive rule of probability theory, in the latent space learned by an extended variational autoencoder (VAE) applied to LLM embeddings. This approach enables event probabilities to naturally emerge in the latent space as the VAE learns to both reconstruct the original embeddings and predict the embeddings of semantically related events. We evaluate our method on complementary events (i.e., event A and its complement, event not-A), where the true probabilities of the two events must sum to 1. Experiment results on open-weight language models demonstrate that probabilities recovered from embeddings exhibit greater coherence than those directly reported by the corresponding models and align closely with the true probabilities.

Comment: This paper explores recovering coherent event probabilities from LLM embeddings using an extended VAE, which aligns with foundational research in representation learning and theoretical insights into LLM behavior.

Relevance: 9 Novelty: 8
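
The complement rule used for evaluation in this abstract can be sketched as a simple coherence check. The numbers below are made up for illustration and are not results from the paper, and `incoherence` and `renormalize` are hypothetical helper names. The rescaling step is a naive baseline shown for contrast, not the VAE-based approach.

```python
def incoherence(p_a, p_not_a):
    """Absolute violation of the complement rule P(A) + P(not A) = 1."""
    return abs(p_a + p_not_a - 1.0)

def renormalize(p_a, p_not_a):
    """Naive baseline: rescale the pair so that it sums to 1."""
    total = p_a + p_not_a
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    return p_a / total, p_not_a / total

# Made-up judgments for an event and its complement.
reported = (0.7, 0.5)
gap = incoherence(*reported)     # about 0.2: the pair sums to 1.2
fixed = renormalize(*reported)   # coherent by construction
```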


6. InfoPO: On Mutual Information Maximization for Large Language Model Alignment

ArXiv ID: 2505.08507

Authors: Teng Xiao, Zhen Ge, Sujay Sanghavi, Tian Wang, Julian Katz-Samuels, Marc Versage, Qingjun Cui, Trishul Chilimbi

Abstract: We study the post-training of large language models (LLMs) with human preference data. Recently, direct preference optimization and its variants have shown considerable promise in aligning language models, eliminating the need for reward models and online sampling. Despite these benefits, these methods rely on explicit assumptions about the Bradley-Terry (BT) model, which makes them prone to overfitting and results in suboptimal performance, particularly on reasoning-heavy tasks. To address these challenges, we propose a principled preference fine-tuning algorithm called InfoPO, which effectively and efficiently aligns large language models using preference data. InfoPO eliminates the reliance on the BT model and prevents the likelihood of the chosen response from decreasing. Extensive experiments confirm that InfoPO consistently outperforms established baselines on widely used open benchmarks, particularly in reasoning tasks.

Comment: The paper proposes InfoPO, a novel algorithm for aligning LLMs using preference data, which addresses foundational challenges in LLM alignment and optimization.

Relevance: 9 Novelty: 8


7. Blockbuster, Part 1: Block-level AI Operator Fusion

ArXiv ID: 2505.07829

Authors: Ofer Dekel

Abstract: Blockbuster is a framework for AI operator fusion in inference programs. The Blockbuster framework is compatible with any multiprocessor architecture that has a tiered memory hierarchy, including GPUs, multi-core CPUs, and some AI accelerator chips. It includes a graph-based representation for AI workloads, called a block program, which explicitly models how blocks of data move between the memory tiers. It also includes an operator fusion procedure, which is made up of a candidate selection algorithm and a fusion algorithm that fuses each individual candidate - this two-algorithm structure makes Blockbuster especially suitable for large AI programs. The current paper focuses on the fusion algorithm, which is a rule-based technique. While the literature is full of previous rule-based fusion algorithms, what sets our algorithm apart is its direct modeling of data movement between memory tiers, resulting in uniquely powerful fusion results. As a first sanity check, we demonstrate how our algorithm automatically rediscovers the well-known Flash Attention kernel. Then, we demonstrate the real power of our approach by fusing LayerNorm with matrix multiplication and RMSNorm with FNN-SwiGLU - the latter involves fusing three matrix multiplications, a Hadamard product, a reduction, and a few elementwise operations into a single mega-kernel.

Comment: The paper introduces a novel operator fusion framework, which directly models data movement between memory tiers and achieves significant efficiency improvements. This aligns with the model compression and efficiency breakthroughs criterion.

Relevance: 9 Novelty: 8


8. PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts

ArXiv ID: 2505.08719

Authors: Yang Su, Na Yan, Yansha Deng, Robert Schober

Abstract: Large language models (LLMs) hosted on cloud servers alleviate the computational and storage burdens on local devices but raise privacy concerns due to sensitive data transmission and require substantial communication bandwidth, which is challenging in constrained environments. In contrast, small language models (SLMs) running locally enhance privacy but suffer from limited performance on complex tasks. To balance computational cost, performance, and privacy protection under bandwidth constraints, we propose a privacy-aware wireless collaborative mixture of experts (PWC-MoE) framework. Specifically, PWC-MoE employs a sparse privacy-aware gating network to dynamically route sensitive tokens to privacy experts located on local clients, while non-sensitive tokens are routed to non-privacy experts located at the remote base station. To achieve computational efficiency, the gating network ensures that each token is dynamically routed to and processed by only one expert. To enhance scalability and prevent overloading of specific experts, we introduce a group-wise load-balancing mechanism for the gating network that evenly distributes sensitive tokens among privacy experts and non-sensitive tokens among non-privacy experts. To adapt to bandwidth constraints while preserving model performance, we propose a bandwidth-adaptive and importance-aware token offloading scheme. This scheme incorporates an importance predictor to evaluate the importance scores of non-sensitive tokens, prioritizing the most important tokens for transmission to the base station based on their predicted importance and the available bandwidth. Experiments demonstrate that the PWC-MoE framework effectively preserves privacy and maintains high performance even in bandwidth-constrained environments, offering a practical solution for deploying LLMs in privacy-sensitive and bandwidth-limited scenarios.

Comment: The paper proposes a privacy-aware MoE framework with dynamic token routing and bandwidth-adaptive mechanisms. This aligns closely with the Mixture-of-Experts (MoE) and model architecture criteria.

Relevance: 9 Novelty: 8
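
The two mechanisms this abstract describes, single-expert routing split by sensitivity and importance-based offloading under a bandwidth budget, might look roughly like the sketch below. Everything in it is an assumption for illustration: the sensitivity flag is passed in rather than learned, the gate logits and importance scores are made up, and `route_token` and `select_for_offload` are hypothetical names, not the authors' API.

```python
def route_token(gate_logits, is_sensitive, privacy_experts, remote_experts):
    """Top-1 routing: pick the highest-scoring expert within the allowed group."""
    group = privacy_experts if is_sensitive else remote_experts
    return max(group, key=lambda expert: gate_logits[expert])

def select_for_offload(importance, budget):
    """Indices of the `budget` most important tokens, returned in original order."""
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:budget])

# Toy setup: experts 0-1 are local privacy experts, experts 2-3 sit at the base station.
privacy_experts, remote_experts = [0, 1], [2, 3]
gate_logits = [0.1, 2.0, 0.3, 1.5]

local_choice = route_token(gate_logits, True, privacy_experts, remote_experts)
remote_choice = route_token(gate_logits, False, privacy_experts, remote_experts)
offloaded = select_for_offload([0.2, 0.9, 0.5], budget=2)
```

Restricting the argmax to one group keeps a sensitive token from ever being assigned to a remote expert, whatever the gate scores are.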


9. Efficient Unstructured Pruning of Mamba State-Space Models for Resource-Constrained Environments

ArXiv ID: 2505.08299

Authors: Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma

Abstract: State-space models (SSMs), particularly the Mamba architecture, have emerged as powerful alternatives to Transformers for sequence modeling, offering linear-time complexity and competitive performance across diverse tasks. However, their large parameter counts pose significant challenges for deployment in resource-constrained environments. We propose a novel unstructured pruning framework tailored for Mamba models that achieves up to 70% parameter reduction while retaining over 95% of the original performance. Our approach integrates three key innovations: (1) a gradient-aware magnitude pruning technique that combines weight magnitude and gradient information to identify less critical parameters, (2) an iterative pruning schedule that gradually increases sparsity to maintain model stability, and (3) a global pruning strategy that optimizes parameter allocation across the entire model. Through extensive experiments on WikiText-103, Long Range Arena, and ETT time-series benchmarks, we demonstrate significant efficiency gains with minimal performance degradation. Our analysis of pruning effects on Mamba's components reveals critical insights into the architecture's redundancy and robustness, enabling practical deployment in resource-constrained settings while broadening Mamba's applicability.

Comment: The paper proposes a novel unstructured pruning framework for Mamba state-space models, which is relevant to model compression. The gradient-aware magnitude pruning and iterative pruning schedule are innovative contributions.

Relevance: 9 Novelty: 8
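
The three ingredients listed in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the score `|w| * |g|` is one plausible way to combine magnitude and gradient information, the cubic ramp is an assumed schedule, and the layer names, weights, and gradients are made up.

```python
def sparsity_at(step, total_steps, final_sparsity):
    """Cubic ramp from 0 to final_sparsity (an assumed schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity * (1.0 - (1.0 - frac) ** 3)

def global_prune_mask(weights, grads, sparsity):
    """Keep-mask per layer, using one global threshold on |w| * |g| scores."""
    scores = {name: [abs(w) * abs(g) for w, g in zip(weights[name], grads[name])]
              for name in weights}
    flat = sorted(s for layer in scores.values() for s in layer)
    n_prune = int(sparsity * len(flat))
    if n_prune == 0:
        return {name: [True] * len(layer) for name, layer in scores.items()}
    # Scores at or below the threshold are pruned (ties at the threshold are all pruned).
    threshold = flat[n_prune - 1]
    return {name: [s > threshold for s in layer] for name, layer in scores.items()}

# Toy two-layer model with flattened parameters (illustrative values only).
weights = {"in_proj": [0.9, -0.1, 0.4, 0.05], "out_proj": [0.3, -0.7, 0.02, 0.6]}
grads = {"in_proj": [0.2, 0.5, 0.1, 0.3], "out_proj": [0.4, 0.1, 0.9, 0.25]}

mask = global_prune_mask(weights, grads, sparsity=0.5)
kept_per_layer = {name: sum(layer) for name, layer in mask.items()}
```

Because the threshold is global, layers end up with different sparsity levels: here `in_proj` keeps one of four parameters while `out_proj` keeps three.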


10. Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders

ArXiv ID: 2505.08080

Authors: Dong Shu, Xuansheng Wu, Haiyan Zhao, Mengnan Du, Ninghao Liu

Abstract: Sparse Autoencoders (SAEs) have recently emerged as powerful tools for interpreting and steering the internal representations of large language models (LLMs). However, conventional approaches to analyzing SAEs typically rely solely on input-side activations, without considering the causal influence between each latent feature and the model's output. This work is built on two key hypotheses: (1) activated latents do not contribute equally to the construction of the model's output, and (2) only latents with high causal influence are effective for model steering. To validate these hypotheses, we propose Gradient Sparse Autoencoder (GradSAE), a simple yet effective method that identifies the most influential latents by incorporating output-side gradient information.

Comment: The paper introduces Gradient Sparse Autoencoders (GradSAE) to identify influential latents, which is highly relevant to representation learning. The focus on causality and gradient-based methods adds theoretical depth.

Relevance: 9 Novelty: 8


11. Rapid Overfitting of Multi-Pass Stochastic Gradient Descent in Stochastic Convex Optimization

ArXiv ID: 2505.08306

Authors: Shira Vansover-Hager, Tomer Koren, Roi Livni

Abstract: We study the out-of-sample performance of multi-pass stochastic gradient descent (SGD) in the fundamental stochastic convex optimization (SCO) model. While one-pass SGD is known to achieve an optimal $\Theta(1/\sqrt{n})$ excess population loss given a sample of size $n$, much less is understood about the multi-pass version of the algorithm which is widely used in practice. Somewhat surprisingly, we show that in the general non-smooth case of SCO, just a few epochs of SGD can already hurt its out-of-sample performance significantly and lead to overfitting. In particular, using a step size $\eta = \Theta(1/\sqrt{n})$, which gives the optimal rate after one pass, can lead to population loss as large as $\Omega(1)$ after just one additional pass. More generally, we show that the population loss from the second pass onward is of the order $\Theta(1/(\eta T) + \eta \sqrt{T})$, where $T$ is the total number of steps. These results reveal a certain phase-transition in the out-of-sample behavior of SGD after the first epoch, as well as a sharp separation between the rates of overfitting in the smooth and non-smooth cases of SCO. Additionally, we extend our results to with-replacement SGD, proving that the same asymptotic bounds hold after $O(n \log n)$ steps. Finally, we also prove a lower bound of $\Omega(\eta \sqrt{n})$ on the generalization gap of one-pass SGD in dimension $d = \widetilde{O}(n)$, improving on recent results of Koren et al. (2022) and Schliserman et al. (2024).

Comment: The paper studies the generalization behavior of multi-pass stochastic gradient descent (SGD) in stochastic convex optimization, providing theoretical insights into overfitting dynamics. This aligns with foundational research in training dynamics.

Relevance: 9 Novelty: 8
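
As a quick consistency check on the rates quoted in this abstract, one can substitute the one-pass-optimal step size into the multi-pass expression. The step below is only arithmetic on the stated formulas, with $T = 2n$ (two full passes) taken as an illustrative choice.

```latex
\[
\eta = \frac{1}{\sqrt{n}},\quad T = 2n
\;\Longrightarrow\;
\frac{1}{\eta T} + \eta\sqrt{T}
= \frac{\sqrt{n}}{2n} + \frac{\sqrt{2n}}{\sqrt{n}}
= \frac{1}{2\sqrt{n}} + \sqrt{2}
= \Theta(1).
\]
\[
\text{Balancing the two terms: }
\frac{1}{\eta T} = \eta\sqrt{T}
\;\Longleftrightarrow\;
\eta = T^{-3/4},
\qquad
\frac{1}{\eta T} + \eta\sqrt{T} = 2\,T^{-1/4}.
\]
```

The first line matches the abstract's statement that the loss can be $\Omega(1)$ after one additional pass. The second shows that the expression inside the $\Theta(\cdot)$ is smallest for a step size of order $T^{-3/4}$, which is smaller than the one-pass choice.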


12. Mirror Mirror on the Wall, Have I Forgotten it All? A New Framework for Evaluating Machine Unlearning

ArXiv ID: 2505.08138

Authors: Brennon Brimhall, Philip Mathew, Neil Fendley, Yinzhi Cao, Matthew Green

Abstract: Machine unlearning methods take a model trained on a dataset and a forget set, then attempt to produce a model as if it had only been trained on the examples not in the forget set. We empirically show that an adversary is able to distinguish between a mirror model (a control model produced by retraining without the data to forget) and a model produced by an unlearning method across representative unlearning methods from the literature. We build distinguishing algorithms based on evaluation scores in the literature (i.e. membership inference scores) and Kullback-Leibler divergence. We propose a strong formal definition for machine unlearning called computational unlearning. Computational unlearning is defined as the inability for an adversary to distinguish between a mirror model and a model produced by an unlearning method. If the adversary cannot guess better than random (except with negligible probability), then we say that an unlearning method achieves computational unlearning. Our computational unlearning definition provides theoretical structure to prove unlearning feasibility results. For example, our computational unlearning definition immediately implies that there are no deterministic computational unlearning methods for entropic learning algorithms. We also explore the relationship between differential privacy (DP)-based unlearning methods and computational unlearning, showing that DP-based approaches can satisfy computational unlearning at the cost of an extreme utility collapse. These results demonstrate that current methodology in the literature fundamentally falls short of achieving computational unlearning. We conclude by identifying several open questions for future work.

Comment: The paper introduces a new theoretical framework for machine unlearning, which aligns with foundational research in representation learning and training dynamics. The focus on computational unlearning and its relationship with differential privacy provides theoretical insights.

Relevance: 8 Novelty: 8


13. SPAT: Sensitivity-based Multihead-attention Pruning on Time Series Forecasting Models

ArXiv ID: 2505.08768

Authors: Suhan Guo, Jiahong Deng, Mengjun Yi, Furao Shen, Jian Zhao

Abstract: Attention-based architectures have achieved superior performance in multivariate time series forecasting but are computationally expensive. Techniques such as patching and adaptive masking have been developed to reduce their sizes and latencies. In this work, we propose a structured pruning method, SPAT (Sensitivity Pruner for Attention), which selectively removes redundant attention mechanisms and yields highly effective models. Different from previous approaches, SPAT aims to remove the entire attention module, which reduces the risk of overfitting and enables speed-up without demanding specialized hardware. We propose a dynamic sensitivity metric, Sensitivity Enhanced Normalized Dispersion (SEND), that measures the importance of each attention module during the pre-training phase. Experiments on multivariate datasets demonstrate that SPAT-pruned models achieve reductions of 2.842% in MSE, 1.996% in MAE, and 35.274% in FLOPs. Furthermore, SPAT-pruned models outperform existing lightweight, Mamba-based and LLM-based SOTA methods in both standard and zero-shot inference, highlighting the importance of retaining only the most effective attention mechanisms. We have made our code publicly available at https://anonymous.4open.science/r/SPAT-6042.

Comment: The paper introduces a structured pruning method for attention mechanisms in time series forecasting, which aligns with the model compression criterion, particularly sparsity and pruning.

Relevance: 8 Novelty: 7


14. Manifold Learning with Normalizing Flows: Towards Regularity, Expressivity and Iso-Riemannian Geometry

ArXiv ID: 2505.08087

Authors: Willem Diepeveen, Deanna Needell

Abstract: Modern machine learning increasingly leverages the insight that high-dimensional data often lie near low-dimensional, non-linear manifolds, an idea known as the manifold hypothesis. By explicitly modeling the geometric structure of data through learning Riemannian geometry, algorithms can achieve improved performance and interpretability in tasks like clustering, dimensionality reduction, and interpolation. In particular, learned pullback geometry has recently undergone transformative developments that now make it scalable to learn and scalable to evaluate, which further opens the door for principled non-linear data analysis and interpretable machine learning. However, there are still steps to be taken when considering real-world multi-modal data. This work focuses on addressing distortions and modeling errors that can arise in the multi-modal setting and proposes to alleviate both challenges through isometrizing the learned Riemannian structure and balancing regularity and expressivity of the diffeomorphism parametrization. We showcase the effectiveness of the synergy of the proposed approaches in several numerical experiments with both synthetic and real data.

Comment: This paper explores manifold learning with normalizing flows, addressing challenges in multi-modal data and proposing methods to improve regularity and expressivity. It aligns with representation learning by focusing on geometric structure and interpretability, making it relevant to foundational research.

Relevance: 8 Novelty: 7


15. Behind the Noise: Conformal Quantile Regression Reveals Emergent Representations

ArXiv ID: 2505.08176

Authors: Petrus H. Zwart, Tamas Varga, Odeta Qafoku, James A. Sethian

Abstract: Scientific imaging often involves long acquisition times to obtain high-quality data, especially when probing complex, heterogeneous systems. However, reducing acquisition time to increase throughput inevitably introduces significant noise into the measurements. We present a machine learning approach that not only denoises low-quality measurements with calibrated uncertainty bounds, but also reveals emergent structure in the latent space. By using ensembles of lightweight, randomly structured neural networks trained via conformal quantile regression, our method performs reliable denoising while uncovering interpretable spatial and chemical features -- without requiring labels or segmentation. Unlike conventional approaches focused solely on image restoration, our framework leverages the denoising process itself to drive the emergence of meaningful representations. We validate the approach on real-world geobiochemical imaging data, showing how it supports confident interpretation and guides experimental design under resource constraints.

Comment: The paper presents a denoising method using conformal quantile regression that uncovers emergent representations, aligning with foundational research in representation learning and interpretability.

Relevance: 8 Novelty: 7


16. Scaling Laws for Speculative Decoding

ArXiv ID: 2505.07858

Authors: Siyuan Yan, Mo Zhu, Guo-qing Jiang, Jianfei Wang, Jiaxing Chen, Wentai Zhang, Xiang Liao, Xiao Cui, Chen Zhang, Zhuoran Song, Ran Zhu

Abstract: The escalating demand for efficient decoding in large language models (LLMs) is particularly critical for reasoning-intensive architectures like OpenAI-o3 and DeepSeek-R1, which depend on extended chain-of-thought reasoning. This study investigates speculative decoding techniques through dense LLM architectures to establish foundational insights for accelerating reasoning tasks. While speculative decoding methods leveraging parallel draft-verification cycles have emerged as promising acceleration techniques, the scaling laws governing decoding efficiency remain under-explored compared to conventional backbone LLMs developed through Pretraining->SFT->RLHF training paradigms. In this work, we discover Log-linear Scaling Laws (Theorem 1.1, 1.2 and 1.3) governing draft model acceptance rate (or decoding speed) across three dimensions: pretraining token volume, draft model capacity, and decoding batch size. Building on these laws, we achieve Scylla, which coordinates multi-dimensional scaling for popular LLMs (Llama2/3, Qwen2.5). Empirical validation shows Scylla achieves 1.5-2.2 higher acceptance rate than EAGLE2 and 0.3 higher than EAGLE3 at temperature T = 0, with peak performance gains on summarization and QA tasks (Figure 2). Industrial inference engine deployments demonstrate 2X decoding throughput improvements over EAGLE2 (Table 5), validating the transformative potential of systematic scaling for efficient LLM inference. Code will be released later.

Comment: The paper explores speculative decoding techniques and establishes scaling laws for decoding efficiency in LLMs. This provides theoretical insights into LLM behavior and decoding efficiency, aligning with the LLM criterion.

Relevance: 8 Novelty: 7


17. The Correspondence Between Bounded Graph Neural Networks and Fragments of First-Order Logic

ArXiv ID: 2505.08021

Authors: Bernardo Cuenca Grau, Przemysław A. Wałęga

Abstract: Graph Neural Networks (GNNs) address two key challenges in applying deep learning to graph-structured data: they handle varying size input graphs and ensure invariance under graph isomorphism. While GNNs have demonstrated broad applicability, understanding their expressive power remains an important question. In this paper, we show that bounded GNN architectures correspond to specific fragments of first-order logic (FO), including modal logic (ML), graded modal logic (GML), modal logic with the universal modality (ML(A)), the two-variable fragment (FO2) and its extension with counting quantifiers (C2). To establish these results, we apply methods and tools from finite model theory of first-order and modal logics to the domain of graph representation learning. This provides a unifying framework for understanding the logical expressiveness of GNNs within FO.

Comment: This paper provides a theoretical analysis of the expressive power of Graph Neural Networks (GNNs) by linking them to fragments of first-order logic. It aligns with foundational research in representation learning and model analysis.

Relevance: 8 Novelty: 7


Paper Selection Prompt

You are a helpful paper reading assistant whose job is to read daily posts from ArXiv and identify a few papers that your friend will enjoy reading. Your job is to carefully read the paper titles and abstracts below and find the ones that match the criteria below.

Instructions

Write the response in JSONL format with {ARXIVID, COMMENT, RELEVANCE, NOVELTY} on each line, one for each paper.
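
A response in this format could be consumed along the following lines. This is a sketch under assumptions: the prompt only names the four keys, so treating the scores as integers, the helper name `parse_selection`, and the abbreviated comments in the sample are all illustrative.

```python
import json

REQUIRED_KEYS = {"ARXIVID", "COMMENT", "RELEVANCE", "NOVELTY"}

def parse_selection(jsonl_text):
    """Parse one JSON object per non-empty line and check the expected keys."""
    records = []
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        missing = REQUIRED_KEYS - set(record)
        if missing:
            raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
        records.append(record)
    return records

# Two sample lines using IDs and scores from this digest (comments abbreviated).
sample = "\n".join([
    json.dumps({"ARXIVID": "2505.08768", "COMMENT": "Structured attention pruning.",
                "RELEVANCE": 8, "NOVELTY": 7}),
    json.dumps({"ARXIVID": "2505.08277", "COMMENT": "Kernel machines learn sparse functions.",
                "RELEVANCE": 9, "NOVELTY": 8}),
])

records = parse_selection(sample)
# Sort by relevance, then novelty, highest first.
ranked = sorted(records, key=lambda r: (r["RELEVANCE"], r["NOVELTY"]), reverse=True)
```

Validating keys per line means that one malformed record is reported with its line number instead of silently dropping a paper.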

Scoring Criteria

The "Relevance" score measures how closely the paper aligns with the core topics of the prompt. The "Novelty" score assesses the originality and impact of the paper. They are two ORTHONORMAL axes and SHOULD NOT be confused with each other.

Relevance Scoring

Novelty Scoring

Papers

[PAPER LIST HERE]

Relevant Topics

Use the following relevance criteria to focus on foundational research. Keep relevant papers and filter out irrelevant ones. Avoid purely application-driven work.

  1. Representation Learning - Relevant: Insights into how deep networks encode information, feature/dictionary learning, sparse/contrastive methods, training dynamics in neural networks. - Irrelevant: Standard applications of known techniques lacking new theoretical or methodological contributions.

  2. Model Architecture - Relevant: Mixture-of-Experts (MoE), Transformers, Conditional/Dynamic Networks, Autoencoders, analysis on existing architectures (like encoder-decoder), or other architectural innovations. - Irrelevant: Merely using existing architectures for a certain task without insights into the structure themselves.

  3. Model Compression - Relevant: Sparsity, pruning, quantization, low-rank approaches, KV cache, or other algorithmic/theoretical efficiency breakthroughs. - Irrelevant: Straightforward applications of existing compression methods to new tasks.

  4. Large Language Models (LLMs) - Relevant: Major breakthroughs in pretraining or architecture, theoretical insights into LLM behavior/interpretability. - Irrelevant: Domain-specific usage (e.g., translation, jail-breaking), finetuning or inference tricks (e.g., instruction tuning, chain-of-thoughts, data mixing), or empirical dataset/benchmark studies and text-level analysis (e.g. hallucination, reasoning, safety).

  5. AI for Science - Relevant: Foundational research in molecular/protein modeling, new generative paradigms, or significant architecture-level innovations. - Irrelevant: Conventional, domain-specific applications without new theoretical perspectives.

  6. Emerging Trends - Relevant: Cutting-edge theoretical work challenging established assumptions or introducing broad new paradigms. - Irrelevant: Incremental improvements or trend-following without novel insights.

Keywords: