Personalized Daily ArXiv Papers 2025-07-14

[gpt-4o]	Prompt	Completion	Total
Token	28988	3407	32395
Cost	$0.07	$0.03	$0.11

Total arXiv papers: 379

Total scanned papers: 221

Total relevant papers: 19

Table of contents with paper titles:

KELPS: A Framework for Verified Multi-Language Autoformalization via Semantic-Syntactic Alignment Authors: Jiyao Zhang, Chengli Zhong, Hui Xu, Qige Li, Yi Zhou
From Language to Logic: A Bi-Level Framework for Structured Reasoning Authors: Keying Yang, Hao Wang, Kai Yang
Compress Any Segment Anything Model (SAM) Authors: Juntong Fan, Zhiwei Hao, Jianqiang Shen, Shang-Ling Jui, Yi Zhang, Jing-Xiao Liao, Feng-Lei Fan
Greedy Low-Rank Gradient Compression for Distributed Learning with Convergence Guarantees Authors: Chuyan Chen, Yutong He, Pengrui Li, Weichen Jia, Kun Yuan
Low-rank Momentum Factorization for Memory Efficient Training Authors: Pouria Mahdavinia, Mehrdad Mahdavi
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability? Authors: Denis Sutter, Julian Minder, Thomas Hofmann, Tiago Pimentel
BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity Authors: Chenyang Song, Weilin Zhao, Xu Han, Chaojun Xiao, Yingfa Chen, Yuxuan Li, Zhiyuan Liu, Maosong Sun
Krul: Efficient State Restoration for Multi-turn Conversations with Dynamic Cross-layer KV Sharing Authors: Junyi Wen, Junyuan Liang, Zicong Hong, Wuhui Chen, Zibin Zheng
Compactor: Calibrated Query-Agnostic KV Cache Compression with Approximate Leverage Scores Authors: Vivek Chari, Benjamin Van Durme
A document is worth a structured record: Principled inductive bias design for document recognition Authors: Benjamin Meyer, Lukas Tuggener, Sascha H\"anzi, Daniel Schmid, Erdal Ayfer, Benjamin F. Grewe, Ahmed Abdulkadir, Thilo Stadelmann
Ranked Set Sampling-Based Multilayer Perceptron: Improving Generalization via Variance-Based Bounds Authors: Feijiang Li, Liuya Zhang, Jieting Wang, Tao Yan, Yuhua Qian
PanMatch: Unleashing the Potential of Large Vision Models for Unified Matching Models Authors: Yongjian Zhang, Longguang Wang, Kunhong Li, Ye Zhang, Yun Wang, Liang Lin, Yulan Guo
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation Authors: Anlin Zheng, Xin Wen, Xuanyang Zhang, Chuofan Ma, Tiancai Wang, Gang Yu, Xiangyu Zhang, Xiaojuan Qi
Admissibility of Stein Shrinkage for Batch Normalization in the Presence of Adversarial Attacks Authors: Sofia Ivolgina, P. Thomas Fletcher, Baba C. Vemuri
AbbIE: Autoregressive Block-Based Iterative Encoder for Efficient Sequence Modeling Authors: Preslav Aleksandrov, Meghdad Kurmanji, Fernando Garcia Redondo, David O'Shea, William Shen, Alex Iacob, Lorenzo Sani, Xinchi Qiu, Nicola Cancedda, Nicholas D. Lane
scE$^2$TM: Toward Interpretable Single-Cell Embedding via Topic Modeling Authors: Hegang Chen, Yuyin Lu, Zhiming Dai, Fu Lee Wang, Qing Li, Yanghui Rao
ConsNoTrainLoRA: Data-driven Weight Initialization of Low-rank Adapters using Constraints Authors: Debasmit Das, Hyoungwoo Park, Munawar Hayat, Seokeon Choi, Sungrack Yun, Fatih Porikli
Filter Equivariant Functions: A symmetric account of length-general extrapolation on lists Authors: Owen Lewis, Neil Ghani, Andrew Dudzik, Christos Perivolaropoulos, Razvan Pascanu, Petar Veli\v{c}kovi\'c
Space filling positionality and the Spiroformer Authors: M. Maurin, M. \'A. Evangelista-Alvarado, P. Su\'arez-Serrato

1. KELPS: A Framework for Verified Multi-Language Autoformalization via Semantic-Syntactic Alignment

ArXiv ID: 2507.08665

Authors: Jiyao Zhang, Chengli Zhong, Hui Xu, Qige Li, Yi Zhou

Abstract: Modern large language models (LLMs) show promising progress in formalizing informal mathematics into machine-verifiable theorems. However, these methods still face bottlenecks due to the limited quantity and quality of multilingual parallel corpora. In this paper, we propose a novel neuro-symbolic framework KELPS (Knowledge-Equation based Logical Processing System) to address these problems. KELPS is an iterative framework for translating, synthesizing, and filtering informal data into multiple formal languages (Lean, Coq, and Isabelle). First, we translate natural language into Knowledge Equations (KEs), a novel language that we designed, theoretically grounded in assertional logic. Next, we convert them to target languages through rigorously defined rules that preserve both syntactic structure and semantic meaning. This process yielded a parallel corpus of over 60,000 problems. Our framework achieves 88.9% syntactic accuracy (pass@1) on MiniF2F, outperforming SOTA models such as Deepseek-V3 (81%) and Herald (81.3%) across multiple datasets. All datasets and codes are available in the supplementary materials.

Comment: The paper introduces a framework for autoformalization in multiple languages, which involves foundational research in LLMs and theoretical insights.

Relevance: 9 Novelty: 8

2. From Language to Logic: A Bi-Level Framework for Structured Reasoning

ArXiv ID: 2507.08501

Authors: Keying Yang, Hao Wang, Kai Yang

Abstract: Structured reasoning over natural language inputs remains a core challenge in artificial intelligence, as it requires bridging the gap between unstructured linguistic expressions and formal logical representations. In this paper, we propose a novel \textbf{bi-level framework} that maps language to logic through a two-stage process: high-level task abstraction and low-level logic generation. At the upper level, a large language model (LLM) parses natural language queries into intermediate structured representations specifying the problem type, objectives, decision variables, and symbolic constraints. At the lower level, the LLM uses these representations to generate symbolic workflows or executable reasoning programs for accurate and interpretable decision making. The framework supports modular reasoning, enforces explicit constraints, and generalizes across domains such as mathematical problem solving, question answering, and logical inference. We further optimize the framework with an end-to-end {bi-level} optimization approach that jointly refines both the high-level abstraction and low-level logic generation stages. Experiments on multiple realistic reasoning benchmarks demonstrate that our approach significantly outperforms existing baselines in accuracy, with accuracy gains reaching as high as 40\%. Moreover, the bi-level design enhances transparency and error traceability, offering a promising step toward trustworthy and systematic reasoning with LLMs.

Comment: The paper introduces a novel bi-level framework for structured reasoning, which aligns with foundational research in representation learning and LLM behavior.

Relevance: 9 Novelty: 8

3. Compress Any Segment Anything Model (SAM)

ArXiv ID: 2507.08765

Authors: Juntong Fan, Zhiwei Hao, Jianqiang Shen, Shang-Ling Jui, Yi Zhang, Jing-Xiao Liao, Feng-Lei Fan

Abstract: Due to the excellent performance in yielding high-quality, zero-shot segmentation, Segment Anything Model (SAM) and its variants have been widely applied in diverse scenarios such as healthcare and intelligent manufacturing. Therefore, effectively compressing SAMs has become an increasingly pressing practical need. In this study, we propose Birkhoff, a novel data-free compression algorithm for SAM and its variants. Unlike quantization, pruning, distillation, and other compression methods, Birkhoff embodies versatility across model types, agility in deployment, faithfulness to the original model, and compactness in model size. Specifically, Birkhoff introduces a novel compression algorithm: Hyper-Compression, whose core principle is to find a dense trajectory to turn a high-dimensional parameter vector into a low-dimensional scalar. Furthermore, Birkhoff designs a dedicated linear layer operator, HyperLinear, to fuse decompression and matrix multiplication to significantly accelerate inference of the compressed SAMs. Extensive experiments on 18 SAMs in the COCO, LVIS, and SA-1B datasets show that Birkhoff performs consistently and competitively in compression time, compression ratio, post-compression performance, and inference speed. For example, Birkhoff can achieve a compression ratio of 5.17x on SAM2-B, with less than 1% performance drop without using any fine-tuning data. Moreover, the compression is finished within 60 seconds for all models.

Comment: The paper proposes a novel compression algorithm for SAM models, which is relevant to model compression through a new hyper-compression method.

Relevance: 9 Novelty: 8

4. Greedy Low-Rank Gradient Compression for Distributed Learning with Convergence Guarantees

ArXiv ID: 2507.08784

Authors: Chuyan Chen, Yutong He, Pengrui Li, Weichen Jia, Kun Yuan

Abstract: Distributed optimization is pivotal for large-scale signal processing and machine learning, yet communication overhead remains a major bottleneck. Low-rank gradient compression, in which the transmitted gradients are approximated by low-rank matrices to reduce communication, offers a promising remedy. Existing methods typically adopt either randomized or greedy compression strategies: randomized approaches project gradients onto randomly chosen subspaces, introducing high variance and degrading empirical performance; greedy methods select the most informative subspaces, achieving strong empirical results but lacking convergence guarantees. To address this gap, we propose GreedyLore--the first Greedy Low-Rank gradient compression algorithm for distributed learning with rigorous convergence guarantees. GreedyLore incorporates error feedback to correct the bias introduced by greedy compression and introduces a semi-lazy subspace update that ensures the compression operator remains contractive throughout all iterations. With these techniques, we prove that GreedyLore achieves a convergence rate of $\mathcal{O}(\sigma/\sqrt{NT} + 1/T)$ under standard optimizers such as MSGD and Adam--marking the first linear speedup convergence rate for low-rank gradient compression. Extensive experiments are conducted to validate our theoretical findings.

Comment: The paper focuses on low-rank gradient compression, which is relevant to model compression and efficiency breakthroughs.

Relevance: 9 Novelty: 8

5. Low-rank Momentum Factorization for Memory Efficient Training

ArXiv ID: 2507.08091

Authors: Pouria Mahdavinia, Mehrdad Mahdavi

Abstract: Fine-tuning large foundation models presents significant memory challenges due to stateful optimizers like AdamW, often requiring several times more GPU memory than inference. While memory-efficient methods like parameter-efficient fine-tuning (e.g., LoRA) and optimizer state compression exist, recent approaches like GaLore bridge these by using low-rank gradient projections and subspace moment accumulation. However, such methods may struggle with fixed subspaces or computationally costly offline resampling (e.g., requiring full-matrix SVDs). We propose Momentum Factorized SGD (MoFaSGD), which maintains a dynamically updated low-rank SVD representation of the first-order momentum, closely approximating its full-rank counterpart throughout training. This factorization enables a memory-efficient fine-tuning method that adaptively updates the optimization subspace at each iteration. Crucially, MoFaSGD leverages the computed low-rank momentum factors to perform efficient spectrally normalized updates, offering an alternative to subspace moment accumulation. We establish theoretical convergence guarantees for MoFaSGD, proving it achieves an optimal rate for non-convex stochastic optimization under standard assumptions. Empirically, we demonstrate MoFaSGD's effectiveness on large language model alignment benchmarks, achieving a competitive trade-off between memory reduction (comparable to LoRA) and performance compared to state-of-the-art low-rank optimization methods. Our implementation is available at https://github.com/pmahdavi/MoFaSGD.

Comment: The paper proposes a low-rank momentum factorization method for memory-efficient training, relevant to model compression and efficiency.

Relevance: 9 Novelty: 8

6. The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?

ArXiv ID: 2507.08802

Authors: Denis Sutter, Julian Minder, Thomas Hofmann, Tiago Pimentel

Abstract: The concept of causal abstraction got recently popularised to demystify the opaque decision-making processes of machine learning models; in short, a neural network can be abstracted as a higher-level algorithm if there exists a function which allows us to map between them. Notably, most interpretability papers implement these maps as linear functions, motivated by the linear representation hypothesis: the idea that features are encoded linearly in a model's representations. However, this linearity constraint is not required by the definition of causal abstraction. In this work, we critically examine the concept of causal abstraction by considering arbitrarily powerful alignment maps. In particular, we prove that under reasonable assumptions, any neural network can be mapped to any algorithm, rendering this unrestricted notion of causal abstraction trivial and uninformative. We complement these theoretical findings with empirical evidence, demonstrating that it is possible to perfectly map models to algorithms even when these models are incapable of solving the actual task; e.g., on an experiment using randomly initialised language models, our alignment maps reach 100% interchange-intervention accuracy on the indirect object identification task. This raises the non-linear representation dilemma: if we lift the linearity constraint imposed to alignment maps in causal abstraction analyses, we are left with no principled way to balance the inherent trade-off between these maps' complexity and accuracy. Together, these results suggest an answer to our title's question: causal abstraction is not enough for mechanistic interpretability, as it becomes vacuous without assumptions about how models encode information. Studying the connection between this information-encoding assumption and causal abstraction should lead to exciting future work.

Comment: The paper discusses the limitations of causal abstraction in mechanistic interpretability, which is relevant to representation learning as it explores how information is encoded in neural networks.

Relevance: 9 Novelty: 8

7. BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity

ArXiv ID: 2507.08771

Authors: Chenyang Song, Weilin Zhao, Xu Han, Chaojun Xiao, Yingfa Chen, Yuxuan Li, Zhiyuan Liu, Maosong Sun

Abstract: To alleviate the computational burden of large language models (LLMs), architectures with activation sparsity, represented by mixture-of-experts (MoE), have attracted increasing attention. However, the non-differentiable and inflexible routing of vanilla MoE hurts model performance. Moreover, while each token activates only a few parameters, these sparsely-activated architectures exhibit low chunk-level sparsity, indicating that the union of multiple consecutive tokens activates a large ratio of parameters. Such a sparsity pattern is unfriendly for acceleration under low-resource conditions (e.g., end-side devices) and incompatible with mainstream acceleration techniques (e.g., speculative decoding). To address these challenges, we introduce a novel MoE architecture, BlockFFN, as well as its efficient training and deployment techniques. Specifically, we use a router integrating ReLU activation and RMSNorm for differentiable and flexible routing. Next, to promote both token-level sparsity (TLS) and chunk-level sparsity (CLS), CLS-aware training objectives are designed, making BlockFFN more acceleration-friendly. Finally, we implement efficient acceleration kernels, combining activation sparsity and speculative decoding for the first time. The experimental results demonstrate the superior performance of BlockFFN over other MoE baselines, achieving over 80% TLS and 70% 8-token CLS. Our kernels achieve up to 3.67$\times$ speedup on real end-side devices than dense models. All codes and checkpoints are available publicly (https://github.com/thunlp/BlockFFN).

Comment: The paper presents a novel MoE architecture, BlockFFN, focusing on activation sparsity and efficient training, which is relevant to model architecture and compression.

Relevance: 9 Novelty: 8

ArXiv ID: 2507.08045

Authors: Junyi Wen, Junyuan Liang, Zicong Hong, Wuhui Chen, Zibin Zheng

Abstract: Efficient state restoration in multi-turn conversations with large language models (LLMs) remains a critical challenge, primarily due to the overhead of recomputing or loading full key-value (KV) caches for all historical tokens. To address this, existing approaches compress KV caches across adjacent layers with highly similar attention patterns. However, these methods often apply a fixed compression scheme across all conversations, selecting the same layer pairs for compression without considering conversation-specific attention dynamics. This static strategy overlooks variability in attention pattern similarity across different conversations, which can lead to noticeable accuracy degradation. We present Krul, a multi-turn LLM inference system that enables accurate and efficient KV cache restoration. Krul dynamically selects compression strategies based on attention similarity across layer pairs and uses a recomputation-loading pipeline to restore the KV cache. It introduces three key innovations: 1) a preemptive compression strategy selector to preserve critical context for future conversation turns and selects a customized strategy for the conversation; 2) a token-wise heterogeneous attention similarity estimator to mitigate the attention similarity computation and storage overhead during model generation; 3) a bubble-free restoration scheduler to reduce potential bubbles brought by the imbalance of recomputing and loading stream due to compressed KV caches. Empirical evaluations on real-world tasks demonstrate that Krul achieves a 1.5x-2.68x reduction in time-to-first-token (TTFT) and a 1.33x-2.35x reduction in KV cache storage compared to state-of-the-art methods without compromising generation quality.

Comment: The paper discusses efficient state restoration in LLMs using dynamic KV cache compression, which aligns with model compression and efficiency breakthroughs.

Relevance: 9 Novelty: 7

9. Compactor: Calibrated Query-Agnostic KV Cache Compression with Approximate Leverage Scores

ArXiv ID: 2507.08143

Authors: Vivek Chari, Benjamin Van Durme

Abstract: Modern Large Language Models (LLMs) are increasingly trained to support very large context windows. Unfortunately the ability to use long contexts in generation is complicated by the large memory requirement of the KV cache, which scales linearly with the context length. This memory footprint is often the dominant resource bottleneck in real-world deployments, limiting throughput and increasing serving cost. One way to address this is by compressing the KV cache, which can be done either with knowledge of the question being asked (query-aware) or without knowledge of the query (query-agnostic). We present Compactor, a parameter-free, query-agnostic KV compression strategy that uses approximate leverage scores to determine token importance. We show that Compactor can achieve the same performance as competing methods while retaining 1/2 the tokens in both synthetic and real-world context tasks, with minimal computational overhead. We further introduce a procedure for context-calibrated compression, which allows one to infer the maximum compression ratio a given context can support. Using context-calibrated compression, we show that Compactor achieves full KV performance on Longbench while reducing the KV memory burden by 63%, on average. To demonstrate the efficacy and generalizability of our approach, we apply Compactor to 27 synthetic and real-world tasks from RULER and Longbench, with models from both the Qwen 2.5 and Llama 3.1 families.

Comment: The paper presents Compactor, a KV cache compression strategy, which is relevant to model compression, particularly in LLMs.

Relevance: 9 Novelty: 7

10. A document is worth a structured record: Principled inductive bias design for document recognition

ArXiv ID: 2507.08458

Authors: Benjamin Meyer, Lukas Tuggener, Sascha H\"anzi, Daniel Schmid, Erdal Ayfer, Benjamin F. Grewe, Ahmed Abdulkadir, Thilo Stadelmann

Abstract: Many document types use intrinsic, convention-driven structures that serve to encode precise and structured information, such as the conventions governing engineering drawings. However, state-of-the-art approaches treat document recognition as a mere computer vision problem, neglecting these underlying document-type-specific structural properties, making them dependent on sub-optimal heuristic post-processing and rendering many less frequent or more complicated document types inaccessible to modern document recognition. We suggest a novel perspective that frames document recognition as a transcription task from a document to a record. This implies a natural grouping of documents based on the intrinsic structure inherent in their transcription, where related document types can be treated (and learned) similarly. We propose a method to design structure-specific inductive biases for the underlying machine-learned end-to-end document recognition systems, and a respective base transformer architecture that we successfully adapt to different structures. We demonstrate the effectiveness of the so-found inductive biases in extensive experiments with progressively complex record structures from monophonic sheet music, shape drawings, and simplified engineering drawings. By integrating an inductive bias for unrestricted graph structures, we train the first-ever successful end-to-end model to transcribe engineering drawings to their inherently interlinked information. Our approach is relevant to inform the design of document recognition systems for document types that are less well understood than standard OCR, OMR, etc., and serves as a guide to unify the design of future document foundation models.

Comment: The paper proposes a novel perspective on document recognition with a base transformer architecture, aligning with model architecture innovations.

Relevance: 8 Novelty: 8

11. Ranked Set Sampling-Based Multilayer Perceptron: Improving Generalization via Variance-Based Bounds

ArXiv ID: 2507.08465

Authors: Feijiang Li, Liuya Zhang, Jieting Wang, Tao Yan, Yuhua Qian

Abstract: Multilayer perceptron (MLP), one of the most fundamental neural networks, is extensively utilized for classification and regression tasks. In this paper, we establish a new generalization error bound, which reveals how the variance of empirical loss influences the generalization ability of the learning model. Inspired by this learning bound, we advocate to reduce the variance of empirical loss to enhance the ability of MLP. As is well-known, bagging is a popular ensemble method to realize variance reduction. However, bagging produces the base training data sets by the Simple Random Sampling (SRS) method, which exhibits a high degree of randomness. To handle this issue, we introduce an ordered structure in the training data set by Rank Set Sampling (RSS) to further reduce the variance of loss and develop a RSS-MLP method. Theoretical results show that the variance of empirical exponential loss and the logistic loss estimated by RSS are smaller than those estimated by SRS, respectively. To validate the performance of RSS-MLP, we conduct comparison experiments on twelve benchmark data sets in terms of the two convex loss functions under two fusion methods. Extensive experimental results and analysis illustrate the effectiveness and rationality of the propose method.

Comment: The paper introduces a new method for improving generalization in MLPs through variance reduction, which relates to representation learning and training dynamics.