Personalized Daily Arxiv Papers 03/20/2025

[gpt-4o]	Prompt	Completion	Total
Token	33003	4587	37590
Cost	$0.08	$0.05	$0.13

Total arXiv papers: 413

Total scanned papers: 236

Total relevant papers: 17
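The usage table and paper counts above can be cross-checked with a few lines of arithmetic. The sketch below only re-uses the figures printed in this digest; the variable names are illustrative, and the one-cent tolerance is an assumption made because the costs appear rounded to cents.

```python
# Figures copied from the usage table and counts in this digest.
prompt_tokens = 33003
completion_tokens = 4587
reported_total_tokens = 37590

prompt_cost = 0.08          # USD, as printed (assumed rounded to cents)
completion_cost = 0.05      # USD, as printed (assumed rounded to cents)
reported_total_cost = 0.13  # USD, as printed

total_papers = 413
scanned_papers = 236
relevant_papers = 17

# Token counts are exact, so the total should equal the sum of the columns.
total_tokens = prompt_tokens + completion_tokens
tokens_consistent = total_tokens == reported_total_tokens

# Compare costs in whole cents, allowing one cent of rounding slack.
summed_cents = round((prompt_cost + completion_cost) * 100)
reported_cents = round(reported_total_cost * 100)
cost_consistent = abs(summed_cents - reported_cents) <= 1

# Fraction of listed papers that were scanned, and of scanned papers kept.
scanned_share = scanned_papers / total_papers
relevant_share = relevant_papers / scanned_papers

print(f"tokens consistent: {tokens_consistent}, cost consistent: {cost_consistent}")
print(f"scanned: {scanned_share:.1%} of listed, relevant: {relevant_share:.1%} of scanned")
```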

Table of contents with paper titles:

Borsuk-Ulam and Replicable Learning of Large-Margin Halfspaces Authors: Ari Blondal, Hamed Hatami, Pooya Hatami, Chavdar Lalov, Sivan Tretiak
Exploring the Limits of KV Cache Compression in Visual Autoregressive Transformers Authors: Bo Chen, Xiaoyu Li, Yekun Ke, Yingyu Liang, Zhenmei Shi, Zhao Song
Machine Unlearning in Hyperbolic vs. Euclidean Multimodal Contrastive Learning: Adapting Alignment Calibration to MERU Authors: Àlex Pujol Vidal, Sergio Escalera, Kamal Nasrollahi, Thomas B. Moeslund
Conjuring Positive Pairs for Efficient Unification of Representation Learning and Image Synthesis Authors: Imanol G. Estepa, Jesús M. Rodríguez-de-Vera, Ignacio Sarasúa, Bhalaji Nagarajan, Petia Radeva
Robust Weight Imprinting: Insights from Neural Collapse and Proxy-Based Aggregation Authors: Justus Westerhoff, Golzar Atefi, Mario Koddenbrock, Alexei Figueroa, Alexander Löser, Erik Rodner, Felix A. Gers
LIFT: Latent Implicit Functions for Task- and Data-Agnostic Encoding Authors: Amirhossein Kazerouni, Soroush Mehraban, Michael Brudno, Babak Taati
Robustness of Nonlinear Representation Learning Authors: Simon Buchholz, Bernhard Schölkopf
Reasoning Effort and Problem Complexity: A Scaling Analysis in LLMs Authors: Benjamin Estermann, Roger Wattenhofer
Natural Quantization of Neural Networks Authors: Richard Barney, Djamil Lakhdar-Hamina, Victor Galitski
Unique Hard Attention: A Tale of Two Sides Authors: Selim Jerad, Anej Svete, Jiaoda Li, Ryan Cotterell
Efficient Personalization of Quantized Diffusion Model without Backpropagation Authors: Hoigi Seo, Wongi Jeong, Kyungryeol Lee, Se Young Chun
Long Context Modeling with Ranked Memory-Augmented Retrieval Authors: Ghadir Alselwi, Hao Xue, Shoaib Jameel, Basem Suleiman, Flora D. Salim, Imran Razzak
Squeeze Out Tokens from Sample for Finer-Grained Data Governance Authors: Weixiong Lin, Chen Ju, Haicheng Wang, Shengchao Hu, Shuai Xiao, Mengting Chen, Yuheng Jiao, Mingshuai Yao, Jinsong Lan, Qingwen Liu, Ying Chen
Dynamic Accumulated Attention Map for Interpreting Evolution of Decision-Making in Vision Transformer Authors: Yi Liao, Yongsheng Gao, Weichuan Zhang
Foundation models may exhibit staged progression in novel CBRN threat disclosure Authors: Kevin M Esvelt
Uncertainty Distillation: Teaching Language Models to Express Semantic Confidence Authors: Sophia Hager, David Mueller, Kevin Duh, Nicholas Andrews
SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders Authors: Qing Li, Jiahui Geng, Derui Zhu, Fengyu Cai, Chenyang Lyu, Fakhri Karray

1. Borsuk-Ulam and Replicable Learning of Large-Margin Halfspaces

ArXiv ID: 2503.15294

Authors: Ari Blondal, Hamed Hatami, Pooya Hatami, Chavdar Lalov, Sivan Tretiak

Abstract: Recent advances in learning theory have established that, for total concepts, list replicability, global stability, differentially private (DP) learnability, and shared-randomness replicability coincide precisely with the finiteness of the Littlestone dimension. Does the same hold for partial concept classes? We answer this question by studying the large-margin half-spaces class, which has bounded Littlestone dimension and is purely DP-learnable and shared-randomness replicable even in high dimensions. We prove that the list replicability number of $\gamma$-margin half-spaces satisfies \[ \frac{d}{2} + 1 \le \mathrm{LR}(H_{\gamma}^d) \le d, \] which increases with the dimension $d$. This reveals a surprising separation for partial concepts: list replicability and global stability do not follow from bounded Littlestone dimension, DP-learnability, or shared-randomness replicability. By applying our main theorem, we also answer the following open problems.
- We prove that any disambiguation of an infinite-dimensional large-margin half-space to a total concept class has unbounded Littlestone dimension, answering an open question of Alon et al. (FOCS '21).
- We prove that the maximum list-replicability number of any finite set of points and homogeneous half-spaces in $d$-dimensional Euclidean space is $d$, resolving a problem of Chase et al. (FOCS '23).
- We prove that any disambiguation of the Gap Hamming Distance problem in the large gap regime has unbounded public-coin randomized communication complexity. This answers an open problem of Fang et al. (STOC '25).
We prove the lower bound via a topological argument involving the local Borsuk-Ulam theorem of Chase et al. (STOC '24). For the upper bound, we design a learning rule that relies on certain triangulations of the cross-polytope and recent results on the generalization properties of SVM.
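As a quick sanity check on the bound stated in the abstract above, the following instantiates it for one concrete dimension. The value $d = 10$ is purely illustrative and is not taken from the paper.

```latex
\[
  d = 10: \qquad
  \frac{10}{2} + 1 = 6 \;\le\; \mathrm{LR}(H_{\gamma}^{10}) \;\le\; 10 .
\]
```

Since $d / (\frac{d}{2} + 1) < 2$ for every $d$, the stated lower and upper bounds are always within a factor of two of each other, and both grow linearly in $d$.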

Comment: The paper provides theoretical insights into learning large-margin halfspaces and addresses open problems in learning theory, making it highly relevant to foundational research in representation learning.

Relevance: 10 Novelty: 9

2. Exploring the Limits of KV Cache Compression in Visual Autoregressive Transformers

ArXiv ID: 2503.14881

Authors: Bo Chen, Xiaoyu Li, Yekun Ke, Yingyu Liang, Zhenmei Shi, Zhao Song

Abstract: A fundamental challenge in Visual Autoregressive models is the substantial memory overhead required during inference to store previously generated representations. Despite various attempts to mitigate this issue through compression techniques, prior works have not explicitly formalized the problem of KV-cache compression in this context. In this work, we take the first step in formally defining the KV-cache compression problem for Visual Autoregressive transformers. We then establish a fundamental negative result, proving that any mechanism for sequential visual token generation under attention-based architectures must use at least $\Omega(n^2 d)$ memory, when $d = \Omega(\log n)$, where $n$ is the number of tokens generated and $d$ is the embedding dimensionality. This result demonstrates that achieving truly sub-quadratic memory usage is impossible without additional structural constraints. Our proof is constructed via a reduction from a computational lower bound problem, leveraging randomized embedding techniques inspired by dimensionality reduction principles. Finally, we discuss how sparsity priors on visual representations can influence memory efficiency, presenting both impossibility results and potential directions for mitigating memory overhead.

Comment: The paper formalizes KV-cache compression in visual autoregressive transformers, directly addressing the 'Model Compression' criterion with theoretical insights into memory efficiency.

Relevance: 10 Novelty: 9

3. Machine Unlearning in Hyperbolic vs. Euclidean Multimodal Contrastive Learning: Adapting Alignment Calibration to MERU

ArXiv ID: 2503.15166

Authors: Àlex Pujol Vidal, Sergio Escalera, Kamal Nasrollahi, Thomas B. Moeslund

Abstract: Machine unlearning methods have become increasingly important for selective concept removal in large pre-trained models. While recent work has explored unlearning in Euclidean contrastive vision-language models, the effectiveness of concept removal in hyperbolic spaces remains unexplored. This paper investigates machine unlearning in hyperbolic contrastive learning by adapting Alignment Calibration to MERU, a model that embeds images and text in hyperbolic space to better capture semantic hierarchies. Through systematic experiments and ablation studies, we demonstrate that hyperbolic geometry offers distinct advantages for concept removal, achieving near-perfect forgetting with reasonable performance on retained concepts, particularly when scaling to multiple concept removal. Our approach introduces hyperbolic-specific components including entailment calibration and norm regularization that leverage the unique properties of hyperbolic space. Comparative analysis with Euclidean models reveals fundamental differences in unlearning dynamics, with hyperbolic unlearning reorganizing the semantic hierarchy while Euclidean approaches merely disconnect cross-modal associations. These findings not only advance machine unlearning techniques but also provide insights into the geometric properties that influence concept representation and removal in multimodal models. Source code available at https://github.com/alex-pv01/HAC

Comment: The paper explores machine unlearning in hyperbolic contrastive learning, which aligns with representation learning and provides insights into geometric properties influencing concept representation. The focus on hyperbolic-specific components and unlearning dynamics adds theoretical depth.