Personalized Daily ArXiv Papers 2025-09-22

[gpt-5]	Prompt	Completion	Total
Token	46024	49441	95465
Cost	$0.06	$0.49	$0.55

Total arXiv papers: 520

Total scanned papers: 289

Total relevant papers: 25

Table of contents with paper titles:

IMPQ: Interaction-Aware Layerwise Mixed Precision Quantization for LLMs Authors: Junchen Zhao, Ali Derakhshan, Dushyant Bharadwaj, Jayden Kana Hyman, Junhao Dong, Sangeetha Abdu Jyothi, Ian Harris
Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems Authors: Saeed Amizadeh, Sara Abdali, Yinheng Li, Kazuhito Koishida
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer Authors: Yanghao Li, Rui Qian, Bowen Pan, Haotian Zhang, Haoshuo Huang, Bowen Zhang, Jialing Tong, Haoxuan You, Xianzhi Du, Zhe Gan, Hyunjik Kim, Chao Jia, Zhenbang Wang, Yinfei Yang, Mingfei Gao, Zi-Yi Dou, Wenze Hu, Chang Gao, Dongxu Li, Philipp Dufter, Zirui Wang, Guoli Yin, Zhengdong Zhang, Chen Chen, Yang Zhao, Ruoming Pang, Zhifeng Chen
RMT-KD: Random Matrix Theoretic Causal Knowledge Distillation Authors: Davide Ettori, Nastaran Darabi, Sureshkumar Senthilkumar, Amit Ranjan Trivedi
Localmax dynamics for attention in transformers and its asymptotic behavior Authors: Henri Cimetière, Maria Teresa Chiri, Bahman Gharesifard
Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification Authors: Zinan Lin, Enshu Liu, Xuefei Ning, Junyi Zhu, Wenyu Wang, Sergey Yekhanin
Distribution-Aligned Decoding for Efficient LLM Task Adaptation Authors: Senkang Hu, Xudong Han, Jinqi Jiang, Yihang Tao, Zihan Fang, Sam Tak Wu Kwong, Yuguang Fang
Emulating Human-like Adaptive Vision for Efficient and Flexible Machine Visual Perception Authors: Yulin Wang, Yang Yue, Yang Yue, Huanqian Wang, Haojun Jiang, Yizeng Han, Zanlin Ni, Yifan Pu, Minglei Shi, Rui Lu, Qisen Yang, Andrew Zhao, Zhuofan Xia, Shiji Song, Gao Huang
Sparse-Autoencoder-Guided Internal Representation Unlearning for Large Language Models Authors: Tomoya Yamashita, Akira Ito, Yuuki Yamanaka, Masanori Yamada, Takayuki Miura, Toshiki Shibahara
Region-Aware Deformable Convolutions Authors: Abolfazl Saheban Maleki, Maryam Imani
Nonconvex Decentralized Stochastic Bilevel Optimization under Heavy-Tailed Noises Authors: Xinwen Zhang, Yihan Zhang, Hongchang Gao
Universal Learning of Stochastic Dynamics for Exact Belief Propagation using Bernstein Normalizing Flows Authors: Peter Amorese, Morteza Lahijanian
Synthetic bootstrapped pretraining Authors: Zitong Yang, Aonan Zhang, Hong Liu, Tatsunori Hashimoto, Emmanuel Candès, Chong Wang, Ruoming Pang
The Multi-Query Paradox in Zeroth-Order Optimization Authors: Wei Lin, Qingyu Song, Hong Xu
On the Convergence of Muon and Beyond Authors: Da Chang, Yongxiang Liu, Ganzhao Yuan
BEFT: Bias-Efficient Fine-Tuning of Language Models Authors: Baichuan Huang, Ananth Balashankar, Amir Aminifar
Detail Across Scales: Multi-Scale Enhancement for Full Spectrum Neural Representations Authors: Yuan Ni, Zhantao Chen, Cheng Peng, Rajan Plumley, Chun Hong Yoon, Jana B. Thayer, Joshua J. Turner
MTS-DMAE: Dual-Masked Autoencoder for Unsupervised Multivariate Time Series Representation Learning Authors: Yi Xu, Yitian Zhang, Yun Fu
Stochastic Sample Approximations of (Local) Moduli of Continuity Authors: Rodion Nazarov, Allen Gehret, Robert Shorten, Jakub Marecek
Computing Linear Regions in Neural Networks with Skip Connections Authors: Johnny Joyce, Jan Verschelde
Attention Schema-based Attention Control (ASAC): A Cognitive-Inspired Approach for Attention Management in Transformers Authors: Krati Saxena, Federico Jurado Ruiz, Guido Manzi, Dianbo Liu, Alex Lamb
Global Pre-fixing, Local Adjusting: A Simple yet Effective Contrastive Strategy for Continual Learning Authors: Jia Tang, Xinrui Wang, Songcan Chen
SABER: Uncovering Vulnerabilities in Safety Alignment via Cross-Layer Residual Connection Authors: Maithili Joshi, Palash Nandi, Tanmoy Chakraborty
Toward Efficient Influence Function: Dropout as a Compression Tool Authors: Yuchen Zhang, Mohammad Mohammadi Amiri
On Optimal Steering to Achieve Exact Fairness Authors: Mohit Sharma, Amit Jayant Deshpande, Chiranjib Bhattacharyya, Rajiv Ratn Shah

1. IMPQ: Interaction-Aware Layerwise Mixed Precision Quantization for LLMs

ArXiv ID: 2509.15455

Authors: Junchen Zhao, Ali Derakhshan, Dushyant Bharadwaj, Jayden Kana Hyman, Junhao Dong, Sangeetha Abdu Jyothi, Ian Harris

Abstract: Large Language Models (LLMs) promise impressive capabilities, yet their multi-billion-parameter scale makes on-device or low-resource deployment prohibitive. Mixed-precision quantization offers a compelling solution, but existing methods struggle when the average precision drops below four bits, as they rely on isolated, layer-specific metrics that overlook critical inter-layer interactions affecting overall performance. In this paper, we propose two innovations to address these limitations. First, we frame the mixed-precision quantization problem as a cooperative game among layers and introduce Shapley-based Progressive Quantization Estimation (SPQE) to efficiently obtain accurate Shapley estimates of layer sensitivities and inter-layer interactions. Second, building upon SPQE, we propose Interaction-aware Mixed-Precision Quantization (IMPQ) which translates these Shapley estimates into a binary quadratic optimization formulation, assigning either 2 or 4-bit precision to layers under strict memory constraints. Comprehensive experiments conducted on Llama-3, Gemma-2, and Qwen-3 models across three independent PTQ backends (Quanto, HQQ, GPTQ) demonstrate IMPQ's scalability and consistently superior performance compared to methods relying solely on isolated metrics. Across average precisions spanning 4 bit down to 2 bit, IMPQ cuts Perplexity by 20 to 80 percent relative to the best baseline, with the margin growing as the bit-width tightens.
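Sketch (not the authors' code) of the bit-assignment step. It assumes per-layer sensitivities and pairwise interaction costs have already been estimated, for example by SPQE, which is not reproduced here. Each layer gets 2 or 4 bits, and the assignment that minimizes a quadratic cost under a memory budget is found by brute force. The cost form and the names `sens`, `inter`, `params`, `budget_bits` are hypothetical, and so are all the numbers. Brute force only works for a handful of layers; a real model would need a proper binary quadratic solver.

```python
from itertools import product
from typing import List, Sequence, Tuple


def assign_bits(sens: Sequence[float], inter: Sequence[Sequence[float]],
                params: Sequence[int], budget_bits: int) -> Tuple[List[int], float]:
    """Brute-force the 2/4-bit assignment that minimises a quadratic cost proxy.

    x[i] = 1 means layer i is quantized to 2 bits, x[i] = 0 means 4 bits.
    sens[i]:     hypothetical estimated cost of dropping layer i to 2 bits.
    inter[i][j]: hypothetical extra cost when layers i and j (i < j) are both at 2 bits.
    params[i]:   number of weights in layer i; memory = sum(params[i] * bits[i]).
    """
    n = len(sens)
    best_x, best_cost = None, float("inf")
    for x in product((0, 1), repeat=n):
        memory = sum(p * (2 if xi else 4) for p, xi in zip(params, x))
        if memory > budget_bits:
            continue  # violates the memory constraint
        cost = sum(s * xi for s, xi in zip(sens, x))  # linear (per-layer) terms
        cost += sum(inter[i][j] * x[i] * x[j]          # pairwise interaction terms
                    for i in range(n) for j in range(i + 1, n))
        if cost < best_cost:
            best_x, best_cost = list(x), cost
    if best_x is None:
        raise ValueError("no assignment fits the memory budget")
    return [2 if xi else 4 for xi in best_x], best_cost


sens = [0.1, 0.5, 0.2, 0.4]
params = [100, 100, 100, 100]
budget = 3 * sum(params)             # an average of 3 bits per weight
no_inter = [[0.0] * 4 for _ in range(4)]
with_inter = [row[:] for row in no_inter]
with_inter[0][2] = 1.0               # layers 0 and 2 degrade badly together

print(assign_bits(sens, no_inter, params, budget)[0])    # bits [2, 4, 2, 4]
print(assign_bits(sens, with_inter, params, budget)[0])  # bits [2, 4, 4, 2]
```

In this toy setup, the per-layer metrics alone pick layers 0 and 2 for 2-bit precision. Adding a single interaction term moves the choice to layers 0 and 3. This is the kind of effect that isolated, layer-specific metrics cannot capture.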

Comment: Model Compression and Efficiency: interaction-aware mixed-precision quantization using Shapley-based layer sensitivity/interactions and binary quadratic optimization for 2/4-bit LLMs.

Relevance: 10 Novelty: 8

2. Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems

ArXiv ID: 2509.15448

Authors: Saeed Amizadeh, Sara Abdali, Yinheng Li, Kazuhito Koishida

Abstract: Transformers and their attention mechanism have been revolutionary in the field of Machine Learning. While originally proposed for the language data, they quickly found their way to the image, video, graph, etc. data modalities with various signal geometries. Despite this versatility, generalizing the attention mechanism to scenarios where data is presented at different scales from potentially different modalities is not straightforward. The attempts to incorporate hierarchy and multi-modality within transformers are largely based on ad hoc heuristics, which are not seamlessly generalizable to similar problems with potentially different structures. To address this problem, in this paper, we take a fundamentally different approach: we first propose a mathematical construct to represent multi-modal, multi-scale data. We then mathematically derive the neural attention mechanics for the proposed construct from the first principle of entropy minimization. We show that the derived formulation is optimal in the sense of being the closest to the standard Softmax attention while incorporating the inductive biases originating from the hierarchical/geometric information of the problem. We further propose an efficient algorithm based on dynamic programming to compute our derived attention mechanism. By incorporating it within transformers, we show that the proposed hierarchical attention mechanism not only can be employed to train transformer models in hierarchical/multi-modal settings from scratch, but it can also be used to inject hierarchical information into classical, pre-trained transformer models post training, resulting in more efficient models in zero-shot manner.
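This is a generic illustration, not the paper's entropy-derived mechanism or its dynamic-programming algorithm. Flat softmax attention factorizes exactly as p(group) * p(token | group) over any grouping of tokens, provided each group's logit is the log-sum-exp of its members' logits. A group-level bias is therefore one simple way to inject hierarchical structure: it leaves within-group attention unchanged and recovers standard softmax when the bias is zero. The group ids and bias values below are hypothetical.

```python
import math
from typing import Dict, List, Optional


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def logsumexp(logits: List[float]) -> float:
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def two_level_attention(scores: List[float], groups: List[int],
                        group_bias: Optional[Dict[int, float]] = None) -> List[float]:
    """Attention weights written as p(group) * p(token | group).

    scores:     query-key logits, one per token.
    groups:     hypothetical group id per token (e.g. sentence, image region, modality).
    group_bias: hypothetical additive prior per group; with no bias the result
                equals softmax(scores) exactly.
    """
    group_bias = group_bias or {}
    members: Dict[int, List[int]] = {}
    for i, g in enumerate(groups):
        members.setdefault(g, []).append(i)
    gids = list(members)
    # Group logit = log-sum-exp of its members' logits, plus an optional bias.
    group_logits = [logsumexp([scores[j] for j in members[g]]) + group_bias.get(g, 0.0)
                    for g in gids]
    weights = [0.0] * len(scores)
    for p_g, g in zip(softmax(group_logits), gids):
        p_within = softmax([scores[j] for j in members[g]])  # unaffected by the bias
        for i, p_in in zip(members[g], p_within):
            weights[i] = p_g * p_in
    return weights


scores = [1.0, 0.5, -0.2, 2.0, 0.3]
groups = [0, 0, 1, 1, 1]
flat = softmax(scores)
hier = two_level_attention(scores, groups)
biased = two_level_attention(scores, groups, group_bias={0: 1.0})
print(max(abs(a - b) for a, b in zip(flat, hier)) < 1e-12)  # True: exact factorization
print(sum(biased[:2]) > sum(flat[:2]))  # True: the bias shifts mass toward group 0
```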

Comment: Model Architecture: derives a hierarchical self-attention mechanism from first principles with a dynamic-programming algorithm, enabling multi-scale transformers and post-hoc hierarchical injection.

Relevance: 10 Novelty: 8

3. MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

ArXiv ID: 2509.16197

Authors: Yanghao Li, Rui Qian, Bowen Pan, Haotian Zhang, Haoshuo Huang, Bowen Zhang, Jialing Tong, Haoxuan You, Xianzhi Du, Zhe Gan, Hyunjik Kim, Chao Jia, Zhenbang Wang, Yinfei Yang, Mingfei Gao, Zi-Yi Dou, Wenze Hu, Chang Gao, Dongxu Li, Philipp Dufter, Zirui Wang, Guoli Yin, Zhengdong Zhang, Chen Chen, Yang Zhao, Ruoming Pang, Zhifeng Chen

Abstract: Unified multimodal Large Language Models (LLMs) that can both understand and generate visual content hold immense potential. However, existing open-source models often suffer from a performance trade-off between these capabilities. We present Manzano, a simple and scalable unified framework that substantially reduces this tension by coupling a hybrid image tokenizer with a well-curated training recipe. A single shared vision encoder feeds two lightweight adapters that produce continuous embeddings for image-to-text understanding and discrete tokens for text-to-image generation within a common semantic space. A unified autoregressive LLM predicts high-level semantics in the form of text and image tokens, with an auxiliary diffusion decoder subsequently translating the image tokens into pixels. The architecture, together with a unified training recipe over understanding and generation data, enables scalable joint learning of both capabilities. Manzano achieves state-of-the-art results among unified models, and is competitive with specialist models, particularly on text-rich evaluation. Our studies show minimal task conflicts and consistent gains from scaling model size, validating our design choice of a hybrid tokenizer.
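Toy sketch of the hybrid-tokenizer data flow described above: one shared encoder output feeds two adapters. One adapter yields continuous embeddings for understanding; the other yields discrete token ids for generation. The linear-plus-ReLU encoder, the linear adapters, the nearest-codebook (vector quantization) step, and all dimensions are assumptions chosen for illustration. The abstract does not specify how the adapters are built, and this is not Manzano's implementation.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def shared_encoder(patches: List[Vector], w_enc: Matrix) -> List[Vector]:
    # Toy stand-in for the single shared vision encoder: linear map + ReLU per patch.
    return [[max(0.0, h) for h in matvec(w_enc, p)] for p in patches]


def understanding_adapter(feats: List[Vector], w_und: Matrix) -> List[Vector]:
    # Continuous path (image-to-text): project features into the LLM embedding space.
    return [matvec(w_und, f) for f in feats]


def generation_adapter(feats: List[Vector], w_gen: Matrix,
                       codebook: List[Vector]) -> List[int]:
    # Discrete path (text-to-image): project, then emit the index of the nearest
    # codebook entry (vector quantization is an assumption, not from the abstract).
    ids = []
    for f in feats:
        z = matvec(w_gen, f)
        dists = [sum((a - b) ** 2 for a, b in zip(z, c)) for c in codebook]
        ids.append(dists.index(min(dists)))
    return ids


rng = random.Random(0)
patch_dim, enc_dim, llm_dim, code_dim, vocab = 12, 8, 6, 4, 16
patches = [[rng.random() for _ in range(patch_dim)] for _ in range(5)]
feats = shared_encoder(patches, rand_matrix(enc_dim, patch_dim, rng))  # computed once
embeds = understanding_adapter(feats, rand_matrix(llm_dim, enc_dim, rng))
codebook = [[rng.gauss(0.0, 1.0) for _ in range(code_dim)] for _ in range(vocab)]
tokens = generation_adapter(feats, rand_matrix(code_dim, enc_dim, rng), codebook)
print(len(embeds), len(embeds[0]), tokens)  # 5 patches -> 5 embeddings and 5 token ids
```

Both paths read the same `feats`, which is the point of a shared encoder: understanding and generation operate in one feature space rather than in two separately trained tokenizers.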

Comment: Model Architecture: unified multimodal LLM with a hybrid vision tokenizer and dual adapters enabling joint image understanding and generation.