Personalized Daily arXiv Papers 4/02/2025

[gpt-4o]	Prompt	Completion	Total
Token	29076	3861	32937
Cost	$0.07	$0.04	$0.11

Total arXiv papers: 468

Total scanned papers: 277

Total relevant papers: 15

Table of contents with paper titles:

1. DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism
   Authors: Dengchun Li, Naizheng Wang, Zihao Zhang, Haoyang Yin, Lei Duan, Meng Xiao, Mingjie Tang
2. Minimum Description Length of a Spectrum Variational Autoencoder: A Theory
   Authors: Canlin Zhang, Xiuwen Liu
3. Deep Generative Models: Complexity, Dimensionality, and Approximation
   Authors: Kevin Wang, Hongqian Niu, Yixin Wang, Didong Li
4. MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
   Authors: Siyuan Li, Luyuan Zhang, Zedong Wang, Juanxi Tian, Cheng Tan, Zicheng Liu, Chang Yu, Qingsong Xie, Haonan Lu, Haoqian Wang, Zhen Lei
5. Spectral Architecture Search for Neural Networks
   Authors: Gianluca Peri, Lorenzo Giambagli, Lorenzo Chicchi, Duccio Fanelli
6. ObscuraCoder: Powering Efficient Code LM Pre-Training Via Obfuscation Grounding
   Authors: Indraneil Paul, Haoyi Yang, Goran Glavaš, Kristian Kersting, Iryna Gurevych
7. Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations
   Authors: Chongjie Si, Zhiyi Shi, Xuehui Wang, Yichen Xiao, Xiaokang Yang, Wei Shen
8. Logical perspectives on learning statistical objects
   Authors: Aaron Anderson, Michael Benedikt
9. ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning
   Authors: Huandong Chang, Zicheng Ma, Mingyuan Ma, Zhenting Qi, Andrew Sabot, Hong Jiang, H. T. Kung
10. When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning
    Authors: Nishad Singhi, Hritik Bansal, Arian Hosseini, Aditya Grover, Kai-Wei Chang, Marcus Rohrbach, Anna Rohrbach
11. Less is More: Efficient Black-box Attribution via Minimal Interpretable Subset Selection
    Authors: Ruoyu Chen, Siyuan Liang, Jingzhi Li, Shiming Liu, Li Liu, Hua Zhang, Xiaochun Cao
12. NeuraLUT-Assemble: Hardware-aware Assembling of Sub-Neural Networks for Efficient LUT Inference
    Authors: Marta Andronic, George A. Constantinides
13. Self-Evolving Visual Concept Library using Vision-Language Critics
    Authors: Atharva Sehgal, Patrick Yuan, Ziniu Hu, Yisong Yue, Jennifer J. Sun, Swarat Chaudhuri
14. Geometric Median Matching for Robust k-Subset Selection from Noisy Data
    Authors: Anish Acharya, Sujay Sanghavi, Alexandros G Dimakis, Inderjit S Dhillon
15. Hawkeye: Efficient Reasoning with Model Collaboration
    Authors: Jianshu She, Zhuohao Li, Zhemin Huang, Qi Li, Peiran Xu, Haonan Li, Qirong Ho

1. DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism

ArXiv ID: 2504.00661

Authors: Dengchun Li, Naizheng Wang, Zihao Zhang, Haoyang Yin, Lei Duan, Meng Xiao, Mingjie Tang

Abstract: Instruction-based fine-tuning of large language models (LLMs) has achieved remarkable success in various natural language processing (NLP) tasks. Parameter-efficient fine-tuning (PEFT) methods, such as Mixture of LoRA Experts (MoLE), combine the efficiency of Low-Rank Adaptation (LoRA) with the versatility of Mixture of Experts (MoE) models, demonstrating significant potential for handling multiple downstream tasks. However, the existing routing mechanisms for MoLE often involve a trade-off between computational efficiency and predictive accuracy, and they fail to fully address the diverse expert selection demands across different transformer layers. In this work, we propose DynMoLE, a hybrid routing strategy that dynamically adjusts expert selection based on the Tsallis entropy of the router's probability distribution. This approach mitigates router uncertainty, enhances stability, and promotes more equitable expert participation, leading to faster convergence and improved model performance. Additionally, we introduce an auxiliary loss based on Tsallis entropy to further guide the model toward convergence with reduced uncertainty, thereby improving training stability and performance. Our extensive experiments on commonsense reasoning benchmarks demonstrate that DynMoLE achieves substantial performance improvements, outperforming LoRA by 9.6% and surpassing the state-of-the-art MoLE method, MoLA, by 2.3%. We also conduct a comprehensive ablation study to evaluate the contributions of DynMoLE's key components.

Comment: DynMoLE proposes a Tsallis-entropy-based hybrid routing mechanism for Mixture of LoRA Experts (MoLE) fine-tuning, which aligns directly with foundational research on Mixture-of-Experts architectures and training efficiency.

Relevance: 10 Novelty: 8
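
Illustrative sketch (not from the paper): the DynMoLE abstract describes routing that adapts to the Tsallis entropy of the router's probability distribution, but it does not spell out the exact rule. The Python below computes the standard Tsallis entropy, S_q(p) = (1 - sum_i p_i^q) / (q - 1), and applies one hypothetical hybrid rule: keep all experts when the router is uncertain, otherwise keep only the top-k. The function names, the `threshold` and `top_k` parameters, and the switching rule itself are assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of router logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tsallis_entropy(probs, q=2.0):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1).

    As q approaches 1 this tends to the Shannon entropy, which is used
    directly for the q == 1 case.
    """
    if abs(q - 1.0) < 1e-9:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def hybrid_route(logits, q=2.0, threshold=0.5, top_k=2):
    """Hypothetical hybrid rule (an assumption, not the paper's rule).

    High entropy (uncertain router): use all experts with soft weights.
    Low entropy (confident router): keep the top_k experts, renormalized.
    Returns (expert_weights, entropy).
    """
    probs = softmax(logits)
    entropy = tsallis_entropy(probs, q)
    if entropy > threshold:
        return probs, entropy
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:top_k])
    mass = sum(probs[i] for i in keep)
    weights = [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]
    return weights, entropy
```

With four experts and q = 2, a uniform router has entropy 0.75 and falls back to soft routing over all experts under these assumed settings, while a sharply peaked router has entropy near 0 and is reduced to its two strongest experts.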

2. Minimum Description Length of a Spectrum Variational Autoencoder: A Theory

ArXiv ID: 2504.00395

Authors: Canlin Zhang, Xiuwen Liu

Abstract: Deep neural networks (DNNs) trained through end-to-end learning have achieved remarkable success across diverse machine learning tasks, yet they are not explicitly designed to adhere to the Minimum Description Length (MDL) principle, which posits that the best model provides the shortest description of the data. In this paper, we argue that MDL is essential to deep learning and propose a further generalized principle: Understanding is the use of a small amount of information to represent a large amount of information. To this end, we introduce a novel theoretical framework for designing and evaluating deep Variational Autoencoders (VAEs) based on MDL. In our theory, we designed the Spectrum VAE, a specific VAE architecture whose MDL can be rigorously evaluated under given conditions. Additionally, we introduce the concept of latent dimension combination, or pattern of spectrum, and provide the first theoretical analysis of their role in achieving MDL. We claim that a Spectrum VAE understands the data distribution in the most appropriate way when the MDL is achieved. This work is entirely theoretical and lays the foundation for future research on designing deep learning systems that explicitly adhere to information-theoretic principles.

Comment: The paper introduces a theoretical framework for Variational Autoencoders (VAEs) based on the Minimum Description Length (MDL) principle, which is highly relevant to representation learning and foundational research.

Relevance: 9 Novelty: 9
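
For reference, the MDL principle cited in the abstract is often written in its classical two-part form, shown below. The notation (D for the data, H for a candidate model in a class \mathcal{H}, L(\cdot) for code length in bits) is generic textbook notation chosen here and is not taken from the paper, whose Spectrum VAE formulation may differ.

```latex
\hat{H} \;=\; \arg\min_{H \in \mathcal{H}} \bigl[\, L(H) + L(D \mid H) \,\bigr]
```

Here L(H) is the cost of describing the model and L(D \mid H) is the cost of describing the data given the model, so the selected model is the one giving the shortest total description.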

3. Deep Generative Models: Complexity, Dimensionality, and Approximation

ArXiv ID: 2504.00820

Authors: Kevin Wang, Hongqian Niu, Yixin Wang, Didong Li

Abstract: Generative networks have shown remarkable success in learning complex data distributions, particularly in generating high-dimensional data from lower-dimensional inputs. While this capability is well-documented empirically, its theoretical underpinning remains unclear. One common theoretical explanation appeals to the widely accepted manifold hypothesis, which suggests that many real-world datasets, such as images and signals, often possess intrinsic low-dimensional geometric structures. Under this manifold hypothesis, it is widely believed that to approximate a distribution on a $d$-dimensional Riemannian manifold, the latent dimension needs to be at least $d$ or $d+1$. In this work, we show that this requirement on the latent dimension is not necessary by demonstrating that generative networks can approximate distributions on $d$-dimensional Riemannian manifolds from inputs of any arbitrary dimension, even lower than $d$, taking inspiration from the concept of space-filling curves. This approach, in turn, leads to a super-exponential complexity bound of the deep neural networks through expanded neurons. Our findings thus challenge the conventional belief on the relationship between input dimensionality and the ability of generative networks to model data distributions. This novel insight not only corroborates the practical effectiveness of generative networks in handling complex data structures, but also underscores a critical trade-off between approximation error, dimensionality, and model complexity.
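
Illustrative sketch (not the paper's construction): the abstract takes inspiration from space-filling curves to argue that a latent input of lower dimension can still reach a higher-dimensional set. The Python below uses the standard discrete Hilbert-curve index-to-coordinate mapping to show a one-dimensional index visiting every cell of a two-dimensional grid. The function name `hilbert_d2xy` and the grid size are illustrative choices, and the discrete grid is only an analogy for the continuous setting the paper studies.

```python
def hilbert_d2xy(n, d):
    """Map a 1-D index d in [0, n*n) to (x, y) on an n-by-n grid.

    n must be a power of two. Consecutive indices land on
    neighbouring cells, and every cell is visited exactly once.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the current sub-square so the pieces join up.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_path(n):
    """Return the full list of grid cells in Hilbert-curve order."""
    return [hilbert_d2xy(n, d) for d in range(n * n)]
```

Calling `hilbert_path(8)` yields 64 distinct cells covering the whole 8-by-8 grid; doubling `n` refines the grid while the input stays one-dimensional.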

Comment: The paper provides a theoretical insight into generative networks and challenges the conventional belief about the relationship between input dimensionality and data distribution modeling. This aligns with foundational research in representation learning.