Personalized Daily ArXiv Papers 2025-05-01

[gpt-4o]	Prompt	Completion	Total
Token	23745	3079	26824
Cost	$0.06	$0.03	$0.09

Total arXiv papers: 389

Total scanned papers: 229

Total relevant papers: 11

Table of contents with paper titles:

1. TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts
   Authors: Pradip Kunwar, Minh N. Vu, Maanak Gupta, Mahmoud Abdelsalam, Manish Bhattarai
2. Recursive KL Divergence Optimization: A Dynamic Framework for Representation Learning
   Authors: Anthony D Martin
3. PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight
   Authors: Ben Goertzel, Paulos Yibelo
4. AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization
   Authors: Haotian Luo, Haiying He, Yibo Wang, Jinluan Yang, Rui Liu, Naiqiang Tan, Xiaochun Cao, Dacheng Tao, Li Shen
5. Token-Level Prompt Mixture with Parameter-Free Routing for Federated Domain Generalization
   Authors: Shuai Gong, Chaoran Cui, Xiaolin Dong, Xiushan Nie, Lei Zhu, Xiaojun Chang
6. Memorization and Knowledge Injection in Gated LLMs
   Authors: Xu Pan, Ely Hahami, Zechen Zhang, Haim Sompolinsky
7. Efficient LLMs with AMP: Attention Heads and MLP Pruning
   Authors: Leandro Giusti Mugnaini, Bruno Lopes Yamamoto, Lucas Lauton de Alcantara, Victor Zacarias, Edson Bollis, Lucas Pellicer, Anna Helena Reali Costa, Artur Jordao
8. Param$\Delta$ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
   Authors: Sheng Cao, Mingrui Wu, Karthik Prasad, Yuandong Tian, Zechun Liu
9. Deep Learning Optimization Using Self-Adaptive Weighted Auxiliary Variables
   Authors: Yaru Liu, Yiqi Gu, Michael K. Ng
10. Low-rank computation of the posterior mean in Multi-Output Gaussian Processes
    Authors: Sebastian Esche, Martin Stoll
11. NeuRel-Attack: Neuron Relearning for Safety Disalignment in Large Language Models
    Authors: Yi Zhou, Wenpeng Xing, Dezhang Kong, Changting Lin, Meng Han

1. TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts

ArXiv ID: 2504.21190

Authors: Pradip Kunwar, Minh N. Vu, Maanak Gupta, Mahmoud Abdelsalam, Manish Bhattarai

Abstract: We propose Tensor-Trained Low-Rank Adaptation Mixture of Experts (TT-LoRA MoE), a novel computational framework integrating Parameter-Efficient Fine-Tuning (PEFT) with sparse MoE routing to address scalability challenges in large model deployments. Unlike traditional MoE approaches, which face substantial computational overhead as expert counts grow, TT-LoRA MoE decomposes training into two distinct, optimized stages. First, we independently train lightweight, tensorized low-rank adapters (TT-LoRA experts), each specialized for a specific task. Subsequently, these expert adapters remain frozen, eliminating inter-task interference and catastrophic forgetting in multi-task settings. A sparse MoE router, trained separately, dynamically leverages base model representations to select exactly one specialized adapter per input at inference time, automating expert selection without explicit task specification. Comprehensive experiments confirm that our architecture retains the memory efficiency of low-rank adapters, seamlessly scales to large expert pools, and achieves robust task-level optimization. This structured decoupling significantly enhances computational efficiency and flexibility: it uses only 2% of the parameters of LoRA, 0.3% of those of Adapters, and 0.03% of those of AdapterFusion, and it outperforms AdapterFusion by a value of 4 in multi-tasking, enabling practical and scalable multi-task inference deployments.
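
Illustration: the inference-time routing described in the abstract (a separately trained router picks exactly one frozen adapter per input) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the adapters are plain low-rank stand-ins rather than tensor-train factorizations, all weights are random instead of trained, and the names LowRankAdapter, Top1Router and forward are hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class LowRankAdapter:
    """Frozen low-rank update delta(h) = B (A h).

    Assumption: a plain LoRA-style stand-in; the paper's experts use a
    tensor-train parameterization that is not reproduced here.
    """

    def __init__(self, dim, rank, rng):
        self.A = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(rank)]
        self.B = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(dim)]

    def __call__(self, hidden):
        return matvec(self.B, matvec(self.A, hidden))


class Top1Router:
    """Linear router that scores each expert from the base representation."""

    def __init__(self, dim, num_experts, rng):
        self.W = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(num_experts)]

    def route(self, hidden):
        probs = softmax(matvec(self.W, hidden))
        # Sparse selection: keep only the single highest-scoring expert.
        expert_index = max(range(len(probs)), key=probs.__getitem__)
        return expert_index, probs


def forward(hidden, router, experts):
    """Add the selected adapter's update to the base representation."""
    index, _ = router.route(hidden)
    delta = experts[index](hidden)
    return [h + d for h, d in zip(hidden, delta)], index


rng = random.Random(0)
DIM, RANK, NUM_EXPERTS = 8, 2, 4  # illustrative sizes, not from the paper
experts = [LowRankAdapter(DIM, RANK, rng) for _ in range(NUM_EXPERTS)]
router = Top1Router(DIM, NUM_EXPERTS, rng)
hidden = [rng.gauss(0, 1) for _ in range(DIM)]
output, chosen = forward(hidden, router, experts)
print("selected expert:", chosen)
```

Because only one adapter runs per input, the per-input adapter cost in this sketch does not grow with the number of experts; only the router's scoring step does.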

Comment: The paper introduces TT-LoRA MoE, which integrates sparse Mixture-of-Experts (MoE) with low-rank adaptation, aligning closely with the 'Model Architecture' and 'Model Compression' criteria. It provides a novel approach to scalability and efficiency in multi-task settings.

Relevance: 10 Novelty: 8

2. Recursive KL Divergence Optimization: A Dynamic Framework for Representation Learning

ArXiv ID: 2504.21707

Authors: Anthony D Martin

Abstract: We propose a generalization of modern representation learning objectives by reframing them as recursive divergence alignment processes over localized conditional distributions. While recent frameworks like Information Contrastive Learning (I-Con) unify multiple learning paradigms through KL divergence between fixed neighborhood conditionals, we argue this view underplays a crucial recursive structure inherent in the learning process. We introduce Recursive KL Divergence Optimization (RKDO), a dynamic formalism where representation learning is framed as the evolution of KL divergences across data neighborhoods. This formulation captures contrastive, clustering, and dimensionality reduction methods as static slices while offering a new path to model stability and local adaptation. Our experiments demonstrate that RKDO offers dual efficiency advantages: approximately 30 percent lower loss values compared to static approaches across three different datasets, and a 60 to 80 percent reduction in the computational resources needed to achieve comparable results. This suggests that RKDO's recursive updating mechanism provides a fundamentally more efficient optimization landscape for representation learning, with significant implications for resource-constrained applications.
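
Illustration: the quantity the abstract refers to, a KL divergence between neighborhood conditionals p(j|i) and q(j|i), can be sketched as below. The abstract does not give the recursive update, so the blend step (moving the target p part of the way toward q at each iteration) is a hypothetical rule chosen only to show how a target that evolves differs from a fixed one. The function names, the toy points, the temperature and alpha are likewise assumptions, and the embeddings are held fixed rather than optimized.

```python
import math


def neighborhood_conditionals(points, temperature=1.0):
    """q(j|i): softmax over negative squared distances, with q(i|i) = 0."""
    rows = []
    for i, anchor in enumerate(points):
        logits = []
        for j, other in enumerate(points):
            if i == j:
                logits.append(float("-inf"))  # a point is not its own neighbor
            else:
                dist2 = sum((a - b) ** 2 for a, b in zip(anchor, other))
                logits.append(-dist2 / temperature)
        peak = max(logits)
        exps = [math.exp(l - peak) for l in logits]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def mean_kl(p, q, eps=1e-12):
    """Average over anchors i of KL(p(.|i) || q(.|i))."""
    total = 0.0
    for p_row, q_row in zip(p, q):
        total += sum(
            pj * math.log(pj / max(qj, eps))
            for pj, qj in zip(p_row, q_row)
            if pj > 0
        )
    return total / len(p)


def blend(p, q, alpha):
    """Hypothetical recursive target update: p <- (1 - alpha) * p + alpha * q."""
    return [
        [(1 - alpha) * pj + alpha * qj for pj, qj in zip(p_row, q_row)]
        for p_row, q_row in zip(p, q)
    ]


# Toy data: two loose pairs of 2-D embeddings (illustrative values only).
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 2.0), (3.0, 2.0)]
q = neighborhood_conditionals(points, temperature=2.0)

# Static slice: a fixed target that pairs each point with its partner.
p = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]

history = []
for step in range(5):
    history.append(mean_kl(p, q))
    p = blend(p, q, alpha=0.3)  # the target evolves instead of staying fixed

print([round(value, 6) for value in history])
```

The first entry of history is the static objective with the fixed target. The later entries shrink here only because the target is pulled toward a q that never changes, so the numbers say nothing about the efficiency figures reported in the abstract.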

Comment: The paper proposes a recursive KL divergence optimization framework for representation learning, which directly aligns with foundational research in representation learning and training dynamics.