Personalized Daily ArXiv Papers 2025-09-01

[gpt-4o]  Prompt   Completion  Total
Tokens    17129    1713        18842
Cost      $0.04    $0.02       $0.06

Total arXiv papers: 337

Total scanned papers: 208

Total relevant papers: 11

Table of contents with paper titles:

Adaptive Heavy-Tailed Stochastic Gradient Descent Authors: Bodu Gong, Gustavo Enrique Batista, Pierre Lafaye de Micheaux
Distribution-Aware Feature Selection for SAEs Authors: Narmeen Oozeer, Nirmalendu Prakash, Michael Lan, Alice Rigg, Amirali Abdullah
Normalized Maximum Likelihood Code-Length on Riemannian Manifold Data Spaces Authors: Kota Fukuzawa, Atsushi Suzuki, Kenji Yamanishi
NSPDI-SNN: An efficient lightweight SNN based on nonlinear synaptic pruning and dendritic integration Authors: Wuque Cai, Hongze Sun, Jiayi He, Qianqian Liao, Yunliang Zang, Duo Chen, Dezhong Yao, Daqing Guo
RelP: Faithful and Efficient Circuit Discovery via Relevance Patching Authors: Farnoush Rezaei Jafari, Oliver Eberle, Ashkan Khakzar, Neel Nanda
Domain Generalization in-the-Wild: Disentangling Classification from Domain-Aware Representations Authors: Ha Min Son, Zhe Zhao, Shahbaz Rezaei, Xin Liu
MoE-Health: A Mixture of Experts Framework for Robust Multimodal Healthcare Prediction Authors: Xiaoyang Wang, Christopher C. Yang
Weighted Support Points from Random Measures: An Interpretable Alternative for Generative Modeling Authors: Peiqi Zhao, Carlos E. Rodríguez, Ramsés H. Mena, Stephen G. Walker
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers Authors: Ming Hu, Chenglong Ma, Wei Li, Wanghan Xu, Jiamin Wu, Jucheng Hu, Tianbin Li, Guohang Zhuang, Jiaqi Liu, Yingzhou Lu, Ying Chen, Chaoyang Zhang, Cheng Tan, Jie Ying, Guocheng Wu, Shujian Gao, Pengcheng Chen, Jiashi Lin, Haitao Wu, Lulu Chen, Fengxiang Wang, Yuanyuan Zhang, Xiangyu Zhao, Feilong Tang, Encheng Su, Junzhi Ning, Xinyao Liu, Ye Du, Changkai Ji, Cheng Tang, Huihui Xu, Ziyang Chen, Ziyan Huang, Jiyao Liu, Pengfei Jiang, Yizhou Wang, Chen Tang, Jianyu Wu, Yuchen Ren, Siyuan Yan, Zhonghua Wang, Zhongxing Xu, Shiyan Su, Shangquan Sun, Runkai Zhao, Zhisheng Zhang, Yu Liu, Fudi Wang, Yuanfeng Ji, Yanzhou Su, Hongming Shan, Chunmei Feng, Jiahao Xu, Jiangtao Yan, Wenhao Tang, Diping Song, Lihao Liu, Yanyan Huang, Lequan Yu, Bin Fu, Shujun Wang, Xiaomeng Li, Xiaowei Hu, Yun Gu, Ben Fei, Zhongying Deng, Benyou Wang, Yuewen Cao, Minjie Shen, Haodong Duan, Jie Xu, Yirong Chen, Fang Yan, Hongxia Hao, Jielan Li, Jiajun Du, Yanbo Wang, Imran Razzak, Chi Zhang, Lijun Wu, Conghui He, Zhaohui Lu, Jinhai Huang, Yihao Liu, Fenghua Ling, Yuqiang Li, Aoran Wang, Qihao Zheng, Nanqing Dong, Tianfan Fu, Dongzhan Zhou, Yan Lu, Wenlong Zhang, Jin Ye, Jianfei Cai, Wanli Ouyang, Yu Qiao, Zongyuan Ge, Shixiang Tang, Junjun He, Chunfeng Song, Lei Bai, Bowen Zhou
Rethinking Layer-wise Model Merging through Chain of Merges Authors: Pietro Buzzega, Riccardo Salami, Angelo Porrello, Simone Calderara
Deep Residual Echo State Networks: exploring residual orthogonal connections in untrained Recurrent Neural Networks Authors: Matteo Pinna, Andrea Ceni, Claudio Gallicchio

1. Adaptive Heavy-Tailed Stochastic Gradient Descent

ArXiv ID: 2508.21353

Authors: Bodu Gong, Gustavo Enrique Batista, Pierre Lafaye de Micheaux

Abstract: In the era of large-scale neural network models, optimization algorithms often struggle with generalization due to an overreliance on training loss. A widely accepted insight in the machine learning community is that wide basins (regions around a local minimum where the loss increases gradually) promote better generalization by offering greater stability to small changes in input data or model parameters. In contrast, sharp minima are typically more sensitive and less stable. Motivated by two key empirical observations (the inherent heavy-tailed distribution of gradient noise in stochastic gradient descent, and the Edge of Stability phenomenon during neural network training, in which curvature grows before settling at a plateau), we introduce Adaptive Heavy-Tailed Stochastic Gradient Descent (AHTSGD). The algorithm injects heavier-tailed noise into the optimizer during the early stages of training to enhance exploration and gradually transitions to lighter-tailed noise as sharpness stabilizes. By dynamically adapting to the sharpness of the loss landscape throughout training, AHTSGD promotes accelerated convergence to wide basins. AHTSGD is the first algorithm to adjust the nature of the noise injected into an optimizer based on the Edge of Stability phenomenon. AHTSGD consistently outperforms SGD and other noise-based methods on benchmarks like MNIST and CIFAR-10, with marked gains on noisy datasets such as SVHN. It ultimately accelerates early training from poor initializations and improves generalization across clean and noisy settings, remaining robust to learning rate choices.

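Comment: The paper introduces Adaptive Heavy-Tailed Stochastic Gradient Descent (AHTSGD), which adapts the tail heaviness of the noise injected into the optimizer to the sharpness of the loss landscape. It is relevant to representation learning because it offers insight into training dynamics and optimization in neural networks.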
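
Illustrative sketch: the abstract describes injecting heavier-tailed noise early and moving to lighter-tailed noise as sharpness stabilizes. The Python below is one possible reading of that idea, not the authors' implementation. It assumes symmetric alpha-stable noise (sampled with the Chambers-Mallows-Stuck method) and a made-up rule that maps the relative change in a sharpness estimate to the tail index alpha. The function names, the rule, the default bounds of 1.2 and 2.0, and the sharpness values are all hypothetical and do not come from the paper.

```python
import numpy as np


def tail_index_from_sharpness(prev_sharpness, curr_sharpness,
                              alpha_min=1.2, alpha_max=2.0):
    """Map the relative change in sharpness to a tail index alpha.

    Hypothetical rule (an assumption, not from the paper): a large relative
    change gives alpha_min (heavy tails); no change gives alpha_max, where
    alpha = 2 corresponds to Gaussian noise.
    """
    rel_change = abs(curr_sharpness - prev_sharpness) / max(abs(prev_sharpness), 1e-12)
    stability = 1.0 - min(rel_change, 1.0)  # 0 = changing fast, 1 = plateau
    return alpha_min + (alpha_max - alpha_min) * stability


def sample_symmetric_stable(alpha, size, rng):
    """Draw symmetric alpha-stable noise (Chambers-Mallows-Stuck method).

    For alpha = 2 this reduces to a Gaussian with variance 2.
    """
    u = rng.uniform(-np.pi / 2, np.pi / 2, size=size)
    w = rng.exponential(1.0, size=size)
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))


def noisy_sgd_step(params, grad, alpha, lr, noise_scale, rng):
    """One SGD step with additive alpha-stable noise on the gradient."""
    noise = sample_symmetric_stable(alpha, params.shape, rng)
    return params - lr * (grad + noise_scale * noise)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Made-up sharpness trace: grows quickly, then settles at a plateau.
    sharpness_trace = [1.0, 4.0, 9.0, 12.0, 12.5, 12.6, 12.6]
    params = np.ones(3)
    for prev, curr in zip(sharpness_trace, sharpness_trace[1:]):
        alpha = tail_index_from_sharpness(prev, curr)
        grad = params  # gradient of the toy loss 0.5 * ||params||^2
        params = noisy_sgd_step(params, grad, alpha, lr=0.1,
                                noise_scale=0.01, rng=rng)
        print(f"sharpness {prev:.1f} -> {curr:.1f}: alpha = {alpha:.2f}")
```

In this toy run the tail index starts at its lower bound while the sharpness trace is still rising and approaches 2 once the trace flattens, which mirrors the qualitative behavior the abstract describes. How AHTSGD actually measures sharpness and schedules the noise is specified in the paper itself, not here.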