Personalized Daily ArXiv Papers 2025-07-11

[gpt-4o]	Prompt	Completion	Total
Token	29558	3654	33212
Cost	$0.07	$0.04	$0.11

Total arXiv papers: 435

Total scanned papers: 267

Total relevant papers: 23

Table of contents with paper titles:

A statistical physics framework for optimal learning Authors: Francesca Mignacco, Francesco Mori
AI Should Sense Better, Not Just Scale Bigger: Adaptive Sensing as a Paradigm Shift Authors: Eunsu Baek, Keondo Park, Jeonggil Ko, Min-hwan Oh, Taesik Gong, Hyung-Sin Kim
Helix Parallelism: Rethinking Sharding Strategies for Interactive Multi-Million-Token LLM Decoding Authors: Nidhi Bhatia, Ankit More, Ritika Borkar, Tiyasa Mitra, Ramon Matas, Ritchie Zhao, Maximilian Golub, Dheevatsa Mudigere, Brian Pharris, Bita Darvish Rouhani
Position: We Need An Algorithmic Understanding of Generative AI Authors: Oliver Eberle, Thomas McGee, Hamza Giaffar, Taylor Webb, Ida Momennejad
UnIT: Scalable Unstructured Inference-Time Pruning for MAC-efficient Neural Inference on MCUs Authors: Ashe Neth, Sawinder Kaur, Mohammad Nur Hossain Khan, Subrata Biswas, Asif Salekin, Bashima Islam
Neural networks leverage nominally quantum and post-quantum representations Authors: Paul M. Riechers, Thomas J. Elliott, Adam S. Shai
Neural Concept Verifier: Scaling Prover-Verifier Games via Concept Encodings Authors: Berkant Turan, Suhrab Asadulla, David Steinmann, Wolfgang Stammer, Sebastian Pokutta
Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts Authors: Samin Yeasar Arnob, Zhan Su, Minseon Kim, Oleksiy Ostapenko, Riyasat Ohib, Esra'a Saleh, Doina Precup, Lucas Caccia, Alessandro Sordoni
CCQ: Convolutional Code for Extreme Low-bit Quantization in LLMs Authors: Zhaojing Zhou, Xunchao Li, Minghao Li, Handi Zhang, Haoshuang Wang, Wenbin Chang, Yiqun Liu, Qingqing Dang, Dianhai Yu, Yanjun Ma, Haifeng Wang
On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment Authors: Sarah Ball, Greg Gluch, Shafi Goldwasser, Frauke Kreuter, Omer Reingold, Guy N. Rothblum
Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs Authors: Jeongseok Hyun, Sukjun Hwang, Su Ho Han, Taeoh Kim, Inwoong Lee, Dongyoon Wee, Joon-Young Lee, Seon Joo Kim, Minho Shim
Why is Your Language Model a Poor Implicit Reward Model? Authors: Noam Razin, Yong Lin, Jiarui Yao, Sanjeev Arora
Stable Preference Optimization for LLMs: A Bilevel Approach Beyond Direct Preference Optimization Authors: Chengtao Jian, Kai Yang, Ye Ouyang, Xiaozhou Ye
Leveraging Manifold Embeddings for Enhanced Graph Transformer Representations and Learning Authors: Ankit Jyothish, Ali Jannesari
Optimization Guarantees for Square-Root Natural-Gradient Variational Inference Authors: Navish Kumar, Thomas Möllenhoff, Mohammad Emtiyaz Khan, Aurelien Lucchi
Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs Authors: Ziyue Li, Yang Li, Tianyi Zhou
Single-pass Adaptive Image Tokenization for Minimum Program Search Authors: Shivam Duggal, Sanghyun Byun, William T. Freeman, Antonio Torralba, Phillip Isola
Re-Bottleneck: Latent Re-Structuring for Neural Audio Autoencoders Authors: Dimitrios Bralios, Jonah Casebeer, Paris Smaragdis
Efficient Parametric SVD of Koopman Operator for Stochastic Dynamical Systems Authors: Minchan Jeong, J. Jon Ryu, Se-Young Yun, Gregory W. Wornell
DAF: An Efficient End-to-End Dynamic Activation Framework for on-Device DNN Training Authors: Renyuan Liu, Yuyang Leng, Kaiyan Liu, Shaohan Hu, Chun-Fu (Richard) Chen, Peijun Zhao, Heechul Yun, Shuochao Yao
Bayesian Double Descent Authors: Nick Polson, Vadim Sokolov
Str-GCL: Structural Commonsense Driven Graph Contrastive Learning Authors: Dongxiao He, Yongqi Huang, Jitao Zhao, Xiaobao Wang, Zhen Wang
Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning Authors: Wooseong Jeong, Kuk-Jin Yoon

1. A statistical physics framework for optimal learning

ArXiv ID: 2507.07907

Authors: Francesca Mignacco, Francesco Mori

Abstract: Learning is a complex dynamical process shaped by a range of interconnected decisions. Careful design of hyperparameter schedules for artificial neural networks or efficient allocation of cognitive resources by biological learners can dramatically affect performance. Yet, theoretical understanding of optimal learning strategies remains sparse, especially due to the intricate interplay between evolving meta-parameters and nonlinear learning dynamics. The search for optimal protocols is further hindered by the high dimensionality of the learning space, often resulting in predominantly heuristic, difficult to interpret, and computationally demanding solutions. Here, we combine statistical physics with control theory in a unified theoretical framework to identify optimal protocols in prototypical neural network models. In the high-dimensional limit, we derive closed-form ordinary differential equations that track online stochastic gradient descent through low-dimensional order parameters. We formulate the design of learning protocols as an optimal control problem directly on the dynamics of the order parameters with the goal of minimizing the generalization error at the end of training. This framework encompasses a variety of learning scenarios, optimization constraints, and control budgets. We apply it to representative cases, including optimal curricula, adaptive dropout regularization and noise schedules in denoising autoencoders. We find nontrivial yet interpretable strategies highlighting how optimal protocols mediate crucial learning tradeoffs, such as maximizing alignment with informative input directions while minimizing noise fitting. Finally, we show how to apply our framework to real datasets. Our results establish a principled foundation for understanding and designing optimal learning protocols and suggest a path toward a theory of meta-learning grounded in statistical physics.

Comment: The paper presents a statistical physics framework for optimal learning, which is relevant to representation learning and emerging trends in foundational research.

Relevance: 9 Novelty: 9
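
To make the idea in the abstract above concrete (closed-form ODEs for online SGD, with the learning protocol treated as a control), here is a minimal sketch on a toy model that is not taken from the paper. It assumes online SGD on noisy linear teacher-student regression with isotropic Gaussian inputs; in the high-dimensional limit the squared weight error per dimension, called Q here, then obeys dQ/dt = -2*eta*Q + eta^2*(Q + sigma2). The names Q, eta, sigma2, simulate, constant and greedy are illustrative assumptions. The sketch compares constant learning rates with a greedy schedule that minimizes dQ/dt at every instant.

```python
def simulate(schedule, q0=1.0, sigma2=0.1, t_end=20.0, dt=1e-3):
    """Euler-integrate dQ/dt = -2*eta*Q + eta**2 * (Q + sigma2).

    Q is the order parameter of the assumed toy model (squared weight
    error per dimension); schedule(t, q) returns the learning rate eta.
    """
    q = q0
    steps = int(round(t_end / dt))
    for k in range(steps):
        eta = schedule(k * dt, q)
        q += dt * (-2.0 * eta * q + eta ** 2 * (q + sigma2))
    return q


def constant(eta):
    """Protocol with a fixed learning rate."""
    return lambda t, q: eta


def greedy(sigma2=0.1):
    """Protocol that minimizes the instantaneous dQ/dt.

    Setting d/d(eta) of the right-hand side to zero gives
    eta* = Q / (Q + sigma2), which decays as the error shrinks.
    """
    return lambda t, q: q / (q + sigma2)


final_greedy = simulate(greedy())
final_const = {eta: simulate(constant(eta)) for eta in (0.05, 0.2, 0.5, 1.0)}

print("greedy schedule:", final_greedy)
for eta, q_final in final_const.items():
    print("constant eta =", eta, "->", q_final)
```

In this toy setting the greedy schedule starts with a large learning rate and anneals it as Q approaches the noise floor, which is the kind of tradeoff between fast alignment and noise fitting that the abstract describes. The paper's own models and control formulation are more general than this scalar example.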

2. AI Should Sense Better, Not Just Scale Bigger: Adaptive Sensing as a Paradigm Shift

ArXiv ID: 2507.07820

Authors: Eunsu Baek, Keondo Park, Jeonggil Ko, Min-hwan Oh, Taesik Gong, Hyung-Sin Kim

Abstract: Current AI advances largely rely on scaling neural models and expanding training datasets to achieve generalization and robustness. Despite notable successes, this paradigm incurs significant environmental, economic, and ethical costs, limiting sustainability and equitable access. Inspired by biological sensory systems, where adaptation occurs dynamically at the input (e.g., adjusting pupil size, refocusing vision), we advocate for adaptive sensing as a necessary and foundational shift. Adaptive sensing proactively modulates sensor parameters (e.g., exposure, sensitivity, multimodal configurations) at the input level, significantly mitigating covariate shifts and improving efficiency. Empirical evidence from recent studies demonstrates that adaptive sensing enables small models (e.g., EfficientNet-B0) to surpass substantially larger models (e.g., OpenCLIP-H) trained with significantly more data and compute. We (i) outline a roadmap for broadly integrating adaptive sensing into real-world applications spanning humanoid, healthcare, autonomous systems, agriculture, and environmental monitoring, (ii) critically assess technical and ethical integration challenges, and (iii) propose targeted research directions, such as standardized benchmarks, real-time adaptive algorithms, multimodal integration, and privacy-preserving methods. Collectively, these efforts aim to transition the AI community toward sustainable, robust, and equitable artificial intelligence systems.

Comment: The paper advocates for adaptive sensing as a paradigm shift, which is an emerging trend challenging established assumptions.

Relevance: 9 Novelty: 9

3. Helix Parallelism: Rethinking Sharding Strategies for Interactive Multi-Million-Token LLM Decoding

ArXiv ID: 2507.07120

Authors: Nidhi Bhatia, Ankit More, Ritika Borkar, Tiyasa Mitra, Ramon Matas, Ritchie Zhao, Maximilian Golub, Dheevatsa Mudigere, Brian Pharris, Bita Darvish Rouhani

Abstract: As LLMs scale to multi-million-token KV histories, real-time autoregressive decoding under tight Token-to-Token Latency (TTL) constraints faces growing pressure. Two core bottlenecks dominate: accessing Feed-Forward Network (FFN) weights and reading long KV caches. While Tensor Parallelism (TP) helps mitigate the cost of FFN weight reads, it does not scale well for attention. When TP width exceeds the number of KV heads, it leads to inefficient KV duplication, limits parallelism, and constrains batch size. Simultaneously, DRAM reads for long KV histories scale linearly with batch size, further capping efficiency. We introduce Helix Parallelism, a hybrid execution strategy that applies KV parallelism during attention to shard KV caches across GPUs, then reuses the same GPUs for TP in dense LLMs or TP x Expert Parallel (EP) in MoEs during FFN computation. To preserve exact attention behavior, Helix includes a lightweight communication step. To minimize the exposed communication cost, we introduce Helix HOP-B. Helix HOP-B effectively minimizes communication overhead through batchwise overlap, preserving low TTL while improving GPU efficiency. Compared to conventional parallelism approaches, Helix reduces TTL by up to 1.5x at fixed batch sizes and supports up to 32x larger batches under the same latency budget for DeepSeek-R1, pushing forward the throughput-latency Pareto on Blackwell and making real-time inference with ultra-long sequences practical.
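
The abstract says that the KV cache is sharded across GPUs and that a lightweight communication step keeps attention exact, but it does not say how the partial results are merged. A common way to do this for sequence-sharded attention is to have each shard return its running maximum score, its softmax denominator, and its unnormalized output, and then rescale and sum them. The sketch below assumes that merge rule; it is not confirmed to be Helix's mechanism. It runs in a single process on plain Python lists, and the names partial_attention, combine and full_attention are hypothetical.

```python
import math
import random


def partial_attention(q, keys, values):
    """Attend one query over one KV shard.

    Returns (max_score, sum_of_exp, unnormalized_output) so that shards
    can be merged later without recomputing the softmax.
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    out = [sum(w * v[j] for w, v in zip(weights, values))
           for j in range(len(values[0]))]
    return m, sum(weights), out


def combine(partials):
    """Merge per-shard results (assumed log-sum-exp style rescaling)."""
    m_all = max(m for m, _, _ in partials)
    denom = 0.0
    out = [0.0] * len(partials[0][2])
    for m, d, o in partials:
        c = math.exp(m - m_all)  # bring each shard onto a common scale
        denom += c * d
        out = [acc + c * x for acc, x in zip(out, o)]
    return [x / denom for x in out]


def full_attention(q, keys, values):
    """Reference: the same computation with a single shard."""
    return combine([partial_attention(q, keys, values)])


random.seed(0)
d, n, shards = 4, 12, 3
q = [random.gauss(0, 1) for _ in range(d)]
keys = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
values = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]

reference = full_attention(q, keys, values)
step = n // shards
sharded = combine([partial_attention(q, keys[i:i + step], values[i:i + step])
                   for i in range(0, n, step)])
max_err = max(abs(a - b) for a, b in zip(reference, sharded))
print("max abs difference:", max_err)
```

Each shard only sends two scalars and one output vector per query, which is why a merge of this kind is cheap compared with moving the KV cache itself. How Helix overlaps this traffic with compute (HOP-B) is not modeled here.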

Comment: The paper presents Helix Parallelism, a new execution strategy for LLMs, which is relevant to model architecture and efficiency improvements.