Personalized Daily ArXiv Papers 2025-05-12

[gpt-4o]	Prompt	Completion	Total
Token	35399	4453	39852
Cost	$0.09	$0.04	$0.13

Total arXiv papers: 328

Total scanned papers: 212

Total relevant papers: 18

Table of contents with paper titles:

1. FloE: On-the-Fly MoE Inference Authors: Yuxin Zhou, Zheng Li, Jun Zhang, Jue Wang, Yiping Wang, Zhongle Xie, Ke Chen, Lidan Shou
2. Continuous Thought Machines Authors: Luke Darlow, Ciaran Regan, Sebastian Risi, Jeffrey Seely, Llion Jones
3. Deep-ICE: The first globally optimal algorithm for empirical risk minimization of two-layer maxout and ReLU networks Authors: Xi He, Yi Miao, Max A. Little
4. On the Depth of Monotone ReLU Neural Networks and ICNNs Authors: Egor Bakaev, Florestan Brunck, Christoph Hertrich, Daniel Reichman, Amir Yehudayoff
5. Towards a Unified Representation Evaluation Framework Beyond Downstream Tasks Authors: Christos Plachouras, Julien Guinot, George Fazekas, Elio Quinton, Emmanouil Benetos, Johan Pauwels
6. Rethinking Graph Contrastive Learning through Relative Similarity Preservation Authors: Zhiyuan Ning, Pengfei Wang, Ziyue Qiao, Pengyang Wang, Yuanchun Zhou
7. How to Train Your Metamorphic Deep Neural Network Authors: Thomas Sommariva, Simone Calderara, Angelo Porrello
8. MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design Authors: Haojie Duanmu, Xiuhong Li, Zhihang Yuan, Size Zheng, Jiangfei Duan, Xingcheng Zhang, Dahua Lin
9. Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM Authors: Zehao Fan, Garrett Gagnon, Zhenyu Liu, Liu Liu
10. New Statistical and Computational Results for Learning Junta Distributions Authors: Lorenzo Beretta
11. Hypergraph Neural Sheaf Diffusion: A Symmetric Simplicial Set Framework for Higher-Order Learning Authors: Seongjin Choi, Gahee Kim, Yong-Geun Oh
12. Neuro-Symbolic Concepts Authors: Jiayuan Mao, Joshua B. Tenenbaum, Jiajun Wu
13. Generative Discovery of Partial Differential Equations by Learning from Math Handbooks Authors: Hao Xu, Yuntian Chen, Rui Cao, Tianning Tang, Mengge Du, Jian Li, Adrian H. Callaghan, Dongxiao Zhang
14. Insertion Language Models: Sequence Generation with Arbitrary-Position Insertions Authors: Dhruvesh Patel, Aishwarya Sahoo, Avinash Amballa, Tahira Naseem, Tim G. J. Rudner, Andrew McCallum
15. Griffin: Towards a Graph-Centric Relational Database Foundation Model Authors: Yanbo Wang, Xiyuan Wang, Quan Gan, Minjie Wang, Qibin Yang, David Wipf, Muhan Zhang
16. Autoencoder-Based Hybrid Replay for Class-Incremental Learning Authors: Milad Khademi Nori, Il-Min Kim, Guanghui Wang
17. UniSymNet: A Unified Symbolic Network Guided by Transformer Authors: Xinxin Li, Juan Zhang, Da Li, Xingyu Liu, Jin Xu, Junping Yin
18. Register and CLS tokens yield a decoupling of local and global features in large ViTs Authors: Alexander Lappe, Martin A. Giese

1. FloE: On-the-Fly MoE Inference

ArXiv ID: 2505.05950

Authors: Yuxin Zhou, Zheng Li, Jun Zhang, Jue Wang, Yiping Wang, Zhongle Xie, Ke Chen, Lidan Shou

Abstract: With the widespread adoption of Mixture-of-Experts (MoE) models, there is a growing demand for efficient inference on memory-constrained devices. While offloading expert parameters to CPU memory and loading activated experts on demand has emerged as a potential solution, the large size of activated experts overburdens the limited PCIe bandwidth, hindering the effectiveness in latency-sensitive scenarios. To mitigate this, we propose FloE, an on-the-fly MoE inference system on memory-constrained GPUs. FloE is built on the insight that there exists substantial untapped redundancy within sparsely activated experts. It employs various compression techniques on the expert's internal parameter matrices to reduce the data movement load, combined with low-cost sparse prediction, achieving perceptible inference acceleration in wall-clock time on resource-constrained devices. Empirically, FloE achieves a 9.3x compression of parameters per expert in Mixtral-8x7B; enables deployment on a GPU with only 11GB VRAM, reducing the memory footprint by up to 8.5x; and delivers a 48.7x inference speedup compared to DeepSpeed-MII on a single GeForce RTX 3090.
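
The bandwidth argument in this abstract can be made concrete with a rough estimate. The sketch below is not FloE's implementation. It assumes a gated-MLP expert with three weight matrices, uses the Mixtral-8x7B layer sizes (4096 hidden, 14336 intermediate) as recalled from the public model configuration, stores weights in fp16, and picks a hypothetical effective PCIe bandwidth of 16 GB/s. It also treats the reported 9.3x compression as a uniform reduction in bytes moved, which is a simplification.

```python
def expert_bytes(hidden_size, intermediate_size, bytes_per_param=2):
    """Bytes in one gated-MLP expert (gate, up and down projections)."""
    return 3 * hidden_size * intermediate_size * bytes_per_param


def transfer_ms(num_bytes, bandwidth_gb_per_s):
    """Milliseconds needed to move num_bytes at the given bandwidth."""
    return num_bytes / (bandwidth_gb_per_s * 1e9) * 1e3


# Assumed layer sizes (Mixtral-8x7B public config, as recalled).
HIDDEN, INTERMEDIATE = 4096, 14336
# Hypothetical effective host-to-GPU bandwidth; real links vary.
ASSUMED_PCIE_GB_PER_S = 16.0
# Per-expert compression ratio quoted in the abstract.
COMPRESSION = 9.3

raw_bytes = expert_bytes(HIDDEN, INTERMEDIATE)
raw_ms = transfer_ms(raw_bytes, ASSUMED_PCIE_GB_PER_S)
compressed_ms = transfer_ms(raw_bytes / COMPRESSION, ASSUMED_PCIE_GB_PER_S)

print(f"uncompressed expert: {raw_bytes / 2**20:.0f} MiB, {raw_ms:.2f} ms per load")
print(f"compressed expert:   {compressed_ms:.2f} ms per load")
```

Under these assumptions a single expert load drops from roughly 22 ms to roughly 2.4 ms. Since several experts are activated per token across many layers, the per-load cost is what makes on-demand loading impractical without compression.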

Comment: The paper proposes FloE, an on-the-fly MoE inference system, which directly aligns with the Mixture-of-Experts (MoE) criterion under model architecture and compression. The compression techniques and inference acceleration are novel and impactful.

Relevance: 10 Novelty: 8

2. Continuous Thought Machines

ArXiv ID: 2505.05522

Authors: Luke Darlow, Ciaran Regan, Sebastian Risi, Jeffrey Seely, Llion Jones

Abstract: Biological brains demonstrate complex neural activity, where the timing and interplay between neurons is critical to how brains process information. Most deep learning architectures simplify neural activity by abstracting away temporal dynamics. In this paper we challenge that paradigm. By incorporating neuron-level processing and synchronization, we can effectively reintroduce neural timing as a foundational element. We present the Continuous Thought Machine (CTM), a model designed to leverage neural dynamics as its core representation. The CTM has two core innovations: (1) neuron-level temporal processing, where each neuron uses unique weight parameters to process a history of incoming signals; and (2) neural synchronization employed as a latent representation. The CTM aims to strike a balance between oversimplified neuron abstractions that improve computational efficiency, and biological realism. It operates at a level of abstraction that effectively captures essential temporal dynamics while remaining computationally tractable for deep learning. We demonstrate the CTM's strong performance and versatility across a range of challenging tasks, including ImageNet-1K classification, solving 2D mazes, sorting, parity computation, question-answering, and RL tasks. Beyond displaying rich internal representations and offering a natural avenue for interpretation owing to its internal process, the CTM is able to perform tasks that require complex sequential reasoning. The CTM can also leverage adaptive compute, where it can stop earlier for simpler tasks, or keep computing when faced with more challenging instances. The goal of this work is to share the CTM and its associated innovations, rather than pushing for new state-of-the-art results. To that end, we believe the CTM represents a significant step toward developing more biologically plausible and powerful artificial intelligence systems.
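
The two ideas named in this abstract can be illustrated with a toy sketch. It is not the authors' architecture. It assumes that each neuron applies its own private linear weights plus a tanh to a short history of pre-activations, that a shared mixing matrix (a hypothetical stand-in for whatever produces pre-activations) feeds the neurons, and that "synchronization" is measured as the inner product between neurons' activation traces over time. All names and sizes are illustrative.

```python
import math
import random


def neuron_step(history, private_weights):
    """Each neuron maps its own history window through its own weights."""
    return [
        math.tanh(sum(w * h for w, h in zip(ws, hs)))
        for ws, hs in zip(private_weights, history)
    ]


def synchronization(traces):
    """D x D matrix of inner products between per-neuron activation traces."""
    d = len(traces)
    return [
        [sum(a * b for a, b in zip(traces[i], traces[j])) for j in range(d)]
        for i in range(d)
    ]


rng = random.Random(0)
D, M, T = 4, 3, 6  # neurons, history length, internal ticks (all hypothetical)

mix = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(D)]      # shared mixing
private = [[rng.uniform(-1, 1) for _ in range(M)] for _ in range(D)]  # per-neuron weights
history = [[0.0] * M for _ in range(D)]
post = [rng.uniform(-1, 1) for _ in range(D)]
traces = [[] for _ in range(D)]

for _ in range(T):
    # Shared mixing turns last tick's activations into new pre-activations.
    pre = [sum(mix[i][j] * post[j] for j in range(D)) for i in range(D)]
    # Slide each neuron's history window forward by one tick.
    history = [hs[1:] + [p] for hs, p in zip(history, pre)]
    post = neuron_step(history, private)
    for trace, z in zip(traces, post):
        trace.append(z)

sync = synchronization(traces)  # would serve as the latent representation
```

In this toy version the latent read out by downstream layers is `sync`, a summary of how neurons co-vary over internal ticks, rather than the activations at any single tick.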

Comment: The paper introduces the Continuous Thought Machine, which challenges established paradigms by incorporating neuron-level temporal dynamics, aligning with emerging trends and architectural innovations.

Relevance: 9 Novelty: 9

3. Deep-ICE: The first globally optimal algorithm for empirical risk minimization of two-layer maxout and ReLU networks

ArXiv ID: 2505.05740

Authors: Xi He, Yi Miao, Max A. Little

Abstract: This paper introduces the first globally optimal algorithm for the empirical risk minimization problem of two-layer maxout and ReLU networks, i.e., minimizing the number of misclassifications. The algorithm has a worst-case time complexity of $O\left(N^{DK+1}\right)$, where $K$ denotes the number of hidden neurons and $D$ represents the number of features. It can be generalized to accommodate arbitrary computable loss functions without affecting its computational complexity. Our experiments demonstrate that the proposed algorithm provides provably exact solutions for small-scale datasets. To handle larger datasets, we introduce a novel coreset selection method that reduces the data size to a manageable scale, making it feasible for our algorithm. This extension enables efficient processing of large-scale datasets and achieves significantly improved performance, with a 20-30% reduction in misclassifications for both training and prediction, compared to state-of-the-art approaches (neural networks trained using gradient descent and support vector machines), when applied to the same models (two-layer networks with fixed hidden nodes and linear models).
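
To see why the abstract pairs the exact algorithm with coreset selection, it helps to evaluate the stated worst-case bound for a few sizes. The sketch below only computes N^(DK+1) with the hidden constant taken as 1, so the numbers indicate scale rather than actual running time. The dataset sizes, feature count, and neuron count are hypothetical values chosen for illustration.

```python
def worst_case_steps(n_samples, n_features, n_hidden):
    """Value of the bound N^(D*K + 1), ignoring constant factors."""
    return n_samples ** (n_features * n_hidden + 1)


D, K = 2, 3           # hypothetical: 2 features, 3 hidden neurons
FULL_N = 1000         # hypothetical full dataset size
CORESET_N = 50        # hypothetical coreset size

full_cost = worst_case_steps(FULL_N, D, K)
coreset_cost = worst_case_steps(CORESET_N, D, K)

print(f"exponent D*K + 1 = {D * K + 1}")
print(f"full data: {full_cost:.2e} steps, coreset: {coreset_cost:.2e} steps")
print(f"reduction factor: {full_cost / coreset_cost:.2e}")
```

Because N is raised to the power DK+1, shrinking the data by a factor of 20 here cuts the bound by 20^7, which is why reducing N matters far more than any constant-factor optimization.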

Comment: The paper introduces a globally optimal algorithm for empirical risk minimization in two-layer maxout and ReLU networks, which is highly relevant to foundational research in representation learning and training dynamics. The coreset selection method for scaling the algorithm adds significant novelty.