Personalized Monthly Topic Summary 2025/09

Metric	Value
Total Papers	685
Model Architecture	182
Model Compression and Efficiency	183
High Performance Computing	43
Representation Learning	263
Other Foundational Research	14

Model Architecture (182)

LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures - Score: 20.0 (R=0, N=0) - Date: 2025-09-19 - Comment: Author match
Inductive Bias and Spectral Properties of Single-Head Attention in High Dimensions - Score: 19 (R=10, N=9) - Date: 2025-09-30 - Comment: Architecture analysis: theoretical study of single-head attention’s inductive bias and spectral properties, revealing implicit low-rank regularization via weight decay.
The Impossibility of Inverse Permutation Learning in Transformer Models - Score: 19 (R=10, N=9) - Date: 2025-09-30 - Comment: Model Architecture Theory: impossibility result for inverse permutation learning in decoder-only Transformers; shows fixes via encoder-decoder or scratch tokens.
Clebsch-Gordan Transformer: Fast and Global Equivariant Attention - Score: 19 (R=10, N=9) - Date: 2025-09-30 - Comment: Matches Model Architecture and Efficiency: O(N log N) global equivariant attention via Clebsch–Gordan convolution supporting high-order features.
Towards a Comprehensive Scaling Law of Mixture-of-Experts - Score: 19 (R=10, N=9) - Date: 2025-09-30 - Comment: Matches Mixture-of-Experts + Scaling Laws: comprehensive MoE scaling law over key factors (N, Na, G, S, D) with optimal configuration guidance.
On the Capacity of Self-Attention - Score: 19 (R=10, N=9) - Date: 2025-09-30 - Comment: Matches Model Architecture (theory): capacity scaling law for self-attention and principled multi-head budget allocation.
Bridging Kolmogorov Complexity and Deep Learning: Asymptotically Optimal Description Length Objectives for Transformers - Score: 19 (R=10, N=9) - Date: 2025-09-29 - Comment: Model Compression and Generalization Theory: proposes asymptotically optimal MDL objectives for Transformers grounded in Kolmogorov complexity; constructs a tractable variational objective.
Structured Sparse Transition Matrices to Enable State Tracking in State-Space Models - Score: 19 (R=10, N=9) - Date: 2025-09-29 - Comment: Model Architecture with structured sparsity: PD-SSM factorizes transition matrices (one-hot P times diagonal D) enabling FSA state tracking at diagonal-SSM cost with strong expressivity guarantees.
Circuit Complexity From Physical Constraints: Scaling Limitations of Attention - Score: 19 (R=10, N=9) - Date: 2025-09-24 - Comment: Model Architecture/Theory: introduces new circuit complexity classes capturing physical constraints and derives scaling limits for attention-based Transformers.
Towards Interpretable and Efficient Attention: Compressing All by Contracting a Few - Score: 19 (R=10, N=9) - Date: 2025-09-23 - Comment: Model Architecture: derives an interpretable Contract-and-Broadcast Self-Attention that compresses tokens into low-dimensional structures and generalizes existing attention; Model Compression and Efficiency: achieves linear-time attention by contracting a few representative tokens.
Circuit realization and hardware linearization of monotone operator equilibrium networks - Score: 19 (R=10, N=9) - Date: 2025-09-18 - Comment: Model Architecture and HPC: analog circuit realization of monotone operator equilibrium networks with in-hardware gradient computation (hardware linearization), enabling trainable analog implementations.
Positional Encoding via Token-Aware Phase Attention - Score: 19 (R=10, N=9) - Date: 2025-09-17 - Comment: Matches Model Architecture: new positional encoding (TAPA) for Transformers with theory on RoPE’s bias; improves long-context extrapolation.
Fast attention mechanisms: a tale of parallelism - Score: 19 (R=10, N=9) - Date: 2025-09-12 - Comment: Compression/Efficiency and Model Architecture: introduces sub-quadratic Approximate Nearest Neighbor Attention with theoretical guarantees (MPC-equivalence) and connections to low-rank transformers.
Customizing the Inductive Biases of Softmax Attention using Structured Matrices - Score: 19 (R=10, N=9) - Date: 2025-09-10 - Comment: Strongly matches Model Architecture and Efficiency: proposes new attention scoring via high-rank efficient structured matrices (BTT/MLR) to encode distance-dependent compute biases and improve scaling.
Causal Attention with Lookahead Keys - Score: 19 (R=10, N=9) - Date: 2025-09-10 - Comment: Model Architecture: introduces CASTLE, a causal attention variant with lookahead keys and an equivalent parallelizable formulation.
SpikingBrain Technical Report: Spiking Brain-inspired Large Models - Score: 19 (R=10, N=9) - Date: 2025-09-08 - Comment: Model Architecture, Compression/Efficiency, and HPC: spiking LLMs with linear/hybrid-linear attention and MoE, sparse/event-driven inference with near-constant memory, and custom distributed training on non-NVIDIA hardware.
InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation - Score: 18 (R=10, N=8) - Date: 2025-09-30 - Comment: Trainable sparse attention with dense–sparse switch; parameter reuse enables seamless short-to-long adaptation and 4x speedups.
MoE-PHDS: One MoE checkpoint for flexible runtime sparsity - Score: 18 (R=10, N=8) - Date: 2025-09-30 - Comment: Model Architecture and Efficiency (MoE): PHDS enables flexible runtime sparsity (dial k) from a single MoE checkpoint via lightweight SFT, giving controllable accuracy/latency trade-offs.
Statistical Advantage of Softmax Attention: Insights from Single-Location Regression - Score: 18 (R=10, N=8) - Date: 2025-09-29 - Comment: Representation Learning/Architecture Analysis: high-dimensional theory shows softmax attention attains Bayes risk and outperforms linear attention.
Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts - Score: 18 (R=10, N=8) - Date: 2025-09-29 - Comment: Model Architecture (MoE): training framework that enables scaling the number of activated experts at inference by fostering expert collaboration and robust routing.
Behind RoPE: How Does Causal Mask Encode Positional Information? - Score: 18 (R=10, N=8) - Date: 2025-09-26 - Comment: Model Architecture/Representation Learning: theoretical and empirical analysis of positional information from causal masks and their interaction with RoPE in Transformer decoders, revealing non-trivial induced attention patterns.
Hierarchical Resolution Transformers: A Wavelet-Inspired Architecture for Multi-Scale Language Understanding - Score: 18 (R=10, N=8) - Date: 2025-09-26 - Comment: Model Architecture and Efficiency: introduces a wavelet-inspired Hierarchical Resolution Transformer with multi-resolution attention and O(n log n) complexity.
Dynamic Reasoning Chains through Depth-Specialized Mixture-of-Experts in Transformer Architectures - Score: 18 (R=10, N=8) - Date: 2025-09-26 - Comment: Model Architecture: Mixture-of-Experts with depth-specialized experts and learned routing (conditional/dynamic computation) in Transformers.
Faster, Smaller, and Smarter: Task-Aware Expert Merging for Online MoE Inference - Score: 18 (R=10, N=8) - Date: 2025-09-25 - Comment: Direct MoE architecture/efficiency: task-aware expert merging with adaptive neural bandit router for online inference.
Symphony-MoE: Harmonizing Disparate Pre-trained Models into a Coherent Mixture-of-Experts - Score: 18 (R=10, N=8) - Date: 2025-09-24 - Comment: Model Architecture (MoE): builds a coherent MoE from disparate pretrained models via training-free functional alignment and router learning.
PiMoE: Token-Level Routing for Integrating High-Precision Computation and Reasoning - Score: 18 (R=10, N=8) - Date: 2025-09-24 - Comment: Model Architecture (MoE): token-level routing in a physically-isolated Mixture-of-Experts integrating computation modules with reasoning within a single chain of thought.
Qwen3-Omni Technical Report - Score: 18 (R=10, N=8) - Date: 2025-09-24 - Comment: Model Architecture: MoE (Thinker–Talker) unifying text/image/audio/video; Efficiency: low-latency streaming replacing diffusion with causal ConvNet and codebook prediction.
MoEs Are Stronger than You Think: Hyper-Parallel Inference Scaling with RoE - Score: 18 (R=10, N=8) - Date: 2025-09-24 - Comment: MoE inference-time method: training-free hyper-parallel token-level scaling (RoE) with efficient batching and specialized KV cache for better accuracy–compute tradeoffs.
Achilles' Heel of Mamba: Essential difficulties of the Mamba architecture demonstrated by synthetic data - Score: 18 (R=10, N=8) - Date: 2025-09-23 - Comment: Matches Model Architecture: foundational analysis of Mamba’s nonlinear convolution revealing asymmetry bias and architectural limitations/suggested fixes.
Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems - Score: 18 (R=10, N=8) - Date: 2025-09-22 - Comment: Model Architecture: derives a hierarchical self-attention mechanism from first principles with a dynamic-programming algorithm, enabling multi-scale transformers and post-hoc hierarchical injection.
On Linear Mode Connectivity of Mixture-of-Experts Architectures - Score: 18 (R=10, N=8) - Date: 2025-09-17 - Comment: Model Architecture (MoE): analyzes symmetries and establishes linear mode connectivity in MoE; introduces expert/gating alignment algorithm.
Decoupling the "What" and "Where" With Polar Coordinate Positional Embeddings - Score: 18 (R=10, N=8) - Date: 2025-09-17 - Comment: Model Architecture: introduces PoPE, a positional encoding that disentangles content vs. position in Transformers and improves length extrapolation.
Mixture-of-Clustered-Experts: Advancing Expert Specialization and Generalization in Instruction Tuning - Score: 18 (R=10, N=8) - Date: 2025-09-17 - Comment: Model Architecture (MoE): proposes dual-stage routing (sequence-level group routing + token-level top-k) to improve expert specialization/generalization.
Multipole Semantic Attention: A Fast Approximation of Softmax Attention for Pretraining - Score: 18 (R=10, N=8) - Date: 2025-09-15 - Comment: Model Efficiency/Transformer Architecture: fast softmax attention via separate query/key clustering and multipole (monopole+dipole) approximations; hierarchical causal block scheme; drop-in attention replacement.
Steering MoE LLMs via Expert (De)Activation - Score: 18 (R=10, N=8) - Date: 2025-09-12 - Comment: Model Architecture (MoE): identifies behavior-linked experts via activation patterns and steers behavior by selective expert (de)activation at inference without retraining.
Accelerating Mixture-of-Expert Inference with Adaptive Expert Split Mechanism - Score: 18 (R=10, N=8) - Date: 2025-09-11 - Comment: High Performance Computing and Efficiency: MoE-specific inference system with expert offloading, caching/prefetch, vertical expert split, and adaptive cache configuration to reduce VRAM and latency.
Ban&Pick: Achieving Free Performance Gains and Inference Speedup via Smarter Routing in MoE-LLMs - Score: 18 (R=10, N=8) - Date: 2025-09-09 - Comment: Model Architecture and Efficiency (MoE): post-training routing strategy that reinforces key experts and dynamically prunes redundant experts to improve accuracy and speed without retraining.
HoPE: Hyperbolic Rotary Positional Encoding for Stable Long-Range Dependency Modeling in Large Language Models - Score: 18 (R=10, N=8) - Date: 2025-09-08 - Comment: Model Architecture: introduces hyperbolic rotary positional encoding (HoPE), a geometric generalization of RoPE for stable long-range dependencies.
Interpreting Transformer Architectures as Implicit Multinomial Regression - Score: 18 (R=10, N=8) - Date: 2025-09-08 - Comment: Model Architecture/Mechanistic Interpretability: establishes a theoretical link between attention dynamics in transformers and optimal feature recovery in multinomial regression.
LongCat-Flash Technical Report - Score: 18 (R=10, N=8) - Date: 2025-09-03 - Comment: Model Architecture (MoE) and Efficiency: introduces LongCat-Flash, a Mixture-of-Experts language model with architectural innovations and efficiency improvements.
Parameterized Hardness of Zonotope Containment and Neural Network Verification - Score: 18 (R=9, N=9) - Date: 2025-09-30 - Comment: Foundational theory/verification: strongest parameterized hardness results for properties of ReLU networks (e.g., positivity, Lipschitz), directly analyzing neural network architecture capabilities.
Linear Transformers Implicitly Discover Unified Numerical Algorithms - Score: 18 (R=9, N=9) - Date: 2025-09-25 - Comment: Model Architecture: linear-attention Transformer analysis/unrolling revealing a unified iterative solver with theoretical convergence.
Contextuality, Holonomy and Discrete Fiber Bundles in Group-Valued Boltzmann Machines - Score: 18 (R=9, N=9) - Date: 2025-09-17 - Comment: Matches Model Architecture: extends RBMs to group-valued weights and introduces a holonomy-based contextuality index (topological/geometric regularization).
Are We Really Learning the Score Function? Reinterpreting Diffusion Models Through Wasserstein Gradient Flow Matching - Score: 18 (R=9, N=9) - Date: 2025-09-03 - Comment: Generative model theory: reinterprets diffusion models through Wasserstein gradient flow matching, questioning whether training really learns the score function.
LLaDA-MoE: A Sparse MoE Diffusion Language Model - Score: 17 (R=10, N=7) - Date: 2025-09-30 - Comment: Mixture-of-Experts (sparse MoE) architecture for diffusion language models; efficient inference with few active parameters.
Uni-NTFM: A Unified Foundation Model for EEG Signal Representation Learning - Score: 17 (R=10, N=7) - Date: 2025-09-30 - Comment: Model Architecture (MoE): Mixture-of-Experts Transformer to scale capacity via routing specialized subnetworks; decoupled time/frequency streams and topological embeddings.
Beyond Benchmarks: Understanding Mixture-of-Experts Models through Internal Mechanisms - Score: 17 (R=10, N=7) - Date: 2025-09-30 - Comment: Model Architecture (MoE): introduces an internal metric to analyze routing and expert/neuron utilization, revealing specialization and training dynamics.
Defending MoE LLMs against Harmful Fine-Tuning via Safety Routing Alignment - Score: 17 (R=10, N=7) - Date: 2025-09-30 - Comment: Model Architecture (MoE): safety-preserving fine-tuning by aligning MoE routing weights to prevent harmful expert drift (safety routing alignment).
Dynamic Expert Specialization: Towards Catastrophic Forgetting-Free Multi-Domain MoE Adaptation - Score: 17 (R=10, N=7) - Date: 2025-09-24 - Comment: Model Architecture: MoE adaptation with dynamic expert specialization and router distillation to avoid catastrophic forgetting; training schedule isolates domain-specific gradients.
Dynamic Adaptive Shared Experts with Grouped Multi-Head Attention Mixture of Experts - Score: 17 (R=10, N=7) - Date: 2025-09-17 - Comment: Strongly matches Model Architecture (MoE/Transformers): proposes grouped multi-head attention, dual-scale shared experts, and adaptive dynamic routing for efficient expert allocation.
MoE-Compression: How the Compression Error of Experts Affects the Inference Accuracy of MoE Model? - Score: 17 (R=10, N=7) - Date: 2025-09-10 - Comment: Matches both Model Architecture (MoE) and Model Compression and Efficiency: compresses non-activated experts with error-bounded lossy methods and analyzes layer-wise sensitivity on inference accuracy.
High-Dimensional Analysis of Single-Layer Attention for Sparse-Token Classification - Score: 17 (R=9, N=8) - Date: 2025-09-30 - Comment: Matches Architecture/Training Dynamics: theoretical analysis of single-layer attention’s adaptive token selection and learnability in high dimensions.
A multiscale analysis of mean-field transformers in the moderate interaction regime - Score: 17 (R=9, N=8) - Date: 2025-09-30 - Comment: Transformer Theory: mean-field multiscale analysis of token dynamics through depth (fast/intermediate/slow phases).
MARCOS: Deep Thinking by Markov Chain of Continuous Thoughts - Score: 17 (R=9, N=8) - Date: 2025-09-30 - Comment: Model Architecture/Training Dynamics: replaces token CoT with a latent Markov chain of continuous thoughts and variational training for faster, decoupled reasoning.
Short window attention enables long-term memorization - Score: 17 (R=9, N=8) - Date: 2025-09-30 - Comment: Model Architecture: hybrid sliding-window attention + xLSTM with stochastic window-size training; insight that short windows strengthen long-term memory.
FS-KAN: Permutation Equivariant Kolmogorov-Arnold Networks via Function Sharing - Score: 17 (R=9, N=8) - Date: 2025-09-30 - Comment: Model Architecture: principled permutation-equivariant Kolmogorov–Arnold Networks via function sharing with theoretical expressivity guarantees and symmetry-aware design.
Trained Mamba Emulates Online Gradient Descent in In-Context Linear Regression - Score: 17 (R=9, N=8) - Date: 2025-09-30 - Comment: Model Architecture/Training Dynamics: theoretical analysis showing Mamba performs online gradient descent for in-context linear regression.
HieraTok: Multi-Scale Visual Tokenizer Improves Image Reconstruction and Generation - Score: 17 (R=9, N=8) - Date: 2025-09-30 - Comment: Model Architecture: multi-scale ViT-based tokenizer with scale-causal attention, improving latent representation quality for reconstruction/generation.
Signal Preserving Weight Initialization for Odd-Sigmoid Activations - Score: 17 (R=9, N=8) - Date: 2025-09-30 - Comment: Model Architecture/Training Dynamics: closed-form signal-preserving weight initialization tailored to odd-sigmoid activations, enabling stable training without normalization.
Scale-Wise VAR is Secretly Discrete Diffusion - Score: 17 (R=9, N=8) - Date: 2025-09-29 - Comment: Model architecture: proves VAR with Markovian attention is equivalent to discrete diffusion, enabling diffusion-style iterative refinement for AR transformers and improving efficiency.
StateX: Enhancing RNN Recall via Post-training State Expansion - Score: 17 (R=9, N=8) - Date: 2025-09-29 - Comment: Model Architecture and Efficiency: post-training recurrent state expansion to boost recall for linear-attention/SSM RNNs with minimal parameter growth.
Wavelet-Induced Rotary Encodings: RoPE Meets Graphs - Score: 17 (R=9, N=8) - Date: 2025-09-29 - Comment: Model Architecture/Representation Learning: Generalizes RoPE to graphs (WIRE) with theoretical guarantees (permutation equivariance, linear-attention compatibility).
Go With The Flow: Churn-Tolerant Decentralized Training of Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-09-26 - Comment: High Performance Computing: decentralized, churn-tolerant LLM training with a new flow-based routing algorithm for microbatch scheduling on heterogeneous clients.
TyphoonMLA: A Mixed Naive-Absorb MLA Kernel For Shared Prefix - Score: 17 (R=9, N=8) - Date: 2025-09-26 - Comment: High Performance Computing: hybrid MLA attention kernel that combines naive and absorb formulations to exploit shared-prefix reuse while reducing bandwidth/computation during decoding.
Decoupled-Value Attention for Prior-Data Fitted Networks: GP Inference for Physical Equations - Score: 17 (R=9, N=8) - Date: 2025-09-26 - Comment: Matches Model Architecture: introduces Decoupled-Value Attention mirroring GP updates; localized attention for scalable PFNs, architecture-level innovation.
Theory of periodic convolutional neural network - Score: 17 (R=9, N=8) - Date: 2025-09-24 - Comment: Model architecture: introduces periodic CNNs and proves sharp approximation properties, characterizing expressive power.
KANO: Kolmogorov-Arnold Neural Operator - Score: 17 (R=9, N=8) - Date: 2025-09-24 - Comment: Model Architecture: introduces a new neural operator (KANO) combining spectral and spatial bases with theoretical expressivity gains over FNO.
Deep Hierarchical Learning with Nested Subspace Networks - Score: 17 (R=9, N=8) - Date: 2025-09-23 - Comment: Model Architecture & Efficiency: Nested Subspace Networks reparameterize linear layers for rank-nested functions enabling continuous compute scaling in pre-trained LLMs.
DISCO: Disentangled Communication Steering for Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-09-23 - Comment: Matches Model Architecture: steers models by directly modulating attention queries/values, enabling finer control than residual-stream steering.
ViTCAE: ViT-based Class-conditioned Autoencoder - Score: 17 (R=9, N=8) - Date: 2025-09-23 - Comment: Model Architecture: ViT-based class-conditioned autoencoder where the class token controls a global latent prior; Model Compression and Efficiency: convergence-aware attention temperature scheduler that freezes/prunes converged heads to cut compute during training.
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer - Score: 17 (R=9, N=8) - Date: 2025-09-22 - Comment: Model Architecture: unified multimodal LLM with a hybrid vision tokenizer and dual adapters enabling joint image understanding and generation.
Localmax dynamics for attention in transformers and its asymptotic behavior - Score: 17 (R=9, N=8) - Date: 2025-09-22 - Comment: Model Architecture / Representation Learning: theoretical analysis of transformer attention via localmax dynamics interpolating softmax and hardmax with asymptotic behavior results.
Asymptotic Study of In-context Learning with Random Transformers through Equivalent Models - Score: 17 (R=9, N=8) - Date: 2025-09-19 - Comment: Representation Learning: theoretical analysis of Transformer in-context learning, proving equivalence to a finite-degree Hermite polynomial model in an asymptotic regime.
Stochastic Clock Attention for Aligning Continuous and Ordered Sequences - Score: 17 (R=9, N=8) - Date: 2025-09-19 - Comment: Model Architecture: introduces a new attention mechanism (clock-based alignment) enforcing continuity/monotonicity as a drop-in replacement.
Asterisk Operator - Score: 17 (R=9, N=8) - Date: 2025-09-18 - Comment: Model Architecture: introduces a new reasoning operator (Asterisk Operator) with analysis of convergence/universality and a compact distilled model.
Selective Induction Heads: How Transformers Select Causal Structures In Context - Score: 17 (R=9, N=8) - Date: 2025-09-11 - Comment: Representation Learning/Architecture Analysis: mechanistic study of transformers introducing selective induction heads, with a constructive 3-layer design and theory on convergence to MLE.
Breaking the Conventional Forward-Backward Tie in Neural Networks: Activation Functions - Score: 17 (R=9, N=8) - Date: 2025-09-10 - Comment: Model Architecture/Training Dynamics: relaxes forward-backward gradient symmetry, enabling non-differentiable activations (e.g., Heaviside) with alternative gradient signals.
Riemannian Batch Normalization: A Gyro Approach - Score: 17 (R=9, N=8) - Date: 2025-09-10 - Comment: Matches Model Architecture: introduces a principled Riemannian batch normalization (GyroBN) for gyrogroups with theoretical conditions and instantiations across multiple manifolds.
From Noise to Narrative: Tracing the Origins of Hallucinations in Transformers - Score: 17 (R=9, N=8) - Date: 2025-09-09 - Comment: Representation Learning: uses sparse autoencoders to analyze transformer internal concept activations and link them to hallucination behavior under input uncertainty.
Barycentric Neural Networks and Length-Weighted Persistent Entropy Loss: A Green Geometric and Topological Framework for Function Approximation - Score: 17 (R=9, N=8) - Date: 2025-09-09 - Comment: Model Architecture: introduces a small, shallow Barycentric Neural Network that exactly represents CPLFs and optimizes base points; also proposes a topological loss (length-weighted persistent entropy).
Rethinking the long-range dependency in Mamba/SSM and transformer models - Score: 17 (R=9, N=8) - Date: 2025-09-05 - Comment: Model Architecture (theory): theoretical analysis of long-range dependency in Mamba/SSM and Transformer models.
LExI: Layer-Adaptive Active Experts for Efficient MoE Model Inference - Score: 17 (R=9, N=8) - Date: 2025-09-04 - Comment: Model Architecture (MoE) and Efficiency: introduces LExI, an optimization technique that adapts the number of active experts per layer for efficient MoE inference.
MoPEQ: Mixture of Mixed Precision Quantized Experts - Score: 17 (R=9, N=8) - Date: 2025-09-03 - Comment: Model Compression and Efficiency (MoE): introduces a mixed-precision quantization method for MoE experts.
MEPT: Mixture of Expert Prompt Tuning as a Manifold Mapper - Score: 17 (R=9, N=8) - Date: 2025-09-03 - Comment: Model Architecture (MoE): introduces Mixture of Expert Prompt Tuning (MEPT), which applies a Mixture-of-Experts design to prompt tuning.
DTRNet: Dynamic Token Routing Network to Reduce Quadratic Costs in Transformers - Score: 17 (R=9, N=8) - Date: 2025-09-03 - Comment: Model Architecture and Efficiency: introduces DTRNet, a dynamic token routing network that reduces the quadratic cost of Transformers.
Scalable GANs with Transformers - Score: 16 (R=9, N=7) - Date: 2025-09-30 - Comment: Model Architecture/Scalability: purely transformer-based GANs trained in VAE latent space with scale-friendly stabilization (intermediate supervision, width-aware LR).
Limit Analysis for Symbolic Multi-step Reasoning Tasks with Information Propagation Rules Based on Transformers - Score: 16 (R=9, N=7) - Date: 2025-09-30 - Comment: Theoretical analysis of Transformer reasoning limits via information propagation rules (steps scale with number of layers).
Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time - Score: 16 (R=9, N=7) - Date: 2025-09-29 - Comment: Model Architecture (MoE) + Test-time scaling: controls number of activated experts during inference to induce diverse reasoning paths without extra cost.
IIET: Efficient Numerical Transformer via Implicit Iterative Euler Method - Score: 16 (R=9, N=7) - Date: 2025-09-29 - Comment: Model Architecture + Efficiency: ODE-based Transformer using iterative implicit Euler with an influence-aware distillation scheme for improved performance-efficiency and compressibility.
Mixture of Thoughts: Learning to Aggregate What Experts Think, Not Just What They Say - Score: 16 (R=9, N=7) - Date: 2025-09-26 - Comment: Model Architecture (MoE-like): latent-space collaboration among heterogeneous LLM experts via a learned router and interaction layers enabling conditional routing with single-pass inference.
WAVECLIP: Wavelet Tokenization for Adaptive-Resolution CLIP - Score: 16 (R=9, N=7) - Date: 2025-09-26 - Comment: Model Architecture and Efficiency: wavelet-based tokenization replacing patch embeddings enables adaptive-resolution inference with KV caching and cross-level attention for compute-accuracy trade-offs.
On the Rate of Convergence of Kolmogorov-Arnold Network Regression Estimators - Score: 16 (R=9, N=7) - Date: 2025-09-25 - Comment: Model Architecture: theoretical convergence guarantees and minimax rates for Kolmogorov-Arnold Networks, informing structured function approximation.
Self-Evolving LLMs via Continual Instruction Tuning - Score: 16 (R=9, N=7) - Date: 2025-09-24 - Comment: Model Architecture and Efficiency: parameter-efficient Mixture-of-Experts design with dual LoRA experts and an adversarial discriminator for continual instruction tuning.
AdaMixT: Adaptive Weighted Mixture of Multi-Scale Expert Transformers for Time Series Forecasting - Score: 16 (R=9, N=7) - Date: 2025-09-24 - Comment: Model Architecture (MoE): introduces an adaptive weighted mixture of multi-scale expert Transformers with a gating network for time series forecasting.
Region-Aware Deformable Convolutions - Score: 16 (R=9, N=7) - Date: 2025-09-22 - Comment: Model Architecture: introduces Region-Aware Deformable Convolution with boundary-offset-defined receptive fields, combining attention-like adaptability with convolution efficiency.
Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting - Score: 16 (R=9, N=7) - Date: 2025-09-19 - Comment: Model Architecture (MoE): introduces a lightweight pretrained mixture-of-experts with spectral gating for expert selection; also aligns with efficiency-focused design.
SparseDoctor: Towards Efficient Chat Doctor with Mixture of Experts Enhanced Large Language Models - Score: 16 (R=9, N=7) - Date: 2025-09-19 - Comment: Directly matches Model Architecture (MoE with dynamic routing) and Compression/Efficiency (LoRA experts, sparse activation, memory optimization via expert memory queue).
MoWE : A Mixture of Weather Experts - Score: 16 (R=9, N=7) - Date: 2025-09-12 - Comment: Model Architecture (MoE): ViT-based gating for conditional per-grid, per-lead-time expert weighting to combine multiple forecasters; also emphasizes computational efficiency of training the gating network.
Two Sides of the Same Optimization Coin: Model Degradation and Representation Collapse in Graph Foundation Models - Score: 16 (R=9, N=7) - Date: 2025-09-11 - Comment: Model Architecture and Representation Learning criteria: mixture-of-codebooks with domain-aware routing (MoE-like) plus regularization to prevent representation collapse in graph foundation models.
Mitigating Attention Localization in Small Scale: Self-Attention Refinement via One-step Belief Propagation - Score: 16 (R=9, N=7) - Date: 2025-09-10 - Comment: Matches Model Architecture: refines self-attention via one-step belief propagation to counter attention localization; introduces GTD to analyze multi-hop dependencies.
HAVE: Head-Adaptive Gating and ValuE Calibration for Hallucination Mitigation in Large Language Models - Score: 16 (R=9, N=7) - Date: 2025-09-09 - Comment: Model Architecture/Efficiency: parameter-free, instance-level head gating and value calibration at decoding, i.e. dynamic attention head weighting to mitigate hallucinations.
TreeGPT: A Novel Hybrid Architecture for Abstract Syntax Tree Processing with Global Parent-Child Aggregation - Score: 16 (R=9, N=7) - Date: 2025-09-09 - Comment: Model Architecture: introduces a hybrid Transformer + TreeFFN with global parent–child aggregation for AST processing (conditional/dynamic structure-aware network).
MEPG:Multi-Expert Planning and Generation for Compositionally-Rich Image Generation - Score: 16 (R=9, N=7) - Date: 2025-09-05 - Comment: Model Architecture (MoE): proposes a Multi-Expert Planning and Generation framework built on mixture-of-experts for compositionally rich image generation.
Cache Management for Mixture-of-Experts LLMs – extended version - Score: 16 (R=9, N=7) - Date: 2025-09-03 - Comment: Model Architecture and Efficiency: cache management for Mixture-of-Experts LLMs.
MODE: Mixture of Document Experts for RAG - Score: 16 (R=9, N=7) - Date: 2025-09-03 - Comment: Model Architecture (MoE): introduces a Mixture of Document Experts (MODE) for retrieval-augmented generation.
MoE-Health: A Mixture of Experts Framework for Robust Multimodal Healthcare Prediction - Score: 16 (R=9, N=7) - Date: 2025-09-01 - Comment: Model Architecture (MoE): presents MoE-Health, a Mixture-of-Experts framework with dynamic networks for multimodal healthcare prediction.
Adaptive Canonicalization with Application to Invariant Anisotropic Geometric Networks - Score: 16 (R=8, N=8) - Date: 2025-09-30 - Comment: Model Architecture/Equivariance: adaptive canonicalization ensures continuity and symmetry with universal approximation; addresses eigenbasis/rotation ambiguities in geometric networks.
Hyperspherical Latents Improve Continuous-Token Autoregressive Generation - Score: 16 (R=8, N=8) - Date: 2025-09-30 - Comment: Model Architecture/Training Dynamics: stabilizes continuous-token AR generation via hyperspherical VAE latents to prevent variance collapse, improving AR image models.
Global Convergence in Neural ODEs: Impact of Activation Functions - Score: 16 (R=8, N=8) - Date: 2025-09-29 - Comment: Training Dynamics/Architecture: establishes global convergence of Neural ODEs under smooth, sufficiently nonlinear activations via NTK properties.
TRACE: Learning to Compute on Graphs - Score: 16 (R=8, N=8) - Date: 2025-09-29 - Comment: Model Architecture: Hierarchical Transformer backbone aligned with stepwise computation and a function-shift learning objective for graph computation.
Sobolev acceleration for neural networks - Score: 16 (R=8, N=8) - Date: 2025-09-25 - Comment: Training dynamics theory: rigorous analysis showing Sobolev training improves conditioning and accelerates convergence of ReLU networks.
Holographic Transformers for Complex-Valued Signal Processing: Integrating Phase Interference into Self-Attention - Score: 16 (R=8, N=8) - Date: 2025-09-25 - Comment: Model Architecture: introduces a physics-inspired complex-valued self-attention (holographic attention) within Transformers that explicitly models phase interference.
Physics-informed sensor coverage through structure preserving machine learning - Score: 16 (R=8, N=8) - Date: 2025-09-15 - Comment: Model Architecture: structure-preserving operator learning via conditional neural Whitney forms with transformer-based attention, enforcing discrete conservation.
MoSE: Unveiling Structural Patterns in Graphs via Mixture of Subgraph Experts - Score: 16 (R=8, N=8) - Date: 2025-09-12 - Comment: Model Architecture: Mixture-of-Experts for graphs (Mixture of Subgraph Experts) with dynamic routing over subgraph semantics and formal expressivity beyond SWL.
Geometric Foundations of Tuning without Forgetting in Neural ODEs - Score: 16 (R=8, N=8) - Date: 2025-09-04 - Comment: The paper provides a theoretical foundation for Tuning without Forgetting in neural ODEs, which is relevant to emerging trends in model architecture.
DRIFT-Net: A Spectral--Coupled Neural Operator for PDEs Learning - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Model Architecture/Efficiency: dual spectral and image branches with bandwise fusion for globally coupled neural operators, reducing parameters and improving throughput vs attention.
One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Model Architecture (MoE): sparse mixture of prompt experts with gating for continual learning, mitigating interference while keeping efficiency.
Uni-X: Mitigating Modality Conflict with a Two-End-Separated Architecture for Unified Multimodal Models - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Model Architecture: a two-end-separated, middle-shared transformer to mitigate modality gradient conflicts in unified multimodal AR models.
Multi-Scale Geometric Autoencoder - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Matches model architecture and representation learning: proposes a new autoencoder design with asymmetric global/local geometric constraints, with theory.
Echo Flow Networks - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Model Architecture: extended Echo State Networks with matrix-gated composite activations and a dual-stream design for long-range memory with constant memory/compute.
Understanding and Enhancing the Planning Capability of Language Models via Multi-Token Prediction - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Representation learning/training dynamics: theoretical analysis of multi-token prediction in Transformers for transitive relations, plus architectural tweaks (NTI, transformer-based transfer layer).
Convolutional Set Transformer - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Model Architecture: Convolutional Set Transformer operates directly on 3D image tensors to jointly perform feature extraction and set-context modeling.
Unlocking the Power of Mixture-of-Experts for Task-Aware Time Series Analytics - Score: 15 (R=8, N=7) - Date: 2025-09-29 - Comment: Model Architecture: Mixture-of-Experts with task-aware recurrent noisy gating and temporal/channel token routing with load-balancing loss.
ChaosNexus: A Foundation Model for Universal Chaotic System Forecasting with Multi-scale Representations - Score: 15 (R=8, N=7) - Date: 2025-09-29 - Comment: Model Architecture: foundation model with a multi-scale transformer (ScaleFormer) augmented by Mixture-of-Experts layers, explicitly matching the MoE criterion.
A circuit for predicting hierarchical structure in-context in Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-09-29 - Comment: Representation Learning: mechanistic analysis of Transformer induction heads revealing adaptive circuits for hierarchical in-context learning and latent-context routing.
Maxout Polytopes - Score: 15 (R=8, N=7) - Date: 2025-09-26 - Comment: Matches Model Architecture: theoretical characterization of maxout network geometry (polytopes, separating hypersurfaces) is foundational analysis of an activation/architecture class.
Why Attention Fails: The Degeneration of Transformers into MLPs in Time Series Forecasting - Score: 15 (R=8, N=7) - Date: 2025-09-26 - Comment: Analysis of existing architectures: explains attention failure in time-series Transformers (degeneration into MLPs) with theory and controlled data, yielding insights into embeddings and training dynamics.
Unlocking Noise-Resistant Vision: Key Architectural Secrets for Robust Models - Score: 15 (R=8, N=7) - Date: 2025-09-26 - Comment: Model Architecture and Representation Learning: theoretical analysis linking stem kernels, downsampling, pooling, and preprocessing to robustness, yielding principled architectural design rules.
RoboSSM: Scalable In-context Imitation Learning via State-Space Models - Score: 15 (R=8, N=7) - Date: 2025-09-25 - Comment: Model Architecture/Efficiency: replaces Transformers with state-space models (SSM) for linear-time inference and long-context extrapolation.
Shared-Weights Extender and Gradient Voting for Neural Network Expansion - Score: 15 (R=8, N=7) - Date: 2025-09-24 - Comment: Model Architecture/Efficiency: dynamic network expansion (SWE) to integrate new neurons + gradient-based allocation (SVoD) to grow capacity without retraining.
MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe - Score: 15 (R=8, N=7) - Date: 2025-09-24 - Comment: Model Architecture and Efficiency: introduces a unified 3D-Resampler for compact image/video encoding with substantial memory and inference-time reductions in an 8B MLLM.
Probabilistic Token Alignment for Large Language Model Fusion - Score: 15 (R=8, N=7) - Date: 2025-09-24 - Comment: Model Architecture/Fusion: probabilistic token alignment via optimal transport enables soft vocabulary mapping for LLM fusion across architectures.
Interpreting Attention Heads for Image-to-Text Information Flow in Large Vision-Language Models - Score: 15 (R=8, N=7) - Date: 2025-09-23 - Comment: Matches Representation Learning: attention-head attribution to elucidate image-to-text information flow in LVLMs.
Geometric Mixture Classifier (GMC): A Discriminative Per-Class Mixture of Hyperplanes - Score: 15 (R=8, N=7) - Date: 2025-09-23 - Comment: Model Architecture: introduces a discriminative per-class mixture-of-hyperplanes with temperature-controlled soft-OR (log-sum-exp), yielding an interpretable conditional mixture model with linear-time inference.
Attention Schema-based Attention Control (ASAC): A Cognitive-Inspired Approach for Attention Management in Transformers - Score: 15 (R=8, N=7) - Date: 2025-09-22 - Comment: Model Architecture: introduces a VQVAE-based attention-schema controller for transformers to dynamically manage attention allocation (conditional/dynamic network) with efficiency aims.
Attention Beyond Neighborhoods: Reviving Transformer for Graph Clustering - Score: 15 (R=8, N=7) - Date: 2025-09-19 - Comment: Matches Model Architecture by embedding attention directly into graph structure for clustering; also includes an efficiency-oriented KV cache mechanism.
Exploring the Global-to-Local Attention Scheme in Graph Transformers: An Empirical Study - Score: 15 (R=8, N=7) - Date: 2025-09-19 - Comment: Matches Model Architecture: proposes a global-to-local attention scheme in Graph Transformers with cross-layer fusion, aiming to mitigate information loss while maintaining linear complexity.
Property-Isometric Variational Autoencoders for Sequence Modeling and Design - Score: 15 (R=8, N=7) - Date: 2025-09-19 - Comment: Matches Representation Learning and Architecture via a variational autoencoder with an isometric regularizer and property-graph-guided encoder to preserve manifold geometry of properties.
State Space Models over Directed Graphs - Score: 15 (R=8, N=7) - Date: 2025-09-18 - Comment: Matches Model Architecture: introduces a directed-graph SSM architecture (DirGraphSSM) with a novel k-hop ego sequentialization for causal dependencies in directed graphs.
MapAnything: Universal Feed-Forward Metric 3D Reconstruction - Score: 15 (R=8, N=7) - Date: 2025-09-18 - Comment: Model Architecture: proposes a unified transformer-based feed-forward 3D reconstruction backbone with a factored multi-view scene representation.
Semantic Fusion with Fuzzy-Membership Features for Controllable Language Modelling - Score: 15 (R=8, N=7) - Date: 2025-09-18 - Comment: Matches Model Architecture: augments a Transformer LM with a gated semantic feature channel and auxiliary semantic reconstruction for controllable generation with small overhead.
Learning to Route: Per-Sample Adaptive Routing for Multimodal Multitask Prediction - Score: 15 (R=8, N=7) - Date: 2025-09-17 - Comment: Matches Model Architecture: per-sample adaptive routing among modality experts and shared/independent heads (MoE-style dynamic network).
M4GN: Mesh-based Multi-segment Hierarchical Graph Network for Dynamic Simulations - Score: 15 (R=8, N=7) - Date: 2025-09-17 - Comment: Matches Model Architecture/Efficiency: hierarchical GNN with segment-centric aggregation and macro transformer to reduce propagation depth and over-smoothing.
Neural Scaling Laws for Deep Regression - Score: 15 (R=8, N=7) - Date: 2025-09-15 - Comment: Training dynamics/scaling laws: empirical neural scaling laws for deep regression across architectures, providing foundational insights.
Joint Learning using Mixture-of-Expert-Based Representation for Enhanced Speech Generation and Robust Emotion Recognition - Score: 15 (R=8, N=7) - Date: 2025-09-11 - Comment: Model Architecture: proposes Sparse MERIT, a sparse Mixture-of-Experts with frame-wise routing and task-specific gating for multi-task learning, enabling conditional/dynamic computation.
DeepGraphLog for Layered Neurosymbolic AI - Score: 15 (R=8, N=7) - Date: 2025-09-10 - Comment: Model architecture: neurosymbolic framework that layers symbolic reasoning with Graph Neural Predicates, enabling neural–symbolic integration beyond fixed pipelines.
Long-Range Graph Wavelet Networks - Score: 15 (R=8, N=7) - Date: 2025-09-09 - Comment: Model Architecture: introduces a wavelet-based GNN with spectral-domain parameterization to capture long-range interactions within a unified local/global design.
VCMamba: Bridging Convolutions with Multi-Directional Mamba for Efficient Visual Representation - Score: 15 (R=8, N=7) - Date: 2025-09-08 - Comment: Matches the Model Architecture criterion: hybrid CNN + multi-directional Mamba SSM backbone with hierarchical design for linear-complexity global modeling.
Manipulating Transformer-Based Models: Controllability, Steerability, and Robust Interventions - Score: 15 (R=8, N=7) - Date: 2025-09-08 - Comment: Model Architecture/Representation Learning: principled activation- and weight-space interventions with theory for controllable transformers.
Detecting Regional Spurious Correlations in Vision Transformers via Token Discarding - Score: 15 (R=8, N=7) - Date: 2025-09-05 - Comment: The paper focuses on detecting spurious correlations in vision transformers, which aligns with the interest in understanding model architectures and their training dynamics.
Hierarchical Federated Foundation Models over Wireless Networks for Multi-Modal Multi-Task Intelligence: Integration of Edge Learning with D2D/P2P-Enabled Fog Learning Architectures - Score: 15 (R=8, N=7) - Date: 2025-09-05 - Comment: The paper discusses hierarchical federated foundation models with a focus on mixture-of-experts (MoE) and architectural innovations, which aligns with the model architecture criterion.
LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence - Score: 15 (R=8, N=7) - Date: 2025-09-04 - Comment: The paper introduces LimiX, a foundation model for structured data, which aligns with the core topic of foundational models and architecture-level innovations.
DrDiff: Dynamic Routing Diffusion with Hierarchical Attention for Breaking the Efficiency-Quality Trade-off - Score: 15 (R=8, N=7) - Date: 2025-09-04 - Comment: The paper introduces a novel framework for long-text generation with a dynamic expert scheduling mechanism and hierarchical sparse attention, which aligns with the interest in model architecture innovations and efficiency improvements.
Balanced Multimodal Learning: An Unidirectional Dynamic Interaction Perspective - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper proposes a new strategy for multimodal learning that addresses modality imbalance, which is relevant to model architecture innovations.
Conditional-$t^3$VAE: Equitable Latent Space Allocation for Fair Generation - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper proposes Conditional-$t^3$VAE, which is relevant to model architecture innovations, particularly in variational autoencoders.
Unsupervised Training of Vision Transformers with Synthetic Negatives - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper explores the use of synthetic negatives in self-supervised learning for vision transformers, contributing to representation learning.
GradES: Significantly Faster Training in Transformers with Gradient-Based Early Stopping - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper introduces GradES, a gradient-based early stopping method for transformers, which is relevant to model architecture and efficiency improvements.
Preconditioned Regularized Wasserstein Proximal Sampling - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper proposes a preconditioned sampling method with connections to transformer architectures, relevant to model architecture and efficiency.
Efficient Transformer-Inspired Variants of Physics-Informed Deep Operator Networks - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper proposes Transformer-inspired variants of Deep Operator Networks, contributing to model architecture innovations.
Equivariant U-Shaped Neural Operators for the Cahn-Hilliard Phase-Field Model - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper introduces an equivariant U-shaped neural operator, which is relevant to model architecture innovations.
An Explainable Gaussian Process Auto-encoder for Tabular Data - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper proposes a novel Gaussian process auto-encoder, which is relevant to model architecture innovations, particularly in autoencoders.
Quantum Circuits for Quantum Convolutions: A Quantum Convolutional Autoencoder - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper discusses quantum convolutional autoencoders, which are relevant to model architecture innovations, particularly in autoencoders.
Memory Limitations of Prompt Tuning in Transformers - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper explores the memory limitations of prompt tuning in transformers, relevant to model architecture insights.
Rethinking Layer-wise Model Merging through Chain of Merges - Score: 15 (R=8, N=7) - Date: 2025-09-01 - Comment: The paper proposes a model merging method that accounts for inter-layer dependencies and introduces an approach to mitigate internal covariate shift, which is relevant to model architecture and efficiency.
Deep Residual Echo State Networks: exploring residual orthogonal connections in untrained Recurrent Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-09-01 - Comment: The paper presents Deep Residual Echo State Networks, exploring residual orthogonal connections in untrained RNNs, which is relevant to model architecture innovations.
Towards universal property prediction in Cartesian space: TACE is all you need - Score: 15 (R=7, N=8) - Date: 2025-09-19 - Comment: Matches Model Architecture: introduces a unified Cartesian-tensor framework (TACE/TMP) ensuring invariance/equivariance for scalar and tensor property prediction—foundational architectural design.
HyPINO: Multi-Physics Neural Operators via HyperPINNs and the Method of Manufactured Solutions - Score: 15 (R=7, N=8) - Date: 2025-09-08 - Comment: Architecture: hypernetwork-based neural operator that maps PDE specs to PINNs with mixed supervision and iterative refinement; advances zero-shot operator generalization with computational gains.
Differential-Integral Neural Operator for Long-Term Turbulence Forecasting - Score: 14 (R=7, N=7) - Date: 2025-09-26 - Comment: Model Architecture: introduces a physics-grounded dual-branch operator (local differential conv with provable derivative + global Transformer kernel) enabling stable long-range operator learning.
CAD-Tokenizer: Towards Text-based CAD Prototyping via Modality-Specific Tokenization - Score: 14 (R=7, N=7) - Date: 2025-09-26 - Comment: Model Architecture: modality-specific tokenization via a sequence-based VQ-VAE with primitive-level pooling and constrained decoding (autoencoder/tokenizer innovation).
Learning Greens Operators through Hierarchical Neural Networks Inspired by the Fast Multipole Method - Score: 14 (R=7, N=7) - Date: 2025-09-26 - Comment: Matches Model Architecture: hierarchical network inspired by Fast Multipole Method to learn Green’s operators—operator-learning architecture design.
Understanding and Improving Adversarial Robustness of Neural Probabilistic Circuits - Score: 14 (R=7, N=7) - Date: 2025-09-26 - Comment: Architecture analysis/robustness criterion: theoretical study of adversarial robustness in Neural Probabilistic Circuits and a new RNPC with class-wise integration achieving provable robustness improvements.
Graph Variate Neural Networks - Score: 14 (R=7, N=7) - Date: 2025-09-25 - Comment: Model architecture: GVNN layers with signal-dependent connectivity tensor; efficient spatio-temporal convolution with linear sequence complexity.
You Only Measure Once: On Designing Single-Shot Quantum Machine Learning Models - Score: 14 (R=7, N=7) - Date: 2025-09-25 - Comment: Model Architecture/Efficiency: probability-aggregation outputs for QML enable accurate single-shot inference, reducing measurement cost.
Modular Machine Learning with Applications to Genetic Circuit Composition - Score: 14 (R=7, N=7) - Date: 2025-09-25 - Comment: Model Architecture/Representation: modular learning with theoretical identifiability of component functions leveraging compositional structure.
Frame-Stacked Local Transformers For Efficient Multi-Codebook Speech Generation - Score: 14 (R=7, N=7) - Date: 2025-09-25 - Comment: Model Architecture/Efficiency: frame-stacked local transformers for multi-codebook decoding capture intra-timestep dependencies and accelerate generation—architectural and decoding-efficiency innovation.
Unrolled Graph Neural Networks for Constrained Optimization - Score: 14 (R=7, N=7) - Date: 2025-09-23 - Comment: Matches Model Architecture: unrolls dual ascent into coupled primal/dual GNNs with training constraints mirroring the optimization dynamics.
SABER: Uncovering Vulnerabilities in Safety Alignment via Cross-Layer Residual Connection - Score: 14 (R=7, N=7) - Date: 2025-09-22 - Comment: Model Architecture / Representation Learning: introduces a cross-layer residual connection to bypass safety and analyzes localization of alignment signals in mid–late transformer layers.
AToken: A Unified Tokenizer for Vision - Score: 14 (R=7, N=7) - Date: 2025-09-19 - Comment: Model Architecture: unified visual tokenizer with a pure Transformer and 4D rotary position embeddings, supporting continuous/discrete tokens across images, video, and 3D.
Kalman Bayesian Transformer - Score: 14 (R=7, N=7) - Date: 2025-09-17 - Comment: Matches training methodology for Transformers: Bayesian sequential fine-tuning with Kalman BNN-style moment propagation and uncertainty-aware updates.
Quantum Graph Attention Networks: Trainable Quantum Encoders for Inductive Graph Learning - Score: 14 (R=7, N=7) - Date: 2025-09-16 - Comment: Model Architecture: introduces quantum attention mechanisms in Quantum Graph Neural Networks for inductive graph learning.
From Grounding to Skolemization: A Logic-Constrained Vector Symbolic Architecture for Complex Query Answering - Score: 14 (R=7, N=7) - Date: 2025-09-16 - Comment: Model Architecture/Representation Learning: a logic-constrained Vector Symbolic Architecture with differentiable Skolemization and neural negation; theoretical universality for EFO1.
Learning spatially structured open quantum dynamics with regional-attention transformers - Score: 14 (R=7, N=7) - Date: 2025-09-09 - Comment: Model Architecture: introduces a regional-attention Transformer incorporating translational invariance as an inductive bias and conditioning on global controls.

Model Compression and Efficiency (183)

Pretraining Large Language Models with NVFP4 - Score: 19 (R=10, N=9) - Date: 2025-09-30 - Comment: Model Compression and Efficiency + High Performance Computing: stable 4-bit (NVFP4) LLM pretraining using RHT, 2D quantization, stochastic rounding, and selective high-precision layers.
LOTFormer: Doubly-Stochastic Linear Attention via Low-Rank Optimal Transport - Score: 19 (R=10, N=9) - Date: 2025-09-30 - Comment: Efficiency: linear-time attention via low-rank entropic optimal transport with provably doubly-stochastic maps and O(nr) compute.
Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization - Score: 19 (R=10, N=9) - Date: 2025-09-30 - Comment: Compression/Efficiency: FP4 post-training quantization tailored to MXFP4/NVFP4 (MR-GPTQ) with format-specific algorithms and high-performance GPU kernels for LLM inference.
HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space - Score: 19 (R=10, N=9) - Date: 2025-09-29 - Comment: Model Architecture (MoE) + Compression/Efficiency: second-order, Hessian-based atomic expert pruning with reduced complexity enables fine-grained MoE compression.
Beyond Johnson-Lindenstrauss: Uniform Bounds for Sketched Bilinear Forms - Score: 19 (R=10, N=9) - Date: 2025-09-29 - Comment: Compression/Efficiency/HPC Theory: Uniform bounds for sketched bilinear forms via generic chaining; extends JL/RIP and improves guarantees for gradient compression and bandits.
A Recovery Guarantee for Sparse Neural Networks - Score: 19 (R=10, N=9) - Date: 2025-09-25 - Comment: Model Compression and Efficiency (sparsity): theoretical sparse recovery guarantees for ReLU networks via iterative hard thresholding with linear-memory footprint.
PTQTP: Post-Training Quantization to Trit-Planes for Large Language Models - Score: 19 (R=10, N=9) - Date: 2025-09-23 - Comment: Compression/Efficiency: introduces ternary post-training quantization to trit-planes enabling multiplication-free inference with a progressive approximation algorithm.
ProxyAttn: Guided Sparse Attention via Representative Heads - Score: 18 (R=10, N=8) - Date: 2025-09-30 - Comment: Compression/Efficiency: guided sparse attention via representative heads with dynamic budgets, yielding training-free acceleration of attention and prefill.
Differentiable Sparsity via $D$-Gating: Simple and Versatile Structured Penalization - Score: 18 (R=10, N=8) - Date: 2025-09-30 - Comment: Compression/Efficiency: differentiable structured sparsity via D-Gating, theoretically equivalent to non-smooth group penalties with improved optimization dynamics.
Tequila: Trapping-free Ternary Quantization for Large Language Models - Score: 18 (R=10, N=8) - Date: 2025-09-30 - Comment: Compression/Efficiency: ternary quantization for LLMs with trapping-free optimization by repurposing deadzone-trapped weights as dynamic biases.
PATCH: Learnable Tile-level Hybrid Sparsity for LLMs - Score: 18 (R=10, N=8) - Date: 2025-09-30 - Comment: Compression and Efficiency: hybrid tile-level sparsity mixing dense and 2:4 patterns via learnable masks for LLMs, enabling controllable sparsity with practical speedups.
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights - Score: 18 (R=10, N=8) - Date: 2025-09-30 - Comment: Compression/Efficiency: calibration-free low-precision LLM quantization using Sinkhorn-normalized per-row/column scaling to reduce matrix imbalance.
Compute-Optimal Quantization-Aware Training - Score: 18 (R=10, N=8) - Date: 2025-09-30 - Comment: Model Compression and Efficiency: compute allocation scaling law for QAT vs FP, tokens-per-parameter-byte predictor, and fused cooldown+QAT for efficient quantized training.
OjaKV: Context-Aware Online Low-Rank KV Cache Compression with Oja's Rule - Score: 18 (R=10, N=8) - Date: 2025-09-29 - Comment: Model Compression and Efficiency: context-aware online low-rank KV cache compression with Oja’s rule and a hybrid storage policy; practical long-context memory optimization compatible with FlashAttention.
Myosotis: structured computation for attention like layer - Score: 18 (R=10, N=8) - Date: 2025-09-26 - Comment: Model Architecture + Efficiency: proposes an attention-like layer combining sparsity and recurrence via efficient inversion of tree-structured matrices to reduce quadratic compute/memory.
Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment - Score: 18 (R=10, N=8) - Date: 2025-09-25 - Comment: Compression/Efficiency: weight-only PTQ for LLMs with fractional-bit quantizers and optimal bit allocation; practical CUDA kernels and mixed-scheme layer fusion.
CR-Net: Scaling Parameter-Efficient Training with Cross-Layer Low-Rank Structure - Score: 18 (R=10, N=8) - Date: 2025-09-24 - Comment: Compression/Efficiency: cross-layer low-rank residual architecture with activation recomputation for memory/compute efficiency (low-rank; systems-level memory optimization).
IMPQ: Interaction-Aware Layerwise Mixed Precision Quantization for LLMs - Score: 18 (R=10, N=8) - Date: 2025-09-22 - Comment: Model Compression and Efficiency: interaction-aware mixed-precision quantization using Shapley-based layer sensitivity/interactions and binary quadratic optimization for 2/4-bit LLMs.
NIRVANA: Structured pruning reimagined for large language models compression - Score: 18 (R=10, N=8) - Date: 2025-09-18 - Comment: Matches Model Compression and Efficiency: structured pruning (sparsity) for LLMs with NTK-based saliency and adaptive layer/module sparsity allocation.
Reasoning Models Can be Accurately Pruned Via Chain-of-Thought Reconstruction - Score: 18 (R=10, N=8) - Date: 2025-09-17 - Comment: Model Compression and Efficiency: pruning via joint reconstruction of inputs and on-policy chain-of-thought for decode-dominated reasoning models (RAC).
Low-rank Orthogonalization for Large-scale Matrix Optimization with Applications to Foundation Model Training - Score: 18 (R=10, N=8) - Date: 2025-09-17 - Comment: Model Compression and Efficiency: introduces low-rank orthogonalization exploiting low-rank gradients; also relevant to HPC/foundation model training optimizers.
From PowerSGD to PowerSGD+: Low-Rank Gradient Compression for Distributed Optimization with Convergence Guarantees - Score: 18 (R=10, N=8) - Date: 2025-09-17 - Comment: Model Compression and Efficiency: low-rank gradient compression with periodic SVD subspace updates and formal convergence guarantees (PowerSGD+).
PHLoRA: data-free Post-hoc Low-Rank Adapter extraction from full-rank checkpoint - Score: 18 (R=10, N=8) - Date: 2025-09-17 - Comment: Matches Compression/Efficiency: data-free low-rank adapter (LoRA) extraction from full-rank checkpoints enabling scalable inference and pruning.
AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models - Score: 18 (R=10, N=8) - Date: 2025-09-16 - Comment: Model Compression and Efficiency: mixed-precision weight-only quantization for LLMs with AutoML search (quantization, layer-wise bit-width assignment).
AQUA: Attention via QUery mAgnitudes for Memory and Compute Efficient Inference in LLMs - Score: 18 (R=10, N=8) - Date: 2025-09-16 - Comment: Model Compression/Efficiency: attention approximation via SVD-based projection with dynamic dimension sparsification, including formal efficiency analysis and direct KV/compute reductions.
LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation - Score: 18 (R=10, N=8) - Date: 2025-09-15 - Comment: Model Compression and Efficiency: dynamic KV-cache eviction with per-layer and per-head budget allocation derived from minimizing residual-stream information loss in Transformers.
ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms - Score: 18 (R=10, N=8) - Date: 2025-09-12 - Comment: Model Compression and Efficiency: ultra-low-bit LLM quantization via learnable orthogonal butterfly transforms (O(n log n) complexity) enabling layer-adaptive rotations for outlier suppression.
EvolKV: Evolutionary KV Cache Compression for LLM Inference - Score: 18 (R=10, N=8) - Date: 2025-09-11 - Comment: Efficiency/HPC criterion: task-driven KV cache compression for LLM inference via adaptive layer-wise budget allocation using evolutionary search.
KVCompose: Efficient Structured KV Cache Compression with Composite Tokens - Score: 18 (R=10, N=8) - Date: 2025-09-08 - Comment: Model Compression and Efficiency: structured KV cache compression via attention-guided, layer-adaptive composite tokens compatible with standard inference engines.
Globally aware optimization with resurgence - Score: 18 (R=9, N=9) - Date: 2025-09-03 - Comment: The paper introduces a novel optimization framework using resurgence theory, which is relevant to emerging trends in optimization.
Enhancing Low-Rank Adaptation with Structured Nonlinear Transformations - Score: 17 (R=10, N=7) - Date: 2025-09-29 - Comment: Model Compression and Efficiency: non-linear low-rank adaptation (LoRAN) with sine-based activation for parameter-efficient fine-tuning.
1 bit is all we need: binary normalized neural networks - Score: 17 (R=10, N=7) - Date: 2025-09-10 - Comment: Strongly matches Compression/Efficiency: introduces binary normalized layers with 1-bit (0/1) parameters across all layers, an extreme quantization approach claiming near-parity performance.
LoaQ: Layer-wise Output Approximation Quantization - Score: 17 (R=10, N=7) - Date: 2025-09-09 - Comment: Model Compression and Efficiency: a layer-wise PTQ method targeting output-level consistency with a simple closed-form solution, improving quantization quality.
Scaling with Collapse: Efficient and Predictable Training of LLM Families - Score: 17 (R=9, N=8) - Date: 2025-09-30 - Comment: HPC/Training Efficiency: loss-curve collapse as a signature of optimal scaling; enables early diagnostics and hyperparameter tuning for LLM families.
SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching - Score: 17 (R=9, N=8) - Date: 2025-09-30 - Comment: Efficiency/Cache: semantic KV-cache sharing across prompts using token-level LSH and RoPE-aware matching to reduce memory and computation.
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention - Score: 17 (R=9, N=8) - Date: 2025-09-30 - Comment: Compression/Efficiency and Transformer Attention: Sparse-Linear Attention fuses sparse and linear attention with a custom GPU kernel, drastically reducing attention cost with minimal quality loss.
F-Adapter: Frequency-Adaptive Parameter-Efficient Fine-Tuning in Scientific Machine Learning - Score: 17 (R=9, N=8) - Date: 2025-09-30 - Comment: PEFT/Model Architecture: frequency-adaptive adapters for Fourier operator models with theory on LoRA’s approximation limits vs adapters; parameter-efficient fine-tuning advances.
Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding - Score: 17 (R=9, N=8) - Date: 2025-09-29 - Comment: Efficiency for LLM inference: Group Tree Optimization aligns training with speculative decoding’s tree policy, with a provable reward tied to acceptance length and speedup.
SlimDiff: Training-Free, Activation-Guided Hands-free Slimming of Diffusion Models - Score: 17 (R=9, N=8) - Date: 2025-09-29 - Comment: Compression/Efficiency: training-free, activation-guided structural slimming (low-rank/sparsity) of diffusion models for speed and parameter reduction.
Explicit and Effectively Symmetric Schemes for Neural SDEs - Score: 17 (R=9, N=8) - Date: 2025-09-26 - Comment: High Performance Computing / Efficiency: proposes stable, near-reversible explicit Runge–Kutta schemes for neural SDEs enabling memory-efficient training with accurate gradients.
FastEagle: Cascaded Drafting for Accelerating Speculative Decoding - Score: 17 (R=9, N=8) - Date: 2025-09-26 - Comment: Model Compression and Efficiency: introduces a non-autoregressive cascaded drafter and constrained draft tree for speculative decoding, removing sequential passes and enabling lossless LLM inference acceleration.
TensLoRA: Tensor Alternatives for Low-Rank Adaptation - Score: 17 (R=9, N=8) - Date: 2025-09-25 - Comment: Model Compression/Efficiency: generalizes LoRA to tensorized low-rank adaptations with mode-specific compression across attention projections.
Otters: An Energy-Efficient SpikingTransformer via Optical Time-to-First-Spike Encoding - Score: 17 (R=9, N=8) - Date: 2025-09-24 - Comment: Model Architecture and Efficiency: spiking Transformer with optical TTFS encoding and QNN-to-SNN conversion for energy-efficient inference.
HyperAdapt: Simple High-Rank Adaptation - Score: 17 (R=9, N=8) - Date: 2025-09-24 - Comment: Model compression and efficiency (PEFT): row/column-wise diagonal scaling inducing high-rank updates with only O(n+m) trainable parameters per matrix.
FastMTP: Accelerating LLM Inference with Enhanced Multi-Token Prediction - Score: 17 (R=9, N=8) - Date: 2025-09-24 - Comment: Model Compression and Efficiency/HPC: accelerates LLM inference via enhanced multi-token prediction aligned with inference and dynamic vocabulary compression for speculative decoding.
SBVR: Summation of BitVector Representation for Efficient LLM Quantization - Score: 17 (R=9, N=8) - Date: 2025-09-24 - Comment: Strong match to Model Compression and Efficiency: non-uniform PTQ with hardware-friendly codes and custom CUDA enabling compute in quantized domain.
Accurate and Efficient Low-Rank Model Merging in Core Space - Score: 17 (R=9, N=8) - Date: 2025-09-24 - Comment: Model Compression/Efficiency: low-rank (LoRA) model merging via a shared Core Space with proof of no information loss and complexity analysis.
Elucidating the Design Space of FP4 training - Score: 17 (R=9, N=8) - Date: 2025-09-23 - Comment: Matches Model Compression and Efficiency: unified FP4 low-precision training framework for microscaling quantization with cost analysis and stabilization techniques.
Flow-Induced Diagonal Gaussian Processes - Score: 17 (R=9, N=8) - Date: 2025-09-23 - Comment: Matches Model Compression and Efficiency: compresses NN weight uncertainty via low-dimensional inducing weights with normalizing-flow priors and spectral regularization.
Language Modeling with Learned Meta-Tokens - Score: 17 (R=9, N=8) - Date: 2025-09-23 - Comment: Model Architecture: introduces meta-attention and learned meta-tokens that compress/cache context; Efficiency: enables long-context length generalization via implicit context compression.
Distribution-Aligned Decoding for Efficient LLM Task Adaptation - Score: 17 (R=9, N=8) - Date: 2025-09-22 - Comment: Compression/Efficiency: decoding-time task adaptation via a KL-gradient-derived steering vector; PEFT-compatible with theoretical first-order equivalence to full fine-tuning.
RMT-KD: Random Matrix Theoretic Causal Knowledge Distillation - Score: 17 (R=9, N=8) - Date: 2025-09-22 - Comment: Compression/Efficiency: Random Matrix Theory–guided dimensionality reduction/knowledge distillation preserving informative directions without pruning/heuristic ranks.
Emulating Human-like Adaptive Vision for Efficient and Flexible Machine Visual Perception - Score: 17 (R=9, N=8) - Date: 2025-09-22 - Comment: Model Architecture + Efficiency: conditional/dynamic vision framework (sequential fixations, early stopping) with theory enabling end-to-end training of non-differentiable policies.
Pre-training under infinite compute - Score: 17 (R=9, N=8) - Date: 2025-09-19 - Comment: Matches Compression/Efficiency and Representation Learning via data-constrained pretraining insights: strong regularization, epoch/parameter scaling laws, ensemble scaling and distillation for data efficiency.
Dense Video Understanding with Gated Residual Tokenization - Score: 17 (R=9, N=8) - Date: 2025-09-18 - Comment: Directly targets compression/efficiency via conditional tokenization (motion-compensated gating and token merging) to achieve sub-linear token growth in VLLMs.
Slim-SC: Thought Pruning for Efficient Scaling with Self-Consistency - Score: 17 (R=9, N=8) - Date: 2025-09-18 - Comment: Model Compression and Efficiency: test-time scaling efficiency via step-wise pruning of self-consistency reasoning chains using inter-chain similarity, cutting KV cache and latency with theoretical backing.
TinyServe: Query-Aware Cache Selection for Efficient LLM Serving - Score: 17 (R=9, N=8) - Date: 2025-09-17 - Comment: High Performance Computing/Systems: query-aware KV-cache page selection with fused CUDA kernel enabling structured KV sparsity for efficient LLM serving.
Harnessing Optimization Dynamics for Curvature-Informed Model Merging - Score: 17 (R=9, N=8) - Date: 2025-09-17 - Comment: Model Compression and Efficiency: curvature-aware model merging (OTA) with sparse/low-rank grafting (FFG) to compose SFT capabilities without joint retraining.
SpecVLM: Fast Speculative Decoding in Vision-Language Models - Score: 17 (R=9, N=8) - Date: 2025-09-16 - Comment: Compression/Efficiency and Systems: Speculative decoding for VLMs with KV-cache-aware design and elastic visual compression (pruning/pooling/resampler) for accelerated inference.
Resource-Aware Neural Network Pruning Using Graph-based Reinforcement Learning - Score: 17 (R=9, N=8) - Date: 2025-09-16 - Comment: Model Compression/Efficiency: resource-aware pruning via graph-based RL (GAT encoder over network graph) with fine-grained binary channel actions under constraints.
ENSI: Efficient Non-Interactive Secure Inference for Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-09-12 - Comment: HE–LLM co-design (BitNet integration, sigmoid attention under HE, bootstrapping fused with RMSNorm) for efficient secure inference—systems/efficiency innovation.
SQAP-VLA: A Synergistic Quantization-Aware Pruning Framework for High-Performance Vision-Language-Action Models - Score: 17 (R=9, N=8) - Date: 2025-09-12 - Comment: Model Compression and Efficiency: co-designed quantization-aware token pruning with improved quantizer; training-free, structured inference acceleration for VLA models.
Facet: highly efficient E(3)-equivariant networks for interatomic potentials - Score: 17 (R=9, N=8) - Date: 2025-09-11 - Comment: Model Architecture and Efficiency: presents a faster E(3)-equivariant layer via spherical-grid projection plus MLPs and spline distance encodings, cutting compute/memory and enabling >10x faster training of equivariant GNNs.
OCTANE -- Optimal Control for Tensor-based Autoencoder Network Emergence: Explicit Case - Score: 17 (R=9, N=8) - Date: 2025-09-11 - Comment: Model Architecture and Compression/Efficiency — optimal-control formulation of autoencoders with low-rank tensor manifold integration for memory-efficient training and automated architecture discovery.
Lookup multivariate Kolmogorov-Arnold Networks - Score: 17 (R=9, N=8) - Date: 2025-09-10 - Comment: Model Architecture + Compression/Efficiency: proposes lmKANs as a drop-in replacement for linear layers using spline lookup multivariate functions, cutting inference FLOPs (up to 6x) with dedicated CUDA kernels.
Sample-efficient Integration of New Modalities into Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-09-08 - Comment: Model architecture: a hypernetwork generates adapters for a shared modality-agnostic projector, enabling few-shot integration of new modalities into an LLM (sample-efficient modality extension of foundation models).
Enhancing LLM Efficiency: Targeted Pruning for Prefill-Decode Disaggregation in Inference - Score: 17 (R=9, N=8) - Date: 2025-09-08 - Comment: Strongly matches Model Compression and Efficiency—stage-specific block pruning for prefill vs. decode and token-aware KV cache pruning tailored to PD disaggregation.
IPA: An Information-Preserving Input Projection Framework for Efficient Foundation Model Adaptation - Score: 17 (R=9, N=8) - Date: 2025-09-05 - Comment: The paper proposes a new framework for efficient foundation model adaptation, which aligns with model compression and efficiency improvements.
PagedEviction: Structured Block-wise KV Cache Pruning for Efficient Large Language Model Inference - Score: 17 (R=9, N=8) - Date: 2025-09-05 - Comment: The paper introduces PagedEviction, a novel KV cache pruning strategy for LLM inference, which aligns with the model compression criterion focusing on efficiency breakthroughs.
Binary Quantization For LLMs Through Dynamic Grouping - Score: 17 (R=9, N=8) - Date: 2025-09-04 - Comment: The paper presents a novel optimization objective for binary quantization in LLMs, which is relevant to model compression.
Fast and Accurate SVD-Type Updating in Streaming Data - Score: 17 (R=9, N=8) - Date: 2025-09-04 - Comment: The paper presents new algorithms for efficient SVD-type updating, which is relevant to model compression through low-rank approaches.
The Price of Sparsity: Sufficient Conditions for Sparse Recovery using Sparse and Sparsified Measurements - Score: 17 (R=9, N=8) - Date: 2025-09-03 - Comment: The paper provides theoretical insights into sparse recovery using sparse measurement matrices, relevant to model compression and sparsity.
LiquidGEMM: Hardware-Efficient W4A8 GEMM Kernel for High-Performance LLM Serving - Score: 17 (R=9, N=8) - Date: 2025-09-03 - Comment: The paper presents LiquidGEMM, a hardware-efficient GEMM kernel for LLM serving, focusing on quantization and efficiency, which aligns with model compression.
REFRAG: Rethinking RAG based Decoding - Score: 17 (R=9, N=8) - Date: 2025-09-03 - Comment: The paper proposes REFRAG, an efficient decoding framework for retrieval-augmented generation (RAG) that exploits sparsity structure to improve latency, which aligns with model compression and efficiency breakthroughs.
KVComp: A High-Performance, LLM-Aware, Lossy Compression Framework for KV Cache - Score: 17 (R=9, N=8) - Date: 2025-09-03 - Comment: The paper introduces a lossy compression framework for KV cache in LLMs, relevant to model compression and efficiency.
Universal Properties of Activation Sparsity in Modern Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-09-03 - Comment: The paper studies activation sparsity in LLMs, providing insights into model efficiency and behavior, which is relevant to representation learning and model compression.
Metis: Training Large Language Models with Advanced Low-Bit Quantization - Score: 17 (R=9, N=8) - Date: 2025-09-03 - Comment: The paper introduces Metis, a framework for training large language models with low-bit quantization, addressing model compression through quantization and efficiency improvements.
ReLATE: Learning Efficient Sparse Encoding for High-Performance Tensor Decomposition - Score: 17 (R=9, N=8) - Date: 2025-09-03 - Comment: The paper presents ReLATE, a framework for learning efficient sparse tensor encodings, which aligns with model compression and efficiency breakthroughs.
ZeroQAT: Your Quantization-aware Training but Efficient - Score: 17 (R=9, N=8) - Date: 2025-09-03 - Comment: The paper introduces ZeroQAT, a novel quantization-aware training framework, which is relevant to model compression and efficiency.
NSPDI-SNN: An efficient lightweight SNN based on nonlinear synaptic pruning and dendritic integration - Score: 17 (R=9, N=8) - Date: 2025-09-01 - Comment: The paper introduces a novel method for spiking neural networks (SNNs) focusing on nonlinear synaptic pruning and dendritic integration, which aligns with model compression and representation learning through sparsity and pruning.
Query Circuits: Explaining How Language Models Answer User Prompts - Score: 16 (R=9, N=7) - Date: 2025-09-30 - Comment: Representation Learning: input-level mechanistic explanations via sparse “query circuits” within the model, with a fidelity metric (NDF) and efficient discovery.
CLQ: Cross-Layer Guided Orthogonal-based Quantization for Diffusion Transformers - Score: 16 (R=9, N=7) - Date: 2025-09-30 - Comment: Compression/Efficiency: PTQ for Diffusion Transformers with cross-block calibration, orthogonal-based smoothing via Hadamard transforms, and cross-layer parameter search.
A Second-Order Perspective on Pruning at Initialization and Knowledge Transfer - Score: 16 (R=9, N=7) - Date: 2025-09-30 - Comment: Compression/Efficiency: analyzes pruning-at-initialization for pre-trained models with second-order perspective; shows cross-task transferability.
HiViS: Hiding Visual Tokens from the Drafter for Speculative Decoding in Vision-Language Models - Score: 16 (R=9, N=7) - Date: 2025-09-30 - Comment: Model Compression/Efficiency: removes visual tokens from the drafter and reuses target hidden states to shorten prefill and speed up speculative decoding in VLMs while preserving quality.
LUQ: Layerwise Ultra-Low Bit Quantization for Multimodal Large Language Models - Score: 16 (R=9, N=7) - Date: 2025-09-30 - Comment: Model Compression and Efficiency: ultra-low-bit (<4-bit) layerwise quantization for MLLMs with selective per-layer PTQ based on activation entropy.
Beyond Outliers: A Study of Optimizers Under Quantization - Score: 16 (R=9, N=7) - Date: 2025-09-30 - Comment: Compression/Efficiency: systematic analysis of optimizer–quantization interactions (PTQ/QAT), with scaling laws and optimizer comparisons (e.g., Shampoo) for quantized training.
Memory-Efficient Fine-Tuning via Low-Rank Activation Compression - Score: 16 (R=9, N=7) - Date: 2025-09-30 - Comment: Model Compression and Efficiency: low-rank activation compression during fine-tuning with a novel sampling-based orthogonal decomposition for low-rank matrices.
Effective Quantization of Muon Optimizer States - Score: 16 (R=9, N=7) - Date: 2025-09-30 - Comment: Model Compression/Efficiency: 8-bit quantization of Muon optimizer states (blockwise, linear/dynamic) with robustness analysis, yielding large memory savings.
VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs - Score: 16 (R=9, N=7) - Date: 2025-09-30 - Comment: Compression/Efficiency: vocabulary pruning of the drafter LM head for speculative decoding to reduce memory-bound drafting latency.
InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models - Score: 16 (R=9, N=7) - Date: 2025-09-29 - Comment: Model Compression and Efficiency: comprehensive FP8 quantized training recipe; High Performance Computing: hybrid-granularity quantization improves throughput, memory, and training time for LLMs.
Stochastic activations - Score: 16 (R=9, N=7) - Date: 2025-09-29 - Comment: Model Architecture and Efficiency: introduces stochastic activations to enable ReLU at inference for sparse latent vectors and reduced FLOPs; addresses optimization/training dynamics of activation functions.
Lightweight error mitigation strategies for post-training N:M activation sparsity in LLMs - Score: 16 (R=9, N=7) - Date: 2025-09-29 - Comment: Strong match to Model Compression and Efficiency: post-training N:M activation sparsity for LLMs with lightweight error mitigation and analysis of hardware-friendly patterns (e.g., 8:16, 16:32).
CubistMerge: Spatial-Preserving Token Merging For Diverse ViT Backbones - Score: 16 (R=9, N=7) - Date: 2025-09-29 - Comment: Compression/Efficiency: spatial-preserving token merging tailored to ViT backbones with window/relative position designs, improving speed with minimal accuracy loss.
Blockwise Hadamard high-Rank Adaptation for Parameter-Efficient LLM Fine-Tuning - Score: 16 (R=9, N=7) - Date: 2025-09-29 - Comment: Model Compression/Efficiency (PEFT): blockwise Hadamard high-rank adaptation increases effective rank under fixed parameter budget, improving fine-tuning efficiency.
PreLoRA: Hybrid Pre-training of Vision Transformers with Full Training and Low-Rank Adapters - Score: 16 (R=9, N=7) - Date: 2025-09-29 - Comment: Model Compression/Efficiency: hybrid pretraining that switches to Low-Rank Adapters with per-layer rank selection to cut trainable parameters and memory.
LANCE: Low Rank Activation Compression for Efficient On-Device Continual Learning - Score: 16 (R=9, N=7) - Date: 2025-09-29 - Comment: Compression/Efficiency: Low-rank activation compression via one-shot HOSVD for memory-optimized backprop; reusable subspaces enable efficient on-device continual learning.
General Pruning Criteria for Fast SBL - Score: 16 (R=9, N=7) - Date: 2025-09-29 - Comment: Compression/Efficiency: theoretical pruning criteria in Sparse Bayesian Learning clarifying when weights are pruned (sparsity).
LATTS: Locally Adaptive Test-Time Scaling - Score: 16 (R=9, N=7) - Date: 2025-09-26 - Comment: Model Efficiency: locally adaptive test-time scaling with verifier-driven resample/backtrack/restart decisions for better accuracy–compute tradeoffs.
Faster Than SVD, Smarter Than SGD: The OPLoRA Alternating Update - Score: 16 (R=9, N=7) - Date: 2025-09-25 - Comment: Compression/Efficiency: alternating least-squares optimizer for LoRA approximates SVDLoRA with low memory, improving low-rank adaptation updates.
Sparse Training Scheme for Multimodal LLM - Score: 16 (R=9, N=7) - Date: 2025-09-24 - Comment: Model Compression/Efficiency: sparse training with visual token compression and dynamic layer skipping during forward/backward passes across MLLMs.
Confidence-gated training for efficient early-exit neural networks - Score: 16 (R=9, N=7) - Date: 2025-09-24 - Comment: Model Compression and Efficiency / Conditional/Dynamic Networks: confidence-gated training for early-exit networks aligns training with inference to reduce compute while maintaining accuracy.
Adaptive Overclocking: Dynamic Control of Thinking Path Length via Real-Time Reasoning Signals - Score: 16 (R=9, N=7) - Date: 2025-09-24 - Comment: Conditional/Dynamic Networks: real-time adaptive control of reasoning compute via uncertainty signals and input complexity for better accuracy-latency trade-offs.
SEQR: Secure and Efficient QR-based LoRA Routing - Score: 16 (R=9, N=7) - Date: 2025-09-23 - Comment: Model Compression/Efficiency: unsupervised LoRA routing via activation-norm maximization (SEQR) with provable guarantees for efficient adapter selection.
Q-ROAR: Outlier-Aware Rescaling for RoPE Position Interpolation in Quantized Long-Context LLMs - Score: 16 (R=9, N=7) - Date: 2025-09-19 - Comment: Compression/Efficiency: quantization-aware RoPE position interpolation with new diagnostics and weight-only stabilization, no fine-tuning required.
Deep Lookup Network - Score: 16 (R=9, N=7) - Date: 2025-09-18 - Comment: Matches Compression/Efficiency: replaces multiplications with differentiable lookup operations and provides training strategies for LUT-based networks, yielding energy and speed gains.
Judge Q: Trainable Queries for Optimized Information Retention in KV Cache Eviction - Score: 16 (R=9, N=7) - Date: 2025-09-16 - Comment: Model Compression/Efficiency: proposes trainable soft query tokens to score KV-cache entries using global attention, improving eviction decisions and reducing memory/compute without full model retraining.
Adaptive Token Merging for Efficient Transformer Semantic Communication at the Edge - Score: 16 (R=9, N=7) - Date: 2025-09-15 - Comment: Model Compression and Efficiency: training-free adaptive token merging for Transformers with per-layer similarity thresholds and Pareto-optimized trade-offs in compute/communication.
Sensitivity-LoRA: Low-Load Sensitivity-Based Fine-Tuning for Large Language Models - Score: 16 (R=9, N=7) - Date: 2025-09-12 - Comment: PEFT with low-rank adaptation (LoRA) using Hessian-based sensitivity for dynamic rank allocation—compression/efficiency criterion.
Mitigating Catastrophic Forgetting in Large Language Models with Forgetting-aware Pruning - Score: 16 (R=9, N=7) - Date: 2025-09-11 - Comment: Model Compression and Efficiency — pruning-based method (FAPM) using a task-vector criterion to control catastrophic forgetting without changing training or architecture.
DPQuant: Efficient and Differentially-Private Model Training via Dynamic Quantization Scheduling - Score: 16 (R=9, N=7) - Date: 2025-09-04 - Comment: The paper presents a dynamic quantization framework for differentially-private model training, which is relevant to model compression and efficiency.
Pruning Weights but Not Truth: Safeguarding Truthfulness While Pruning LLMs - Score: 16 (R=9, N=7) - Date: 2025-09-03 - Comment: The paper addresses pruning in LLMs while preserving truthfulness, which aligns with model compression and efficiency breakthroughs.
SIRI: Scaling Iterative Reinforcement Learning with Interleaved Compression - Score: 16 (R=8, N=8) - Date: 2025-09-30 - Comment: Model Compression/Efficiency: interleaved compression–expansion of rollout length in RL training to reduce redundant tokens and improve the performance–efficiency Pareto for LRMs.
Model Merging Scaling Laws in Large Language Models - Score: 16 (R=8, N=8) - Date: 2025-09-30 - Comment: Matches Efficiency/Scaling Laws: empirical and theoretical scaling laws for model merging, enabling compute-efficient composition of specialists as an alternative to multitask training.
Sequential Diffusion Language Models - Score: 16 (R=8, N=8) - Date: 2025-09-30 - Comment: Model architecture and efficiency: introduces Next Sequence Prediction and SDLM to retrofit AR LMs for diffusion-style inference while preserving KV-cache compatibility and adaptive decoding.
Space Group Conditional Flow Matching - Score: 16 (R=8, N=8) - Date: 2025-09-30 - Comment: Model Architecture/Equivariance: group-conditioned flow matching with efficient group-averaged equivariant vector fields enforcing space-group/Wyckoff constraints.
Vision-Language Alignment from Compressed Image Representations using 2D Gaussian Splatting - Score: 16 (R=8, N=8) - Date: 2025-09-29 - Comment: Matches Model Compression and Efficiency: 2D Gaussian Splatting compresses visual inputs and reduces tokenization cost; High Performance Computing: batched CUDA kernels with ~90x faster fitting and high GPU utilization; efficient adapter over a frozen Transformer.
Overclocking Electrostatic Generative Models - Score: 16 (R=8, N=8) - Date: 2025-09-29 - Comment: Matches Model Compression and Efficiency: introduces IPFM, a distillation framework to accelerate electrostatic generative models (PFGM++) across auxiliary dimensions, reducing function evaluations.
On the Convergence of Muon and Beyond - Score: 16 (R=8, N=8) - Date: 2025-09-22 - Comment: High-Performance Training/Efficiency: variance-reduced Muon optimizer with optimal T^(-1/3) convergence and guarantees under the PL condition.
The Multi-Query Paradox in Zeroth-Order Optimization - Score: 16 (R=8, N=8) - Date: 2025-09-22 - Comment: Model Efficiency: resolves multi-query allocation in zeroth-order optimization with a new aggregation method (ZO-Align) and explicit convergence rates across settings.
Low-rank surrogate modeling and stochastic zero-order optimization for training of neural networks with black-box layers - Score: 16 (R=8, N=8) - Date: 2025-09-19 - Comment: Model Efficiency/HPC: dynamic low-rank surrogate modeling with zeroth-order optimization enables end-to-end training through black-box layers.
Towards Pre-trained Graph Condensation via Optimal Transport - Score: 16 (R=8, N=8) - Date: 2025-09-19 - Comment: Matches Model Compression and Efficiency via graph condensation framed with optimal transport, yielding task- and architecture-agnostic distilled graphs for accelerated GNN training.
Black-box Model Merging for Language-Model-as-a-Service with Massive Model Repositories - Score: 16 (R=8, N=8) - Date: 2025-09-17 - Comment: Matches Model Architecture/Efficiency: black-box model merging via derivative-free optimization with sparsity-based denoising and sign-aware scaling.
MOSAIC: Minimax-Optimal Sparsity-Adaptive Inference for Change Points in Dynamic Networks - Score: 16 (R=8, N=8) - Date: 2025-09-09 - Comment: Sparsity/Low-rank Theory: minimax detection/testing boundaries and near-optimal eigen-based tests for change points in dynamic networks with sparse changes and low-rank structure.
Transition Models: Rethinking the Generative Learning Objective - Score: 16 (R=8, N=8) - Date: 2025-09-05 - Comment: The paper introduces a novel generative paradigm, Transition Models, which is a significant contribution to generative modeling.
DaCe AD: Unifying High-Performance Automatic Differentiation for Machine Learning and Scientific Computing - Score: 16 (R=8, N=8) - Date: 2025-09-03 - Comment: The paper presents DaCe AD, an efficient automatic differentiation engine, which is relevant to emerging trends in computational efficiency.
BiHDTrans: binary hyperdimensional transformer for efficient multivariate time series classification - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Model Architecture + Compression/Efficiency: proposes a binary hyperdimensional Transformer (binarization/quantization) with theoretical analysis of information distortion for efficient sequence modeling.
SpecExit: Accelerating Large Reasoning Model via Speculative Exit - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Model Compression and Efficiency: speculative early-exit using a lightweight draft model’s hidden states to signal stopping, reducing latency/length without probes.
Why Alignment Must Precede Distillation: A Minimal Working Explanation - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Model compression/distillation: theoretical and empirical evidence that Align->KD outperforms KD->Align due to recall constraints.
Beyond Greedy Exits: Improved Early Exit Decisions for Risk Control and Reliability - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Model Compression/Efficiency: adaptive early-exit thresholding via bandits with risk guarantees for reliable speed–accuracy trade-offs under distribution shift.
Focusing on What Matters: Object-Agent-centric Tokenization for Vision Language Action models - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Model Compression/Efficiency: object–agent-centric visual tokenization drastically reduces visual tokens for VLA training while retaining performance.
Characteristic Root Analysis and Regularization for Linear Time Series Forecasting - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Matches compression/efficiency and representation learning theory: analyzes linear forecasting via characteristic roots and proposes low-rank (rank reduction) regularization and a new Root Purge regularization.
Sketching Low-Rank Plus Diagonal Matrices - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Matches Compression/Efficiency criterion: sketching method to jointly estimate low-rank plus diagonal structure of large operators (e.g., Hessians).
HTMA-Net: Towards Multiplication-Avoiding Neural Networks via Hadamard Transform and In-Memory Computing - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Matches Compression/Efficiency and Model Architecture: multiplication-avoiding network via Hadamard transforms and SRAM in-memory computing.
Sparse Deep Additive Model with Interactions: Enhancing Interpretability and Predictability - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Matches Model Architecture and Sparsity: sparse deep additive model with structured regularization and explicit interaction modeling for interpretability.
Kernel Regression of Multi-Way Data via Tensor Trains with Hadamard Overparametrization: The Dynamic Graph Flow Case - Score: 15 (R=8, N=7) - Date: 2025-09-29 - Comment: Compression/Efficiency: low-rank tensor-train parameterization with sparsity via Hadamard overparametrization and Riemannian optimization for multi-way data regression.
From Long to Lean: Performance-aware and Adaptive Chain-of-Thought Compression via Multi-round Refinement - Score: 15 (R=8, N=7) - Date: 2025-09-29 - Comment: Compression/Efficiency: adaptive multi-round Chain-of-Thought compression reducing token budget and latency while preserving accuracy.
R-Capsule: Compressing High-Level Plans for Efficient Large Language Model Reasoning - Score: 15 (R=8, N=7) - Date: 2025-09-29 - Comment: Model Architecture/Efficiency: Compresses reasoning via learned latent tokens (information bottleneck) to reduce CoT token footprint while retaining accuracy.
Null-Space Filtering for Data-Free Continual Model Merging: Preserving Transparency, Promoting Fidelity - Score: 15 (R=8, N=7) - Date: 2025-09-29 - Comment: Model Compression/Efficiency: data-free continual model merging via null-space filtering and lightweight LoRA adapters aligned with representation subspaces.
Toward Robust and Efficient ML-Based GPU Caching for Modern Inference - Score: 15 (R=8, N=7) - Date: 2025-09-26 - Comment: Matches High Performance Computing: learning-augmented, robust GPU caching (KV/embeddings) improving inference efficiency with adaptive policy.
How deep is your network? Deep vs. shallow learning of transfer operators - Score: 15 (R=8, N=7) - Date: 2025-09-25 - Comment: Compression/Efficiency/Architecture: randomized network with fixed hidden layers and closed-form output for learning operators, reducing training cost and providing interpretable eigenfunctions.
Variational Task Vector Composition - Score: 15 (R=8, N=7) - Date: 2025-09-24 - Comment: Matches Model Compression/Efficiency and Representation Learning: sparse (Spike-and-Slab) variational task-vector composition with sample-specific coefficients.
Amortized Latent Steering: Low-Cost Alternative to Test-Time Optimization - Score: 15 (R=8, N=7) - Date: 2025-09-24 - Comment: Efficiency: amortized latent steering collapses test-time optimization into a constant-cost vector applied at inference.
Dendritic Resonate-and-Fire Neuron for Effective and Efficient Long Sequence Modeling - Score: 15 (R=8, N=7) - Date: 2025-09-24 - Comment: Model Architecture/Efficiency: introduces a dendritic Resonate-and-Fire neuron with adaptive thresholding for sparse, efficient long-sequence modeling.
PruneCD: Contrasting Pruned Self Model to Improve Decoding Factuality - Score: 15 (R=8, N=7) - Date: 2025-09-23 - Comment: Model Compression/Efficiency: uses layer pruning to construct an auxiliary model for contrastive decoding, improving factuality with minimal overhead.
Low-Rank Adaptation of Evolutionary Deep Neural Networks for Efficient Learning of Time-Dependent PDEs - Score: 15 (R=8, N=7) - Date: 2025-09-23 - Comment: Model Compression and Efficiency: constrains updates to a low-rank subspace (layer-wise SVD), cutting trainable parameters and compute while preserving accuracy.
BEFT: Bias-Efficient Fine-Tuning of Language Models - Score: 15 (R=8, N=7) - Date: 2025-09-22 - Comment: Compression/Efficiency: parameter-efficient fine-tuning via principled selection of which bias term (e.g., Q/K/V biases) to update.
LiMuon: Light and Fast Muon Optimizer for Large Models - Score: 15 (R=8, N=7) - Date: 2025-09-19 - Comment: Model Compression and Efficiency: optimizer with lower memory via randomized SVD and provably reduced sample complexity for large-model training.
BWCache: Accelerating Video Diffusion Transformers through Block-Wise Caching - Score: 15 (R=8, N=7) - Date: 2025-09-18 - Comment: Matches Model Compression and Efficiency: training-free block-wise feature caching in Diffusion Transformers to reduce inference compute.
ResidualViT for Efficient Temporally Dense Video Encoding - Score: 15 (R=8, N=7) - Date: 2025-09-17 - Comment: Matches Compression/Efficiency and Architecture: ResidualViT with temporal redundancy exploitation and token reduction for faster dense video encoding.
CIARD: Cyclic Iterative Adversarial Robustness Distillation - Score: 15 (R=8, N=7) - Date: 2025-09-17 - Comment: Model Compression and Efficiency: adversarial robustness distillation to lightweight students via multi-teacher contrastive alignment and cyclic adversarial retraining.
Visualization and Analysis of the Loss Landscape in Graph Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-09-17 - Comment: Matches Representation Learning: analyzes loss landscape and training dynamics in GNNs; also studies sparsification and quantization effects (compression/efficiency).
Semantic-guided LoRA Parameters Generation - Score: 15 (R=8, N=7) - Date: 2025-09-17 - Comment: Matches Compression/Efficiency: Low-Rank Adaptation (LoRA) parameter generation via semantic guidance for zero-shot personalization without retraining.
CoDiCodec: Unifying Continuous and Discrete Compressed Representations of Audio - Score: 15 (R=8, N=7) - Date: 2025-09-15 - Comment: Model Compression and Efficiency: unified audio autoencoder using quantization (FSQ with FSQ-dropout) to produce both continuous and discrete compressed codes and enable efficient parallel decoding.
Balancing Utility and Privacy: Dynamically Private SGD with Random Projection - Score: 15 (R=8, N=7) - Date: 2025-09-12 - Comment: Matches Model Compression and Efficiency via random projection–accelerated SGD and dynamic DP (algorithmic optimizer improving training efficiency with provable rates).
Merge-of-Thought Distillation - Score: 15 (R=8, N=7) - Date: 2025-09-11 - Comment: Matches Model Compression and Efficiency: proposes a multi-teacher distillation framework with alternating teacher-specific SFT branches and weight-space model merging to compress long CoT reasoning into a smaller student while mitigating forgetting.
DEQuify your force field: More efficient simulations using deep equilibrium models - Score: 15 (R=8, N=7) - Date: 2025-09-11 - Comment: Matches Model Architecture and Efficiency: recasts an equivariant network as a deep equilibrium model (DEQ) to recycle features across timesteps, improving speed and memory efficiency for sequence-like inference.
Variational Rank Reduction Autoencoders for Generative - Score: 15 (R=8, N=7) - Date: 2025-09-11 - Comment: Matches Model Architecture (autoencoders) using low-rank latent factorization (truncated SVD) to structure representations and mitigate posterior collapse; also aligns with Compression/Efficiency via rank reduction.
MEGS$^{2}$: Memory-Efficient Gaussian Splatting via Spherical Gaussians and Unified Pruning - Score: 15 (R=8, N=7) - Date: 2025-09-10 - Comment: Model Compression/Efficiency: unified soft pruning and reduced per-primitive parameters to cut memory in 3D Gaussian Splatting.
FineServe: Precision-Aware KV Slab and Two-Level Scheduling for Heterogeneous Precision LLM Serving - Score: 15 (R=8, N=7) - Date: 2025-09-09 - Comment: High Performance Computing/Efficiency: precision-aware KV cache management (KV Slab) and two-level scheduling for mixed-precision LLM serving to reduce fragmentation and boost throughput.
From Long to Short: LLMs Excel at Trimming Own Reasoning Chains - Score: 15 (R=8, N=7) - Date: 2025-09-09 - Comment: Model Compression/Efficiency: proposes a test-time scaling method (EDIT) that trims reasoning chains to jointly optimize brevity and correctness, improving inference efficiency.
An Improved Template for Approximate Computing - Score: 15 (R=8, N=7) - Date: 2025-09-09 - Comment: Compression/Efficiency: new parametrizable boolean-rewriting template for approximate adders/multipliers improving area savings for given accuracy loss.
SpecPrune-VLA: Accelerating Vision-Language-Action Models via Action-Aware Self-Speculative Pruning - Score: 15 (R=8, N=7) - Date: 2025-09-09 - Comment: Model Compression and Efficiency: training-free two-level token pruning with an action-aware controller to accelerate VLA models.
MTQA: Matrix of Thought for Enhanced Reasoning in Complex Question Answering - Score: 15 (R=8, N=7) - Date: 2025-09-05 - Comment: The paper proposes a novel thought structure for enhancing reasoning in LLMs, which aligns with the core topic of large language models and their theoretical insights.
FoMEMO: Towards Foundation Models for Expensive Multi-objective Optimization - Score: 15 (R=8, N=7) - Date: 2025-09-04 - Comment: The paper introduces a new paradigm for foundation models in multi-objective optimization, which aligns with foundational research in AI for Science.
MLP-Offload: Multi-Level, Multi-Path Offloading for LLM Pre-training to Break the GPU Memory Wall - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper introduces MLP-Offload, a novel offloading engine for LLM training, which addresses efficiency and memory constraints. It is relevant to model compression and efficiency improvements.
Low Power Approximate Multiplier Architecture for Deep Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper proposes a low-power approximate multiplier architecture, which is relevant to model compression and efficiency.
An Efficient GNNs-to-KANs Distillation via Self-Attention Dynamic Sampling with Potential for Consumer Electronics Edge Deployment - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: Model Compression and Efficiency: distills GNNs into KANs using self-attention dynamic sampling, targeting edge deployment.
Efficient Decoding Methods for Language Models on Encrypted Data - Score: 15 (R=7, N=8) - Date: 2025-09-11 - Comment: Model Efficiency: introduces a polynomial-time HE-friendly argmax (cutmax) and the first HE-compatible top-p sampling, reducing ciphertext operations for encrypted LLM decoding with convergence guarantees.
Any-Step Density Ratio Estimation via Interval-Annealed Secant Alignment - Score: 15 (R=7, N=8) - Date: 2025-09-08 - Comment: Algorithmic efficiency: any-step density ratio estimation via a Secant Alignment identity and interval annealing, eliminating numerical integration and reducing function evaluations.
Parallel Thinking, Sequential Answering: Bridging NAR and AR for Efficient Reasoning - Score: 14 (R=7, N=7) - Date: 2025-09-26 - Comment: Model architecture/efficiency criterion: integrates NAR (parallel discrete diffusion) to produce intermediate reasoning traces that guide an AR model, reducing inference cost while maintaining quality.
Energy-convergence trade off for the training of neural networks on bio-inspired hardware - Score: 14 (R=7, N=7) - Date: 2025-09-24 - Comment: High Performance/Efficiency: hardware-aware training on memristive devices with a symmetry-point shifting algorithm to improve energy–convergence trade-offs.
Efficient Sliced Wasserstein Distance Computation via Adaptive Bayesian Optimization - Score: 14 (R=7, N=7) - Date: 2025-09-23 - Comment: Algorithmic efficiency: adaptive Bayesian optimization to learn projection directions for sliced Wasserstein, improving computation in optimization loops.
Toward Efficient Influence Function: Dropout as a Compression Tool - Score: 14 (R=7, N=7) - Date: 2025-09-22 - Comment: Model Compression and Efficiency: leverages dropout as a gradient compression mechanism to scale influence-function computation with reduced memory/compute.
From Correction to Mastery: Reinforced Distillation of Large Language Model Agents - Score: 14 (R=7, N=7) - Date: 2025-09-19 - Comment: Matches Model Compression/Efficiency via a novel distillation recipe (student-centered correction + short-horizon RL) reducing dependence on larger backbones.
Quantum Variational Activation Functions Empower Kolmogorov-Arnold Networks - Score: 14 (R=7, N=7) - Date: 2025-09-18 - Comment: Matches Model Architecture: introduces variational activation functions (DARUAN) embedded into KANs/HQKANs as MLP replacements for parameter-efficient expressivity.
Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning - Score: 14 (R=7, N=7) - Date: 2025-09-18 - Comment: Model Editing/Efficiency: machine unlearning (post-hoc) to erase memorized sensitive code segments without full retraining, improving safety while preserving utility.
Learning quantum many-body data locally: A provably scalable framework - Score: 14 (R=7, N=7) - Date: 2025-09-18 - Comment: Algorithmic efficiency and representation: local quantum kernel exploiting decay of correlations to improve sample complexity.
A reduced-order derivative-informed neural operator for subsurface fluid-flow - Score: 14 (R=7, N=7) - Date: 2025-09-18 - Comment: Matches Efficiency/Training for operators: derivative-informed training with FIM-projected Jacobians integrated into FNOs reduces derivative complexity while preserving gradient fidelity.
Metacognitive Reuse: Turning Recurring LLM Reasoning Into Concise Behaviors - Score: 14 (R=7, N=7) - Date: 2025-09-17 - Comment: Model Compression and Efficiency: introduces a behavior “handbook” and behavior-conditioned SFT to cache/distill recurring reasoning, cutting inference tokens and latency.
Gradient Estimation Methods of Approximate Multipliers for High-Accuracy Retraining of Deep Learning Models - Score: 14 (R=7, N=7) - Date: 2025-09-17 - Comment: Model Compression and Efficiency: improves retraining under approximate multipliers via LUT-based gradient estimators, enabling hardware-aware accuracy-efficiency tradeoffs.
EfficientUICoder: Efficient MLLM-based UI Code Generation via Input and Output Token Compression - Score: 14 (R=7, N=7) - Date: 2025-09-16 - Comment: Efficiency/Compression: reduces both input image tokens and output code tokens using attention-guided token pruning and adaptive duplicate suppression to cut inference cost.
Latency and Token-Aware Test-Time Compute - Score: 14 (R=7, N=7) - Date: 2025-09-15 - Comment: Model Compression and Efficiency: latency- and token-aware dynamic allocation of test-time compute with per-query method selection (e.g., beam search vs best-of-N).
H$_{2}$OT: Hierarchical Hourglass Tokenizer for Efficient Video Pose Transformers - Score: 14 (R=7, N=7) - Date: 2025-09-09 - Comment: Compression/Efficiency: dynamic token pruning and recovery modules within transformers to reduce compute while restoring full-length outputs for efficient inference.

High Performance Computing (43)

PreScope: Unleashing the Power of Prefetching for Resource-Constrained MoE Inference - Score: 18 (R=10, N=8) - Date: 2025-09-30 - Comment: Matches Model Compression and Efficiency + High-Performance Systems for MoE inference: prediction-driven expert prefetching/scheduling (LLaPor, PreSched, AsyncIO) to overcome PCIe/memory bottlenecks.
Partial Parameter Updates for Efficient Distributed Training - Score: 18 (R=10, N=8) - Date: 2025-09-29 - Comment: Matches High Performance Computing and Efficiency: introduces partial parameter updates for low-communication distributed training, reducing memory/FLOPs and avoiding activation exchange while maintaining perplexity.
SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips - Score: 18 (R=10, N=8) - Date: 2025-09-26 - Comment: High Performance Computing: Superchip-centric offloading with adaptive weight offload, bucketization repartitioning, casting, speculative execution, and CPU-optimized Adam for large-scale LLM training.
Understanding Outer Optimizers in Local SGD: Learning Rates, Momentum, and Acceleration - Score: 18 (R=10, N=8) - Date: 2025-09-15 - Comment: High-Performance/Distributed Training: algorithmic analysis and design of outer optimizers (learning rate > 1, momentum, acceleration) for Local SGD with new convergence guarantees.
Conda: Column-Normalized Adam for Training Large Language Models Faster - Score: 17 (R=9, N=8) - Date: 2025-09-30 - Comment: HPC/Optimization for large-scale training: Column-Normalized Adam improves spectral conditioning while retaining per-coordinate adaptivity, yielding 2–2.5x faster LLM pretraining.
AdaPtis: Reducing Pipeline Bubbles with Adaptive Pipeline Parallelism on Heterogeneous Models - Score: 17 (R=9, N=8) - Date: 2025-09-30 - Comment: High Performance Computing: adaptive pipeline parallelism co-optimizing partition, placement, and scheduling guided by a performance model.
Data-Centric Elastic Pipeline Parallelism for Efficient Long-Context LLM Training - Score: 17 (R=9, N=8) - Date: 2025-09-26 - Comment: Matches High Performance Computing: introduces elastic pipeline parallelism with workload-aware scheduling and adaptive checkpointing for distributed long-context LLM training.
ShadowServe: Interference-Free KV Cache Fetching for Distributed Prefix Caching - Score: 17 (R=9, N=8) - Date: 2025-09-24 - Comment: HPC/systems: SmartNIC-offloaded, interference-free KV cache fetching for distributed prefix caching; pipeline and memory designs for serving efficiency.
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding - Score: 17 (R=9, N=8) - Date: 2025-09-23 - Comment: Efficiency/HPC: lossless speculative decoding for diffusion LLMs with parallel verification and calibrated draft graphs; complements KV caching and multi-token methods.
LightCode: Compiling LLM Inference for Photonic-Electronic Systems - Score: 17 (R=9, N=8) - Date: 2025-09-23 - Comment: High Performance Computing/Systems: compiler IR (Stacked Graph) and hardware assignment for hybrid photonic–electronic LLM inference optimizing latency/energy.
Robust LLM Training Infrastructure at ByteDance - Score: 17 (R=9, N=8) - Date: 2025-09-23 - Comment: High Performance Computing: systems-level robust training infrastructure enabling large-scale, failure-tolerant LLM training across 200k+ GPUs.
LEGO: Spatial Accelerator Generation and Optimization for Tensor Applications - Score: 17 (R=9, N=8) - Date: 2025-09-17 - Comment: High-Performance Computing/Systems: automatic spatial accelerator RTL generation with affine architecture representation and LP-based pipeline/register optimization.
Efficient Fine-Grained GPU Performance Modeling for Distributed Deep Learning of LLM - Score: 16 (R=9, N=7) - Date: 2025-09-30 - Comment: High Performance Computing: fine-grained, hardware-aware GPU performance modeling for distributed LLM training across parallelism strategies; systems-level innovation enabling planning without on-cluster runs.
Pipeline Parallelism is All You Need for Optimized Early-Exit Based Self-Speculative Decoding - Score: 16 (R=9, N=7) - Date: 2025-09-25 - Comment: High Performance Computing/Efficiency: pipeline-parallel early-exit self-speculative decoding with verify-while-draft scheduling for faster LLM inference.
Scaling Up Data Parallelism in Decentralized Deep Learning - Score: 16 (R=9, N=7) - Date: 2025-09-17 - Comment: High Performance Computing: decentralized data-parallel training with adaptive communication graph (Ada) and large-scale distributed training insights.
veScale: Consistent and Efficient Tensor Programming with Eager-Mode SPMD - Score: 16 (R=9, N=7) - Date: 2025-09-10 - Comment: Matches High Performance Computing: eager-mode SPMD system with a novel distributed RNG ensuring single-device consistency and communication-efficient training.
Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning - Score: 16 (R=8, N=8) - Date: 2025-09-30 - Comment: Matches high-performance training algorithms: scales Evolution Strategies to full-parameter LLM fine-tuning at billion-parameter scale, offering a systems-level alternative to RL.
A Unifying Framework for Parallelizing Sequential Models with Linear Dynamical Systems - Score: 16 (R=8, N=8) - Date: 2025-09-29 - Comment: High Performance Computing: unifies fixed-point parallelization of sequential models via an LDS framework for efficient, scalable evaluation.
Pareto-optimal Tradeoffs Between Communication and Computation with Flexible Gradient Tracking - Score: 16 (R=8, N=8) - Date: 2025-09-24 - Comment: High Performance Computing: distributed optimization algorithm with tunable comm/comp tradeoffs and near-optimal iteration complexity guarantees.
Nonconvex Decentralized Stochastic Bilevel Optimization under Heavy-Tailed Noises - Score: 16 (R=8, N=8) - Date: 2025-09-22 - Comment: High Performance Computing: introduces a decentralized stochastic bilevel optimization algorithm with theoretical guarantees under heavy-tailed noise—an algorithmic contribution to distributed training.
Verifying Computational Graphs in Production-Grade Distributed Machine Learning Frameworks - Score: 16 (R=8, N=8) - Date: 2025-09-17 - Comment: High Performance Computing/Systems: semantic equivalence verification of large distributed ML computational graphs via equality saturation and Datalog-style reasoning.
Beyond Ensembles: Simulating All-Atom Protein Dynamics in a Learned Latent Space - Score: 16 (R=8, N=8) - Date: 2025-09-03 - Comment: The paper extends a generative model for simulating protein dynamics, relevant to AI for Science with a focus on foundational research in molecular modeling.
Crystal Structure Prediction with a Geometric Permutation-Invariant Loss Function - Score: 16 (R=8, N=8) - Date: 2025-09-03 - Comment: The paper proposes a novel geometric permutation-invariant loss function for crystal structure prediction, which is relevant to AI for Science with a focus on foundational research in molecular modeling.
Intra-request branch orchestration for efficient LLM reasoning - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: High Performance Computing/Systems: intra-request branch orchestration using activation-based linear probes to reduce inference cost/latency for multi-branch reasoning.
Scaling LLM Test-Time Compute with Mobile NPU on Smartphones - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: High Performance Computing: systems-level mobile NPU inference with hardware-aware tile quantization and LUT-based ops enabling test-time compute scaling.
CollaPipe: Adaptive Segment-Optimized Pipeline Parallelism for Collaborative LLM Training in Heterogeneous Edge Networks - Score: 15 (R=8, N=7) - Date: 2025-09-25 - Comment: High Performance Computing: adaptive pipeline parallelism with resource scheduling for distributed LLM training on heterogeneous edge devices, with convergence analysis and memory benefits.
MeshODENet: A Graph-Informed Neural Ordinary Differential Equation Neural Network for Simulating Mesh-Based Physical Systems - Score: 15 (R=8, N=7) - Date: 2025-09-24 - Comment: Model Architecture: integrates GNN spatial reasoning with Neural ODE continuous-time dynamics to improve long-term stability for mesh-based systems.
GraphMend: Code Transformations for Fixing Graph Breaks in PyTorch 2 - Score: 15 (R=8, N=7) - Date: 2025-09-23 - Comment: Matches High Performance Computing: compiler-level transformations eliminate FX graph breaks, enabling larger JIT graphs and reducing CPU–GPU syncs.
Decentralized Optimization with Topology-Independent Communication - Score: 15 (R=8, N=7) - Date: 2025-09-19 - Comment: Matches High Performance Computing: randomized local coordination reduces per-iteration communications to topology-independent constant with convergence guarantees for distributed optimization.
eIQ Neutron: Redefining Edge-AI Inference with Integrated NPU and Compiler Innovations - Score: 15 (R=8, N=7) - Date: 2025-09-19 - Comment: High-Performance/Systems: co-designed NPU architecture and constrained-programming compiler optimizing compute and data movement to maximize utilization for AI inference.
Learning non-Markovian Dynamical Systems with Signature-based Encoders - Score: 15 (R=8, N=7) - Date: 2025-09-17 - Comment: Matches Model Architecture/Representation: signature-based encoders for non-Markovian continuous-time dynamics replacing RNNs.
Decentralized Stochastic Nonconvex Optimization under the Relaxed Smoothness - Score: 15 (R=8, N=7) - Date: 2025-09-11 - Comment: High Performance Computing: decentralized first-order optimization under relaxed smoothness with explicit sample and communication complexity bounds for distributed training.
Dato: A Task-Based Programming Model for Dataflow Accelerators - Score: 15 (R=8, N=7) - Date: 2025-09-09 - Comment: High Performance Computing: a task-based programming model with first-class stream/sharding types and spatial mapping compiler for dataflow accelerators enabling high utilization.
ArcMemo: Abstract Reasoning Composition with Lifelong LLM Memory - Score: 15 (R=8, N=7) - Date: 2025-09-05 - Comment: The paper introduces a novel approach to LLM memory management, which aligns with the interest in theoretical insights into LLM behavior.
Beyond Memorization: Reasoning-Driven Synthesis as a Mitigation Strategy Against Benchmark Contamination - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper addresses benchmark contamination in LLMs, focusing on reasoning-driven synthesis, which is relevant to foundational research in LLM behavior.
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers - Score: 15 (R=8, N=7) - Date: 2025-09-01 - Comment: The survey on Scientific Large Language Models (Sci-LLMs) provides a comprehensive overview of foundational models in scientific research, focusing on data-centric challenges and evaluation protocols, which aligns with the interest in foundational research in LLMs.
Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision - Score: 15 (R=7, N=8) - Date: 2025-09-18 - Comment: Foundational training method: converts inference-time compute into reference-free supervision (CaT/CaT-RL), an algorithmic post-training innovation.
Time-adaptive HénonNets for separable Hamiltonian systems - Score: 14 (R=7, N=7) - Date: 2025-09-25 - Comment: Model Architecture: time-adaptive symplectic HénonNets with universal approximation results for separable Hamiltonian systems.
The Syntax and Semantics of einsum - Score: 14 (R=7, N=7) - Date: 2025-09-25 - Comment: Systems/HPC: formal syntax/semantics and equivalence rules for einsum enabling formal reasoning and optimization of tensor computations.
ForTIFAI: Fending Off Recursive Training Induced Failure for AI Models - Score: 14 (R=7, N=7) - Date: 2025-09-12 - Comment: Training dynamics/loss design: introduces Truncated Cross Entropy to mitigate recursive training collapse by down-weighting overconfident predictions, with theoretical and cross-modal support.
Astra: A Multi-Agent System for GPU Kernel Performance Optimization - Score: 14 (R=7, N=7) - Date: 2025-09-10 - Comment: High Performance Computing: LLM-based multi-agent system for GPU kernel optimization, automating loop/memory transformations to accelerate LLM serving kernels.
ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute - Score: 14 (R=7, N=7) - Date: 2025-09-08 - Comment: Model architecture/efficiency at inference: trains LLMs for native parallel reasoning paths to scale test-time compute in width.
Efficient Training-Free Online Routing for High-Volume Multi-LLM Serving - Score: 14 (R=7, N=7) - Date: 2025-09-08 - Comment: Matches High Performance Computing/systems criterion: training-free online LLM routing with ANN-based feature estimation and competitive-ratio guarantees for high-throughput serving.

Representation Learning (263)

Tracing the Representation Geometry of Language Models from Pretraining to Post-training - Score: 19 (R=10, N=9) - Date: 2025-09-30 - Comment: Matches Representation Learning criterion: spectral geometry (effective rank, eigenspectrum decay) tracing phases from pretraining to post-training.
$\mathbf{Li_2}$: A Framework on Dynamics of Feature Emergence and Delayed Generalization - Score: 19 (R=10, N=9) - Date: 2025-09-29 - Comment: Representation Learning: provides a mathematical framework for grokking and feature emergence with provable dynamics and scaling laws in neural networks.
Towards Atoms of Large Language Models - Score: 19 (R=10, N=9) - Date: 2025-09-26 - Comment: Representation Learning + Autoencoders: formalizes atomic units with RIP/uniqueness guarantees and shows threshold-activated SAEs recover stable sparse representations in LLMs.
The Complexity of Finding Local Optima in Contrastive Learning - Score: 19 (R=10, N=9) - Date: 2025-09-23 - Comment: Representation Learning Theory: proves PLS/CLS-hardness for finding local optima in contrastive objectives (e.g., triplet loss), clarifying limits of local search/gradient methods.
Learning words in groups: fusion algebras, tensor ranks and grokking - Score: 18 (R=10, N=8) - Date: 2025-09-09 - Comment: Representation Learning/Training Dynamics: explains learning of group word operations via low-rank tensor decompositions and links to grokking.
A Law of Data Reconstruction for Random Features (and Beyond) - Score: 18 (R=9, N=9) - Date: 2025-09-29 - Comment: Representation Learning/Theory: shows a law for full data reconstruction (p ≳ d·n) in random features and beyond, with an accompanying reconstruction method.
Closed-form $\ell_r$ norm scaling with data for overparameterized linear regression and diagonal linear networks under $\ell_p$ bias - Score: 18 (R=9, N=9) - Date: 2025-09-26 - Comment: Representation Learning/Theory: closed-form scaling laws for ||w||_r under ℓ_p-biased interpolation and corresponding implicit bias in diagonal linear networks, clarifying training dynamics across norms.
Physics of Learning: A Lagrangian perspective to different learning paradigms - Score: 18 (R=9, N=9) - Date: 2025-09-26 - Comment: Representation Learning / Training Dynamics: introduces a Learning Lagrangian, deriving classic algorithms (e.g., Bellman optimality, Adam) from least-action principles.
Scaling Laws are Redundancy Laws - Score: 18 (R=9, N=9) - Date: 2025-09-26 - Comment: Representation Learning/Training Dynamics: provides a theoretical explanation of scaling laws via data covariance spectrum redundancy, including transformers in NTK and feature-learning regimes.
Learning From Simulators: A Theory of Simulation-Grounded Learning - Score: 18 (R=9, N=9) - Date: 2025-09-24 - Comment: Representation Learning: foundational theory showing SGNNs perform amortized Bayesian inference under a simulation prior, with generalization bounds and mechanistic interpretability.
Long-time dynamics and universality of nonconvex gradient descent - Score: 18 (R=9, N=9) - Date: 2025-09-17 - Comment: Matches Representation Learning: rigorous theory for long-time gradient descent dynamics and implicit regularization (training dynamics).
World Modeling with Probabilistic Structure Integration - Score: 18 (R=9, N=9) - Date: 2025-09-15 - Comment: Model Architecture and Representation Learning: random-access autoregressive world model with causal structure extraction and iterative integration as control tokens enabling universal prompting and improved modeling.
Universality of physical neural networks with multivariate nonlinearity - Score: 18 (R=9, N=9) - Date: 2025-09-09 - Comment: Model Architecture and Representation Learning: proves universality conditions for physical neural networks and proposes a scalable optical architecture.
Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime - Score: 17 (R=9, N=8) - Date: 2025-09-30 - Comment: Matches Representation Learning criterion: theoretical scaling laws and spectral analysis linking weight spectra to generalization.
Identity Bridge: Enabling Implicit Reasoning via Shared Latent Memory - Score: 17 (R=9, N=8) - Date: 2025-09-30 - Comment: Representation Learning: identity supervision aligns latent geometry (implicit nuclear-norm regularization) to enable compositional reasoning (two-hop), with theory and scaling evidence.
Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models - Score: 17 (R=9, N=8) - Date: 2025-09-30 - Comment: Representation Learning/Training Dynamics: theoretical framework explaining test-time training as specialization after generalization; supported by sparse autoencoder analyses.
Negative Pre-activations Differentiate Syntax - Score: 17 (R=9, N=8) - Date: 2025-09-30 - Comment: Strong match to representation learning/interpretability: identifies a sparse mechanism (negative pre-activations in entangled neurons) underlying syntax processing with causal interventions.
Statistical Learning Guarantees for Group-Invariant Barron Functions - Score: 17 (R=9, N=8) - Date: 2025-09-30 - Comment: Architecture/representation theory: statistical learning guarantees for group-invariant neural networks in the Barron framework; analyses of approximation and estimation errors.
Emergence of Superposition: Unveiling the Training Dynamics of Chain of Continuous Thought - Score: 17 (R=9, N=8) - Date: 2025-09-30 - Comment: Representation Learning/Training Dynamics: theory of how superposition emerges in continuous chain-of-thought within transformers through two-stage training dynamics.
LLM Interpretability with Identifiable Temporal-Instantaneous Representation - Score: 17 (R=9, N=8) - Date: 2025-09-30 - Comment: Matches Representation Learning/Identifiable Interpretability: identifiable temporal-instantaneous causal representation framework scaling to LLM activations with theoretical guarantees.
Neighborhood Sampling Does Not Learn the Same Graph Neural Network - Score: 17 (R=9, N=8) - Date: 2025-09-30 - Comment: Representation Learning/Training Dynamics: NTK/GP-theoretic analysis of neighborhood sampling shows distinct posterior processes and errors versus full GNNs, clarifying systemic behaviors.
IA2: Alignment with ICL Activations Improves Supervised Fine-Tuning - Score: 17 (R=9, N=8) - Date: 2025-09-29 - Comment: Matches Representation Learning/Training Dynamics: self-distills SFT to align internal activations with ICL, transferring in-context computation mechanisms to improve accuracy and calibration.
Linear Causal Representation Learning by Topological Ordering, Pruning, and Disentanglement - Score: 17 (R=9, N=8) - Date: 2025-09-29 - Comment: Representation Learning: linear causal representation learning with topological ordering, pruning, and disentanglement to recover latent causal features under weaker assumptions.
REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model - Score: 17 (R=9, N=8) - Date: 2025-09-29 - Comment: Representation Learning: proposes a low-dimensional reasoning manifold and geometric deviation metric to localize failure points in LLM reasoning.
Neural Feature Geometry Evolves as Discrete Ricci Flow - Score: 17 (R=9, N=8) - Date: 2025-09-29 - Comment: Representation learning: theoretical link between neural feature geometry and discrete Ricci flow, explaining training dynamics and informing design (depth, early stopping).
Mechanistic Independence: A Principle for Identifiable Disentangled Representations - Score: 17 (R=9, N=8) - Date: 2025-09-29 - Comment: Representation Learning: principled identifiability of disentangled latent subspaces under nonlinear, non-invertible mixing via mechanistic independence.
Concept-SAE: Active Causal Probing of Visual Model Behavior - Score: 17 (R=9, N=8) - Date: 2025-09-29 - Comment: Representation Learning: Concept-grounded sparse autoencoders with dual supervision enabling causal probing via interventions.
Bilinear relational structure fixes reversal curse and enables consistent model editing - Score: 17 (R=9, N=8) - Date: 2025-09-29 - Comment: Representation Learning: identifies bilinear relational structure in LM representations that fixes reversal curse and enables consistent model editing—linking internal geometry to logical generalization.
Mechanism of Task-oriented Information Removal in In-context Learning - Score: 17 (R=9, N=8) - Date: 2025-09-26 - Comment: Representation learning criterion: proposes a low-rank filtering view of ICL as task-oriented information removal and identifies ‘denoising’ attention heads—mechanistic insight into hidden-state representations and ICL dynamics.
Feature Augmentation of GNNs for ILPs: Local Uniqueness Suffices - Score: 17 (R=9, N=8) - Date: 2025-09-26 - Comment: Model Architecture (GNN expressiveness): Local-UID (d-hop uniqueness) with ColorGNN/ColorUID achieving Global-UID power while improving generalization.
Binary Autoencoder for Mechanistic Interpretability of Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-09-26 - Comment: Representation Learning/Autoencoders: Binary Autoencoder with minibatch entropy minimization for sparse, independent features in LLM hidden states supporting mechanistic interpretability.
Explaining Grokking and Information Bottleneck through Neural Collapse Emergence - Score: 17 (R=9, N=8) - Date: 2025-09-26 - Comment: Matches Representation Learning/Training Dynamics: links grokking and information bottleneck to neural collapse dynamics, providing a unified theoretical explanation.
Aligning Inductive Bias for Data-Efficient Generalization in State Space Models - Score: 17 (R=9, N=8) - Date: 2025-09-26 - Comment: Model Architecture and Representation Learning: formalizes SSM inductive bias via an SSM-induced kernel and introduces task-dependent initialization through power spectrum matching to improve data efficiency.
Alignment-Sensitive Minimax Rates for Spectral Algorithms with Learned Kernels - Score: 17 (R=9, N=8) - Date: 2025-09-25 - Comment: Representation Learning/Theory: introduces effective span dimension with alignment-sensitive minimax rates and shows gradient flow reduces ESD, linking adaptive feature learning to generalization.
Mamba Modulation: On the Length Generalization of Mamba - Score: 17 (R=9, N=8) - Date: 2025-09-25 - Comment: Model Architecture—state-space models (Mamba): analysis of transition matrix spectra and spectral modulation to improve long-context generalization.
Global Minimizers of Sigmoid Contrastive Loss - Score: 17 (R=9, N=8) - Date: 2025-09-24 - Comment: Representation Learning/Theory: analyzes sigmoid contrastive loss with trainable temperature/bias (SigLIP), introduces constellation characterization and reparameterization.
nDNA -- the Semantic Helix of Artificial Cognition - Score: 17 (R=9, N=8) - Date: 2025-09-24 - Comment: Proposes intrinsic latent-geometry metrics (spectral curvature, thermodynamic length, belief vector field) to fingerprint models (Representation Learning theory).
Understanding Post-Training Structural Changes in Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-09-24 - Comment: Representation Learning/Training Dynamics: SVD-based analysis reveals uniform singular value scaling and coordinated singular-vector rotations after post-training.
Evolution of Concepts in Language Model Pre-Training - Score: 17 (R=9, N=8) - Date: 2025-09-24 - Comment: Representation Learning: analyzes training dynamics via sparse dictionary crosscoders to track interpretable features through pre-training.
Flatness is Necessary, Neural Collapse is Not: Rethinking Generalization via Grokking - Score: 17 (R=9, N=8) - Date: 2025-09-23 - Comment: Representation Learning: disentangles flatness vs. neural collapse via grokking, showing flatness predicts generalization; theoretical link between collapse and flatness.
Bias-variance Tradeoff in Tensor Estimation - Score: 17 (R=9, N=8) - Date: 2025-09-23 - Comment: Low-rank/tensor estimation theory: rank-adaptive bias–variance tradeoff for HOSVD-like estimator; foundational in low-rank representation learning.
Causality-Induced Positional Encoding for Transformer-Based Representation Learning of Non-Sequential Features - Score: 17 (R=9, N=8) - Date: 2025-09-23 - Comment: Model Architecture + Representation Learning: causality-induced positional encodings via DAG-to-hyperbolic embeddings integrated as rotary PEs for Transformers.
Sparse-Autoencoder-Guided Internal Representation Unlearning for Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-09-22 - Comment: Representation Learning: directly intervenes in LLM internal activations via a sparse autoencoder latent space to achieve genuine unlearning (aligning targets with “unknown” representations).
Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification - Score: 17 (R=9, N=8) - Date: 2025-09-22 - Comment: Model Architecture + Representation Learning: unified encoder/decoder mapping to disjoint zones in a shared Gaussian latent space enabling generation, representation learning, and classification.
A Compositional Kernel Model for Feature Learning - Score: 17 (R=9, N=8) - Date: 2025-09-18 - Comment: Representation learning theory: compositional kernel model with guarantees on variable selection and recovery of nonlinear features.
Sparse Neurons Carry Strong Signals of Question Ambiguity in LLMs - Score: 17 (R=9, N=8) - Date: 2025-09-18 - Comment: Representation Learning: identifies sparse neurons encoding question ambiguity and demonstrates controllable behavior via neuron-level manipulation.
Why and How Auxiliary Tasks Improve JEPA Representations - Score: 17 (R=9, N=8) - Date: 2025-09-17 - Comment: Representation Learning: provides theory (no unhealthy collapse) for JEPA with auxiliary tasks, clarifying what distinctions encoders must preserve.
Learning Neural Networks by Neuron Pursuit - Score: 17 (R=9, N=8) - Date: 2025-09-17 - Comment: Matches Representation Learning and training dynamics: analyzes gradient flow near sparse saddle points and introduces a greedy architecture growth/training algorithm (Neuron Pursuit).
Identifiable Autoregressive Variational Autoencoders for Nonlinear and Nonstationary Spatio-Temporal Blind Source Separation - Score: 17 (R=9, N=8) - Date: 2025-09-17 - Comment: Matches Model Architecture/Representation Learning: identifiable autoregressive VAE with identifiability guarantees for nonlinear, nonstationary sources.
Contrastive Network Representation Learning - Score: 17 (R=9, N=8) - Date: 2025-09-16 - Comment: Representation Learning: contrastive edge embedding for networks with adaptive masking and theoretical guarantees (non-asymptotic bounds, minimax optimality).
Why does your graph neural network fail on some graphs? Insights from exact generalisation error - Score: 17 (R=9, N=8) - Date: 2025-09-15 - Comment: Representation Learning: exact generalization error characterization for broad classes of GNNs, providing theoretical insights into structure–feature alignment and generalization.
Representation-Aware Distributionally Robust Optimization: A Knowledge Transfer Framework - Score: 17 (R=9, N=8) - Date: 2025-09-12 - Comment: Representation Learning: introduces representation-aware Wasserstein DRO with theoretical reformulations (seminorm regularization equivalence) and an optimization method on the solution surface.
An entropy formula for the Deep Linear Network - Score: 17 (R=9, N=8) - Date: 2025-09-12 - Comment: Foundational analysis of deep linear networks via Riemannian geometry and an entropy formula—insights into representation/training dynamics.
Performative Thinking? The Brittle Correlation Between CoT Length and Problem Complexity - Score: 17 (R=9, N=8) - Date: 2025-09-10 - Comment: Representation learning/training dynamics: empirical study of intermediate token generation (Chain-of-Thought) vs problem complexity using transformers trained from scratch.
Robust and Adaptive Spectral Method for Representation Multi-Task Learning with Contamination - Score: 17 (R=9, N=8) - Date: 2025-09-09 - Comment: Representation Learning: robust, adaptive spectral method to extract shared representations in contaminated multi-task settings with non-asymptotic guarantees.
Evaluating the Efficiency of Latent Spaces via the Coupling-Matrix - Score: 17 (R=9, N=8) - Date: 2025-09-09 - Comment: Representation Learning: proposes a redundancy index rho(C) using coupling-matrix off-diagonal statistics (energy distance) to quantify inter-dimensional dependencies in latent spaces.
Crosscoding Through Time: Tracking Emergence & Consolidation Of Linguistic Representations Throughout LLM Pretraining - Score: 17 (R=9, N=8) - Date: 2025-09-08 - Comment: Representation Learning: aligns sparse features across checkpoints and introduces a causal metric (RelIE) to track emergence/maintenance of linguistic features during pretraining.
Dynamical Learning in Deep Asymmetric Recurrent Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-09-08 - Comment: Matches Model Architecture (asymmetric deep recurrent networks with sparse excitatory couplings) and Representation Learning (emergent stable manifolds and gradient-free distributed learning dynamics).
Beyond I-Con: Exploring New Dimension of Distance Measures in Representation Learning - Score: 17 (R=9, N=8) - Date: 2025-09-08 - Comment: Matches Representation Learning: introduces a framework exploring alternative divergences and similarity kernels to derive new objectives beyond KL (contrastive/PMI/SNE).
Just-in-time and distributed task representations in language models - Score: 17 (R=9, N=8) - Date: 2025-09-08 - Comment: Matches Representation Learning criterion—empirical analysis of when/where transferable task representations form and evolve during in-context learning in LMs.
Natural Latents: Latent Variables Stable Across Ontologies - Score: 17 (R=9, N=8) - Date: 2025-09-05 - Comment: The paper discusses conditions under which latent variables can be translated between different generative models, contributing to foundational research in representation learning.
Sparse Autoencoder Neural Operators: Model Recovery in Function Spaces - Score: 17 (R=9, N=8) - Date: 2025-09-05 - Comment: The paper extends sparse autoencoders to function spaces, contributing to foundational research in representation learning and model recovery.
Differentiable Entropy Regularization for Geometry and Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-09-05 - Comment: The paper introduces a differentiable estimator of range-partition entropy and applies it to neural networks, contributing to representation learning and model efficiency.
Improving Factuality in LLMs via Inference-Time Knowledge Graph Construction - Score: 17 (R=9, N=8) - Date: 2025-09-05 - Comment: The paper introduces a novel framework for improving factuality in LLMs by constructing knowledge graphs during inference, which aligns with foundational research in LLM behavior and interpretability.
Structure Transfer: an Inference-Based Calculus for the Transformation of Representations - Score: 17 (R=9, N=8) - Date: 2025-09-04 - Comment: The paper introduces a novel calculus for representation transformation across diverse representational systems, which aligns with the core topic of representation learning.
SurGBSA: Learning Representations From Molecular Dynamics Simulations - Score: 17 (R=9, N=8) - Date: 2025-09-04 - Comment: The paper presents a new modeling approach for MD-based representation learning, which aligns with the core topic of representation learning.
Learnable Loss Geometries with Mirror Descent for Scalable and Convergent Meta-Learning - Score: 17 (R=9, N=8) - Date: 2025-09-03 - Comment: The paper presents a novel approach to meta-learning by learning a versatile distance-generating function, which aligns with representation learning and training dynamics in neural networks.
Fisher information flow in artificial neural networks - Score: 17 (R=9, N=8) - Date: 2025-09-03 - Comment: The paper introduces a method to monitor Fisher information flow in neural networks, providing insights into how networks process information, relevant to representation learning.
Geometric origin of adversarial vulnerability in deep learning - Score: 17 (R=9, N=8) - Date: 2025-09-03 - Comment: The paper presents a geometry-aware deep learning framework for improving adversarial robustness, focusing on representation learning and training dynamics.
Beyond Universal Approximation Theorems: Algorithmic Uniform Approximation by Neural Networks Trained with Noisy Data - Score: 17 (R=9, N=8) - Date: 2025-09-03 - Comment: The paper extends universal approximation theorems to noisy data, providing theoretical insights into neural network behavior, relevant to representation learning.
Causal Interpretation of Sparse Autoencoder Features in Vision - Score: 17 (R=9, N=8) - Date: 2025-09-03 - Comment: The paper proposes a method for causal interpretation of sparse autoencoder features, aligning with representation learning and feature learning.
Domain Generalization in-the-Wild: Disentangling Classification from Domain-Aware Representations - Score: 17 (R=9, N=8) - Date: 2025-09-01 - Comment: The paper focuses on enhancing domain awareness in foundational models like CLIP, which aligns with representation learning and foundational model insights. It introduces a novel approach to disentangle classification from domain-aware representations, which is a significant theoretical contribution.
Normalized Maximum Likelihood Code-Length on Riemannian Manifold Data Spaces - Score: 17 (R=9, N=8) - Date: 2025-09-01 - Comment: The paper extends the concept of Normalized Maximum Likelihood to Riemannian manifolds, which is a significant theoretical contribution to model selection and representation learning in non-Euclidean spaces.
Adaptive Heavy-Tailed Stochastic Gradient Descent - Score: 17 (R=9, N=8) - Date: 2025-09-01 - Comment: The paper introduces Adaptive Heavy-Tailed Stochastic Gradient Descent (AHTSGD), which is relevant to representation learning as it provides insights into training dynamics and optimization in neural networks.
Distribution-Aware Feature Selection for SAEs - Score: 17 (R=9, N=8) - Date: 2025-09-01 - Comment: The paper discusses Distribution-Aware Feature Selection for Sparse Autoencoders (SAEs), which is relevant to representation learning and model compression as it involves sparse methods and feature selection.
RelP: Faithful and Efficient Circuit Discovery via Relevance Patching - Score: 17 (R=9, N=8) - Date: 2025-09-01 - Comment: The paper introduces Relevance Patching (RelP), a method improving mechanistic interpretability in neural networks, which is relevant to representation learning and theoretical insights into model behavior.
Expressive Power of Deep Networks on Manifolds: Simultaneous Approximation - Score: 17 (R=8, N=9) - Date: 2025-09-12 - Comment: Representation Learning/Theory: simultaneous Sobolev approximation on manifolds with bounded-weight ReLU^k networks and matching lower bounds, leveraging sparse architectural structure.
Learning to Ponder: Adaptive Reasoning in Latent Space - Score: 16 (R=9, N=7) - Date: 2025-09-30 - Comment: Conditional/Dynamic Networks: instance-adaptive test-time compute via a controller that applies latent steering and learned halting without changing backbone.
Measuring Sparse Autoencoder Feature Sensitivity - Score: 16 (R=9, N=7) - Date: 2025-09-30 - Comment: Representation learning/interpretability: introduces sensitivity metric and evaluation protocol for Sparse Autoencoder features.
Hedonic Neurons: A Mechanistic Mapping of Latent Coalitions in Transformer MLPs - Score: 16 (R=9, N=7) - Date: 2025-09-30 - Comment: Representation Learning: mechanistic interpretability of Transformer MLPs via coalitional game theory to reveal synergistic neuron groups.
MonoCon: A general framework for learning ultra-compact high-fidelity representations using monotonicity constraints - Score: 16 (R=9, N=7) - Date: 2025-09-30 - Comment: Model Compression/Efficiency + Representation Learning: monotonic MLP head imposes functional constraints to learn ultra-compact, robust embeddings across domains.
OrtSAE: Orthogonal Sparse Autoencoders Uncover Atomic Features - Score: 16 (R=9, N=7) - Date: 2025-09-29 - Comment: Representation Learning: Sparse autoencoders with orthogonality regularization to mitigate feature absorption/composition while scaling linearly.
On Theoretical Interpretations of Concept-Based In-Context Learning - Score: 16 (R=9, N=7) - Date: 2025-09-26 - Comment: Theoretical representation learning criterion: provides theory for concept-based in-context learning, including similarity measures, effects of demo size and embedding dimension, explaining when/why CB-ICL works.
Probability Signature: Bridging Data Semantics and Embedding Structure in Language Models - Score: 16 (R=9, N=7) - Date: 2025-09-25 - Comment: Representation Learning: provides mechanistic insight into how data distributions (probability signatures) shape embedding geometry via gradient-flow analysis.
Learning Dynamics of Deep Learning -- Force Analysis of Deep Neural Networks - Score: 16 (R=9, N=7) - Date: 2025-09-25 - Comment: Representation Learning/Training Dynamics: proposes a force-based framework analyzing inter-example influences during training, offering insights into how networks learn.
Towards Provable Emergence of In-Context Reinforcement Learning - Score: 16 (R=9, N=7) - Date: 2025-09-24 - Comment: Representation Learning/Training Dynamics: theoretical analysis showing when Transformer pretraining yields in-context RL.
Interpreting vision transformers via residual replacement model - Score: 16 (R=9, N=7) - Date: 2025-09-23 - Comment: Representation Learning/Interpretability: sparse autoencoder features and residual replacement model to explain ViT circuits across layers.
The Transfer Neurons Hypothesis: An Underlying Mechanism for Language Latent Space Transitions in Multilingual LLMs - Score: 16 (R=9, N=7) - Date: 2025-09-23 - Comment: Representation learning: identifies and validates MLP “transfer neurons” enabling transitions between language-specific and shared latent spaces in multilingual LLMs.
Language models' activations linearly encode training-order recency - Score: 16 (R=9, N=7) - Date: 2025-09-18 - Comment: Representation Learning: reveals that activations linearly encode training-order recency, with successful linear probes—an insight into internal representations and training dynamics.
A Modern Look at Simplicity Bias in Image Classification Tasks - Score: 16 (R=9, N=7) - Date: 2025-09-17 - Comment: Representation Learning: introduces a frequency-aware measure of simplicity bias in CLIP and links inductive bias to generalization/robustness.
Beyond Instance Consistency: Investigating View Diversity in Self-supervised Learning - Score: 16 (R=9, N=7) - Date: 2025-09-16 - Comment: Representation Learning: analyses of instance consistency and view diversity in SSL with an EMD-based estimator guiding view design.
Sparse Coding Representation of 2-way Data - Score: 16 (R=9, N=7) - Date: 2025-09-15 - Comment: Representation Learning and Efficiency: sparse dictionary learning with a low-rank coding model, sample complexity guarantees, and a convex relaxation with convergence.
Beyond I'm Sorry, I Can't: Dissecting Large Language Model Refusal - Score: 16 (R=9, N=7) - Date: 2025-09-15 - Comment: Matches Representation Learning criterion: uses sparse autoencoders on residual-stream activations to identify and causally manipulate refusal-mediating features, providing mechanistic insight via sparse features and interactions.
Fourier Learning Machines: Nonharmonic Fourier-Based Neural Networks for Scientific Machine Learning - Score: 16 (R=9, N=7) - Date: 2025-09-11 - Comment: Model Architecture and Representation Learning: introduces a Fourier-based MLP with cosine activations that learns frequencies/amplitudes/phases, yielding a separable Fourier basis with one-to-one mapping to coefficients.
Towards Interpretable Deep Neural Networks for Tabular Data - Score: 16 (R=9, N=7) - Date: 2025-09-11 - Comment: Matches Representation Learning and Model Architecture: employs a sparse autoencoder to learn a dictionary of monosemantic features with interpretable latent components for prediction.
FAVAE-Effective Frequency Aware Latent Tokenizer - Score: 16 (R=9, N=7) - Date: 2025-09-09 - Comment: Model Architecture and Representation Learning: frequency-aware VAE tokenizer with wavelet-based decoupling of low/high frequencies to improve latent representations and high-frequency reconstruction.
The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs - Score: 16 (R=9, N=7) - Date: 2025-09-05 - Comment: The paper provides insights into LLM behavior and interpretability, which aligns with the foundational research on LLMs.
Graph Contrastive Learning versus Untrained Baselines: The Role of Dataset Size - Score: 16 (R=9, N=7) - Date: 2025-09-03 - Comment: The paper discusses Graph Contrastive Learning, which is a form of representation learning, and examines its performance relative to untrained baselines, providing insights into training dynamics and dataset size effects.
Discrete Variational Autoencoding via Policy Search - Score: 16 (R=8, N=8) - Date: 2025-09-30 - Comment: Matches Autoencoders/Discrete Representation Learning: new policy-search-based training for discrete VAEs avoiding reparameterization, with scaling to high-dimensional data.
LLM DNA: Tracing Model Evolution via Functional Representations - Score: 16 (R=8, N=8) - Date: 2025-09-30 - Comment: Representation Learning: defines a low-dimensional, bi-Lipschitz functional representation (LLM DNA) of model behavior with theoretical guarantees and a training-free extraction pipeline.
Rethinking JEPA: Compute-Efficient Video SSL with Frozen Teachers - Score: 16 (R=8, N=8) - Date: 2025-09-30 - Comment: Representation Learning + Efficiency: SALT replaces EMA with a frozen teacher for masked-latent prediction, improving compute-efficiency and scalability in video SSL.
Let LLMs Speak Embedding Languages: Generative Text Embeddings via Iterative Contrastive Refinement - Score: 16 (R=8, N=8) - Date: 2025-09-30 - Comment: Matches Representation Learning: generative iterative refinement for contrastive sentence embeddings using autoregressive LLMs.
Define latent spaces by example: optimisation over the outputs of generative models - Score: 16 (R=8, N=8) - Date: 2025-09-30 - Comment: Representation Learning/Control: introduces surrogate Euclidean latent spaces extracted from generative models for constraint-aware optimization without retraining.
Dense associative memory on the Bures-Wasserstein space - Score: 16 (R=8, N=8) - Date: 2025-09-30 - Comment: Model Architecture + Representation Learning: extends dense associative memory to the Bures–Wasserstein space with storage/retrieval theory, a foundational representational framework.
Toward a Physics of Deep Learning and Brains - Score: 16 (R=8, N=8) - Date: 2025-09-29 - Comment: Representation learning/training dynamics: non-equilibrium statistical physics framework (quasi-criticality, susceptibility, universality classes) explaining when networks learn best.
Spectral Collapse Drives Loss of Plasticity in Deep Continual Learning - Score: 16 (R=8, N=8) - Date: 2025-09-29 - Comment: Representation Learning/Training Dynamics: links Hessian spectral collapse to loss of plasticity and proposes regularizers (feature-rank, L2) to preserve trainability.
Why High-rank Neural Networks Generalize?: An Algebraic Framework with RKHSs - Score: 16 (R=8, N=8) - Date: 2025-09-29 - Comment: Representation Learning/Training Dynamics: new Rademacher complexity bounds via RKHS/Koopman explain why high-rank weight matrices generalize.
Backdoor Attribution: Elucidating and Controlling Backdoor in Language Models - Score: 16 (R=8, N=8) - Date: 2025-09-29 - Comment: Representation Learning: mechanistic attribution of backdoor features and attention heads; Sparsity: identifies a sparse set of heads enabling backdoor control via a single vector intervention.
Differentiable Structure Learning for General Binary Data - Score: 16 (R=8, N=8) - Date: 2025-09-29 - Comment: Matches Representation Learning: general differentiable structure learning for binary variables with identifiability analysis and a single differentiable optimization, capturing arbitrary dependencies.
Bispectral OT: Dataset Comparison using Symmetry-Aware Optimal Transport - Score: 16 (R=8, N=8) - Date: 2025-09-26 - Comment: Representation Learning: symmetry-aware optimal transport using bispectral (group-Fourier) invariants to compare datasets under group actions.
Latent Twins - Score: 16 (R=8, N=8) - Date: 2025-09-26 - Comment: Representation learning/operator learning criterion: unifying latent operator surrogate framework (Latent Twins) with theoretical approximation properties for ODEs/PDEs, bridging model reduction and learned representations.
Function Spaces Without Kernels: Learning Compact Hilbert Space Representations - Score: 16 (R=8, N=8) - Date: 2025-09-26 - Comment: Model Compression and Efficiency + Representation Learning: learns compact neural bases for function spaces with progressive growth and pruning strategies, kernel-theoretic view, and generalization bounds.
SIM-CoT: Supervised Implicit Chain-of-Thought - Score: 16 (R=8, N=8) - Date: 2025-09-25 - Comment: Model Architecture/Representation Learning: step-level supervision via an auxiliary decoder to stabilize and diversify latent states in implicit CoT, improving training dynamics with no inference overhead.
Unveiling the Role of Learning Rate Schedules via Functional Scaling Laws - Score: 16 (R=8, N=8) - Date: 2025-09-24 - Comment: Representation learning/training dynamics: theoretical analysis of SGD dynamics and learning-rate schedules via a functional scaling law.
Soft Tokens, Hard Truths - Score: 16 (R=8, N=8) - Date: 2025-09-24 - Comment: Representation Learning/Training Dynamics: learning continuous (soft) Chain-of-Thought tokens via RL to enhance reasoning diversity and performance.
Unveiling m-Sharpness Through the Structure of Stochastic Gradient Noise - Score: 16 (R=8, N=8) - Date: 2025-09-23 - Comment: Matches Representation Learning: theoretical SDE analysis of stochastic gradient noise (SGN) in sharpness-aware minimization (SAM) explains m-sharpness and motivates a new sharpness-weighted sampling method.
Risk Comparisons in Linear Regression: Implicit Regularization Dominates Explicit Regularization - Score: 16 (R=8, N=8) - Date: 2025-09-23 - Comment: Representation Learning/Training Dynamics: instance-wise finite-sample risk comparisons showing implicit regularization (GD) dominates explicit (ridge) under broad conditions.
Universal Learning of Stochastic Dynamics for Exact Belief Propagation using Bernstein Normalizing Flows - Score: 16 (R=8, N=8) - Date: 2025-09-22 - Comment: Matches model-architecture innovation (normalizing-flow design with Bernstein polynomials) enabling analytical belief propagation; also advances representation learning of stochastic dynamics.
Synthetic bootstrapped pretraining - Score: 16 (R=8, N=8) - Date: 2025-09-22 - Comment: Representation Learning/Foundation Models: proposes synthetic bootstrapped pretraining that models inter-document relations to improve LM pretraining.
Beyond Correlation: Causal Multi-View Unsupervised Feature Selection Learning - Score: 16 (R=8, N=8) - Date: 2025-09-18 - Comment: Representation Learning: causal regularization for unsupervised multi-view feature selection to mitigate confounding and select informative features.
Spectral Bottleneck in Deep Neural Networks: Noise is All You Need - Score: 16 (R=8, N=8) - Date: 2025-09-17 - Comment: Matches Representation Learning/training dynamics: proposes target-aware weight initialization with noise to overcome spectral bias in INRs; analyzes activation spectra and NTK eigenbasis.
Deceptive Risk Minimization: Out-of-Distribution Generalization by Deceiving Distribution Shift Detectors - Score: 16 (R=8, N=8) - Date: 2025-09-16 - Comment: Representation Learning: objective for invariant features by deceiving distribution shift detectors (conformal martingales) to improve OOD generalization without domain partitioning.
Kernel-based Stochastic Approximation Framework for Nonlinear Operator Learning - Score: 16 (R=8, N=8) - Date: 2025-09-16 - Comment: Representation Learning: theoretical framework for nonlinear operator learning with general operator-valued kernels and dimension-free convergence rates.
A Discrepancy-Based Perspective on Dataset Condensation - Score: 16 (R=8, N=8) - Date: 2025-09-15 - Comment: Representation Learning: frames dataset condensation under discrepancy-based distribution matching, providing a unified theoretical objective beyond task-specific generalization.
ReBaNO: Reduced Basis Neural Operator Mitigating Generalization Gaps and Achieving Discretization Invariance - Score: 16 (R=8, N=8) - Date: 2025-09-12 - Comment: Strong match to Model Architecture and Representation Learning: reduced-basis–driven adaptive network construction yielding compact operator learners with discretization invariance.
ALICE: An Interpretable Neural Architecture for Generalization in Substitution Ciphers - Score: 16 (R=8, N=8) - Date: 2025-09-10 - Comment: Matches Model Architecture and Representation Learning: introduces a bijective decoding head (Gumbel-Sinkhorn) in a Transformer and analyzes layer-wise strategies for combinatorial generalization.
Learning from one graph: transductive learning guarantees via the geometry of small random worlds - Score: 16 (R=8, N=8) - Date: 2025-09-09 - Comment: Theoretical Foundations: transductive learning guarantees via geometric concentration and extensions to GCNs, providing rates under single-graph settings.
Probabilistic operator learning: generative modeling and uncertainty quantification for foundation models of differential equations - Score: 16 (R=8, N=8) - Date: 2025-09-08 - Comment: Representation Learning: probabilistic/Bayesian framework for operator foundation models (ICON) with generative posterior and uncertainty quantification.
Learning Laplacian Eigenvectors: a Pre-training Method for Graph Neural Networks - Score: 16 (R=8, N=8) - Date: 2025-09-04 - Comment: The paper proposes a novel pre-training method for GNNs by learning Laplacian eigenvectors, which is relevant to representation learning.
Accelerating PDE Solvers with Equation-Recast Neural Operator Preconditioning - Score: 16 (R=8, N=8) - Date: 2025-09-03 - Comment: The paper introduces a new paradigm for accelerating PDE solvers with neural operator preconditioning, which is relevant to AI for science and emerging trends.
Theory Foundation of Physics-Enhanced Residual Learning - Score: 16 (R=8, N=8) - Date: 2025-09-03 - Comment: The paper provides a theoretical foundation for physics-enhanced residual learning, which is relevant to AI for science and emerging trends.
Weighted Support Points from Random Measures: An Interpretable Alternative for Generative Modeling - Score: 16 (R=8, N=8) - Date: 2025-09-01 - Comment: The paper introduces a new generative modeling framework using random weighted support points, which offers a compact representation of data. This aligns with representation learning and introduces a novel method that is interpretable and efficient, providing a theoretical contribution to generative modeling.
Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Matches Representation Learning/Interpretability: proposes a VSA-based probing method to decode structured concepts from LLM residual stream, addressing SAE/DLA limitations.
Beyond Softmax: A Natural Parameterization for Categorical Random Variables - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Model Architecture/Representation Learning: proposes a replacement for softmax (catnat) via hierarchical binary splits with information-geometric justification (diagonal Fisher), improving gradient-based learning.
Interpretable Kernel Representation Learning at Scale: A Unified Framework Utilizing Nyström Approximation - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Matches Representation Learning criterion: scalable kernel-based representation learning via Nyström approximation with interpretability.
Semantic Compression via Multimodal Representation Learning - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Matches Compression/Efficiency and Representation Learning: semantic compression of multimodal embeddings via modality-gap reduction and centroiding.
Dual Mechanisms of Value Expression: Intrinsic vs. Prompted Values in LLMs - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Representation Learning/Mechanistic interpretability: identifies and contrasts intrinsic vs prompted value vectors/neurons in LLMs.
Uncovering Grounding IDs: How External Cues Shape Multi-Modal Binding - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Representation Learning: uncovers latent Grounding IDs induced by external cues, explaining improved multimodal binding and attention via causal/representational analyses.
Toward Preference-aligned Large Language Models via Residual-based Model Steering - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Matches Efficiency/Representation Learning: training-free residual-stream steering vectors for preference alignment applied at inference time.
Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Representation Learning/Training dynamics: mechanistic analysis of subliminal learning in distillation, identifying divergence tokens and early-layer roles.
Graph Your Own Prompt - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Matches Representation Learning: graph consistency regularization aligning feature-similarity graphs with class-aware prediction graphs via parameter-free layers.
Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Representation Learning: layer-wise chain-of-embedding analysis revealing a Visual Integration Point and TVI metric to quantify language prior in LVLMs.
Understanding Catastrophic Interference On the Identifibility of Latent Representations - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Representation Learning: theoretical framework linking catastrophic interference to identifiability of shared latent variables; method to learn shared representations to mitigate forgetting.
What Matters More For In-Context Learning under Matched Compute Budgets: Pretraining on Natural Text or Incorporating Targeted Synthetic Examples? - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Representation Learning: mechanism-focused study of induction circuits under iso-FLOPs with targeted synthetic curricula and head-level telemetry/ablations informing ICL dynamics.
Concept activation vectors: a unifying view and adversarial attacks - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Representation Learning/Interpretability: probabilistic theory of Concept Activation Vectors with variance analysis and adversarial vulnerability.
Transformers Can Learn Connectivity in Some Graphs but Not Others - Score: 15 (R=8, N=7) - Date: 2025-09-29 - Comment: Matches Representation Learning: analyzes transformers’ ability to learn transitive relations/connectivity across graph families and scales, yielding insights into what structure transformers can encode.
Sharpness-Aware Minimization Can Hallucinate Minimizers - Score: 15 (R=8, N=7) - Date: 2025-09-29 - Comment: Representation Learning/Training Dynamics: theoretical analysis of SAM shows convergence to non-minimizers and proposes a remedy.
Prophecy: Inferring Formal Properties from Neuron Activations - Score: 15 (R=8, N=7) - Date: 2025-09-29 - Comment: Representation Learning/Interpretability: infers formal rules from neuron activation patterns for verification, monitoring, and explanation of neural behavior.
Understanding and Enhancing Mask-Based Pretraining towards Universal Representations - Score: 15 (R=8, N=7) - Date: 2025-09-29 - Comment: Representation Learning: theoretical characterization of mask-based pretraining and a simple masking scheme improving learned representations.
A Data-driven Typology of Vision Models from Integrated Representational Metrics - Score: 15 (R=8, N=7) - Date: 2025-09-29 - Comment: Matches Representation Learning: integrated representational similarity metrics and Similarity Network Fusion to typologize model representations (geometry, tuning, decodability).
IndiSeek learns information-guided disentangled representations - Score: 15 (R=8, N=7) - Date: 2025-09-29 - Comment: Representation Learning: disentanglement via independence-enforcing objective plus reconstruction loss that bounds conditional mutual information.
SAEmnesia: Erasing Concepts in Diffusion Models with Sparse Autoencoders - Score: 15 (R=8, N=7) - Date: 2025-09-29 - Comment: Matches Representation Learning and Sparsity: supervised sparse autoencoders to achieve one-to-one concept–neuron mappings enabling targeted unlearning in diffusion models.
No Prior, No Leakage: Revisiting Reconstruction Attacks in Trained Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-09-26 - Comment: Representation Learning/Training Dynamics: theoretical analysis of limits of reconstruction attacks under implicit bias, clarifying privacy–generalization interplay.
RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs - Score: 15 (R=8, N=7) - Date: 2025-09-26 - Comment: Representation Learning/Training dynamics: graph-based analysis of reasoning paths shows complementary effects of SFT vs RL on LLM reasoning, offering mechanistic insights into how training shapes representations.
Reinforcement Learning Fine-Tuning Enhances Activation Intensity and Diversity in the Internal Circuitry of LLMs - Score: 15 (R=8, N=7) - Date: 2025-09-26 - Comment: Representation Learning/Training Dynamics: probes how RL fine-tuning reshapes internal activations (intensity/diversity) compared to SFT/DPO.
SiNGER: A Clearer Voice Distills Vision Transformers Further - Score: 15 (R=8, N=7) - Date: 2025-09-26 - Comment: Matches Model Compression and Efficiency + Representation Learning: a distillation framework using nullspace-guided feature refinement with LoRA adapters to suppress artifacts.
CLUE: Conflict-guided Localization for LLM Unlearning Framework - Score: 15 (R=8, N=7) - Date: 2025-09-26 - Comment: Representation Learning/Mechanistic interpretability: disentangles and localizes forget vs retain circuits in LLMs with CNF-based neuron attribution and targeted interventions, giving insights into how knowledge is encoded.
Alignment Unlocks Complementarity: A Framework for Multiview Circuit Representation Learning - Score: 15 (R=8, N=7) - Date: 2025-09-26 - Comment: Representation Learning: multiview self-supervision with functional alignment (Equivalence Alignment Loss) enabling effective cross-view masked modeling.
Uncovering Graph Reasoning in Decoder-only Transformers with Circuit Tracing - Score: 15 (R=8, N=7) - Date: 2025-09-25 - Comment: Representation Learning: circuit tracing reveals token-merging and structural memorization mechanisms in decoder-only Transformers for graph reasoning.
Feature Dynamics as Implicit Data Augmentation: A Depth-Decomposed View on Deep Neural Network Generalization - Score: 15 (R=8, N=7) - Date: 2025-09-25 - Comment: Representation Learning/Training dynamics: temporal feature consistency across depth and anisotropic SGD noise linked to generalization.
Staying on the Manifold: Geometry-Aware Noise Injection - Score: 15 (R=8, N=7) - Date: 2025-09-25 - Comment: Representation Learning: geometry-aware noise injection on learned manifolds to regularize gradients and improve generalization.
Projective Kolmogorov Arnold Neural Networks (P-KANs): Entropy-Driven Functional Space Discovery for Interpretable Machine Learning - Score: 15 (R=8, N=7) - Date: 2025-09-25 - Comment: Model Architecture and Representation Learning: entropy-driven P-KAN training discovers lower-parameter functional bases (e.g., Fourier/Chebyshev) for compression and interpretability.
Interpreting ResNet-based CLIP via Neuron-Attention Decomposition - Score: 15 (R=8, N=7) - Date: 2025-09-25 - Comment: Representation Learning: decomposes CLIP-ResNet computation into neuron–attention-head paths, finds sparse influential pairs and aligns them with directions/text, yielding interpretable representation units.
Latent Iterative Refinement Flow: A Geometric-Constrained Approach for Few-Shot Generation - Score: 15 (R=8, N=7) - Date: 2025-09-25 - Comment: Representation Learning/Autoencoders: manifold-preserving autoencoder and contractive correction operator with convergence guarantees for few-shot generation.
How to inject knowledge efficiently? Knowledge Infusion Scaling Law for Pre-training Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-09-25 - Comment: Representation Learning/Training dynamics: proposes a scaling law for knowledge infusion to balance specialization and forgetting during pretraining.
Quantifying Compositionality of Classic and State-of-the-Art Embeddings - Score: 15 (R=8, N=7) - Date: 2025-09-25 - Comment: Representation Learning: quantifies additive compositionality in embeddings via CCA and reconstruction, with layer-wise and training-stage analysis.
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT - Score: 15 (R=8, N=7) - Date: 2025-09-24 - Comment: Representation Learning/Training Dynamics: structural analysis of chain-of-thought with a new metric (Failed-Step Fraction) guiding structure-aware test-time scaling.
Recovering Wasserstein Distance Matrices from Few Measurements - Score: 15 (R=8, N=7) - Date: 2025-09-24 - Comment: Efficiency and Representation Learning: Nyström and matrix completion approaches to recover costly Wasserstein distance matrices with stability analysis for MDS, enabling scalable manifold learning.
Exploring Heterophily in Graph-level Tasks - Score: 15 (R=8, N=7) - Date: 2025-09-24 - Comment: Representation Learning/Theory: spectral analysis of heterophily for graph-level tasks yielding guidance for GNN architecture design.
Probabilistic Geometric Principal Component Analysis with application to neural data - Score: 15 (R=8, N=7) - Date: 2025-09-24 - Comment: Representation Learning: extends PPCA to explicitly model data around a known nonlinear manifold and derives an EM algorithm (PGPCA) for manifold-aware probabilistic dimensionality reduction.
Statistical Insight into Meta-Learning via Predictor Subspace Characterization and Quantification of Task Diversity - Score: 15 (R=8, N=7) - Date: 2025-09-24 - Comment: Representation learning theory: latent predictor subspace characterization and task diversity quantification for meta-learning performance analysis.
Self Identity Mapping - Score: 15 (R=8, N=7) - Date: 2025-09-24 - Comment: Representation Learning/Regularization: inverse-mapping regularizer preserves information and smooths gradients; model-agnostic and plug-and-play.
ConceptFlow: Hierarchical and Fine-grained Concept-Based Explanation for Convolutional Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-09-24 - Comment: Representation Learning/Interpretability: assigns concepts to filters and models concept propagation across CNN layers via concept attentions and transition matrices.
Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels - Score: 15 (R=8, N=7) - Date: 2025-09-24 - Comment: Analyzes SFT effects on knowledge at token/parameter levels; identifies non-contributory parameter updates (Representation Learning; training dynamics).
λ-Orthogonality Regularization for Compatible Representation Learning - Score: 15 (R=8, N=7) - Date: 2025-09-23 - Comment: Representation Learning: λ-orthogonality regularization to learn near-orthogonal affine mappings that preserve latent geometry while aligning representations across independently trained models.
A Closer Look at Model Collapse: From a Generalization-to-Memorization Perspective - Score: 15 (R=8, N=7) - Date: 2025-09-23 - Comment: Matches Representation Learning: analyzes training dynamics (generalization→memorization) and proposes entropy-based data selection to mitigate collapse.
Hierarchical Retrieval: The Geometry and a Pretrain-Finetune Recipe - Score: 15 (R=8, N=7) - Date: 2025-09-23 - Comment: Matches Representation Learning: analyzes embedding geometry limits of dual encoders in hierarchical retrieval and proposes a pretrain-finetune training recipe.
EMPEROR: Efficient Moment-Preserving Representation of Distributions - Score: 15 (R=8, N=7) - Date: 2025-09-23 - Comment: Representation Learning: moment-preserving distributional descriptors (sliced GMM moments) with identifiability and finite-sample guarantees; efficient alternative to global pooling.
Stabilizing Information Flow Entropy: Regularization for Safe and Interpretable Autonomous Driving Perception - Score: 15 (R=8, N=7) - Date: 2025-09-23 - Comment: Representation Learning/Training Dynamics: entropy-based regularizer enforcing smooth mutual information flow and monotonic entropy decay across layers.
Digging Into the Internal: Causality-Based Analysis of LLM Function Calling - Score: 15 (R=8, N=7) - Date: 2025-09-23 - Comment: Representation/Training Dynamics: layer- and token-level causal interventions to analyze how function calling alters internal computation and compliance.
MTS-DMAE: Dual-Masked Autoencoder for Unsupervised Multivariate Time Series Representation Learning - Score: 15 (R=8, N=7) - Date: 2025-09-22 - Comment: Model Architecture and Representation Learning: proposes a dual-masked autoencoder for multivariate time series with teacher-guided latent estimation and feature-level alignment.
Detail Across Scales: Multi-Scale Enhancement for Full Spectrum Neural Representations - Score: 15 (R=8, N=7) - Date: 2025-09-22 - Comment: Model Architecture/Representation: wavelet-informed multi-scale implicit neural representation with a fine-scale kernel for high-frequency detail under compact models.
Computing Linear Regions in Neural Networks with Skip Connections - Score: 15 (R=8, N=7) - Date: 2025-09-22 - Comment: Representation Learning: algorithms to compute linear regions in piecewise-linear networks (including skip connections) via tropical geometry, yielding insights into training dynamics/overfitting.
Stochastic Sample Approximations of (Local) Moduli of Continuity - Score: 15 (R=8, N=7) - Date: 2025-09-22 - Comment: Matches representation-learning/robustness theory: stochastic approximation of local moduli of continuity to assess neural network robustness in closed-loop use.
Precision Neural Networks: Joint Graph And Relational Learning - Score: 15 (R=8, N=7) - Date: 2025-09-19 - Comment: Model Architecture: introduces Precision Neural Networks performing graph convolutions on the inverse covariance and jointly learns the precision matrix; Representation Learning: task-aware structure learning with theoretical bounds and sparsity via conditional independence.
DeCoP: Enhancing Self-Supervised Time Series Representation with Dependency Controlled Pre-training - Score: 15 (R=8, N=7) - Date: 2025-09-19 - Comment: Representation Learning: self-supervised pretraining for time series with explicit multi-scale dependency modeling (IPN, DCL, ICM) to improve generalization.
Data coarse graining can improve model performance - Score: 15 (R=8, N=7) - Date: 2025-09-19 - Comment: Representation Learning: analytical study of data coarse graining and its nonmonotonic effect on generalization risk in high-dimensional regression.
Curvature as a tool for evaluating dimensionality reduction and estimating intrinsic dimension - Score: 15 (R=8, N=7) - Date: 2025-09-18 - Comment: Representation learning evaluation: curvature-based metric to assess dimensionality reduction quality and estimate intrinsic dimension.
Evaluation Awareness Scales Predictably in Open-Weights Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-09-18 - Comment: Matches Representation Learning: analyzes internal activations with linear probes to reveal a scaling law for evaluation awareness in LLMs, offering insight into encoded context.
RepIt: Representing Isolated Targets to Steer Language Models - Score: 15 (R=8, N=7) - Date: 2025-09-17 - Comment: Matches Representation Learning/Model Analysis: isolates concept-specific activation directions to steer LLMs with neuron-level localization.
MEUV: Achieving Fine-Grained Capability Activation in Large Language Models via Mutually Exclusive Unlock Vectors - Score: 15 (R=8, N=7) - Date: 2025-09-17 - Comment: Representation Learning: factorizes refusal direction into near-orthogonal topic-aligned vectors to control capability activation in LLMs.
Disentangling Content from Style to Overcome Shortcut Learning: A Hybrid Generative-Discriminative Learning Framework - Score: 15 (R=8, N=7) - Date: 2025-09-17 - Comment: Representation Learning: explicit content–style disentanglement via analytical projection and invariance pretraining to mitigate shortcut learning.
Feature Space Topology Control via Hopkins Loss - Score: 15 (R=8, N=7) - Date: 2025-09-17 - Comment: Matches Representation Learning: introduces a new loss (Hopkins loss) to directly shape feature-space topology; evaluated also with nonlinear bottleneck autoencoders.
A Differential Manifold Perspective and Universality Analysis of Continuous Attractors in Artificial Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-09-17 - Comment: Matches Representation Learning theory: differential-manifold framework for continuous attractors; links attractor properties to Jacobian eigenvalues and singular value stratification.
From the Gradient-Step Denoiser to the Proximal Denoiser and their associated convergent Plug-and-Play algorithms - Score: 15 (R=8, N=7) - Date: 2025-09-15 - Comment: Optimization/Representation Learning: Plug-and-Play with denoisers trained to be explicit gradient/prox operators and associated convergent algorithms.
Semantic Concentration for Self-Supervised Dense Representations Learning - Score: 15 (R=8, N=7) - Date: 2025-09-12 - Comment: Representation Learning: proposes explicit semantic concentration for dense SSL via a noise-tolerant AP-based ranking loss and an object-aware prototype filtering mechanism for patch features.
Group Distributionally Robust Machine Learning under Group Level Distributional Uncertainty - Score: 15 (R=8, N=7) - Date: 2025-09-12 - Comment: Representation Learning/Robust Training: group-level Wasserstein DRO to optimize worst-group performance under distributional uncertainty with a descent–ascent algorithm and convergence results.
RoseCDL: Robust and Scalable Convolutional Dictionary Learning for Rare-event Detection - Score: 15 (R=8, N=7) - Date: 2025-09-10 - Comment: Representation Learning: introduces a robust, scalable convolutional dictionary learning algorithm for unsupervised pattern discovery.
Kernel VICReg for Self-Supervised Learning in Reproducing Kernel Hilbert Space - Score: 15 (R=8, N=7) - Date: 2025-09-10 - Comment: Matches Representation Learning: kernelizes the VICReg SSL objective in RKHS with Hilbert–Schmidt norms, offering a principled nonlinear representation learning formulation that avoids collapse.
PAC-Bayesian Generalization Bounds for Graph Convolutional Networks on Inductive Node Classification - Score: 15 (R=8, N=7) - Date: 2025-09-09 - Comment: Representation Learning/Training Dynamics: PAC-Bayesian generalization bounds for GCNs with dependent, non-stationary nodes; foundational theory on generalization.
Focusing by Contrastive Attention: Enhancing VLMs' Visual Reasoning - Score: 15 (R=8, N=7) - Date: 2025-09-09 - Comment: Representation Learning/Attention Mechanisms: training-free contrastive attention to isolate task-relevant visual signals and reduce attention entropy effects.
Rethinking Reasoning Quality in Large Language Models through Enhanced Chain-of-Thought via RL - Score: 15 (R=8, N=7) - Date: 2025-09-09 - Comment: Training dynamics/representation learning: RL-based reward shaping (Reasoning Quality Reward, Dynamic Length Advantage) to improve CoT reasoning in LLMs.
time2time: Causal Intervention in Hidden States to Simulate Rare Events in Time Series Foundation Models - Score: 15 (R=8, N=7) - Date: 2025-09-09 - Comment: Representation Learning: causal interventions on Transformer hidden states reveal and control semantic latent concepts in TS foundation models.
ProtSAE: Disentangling and Interpreting Protein Language Models via Semantically-Guided Sparse Autoencoders - Score: 15 (R=8, N=7) - Date: 2025-09-09 - Comment: Representation Learning/Interpretability with Sparsity: semantically-guided sparse autoencoders to disentangle features in PLMs while maintaining reconstruction fidelity.
Natural Spectral Fusion: p-Exponent Cyclic Scheduling and Early Decision-Boundary Alignment in First-Order Optimization - Score: 15 (R=8, N=7) - Date: 2025-09-08 - Comment: Matches Representation Learning (training dynamics via spectral bias analysis) and Model Efficiency (cyclic p-exponent adaptive-moment scheduling that reduces training cost).
Delta Activations: A Representation for Finetuned Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-09-05 - Comment: The paper introduces Delta Activations, a method for representing finetuned LLMs, which is relevant to representation learning and insights into LLM behavior.
Interpretable Clustering with Adaptive Heterogeneous Causal Structure Learning in Mixed Observational Data - Score: 15 (R=8, N=7) - Date: 2025-09-05 - Comment: The paper presents a framework for interpretable clustering with causal structure learning, which is relevant to representation learning and emerging trends in foundational research.
Nonnegative matrix factorization and the principle of the common cause - Score: 15 (R=8, N=7) - Date: 2025-09-05 - Comment: The paper explores the relationship between nonnegative matrix factorization and the principle of the common cause, contributing to representation learning.
Breaking the Mirror: Activation-Based Mitigation of Self-Preference in LLM Evaluators - Score: 15 (R=8, N=7) - Date: 2025-09-05 - Comment: The paper addresses self-preference bias in LLM evaluators, which is relevant to LLM behavior and interpretability.
CEHR-GPT: A Scalable Multi-Task Foundation Model for Electronic Health Records - Score: 15 (R=8, N=7) - Date: 2025-09-05 - Comment: The paper presents CEHR-GPT, a foundation model for EHRs, focusing on feature representation and temporal reasoning, which aligns with foundational research in representation learning and model architecture.
Unlearning That Lasts: Utility-Preserving, Robust, and Almost Irreversible Forgetting in LLMs - Score: 15 (R=8, N=7) - Date: 2025-09-04 - Comment: The paper introduces a new method for unlearning in LLMs, which is relevant to foundational research in LLM behavior and interpretability.
Structured Basis Function Networks: Loss-Centric Multi-Hypothesis Ensembles with Controllable Diversity - Score: 15 (R=8, N=7) - Date: 2025-09-04 - Comment: The paper introduces Structured Basis Function Networks, linking multi-hypothesis prediction and ensembling, which is relevant to representation learning.
Deep Research is the New Analytics System: Towards Building the Runtime for AI-Driven Analytics - Score: 15 (R=8, N=7) - Date: 2025-09-04 - Comment: The paper frames Deep Research as an analytics system and works towards a runtime for AI-driven analytics.
Contrastive clustering based on regular equivalence for influential node identification in complex networks - Score: 15 (R=8, N=7) - Date: 2025-09-04 - Comment: The paper presents a novel deep unsupervised framework for influential node identification using contrastive clustering, which aligns with the interest in representation learning and contrastive methods.
VASSO: Variance Suppression for Sharpness-Aware Minimization - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper introduces VASSO for sharpness-aware minimization, which is relevant to representation learning and model optimization.
Differentiable Expectation-Maximisation and Applications to Gaussian Mixture Model Optimal Transport - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper presents differentiable EM for Gaussian Mixture Models, which is relevant to representation learning and model architecture.
Towards Comprehensive Information-theoretic Multi-view Learning - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper proposes a new information-theoretic framework for multi-view learning, which is relevant to representation learning.
Fake & Square: Training Self-Supervised Vision Transformers with Synthetic Data and Synthetic Hard Negatives - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper investigates synthetic data and hard negatives in self-supervised learning for vision transformers, relevant to representation learning.
Causal representation learning from network data - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper introduces a framework for causal representation learning from network data, which is relevant to representation learning.
VISP: Volatility Informed Stochastic Projection for Adaptive Regularization - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper introduces an adaptive regularization method using gradient volatility, which is relevant to representation learning and model architecture.
Unraveling LLM Jailbreaks Through Safety Knowledge Neurons - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper presents a neuron-level interpretability method for LLMs, relevant to theoretical insights into LLM behavior.
Double Descent and Overparameterization in Particle Physics Data - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper explores overparameterization and double descent in particle physics data, providing insights into training dynamics and model behavior.
Reward-Weighted Sampling: Enhancing Non-Autoregressive Characteristics in Masked Diffusion LLMs - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper proposes a novel decoding strategy for masked diffusion models, which is relevant to model architecture and representation learning.
Optimized Weight Initialization on the Stiefel Manifold for Deep ReLU Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper introduces an optimized weight initialization method on the Stiefel manifold for deep ReLU networks, addressing issues like dying ReLU and gradient instability, which is relevant to representation learning and training dynamics.
Teaching AI to Remember: Insights from Brain-Inspired Replay in Continual Learning - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper explores brain-inspired replay mechanisms in continual learning, focusing on representation learning and training dynamics in neural networks.
Generalization vs. Memorization in Autoregressive Deep Learning: Or, Examining Temporal Decay of Gradient Coherence - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: The paper explores generalization vs. memorization in autoregressive models, providing insights into model behavior, which is relevant to representation learning.
Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation - Score: 15 (R=7, N=8) - Date: 2025-09-19 - Comment: Matches Representation Learning (training dynamics): a label-free RL objective coupling majority selection with novelty to prevent entropy collapse and improve generalization in LMs.
E-ROBOT: a dimension-free method for robust statistics and machine learning via Schrödinger bridge - Score: 15 (R=7, N=8) - Date: 2025-09-16 - Comment: Representation Learning criterion: introduces a robust OT loss (robust Sinkhorn divergence via entropic regularization) with a dimension-free O(n^{-1/2}) sample complexity guarantee.
Modified Loss of Momentum Gradient Descent: Fine-Grained Analysis - Score: 15 (R=7, N=8) - Date: 2025-09-11 - Comment: Representation Learning criterion: fine-grained theoretical analysis of optimization dynamics (heavy-ball momentum) via modified equations and principal flow.
Probabilistic Modeling of Latent Agentic Substructures in Deep Neural Networks - Score: 15 (R=7, N=8) - Date: 2025-09-09 - Comment: Representation/Theory: probabilistic framework for compositional latent subagents in neural networks via weighted log pooling, yielding insights into internal alignment dynamics.
SupCLAP: Controlling Optimization Trajectory Drift in Audio-Text Contrastive Learning with Support Vector Regularization - Score: 14 (R=7, N=7) - Date: 2025-09-26 - Comment: Matches Representation Learning: proposes Support Vector Regularization to control optimization trajectory drift in contrastive learning by shaping negative-sample effects.
Shaping Initial State Prevents Modality Competition in Multi-modal Fusion: A Two-stage Scheduling Framework via Fast Partial Information Decomposition - Score: 14 (R=7, N=7) - Date: 2025-09-26 - Comment: Representation Learning: information-theoretic training dynamics (mutual information and differentiable Partial Information Decomposition) to shape initialization and mitigate modality competition in multi-modal fusion.
Implicit Augmentation from Distributional Symmetry in Turbulence Super-Resolution - Score: 14 (R=7, N=7) - Date: 2025-09-26 - Comment: Representation Learning: analyzes when CNNs acquire rotational equivariance from data (implicit augmentation), offering insights into learned symmetry and training dynamics.
LatentGuard: Controllable Latent Steering for Robust Refusal of Attacks and Reliable Response Generation - Score: 14 (R=7, N=7) - Date: 2025-09-25 - Comment: Representation Learning: supervised VAE on internal MLP activations to learn disentangled, controllable latent factors for safety steering.
A Unified Noise-Curvature View of Loss of Trainability - Score: 14 (R=7, N=7) - Date: 2025-09-25 - Comment: Representation Learning/Training Dynamics: noise–curvature-based per-layer thresholds and scheduler to mitigate loss of trainability with Adam.
Quantum Harmonic Analysis and the Structure in Data: Augmentation - Score: 14 (R=7, N=7) - Date: 2025-09-25 - Comment: Representation learning theory: analyzes how augmentation enforces smoothness of principal components via harmonic analysis.
OmniBridge: Unified Multimodal Understanding, Generation, and Retrieval via Latent Space Alignment - Score: 14 (R=7, N=7) - Date: 2025-09-24 - Comment: Model Architecture: unified multimodal framework with bidirectional latent alignment; Representation Learning: shared cross-modal latent space via semantic-guided diffusion alignment.
Graph-based Clustering Revisited: A Relaxation of Kernel k-Means Perspective - Score: 14 (R=7, N=7) - Date: 2025-09-24 - Comment: Representation Learning: revised graph-based clustering with low-rank, nonnegative, doubly stochastic structure and theoretical optimization guarantees.
Subspace Clustering of Subspaces: Unifying Canonical Correlation Analysis and Subspace Clustering - Score: 14 (R=7, N=7) - Date: 2025-09-24 - Comment: Representation learning: subspace/dictionary learning via tensor block-term decomposition with identifiability results.
HyperNAS: Enhancing Architecture Representation for NAS Predictor via Hypernetwork - Score: 14 (R=7, N=7) - Date: 2025-09-24 - Comment: Model Architecture/Representation Learning: hypernetwork-based architecture representation for NAS predictors with global encoding and multi-task loss.
Can LLMs Reason Over Non-Text Modalities in a Training-Free Manner? A Case Study with In-Context Representation Learning - Score: 14 (R=7, N=7) - Date: 2025-09-23 - Comment: Representation Learning: training-free integration of non-text foundation model representations into LLMs via in-context representation learning; Efficiency: avoids fine-tuning.
Causal Representation Learning from Multimodal Clinical Records under Non-Random Modality Missingness - Score: 14 (R=7, N=7) - Date: 2025-09-23 - Comment: Representation Learning: MMNAR-aware multimodal fusion with causal modeling and contrastive reconstruction to learn robust patient representations.
On Optimal Steering to Achieve Exact Fairness - Score: 14 (R=7, N=7) - Date: 2025-09-22 - Comment: Representation Learning: optimizes steering of features/LLM internal representations to ideal distributions guaranteeing group-fair outcomes with provable properties.
Global Pre-fixing, Local Adjusting: A Simple yet Effective Contrastive Strategy for Continual Learning - Score: 14 (R=7, N=7) - Date: 2025-09-22 - Comment: Representation Learning: supervised contrastive continual-learning strategy enforcing inter-/intra-task ETF structure to shape embeddings and reduce forgetting.
Mind the Gap: Data Rewriting for Stable Off-Policy Supervised Fine-Tuning - Score: 14 (R=7, N=7) - Date: 2025-09-19 - Comment: Matches Representation Learning/Training Dynamics with an algorithmic framework (data rewriting) to stabilize off-policy supervised fine-tuning by shrinking the policy gap.
Learning Graph from Smooth Signals under Partial Observation: A Robustness Analysis - Score: 14 (R=7, N=7) - Date: 2025-09-19 - Comment: Matches Representation Learning: provides a theoretical robustness analysis for graph topology learning under partial observations, extending RIP to Dirichlet energy objectives.
A Variational Framework for Residual-Based Adaptivity in Neural PDE Solvers and Operator Learning - Score: 14 (R=7, N=7) - Date: 2025-09-18 - Comment: Matches Representation Learning: formalizes residual-based adaptivity via a variational objective, improving training dynamics (variance reduction, gradient SNR) and principled sampling/discretization.
The LLM Already Knows: Estimating LLM-Perceived Question Difficulty via Hidden Representations - Score: 14 (R=7, N=7) - Date: 2025-09-17 - Comment: Matches Representation Learning/Efficiency: leverages hidden representations for difficulty estimation enabling adaptive inference without token generation.
LTA-thinker: Latent Thought-Augmented Training Framework for Large Language Models on Complex Reasoning - Score: 14 (R=7, N=7) - Date: 2025-09-17 - Comment: Matches Model Architecture: introduces a learnable latent-thought prior and distributional training losses (KL/contrastive) for dynamic reasoning.
A Time-Series Foundation Model by Universal Delay Embedding - Score: 14 (R=7, N=7) - Date: 2025-09-17 - Comment: Representation Learning + Model Architecture: integrates delay-embedding representations with a self-attention encoder and Koopman operator for dynamical modeling.
SpeCa: Accelerating Diffusion Transformers with Speculative Feature Caching - Score: 14 (R=7, N=7) - Date: 2025-09-17 - Comment: Matches Efficiency: speculative feature caching and forecast-then-verify sampling to accelerate diffusion transformer inference.
HARP: Hallucination Detection via Reasoning Subspace Projection - Score: 14 (R=7, N=7) - Date: 2025-09-16 - Comment: Representation Learning: disentangles semantic vs. reasoning subspaces via SVD of the unembedding matrix and uses projections of hidden states.
Data distribution impacts the performance and generalisability of contrastive learning-based foundation models of electrocardiograms - Score: 14 (R=7, N=7) - Date: 2025-09-15 - Comment: Representation Learning: investigates how pretraining data distribution affects contrastive representations and introduces an IDB pretraining strategy to improve OOD robustness.
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs - Score: 14 (R=7, N=7) - Date: 2025-09-12 - Comment: Representation Learning/Training Dynamics: analyzes long-horizon execution, identifies self-conditioning error effects, and proposes a measurement framework for execution capability.
Tokenizing Loops of Antibodies - Score: 14 (R=7, N=7) - Date: 2025-09-11 - Comment: Matches Representation Learning: contrastive multimodal tokenizer for structural sequence representations; also an architectural tokenization module integrable into protein foundation models.
ACE and Diverse Generalization via Selective Disagreement - Score: 14 (R=7, N=7) - Date: 2025-09-10 - Comment: Matches Representation Learning: addresses underspecification and spurious correlations by learning diverse concept sets via confident selective disagreement (self-training).
FUnc-SNE: A flexible, Fast, and Unconstrained algorithm for neighbour embeddings - Score: 14 (R=7, N=7) - Date: 2025-09-10 - Comment: Representation Learning/Efficiency: introduces a fast neighbour embedding method with a novel iterative approximate nearest neighbour search allowing flexible embedding dimensionality.
Reconstruction Alignment Improves Unified Multimodal Models - Score: 14 (R=7, N=7) - Date: 2025-09-10 - Comment: Representation learning/training dynamics: post-training self-supervised reconstruction to realign understanding and generation in unified multimodal models with efficient compute.
On optimal solutions of classical and sliced Wasserstein GANs with non-Gaussian data - Score: 14 (R=7, N=7) - Date: 2025-09-09 - Comment: Representation Learning: theoretical characterization of optimal WGAN parameters beyond LQG and asymptotic optimality in sliced WGANs, offering insights into learned generative mappings.
Text-Trained LLMs Can Zero-Shot Extrapolate PDE Dynamics - Score: 14 (R=7, N=7) - Date: 2025-09-09 - Comment: Representation Learning: analysis of in-context learning scaling laws and token-level dynamics for zero-shot PDE rollout.
Icon$^{2}$: Aligning Large Language Models Using Self-Synthetic Preference Data via Inherent Regulation - Score: 14 (R=7, N=7) - Date: 2025-09-09 - Comment: Representation Learning: leverages layer-wise direction vectors to regulate token representations and synthesize preference data for alignment.
Adapt in the Wild: Test-Time Entropy Minimization with Sharpness and Feature Regularization - Score: 14 (R=7, N=7) - Date: 2025-09-08 - Comment: Matches Model Efficiency (stable test-time adaptation via sharpness-aware updates) and Representation Learning (feature regularizers to prevent representation collapse).
Neuro-Spectral Architectures for Causal Physics-Informed Networks - Score: 14 (R=7, N=7) - Date: 2025-09-08 - Comment: Matches Model Architecture (neuro-spectral PINN combining spectral projection with Neural ODE) and Representation Learning (addresses spectral bias and causal consistency).

Other Foundational Research (14)

Active Attacks: Red-teaming LLMs via Adaptive Environments - Score: 20.0 (R=0, N=0) - Date: 2025-09-29 - Comment: Author match
Relative Trajectory Balance is equivalent to Trust-PCL - Score: 20.0 (R=0, N=0) - Date: 2025-09-03 - Comment: Author match
DOSA: Differentiable Model-Based One-Loop Search for DNN Accelerators - Score: 17 (R=9, N=8) - Date: 2025-09-17 - Comment: High-Performance Computing: differentiable performance models enabling joint optimization of hardware parameters and DNN mapspace.
Theoretical Analysis on how Learning Rate Warmup Accelerates Convergence - Score: 17 (R=9, N=8) - Date: 2025-09-10 - Comment: Training dynamics/optimization theory: analyzes why learning rate warmup accelerates convergence under generalized smoothness.
Towards a Unified View of Large Language Model Post-Training - Score: 17 (R=9, N=8) - Date: 2025-09-05 - Comment: LLM post-training theory: unified theoretical framework for post-training LLMs.
Distilled Pretraining: A modern lens of Data, In-Context Learning and Test-Time Scaling - Score: 16 (R=9, N=7) - Date: 2025-09-03 - Comment: LLM pretraining: explores the effects of distillation on pretraining, particularly on test-time scaling and in-context learning.
Pretraining Scaling Laws for Generative Evaluations of Language Models - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Scaling laws for generative evaluations (pass@k), connecting compute, params×tokens, and gold-solution likelihoods.
Dynamic Orthogonal Continual Fine-tuning for Mitigating Catastrophic Forgettings - Score: 15 (R=8, N=7) - Date: 2025-09-30 - Comment: Matches training dynamics for continual learning: proposes dynamic orthogonal gradient constraints to mitigate catastrophic forgetting in LLM fine-tuning.
Effective continuous equations for adaptive SGD: a stochastic analysis view - Score: 15 (R=8, N=7) - Date: 2025-09-29 - Comment: Training Dynamics: derives effective continuous-time stochastic dynamics for adaptive SGD with scaling rules, clarifying noise structure in optimization.
Intermediate Languages Matter: Formal Languages and LLMs affect Neurosymbolic Reasoning - Score: 15 (R=8, N=7) - Date: 2025-09-05 - Comment: LLM behavior: explores how the choice of formal language affects neurosymbolic reasoning with LLMs.
Towards Agents That Know When They Don't Know: Uncertainty as a Control Signal for Structured Reasoning - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: LLM agents: introduces an uncertainty-aware agent for structured reasoning.
Counterfactual Sensitivity for Faithful Reasoning in Language Models - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: LLM reasoning: proposes a novel training objective for improving reasoning faithfulness in language models.
Learning residue level protein dynamics with multiscale Gaussians - Score: 15 (R=8, N=7) - Date: 2025-09-03 - Comment: AI for Science: presents a framework for predicting protein dynamics, relevant to molecular modeling.
Convexity of Optimization Curves: Local Sharp Thresholds, Robustness Impossibility, and New Counterexamples - Score: 14 (R=7, N=7) - Date: 2025-09-12 - Comment: Training dynamics/optimization theory: derives sharp step-size thresholds for convexity of GD optimization curves and links discrete GD with continuous-time gradient flow.