Personalized Monthly Topic Summary 2025/04

Metric	Value
Total Papers	430
Model Architecture	96
Model Compression and Efficiency	119
High Performance Computing	27
Representation Learning	168
Other Foundational Research	20

Model Architecture (96)

S'MoRE: Structural Mixture of Residual Experts for LLM Fine-tuning - Score: 19 (R=10, N=9) - Date: 2025-04-10 - Comment: The paper introduces S'MoRE, a novel framework combining Mixture-of-Experts (MoE) and low-rank adaptation (LoRA) for efficient LLM fine-tuning. This aligns closely with the 'Model Architecture' and 'Model Compression' criteria, as it innovates on MoE structures and efficiency.
Language Models Are Implicitly Continuous - Score: 19 (R=10, N=9) - Date: 2025-04-08 - Comment: Explores the implicit continuous nature of LLMs, providing theoretical insights into their behavior, which is highly relevant to foundational research on LLMs.
MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators - Score: 19 (R=10, N=9) - Date: 2025-04-04 - Comment: The paper addresses efficient quantization in Mixture-of-Experts (MoE) models, aligning closely with the 'Model Compression' and 'Model Architecture' criteria. It introduces low-rank compensators and adaptive rank selection policies, which are novel contributions.
Accelerating Mixture-of-Experts Training with Adaptive Expert Replication - Score: 18 (R=10, N=8) - Date: 2025-04-29 - Comment: The paper introduces SwiftMoE, an adaptive training system for Mixture-of-Experts models, which directly aligns with the 'Model Architecture' criterion. The dynamic expert replication is a novel contribution.
Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation - Score: 18 (R=10, N=8) - Date: 2025-04-29 - Comment: The paper proposes a training-free framework (DPE) for extending LLM context windows, which aligns with foundational research in LLM architecture and efficiency.
MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core - Score: 18 (R=10, N=8) - Date: 2025-04-22 - Comment: The paper introduces MoE Parallel Folding, a novel parallelism strategy for efficient training of large-scale MoE models. This directly aligns with the interest in Mixture-of-Experts and architectural innovations.
Transformers Can Overcome the Curse of Dimensionality: A Theoretical Study from an Approximation Perspective - Score: 18 (R=10, N=8) - Date: 2025-04-21 - Comment: This paper provides a theoretical study on the expressive capabilities of Transformers, specifically addressing their ability to overcome the curse of dimensionality. It aligns closely with the 'Model Architecture' criterion by offering insights into the structure and theoretical underpinnings of Transformers.
Dense Backpropagation Improves Training for Sparse Mixture-of-Experts - Score: 18 (R=10, N=8) - Date: 2025-04-18 - Comment: Proposes a method to improve training for sparse Mixture-of-Experts, directly aligning with foundational research in MoE architectures.
Unveiling Hidden Collaboration within Mixture-of-Experts in Large Language Models - Score: 18 (R=10, N=8) - Date: 2025-04-18 - Comment: The paper explores expert collaboration and pruning in MoE-based LLMs, which is highly relevant to foundational research in model architecture and efficiency.
Mixture of Group Experts for Learning Invariant Representations - Score: 18 (R=10, N=8) - Date: 2025-04-15 - Comment: The paper proposes a novel group sparse regularization approach for Mixture-of-Experts (MoE) models, directly addressing architectural innovations and representation learning.
On the Spatial Structure of Mixture-of-Experts in Transformers - Score: 18 (R=10, N=8) - Date: 2025-04-08 - Comment: The paper analyzes the spatial structure of Mixture-of-Experts (MoE) in Transformers, which directly aligns with the model architecture criterion, particularly MoE behavior.
DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism - Score: 18 (R=10, N=8) - Date: 2025-04-02 - Comment: DynMoLE proposes a hybrid routing mechanism for MoE models, which directly aligns with foundational research in Mixture-of-Experts and efficiency improvements.
Beyond Standard MoE: Mixture of Latent Experts for Resource-Efficient Language Models - Score: 18 (R=10, N=8) - Date: 2025-04-01 - Comment: The paper introduces Mixture of Latent Experts (MoLE), a novel parameterization for MoE architectures, addressing computational efficiency and memory challenges. This aligns closely with the 'Model Architecture' and 'Model Compression' criteria.
Quantum Doubly Stochastic Transformers - Score: 18 (R=9, N=9) - Date: 2025-04-25 - Comment: The paper introduces a quantum-inspired doubly stochastic Transformer, replacing Softmax with a variational quantum circuit. This aligns with the 'Model Architecture' criterion, particularly in exploring novel architectural paradigms.
Universal Approximation with Softmax Attention - Score: 18 (R=9, N=9) - Date: 2025-04-23 - Comment: The paper provides theoretical insights into the universal approximation capabilities of softmax attention, which is highly relevant to foundational research in model architecture.
Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction - Score: 18 (R=9, N=9) - Date: 2025-04-22 - Comment: The paper introduces minimal algorithmic tasks to test the creative limits of language models and argues for moving beyond next-token prediction. It aligns with 'Emerging Trends' by challenging established paradigms in LLM training.
Minimum Description Length of a Spectrum Variational Autoencoder: A Theory - Score: 18 (R=9, N=9) - Date: 2025-04-02 - Comment: The paper introduces a theoretical framework for Variational Autoencoders (VAEs) based on the Minimum Description Length (MDL) principle, which is highly relevant to representation learning and foundational research.
Equivariant non-linear maps for neural networks on homogeneous spaces - Score: 17 (R=9, N=8) - Date: 2025-04-30 - Comment: This paper provides a theoretical framework for non-linear equivariant neural network layers, which aligns with architectural innovations and foundational research.
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax - Score: 17 (R=9, N=8) - Date: 2025-04-30 - Comment: Introduces 'softpick', a rectified softmax replacement for transformer attention mechanisms, with implications for sparsity, quantization, and interpretability, directly aligning with model architecture and compression topics.
Quantifying Memory Utilization with Effective State-Size - Score: 17 (R=9, N=8) - Date: 2025-04-29 - Comment: The paper introduces a metric for memory utilization in sequence models, which aligns with foundational research in model architecture analysis.
BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts - Score: 17 (R=9, N=8) - Date: 2025-04-29 - Comment: The paper explores vulnerabilities in Mixture-of-Experts (MoE) models, which aligns with the 'Model Architecture' criterion. The focus on dormant experts and routing triggers is novel and provides insights into MoE behavior.
Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism - Score: 17 (R=9, N=8) - Date: 2025-04-29 - Comment: The paper provides insights into the Gather-and-Aggregate mechanism in Transformers and SSMs, which aligns with foundational research on model architecture and training dynamics.
Random Long-Context Access for Mamba via Hardware-aligned Hierarchical Sparse Attention - Score: 17 (R=9, N=8) - Date: 2025-04-25 - Comment: The paper introduces a novel hierarchical sparse attention mechanism (HSA) for RNNs, enhancing their efficiency and long-range context modeling. This aligns with the 'Model Architecture' criterion, particularly in architectural innovations for efficiency.
LongMamba: Enhancing Mamba's Long Context Capabilities via Training-Free Receptive Field Enlargement - Score: 17 (R=9, N=8) - Date: 2025-04-23 - Comment: The paper introduces LongMamba, a training-free method to enhance state space models (SSMs) for long-context understanding. It aligns with foundational research in model architecture by addressing limitations in SSMs and proposing a novel technique for improving their performance.
Quantitative Clustering in Mean-Field Transformer Models - Score: 17 (R=9, N=8) - Date: 2025-04-22 - Comment: The paper investigates clustering behavior in mean-field transformer models, providing theoretical insights into transformer dynamics, which aligns with the 'Model Architecture' criterion.
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization - Score: 17 (R=9, N=8) - Date: 2025-04-18 - Comment: The paper proposes a general framework for designing neural architectures inspired by attentional bias, aligning with the 'Model Architecture' criterion by introducing novel architectural insights.
Memorization: A Close Look at Books - Score: 17 (R=9, N=8) - Date: 2025-04-18 - Comment: The paper explores memorization in LLMs and its connection to pretraining data, aligning with the 'Large Language Models' criterion as it provides theoretical insights into LLM behavior.
Looking beyond the next token - Score: 17 (R=9, N=8) - Date: 2025-04-16 - Comment: The paper proposes a data rearrangement technique, Trelawney, to address the mismatch in causal language model training. It aligns with foundational research in LLMs and offers a novel perspective on training dynamics.
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers - Score: 17 (R=9, N=8) - Date: 2025-04-16 - Comment: This paper provides theoretical insights into task vector arithmetic for model editing, particularly in nonlinear Transformers. It aligns with foundational research in representation learning and model architecture, offering a novel theoretical perspective.
Weight-of-Thought Reasoning: Exploring Neural Network Weights for Enhanced LLM Reasoning - Score: 17 (R=9, N=8) - Date: 2025-04-16 - Comment: The paper introduces 'Weight-of-Thought' reasoning, a novel approach to enhance reasoning in LLMs by exploring internal weight dynamics. This aligns with the 'Large Language Models' criterion, particularly in theoretical insights into LLM behavior.
Continuum-Interaction-Driven Intelligence: Human-Aligned Neural Architecture via Crystallized Reasoning and Fluid Generation - Score: 17 (R=9, N=8) - Date: 2025-04-15 - Comment: Proposes a dual-channel architecture inspired by human cognitive duality, integrating probabilistic generation with procedural reasoning. This aligns with the 'Model Architecture' criterion, offering insights into architectural innovations for trustworthy AI.
Steering CLIP's vision transformer with sparse autoencoders - Score: 17 (R=9, N=8) - Date: 2025-04-14 - Comment: The paper explores sparse autoencoders (SAEs) to analyze and steer CLIP's vision transformer, aligning with representation learning and architectural analysis. It provides insights into sparsity patterns and steerability, which are foundational topics.
Cellular Development Follows the Path of Minimum Action - Score: 17 (R=9, N=8) - Date: 2025-04-14 - Comment: This paper proposes a novel computational framework using Transformers to model cellular development based on the principle of least action. It aligns with 'AI for Science' as it introduces a foundational approach to understanding cellular processes through thermodynamic and informational metrics, which could have broad implications.
SEE: Continual Fine-tuning with Sequential Ensemble of Experts - Score: 17 (R=9, N=8) - Date: 2025-04-10 - Comment: The paper proposes a novel framework for continual fine-tuning using a Sequential Ensemble of Experts, which aligns with the Mixture-of-Experts (MoE) criterion.
Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations - Score: 17 (R=9, N=8) - Date: 2025-04-09 - Comment: The paper addresses sparsity and pruning in Mixture-of-Experts (MoE) models, aligning closely with the model compression and sparsity criteria. It introduces iterative pruning and correction mechanisms, which are novel contributions.
Capturing AI's Attention: Physics of Repetition, Hallucination, Bias and Beyond - Score: 17 (R=9, N=8) - Date: 2025-04-08 - Comment: The paper provides a physics-based theoretical analysis of attention mechanisms in LLMs, which aligns with the criterion of theoretical insights into LLM behavior.
Gating is Weighting: Understanding Gated Linear Attention through In-context Learning - Score: 17 (R=9, N=8) - Date: 2025-04-08 - Comment: The paper provides theoretical insights into Gated Linear Attention (GLA) and its in-context learning capabilities, aligning with foundational research in model architecture and training dynamics.
Using Attention Sinks to Identify and Evaluate Dormant Heads in Pretrained LLMs - Score: 17 (R=9, N=8) - Date: 2025-04-08 - Comment: The paper investigates dormant attention heads in LLMs, which aligns with foundational research on understanding LLM behavior and interpretability.
HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs - Score: 17 (R=9, N=8) - Date: 2025-04-08 - Comment: The paper introduces HeterMoE, a system for efficient training of MoE models on heterogeneous GPUs, which aligns with model architecture innovations and efficiency improvements.
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models - Score: 17 (R=9, N=8) - Date: 2025-04-07 - Comment: The paper introduces a hybrid Mamba-Transformer model with architectural innovations to improve inference efficiency, aligning with foundational research in model architecture and compression.
SpectR: Dynamically Composing LM Experts with Spectral Routing - Score: 17 (R=9, N=8) - Date: 2025-04-07 - Comment: The paper introduces SpectR, a method for dynamically composing expert models during inference. It aligns with the topic of Mixture-of-Experts (MoE) and explores token- and layer-wise model combinations, making it relevant to model architecture innovations.
PETIMOT: A Novel Framework for Inferring Protein Motions from Sparse Data Using SE(3)-Equivariant Graph Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-04-07 - Comment: The paper presents a novel SE(3)-equivariant graph neural network for inferring protein motions, which is relevant to foundational AI for science. It introduces architectural innovations and task-specific loss functions.
MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism - Score: 17 (R=9, N=8) - Date: 2025-04-04 - Comment: This paper focuses on optimizing Mixture-of-Experts (MoE) inference with disaggregated parallelism and introduces novel techniques like ping-pong pipeline parallelism. It aligns closely with the 'Model Architecture' criterion for MoE innovations.
Is the Reversal Curse a Binding Problem? Uncovering Limitations of Transformers from a Basic Generalization Failure - Score: 17 (R=9, N=8) - Date: 2025-04-03 - Comment: The paper investigates the Reversal Curse in transformers and links it to the binding problem, providing insights into transformer limitations and proposing architectural improvements. This aligns with foundational research in model architecture and representation learning.
Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design - Score: 17 (R=9, N=8) - Date: 2025-04-03 - Comment: This paper addresses efficiency challenges in Mixture-of-Experts (MoE) models by introducing a novel collaboration-constrained routing (C2R) strategy. It provides insights into MoE routing policies and improves expert utilization, aligning well with foundational research in model architecture and efficiency.
Spectral Architecture Search for Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-04-02 - Comment: SPARCS introduces a novel spectral-based architecture search method, which aligns with foundational research in model architecture optimization.
Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality - Score: 17 (R=9, N=8) - Date: 2025-04-01 - Comment: The paper introduces a novel sparse autoencoder architecture with theoretical grounding, aligning with the 'Representation Learning' criterion.
TransMamba: Flexibly Switching between Transformer and Mamba - Score: 17 (R=9, N=8) - Date: 2025-04-01 - Comment: The paper introduces a hybrid framework combining Transformers and Mamba, which aligns with model architecture innovations and explores dynamic switching mechanisms.
KernelDNA: Dynamic Kernel Sharing via Decoupled Naive Adapters - Score: 17 (R=9, N=8) - Date: 2025-04-01 - Comment: The paper proposes KernelDNA, a novel dynamic convolution method leveraging weight sharing and adapters, which aligns with the 'Model Compression' and 'Model Architecture' criteria.
Mixture of Routers - Score: 17 (R=9, N=8) - Date: 2025-04-01 - Comment: The paper proposes a novel fine-tuning method integrating Mixture-of-Experts (MoE) with a new routing mechanism, which directly aligns with the 'Model Architecture' criterion.
TRA: Better Length Generalisation with Threshold Relative Attention - Score: 17 (R=9, N=8) - Date: 2025-04-01 - Comment: The paper introduces modifications to the attention mechanism in transformers, specifically addressing sparsity and positional biases, which aligns with architectural innovations.
MoRE-LLM: Mixture of Rule Experts Guided by a Large Language Model - Score: 17 (R=9, N=8) - Date: 2025-04-01 - Comment: The paper introduces MoRE-LLM, a mixture of rule experts guided by LLMs, which aligns with 'Model Architecture' through its novel combination of rule-based and data-driven methods.
Variational Online Mirror Descent for Robust Learning in Schrödinger Bridge - Score: 17 (R=8, N=9) - Date: 2025-04-04 - Comment: The paper introduces a novel variational online mirror descent framework for Schrödinger Bridge problems, which is a foundational contribution to probabilistic generative modeling. It aligns with 'Emerging Trends' due to its theoretical advancements.
RouterKT: Mixture-of-Experts for Knowledge Tracing - Score: 16 (R=9, N=7) - Date: 2025-04-15 - Comment: The paper introduces a Mixture-of-Experts (MoE) architecture for knowledge tracing, which aligns with the interest in MoE and architectural innovations.
Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models - Score: 16 (R=9, N=7) - Date: 2025-04-11 - Comment: The paper discusses a lightweight reasoning model derived from a Mixture-of-Experts (MoE) architecture, aligning with the 'Model Architecture' criterion. It emphasizes parameter efficiency and reasoning capabilities, which are relevant to foundational research.
Neural operators struggle to learn complex PDEs in pedestrian mobility: Hughes model case study - Score: 16 (R=8, N=8) - Date: 2025-04-28 - Comment: The paper critiques neural operators' ability to handle complex PDEs, which aligns with 'Emerging Trends' as it highlights limitations in current architectures and raises foundational questions about their generalization.
Likelihood-Free Variational Autoencoders - Score: 16 (R=8, N=8) - Date: 2025-04-25 - Comment: The paper introduces a likelihood-free VAE framework (EnVAE) with a novel reconstruction loss based on the energy score. This aligns with the 'Model Architecture' criterion, particularly in advancing autoencoder-based generative modeling.
SUPRA: Subspace Parameterized Attention for Neural Operator on General Domains - Score: 16 (R=8, N=8) - Date: 2025-04-23 - Comment: The paper introduces a novel attention mechanism (SUPRA) for neural operators, which aligns with architectural innovations and efficiency improvements, particularly in irregular domains.
VeLU: Variance-enhanced Learning Unit for Deep Neural Networks - Score: 16 (R=8, N=8) - Date: 2025-04-22 - Comment: The paper introduces VeLU, a novel activation function that dynamically scales based on input variance, which aligns with foundational research in model architecture and optimization.
GT-SVQ: A Linear-Time Graph Transformer for Node Classification Using Spiking Vector Quantization - Score: 16 (R=8, N=8) - Date: 2025-04-17 - Comment: This paper proposes a Graph Transformer using Spiking Vector Quantization, which is a novel architectural innovation addressing scalability and efficiency in graph representation learning. It aligns with the 'Model Architecture' criterion due to its focus on Transformer-based innovations and efficiency improvements.
Constants of motion network revisited - Score: 16 (R=8, N=8) - Date: 2025-04-15 - Comment: The paper proposes a novel neural network architecture using SVD and a two-phase training algorithm to improve the discovery of constants of motion. This aligns with foundational research in representation learning and architectural innovations.
Meta-Continual Learning of Neural Fields - Score: 16 (R=8, N=8) - Date: 2025-04-09 - Comment: The paper introduces a new problem setting and modular architecture for neural fields, which aligns with emerging trends and architectural innovations.
GMR-Conv: An Efficient Rotation and Reflection Equivariant Convolution Kernel Using Gaussian Mixture Rings - Score: 16 (R=8, N=8) - Date: 2025-04-04 - Comment: The paper proposes GMR-Conv, a novel convolution kernel with rotation and reflection equivariance, which aligns with architectural innovations in equivariant networks.
Implicit Neural Differential Model for Spatiotemporal Dynamics - Score: 16 (R=8, N=8) - Date: 2025-04-04 - Comment: The paper introduces a novel implicit neural differential model for spatiotemporal dynamics, which aligns with foundational research in model architecture and training dynamics. The use of implicit fixed-point layers and hybrid gradient propagation strategy is innovative.
FT-Transformer: Resilient and Reliable Transformer with End-to-End Fault Tolerant Attention - Score: 16 (R=8, N=8) - Date: 2025-04-04 - Comment: The paper introduces a novel fault-tolerant framework for Transformers, which aligns with the 'Model Architecture' criterion by addressing architectural reliability and efficiency. The proposed methods, such as architecture-aware ABFT and selective neuron value restriction, are innovative.
AuditVotes: A Framework Towards More Deployable Certified Robustness for Graph Neural Networks - Score: 16 (R=8, N=8) - Date: 2025-04-01 - Comment: The paper proposes a framework for improving the robustness of Graph Neural Networks, which aligns with Model Architecture and Emerging Trends due to its focus on robustness and novel augmentation techniques.
RBFleX-NAS: Training-Free Neural Architecture Search Using Radial Basis Function Kernel and Hyperparameter Detection - Score: 16 (R=8, N=8) - Date: 2025-04-01 - Comment: The paper proposes a novel training-free NAS framework (RBFleX-NAS) with significant improvements in architecture search, aligning with the model architecture criterion.
DYNAMAX: Dynamic computing for Transformers and Mamba based architectures - Score: 15 (R=8, N=7) - Date: 2025-04-30 - Comment: Proposes DYNAMAX, a framework for early exits in Mamba and transformer architectures, aligning with dynamic computing and efficiency improvements in model architecture.
SFi-Former: Sparse Flow Induced Attention for Graph Transformer - Score: 15 (R=8, N=7) - Date: 2025-04-30 - Comment: The paper introduces a novel sparse attention mechanism (SFi-attention) for Graph Transformers, which aligns with the 'Model Architecture' criterion by addressing architectural innovations and sparsity.
FX-DARTS: Designing Topology-unconstrained Architectures with Differentiable Architecture Search and Entropy-based Super-network Shrinking - Score: 15 (R=8, N=7) - Date: 2025-04-30 - Comment: The paper introduces FX-DARTS, which focuses on differentiable architecture search and entropy-based super-network shrinking, aligning with the model architecture criterion by exploring architectural flexibility and optimization.
Graph Fourier Transformer with Structure-Frequency Information - Score: 15 (R=8, N=7) - Date: 2025-04-29 - Comment: The paper introduces a novel Graph Transformer (Grafourierformer) leveraging Fourier transforms for structural and frequency information. This is relevant to architectural innovations in Transformers.
Hierarchical Attention Generates Better Proofs - Score: 15 (R=8, N=7) - Date: 2025-04-29 - Comment: The paper introduces hierarchical attention for LLMs, which aligns with architectural innovations and theoretical insights into LLM behavior.
NeuralGrok: Accelerate Grokking by Neural Gradient Transformation - Score: 15 (R=8, N=7) - Date: 2025-04-25 - Comment: NeuralGrok proposes a gradient-based approach to accelerate grokking in transformers, which aligns with foundational research in training dynamics and generalization in neural networks. The use of Absolute Gradient Entropy (AGE) as a metric adds a novel perspective.
A Call for New Recipes to Enhance Spatial Reasoning in MLLMs - Score: 15 (R=8, N=7) - Date: 2025-04-22 - Comment: This position paper highlights the limitations of spatial reasoning in multimodal LLMs and calls for fundamental modifications, aligning with the 'Emerging Trends' criterion for foundational challenges.
Pets: General Pattern Assisted Architecture For Time Series Analysis - Score: 15 (R=8, N=7) - Date: 2025-04-22 - Comment: The paper proposes a novel architecture for time series analysis using a mixture of predictors, which aligns with the 'Model Architecture' criterion, particularly for its use of MoE.
Linking forward-pass dynamics in Transformers and real-time human processing - Score: 15 (R=8, N=7) - Date: 2025-04-22 - Comment: The paper links forward-pass dynamics in transformers to human processing, which provides insights into transformer behavior and aligns with foundational research in model architecture.
A discrete physics-informed training for projection-based reduced order models with neural networks - Score: 15 (R=8, N=7) - Date: 2025-04-22 - Comment: The paper presents a physics-informed training framework for reduced order models, which aligns with 'AI for Science' as it bridges neural networks and FEM-based residuals. It introduces a novel residual-driven optimization approach.
Decoding Vision Transformers: the Diffusion Steering Lens - Score: 15 (R=8, N=7) - Date: 2025-04-21 - Comment: The paper introduces Diffusion Steering Lens (DSL) for interpretability in Vision Transformers, which aligns with the analysis of existing architectures. This is relevant to understanding how representations evolve in ViTs.
Learning to Attribute with Attention - Score: 15 (R=8, N=7) - Date: 2025-04-21 - Comment: The paper proposes a method for token attribution using attention weights, which provides insights into interpretability and training dynamics of LLMs.
Hadamard product in deep learning: Introduction, Advances and Challenges - Score: 15 (R=8, N=7) - Date: 2025-04-18 - Comment: The paper surveys the Hadamard product in deep learning, which aligns with the 'Model Architecture' criterion by analyzing its role as a fundamental architectural primitive.
Transferrable Surrogates in Expressive Neural Architecture Search Spaces - Score: 15 (R=8, N=7) - Date: 2025-04-18 - Comment: The paper focuses on surrogate models for neural architecture search (NAS), which aligns with the 'Model Architecture' criterion as it explores architectural innovation and efficiency in search spaces.
Simplifying Graph Transformers - Score: 15 (R=8, N=7) - Date: 2025-04-18 - Comment: The paper proposes architectural simplifications for Graph Transformers, which aligns with the 'Model Architecture' criterion by introducing modifications to make Transformers applicable to graphs without major architectural changes.
Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures - Score: 15 (R=8, N=7) - Date: 2025-04-17 - Comment: This paper provides a detailed analysis of LLM inference workloads on CPU-GPU architectures, offering insights into performance dynamics and optimization strategies. While it is relevant to efficiency and architecture-level analysis, it does not introduce groundbreaking theoretical contributions.
VEXP: A Low-Cost RISC-V ISA Extension for Accelerated Softmax Computation in Transformers - Score: 15 (R=8, N=7) - Date: 2025-04-16 - Comment: The paper proposes a hardware-level optimization for Softmax computation in Transformers, which aligns with model efficiency and architectural innovations. The integration into RISC-V cores and performance improvements are notable.
MiMu: Mitigating Multiple Shortcut Learning Behavior of Transformers - Score: 15 (R=8, N=7) - Date: 2025-04-16 - Comment: The paper addresses shortcut learning in Transformers and proposes strategies to mitigate it, which aligns with foundational research in model architecture and training dynamics.
HyperCore: The Core Framework for Building Hyperbolic Foundation Models with Comprehensive Modules - Score: 15 (R=8, N=7) - Date: 2025-04-15 - Comment: HyperCore provides a framework for building hyperbolic foundation models, aligning with architectural innovations and foundational research.
Between Linear and Sinusoidal: Rethinking the Time Encoder in Dynamic Graph Learning - Score: 15 (R=8, N=7) - Date: 2025-04-14 - Comment: The paper proposes a simpler linear time encoder for dynamic graph learning, which provides architectural insights and efficiency improvements, aligning with model architecture innovations.
Scaling Laws of Graph Neural Networks for Atomistic Materials Modeling - Score: 15 (R=8, N=7) - Date: 2025-04-14 - Comment: The paper investigates scaling laws for GNNs in atomistic materials modeling, which aligns with foundational research in AI for Science. It provides insights into scaling GNN architectures and dataset size effects.
EDIT: Enhancing Vision Transformers by Mitigating Attention Sink through an Encoder-Decoder Architecture - Score: 15 (R=8, N=7) - Date: 2025-04-10 - Comment: The paper proposes a novel encoder-decoder architecture for Vision Transformers to address the attention sink phenomenon. This aligns with foundational research in model architecture, particularly in improving and analyzing transformer-based models.
Understanding Machine Unlearning Through the Lens of Mode Connectivity - Score: 15 (R=8, N=7) - Date: 2025-04-10 - Comment: The paper explores mode connectivity in the context of machine unlearning, which provides theoretical insights into optimization dynamics and loss landscapes, aligning with foundational research.
DDT: Decoupled Diffusion Transformer - Score: 15 (R=8, N=7) - Date: 2025-04-09 - Comment: The paper proposes a decoupled diffusion transformer architecture, which is relevant to model architecture innovations, particularly in transformer-based generative models.
asKAN: Active Subspace embedded Kolmogorov-Arnold Network - Score: 15 (R=8, N=7) - Date: 2025-04-08 - Comment: The paper introduces a novel architecture (asKAN) for small-scale AI+Science applications, which aligns with foundational research in model architecture innovations.
Memory and Bandwidth are All You Need for Fully Sharded Data Parallel - Score: 15 (R=8, N=7) - Date: 2025-04-08 - Comment: The paper investigates hardware constraints and optimization strategies for training large transformer models, which aligns with foundational research in model architecture and efficiency.
State-Space Model Inspired Multiple-Input Multiple-Output Spiking Neurons - Score: 15 (R=8, N=7) - Date: 2025-04-04 - Comment: The paper proposes a novel MIMO spiking neuron model inspired by state-space models, which introduces architectural innovations in spiking neural networks. This aligns with the 'Model Architecture' criterion.
R2DN: Scalable Parameterization of Contracting and Lipschitz Recurrent Deep Networks - Score: 15 (R=8, N=7) - Date: 2025-04-03 - Comment: The paper introduces a novel parameterization for recurrent deep networks (R2DN) that ensures stability and robustness by design, aligning with foundational research in model architecture. The focus on computational efficiency and scalability also adds to its relevance.
Adaptive Layer-skipping in Pre-trained LLMs - Score: 15 (R=8, N=7) - Date: 2025-04-01 - Comment: The paper proposes an adaptive layer-skipping method (FlexiDepth) for LLMs, which aligns with the model architecture criterion by introducing a dynamic mechanism for computational efficiency.

Model Compression and Efficiency (119)

70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float - Score: 19 (R=10, N=9) - Date: 2025-04-17 - Comment: This paper presents a novel lossless compression method for LLMs using dynamic-length float encoding, which directly aligns with the model compression criterion. The approach is innovative, enabling significant efficiency improvements while preserving exact outputs.
Revisiting Transformers through the Lens of Low Entropy and Dynamic Sparsity - Score: 18 (R=10, N=8) - Date: 2025-04-29 - Comment: The paper analyzes Transformers through the lens of entropy and dynamic sparsity, which directly aligns with the model compression and efficiency criterion.
Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations - Score: 18 (R=10, N=8) - Date: 2025-04-10 - Comment: The paper addresses domain-specific pruning in large MoE models, which directly aligns with the model compression and MoE criteria.
GaLore 2: Large-Scale LLM Pre-Training by Gradient Low-Rank Projection - Score: 17 (R=9, N=8) - Date: 2025-04-30 - Comment: The paper introduces GaLore 2, which focuses on gradient low-rank projection for efficient LLM pretraining, aligning with model compression and efficiency breakthroughs.
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate - Score: 17 (R=9, N=8) - Date: 2025-04-29 - Comment: The paper introduces TurboQuant, a novel vector quantization method with theoretical guarantees and applications to KV cache quantization, which aligns with model compression and efficiency breakthroughs.
R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference - Score: 17 (R=9, N=8) - Date: 2025-04-29 - Comment: The paper introduces R-Sparse, a training-free activation sparsity method for LLM inference, aligning with the 'Model Compression' criterion. The rank-aware sparsity approach is a novel contribution.
TLoRA: Tri-Matrix Low-Rank Adaptation of Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-04-29 - Comment: The paper introduces TLoRA, a novel low-rank adaptation method for LLMs, which aligns with the 'Model Compression' criterion. The tri-matrix design and analysis of adaptation dynamics add methodological insights.
ZipR1: Reinforcing Token Sparsity in MLLMs - Score: 17 (R=9, N=8) - Date: 2025-04-29 - Comment: The paper proposes a novel RL-based method for token sparsity in MLLMs, which aligns with foundational research in model compression and efficiency.
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs - Score: 17 (R=9, N=8) - Date: 2025-04-28 - Comment: BitNet v2 introduces a novel method for 4-bit activation quantization in 1-bit LLMs, addressing efficiency and memory challenges. This is highly relevant to model compression and efficiency breakthroughs.
NoEsis: Differentially Private Knowledge Transfer in Modular LLM Adaptation - Score: 17 (R=9, N=8) - Date: 2025-04-28 - Comment: The paper introduces a modular framework for LLM adaptation with differential privacy, which aligns with the core topic of model architecture, particularly modularity and privacy-preserving methods. The use of low-rank adapters adds relevance to model compression.
The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs - Score: 17 (R=9, N=8) - Date: 2025-04-25 - Comment: This paper explores sparse attention in Transformer LLMs, directly addressing sparsity and efficiency trade-offs, which aligns with the model compression and efficiency criteria.
Backslash: Rate Constrained Optimized Training of Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-04-25 - Comment: The paper introduces a novel training-time compression method (Backslash) based on rate-distortion optimization, which aligns with the 'Model Compression' criterion. It provides significant insights into compression during training, a less explored area.
Generalized Neighborhood Attention: Multi-dimensional Sparse Attention at the Speed of Light - Score: 17 (R=9, N=8) - Date: 2025-04-25 - Comment: The paper introduces Generalized Neighborhood Attention (GNA), a sparse attention mechanism with significant efficiency improvements. This aligns with foundational research in model architecture and sparsity.
Compute-Optimal LLMs Provably Generalize Better With Scale - Score: 17 (R=9, N=8) - Date: 2025-04-22 - Comment: This paper provides theoretical insights into why larger language models generalize better, aligning with the foundational research on LLM behavior and scaling laws.
Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator - Score: 17 (R=9, N=8) - Date: 2025-04-22 - Comment: The paper introduces a flexible N:M sparsity method and a compute-in-memory accelerator, which aligns with model compression and efficiency breakthroughs.
Gradual Binary Search and Dimension Expansion : A general method for activation quantization in LLMs - Score: 17 (R=9, N=8) - Date: 2025-04-22 - Comment: The paper proposes a novel quantization method for LLMs, addressing challenges in activation quantization and KV cache, which aligns with the 'Model Compression' criterion.
CacheFormer: High Attention-Based Segment Caching - Score: 17 (R=9, N=8) - Date: 2025-04-22 - Comment: The paper introduces CacheFormer, which focuses on improving efficiency in handling long contexts in transformers. This aligns with foundational research in model compression and efficiency.
Multiscale Tensor Summation Factorization as a New Neural Network Layer (MTS Layer) for Multidimensional Data Processing - Score: 17 (R=9, N=8) - Date: 2025-04-22 - Comment: The introduction of Multiscale Tensor Summation (MTS) as a new neural network layer aligns with architectural innovations. The method offers a novel approach to improve efficiency and parameter optimization.
Efficient algorithms for the Hadamard decomposition - Score: 17 (R=9, N=8) - Date: 2025-04-21 - Comment: The paper introduces an efficient algorithm for the Hadamard decomposition, which is relevant to model compression and low-rank approaches. The extension to multiple low-rank matrices adds methodological depth.
A Virtual Machine for Arbitrary Low-Precision GPGPU Computation in LLM Serving - Score: 17 (R=9, N=8) - Date: 2025-04-18 - Comment: Presents a virtual machine for low-precision GPGPU computation, which aligns with foundational research in model compression and efficiency.
Scaling Instruction-Tuned LLMs to Million-Token Contexts via Hierarchical Synthetic Data Generation - Score: 17 (R=9, N=8) - Date: 2025-04-18 - Comment: The paper introduces a novel synthetic data generation strategy for extending LLM context lengths, which aligns with the 'Large Language Models' criterion, particularly in addressing architectural and efficiency challenges.
MOM: Memory-Efficient Offloaded Mini-Sequence Inference for Long Context Language Models - Score: 17 (R=9, N=8) - Date: 2025-04-18 - Comment: Proposes a memory-efficient inference method for long-context LLMs, which is relevant to model compression and efficiency breakthroughs.
Sparsity Outperforms Low-Rank Projections in Few-Shot Adaptation - Score: 17 (R=9, N=8) - Date: 2025-04-18 - Comment: The paper introduces a sparse optimization framework for few-shot adaptation, which aligns with model compression topics like sparsity and efficiency improvements.
A Dual-Space Framework for General Knowledge Distillation of Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-04-16 - Comment: The paper proposes a novel dual-space knowledge distillation framework for compressing large language models, addressing key limitations in existing methods. This aligns with the 'Model Compression' criterion, particularly knowledge distillation for efficiency.
Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints - Score: 17 (R=9, N=8) - Date: 2025-04-16 - Comment: The paper focuses on optimizing LLM inference with memory constraints, introducing a novel scheduling algorithm (WAIT) and theoretical analysis. This aligns with the 'Model Compression' criterion, particularly in the context of KV cache efficiency.
Dynamic Compressing Prompts for Efficient Inference of Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-04-16 - Comment: The paper proposes a method for dynamic prompt compression in LLMs, which aligns with model compression and efficiency breakthroughs. The use of a Markov Decision Process and hierarchical training strategy adds novelty.
KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference - Score: 17 (R=9, N=8) - Date: 2025-04-15 - Comment: The paper introduces KeepKV, a novel KV cache compression method for LLM inference, which aligns with foundational research in model compression and efficiency.
DL-QAT: Weight-Decomposed Low-Rank Quantization-Aware Training for Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-04-15 - Comment: The paper introduces DL-QAT, a low-rank quantization-aware training method for LLMs, aligning with the 'Model Compression' criterion by addressing efficiency in LLMs with novel quantization techniques.
MGS: Markov Greedy Sums for Accurate Low-Bitwidth Floating-Point Accumulation - Score: 17 (R=9, N=8) - Date: 2025-04-15 - Comment: The paper proposes a novel method for low-bitwidth floating-point accumulation, which aligns with model compression and efficiency breakthroughs.
PQS (Prune, Quantize, and Sort): Low-Bitwidth Accumulation of Dot Products in Neural Network Computations - Score: 17 (R=9, N=8) - Date: 2025-04-15 - Comment: The PQS method combines pruning, quantization, and sorting for low-bitwidth accumulation, directly addressing model compression and efficiency.
Position: Beyond Euclidean -- Foundation Models Should Embrace Non-Euclidean Geometries - Score: 17 (R=9, N=8) - Date: 2025-04-15 - Comment: This position paper argues for the adoption of non-Euclidean geometries in foundation models, which aligns with emerging trends and architectural innovations. It provides a theoretical perspective on improving model scalability and efficiency.
Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash - Score: 17 (R=9, N=8) - Date: 2025-04-14 - Comment: The paper introduces ActiveFlow, a framework for scaling LLMs on mobile devices via active-weight swapping. It aligns with Model Compression and efficiency breakthroughs, particularly in DRAM-flash memory management.
SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs - Score: 17 (R=9, N=8) - Date: 2025-04-14 - Comment: The paper introduces a dynamic sparse autoencoder method for precision unlearning in LLMs, which aligns with foundational research in model compression and sparsity.
Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression - Score: 17 (R=9, N=8) - Date: 2025-04-11 - Comment: The paper introduces a novel mixed-precision quantization method (TaCQ) that directly addresses foundational challenges in model compression, particularly in low-bit regimes. The approach is highly relevant and demonstrates significant improvements over baselines.
Adaptive Computation Pruning for the Forgetting Transformer - Score: 17 (R=9, N=8) - Date: 2025-04-10 - Comment: The paper introduces Adaptive Computation Pruning (ACP) for the Forgetting Transformer, which directly addresses model compression and efficiency through pruning techniques. It provides significant computational savings without performance degradation, aligning well with the model compression criterion.
Mosaic: Composite Projection Pruning for Resource-efficient LLMs - Score: 17 (R=9, N=8) - Date: 2025-04-10 - Comment: The paper introduces a novel pruning method for LLMs, which aligns with the model compression criterion, particularly focusing on fine-grained pruning techniques.
Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching - Score: 17 (R=9, N=8) - Date: 2025-04-10 - Comment: The paper focuses on KV cache optimization for LLM inference, which aligns with the model compression criterion, specifically addressing efficiency breakthroughs.
From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-04-09 - Comment: The paper presents efficient training strategies for ultra-long context LLMs, which aligns with foundational research in large language models and efficiency improvements.
Lattice: Learning to Efficiently Compress the Memory - Score: 17 (R=9, N=8) - Date: 2025-04-09 - Comment: The paper introduces a novel RNN mechanism leveraging low-rank compression for memory efficiency, which is highly relevant to model compression and efficiency breakthroughs.
Find A Winning Sign: Sign Is All We Need to Win the Lottery - Score: 17 (R=9, N=8) - Date: 2025-04-09 - Comment: The paper explores the role of parameter sign configuration in sparse networks, contributing to the Lottery Ticket Hypothesis and sparsity-related research. This aligns well with model compression and sparsity criteria.
Achieving binary weight and activation for LLMs using Post-Training Quantization - Score: 17 (R=9, N=8) - Date: 2025-04-09 - Comment: The paper proposes a novel post-training quantization framework for LLMs, which directly aligns with the model compression criterion. The approach to achieve binary weight and activation is innovative.
Thanos: A Block-wise Pruning Algorithm for Efficient Large Language Model Compression - Score: 17 (R=9, N=8) - Date: 2025-04-09 - Comment: The paper introduces a novel block-wise pruning algorithm, Thanos, for efficient LLM compression, which aligns with the 'Model Compression' criterion, particularly in sparsity and pruning methods.
MASS: MoErging through Adaptive Subspace Selection - Score: 17 (R=9, N=8) - Date: 2025-04-09 - Comment: The paper introduces MASS, a novel approach leveraging low-rank decomposition and adaptive subspace selection for model merging. This aligns with model compression and efficiency criteria, particularly through its innovative use of low-rank methods.
Exact Unlearning of Finetuning Data via Model Merging at Scale - Score: 17 (R=9, N=8) - Date: 2025-04-08 - Comment: The paper introduces SIFT-Masks for exact unlearning in LLMs, which aligns with foundational research in model compression and efficiency.
Saliency-driven Dynamic Token Pruning for Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-04-08 - Comment: The paper proposes a token pruning framework for LLMs, which aligns with model compression and efficiency breakthroughs, particularly through saliency-driven dynamic pruning.
Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning - Score: 17 (R=9, N=8) - Date: 2025-04-08 - Comment: The Retro-Search algorithm for improving reasoning paths in LLMs introduces a novel search-based approach for distillation, which aligns with foundational research on LLM efficiency and reasoning capabilities.
Sensitivity Meets Sparsity: The Impact of Extremely Sparse Parameter Patterns on Theory-of-Mind of Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-04-08 - Comment: The paper investigates sparse parameter patterns in LLMs and their role in Theory-of-Mind capabilities, providing insights into sparsity and interpretability in neural networks.
Entropy-Based Block Pruning for Efficient Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-04-08 - Comment: Proposes an entropy-based pruning strategy for LLMs, which aligns with foundational research in model compression and efficiency.
Towards Symmetric Low-Rank Adapters - Score: 17 (R=9, N=8) - Date: 2025-04-08 - Comment: The paper introduces Symmetric Low-Rank Adapters (SymLoRA), which is a novel approach to low-rank adaptation with fewer weights. This aligns with the model compression criterion, specifically low-rank approaches.
RaanA: A Fast, Flexible, and Data-Efficient Post-Training Quantization Algorithm - Score: 17 (R=9, N=8) - Date: 2025-04-08 - Comment: The paper introduces a novel post-training quantization framework, which aligns with the model compression criterion, particularly quantization methods.
Optimizing Specific and Shared Parameters for Efficient Parameter Tuning - Score: 17 (R=9, N=8) - Date: 2025-04-07 - Comment: The paper proposes a novel PETL method using low-rank projections and hypernetworks, which aligns with model compression and efficiency topics. It introduces a dual design for shared and layer-specific parameter tuning.
GPTQv2: Efficient Finetuning-Free Quantization for Asymmetric Calibration - Score: 17 (R=9, N=8) - Date: 2025-04-04 - Comment: The paper introduces GPTQv2, a finetuning-free quantization method for large-scale transformers, which aligns strongly with the 'Model Compression' criterion, particularly in quantization and efficiency improvements.
Large (Vision) Language Models are Unsupervised In-Context Learners - Score: 17 (R=9, N=8) - Date: 2025-04-04 - Comment: The paper explores unsupervised adaptation methods for large language models, focusing on foundational aspects of in-context learning and joint inference, which aligns with the LLM criterion.
MDP: Multidimensional Vision Model Pruning with Latency Constraint - Score: 17 (R=9, N=8) - Date: 2025-04-04 - Comment: The paper introduces a novel multi-dimensional pruning framework for both CNNs and transformers, addressing latency constraints and achieving significant efficiency improvements. This aligns well with the 'Model Compression' criterion.
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning - Score: 17 (R=9, N=8) - Date: 2025-04-02 - Comment: The paper provides theoretical insights into compute-optimal strategies for reasoning in LLMs, addressing trade-offs in solution generation and verification. This contributes to foundational understanding of LLM behavior.
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization - Score: 17 (R=9, N=8) - Date: 2025-04-02 - Comment: MergeVQ introduces a unified framework combining token merging and quantization, which is relevant to representation learning and efficiency improvements in model architecture.
Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations - Score: 17 (R=9, N=8) - Date: 2025-04-02 - Comment: The paper generalizes parameter-efficient fine-tuning methods to higher-dimensional parameter spaces using Lie group transformations, which aligns with foundational research in model compression and efficiency.
ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning - Score: 17 (R=9, N=8) - Date: 2025-04-02 - Comment: The paper introduces an adaptive low-rank adaptation framework (ElaLoRA) for fine-tuning, which is highly relevant to model compression and efficiency. The dynamic rank allocation mechanism adds a novel theoretical contribution.
SQuat: Subspace-orthogonal KV Cache Quantization - Score: 17 (R=9, N=8) - Date: 2025-04-01 - Comment: The paper introduces a novel KV cache quantization method (SQuat) with a theoretical framework, aligning with the model compression criterion.
AirCache: Activating Inter-modal Relevancy KV Cache Compression for Efficient Large Vision-Language Model Inference - Score: 17 (R=9, N=8) - Date: 2025-04-01 - Comment: The paper proposes AirCache, a KV cache compression method for LVLMs, which aligns with the 'Model Compression' criterion.
Model Hemorrhage and the Robustness Limits of Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-04-01 - Comment: The paper investigates robustness limits of LLMs under compression techniques like pruning and quantization, providing insights into model stability and resilience.
Boosting Large Language Models with Mask Fine-Tuning - Score: 17 (R=9, N=8) - Date: 2025-04-01 - Comment: The paper introduces Mask Fine-Tuning for LLMs, which aligns with foundational research in large language models and explores a novel fine-tuning paradigm.
NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models - Score: 16 (R=9, N=7) - Date: 2025-04-22 - Comment: The paper introduces a unified framework for shape-preserving compression of LLMs, addressing sparsity and quantization, which aligns with the model compression criterion.
Sparse Hybrid Linear-Morphological Networks - Score: 16 (R=9, N=7) - Date: 2025-04-15 - Comment: The paper introduces a hybrid linear-morphological network with sparsity and pruning insights, aligning with the model compression and sparsity criteria.
TAGC: Optimizing Gradient Communication in Distributed Transformer Training - Score: 16 (R=9, N=7) - Date: 2025-04-09 - Comment: The paper proposes a gradient compression method tailored for transformer-based models, which is relevant to model compression and efficiency improvements.
When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks - Score: 16 (R=9, N=7) - Date: 2025-04-04 - Comment: The paper benchmarks compressed large reasoning models, providing insights into compression techniques like quantization, pruning, and distillation. It aligns with foundational research in model compression and efficiency.
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models - Score: 16 (R=9, N=7) - Date: 2025-04-01 - Comment: The paper focuses on quantization for State Space Models (SSMs), which aligns with the model compression criterion, particularly low-bit quantization techniques.
Provably faster randomized and quantum algorithms for k-means clustering via uniform sampling - Score: 16 (R=8, N=8) - Date: 2025-04-30 - Comment: The paper proposes randomized and quantum algorithms for k-means clustering, which is foundational research in optimization and efficiency methods.
FourierSpecNet: Neural Collision Operator Approximation Inspired by the Fourier Spectral Method for Solving the Boltzmann Equation - Score: 16 (R=8, N=8) - Date: 2025-04-30 - Comment: The paper proposes a hybrid framework for solving the Boltzmann equation using deep learning, which is foundational research in AI for Science.
Coreset selection for the Sinkhorn divergence and generic smooth divergences - Score: 16 (R=8, N=8) - Date: 2025-04-30 - Comment: The paper introduces a coreset selection algorithm for smooth divergences, which is a foundational contribution to optimization and efficiency methods.
semi-PD: Towards Efficient LLM Serving via Phase-Wise Disaggregated Computation and Unified Storage - Score: 16 (R=8, N=8) - Date: 2025-04-29 - Comment: The paper proposes semi-PD, a system for efficient LLM serving, which aligns with model compression topics like KV cache optimization and introduces novel efficiency mechanisms.
TeleSparse: Practical Privacy-Preserving Verification of Deep Neural Networks - Score: 16 (R=8, N=8) - Date: 2025-04-29 - Comment: The paper addresses model sparsification and ZK-SNARKs for privacy-preserving verification, which aligns with the 'Model Compression' criterion. The sparsification and neural teleportation techniques are novel.
Efficient Learning on Large Graphs using a Densifying Regularity Lemma - Score: 16 (R=8, N=8) - Date: 2025-04-28 - Comment: The paper introduces a novel low-rank factorization method for large graphs, which aligns with the 'Model Compression' criterion due to its focus on efficiency and sparsity. Additionally, it provides theoretical insights via a constructive version of the weak regularity lemma, which is foundational.
Effortless, Simulation-Efficient Bayesian Inference using Tabular Foundation Models - Score: 16 (R=8, N=8) - Date: 2025-04-25 - Comment: The paper introduces a novel approach to simulation-based Bayesian inference using tabular foundation models, which aligns with foundational research in efficiency and representation learning.
DP2Unlearning: An Efficient and Guaranteed Unlearning Framework for LLMs - Score: 16 (R=8, N=8) - Date: 2025-04-21 - Comment: The paper introduces a differential privacy-based unlearning framework for LLMs, which aligns with foundational research in model efficiency and privacy guarantees, offering a novel approach to unlearning.
Preconditioned Gradient Descent for Over-Parameterized Nonconvex Matrix Factorization - Score: 16 (R=8, N=8) - Date: 2025-04-15 - Comment: The paper proposes PrecGD, a preconditioned gradient descent method for nonconvex matrix factorization, contributing foundational insights into optimization and efficiency in over-parameterized regimes.
Long Context In-Context Compression by Getting to the Gist of Gisting - Score: 16 (R=8, N=8) - Date: 2025-04-15 - Comment: The paper proposes GistPool, a method for in-context compression in LLMs, which aligns with model compression and efficiency breakthroughs.
Compositional Flows for 3D Molecule and Synthesis Pathway Co-design - Score: 16 (R=8, N=8) - Date: 2025-04-14 - Comment: The paper introduces CGFlow, a novel framework for compositional generative flows in 3D molecular design. It aligns with foundational research in AI for Science, particularly in generative paradigms for molecular modeling.
LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation - Score: 16 (R=8, N=8) - Date: 2025-04-11 - Comment: The paper proposes LoRI, a novel approach to reduce cross-task interference in multi-task low-rank adaptation for LLMs. It aligns with model compression and efficiency topics, particularly in parameter-efficient fine-tuning.
Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning - Score: 16 (R=8, N=8) - Date: 2025-04-10 - Comment: The paper introduces a novel adaptive SVD-based approach for continual learning in LLMs, which aligns with foundational research in model compression and efficiency. The method addresses catastrophic forgetting and provides theoretical insights into balancing plasticity and retention.
Self-Steering Language Models - Score: 16 (R=8, N=8) - Date: 2025-04-10 - Comment: The introduction of 'self-steering' LMs and recursive search procedures aligns with foundational research in model architecture, particularly in decoupling planning from execution for efficient reasoning.
Stochastic Optimization with Optimal Importance Sampling - Score: 16 (R=8, N=8) - Date: 2025-04-07 - Comment: The paper introduces a novel optimization algorithm with importance sampling, which is relevant to foundational research in stochastic optimization and efficiency improvements.
FLAMES: A Hybrid Spiking-State Space Model for Adaptive Memory Retention in Event-Based Learning - Score: 16 (R=8, N=8) - Date: 2025-04-03 - Comment: The paper introduces FLAMES, a novel hybrid framework combining structured state-space dynamics with event-driven computation. The use of a normal-plus-low-rank (NPLR) decomposition for efficiency aligns with the model compression criterion, and the Spike-Aware HiPPO mechanism offers insights into representation learning through memory retention dynamics.
Denoising guarantees for optimized sampling schemes in compressed sensing - Score: 16 (R=8, N=8) - Date: 2025-04-03 - Comment: This paper provides theoretical guarantees for optimized sampling schemes in compressed sensing, which aligns with foundational research in model compression and efficiency. The focus on denoising guarantees and generative priors adds a novel theoretical perspective.
On Stochastic Rounding with Few Random Bits - Score: 15 (R=8, N=7) - Date: 2025-04-30 - Comment: The paper explores stochastic rounding with few random bits, which aligns with the 'Model Compression' criterion by addressing low-precision computations and efficiency improvements.
Head-Tail-Aware KL Divergence in Knowledge Distillation for Spiking Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-04-30 - Comment: Proposes a novel Head-Tail-Aware KL divergence for knowledge distillation in Spiking Neural Networks, which aligns with model compression and efficiency breakthroughs.
Can Large Language Models Learn Formal Logic? A Data-Driven Training and Evaluation Framework - Score: 15 (R=8, N=7) - Date: 2025-04-30 - Comment: Investigates LLMs' ability to learn formal logic with a data-driven framework, which aligns with theoretical insights into LLM behavior.
Fitness Landscape of Large Language Model-Assisted Automated Algorithm Search - Score: 15 (R=8, N=7) - Date: 2025-04-29 - Comment: The paper analyzes the fitness landscape of LLM-assisted algorithm search, providing theoretical insights into search behavior and multimodal landscapes. This aligns with foundational research in LLM behavior.
Towards Faster and More Compact Foundation Models for Molecular Property Prediction - Score: 15 (R=8, N=7) - Date: 2025-04-29 - Comment: The paper explores model compression strategies for molecular property prediction, aligning with foundational research in model compression and efficiency.
Outlier-aware Tensor Robust Principal Component Analysis with Self-guided Data Augmentation - Score: 15 (R=8, N=7) - Date: 2025-04-28 - Comment: The paper proposes a novel optimization-driven approach for Tensor Robust Principal Component Analysis, which aligns with 'Representation Learning' due to its focus on low-rank tensor decomposition and handling structured corruptions.
Towards Robust LLMs: an Adversarial Robustness Measurement Framework - Score: 15 (R=8, N=7) - Date: 2025-04-25 - Comment: The paper adapts a robustness measurement framework for LLMs, providing insights into adversarial robustness, which aligns with the 'Large Language Models' criterion by addressing theoretical aspects of LLM behavior.
HMI: Hierarchical Knowledge Management for Efficient Multi-Tenant Inference in Pretrained Language Models - Score: 15 (R=8, N=7) - Date: 2025-04-25 - Comment: The paper introduces a hierarchical knowledge management system for efficient multi-tenant inference in PLMs, which aligns with 'Model Compression' and 'Model Architecture' by addressing resource efficiency and hierarchical design.
HPU: High-Bandwidth Processing Unit for Scalable, Cost-effective LLM Inference via GPU Co-processing - Score: 15 (R=8, N=7) - Date: 2025-04-25 - Comment: The paper introduces a novel co-processing unit (HPU) to improve LLM inference efficiency, which aligns with model compression and efficiency breakthroughs. The focus on memory-bound operations and GPU-HPU collaboration is relevant to foundational efficiency research.
W-PCA Based Gradient-Free Proxy for Efficient Search of Lightweight Language Models - Score: 15 (R=8, N=7) - Date: 2025-04-23 - Comment: The paper introduces W-PCA, a novel zero-shot NAS method for lightweight language models, focusing on efficiency and evaluation proxies. This aligns with model compression and efficiency breakthroughs.
Low-Rank Adaptation of Neural Fields - Score: 15 (R=8, N=7) - Date: 2025-04-23 - Comment: The paper adapts LoRA for neural fields, which aligns with model compression and efficiency topics. It introduces a novel application of LoRA to neural fields, making it relevant to foundational research in parameter-efficient methods.
LoRe: Personalizing LLMs via Low-Rank Reward Modeling - Score: 15 (R=8, N=7) - Date: 2025-04-22 - Comment: The paper proposes a low-rank preference modeling framework for personalizing LLMs, which aligns with foundational research in representation learning and efficiency.
FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference - Score: 15 (R=8, N=7) - Date: 2025-04-22 - Comment: The paper proposes a fine-grained mixed-precision quantization method for LLM inference, which is relevant to model compression and efficiency improvements.
On Revealing the Hidden Problem Structure in Real-World and Theoretical Problems Using Walsh Coefficient Influence - Score: 15 (R=8, N=7) - Date: 2025-04-22 - Comment: The paper extends Walsh decomposition for optimization problems, which aligns with foundational research in sparsity and efficiency. The proposed weighted dynamic Variable Interaction Graph (wdVIG) is a novel contribution.
DIDS: Domain Impact-aware Data Sampling for Large Language Model Training - Score: 15 (R=8, N=7) - Date: 2025-04-21 - Comment: The paper introduces a domain-aware data sampling strategy for LLM training, which aligns with foundational research in optimizing training dynamics and efficiency.
Towards Lossless Token Pruning in Late-Interaction Retrieval Models - Score: 15 (R=8, N=7) - Date: 2025-04-18 - Comment: The paper proposes a principled approach to token pruning in late-interaction retrieval models, aligning with the 'Model Compression' criterion by addressing efficiency through pruning strategies.
Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data - Score: 15 (R=8, N=7) - Date: 2025-04-15 - Comment: The paper proposes a novel alignment method for LLMs using reference answers, which could provide insights into LLM behavior and optimization.
Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training - Score: 15 (R=8, N=7) - Date: 2025-04-15 - Comment: The paper introduces Lumos, a performance modeling toolkit for LLM training, which aligns with foundational research in efficiency and system-level optimization for LLMs.
SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting - Score: 15 (R=8, N=7) - Date: 2025-04-15 - Comment: The paper introduces speculative early exiting for LLM inference, which aligns with model compression and efficiency breakthroughs. It provides a novel system-level optimization for accelerating inference.
Sharpness-Aware Parameter Selection for Machine Unlearning - Score: 15 (R=8, N=7) - Date: 2025-04-10 - Comment: The paper proposes a sharpness-aware parameter selection strategy for machine unlearning, which aligns with the 'Model Compression' criterion due to its focus on efficient parameter updates and theoretical justifications.
Encoder-Decoder Gemma: Improving the Quality-Efficiency Trade-Off via Adaptation - Score: 15 (R=8, N=7) - Date: 2025-04-09 - Comment: The paper explores adapting decoder-only LLMs to encoder-decoder models, which provides insights into model architecture and efficiency trade-offs, aligning with foundational research in model architecture.
DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding - Score: 15 (R=8, N=7) - Date: 2025-04-09 - Comment: The paper introduces DEL, a dynamic exit layer method for efficient speculative decoding in LLMs, which aligns with 'Model Compression' and 'Large Language Models' by addressing efficiency in decoding.
Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling - Score: 15 (R=8, N=7) - Date: 2025-04-09 - Comment: The paper introduces a novel algorithm for constrained generation in LMs, which aligns with foundational research in efficiency and algorithmic improvements for LLMs.
PINNverse: Accurate parameter estimation in differential equations from noisy data with constrained physics-informed neural networks - Score: 15 (R=8, N=7) - Date: 2025-04-08 - Comment: The paper introduces a constrained optimization framework for physics-informed neural networks (PINNs), which is relevant to foundational AI for Science. The approach addresses key limitations in PINNs, offering methodological advancements.
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models - Score: 15 (R=8, N=7) - Date: 2025-04-08 - Comment: The paper conducts an empirical study on quantized reasoning models, which aligns with foundational research in model compression and efficiency.
SnapPix: Efficient-Coding-Inspired In-Sensor Compression for Edge Vision - Score: 15 (R=8, N=7) - Date: 2025-04-08 - Comment: The SnapPix system introduces an efficient in-sensor compression method inspired by efficient coding, which aligns with foundational research in model compression and efficiency.
LOGLO-FNO: Efficient Learning of Local and Global Features in Fourier Neural Operators - Score: 15 (R=8, N=7) - Date: 2025-04-08 - Comment: The paper proposes architectural enhancements to Fourier Neural Operators (FNOs) to address spectral bias, which aligns with architectural innovations and efficiency improvements.
The Effects of Grouped Structural Global Pruning of Vision Transformers on Domain Generalisation - Score: 15 (R=8, N=7) - Date: 2025-04-08 - Comment: The paper introduces a novel grouped structural pruning method for vision transformers, which aligns with the model compression criterion. The focus on dependency graph analysis and pruning metrics adds methodological insights.
Compressing 3D Gaussian Splatting by Noise-Substituted Vector Quantization - Score: 15 (R=8, N=7) - Date: 2025-04-07 - Comment: The paper proposes a novel compression method for 3D Gaussian Splatting using noise-substituted vector quantization, which aligns with the model compression criterion. The method introduces a new approach to reduce memory consumption while maintaining quality.
ConfEviSurrogate: A Conformalized Evidential Surrogate Model for Uncertainty Quantification - Score: 15 (R=8, N=7) - Date: 2025-04-07 - Comment: The paper proposes a novel conformalized evidential surrogate model for uncertainty quantification, which aligns with foundational research in model efficiency and uncertainty modeling. It introduces a new calibration step and separation of uncertainty sources.
Efficient Model Editing with Task-Localized Sparse Fine-tuning - Score: 15 (R=8, N=7) - Date: 2025-04-04 - Comment: The paper introduces sparse fine-tuning for task-localized model editing, which aligns with model compression and efficiency breakthroughs.
NeuraLUT-Assemble: Hardware-aware Assembling of Sub-Neural Networks for Efficient LUT Inference - Score: 15 (R=8, N=7) - Date: 2025-04-02 - Comment: The paper proposes a hardware-aware framework for LUT-based neural networks, focusing on efficiency and sparsity. It aligns with model compression topics, particularly in addressing sparsity and resource utilization.
Geometric Median Matching for Robust k-Subset Selection from Noisy Data - Score: 15 (R=8, N=7) - Date: 2025-04-02 - Comment: The paper introduces a novel k-subset selection strategy using the Geometric Median for robust data pruning, which aligns with model compression and efficiency topics. The theoretical guarantees and robustness improvements add to its relevance.
Hawkeye: Efficient Reasoning with Model Collaboration - Score: 15 (R=8, N=7) - Date: 2025-04-02 - Comment: The paper proposes HAWKEYE, a framework for efficient reasoning in LLMs by reducing redundancy in Chain-of-Thought reasoning. This aligns with model compression and efficiency breakthroughs, making it relevant.
How to safely discard features based on aggregate SHAP values - Score: 15 (R=8, N=7) - Date: 2025-04-01 - Comment: The paper investigates the soundness of using SHAP values for feature selection and proposes a theoretical framework for safe feature removal, which aligns with 'Model Compression' through feature pruning.

High Performance Computing (27)

Epistemic Closure and the Irreversibility of Misalignment: Modeling Systemic Barriers to Alignment Innovation - Score: 19 (R=10, N=9) - Date: 2025-04-04 - Comment: The paper introduces a theoretical model of epistemic closure and systemic barriers to alignment innovation, aligning with the 'Emerging Trends' criterion. It challenges established assumptions and proposes recursive modeling as a novel paradigm.
Nonlinear Computation with Linear Optics via Source-Position Encoding - Score: 18 (R=9, N=9) - Date: 2025-04-30 - Comment: The paper proposes a novel method for nonlinear computation in linear optical systems, which aligns with 'Emerging Trends' and foundational advancements in hardware for neural networks.
Sparks: Multi-Agent Artificial Intelligence Model Discovers Protein Design Principles - Score: 18 (R=9, N=9) - Date: 2025-04-29 - Comment: The paper presents Sparks, a multi-agent AI model discovering protein design principles, which aligns with AI for Science and introduces novel generative paradigms for protein modeling.
Markov Kernels, Distances and Optimal Control: A Parable of Linear Quadratic Non-Gaussian Distribution Steering - Score: 18 (R=9, N=9) - Date: 2025-04-23 - Comment: The paper explores Markov kernels and optimal control, introducing a novel connection between Markov kernels, distances, and control. This is a cutting-edge theoretical contribution with potential foundational impact.
Generative AI Act II: Test Time Scaling Drives Cognition Engineering - Score: 18 (R=9, N=9) - Date: 2025-04-21 - Comment: The paper discusses 'Act II' of generative AI and test-time scaling, which introduces a new paradigm in cognition engineering. This aligns with emerging trends and foundational shifts in AI.
Generalising from Self-Produced Data: Model Training Beyond Human Constraints - Score: 18 (R=9, N=9) - Date: 2025-04-08 - Comment: The paper introduces a novel framework for AI models to autonomously generate and validate knowledge, which aligns with foundational research in LLMs and emerging trends. The focus on self-improving AI systems is highly relevant.
AI-Newton: A Concept-Driven Physical Law Discovery System without Prior Physical Knowledge - Score: 18 (R=9, N=9) - Date: 2025-04-03 - Comment: AI-Newton represents a novel paradigm for autonomous scientific discovery, which aligns with the 'AI for Science' criterion and introduces a concept-driven approach to deriving physical laws.
The Limits of AI Explainability: An Algorithmic Information Theory Approach - Score: 17 (R=9, N=8) - Date: 2025-04-30 - Comment: This paper provides a theoretical foundation for AI explainability using algorithmic information theory, aligning with the 'Emerging Trends' criterion for foundational research.
Reviving Any-Subset Autoregressive Models with Principled Parallel Sampling and Speculative Decoding - Score: 17 (R=9, N=8) - Date: 2025-04-30 - Comment: The paper revives Any-Subset Autoregressive Models (AS-ARMs) with a principled approach to parallel sampling and decoding, which aligns with the 'Large Language Models' criterion by providing theoretical insights into language model behavior.
Scaling Laws For Scalable Oversight - Score: 17 (R=9, N=8) - Date: 2025-04-28 - Comment: The paper proposes a framework for scalable oversight and introduces theoretical scaling laws, which aligns with emerging trends in foundational AI research.
Learning Adaptive Parallel Reasoning with Language Models - Score: 17 (R=9, N=8) - Date: 2025-04-23 - Comment: The paper introduces Adaptive Parallel Reasoning (APR), which explores novel reasoning frameworks for LLMs, aligning with the criterion of theoretical insights into LLM behavior and architecture-level innovations.
Semi-parametric Memory Consolidation: Towards Brain-like Deep Continual Learning - Score: 17 (R=9, N=8) - Date: 2025-04-22 - Comment: The paper proposes a biomimetic continual learning framework inspired by human memory systems, which aligns with representation learning and training dynamics in neural networks. The semi-parametric memory consolidation mechanism is a novel contribution.
LLM Library Learning Fails: A LEGO-Prover Case Study - Score: 17 (R=9, N=8) - Date: 2025-04-07 - Comment: The paper critiques LLM-based library learning systems and highlights misconceptions, aligning with foundational research on LLM behavior and interpretability.
InfiniteICL: Breaking the Limit of Context Window Size via Long Short-term Memory Transformation - Score: 17 (R=9, N=8) - Date: 2025-04-03 - Comment: The paper introduces InfiniteICL, a framework for extending context window size in LLMs by transforming context knowledge into parameter updates. This is highly relevant to foundational research in large language models and addresses a critical limitation in context handling.
The Work Capacity of Channels with Memory: Maximum Extractable Work in Percept-Action Loops - Score: 17 (R=8, N=9) - Date: 2025-04-09 - Comment: The paper develops a thermodynamic framework for percept-action loops, which introduces a novel perspective on active learning systems and their energy efficiency, aligning with emerging trends in foundational research.
Energy-Based Coarse-Graining in Molecular Dynamics: A Flow-Based Framework Without Data - Score: 16 (R=8, N=8) - Date: 2025-04-30 - Comment: The paper introduces a data-free generative framework for coarse-graining in molecular dynamics, aligning with 'AI for Science' and foundational generative modeling.
SlimPipe: Memory-Thrifty and Efficient Pipeline Parallelism for Long-Context LLM Training - Score: 16 (R=8, N=8) - Date: 2025-04-22 - Comment: SlimPipe introduces a novel pipeline parallelism method for LLM training, addressing memory efficiency and scalability, which is relevant to foundational advancements in LLM training.
Sub-optimality of the Separation Principle for Quadratic Control from Bilinear Observations - Score: 16 (R=8, N=8) - Date: 2025-04-17 - Comment: The paper provides theoretical insights into the sub-optimality of the separation principle in quadratic control with bilinear observations. It challenges established assumptions and introduces new theoretical perspectives, aligning with emerging trends.
Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory - Score: 16 (R=8, N=8) - Date: 2025-04-11 - Comment: The paper introduces Dynamic Cheatsheet (DC), a framework for test-time learning with adaptive memory, which aligns with emerging trends in LLM behavior and interpretability. The approach is novel and demonstrates significant performance improvements.
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems - Score: 16 (R=8, N=8) - Date: 2025-04-04 - Comment: The survey provides a modular, brain-inspired architecture for intelligent agents, touching on foundational aspects of memory, world modeling, and continual learning. It aligns with emerging trends and foundational research in LLMs and AI systems.
Graph Learning at Scale: Characterizing and Optimizing Pre-Propagation GNNs - Score: 15 (R=8, N=7) - Date: 2025-04-21 - Comment: The paper characterizes and optimizes pre-propagation GNNs, which aligns with foundational research in graph learning and scalability, offering system-level insights.
GRAIL: Gradient-Based Adaptive Unlearning for Privacy and Copyright in LLMs - Score: 15 (R=8, N=7) - Date: 2025-04-18 - Comment: The paper introduces a gradient-based unlearning framework for LLMs, which aligns with foundational research in managing sensitive information in large models.
You Don't Need All Attentions: Distributed Dynamic Fine-Tuning for Foundation Models - Score: 15 (R=8, N=7) - Date: 2025-04-18 - Comment: The paper introduces a distributed fine-tuning framework for foundation models, which aligns with the 'Model Compression' criterion by addressing computational efficiency in fine-tuning.
Elucidating the Design Space of Multimodal Protein Language Models - Score: 15 (R=8, N=7) - Date: 2025-04-16 - Comment: The paper systematically explores the design space of multimodal protein language models, addressing tokenization loss and structural modeling. It aligns with foundational research in AI for science, particularly in protein modeling.
LLM Unlearning Reveals a Stronger-Than-Expected Coreset Effect in Current Benchmarks - Score: 15 (R=8, N=7) - Date: 2025-04-15 - Comment: The paper uncovers a coreset effect in LLM unlearning benchmarks, providing theoretical insights into how unlearning can be achieved with minimal data. This aligns with the 'Large Language Models' criterion, offering foundational insights into LLM behavior.
How new data permeates LLM knowledge and how to dilute it - Score: 15 (R=8, N=7) - Date: 2025-04-15 - Comment: The paper investigates how new data permeates LLM knowledge and introduces methods to modulate knowledge insertion. This aligns with the 'Large Language Models' criterion, offering insights into LLM behavior and training dynamics.
A Simultaneous Approach for Training Neural Differential-Algebraic Systems of Equations - Score: 15 (R=8, N=7) - Date: 2025-04-08 - Comment: The paper extends neural ODEs to neural DAEs with a simultaneous training approach, contributing to foundational research in scientific machine learning and hybrid modeling.

Representation Learning (168)

I-Con: A Unifying Framework for Representation Learning - Score: 19 (R=10, N=9) - Date: 2025-04-25 - Comment: The paper introduces a unifying framework for representation learning, connecting various loss functions and methods through an information-theoretic perspective. This aligns closely with the 'Representation Learning' criterion, particularly in understanding how deep networks encode information.
SparseJEPA: Sparse Representation Learning of Joint Embedding Predictive Architectures - Score: 19 (R=10, N=9) - Date: 2025-04-24 - Comment: SparseJEPA directly addresses 'Representation Learning' by integrating sparsity into Joint Embedding Predictive Architectures, with theoretical contributions like reducing Multiinformation and proving the Data Processing Inequality for Multiinformation.
Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition - Score: 18 (R=10, N=8) - Date: 2025-04-30 - Comment: The paper introduces Low-Rank Sparse Attention (Lorsa), which aligns with the criteria of representation learning and model compression by exploring sparse dictionary learning and low-rank decomposition in Transformer attention layers. It also provides insights into training dynamics and interpretability.
Representation Learning via Non-Contrastive Mutual Information - Score: 18 (R=10, N=8) - Date: 2025-04-25 - Comment: The paper proposes a novel non-contrastive mutual information objective (MINC) for self-supervised representation learning, which is highly relevant to foundational research in representation learning. The approach combines strengths of contrastive and non-contrastive methods, offering a significant methodological improvement.
Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models - Score: 18 (R=9, N=9) - Date: 2025-04-29 - Comment: The paper introduces Modular Machine Learning (MML) as a paradigm for improving LLMs, which aligns with the 'Large Language Models' criterion. The focus on disentangled representation and modular reasoning is novel and impactful.
Symbolic Representation for Any-to-Any Generative Tasks - Score: 18 (R=9, N=9) - Date: 2025-04-25 - Comment: The paper proposes a symbolic generative task description language, which introduces a novel paradigm for generative AI. This aligns with emerging trends and foundational research in representation learning.
An Empirically Grounded Identifiability Theory Will Accelerate Self-Supervised Learning Research - Score: 18 (R=9, N=9) - Date: 2025-04-18 - Comment: The paper proposes expanding Identifiability Theory to explain self-supervised learning, which aligns with foundational research in representation learning and training dynamics.
Towards Combinatorial Interpretability of Neural Computation - Score: 18 (R=9, N=9) - Date: 2025-04-15 - Comment: The paper introduces a novel combinatorial interpretability framework for understanding neural computation, which aligns with foundational research in representation learning and training dynamics.
Quantum Mechanics and Neural Networks - Score: 18 (R=9, N=9) - Date: 2025-04-09 - Comment: The paper explores the representation of quantum mechanical theories as neural networks, introducing theoretical insights into the intersection of quantum mechanics and neural networks. This aligns with emerging trends and foundational research.
The Dual-Route Model of Induction - Score: 18 (R=9, N=9) - Date: 2025-04-07 - Comment: The paper introduces the concept of token-level and concept-level induction heads, providing theoretical insights into in-context learning and representation learning in LLMs. This aligns with foundational research.
Example-Free Learning of Regular Languages with Prefix Queries - Score: 18 (R=9, N=9) - Date: 2025-04-04 - Comment: The paper introduces a novel algorithm (PL*) for learning regular languages using prefix queries, which is a cutting-edge theoretical contribution to representation learning and language modeling.
NoProp: Training Neural Networks without Back-propagation or Forward-propagation - Score: 18 (R=9, N=9) - Date: 2025-04-01 - Comment: The paper introduces a gradient-free learning method (NoProp) that departs from traditional backpropagation, offering a novel perspective on training dynamics.
Jekyll-and-Hyde Tipping Point in an AI's Behavior - Score: 17 (R=9, N=8) - Date: 2025-04-30 - Comment: The paper derives a formula for tipping points in LLM behavior, aligning with the 'Large Language Models' criterion by providing theoretical insights into LLM behavior and interpretability.
Learning Laplacian Positional Encodings for Heterophilous Graphs - Score: 17 (R=9, N=8) - Date: 2025-04-30 - Comment: The paper introduces Learnable Laplacian Positional Encodings, which aligns with foundational research in representation learning and graph neural networks.
Partial Answer of How Transformers Learn Automata - Score: 17 (R=9, N=8) - Date: 2025-04-30 - Comment: The paper introduces a novel framework for simulating finite automata using representation-theoretic methods, which aligns with representation learning and theoretical insights into Transformer behavior.
Emergence and scaling laws in SGD learning of shallow neural networks - Score: 17 (R=9, N=8) - Date: 2025-04-29 - Comment: The paper provides a theoretical analysis of SGD dynamics in learning two-layer neural networks, offering insights into training dynamics and scaling laws. This is highly relevant to representation learning.
Improving Reasoning Performance in Large Language Models via Representation Engineering - Score: 17 (R=9, N=8) - Date: 2025-04-29 - Comment: The paper introduces representation engineering for reasoning tasks in LLMs, which aligns with foundational research in representation learning and LLM behavior.
Convergence Properties of Natural Gradient Descent for Minimizing KL Divergence - Score: 17 (R=9, N=8) - Date: 2025-04-29 - Comment: The paper analyzes natural gradient descent for minimizing KL divergence, providing theoretical insights into optimization dynamics, which aligns with representation learning and foundational research.
Generalization Guarantees for Multi-View Representation Learning and Application to Regularization via Gaussian Product Mixture Prior - Score: 17 (R=9, N=8) - Date: 2025-04-28 - Comment: The paper explores multi-view representation learning with a focus on generalization bounds and introduces a novel regularizer based on Gaussian mixture priors. This aligns closely with the 'Representation Learning' criterion, particularly in training dynamics and feature learning.
Gradient Descent as a Shrinkage Operator for Spectral Bias - Score: 17 (R=9, N=8) - Date: 2025-04-28 - Comment: This paper provides a theoretical analysis of gradient descent as a shrinkage operator for spectral bias, which aligns with 'Representation Learning' and training dynamics in neural networks. It also introduces novel insights into activation functions and spectral bias.
Score-Based Deterministic Density Sampling - Score: 17 (R=9, N=8) - Date: 2025-04-28 - Comment: The paper proposes a deterministic sampling framework using Score-Based Transport Modeling, which aligns with 'Emerging Trends' and 'Representation Learning' due to its novel approach to sampling and convergence diagnostics.
Random-Set Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-04-28 - Comment: The paper proposes a novel approach to uncertainty quantification in LLMs using random sets, which aligns with the 'Large Language Models' criterion due to its focus on foundational improvements in LLM behavior and interpretability.
Non-identifiability distinguishes Neural Networks among Parametric Models - Score: 17 (R=9, N=8) - Date: 2025-04-28 - Comment: This paper provides theoretical insights into the non-identifiability of neural networks, distinguishing them from traditional parametric models. It aligns closely with foundational research in representation learning.
When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars - Score: 17 (R=9, N=8) - Date: 2025-04-25 - Comment: The paper investigates metadata conditioning in language model pretraining, providing theoretical insights into when this technique works or fails. This aligns with foundational research in representation learning and LLM behavior.
Enhancing Variational Autoencoders with Smooth Robust Latent Encoding - Score: 17 (R=9, N=8) - Date: 2025-04-25 - Comment: The paper enhances VAEs with adversarial training to improve robustness and fidelity, contributing to foundational insights into representation learning and generative model robustness.
Provable wavelet-based neural approximation - Score: 17 (R=9, N=8) - Date: 2025-04-24 - Comment: The paper develops a wavelet-based theoretical framework for analyzing neural network approximation capabilities, which aligns closely with foundational research in Representation Learning and Model Architecture.
An Effective Gram Matrix Characterizes Generalization in Deep Networks - Score: 17 (R=9, N=8) - Date: 2025-04-24 - Comment: This paper provides a theoretical analysis of generalization in deep networks using an effective Gram matrix, which aligns with the Representation Learning criterion by offering insights into training dynamics and generalization behavior.
Learning Energy-Based Generative Models via Potential Flow: A Variational Principle Approach to Probability Density Homotopy Matching - Score: 17 (R=9, N=8) - Date: 2025-04-24 - Comment: The paper proposes VPFB, a novel energy-based generative modeling framework that eliminates the need for implicit MCMC sampling. It introduces a variational principle for density homotopy matching, which is a significant theoretical contribution to generative modeling and energy-based models.
Shannon invariants: A scalable approach to information decomposition - Score: 17 (R=9, N=8) - Date: 2025-04-23 - Comment: The paper introduces 'Shannon invariants' for scalable information decomposition, offering insights into how deep learning architectures process information. This aligns with representation learning and training dynamics.
On Learning Parallel Pancakes with Mostly Uniform Weights - Score: 17 (R=9, N=8) - Date: 2025-04-22 - Comment: The paper explores the complexity of learning Gaussian Mixture Models (GMMs) with structural assumptions, which is foundational research in representation learning. It provides theoretical insights into the statistical query complexity and quasi-polynomial bounds.
AI for the Open-World: the Learning Principles - Score: 17 (R=9, N=8) - Date: 2025-04-22 - Comment: The paper discusses learning principles for open-world AI, aligning with 'Emerging Trends' by addressing foundational challenges in AI learning paradigms.
Data Selection for ERMs - Score: 17 (R=9, N=8) - Date: 2025-04-22 - Comment: The paper explores data selection for empirical risk minimizers, providing theoretical insights into optimizing training data. It aligns with 'Representation Learning' and offers foundational contributions to learning theory.
Density Measures for Language Generation - Score: 17 (R=9, N=8) - Date: 2025-04-22 - Comment: The paper introduces a theoretical framework for language generation, focusing on the trade-off between validity and breadth. It aligns with foundational research in LLM behavior and interpretability.
How Learnable Grids Recover Fine Detail in Low Dimensions: A Neural Tangent Kernel Analysis of Multigrid Parametric Encodings - Score: 17 (R=9, N=8) - Date: 2025-04-21 - Comment: The paper provides a theoretical analysis of multigrid parametric encodings (MPE) and Fourier feature encodings (FFE) using neural tangent kernels, offering foundational insights into representation learning.
Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model - Score: 17 (R=9, N=8) - Date: 2025-04-21 - Comment: The paper explores the phenomenon of grokking and proposes a method to accelerate it, which provides insights into training dynamics and representation learning.
Propagation of Chaos in One-hidden-layer Neural Networks beyond Logarithmic Time - Score: 17 (R=9, N=8) - Date: 2025-04-18 - Comment: The paper provides theoretical insights into the dynamics of neural networks in the mean-field regime, which aligns with the 'Representation Learning' criterion by analyzing training dynamics and approximation gaps.
Hierarchical Vector Quantized Graph Autoencoder with Annealing-Based Code Selection - Score: 17 (R=9, N=8) - Date: 2025-04-18 - Comment: The paper introduces a hierarchical vector quantized graph autoencoder, which aligns with foundational research in representation learning and autoencoders.
A Two-Phase Perspective on Deep Learning Dynamics - Score: 17 (R=9, N=8) - Date: 2025-04-18 - Comment: The paper proposes a two-phase perspective on deep learning dynamics, offering insights into training dynamics and representation learning.
Identifying and Mitigating the Influence of the Prior Distribution in Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-04-18 - Comment: The paper investigates the influence of prior distributions in LLMs and proposes methods to mitigate their effects, aligning with the 'Large Language Models' criterion by providing theoretical insights into LLM behavior.
On Linear Representations and Pretraining Data Frequency in Language Models - Score: 17 (R=9, N=8) - Date: 2025-04-18 - Comment: The paper investigates the relationship between pretraining data frequency and linear representations in LLMs, aligning with the 'Representation Learning' criterion as it provides insights into how LLMs encode information.
Approximation Bounds for Transformer Networks with Application to Regression - Score: 17 (R=9, N=8) - Date: 2025-04-17 - Comment: This paper provides theoretical insights into the approximation capabilities of Transformer networks, particularly in sequence-to-sequence mappings and regression problems. It aligns with the 'Model Architecture' criterion by analyzing Transformers' structure and interpretability, and also touches on foundational aspects of representation learning.
Error Broadcast and Decorrelation as a Potential Artificial and Natural Learning Mechanism - Score: 17 (R=9, N=8) - Date: 2025-04-17 - Comment: The paper introduces the Error Broadcast and Decorrelation (EBD) algorithm, which provides a novel learning framework addressing the credit assignment problem in neural networks. This aligns closely with the 'Representation Learning' criterion, as it explores training dynamics and biologically plausible frameworks.
Erzeugunsgrad, VC-Dimension and Neural Networks with rational activation function - Score: 17 (R=9, N=8) - Date: 2025-04-16 - Comment: This paper provides a theoretical connection between VC-dimension and neural networks with rational activation functions, which aligns with foundational research in representation learning and theoretical insights into neural networks.
Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs - Score: 17 (R=9, N=8) - Date: 2025-04-16 - Comment: The paper introduces a novel model merging strategy leveraging submodule linearity, which aligns with foundational research in model architecture and representation learning. The closed-form solution for merging weights is a notable theoretical contribution.
Expressivity of Quadratic Neural ODEs - Score: 17 (R=9, N=8) - Date: 2025-04-15 - Comment: The paper provides theoretical bounds on the expressivity of quadratic neural ODEs, focusing on the role of depth in model capabilities. This aligns with foundational research in representation learning and model architecture.
In almost all shallow analytic neural network optimization landscapes, efficient minimizers have strongly convex neighborhoods - Score: 17 (R=9, N=8) - Date: 2025-04-15 - Comment: The paper provides theoretical insights into the optimization landscape of shallow neural networks, which aligns with foundational research in representation learning and training dynamics.
From Tokens to Lattices: Emergent Lattice Structures in Language Models - Score: 17 (R=9, N=8) - Date: 2025-04-15 - Comment: The paper investigates how conceptual knowledge emerges in pretrained masked language models using Formal Concept Analysis, which provides theoretical insights into representation learning.
Dimension reduction for derivative-informed operator learning: An analysis of approximation errors - Score: 17 (R=9, N=8) - Date: 2025-04-14 - Comment: The paper provides a theoretical analysis of dimension reduction methods for derivative-informed operator learning, which aligns with foundational research in representation learning and AI for Science. It offers insights into approximation errors in neural operators.
Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-04-14 - Comment: The paper studies the intrinsic dimension of data learned by CNNs trained with gradient descent, providing theoretical insights into training dynamics and representation learning.
Statistically guided deep learning - Score: 17 (R=9, N=8) - Date: 2025-04-14 - Comment: The paper proposes a theoretically grounded deep learning algorithm with a focus on optimization, generalization, and approximation, aligning with foundational research in representation learning.
Large language models could be rote learners - Score: 17 (R=9, N=8) - Date: 2025-04-14 - Comment: The paper explores rote memorization versus genuine capability learning in LLMs, which aligns with foundational insights into LLM behavior and interpretability.
Minimum width for universal approximation using squashable activation functions - Score: 17 (R=9, N=8) - Date: 2025-04-11 - Comment: This paper provides theoretical insights into the minimum width for universal approximation using squashable activation functions, which is foundational research in representation learning.
Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization - Score: 17 (R=9, N=8) - Date: 2025-04-09 - Comment: The paper introduces an unsupervised incentivization method for reasoning in LLMs, which aligns with foundational research in large language models and their training dynamics.
Architecture independent generalization bounds for overparametrized deep ReLU networks - Score: 17 (R=9, N=8) - Date: 2025-04-09 - Comment: This paper provides theoretical generalization bounds for overparametrized deep ReLU networks, which aligns with 'Representation Learning' by offering insights into training dynamics and generalization.
Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification - Score: 17 (R=9, N=8) - Date: 2025-04-09 - Comment: The paper probes hidden states of reasoning models to verify correctness, which provides insights into representation learning and training dynamics in neural networks.
Nonlocal techniques for the analysis of deep ReLU neural network approximations - Score: 17 (R=9, N=8) - Date: 2025-04-08 - Comment: The paper provides theoretical insights into ReLU neural network approximations, focusing on Sobolev and Barron function spaces. This aligns with representation learning and foundational theoretical work.
AI in a vat: Fundamental limits of efficient world modelling for agent sandboxing and interpretability - Score: 17 (R=9, N=8) - Date: 2025-04-08 - Comment: The paper investigates fundamental trade-offs in world modeling for agent sandboxing, which aligns with emerging trends and foundational research in AI interpretability.
Variational Self-Supervised Learning - Score: 17 (R=9, N=8) - Date: 2025-04-08 - Comment: The paper introduces Variational Self-Supervised Learning (VSSL), which aligns with foundational research in representation learning and probabilistic modeling.
Directional Sign Loss: A Topology-Preserving Loss Function that Approximates the Sign of Finite Differences - Score: 17 (R=9, N=8) - Date: 2025-04-08 - Comment: The introduction of a topology-preserving loss function (DSL) for representation learning is highly relevant. It provides a novel approach to preserving topological features in latent spaces, aligning with representation learning.
Scalable Robust Bayesian Co-Clustering with Compositional ELBOs - Score: 17 (R=9, N=8) - Date: 2025-04-08 - Comment: The paper introduces a novel variational co-clustering framework with a compositional ELBO, which aligns with representation learning through its focus on latent space clustering and training dynamics.
Have Large Language Models Learned to Reason? A Characterization via 3-SAT Phase Transition - Score: 17 (R=9, N=8) - Date: 2025-04-08 - Comment: The paper explores reasoning capabilities of LLMs using a principled experimental protocol, which aligns with foundational research on LLM behavior and interpretability.
Beyond Progress Measures: Theoretical Insights into the Mechanism of Grokking - Score: 17 (R=9, N=8) - Date: 2025-04-07 - Comment: The paper provides theoretical insights into the mechanism of grokking, focusing on optimization dynamics and embedding space uniformity. This aligns with foundational research in representation learning and training dynamics, making it highly relevant.
How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence - Score: 17 (R=9, N=8) - Date: 2025-04-07 - Comment: The paper provides a mechanistic analysis of how post-training reshapes LLMs, focusing on knowledge, truthfulness, refusal, and confidence. This aligns with the criterion of theoretical insights into LLM behavior.
Towards Understanding How Knowledge Evolves in Large Vision-Language Models - Score: 17 (R=9, N=8) - Date: 2025-04-07 - Comment: The paper investigates the internal mechanisms of Large Vision-Language Models (LVLMs), focusing on how multimodal knowledge evolves. This aligns with the 'Representation Learning' and 'Large Language Models' criteria, as it provides insights into how these models encode and process information.
Do Two AI Scientists Agree? - Score: 17 (R=9, N=8) - Date: 2025-04-04 - Comment: This paper explores how AI models converge on theories in scientific tasks, using Hamiltonian-Lagrangian neural networks. It aligns with 'Emerging Trends' as it challenges assumptions about AI learning dynamics and introduces a novel perspective on interpretability in scientific modeling.
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models - Score: 17 (R=9, N=8) - Date: 2025-04-04 - Comment: The paper applies Sparse Autoencoders (SAEs) to Vision-Language Models (VLMs), enhancing interpretability and control, which aligns with the 'Representation Learning' criterion. It also contributes to unsupervised methods for improving model behavior.
Analytical Discovery of Manifold with Machine Learning - Score: 17 (R=9, N=8) - Date: 2025-04-04 - Comment: The paper introduces GAMLA, a framework for analytical manifold learning using autoencoders, aligning with the 'Representation Learning' criterion. It provides foundational insights into manifold geometry and interpretability.
A Physics-Informed Meta-Learning Framework for the Continuous Solution of Parametric PDEs on Arbitrary Geometries - Score: 17 (R=9, N=8) - Date: 2025-04-04 - Comment: The paper introduces a physics-informed meta-learning framework for solving parametric PDEs, which aligns with 'AI for Science' due to its foundational contributions to computational mechanics and operator learning.
Critical Thinking: Which Kinds of Complexity Govern Optimal Reasoning Length? - Score: 17 (R=9, N=8) - Date: 2025-04-03 - Comment: This paper investigates reasoning length in LLMs using a formal framework based on deterministic finite automata (DFAs). It provides theoretical insights into LLM behavior and interpretability, which aligns with the foundational research focus on large language models.
Sparse Gaussian Neural Processes - Score: 17 (R=9, N=8) - Date: 2025-04-03 - Comment: The paper introduces a novel approach to meta-learning sparse Gaussian process inference, which aligns with representation learning and sparsity. The focus on interpretability and manual elicitation of priors adds theoretical depth.
Estimating Unbounded Density Ratios: Applications in Error Control under Covariate Shift - Score: 17 (R=9, N=8) - Date: 2025-04-03 - Comment: The paper addresses density ratio estimation under relaxed conditions, which is foundational for representation learning and generalization under covariate shift. The theoretical contributions are significant and align with the criteria for foundational research.
Logical perspectives on learning statistical objects - Score: 17 (R=9, N=8) - Date: 2025-04-02 - Comment: The paper provides theoretical insights into learnability and sample complexity, which aligns with foundational research in representation learning. It also explores connections to logical formulas, adding a novel perspective.
Deep Generative Models: Complexity, Dimensionality, and Approximation - Score: 17 (R=9, N=8) - Date: 2025-04-02 - Comment: The paper provides a theoretical insight into generative networks and challenges the conventional belief about the relationship between input dimensionality and data distribution modeling. This aligns with foundational research in representation learning.
ObscuraCoder: Powering Efficient Code LM Pre-Training Via Obfuscation Grounding - Score: 17 (R=9, N=8) - Date: 2025-04-02 - Comment: The paper proposes a novel pretraining objective for Code-LMs using obfuscation grounding, which aligns with foundational research in representation learning and LLM pretraining. The approach introduces methodological innovations.
GMapLatent: Geometric Mapping in Latent Space - Score: 17 (R=9, N=8) - Date: 2025-04-01 - Comment: The paper introduces a novel geometric mapping approach for aligning latent spaces in encoder-decoder architectures, which aligns with the Representation Learning criterion. The focus on foundational methods for latent space alignment is relevant.
The Reasoning-Memorization Interplay in Language Models Is Mediated by a Single Direction - Score: 17 (R=9, N=8) - Date: 2025-04-01 - Comment: The paper provides mechanistic insights into reasoning and memorization dynamics in LLMs, which aligns with the foundational research on LLM behavior and interpretability.
Towards Understanding the Optimization Mechanisms in Deep Learning - Score: 17 (R=9, N=8) - Date: 2025-04-01 - Comment: The paper provides theoretical insights into optimization mechanisms in deep learning, aligning with representation learning and training dynamics.
Studying Small Language Models with Susceptibilities - Score: 17 (R=8, N=9) - Date: 2025-04-28 - Comment: The paper develops a linear response framework for interpretability in small language models, which aligns with representation learning by analyzing how components of a network respond to data distribution shifts. The focus on susceptibility and attribution scores is novel.
A Unified Approach to Analysis and Design of Denoising Markov Models - Score: 17 (R=8, N=9) - Date: 2025-04-03 - Comment: The paper provides a rigorous mathematical foundation for denoising Markov models, unifying existing generative model formulations and introducing novel design principles. This aligns with foundational research in representation learning and emerging trends.
From Evidence to Belief: A Bayesian Epistemology Approach to Language Models - Score: 16 (R=9, N=7) - Date: 2025-04-29 - Comment: The paper explores Bayesian epistemology in language models, which provides theoretical insights into LLM behavior and interpretability, aligning with foundational research in LLMs.
MIB: A Mechanistic Interpretability Benchmark - Score: 16 (R=9, N=7) - Date: 2025-04-18 - Comment: Introduces a benchmark for mechanistic interpretability, which aligns with foundational research in understanding LLM behavior.
Towards Interpretable Deep Generative Models via Causal Representation Learning - Score: 16 (R=9, N=7) - Date: 2025-04-17 - Comment: The paper reviews causal representation learning (CRL), which is a foundational topic in representation learning. It connects CRL to classical statistical models and highlights open questions, making it highly relevant to the 'Representation Learning' criterion.
How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective - Score: 16 (R=9, N=7) - Date: 2025-04-11 - Comment: This paper provides mechanistic interpretability insights into how LLMs process relevance judgments, which aligns with the foundational study of LLM behavior and interpretability.
Contrastive and Variational Approaches in Self-Supervised Learning for Complex Data Mining - Score: 16 (R=9, N=7) - Date: 2025-04-08 - Comment: The paper explores contrastive and variational approaches in self-supervised learning, which aligns with foundational research in representation learning and training dynamics.
Learning and Generalization with Mixture Data - Score: 16 (R=8, N=8) - Date: 2025-04-30 - Comment: The paper studies generalization and statistical rates for mixture data, providing theoretical insights into heterogeneous data learning. This aligns with foundational research in representation learning and generalization theory.
Hierarchical Uncertainty-Aware Graph Neural Network - Score: 16 (R=8, N=8) - Date: 2025-04-29 - Comment: The paper introduces a hierarchical uncertainty-aware GNN, which aligns with representation learning and model architecture. The integration of uncertainty estimation and graph hierarchies is novel.
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning - Score: 16 (R=8, N=8) - Date: 2025-04-29 - Comment: The paper introduces a novel self-play framework for improving LLM reasoning, which aligns with foundational research in LLM behavior and interpretability.
Simple Graph Contrastive Learning via Fractional-order Neural Diffusion Networks - Score: 16 (R=8, N=8) - Date: 2025-04-24 - Comment: The paper proposes a novel augmentation-free graph contrastive learning framework using fractional-order neural diffusion networks. It aligns with representation learning and introduces a unique approach to graph representation learning.
MAGIC: Near-Optimal Data Attribution for Deep Learning - Score: 16 (R=8, N=8) - Date: 2025-04-24 - Comment: The MAGIC method addresses data attribution in deep learning, which is relevant to foundational research in Representation Learning by exploring the impact of training data on model predictions.
Riemannian Neural Geodesic Interpolant - Score: 16 (R=8, N=8) - Date: 2025-04-23 - Comment: The paper introduces a Riemannian Neural Geodesic Interpolant for generative modeling on manifolds, which is a novel contribution to model architecture and representation learning.
Deep learning with missing data - Score: 16 (R=8, N=8) - Date: 2025-04-23 - Comment: The paper introduces Pattern Embedded Neural Networks (PENNs) for handling missing data; the method achieves minimax rates and provides theoretical insights into representation learning.
Probabilistic Stability Guarantees for Feature Attributions - Score: 16 (R=8, N=8) - Date: 2025-04-21 - Comment: The paper proposes a novel stability certification algorithm for feature attributions, which aligns with representation learning by providing insights into the robustness of explanation methods. The use of Boolean function analysis and soft stability introduces a novel theoretical perspective.
Spectral Algorithms under Covariate Shift - Score: 16 (R=8, N=8) - Date: 2025-04-18 - Comment: The paper investigates spectral algorithms under covariate shift, providing theoretical insights into generalization, which aligns with foundational research.
Emergence of Computational Structure in a Neural Network Physics Simulator - Score: 16 (R=8, N=8) - Date: 2025-04-17 - Comment: This paper investigates the emergence of computational structures in neural networks, specifically in a transformer-like model. It provides insights into training dynamics and interpretability, aligning with the 'Representation Learning' and 'Model Architecture' criteria.
Cryo-em images are intrinsically low dimensional - Score: 16 (R=8, N=8) - Date: 2025-04-16 - Comment: The paper investigates the low-dimensional structure of cryo-EM latent spaces, providing insights into representation learning and manifold geometry. This aligns with 'Representation Learning' and foundational research in AI for Science.
IsoSEL: Isometric Structural Entropy Learning for Deep Graph Clustering in Hyperbolic Space - Score: 16 (R=8, N=8) - Date: 2025-04-15 - Comment: The paper introduces IsoSEL, a novel framework for deep graph clustering in hyperbolic space, contributing foundational insights into representation learning and clustering methods.
NetTAG: A Multimodal RTL-and-Layout-Aligned Netlist Foundation Model via Text-Attributed Graph - Score: 16 (R=8, N=8) - Date: 2025-04-15 - Comment: The paper introduces a multimodal netlist foundation model combining graph and text attributes, which aligns with representation learning and architectural innovations.
High-order expansion of Neural Ordinary Differential Equations flows - Score: 16 (R=8, N=8) - Date: 2025-04-15 - Comment: The paper introduces a high-order expansion framework for neural ODEs, contributing to the theoretical understanding of neural dynamics. This aligns with foundational research in representation learning and interpretability.
Constrained Machine Learning Through Hyperspherical Representation - Score: 16 (R=8, N=8) - Date: 2025-04-14 - Comment: The paper proposes a novel hyperspherical representation method to enforce constraints in machine learning outputs, which aligns with foundational research in representation learning and model architecture.
Entropic bounds for conditionally Gaussian vectors and applications to neural networks - Score: 16 (R=8, N=8) - Date: 2025-04-14 - Comment: The paper provides theoretical bounds on neural network convergence to Gaussian distributions, which aligns with foundational research in understanding training dynamics and representation learning.
Enabling Automatic Differentiation with Mollified Graph Neural Operators - Score: 16 (R=8, N=8) - Date: 2025-04-14 - Comment: The paper introduces a novel method (mGNO) for physics-informed neural operators, which is foundational in AI for Science. It provides significant theoretical contributions by enabling exact gradients and improving generalization on irregular grids.
Smoothed Distance Kernels for MMDs and Applications in Wasserstein Gradient Flows - Score: 16 (R=8, N=8) - Date: 2025-04-11 - Comment: The paper introduces a new smoothed distance kernel for MMDs with theoretical guarantees, which is relevant to representation learning and emerging trends in foundational research. The novel kernel design and its theoretical contributions are noteworthy.
Adversarial Subspace Generation for Outlier Detection in High-Dimensional Data - Score: 16 (R=8, N=8) - Date: 2025-04-11 - Comment: The paper introduces a novel theoretical framework (Myopic Subspace Theory) for subspace selection and proposes a generative method (V-GAN) to address high-dimensional data challenges. While it focuses on outlier detection, the foundational insights into subspace selection and optimization align with representation learning.
GOLLuM: Gaussian Process Optimized LLMs -- Reframing LLM Finetuning through Bayesian Optimization - Score: 16 (R=8, N=8) - Date: 2025-04-09 - Comment: The paper reframes LLM finetuning through Bayesian optimization, introducing a novel integration of Gaussian processes and LLMs, which aligns with foundational research in representation learning and optimization.
Fractal and Regular Geometry of Deep Neural Networks - Score: 16 (R=8, N=8) - Date: 2025-04-09 - Comment: The paper investigates the fractal and regular geometry of neural networks, which aligns with 'Representation Learning' by exploring the geometric properties of network activations and their implications.
Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models - Score: 16 (R=8, N=8) - Date: 2025-04-08 - Comment: The paper introduces TISER, a framework for improving temporal reasoning in LLMs, which is relevant to foundational research in LLM behavior and interpretability. The focus on timeline construction and iterative self-reflection is novel.
Adversarial KA - Score: 16 (R=8, N=8) - Date: 2025-04-08 - Comment: The paper investigates the robustness of the Kolmogorov-Arnold representation theorem under adversarial attacks, which aligns with foundational research in representation learning and theoretical insights.
Adaptive Elicitation of Latent Information Using Natural Language - Score: 16 (R=8, N=8) - Date: 2025-04-08 - Comment: The paper proposes an adaptive elicitation framework for reducing uncertainty in latent entities using natural language, which aligns with foundational research in representation learning and emerging trends.
Persuasive Calibration - Score: 16 (R=8, N=8) - Date: 2025-04-07 - Comment: The paper introduces a novel framework for optimal predictors under calibration error budgets, which aligns with foundational research in representation learning and training dynamics. It provides theoretical insights into prediction calibration.
Less is More: Efficient Black-box Attribution via Minimal Interpretable Subset Selection - Score: 16 (R=8, N=8) - Date: 2025-04-02 - Comment: The paper introduces a black-box attribution method (LiMA) for interpretability, which aligns with representation learning by focusing on input-prediction interactions. The novel optimization approach and submodular function design are significant contributions.
Bayesian Predictive Coding - Score: 16 (R=8, N=8) - Date: 2025-04-01 - Comment: The paper introduces Bayesian Predictive Coding, which provides theoretical insights into biologically plausible learning and uncertainty quantification, aligning with foundational research in representation learning.
Node Embeddings via Neighbor Embeddings - Score: 16 (R=8, N=8) - Date: 2025-04-01 - Comment: The paper introduces a unified framework for graph layouts and node embeddings, which aligns with representation learning and proposes a novel neighbor embedding method.
Guessing Efficiently for Constrained Subspace Approximation - Score: 15 (R=8, N=7) - Date: 2025-04-30 - Comment: The paper introduces a coreset-guess-solve framework for constrained subspace approximation, which aligns with 'Representation Learning' by addressing theoretical aspects of subspace approximation.
Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think - Score: 15 (R=8, N=7) - Date: 2025-04-30 - Comment: The paper explores reasoning trace analysis in LLMs, which aligns with foundational research in understanding LLM behavior and interpretability.
Explanations Go Linear: Interpretable and Individual Latent Encoding for Post-hoc Explainability - Score: 15 (R=8, N=7) - Date: 2025-04-30 - Comment: The paper introduces ILLUME, a framework for post-hoc explainability using representation learning, which aligns with 'Representation Learning' by addressing interpretable latent encoding.
Group Relative Knowledge Distillation: Learning from Teacher's Relational Inductive Bias - Score: 15 (R=8, N=7) - Date: 2025-04-30 - Comment: Proposes Group Relative Knowledge Distillation (GRKD), focusing on relational inductive biases in knowledge distillation, which aligns with representation learning and model compression.
VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning - Score: 15 (R=8, N=7) - Date: 2025-04-29 - Comment: The paper proposes VCM, a framework for visual concept modeling with efficiency improvements, which aligns with foundational research in representation learning and efficiency.
Representation Learning for Distributional Perturbation Extrapolation - Score: 15 (R=8, N=7) - Date: 2025-04-28 - Comment: This paper introduces a new method, PDAE, for representation learning in the context of distributional perturbation extrapolation. It provides theoretical guarantees and focuses on latent variable modeling, which aligns with 'Representation Learning' and foundational generative paradigms.
A Model Zoo on Phase Transitions in Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-04-28 - Comment: The paper introduces a structured 'model zoo' for weight space learning and explores phase transitions in neural networks, which could provide foundational insights into representation learning and training dynamics.
Physics-informed features in supervised machine learning - Score: 15 (R=8, N=7) - Date: 2025-04-25 - Comment: The paper discusses a physics-informed approach to feature-based machine learning, which aligns with 'Representation Learning' by integrating domain knowledge into feature extraction, enhancing interpretability and potential discovery of new mechanisms.
In-Context Learning can distort the relationship between sequence likelihoods and biological fitness - Score: 15 (R=8, N=7) - Date: 2025-04-25 - Comment: The paper explores how in-context learning can distort the relationship between sequence likelihoods and biological fitness, providing theoretical insights into LLM behavior. This aligns with foundational research in LLM interpretability.
An XAI-based Analysis of Shortcut Learning in Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-04-23 - Comment: The paper systematically analyzes shortcut learning in neural networks using an XAI-based diagnostic measure. It aligns with representation learning by providing insights into how spurious correlations are encoded and disentangled in different architectures.
Analytical Softmax Temperature Setting from Feature Dimensions for Model- and Domain-Robust Classification - Score: 15 (R=8, N=7) - Date: 2025-04-23 - Comment: The paper provides a theoretical insight into the relationship between feature dimensionality and the softmax temperature parameter, which aligns with representation learning and training dynamics. The work introduces a novel empirical formula and optimization scheme, making it relevant to foundational research.
Unifying Image Counterfactuals and Feature Attributions with Latent-Space Adversarial Attacks - Score: 15 (R=8, N=7) - Date: 2025-04-23 - Comment: The paper introduces a novel framework for counterfactual explanations in computer vision models, leveraging latent-space adversarial attacks. It aligns with representation learning by addressing interpretability and feature attribution in a computationally efficient manner.
Emergence and Evolution of Interpretable Concepts in Diffusion Models - Score: 15 (R=8, N=7) - Date: 2025-04-23 - Comment: The paper applies Sparse Autoencoders to diffusion models for mechanistic interpretability, aligning with representation learning and emerging trends in understanding generative models.
Improving Learning to Optimize Using Parameter Symmetries - Score: 15 (R=8, N=7) - Date: 2025-04-23 - Comment: The paper investigates learning-to-optimize (L2O) algorithms by leveraging parameter space symmetry, which is relevant to representation learning and training dynamics. The theoretical analysis and empirical benchmarks add novelty to the field.
A Basic Evaluation of Neural Networks Trained with the Error Diffusion Learning Algorithm - Score: 15 (R=8, N=7) - Date: 2025-04-22 - Comment: The paper evaluates the biologically inspired Error Diffusion Learning Algorithm (EDLA), which contributes to foundational research in training dynamics and alternative learning methods. The biologically motivated approach is novel.
The Geometry of Self-Verification in a Task-Specific Reasoning Model - Score: 15 (R=8, N=7) - Date: 2025-04-22 - Comment: The paper provides insights into self-verification mechanisms in reasoning models, aligning with foundational research in understanding LLM behavior and interpretability.
Learning from Stochastic Teacher Representations Using Student-Guided Knowledge Distillation - Score: 15 (R=8, N=7) - Date: 2025-04-22 - Comment: The paper introduces a novel stochastic self-distillation strategy, which aligns with representation learning by improving knowledge distillation techniques.
Leakage and Interpretability in Concept-Based Models - Score: 15 (R=8, N=7) - Date: 2025-04-22 - Comment: The paper introduces an information-theoretic framework to address leakage in concept-based models, which aligns with representation learning and interpretability, making it relevant to foundational research.
LogicTree: Structured Proof Exploration for Coherent and Rigorous Logical Reasoning with Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-04-22 - Comment: The paper proposes LogicTree, a framework for structured proof exploration in LLMs, which aligns with the 'Large Language Models' criterion by addressing reasoning and interpretability challenges.
Generative System Dynamics in Recurrent Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-04-22 - Comment: The paper investigates the dynamics of RNNs with a focus on oscillatory behavior and stability, which aligns with foundational research in representation learning and training dynamics.
Training Autoencoders Using Stochastic Hessian-Free Optimization with LSMR - Score: 15 (R=8, N=7) - Date: 2025-04-21 - Comment: The paper proposes improvements to Hessian-free optimization for training autoencoders, which aligns with representation learning and foundational training dynamics. The use of LSMR and mini-batch selection adds methodological novelty.
Disentangling Polysemantic Channels in Convolutional Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-04-18 - Comment: The paper focuses on disentangling polysemantic channels in CNNs, which aligns with the 'Representation Learning' criterion by enhancing interpretability and understanding of feature encoding in neural networks.
Information Gain-Guided Causal Intervention for Autonomous Debiasing Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-04-18 - Comment: The paper proposes a causal intervention framework for debiasing LLMs, which is relevant to foundational research in LLM behavior and interpretability.
The Others: Naturally Isolating Out-of-Distribution Samples for Robust Open-Set Semi-Supervised Learning - Score: 15 (R=8, N=7) - Date: 2025-04-18 - Comment: The paper introduces a novel framework, MagMatch, for open-set semi-supervised learning using a prototype-based contrastive learning paradigm. This aligns with representation learning, particularly in the context of feature space structuring and contrastive methods.
MLPs and KANs for data-driven learning in physical problems: A performance comparison - Score: 15 (R=8, N=7) - Date: 2025-04-16 - Comment: The paper compares KANs and MLPs for solving PDEs, providing insights into their expressiveness and performance. This aligns with foundational research in representation learning and model architecture, particularly in physics-informed machine learning.
Single-Input Multi-Output Model Merging: Leveraging Foundation Models for Dense Multi-Task Learning - Score: 15 (R=8, N=7) - Date: 2025-04-16 - Comment: The paper introduces a novel approach to model merging in multi-task learning, which aligns with representation learning and architectural insights. The focus on task-specific decoders and representation alignment is relevant.
Better Estimation of the KL Divergence Between Language Models - Score: 15 (R=8, N=7) - Date: 2025-04-16 - Comment: The paper proposes a Rao-Blackwellized estimator for KL divergence, addressing variance issues in sampling-based estimators. This aligns with 'Representation Learning' and theoretical advancements in LLM behavior.
Negate or Embrace: On How Misalignment Shapes Multimodal Representation Learning - Score: 15 (R=8, N=7) - Date: 2025-04-15 - Comment: The paper explores the impact of misalignment in multimodal representation learning, providing theoretical insights into selection and perturbation biases, which is relevant to representation learning.
The Impact of Model Zoo Size and Composition on Weight Space Learning - Score: 15 (R=8, N=7) - Date: 2025-04-15 - Comment: The paper explores weight space learning across heterogeneous model zoos, contributing to foundational research in representation learning and transferability of neural network weights.
Towards Quantifying Commonsense Reasoning with Mechanistic Insights - Score: 15 (R=8, N=7) - Date: 2025-04-15 - Comment: The paper explores commonsense reasoning in LLMs and proposes mechanisms for evaluating reasoning components, aligning with foundational research into LLM behavior and interpretability.
Towards Scalable Bayesian Optimization via Gradient-Informed Bayesian Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-04-15 - Comment: The paper proposes gradient-informed Bayesian neural networks for Bayesian optimization, which aligns with the 'Representation Learning' criterion by enhancing surrogate models with gradient information.
Measuring Leakage in Concept-Based Methods: An Information Theoretic Approach - Score: 15 (R=8, N=7) - Date: 2025-04-15 - Comment: The paper introduces an information-theoretic measure for leakage in Concept Bottleneck Models, which aligns with representation learning by addressing how information is encoded and structured.
Proofs as Explanations: Short Certificates for Reliable Predictions - Score: 15 (R=8, N=7) - Date: 2025-04-14 - Comment: The paper introduces a model for explainable AI using certificates for reliable predictions, which aligns with foundational research in representation learning and theoretical insights.
Revisiting Likelihood-Based Out-of-Distribution Detection by Modeling Representations - Score: 15 (R=8, N=7) - Date: 2025-04-11 - Comment: This paper revisits likelihood-based OOD detection by modeling representations using diffusion models. It aligns with foundational research in representation learning and provides insights into improving OOD detection.
ConceptFormer: Towards Efficient Use of Knowledge-Graph Embeddings in Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-04-11 - Comment: ConceptFormer introduces a method to integrate structured knowledge from knowledge graphs into LLMs without altering their internal architecture. This aligns with foundational research in model architecture and representation learning.
Prototype-Based Continual Learning with Label-free Replay Buffer and Cluster Preservation Loss - Score: 15 (R=8, N=7) - Date: 2025-04-11 - Comment: The paper introduces a novel prototype-based continual learning method with a label-free replay buffer and cluster preservation loss, which aligns with representation learning and sparsity-related methods.
PEEL the Layers and Find Yourself: Revisiting Inference-time Data Leakage for Residual Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-04-10 - Comment: The PEEL method for inference-time data leakage aligns with foundational research in understanding training dynamics and feature inversion in neural networks.
Curved representational Bregman divergences and their applications - Score: 15 (R=8, N=7) - Date: 2025-04-09 - Comment: The paper introduces curved Bregman divergences and explores their theoretical properties, which align with foundational research in representation learning, particularly in understanding divergence measures and embeddings.
Measuring Déjà vu Memorization Efficiently - Score: 15 (R=8, N=7) - Date: 2025-04-09 - Comment: The paper explores memorization in representation learning models, which aligns with the topic of representation learning and provides insights into training dynamics. The method to measure memorization in pre-trained models is novel.
Intermediate Layer Classifiers for OOD generalization - Score: 15 (R=8, N=7) - Date: 2025-04-09 - Comment: The paper introduces Intermediate Layer Classifiers (ILCs) and explores the utility of intermediate layers for OOD generalization. This aligns with representation learning, particularly in understanding how information is distributed across network layers.
Learning symmetries in datasets - Score: 15 (R=8, N=7) - Date: 2025-04-08 - Comment: The paper investigates how symmetries in datasets affect the latent space of VAEs, which aligns with the representation learning criterion, particularly in understanding how models encode information.
Better Rates for Random Task Orderings in Continual Linear Models - Score: 15 (R=8, N=7) - Date: 2025-04-08 - Comment: The paper provides theoretical insights into training dynamics and forgetting in continual learning for linear models, which aligns with representation learning and foundational research.
Cramer-Rao Bounds for Laplacian Matrix Estimation - Score: 15 (R=8, N=7) - Date: 2025-04-08 - Comment: The paper provides theoretical analysis on Laplacian matrix estimation, introducing new bounds and their applications. This aligns with foundational research in representation learning and sparsity.
Noiser: Bounded Input Perturbations for Attributing Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-04-07 - Comment: The paper introduces a perturbation-based feature attribution method for LLMs, which aligns with foundational research on LLM interpretability and behavior.
Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-04-07 - Comment: The paper explores the calibration of self-improving LLMs, which touches on foundational aspects of LLM behavior and interpretability. This aligns with the 'Large Language Models' criterion, particularly in understanding iterative self-improvement.
Learning Geometrically-Informed Lyapunov Functions with Deep Diffeomorphic RBF Networks - Score: 15 (R=8, N=7) - Date: 2025-04-04 - Comment: The paper introduces a novel diffeomorphic function learning framework with RBF networks, which aligns with representation learning through indirect function approximation and structural encoding.
Towards Interpretable Soft Prompts - Score: 15 (R=8, N=7) - Date: 2025-04-04 - Comment: The paper introduces a theoretical framework for interpretable soft prompts, aligning with 'Representation Learning' and 'Large Language Models' as it addresses interpretability and optimization in trainable prompts.
Fourier Feature Attribution: A New Efficiency Attribution Method - Score: 15 (R=8, N=7) - Date: 2025-04-04 - Comment: The paper proposes a Fourier feature attribution method, which provides insights into representation learning by analyzing neural networks through signal decomposition theory.
Hessian-aware Training for Enhancing DNNs Resilience to Parameter Corruptions - Score: 15 (R=8, N=7) - Date: 2025-04-03 - Comment: The paper proposes a Hessian-aware training method to enhance resilience to parameter corruptions, which aligns with foundational research in model robustness and training dynamics. The focus on loss surface properties and resilience is relevant to representation learning and model efficiency.
Self-Evolving Visual Concept Library using Vision-Language Critics - Score: 15 (R=8, N=7) - Date: 2025-04-02 - Comment: The paper introduces a novel approach to iteratively refine a visual concept library using vision-language models, which aligns with representation learning and foundational insights into training dynamics.
From Colors to Classes: Emergence of Concepts in Vision Transformers - Score: 15 (R=8, N=7) - Date: 2025-04-01 - Comment: The paper provides a layer-wise analysis of Vision Transformers (ViTs), offering insights into their representation learning dynamics, which aligns with the 'Representation Learning' criterion.
An extrapolated and provably convergent algorithm for nonlinear matrix decomposition with the ReLU function - Score: 15 (R=8, N=7) - Date: 2025-04-01 - Comment: The paper introduces a novel algorithm for nonlinear matrix decomposition with ReLU, which aligns with 'Representation Learning' through its focus on matrix factorization and optimization.
Partial Transportability for Domain Generalization - Score: 15 (R=8, N=7) - Date: 2025-04-01 - Comment: The paper introduces a novel approach to domain generalization using causal diagrams, which is relevant to foundational research in generalization and transportability.
Order Independence With Finetuning - Score: 15 (R=8, N=7) - Date: 2025-04-01 - Comment: The paper introduces a fine-tuning strategy to address order dependence in LLMs, which aligns with foundational research on improving LLM robustness and interpretability.
On Geometrical Properties of Text Token Embeddings for Strong Semantic Binding in Text-to-Image Generation - Score: 15 (R=8, N=7) - Date: 2025-04-01 - Comment: The paper investigates geometrical properties of text token embeddings for text-to-image generation, which aligns with representation learning and architectural analysis.
Learning Library Cell Representations in Vector Space - Score: 15 (R=8, N=7) - Date: 2025-04-01 - Comment: The paper introduces a self-supervised framework for learning vector representations of library cells, which aligns with Representation Learning. The use of attention-based architecture adds relevance to Model Architecture.

Other Foundational Research (20)

TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining - Score: 20.0 (R=0, N=0) - Date: 2025-04-04 - Comment: Author match
Transport f divergences - Score: 18 (R=9, N=9) - Date: 2025-04-23 - Comment: The paper introduces transport f-divergences, a novel theoretical framework for measuring differences between probability densities, aligning with emerging trends and foundational research.
A Quantum of Learning: Using Quaternion Algebra to Model Learning on Quantum Devices - Score: 18 (R=9, N=9) - Date: 2025-04-21 - Comment: The paper introduces quaternion algebra for modeling learning on quantum devices, which represents a novel and emerging trend in foundational research.
FEAT: Free energy Estimators with Adaptive Transport - Score: 18 (R=9, N=9) - Date: 2025-04-17 - Comment: This paper introduces a novel framework for free energy estimation using adaptive transport, which is foundational research in AI for science. It unifies equilibrium and non-equilibrium methods under a theoretical framework, making it highly relevant and novel.
Weight Ensembling Improves Reasoning in Language Models - Score: 17 (R=9, N=8) - Date: 2025-04-15 - Comment: The paper proposes WiSE-FT, a weight ensembling method for improving reasoning in LLMs, contributing foundational insights into training dynamics and test-time scaling.
Towards Weaker Variance Assumptions for Stochastic Optimization - Score: 17 (R=9, N=8) - Date: 2025-04-15 - Comment: The paper revisits variance assumptions in stochastic optimization, which aligns with the 'Emerging Trends' criterion by challenging established assumptions and providing theoretical insights.
Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning - Score: 17 (R=9, N=8) - Date: 2025-04-07 - Comment: This paper investigates the scaling laws of reasoning in LLMs, providing theoretical insights into their behavior. It aligns closely with the 'Large Language Models' criterion, particularly in understanding pretraining and reasoning capabilities.
A Polynomial-Time Algorithm for Variational Inequalities under the Minty Condition - Score: 17 (R=8, N=9) - Date: 2025-04-07 - Comment: This paper presents a polynomial-time algorithm for variational inequalities under the Minty condition, contributing to foundational optimization theory. It introduces a novel algorithmic approach with broad implications.
Rethinking Reflection in Pre-Training - Score: 16 (R=9, N=7) - Date: 2025-04-08 - Comment: The paper investigates the emergence of self-correction during pretraining in LLMs, which aligns with the criterion of theoretical insights into LLM behavior.
Sharp higher order convergence rates for the Adam optimizer - Score: 16 (R=8, N=8) - Date: 2025-04-29 - Comment: The paper provides theoretical insights into the convergence rates of the Adam optimizer, which aligns with foundational research in optimization methods. The higher-order convergence analysis is a significant contribution.
Common Functional Decompositions Can Mis-attribute Differences in Outcomes Between Populations - Score: 16 (R=8, N=8) - Date: 2025-04-24 - Comment: This paper provides a theoretical critique of functional decompositions in explaining population outcome differences, which aligns with the Emerging Trends criterion as it challenges established assumptions in decomposition methods.
Stochastic Gradient Descent in Non-Convex Problems: Asymptotic Convergence with Relaxed Step-Size via Stopping Time Methods - Score: 16 (R=8, N=8) - Date: 2025-04-18 - Comment: Provides a novel theoretical framework for SGD convergence under relaxed step-size conditions, contributing to foundational optimization research.
Training Small Reasoning LLMs with Cognitive Preference Alignment - Score: 16 (R=8, N=8) - Date: 2025-04-15 - Comment: The CRV framework for training smaller reasoning LLMs introduces cognitive preference alignment, which is relevant to foundational research in LLM training dynamics.
A Piecewise Lyapunov Analysis of sub-quadratic SGD: Applications to Robust and Quantile Regression - Score: 16 (R=8, N=8) - Date: 2025-04-14 - Comment: The paper provides theoretical analysis of SGD with sub-quadratic tails, which contributes to foundational understanding of optimization dynamics in machine learning.
Leanabell-Prover: Posttraining Scaling in Formal Reasoning - Score: 16 (R=8, N=8) - Date: 2025-04-09 - Comment: The paper focuses on posttraining scaling for automated theorem proving, which aligns with foundational research in LLMs and explores reinforcement learning for reasoning tasks. The approach to improve formal provers is novel.
Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes - Score: 16 (R=8, N=8) - Date: 2025-04-08 - Comment: The paper provides theoretical insights into gradient descent with adaptive stepsizes, which aligns with training dynamics and foundational optimization methods.
GVPO: Group Variance Policy Optimization for Large Language Model Post-Training - Score: 15 (R=8, N=7) - Date: 2025-04-29 - Comment: The paper introduces a novel post-training method (GVPO) for LLMs with theoretical guarantees and practical adaptability. This aligns with foundational research in LLM behavior and optimization.
Subfunction Structure Matters: A New Perspective on Local Optima Networks - Score: 15 (R=8, N=7) - Date: 2025-04-28 - Comment: The paper explores a novel perspective on local optima networks by incorporating subfunction-based information, which aligns with the 'Emerging Trends' criterion as it challenges established assumptions in landscape analysis.
Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning - Score: 15 (R=8, N=7) - Date: 2025-04-08 - Comment: The paper proposes TRPA, a preference-based optimization algorithm for reasoning tasks in LLMs, which aligns with foundational research in LLM reasoning and optimization methods.
Analysis of an Idealized Stochastic Polyak Method and its Application to Black-Box Model Distillation - Score: 15 (R=8, N=7) - Date: 2025-04-03 - Comment: The paper provides theoretical insights into an idealized stochastic Polyak step size and its convergence properties, which aligns with foundational research in optimization methods relevant to training dynamics.