Personalized Monthly Topic Summary 2025/05

Metric	Value
Total Papers	727
Model Architecture	215
Model Compression and Efficiency	214
High Performance Computing	33
Representation Learning	234
Other Foundational Research	31

Model Architecture (215)

Model Selection for Gaussian-gated Gaussian Mixture of Experts Using Dendrograms of Mixing Measures - Score: 18 (R=10, N=8) - Date: 2025-05-20 - Comment: The paper introduces a novel method for model selection in Mixture of Experts, which is highly relevant to model architecture.
On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating - Score: 18 (R=10, N=8) - Date: 2025-05-19 - Comment: The paper provides a theoretical study of MoE architectures, specifically focusing on shared experts and normalized sigmoid gating, which is highly relevant to model architecture.
UMoE: Unifying Attention and FFN with Shared Experts - Score: 18 (R=10, N=8) - Date: 2025-05-13 - Comment: The paper introduces UMoE, a novel architecture unifying MoE designs in attention and FFN layers, which directly aligns with the 'Model Architecture' criterion, particularly focusing on Mixture-of-Experts (MoE).
The power of fine-grained experts: Granularity boosts expressivity in Mixture of Experts - Score: 18 (R=10, N=8) - Date: 2025-05-13 - Comment: The paper focuses on Mixture-of-Experts (MoE) and provides theoretical insights into the impact of granularity on network expressivity, which aligns closely with the 'Model Architecture' criterion.
FloE: On-the-Fly MoE Inference - Score: 18 (R=10, N=8) - Date: 2025-05-12 - Comment: The paper proposes FloE, an on-the-fly MoE inference system, which directly aligns with the Mixture-of-Experts (MoE) criterion under model architecture and compression. The compression techniques and inference acceleration are novel and impactful.
MoxE: Mixture of xLSTM Experts with Entropy-Aware Routing for Efficient Language Modeling - Score: 18 (R=10, N=8) - Date: 2025-05-06 - Comment: The paper introduces MoxE, a novel MoE-based architecture with entropy-aware routing, which aligns with foundational research in model architecture and efficiency.
CoCoAFusE: Beyond Mixtures of Experts via Model Fusion - Score: 18 (R=10, N=8) - Date: 2025-05-05 - Comment: The paper introduces CoCoAFusE, which extends Mixture of Experts (MoE) with a novel fusion mechanism. This aligns closely with the model architecture criterion, particularly innovations in MoE frameworks.
Improving Routing in Sparse Mixture of Experts with Graph of Tokens - Score: 18 (R=10, N=8) - Date: 2025-05-05 - Comment: This paper addresses routing stability in Sparse Mixture of Experts (SMoE) through novel probabilistic graphical modeling and attention-aware mechanisms, directly contributing to foundational research in model architecture and sparsity.
TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts - Score: 18 (R=10, N=8) - Date: 2025-05-01 - Comment: The paper introduces TT-LoRA MoE, which integrates sparse Mixture-of-Experts (MoE) with low-rank adaptation, aligning closely with the 'Model Architecture' and 'Model Compression' criteria. It provides a novel approach to scalability and efficiency in multi-task settings.
Copresheaf Topological Neural Networks: A Generalized Deep Learning Framework - Score: 18 (R=9, N=9) - Date: 2025-05-28 - Comment: The paper introduces copresheaf topological neural networks, a generalized deep learning framework, which aligns with emerging trends and architectural innovation.
Accelerating Machine Learning Systems via Category Theory: Applications to Spherical Attention for Gene Regulatory Networks - Score: 18 (R=9, N=9) - Date: 2025-05-15 - Comment: The paper introduces a novel attention mechanism (spherical attention) derived using category theory and neural circuit diagrams, which aligns with 'Emerging Trends' and 'Model Architecture' due to its theoretical innovation and architectural insights.
Continuous Thought Machines - Score: 18 (R=9, N=9) - Date: 2025-05-12 - Comment: The paper introduces the Continuous Thought Machine, which challenges established paradigms by incorporating neuron-level temporal dynamics, aligning with emerging trends and architectural innovations.
Binding threshold units with artificial oscillatory neurons - Score: 18 (R=9, N=9) - Date: 2025-05-07 - Comment: The paper introduces a theoretical framework combining oscillatory and threshold units, which aligns with foundational research on neural coding and architecture-level innovations.
Bayesian Attention Mechanism: A Probabilistic Framework for Positional Encoding and Context Length Extrapolation - Score: 17 (R=9, N=8) - Date: 2025-05-31 - Comment: The paper proposes a Bayesian Attention Mechanism for positional encoding, which aligns with model architecture innovations in transformers.
ATLAS: Learning to Optimally Memorize the Context at Test Time - Score: 17 (R=9, N=8) - Date: 2025-05-30 - Comment: The paper introduces ATLAS, a new memory module for Transformers, which aligns with model architecture innovations.
Learning Compositional Functions with Transformers from Easy-to-Hard Data - Score: 17 (R=9, N=8) - Date: 2025-05-30 - Comment: The paper studies the learnability of compositional functions with transformers, providing theoretical insights into model architecture.
Two Is Better Than One: Rotations Scale LoRAs - Score: 17 (R=9, N=8) - Date: 2025-05-30 - Comment: The paper introduces a novel gating method for scaling LoRAs in MoE, which is relevant to model architecture and efficiency.
Scaling Reasoning without Attention - Score: 17 (R=9, N=8) - Date: 2025-05-29 - Comment: The paper introduces an attention-free language model, addressing architectural inefficiencies in LLMs, which aligns with interests in model architecture innovations.
Curse of High Dimensionality Issue in Transformer for Long-context Modeling - Score: 17 (R=9, N=8) - Date: 2025-05-29 - Comment: The paper addresses the computational inefficiencies in transformers for long-context modeling, proposing a dynamic group attention mechanism, relevant to model architecture and efficiency.
RePaViT: Scalable Vision Transformer Acceleration via Structural Reparameterization on Feedforward Network Layers - Score: 17 (R=9, N=8) - Date: 2025-05-29 - Comment: The paper proposes a structural reparameterization method for Vision Transformers, which is relevant to model compression and architecture.
Is Attention Required for Transformer Inference? Explore Function-preserving Attention Replacement - Score: 17 (R=9, N=8) - Date: 2025-05-29 - Comment: The paper explores replacing attention mechanisms in transformers with more efficient modules, relevant to model architecture and efficiency.
EvidenceMoE: A Physics-Guided Mixture-of-Experts with Evidential Critics for Advancing Fluorescence Light Detection and Ranging in Scattering Media - Score: 17 (R=9, N=8) - Date: 2025-05-29 - Comment: The paper presents EvidenceMoE, a Physics-Guided Mixture-of-Experts framework, which is relevant to model architecture innovations and MoE.
Why Do More Experts Fail? A Theoretical Analysis of Model Merging - Score: 17 (R=9, N=8) - Date: 2025-05-28 - Comment: The paper provides a theoretical analysis of model merging, focusing on the scalability and parameter space constraints, which is relevant to model architecture and efficiency.
Pause Tokens Strictly Increase the Expressivity of Constant-Depth Transformers - Score: 17 (R=9, N=8) - Date: 2025-05-28 - Comment: The paper provides theoretical insights into the expressivity of Transformers with pause tokens, which aligns with the model architecture criterion.
Multi-objective Large Language Model Alignment with Hierarchical Experts - Score: 17 (R=9, N=8) - Date: 2025-05-28 - Comment: The paper introduces Hierarchical Mixture-of-Experts for LLM alignment, which involves architectural innovation and aligns with the model architecture criterion.
How Do Transformers Learn Variable Binding in Symbolic Programs? - Score: 17 (R=9, N=8) - Date: 2025-05-28 - Comment: The paper investigates how Transformers learn variable binding, providing insights into how deep networks encode information, which aligns with representation learning.
Leaner Transformers: More Heads, Less Depth - Score: 17 (R=9, N=8) - Date: 2025-05-28 - Comment: The paper challenges the belief that bigger transformers are better by proposing a redesign with more heads and less depth, aligning with model architecture innovation.
Pretraining Language Models to Ponder in Continuous Space - Score: 17 (R=9, N=8) - Date: 2025-05-28 - Comment: The paper introduces a novel pondering mechanism for language models, which could be considered a new paradigm in LLM pretraining and architecture.
Continuous-Time Attention: PDE-Guided Mechanisms for Long-Sequence Transformers - Score: 17 (R=9, N=8) - Date: 2025-05-28 - Comment: The paper introduces a novel attention mechanism using PDEs for long-sequence transformers, which is relevant to model architecture.
Towards Fully FP8 GEMM LLM Training at Scale - Score: 17 (R=9, N=8) - Date: 2025-05-28 - Comment: The paper introduces a new class of LLM architectures supporting FP8 computation, relevant to model architecture and efficiency in LLMs.
Error Optimization: Overcoming Exponential Signal Decay in Deep Predictive Coding Networks - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper introduces Error Optimization to address signal decay in deep predictive coding networks, providing theoretical insights into training dynamics.
Unfolding AlphaFold's Bayesian Roots in Probability Kinematics - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper provides a novel theoretical interpretation of AlphaFold1 using probability kinematics, which is foundational research in AI for Science.
Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper presents a data-free knowledge distillation framework using Mixture-of-Experts, which is relevant to model architecture and compression.
Understanding Transformer from the Perspective of Associative Memory - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper provides insights into Transformer architectures through the lens of associative memory, which aligns with model architecture analysis and offers theoretical insights.
To CoT or To Loop? A Formal Comparison Between Chain-of-Thought and Looped Transformers - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper provides a formal comparison between Chain-of-Thought and Looped Transformers, contributing to the understanding of model architecture.
I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper introduces I2MoE, a framework for interpretable multimodal interaction-aware mixture-of-experts, relevant to model architecture.
Exact Expressive Power of Transformers with Padding - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper analyzes the expressive power of transformers with padding, contributing to the understanding of model architecture.
MonarchAttention: Zero-Shot Conversion to Fast, Hardware-Aware Structured Attention - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: MonarchAttention presents a novel approach to sub-quadratic attention approximation in transformers, relevant to model architecture and efficiency.
On Minimax Estimation of Parameters in Softmax-Contaminated Mixture of Experts - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper provides theoretical insights into the softmax-contaminated mixture of experts model, which is relevant to model architecture and representation learning.
$\mu$-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper introduces a test-time pruning method as a micro-grained mixture-of-experts, relevant to model compression and architecture.
The emergence of sparse attention: impact of data distribution and benefits of repetition - Score: 17 (R=9, N=8) - Date: 2025-05-26 - Comment: The paper studies the emergence of sparse attention in transformers, providing theoretical insights into training dynamics, which aligns with representation learning and model architecture interests.
Structured Linear CDEs: Maximally Expressive and Parallel-in-Time Sequence Models - Score: 17 (R=9, N=8) - Date: 2025-05-26 - Comment: The paper introduces Structured Linear CDEs, a novel sequence model framework, aligning with model architecture innovations.
PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval - Score: 17 (R=9, N=8) - Date: 2025-05-26 - Comment: PreMoe introduces a framework for efficient deployment of MoE models using expert pruning and retrieval, relevant to model architecture and compression.
JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model - Score: 17 (R=9, N=8) - Date: 2025-05-26 - Comment: The paper introduces JanusDNA, a hybrid DNA foundation model using an MoE architecture, relevant to model architecture and foundational research in AI for science.
Scale-invariant Attention - Score: 17 (R=9, N=8) - Date: 2025-05-26 - Comment: The paper proposes a scale-invariant attention mechanism, which is relevant to model architecture innovations, particularly in attention mechanisms.
Bottlenecked Transformers: Periodic KV Cache Abstraction for Generalised Reasoning - Score: 17 (R=9, N=8) - Date: 2025-05-23 - Comment: The paper proposes a modification to the Transformer architecture for improved generalization, aligning with model architecture innovations and KV cache manipulation.
The Computational Complexity of Counting Linear Regions in ReLU Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-05-23 - Comment: The paper analyzes the computational complexity of counting linear regions in ReLU networks, relevant to model architecture and theoretical insights.
Beyond Induction Heads: In-Context Meta Learning Induces Multi-Phase Circuit Emergence - Score: 17 (R=9, N=8) - Date: 2025-05-23 - Comment: The paper analyzes the emergence of multi-phase circuits in transformers, which is relevant to understanding transformer architecture and training dynamics.
Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models - Score: 17 (R=9, N=8) - Date: 2025-05-22 - Comment: The paper provides insights into the local routing consistency of Mixture-of-Experts models, which is relevant to model architecture and efficiency.
MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding - Score: 17 (R=9, N=8) - Date: 2025-05-22 - Comment: The paper introduces MoRE-Brain, a routed mixture of experts architecture for fMRI visual decoding, which is relevant to model architecture as it employs a hierarchical MoE framework for interpretable and generalizable decoding.
Neural Collapse is Globally Optimal in Deep Regularized ResNets and Transformers - Score: 17 (R=9, N=8) - Date: 2025-05-22 - Comment: The paper provides theoretical insights into neural collapse in deep networks, specifically in ResNets and Transformers, which aligns with representation learning and model architecture analysis.
Large Language Models Implicitly Learn to See and Hear Just By Reading - Score: 17 (R=9, N=8) - Date: 2025-05-21 - Comment: The paper presents a novel finding that LLMs can develop abilities to understand images and audio, which is relevant to emerging trends in LLM research.
Subquadratic Algorithms and Hardness for Attention with Any Temperature - Score: 17 (R=9, N=8) - Date: 2025-05-21 - Comment: The paper addresses the efficiency of the Attention mechanism in Transformers, which is directly relevant to model architecture and efficiency.
Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training - Score: 17 (R=9, N=8) - Date: 2025-05-21 - Comment: The paper introduces a novel inference-time methodology for MoE architectures, directly aligning with the model architecture criterion.
Neural Incompatibility: The Unbridgeable Gap of Cross-Scale Parametric Knowledge Transfer in Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-05-21 - Comment: The paper explores Parametric Knowledge Transfer (PKT) in Large Language Models, focusing on the challenges of neural incompatibility across scales. This aligns with the interest in foundational research on LLMs, particularly in understanding theoretical insights into LLM behavior.
Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging - Score: 17 (R=9, N=8) - Date: 2025-05-21 - Comment: The paper proposes a novel approach to scale MoE models using Test-Time Model Merging, which is relevant to model architecture and efficiency.
Adversarially Pretrained Transformers may be Universally Robust In-Context Learners - Score: 17 (R=9, N=8) - Date: 2025-05-21 - Comment: The paper discusses adversarially pretrained transformers as robust foundation models, which is relevant to model architecture and LLMs.
Gluon: Making Muon & Scion Great Again! (Bridging Theory and Practice of LMO-based Optimizers for LLMs) - Score: 17 (R=9, N=8) - Date: 2025-05-20 - Comment: The paper introduces a new LMO-based optimization method for LLMs, addressing theoretical gaps and improving practical performance, which is relevant to LLM architecture and optimization.
Unpacking Positional Encoding in Transformers: A Spectral Analysis of Content-Position Coupling - Score: 17 (R=9, N=8) - Date: 2025-05-20 - Comment: The paper provides a spectral analysis of positional encoding in Transformers, offering insights into model architecture and positional encoding mechanisms.
A3: an Analytical Low-Rank Approximation Framework for Attention - Score: 17 (R=9, N=8) - Date: 2025-05-20 - Comment: The paper presents a low-rank approximation framework for attention in Transformers, relevant to model compression and architecture.
TDFormer: A Top-Down Attention-Controlled Spiking Transformer - Score: 17 (R=9, N=8) - Date: 2025-05-19 - Comment: The paper introduces TDFormer, a novel spiking transformer model, aligning with the core topic of model architecture.
Optimal Control for Transformer Architectures: Enhancing Generalization, Robustness and Efficiency - Score: 17 (R=9, N=8) - Date: 2025-05-19 - Comment: The paper applies optimal control theory to Transformers, offering theoretical insights into architecture design and training, which aligns with model architecture innovations.
Approximation theory for 1-Lipschitz ResNets - Score: 17 (R=9, N=8) - Date: 2025-05-19 - Comment: The paper provides universal approximation guarantees for 1-Lipschitz ResNets, contributing to model architecture analysis and theoretical insights.
MINGLE: Mixtures of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging - Score: 17 (R=9, N=8) - Date: 2025-05-19 - Comment: The paper introduces a novel MoE framework for continual model merging, which is relevant to model architecture and efficiency.
MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production - Score: 17 (R=9, N=8) - Date: 2025-05-19 - Comment: The paper presents MegaScale-MoE, a system for efficient training of MoE models, which aligns with model architecture and efficiency innovations.
Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction - Score: 17 (R=9, N=8) - Date: 2025-05-19 - Comment: The paper proposes a method for sparse attention inference in transformers, which is relevant to model architecture and efficiency.
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset - Score: 17 (R=9, N=8) - Date: 2025-05-15 - Comment: The paper introduces a unified multimodal model architecture and training strategy, which aligns with the 'Model Architecture' criterion. The focus on foundational design choices and training recipes for multimodal models adds significant relevance.
PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts - Score: 17 (R=9, N=8) - Date: 2025-05-14 - Comment: The paper proposes a privacy-aware MoE framework with dynamic token routing and bandwidth-adaptive mechanisms. This aligns closely with the Mixture-of-Experts (MoE) and model architecture criteria.
Learning Advanced Self-Attention for Linear Transformers in the Singular Value Domain - Score: 17 (R=9, N=8) - Date: 2025-05-14 - Comment: The paper proposes a novel self-attention mechanism interpreted as a graph filter in the singular value domain, which aligns with architectural innovations in Transformers.
Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures - Score: 17 (R=9, N=8) - Date: 2025-05-13 - Comment: The paper provides theoretical insights into representation learning and scaling laws in hierarchical languages, aligning with the 'Representation Learning' criterion by analyzing how architectures encode information.
FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers - Score: 17 (R=9, N=8) - Date: 2025-05-13 - Comment: The paper proposes FreqMoE, a novel MoE-based framework for solving PDEs, which aligns with the 'Model Architecture' criterion by innovating in the MoE space.
QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration - Score: 17 (R=9, N=8) - Date: 2025-05-13 - Comment: The paper focuses on memory efficiency and runtime reconfiguration for serving Mixture-of-Experts LLMs, which directly aligns with the MoE and model compression criteria.
Beyond Attention: Toward Machines with Intrinsic Higher Mental States - Score: 17 (R=9, N=8) - Date: 2025-05-13 - Comment: The paper explores a novel approach to attention mechanisms inspired by neurobiology, which aligns with architectural innovations and emerging trends.
On the Depth of Monotone ReLU Neural Networks and ICNNs - Score: 17 (R=9, N=8) - Date: 2025-05-12 - Comment: The paper provides theoretical insights into the depth complexity of monotone ReLU networks and ICNNs, which is highly relevant to understanding neural network architectures. The depth separations and expressivity analysis are significant contributions.
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design - Score: 17 (R=9, N=8) - Date: 2025-05-12 - Comment: The paper proposes MxMoE, a mixed-precision quantization framework for MoE models, addressing both algorithmic and system-level challenges. This aligns closely with the model compression and MoE criteria, offering novel insights into quantization sensitivity and expert activation dynamics.
Faster MoE LLM Inference for Extremely Large Models - Score: 17 (R=9, N=8) - Date: 2025-05-07 - Comment: The paper discusses efficiency optimization for sparse Mixture of Experts (MoE) models, which aligns closely with the model architecture and efficiency criteria.
Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights - Score: 17 (R=9, N=8) - Date: 2025-05-07 - Comment: The paper provides theoretical insights into how transformers leverage low-dimensional structures in noisy data, aligning with the 'Model Architecture' criterion for foundational analysis of transformers.
RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale - Score: 17 (R=9, N=8) - Date: 2025-05-07 - Comment: The paper proposes a method for distilling softmax attention transformers into linear attention decoders, aligning with model compression and efficiency breakthroughs.
Intra-Layer Recurrence in Transformers for Language Modeling - Score: 17 (R=9, N=8) - Date: 2025-05-07 - Comment: The paper introduces intra-layer recurrence in transformers, which aligns with architectural innovations and efficiency improvements in transformer models.
Towards Quantifying the Hessian Structure of Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-05-06 - Comment: The paper provides theoretical insights into the Hessian structure of neural networks, which aligns with foundational research in understanding training dynamics and architecture behavior.
Always Skip Attention - Score: 17 (R=9, N=8) - Date: 2025-05-06 - Comment: The paper provides theoretical insights into the critical role of skip connections in Vision Transformers, which is highly relevant to model architecture analysis.
Compact Recurrent Transformer with Persistent Memory - Score: 17 (R=9, N=8) - Date: 2025-05-05 - Comment: The paper introduces Compact Recurrent Transformer (CRT), which innovates on Transformer architecture by combining it with RNNs for efficient long-sequence processing. This aligns with the model architecture criterion, particularly in addressing efficiency and scalability challenges.
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias - Score: 17 (R=9, N=8) - Date: 2025-05-05 - Comment: This paper provides theoretical insights into how transformers learn regular language recognition tasks, analyzing training dynamics and implicit bias. It aligns with the core topic of representation learning and offers foundational insights into transformer behavior.
Dual Filter: A Mathematical Framework for Inference using Transformer-like Architectures - Score: 17 (R=9, N=8) - Date: 2025-05-05 - Comment: The paper provides a mathematical framework inspired by transformer architectures, focusing on deriving transformer-like architectures from first principles. This aligns with the 'Model Architecture' criterion, particularly in understanding and analyzing existing architectures.
Token-Level Prompt Mixture with Parameter-Free Routing for Federated Domain Generalization - Score: 17 (R=9, N=8) - Date: 2025-05-01 - Comment: The paper introduces a token-level prompt mixture framework with parameter-free routing, which aligns with foundational research in model architecture (Mixture of Experts) and efficiency. The token-level routing is a novel contribution.
PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight - Score: 17 (R=9, N=8) - Date: 2025-05-01 - Comment: The paper introduces a secure transformer architecture using a Mixture-of-Experts framework, which is highly relevant to foundational research in model architecture and MoE.
FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models - Score: 16 (R=9, N=7) - Date: 2025-05-27 - Comment: The paper introduces FLAME-MoE, a research platform for MoE models, which is relevant to model architecture.
MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE - Score: 16 (R=9, N=7) - Date: 2025-05-27 - Comment: The paper explores speculative decoding for accelerating sparse MoE models, which is relevant to model architecture and compression.
Rethinking Gating Mechanism in Sparse MoE: Handling Arbitrary Modality Inputs with Confidence-Guided Gate - Score: 16 (R=9, N=7) - Date: 2025-05-27 - Comment: The paper proposes a novel gating mechanism for Sparse Mixture-of-Experts (SMoE) architectures, which aligns with the core topic of model architecture innovations.
CoMoE: Contrastive Representation for Mixture-of-Experts in Parameter-Efficient Fine-tuning - Score: 16 (R=9, N=7) - Date: 2025-05-26 - Comment: CoMoE focuses on enhancing Mixture-of-Experts (MoE) through contrastive representation, which is relevant to model architecture and representation learning.
Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning - Score: 16 (R=9, N=7) - Date: 2025-05-23 - Comment: The paper explores the role of long-context ability in reasoning, which is relevant to Large Language Models and their theoretical insights.
The Atlas of In-Context Learning: How Attention Heads Shape In-Context Retrieval Augmentation - Score: 16 (R=9, N=7) - Date: 2025-05-22 - Comment: The paper explores in-context learning in LLMs, focusing on attention heads and retrieval augmentation, which aligns with foundational research in LLM behavior and interpretability.
Causal Head Gating: A Framework for Interpreting Roles of Attention Heads in Transformers - Score: 16 (R=9, N=7) - Date: 2025-05-20 - Comment: The paper introduces causal head gating for interpreting attention heads in transformers, which is relevant to model architecture and interpretability.
Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference - Score: 16 (R=9, N=7) - Date: 2025-05-20 - Comment: The paper addresses communication optimization in Mixture-of-Experts (MoE) architectures, relevant to model architecture.
Model Merging in Pre-training of Large Language Models - Score: 16 (R=9, N=7) - Date: 2025-05-19 - Comment: The paper investigates model merging techniques in the pre-training of large language models, with a focus on Mixture-of-Experts (MoE) architectures, which aligns with the model architecture criterion.
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems - Score: 16 (R=9, N=7) - Date: 2025-05-19 - Comment: The paper introduces MoE-CAP, a benchmark for sparse MoE systems, which aligns with model architecture and efficiency innovations.
Computational Algebra with Attention: Transformer Oracles for Border Basis Algorithms - Score: 16 (R=8, N=8) - Date: 2025-05-31 - Comment: The paper introduces a Transformer-based oracle for border basis algorithms, relevant to model architecture innovations and efficiency improvements in computational algebra.
Neural Interpretable PDEs: Harmonizing Fourier Insights with Attention for Scalable and Interpretable Physics Discovery - Score: 16 (R=8, N=8) - Date: 2025-05-31 - Comment: The paper introduces a novel neural operator architecture combining attention mechanisms with Fourier insights, relevant to model architecture innovations.
Equivariant Spherical Transformer for Efficient Molecular Modeling - Score: 16 (R=8, N=8) - Date: 2025-05-30 - Comment: The paper introduces the Equivariant Spherical Transformer for molecular modeling, which is relevant to foundational research in AI for science and architectural innovations.
On the Surprising Effectiveness of Large Learning Rates under Standard Width Scaling - Score: 16 (R=8, N=8) - Date: 2025-05-29 - Comment: The paper analyzes the effectiveness of large learning rates under standard width scaling, which is relevant to training dynamics in neural networks.
Lorentz Local Canonicalization: How to Make Any Network Lorentz-Equivariant - Score: 16 (R=8, N=8) - Date: 2025-05-27 - Comment: The paper introduces a framework for making any network Lorentz-equivariant, which is relevant to model architecture innovations.
Chordless Structure: A Pathway to Simple and Expressive GNNs - Score: 16 (R=8, N=8) - Date: 2025-05-27 - Comment: The paper introduces a new GNN architecture based on chordless structures, which is relevant to model architecture innovations.
Message-Passing State-Space Models: Improving Graph Learning with Modern Sequence Modeling - Score: 16 (R=8, N=8) - Date: 2025-05-27 - Comment: The paper introduces a new method for graph learning by embedding state-space model principles into message-passing networks, relevant to model architecture innovations.
Token Reduction Should Go Beyond Efficiency in Generative Models -- From Vision, Language to Multimodality - Score: 16 (R=8, N=8) - Date: 2025-05-27 - Comment: The paper discusses token reduction in generative models, positioning it as a fundamental principle in generative modeling, which is relevant to model architecture and emerging trends.
Directed Semi-Simplicial Learning with Applications to Brain Activity Decoding - Score: 16 (R=8, N=8) - Date: 2025-05-26 - Comment: The paper introduces Semi-Simplicial Neural Networks for brain activity decoding, which is relevant to emerging trends in model architecture.
Scaling Recurrent Neural Networks to a Billion Parameters with Zero-Order Optimization - Score: 16 (R=8, N=8) - Date: 2025-05-26 - Comment: The paper explores zero-order optimization for training large RNNs, which is relevant to model architecture and efficiency improvements.
Continuum Transformers Perform In-Context Learning by Operator Gradient Descent - Score: 16 (R=8, N=8) - Date: 2025-05-26 - Comment: The paper provides a theoretical characterization of in-context learning in continuum transformers, which aligns with interests in large language models and theoretical insights.
HiLAB: A Hybrid Inverse-Design Framework - Score: 16 (R=8, N=8) - Date: 2025-05-26 - Comment: The paper presents HiLAB, a new paradigm for inverse design in nanophotonics, which aligns with the AI for Science criterion.
TI-DeepONet: Learnable Time Integration for Stable Long-Term Extrapolation - Score: 16 (R=8, N=8) - Date: 2025-05-23 - Comment: TI-DeepONet introduces a novel framework for neural operators with adaptive time-stepping, relevant to model architecture and emerging trends.
Artificial Intelligence for Direct Prediction of Molecular Dynamics Across Chemical Space - Score: 16 (R=8, N=8) - Date: 2025-05-23 - Comment: The paper introduces a foundational AI model for molecular dynamics, which is relevant to AI for Science with a focus on architecture-level innovations.
Scalable Graph Generative Modeling via Substructure Sequences - Score: 16 (R=8, N=8) - Date: 2025-05-23 - Comment: The paper presents a generative Transformer pre-training framework for graphs, which is relevant to model architecture innovations beyond message-passing in GNNs.
Learning (Approximately) Equivariant Networks via Constrained Optimization - Score: 16 (R=8, N=8) - Date: 2025-05-20 - Comment: The paper introduces a method for learning approximately equivariant networks, which is relevant to model architecture innovations.
KHRONOS: a Kernel-Based Neural Architecture for Rapid, Resource-Efficient Scientific Computation - Score: 16 (R=8, N=8) - Date: 2025-05-20 - Comment: The paper introduces a new kernel-based neural architecture for scientific computation, which is relevant to AI for Science and model architecture.
Neural Functional: Learning Function to Scalar Maps for Neural PDE Surrogates - Score: 16 (R=8, N=8) - Date: 2025-05-20 - Comment: The paper introduces a new architecture for neural PDE surrogates, which is relevant to model architecture innovations.
ChromFound: Towards A Universal Foundation Model for Single-Cell Chromatin Accessibility Data - Score: 16 (R=8, N=8) - Date: 2025-05-20 - Comment: The paper presents ChromFound, a foundation model for single-cell chromatin accessibility data, which is relevant to AI for Science as it offers a new framework for understanding disease risk variants.
SpikeX: Exploring Accelerator Architecture and Network-Hardware Co-Optimization for Sparse Spiking Neural Networks - Score: 16 (R=8, N=8) - Date: 2025-05-19 - Comment: The paper proposes a novel SNN accelerator architecture, relevant to model architecture and efficiency improvements.
Block-Biased Mamba for Long-Range Sequence Processing - Score: 16 (R=8, N=8) - Date: 2025-05-15 - Comment: The paper proposes improvements to state space models (SSMs) for long-range sequence processing, which aligns with architectural innovations and addresses theoretical limitations of prior models.
Insertion Language Models: Sequence Generation with Arbitrary-Position Insertions - Score: 16 (R=8, N=8) - Date: 2025-05-12 - Comment: The paper introduces Insertion Language Models (ILMs), which offer a novel approach to sequence generation by inserting tokens at arbitrary positions. This aligns with foundational research in model architecture and sequence generation paradigms.
Griffin: Towards a Graph-Centric Relational Database Foundation Model - Score: 16 (R=8, N=8) - Date: 2025-05-12 - Comment: The paper introduces Griffin, a foundation model for relational databases, which aligns with the 'Emerging Trends' criterion by proposing a novel architecture for a new domain of foundation models.
Advancing Neural Network Verification through Hierarchical Safety Abstract Interpretation - Score: 16 (R=8, N=8) - Date: 2025-05-09 - Comment: The paper introduces a novel hierarchical safety verification framework for DNNs, which aligns with foundational research in model architecture by providing theoretical insights into safety and robustness verification.
Physics-inspired Energy Transition Neural Network for Sequence Learning - Score: 16 (R=8, N=8) - Date: 2025-05-07 - Comment: The paper proposes a novel recurrent architecture inspired by physics, which aligns with the 'Model Architecture' criterion for foundational innovations in sequence modeling.
Advancing Constrained Monotonic Neural Networks: Achieving Universal Approximation Beyond Bounded Activations - Score: 16 (R=8, N=8) - Date: 2025-05-06 - Comment: The paper advances monotonic neural networks with theoretical contributions, aligning with emerging trends in foundational research.
Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders - Score: 16 (R=8, N=8) - Date: 2025-05-05 - Comment: The paper introduces a novel aggregation method for multimodal VAEs, which aligns with the representation learning criterion, particularly in the context of generative models. The CoDE method offers a new perspective on joint distribution estimation.
Global optimization of graph acquisition functions for neural architecture search - Score: 15 (R=8, N=7) - Date: 2025-05-31 - Comment: The paper presents a global optimization method for graph acquisition functions in NAS, which is relevant to model architecture and efficiency.
Graph Positional Autoencoders as Self-supervised Learners - Score: 15 (R=8, N=7) - Date: 2025-05-31 - Comment: The paper proposes Graph Positional Autoencoders, which is relevant to representation learning and autoencoders.
Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs - Score: 15 (R=8, N=7) - Date: 2025-05-31 - Comment: The paper introduces a framework for auditing unlearning in LLMs, which is relevant to foundational research in LLM behavior.
Improving the Effective Receptive Field of Message-Passing Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-05-31 - Comment: The paper proposes an architecture to improve the effective receptive field of MPNNs, which is relevant to model architecture innovations.
Update Your Transformer to the Latest Release: Re-Basin of Task Vectors - Score: 15 (R=8, N=7) - Date: 2025-05-31 - Comment: The paper proposes a method for transferring fine-tuning to new model releases without retraining, focusing on model architecture and efficiency.
MoRE: A Mixture of Low-Rank Experts for Adaptive Multi-Task Learning - Score: 15 (R=8, N=7) - Date: 2025-05-30 - Comment: The paper proposes a Mixture of Low-Rank Experts (MoRE) for multi-task learning, which is relevant to model architecture and efficiency improvements.
Sherlock: Self-Correcting Reasoning in Vision-Language Models - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper introduces a self-correction framework for reasoning in vision-language models, which is relevant to foundational research in model architecture.
Transformers Pretrained on Procedural Data Contain Modular Structures for Algorithmic Reasoning - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper explores modular structures in transformers pretrained on procedural data, which aligns with the model architecture criterion.
The quest for the GRAph Level autoEncoder (GRALE) - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper introduces a novel graph autoencoder, which is relevant to model architecture and representation learning.
The Resurrection of the ReLU - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper introduces a novel regularizer for ReLU activation functions, which aligns with the model architecture criterion.
Learning in Compact Spaces with Approximately Normalized Transformers - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper introduces a novel normalization method for Transformers, which aligns with the model architecture criterion.
Taming Transformer Without Using Learning Rate Warmup - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper provides a theoretical analysis of training Transformers without learning rate warmup, which aligns with model architecture insights.
Born a Transformer -- Always a Transformer? - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper investigates the limitations of transformers in sequence-to-sequence tasks, relevant to model architecture and emerging trends.
Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper focuses on optimizing thinking dynamics in large reasoning models, which aligns with the interest in theoretical insights into LLM behavior.
Enhancing Vision Transformer Explainability Using Artificial Astrocytes - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper proposes a novel approach to enhance Vision Transformer explainability using artificial astrocytes, which is relevant to model architecture innovations.
Input Convex Kolmogorov Arnold Networks - Score: 15 (R=8, N=7) - Date: 2025-05-28 - Comment: The paper presents a new input convex neural network architecture, which involves architectural innovation, aligning with model architecture criteria.
A Lightweight Multi-Expert Generative Language Model System for Engineering Information and Knowledge Extraction - Score: 15 (R=8, N=7) - Date: 2025-05-28 - Comment: The paper introduces a lightweight multi-expert generative language model system, which aligns with the model architecture criterion by proposing a novel graph-based structure.
One-Time Soft Alignment Enables Resilient Learning without Weight Transport - Score: 15 (R=8, N=7) - Date: 2025-05-28 - Comment: The paper proposes a method for resilient learning without weight transport, which aligns with model architecture and training dynamics.
HAD: Hybrid Architecture Distillation Outperforms Teacher in Genomic Sequence Modeling - Score: 15 (R=8, N=7) - Date: 2025-05-28 - Comment: The paper proposes a novel Hybrid Architecture Distillation approach, which involves architectural innovation and model compression, aligning with model architecture and compression criteria.
Position: Adopt Constraints Over Penalties in Deep Learning - Score: 15 (R=8, N=7) - Date: 2025-05-28 - Comment: The paper argues for the use of constrained optimization methods over penalties in deep learning, which aligns with foundational research in model architecture and training dynamics.
Rotary Masked Autoencoders are Versatile Learners - Score: 15 (R=8, N=7) - Date: 2025-05-28 - Comment: The paper presents Rotary Masked Autoencoders, an extension of MAE for representation learning across various modalities.
Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms - Score: 15 (R=8, N=7) - Date: 2025-05-28 - Comment: The paper discusses a novel method for controlling language model generation using sparse autoencoders, which is relevant to representation learning and model architecture.
TabPFN: One Model to Rule Them All? - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper discusses TabPFN, a transformer-based model for tabular data, with potential foundation model capabilities.
Revisiting Glorot Initialization for Long-Range Linear Recurrences - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper revisits Glorot initialization for RNNs, providing theoretical insights into initialization schemes, which aligns with model architecture analysis.
Scaling Laws for Gradient Descent and Sign Descent for Linear Bigram Models under Zipf's Law - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper provides theoretical insights into optimization difficulties in training language models, which is relevant to understanding training dynamics in neural networks.
Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper proposes Universal Reasoner, a plug-and-play reasoning module for LLMs, which aligns with model architecture innovations.
Hierarchical-embedding autoencoder with a predictor (HEAP) as efficient architecture for learning long-term evolution of complex multi-scale physical systems - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper proposes a novel hierarchical-embedding autoencoder architecture, which is relevant to model architecture innovations.
Leveraging KANs for Expedient Training of Multichannel MLPs via Preconditioning and Geometric Refinement - Score: 15 (R=8, N=7) - Date: 2025-05-26 - Comment: The paper explores the relationship between KANs and MLPs, providing insights into training dynamics and architectural innovations.
A Principled Bayesian Framework for Training Binary and Spiking Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-05-26 - Comment: The paper presents a Bayesian framework for training binary and spiking neural networks, which aligns with interests in model architecture and efficiency improvements.
Selection Mechanisms for Sequence Modeling using Linear State Space Models - Score: 15 (R=8, N=7) - Date: 2025-05-26 - Comment: The paper introduces a novel selection mechanism for sequence modeling using Linear State Space Models, which is relevant to model architecture innovations.
Hybrid Mamba-Transformer Decoder for Error-Correcting Codes - Score: 15 (R=8, N=7) - Date: 2025-05-26 - Comment: The paper introduces a novel hybrid architecture combining Mamba and Transformer layers, which aligns with the model architecture criterion.
DAM-GT: Dual Positional Encoding-Based Attention Masking Graph Transformer for Node Classification - Score: 15 (R=8, N=7) - Date: 2025-05-26 - Comment: The paper introduces a novel graph transformer architecture with dual positional encoding and attention masking, which aligns with the model architecture criterion.
Range-Arithmetic: Verifiable Deep Learning Inference on an Untrusted Party - Score: 15 (R=8, N=7) - Date: 2025-05-26 - Comment: The paper introduces a novel framework for verifiable DNN inference, which aligns with the model architecture criterion.
Transformer brain encoders explain human high-level visual responses - Score: 15 (R=8, N=7) - Date: 2025-05-26 - Comment: The paper explores the use of transformer architectures to model brain activity, focusing on the interpretability and routing of visual information, which aligns with the interest in model architecture and insights into existing architectures.
SELF: Self-Extend the Context Length With Logistic Growth Function - Score: 15 (R=8, N=7) - Date: 2025-05-23 - Comment: The paper proposes a method to extend context length in large language models, which aligns with the large language models criterion.
Native Segmentation Vision Transformers - Score: 15 (R=8, N=7) - Date: 2025-05-23 - Comment: The paper proposes a new vision transformer architecture with native segmentation capabilities, aligning with the model architecture criterion.
AnchorFormer: Differentiable Anchor Attention for Efficient Vision Transformer - Score: 15 (R=8, N=7) - Date: 2025-05-23 - Comment: The paper proposes an efficient vision transformer architecture, AnchorFormer, which is relevant to model architecture innovations.
Circle-RoPE: Cone-like Decoupled Rotary Positional Embedding for Large Vision-Language Models - Score: 15 (R=8, N=7) - Date: 2025-05-23 - Comment: The paper introduces Circle-RoPE, a novel positional encoding scheme for large vision-language models, which aligns with the Model Architecture criterion by proposing a new encoding structure. It also addresses cross-modal biases, which is a significant insight into model behavior.
PaTH Attention: Position Encoding via Accumulating Householder Transformations - Score: 15 (R=8, N=7) - Date: 2025-05-23 - Comment: The paper introduces a novel position encoding scheme for transformers, which is relevant to model architecture innovations.
AdamS: Momentum Itself Can Be A Normalizer for LLM Pretraining and Post-training - Score: 15 (R=8, N=7) - Date: 2025-05-23 - Comment: AdamS introduces a new optimizer for LLM pretraining and post-training, relevant to large language models and optimization techniques.
Understanding Differential Transformer Unchains Pretrained Self-Attentions - Score: 15 (R=8, N=7) - Date: 2025-05-23 - Comment: The paper investigates Differential Transformer and proposes DEX, which is relevant to model architecture innovations.
DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving - Score: 15 (R=8, N=7) - Date: 2025-05-23 - Comment: The paper focuses on a novel MoE-based framework for autonomous driving, which aligns with the Model Architecture criterion.
Steering LVLMs via Sparse Autoencoder for Hallucination Mitigation - Score: 15 (R=8, N=7) - Date: 2025-05-23 - Comment: The paper leverages sparse autoencoders to mitigate hallucinations in LVLMs, which is relevant to representation learning and model architecture.
Bidirectional Variational Autoencoders - Score: 15 (R=8, N=7) - Date: 2025-05-22 - Comment: The paper introduces a new bidirectional variational autoencoder architecture, which is relevant to model architecture innovations.
Meta-Learning an In-Context Transformer Model of Human Higher Visual Cortex - Score: 15 (R=8, N=7) - Date: 2025-05-22 - Comment: The paper uses a transformer architecture for modeling human visual cortex, which is relevant to model architecture and AI for science.
HybridProver: Augmenting Theorem Proving with LLM-Driven Proof Synthesis and Refinement - Score: 15 (R=8, N=7) - Date: 2025-05-22 - Comment: The paper presents a dual-model framework for theorem proving using LLMs, which is relevant to foundational research in LLMs and model architecture.
Stronger ViTs With Octic Equivariance - Score: 15 (R=8, N=7) - Date: 2025-05-22 - Comment: The paper introduces octic-equivariant layers in Vision Transformers, which is a novel architectural innovation.
Fourier-Invertible Neural Encoder (FINE) for Homogeneous Flows - Score: 15 (R=8, N=7) - Date: 2025-05-22 - Comment: The paper introduces a new invertible neural architecture, which is relevant to model architecture innovations.
Scaling Diffusion Transformers Efficiently via μP - Score: 15 (R=8, N=7) - Date: 2025-05-22 - Comment: The paper focuses on scaling diffusion transformers efficiently, which is relevant to model architecture and efficiency improvements.
Degree-Optimized Cumulative Polynomial Kolmogorov-Arnold Networks - Score: 15 (R=8, N=7) - Date: 2025-05-22 - Comment: The paper introduces a new neural architecture combining polynomial basis functions and optimization, which is relevant to model architecture innovations.
Time Tracker: Mixture-of-Experts-Enhanced Foundation Time Series Forecasting Model with Decoupled Training Pipelines - Score: 15 (R=8, N=7) - Date: 2025-05-22 - Comment: The paper proposes a mixture-of-experts-enhanced model for time series forecasting, which is relevant to model architecture innovations.
Mechanistic evaluation of Transformers and state space models - Score: 15 (R=8, N=7) - Date: 2025-05-22 - Comment: The paper provides a mechanistic evaluation of Transformers and state space models, which is relevant to model architecture analysis.
LLINBO: Trustworthy LLM-in-the-Loop Bayesian Optimization - Score: 15 (R=8, N=7) - Date: 2025-05-21 - Comment: The paper introduces a hybrid framework for Bayesian optimization with LLMs, which is relevant to foundational research in LLM behavior.
FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation - Score: 15 (R=8, N=7) - Date: 2025-05-21 - Comment: The paper presents a sparsified LLM for multilingual translation using MoE, which is relevant to model architecture and model compression.
Learning with Local Search MCMC Layers - Score: 15 (R=8, N=7) - Date: 2025-05-21 - Comment: The paper introduces a novel approach for integrating combinatorial optimization layers into neural networks, which is relevant to model architecture innovations.
Mechanistic Fine-tuning for In-context Learning - Score: 15 (R=8, N=7) - Date: 2025-05-21 - Comment: The paper introduces a novel fine-tuning method for in-context learning by focusing on attention scores, which aligns with representation learning and model architecture insights.
Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition - Score: 15 (R=8, N=7) - Date: 2025-05-21 - Comment: The paper introduces a multimodal mixture of low-rank experts, which aligns with model architecture and representation learning.
MSVIT: Improving Spiking Vision Transformer Using Multi-scale Attention Fusion - Score: 15 (R=8, N=7) - Date: 2025-05-20 - Comment: The paper proposes a novel spike-driven Transformer architecture, relevant to model architecture.
Multi-head Temporal Latent Attention - Score: 15 (R=8, N=7) - Date: 2025-05-20 - Comment: The paper proposes Multi-head Temporal Latent Attention, which is relevant to model architecture innovations focusing on efficiency improvements.
μPC: Scaling Predictive Coding to 100+ Layer Networks - Score: 15 (R=8, N=7) - Date: 2025-05-20 - Comment: The paper explores scaling predictive coding networks, which is relevant to model architecture and training dynamics.
Walking the Tightrope: Disentangling Beneficial and Detrimental Drifts in Non-Stationary Custom-Tuning - Score: 15 (R=8, N=7) - Date: 2025-05-20 - Comment: The paper introduces a novel counterfact-aware RFT framework for LLMs, which is relevant to theoretical insights into LLM behavior.
TSPulse: Dual Space Tiny Pre-Trained Models for Rapid Time-Series Analysis - Score: 15 (R=8, N=7) - Date: 2025-05-20 - Comment: The paper introduces TSPulse, a compact model for time-series analysis with architectural innovations like dual-space masked reconstruction, relevant to model architecture.
CALM-PDE: Continuous and Adaptive Convolutions for Latent Space Modeling of Time-dependent PDEs - Score: 15 (R=8, N=7) - Date: 2025-05-20 - Comment: The paper introduces CALM-PDE, a model for solving PDEs using a novel convolution-based architecture, relevant to model architecture innovations.
Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper provides a theoretical perspective on chain of continuous thought in LLMs, which is relevant to the large language models criterion.
SchoenbAt: Rethinking Attention with Polynomial basis - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper proposes SchoenbAt, a new attention mechanism using polynomial basis, which aligns with model architecture innovations.
WaLRUS: Wavelets for Long-range Representation Using SSMs - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper introduces a new implementation of state-space models using wavelets, which aligns with model architecture innovations.
SAINT: Attention-Based Modeling of Sub-Action Dependencies in Multi-Action Policies - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper presents a novel policy architecture using Transformers, which aligns with the model architecture criterion.
FlashBias: Fast Computation of Attention with Bias - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: FlashBias focuses on efficient computation of attention with bias, which is relevant to model architecture and efficiency.
Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper explores transformers in unsupervised learning, specifically Gaussian Mixture Models, which aligns with the core topic of model architecture and representation learning.
S-Crescendo: A Nested Transformer Weaving Framework for Scalable Nonlinear System in S-Domain Representation - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper introduces a novel nested transformer framework for scalable nonlinear system simulation, which aligns with the model architecture criterion.
A Classical View on Benign Overfitting: The Role of Sample Size - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper provides theoretical insights into benign overfitting, which is relevant to understanding training dynamics in neural networks.
What Can We Learn From MIMO Graph Convolutions? - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper explores MIMO graph convolutions, providing insights into graph neural network architectures, which is relevant to model architecture.
Where You Place the Norm Matters: From Prejudiced to Neutral Initializations - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper investigates the impact of normalization placement on network initialization, relevant to model architecture analysis.
Attention on the Sphere - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper introduces a generalized attention mechanism for spherical domains, which is relevant to model architecture, specifically in the context of transformers.
PhiNet v2: A Mask-Free Brain-Inspired Vision Foundation Model from Video - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper introduces a brain-inspired vision model, which aligns with the model architecture criterion.
Efficient Attention via Pre-Scoring: Prioritizing Informative Keys in Transformers - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper proposes a pre-scoring mechanism to improve attention in transformers, which is relevant to model architecture and efficiency.
Relational Graph Transformer - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper introduces a novel graph transformer architecture specifically designed for relational tables, which aligns with the model architecture criterion.
MergeBench: A Benchmark for Merging Domain-Specialized LLMs - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper introduces a benchmark for merging domain-specialized LLMs, which is relevant to model architecture and LLMs.
PoE-World: Compositional World Modeling with Products of Programmatic Experts - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper introduces a novel program synthesis method for world modeling using products of programmatic experts, which aligns with representation learning and model architecture innovations.
SpikeVideoFormer: An Efficient Spike-Driven Video Transformer with Hamming Attention and $\mathcal{O}(T)$ Complexity - Score: 15 (R=8, N=7) - Date: 2025-05-16 - Comment: The paper introduces a spike-driven video Transformer, which is relevant to model architecture innovations.
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures - Score: 15 (R=8, N=7) - Date: 2025-05-15 - Comment: The paper discusses hardware-aware co-design for scaling LLMs, including innovations like Mixture of Experts (MoE) and FP8 mixed-precision training. This aligns with the 'Model Architecture' criterion due to its focus on MoE and architectural efficiency.
The Influence of the Memory Capacity of Neural DDEs on the Universal Approximation Property - Score: 15 (R=8, N=7) - Date: 2025-05-13 - Comment: The paper explores the universal approximation property of neural DDEs, which is relevant to emerging trends in model architecture and theoretical insights. The focus on memory capacity and its influence is novel.
Deeply Explainable Artificial Neural Network - Score: 15 (R=8, N=7) - Date: 2025-05-13 - Comment: The paper introduces DxANN, a novel architecture embedding explainability directly into the training process, which aligns with the 'Model Architecture' criterion by proposing an innovative design.
Probing In-Context Learning: Impact of Task Complexity and Model Architecture on Generalization and Efficiency - Score: 15 (R=8, N=7) - Date: 2025-05-13 - Comment: The paper investigates in-context learning with a focus on task complexity and model architecture, providing insights into architectural behavior and generalization.
UniSymNet: A Unified Symbolic Network Guided by Transformer - Score: 15 (R=8, N=7) - Date: 2025-05-12 - Comment: The paper introduces a novel symbolic network guided by Transformers, which aligns with architectural innovations and representation learning.
OpenworldAUC: Towards Unified Evaluation and Optimization for Open-world Prompt Tuning - Score: 15 (R=8, N=7) - Date: 2025-05-09 - Comment: The paper proposes Gated Mixture-of-Prompts (GMoP) for open-world prompt tuning, which aligns with 'Model Architecture' due to its innovative use of domain-specific prompts and gating mechanisms.
SetONet: A Deep Set-based Operator Network for Solving PDEs with permutation invariant variable input sampling - Score: 15 (R=8, N=7) - Date: 2025-05-09 - Comment: The paper introduces SetONet, a novel architecture extending DeepONet with permutation invariance, which aligns with the 'Model Architecture' criterion for architectural innovations.
Information Filtering Networks: Theoretical Foundations, Generative Methodologies, and Real-World Applications - Score: 15 (R=8, N=7) - Date: 2025-05-08 - Comment: The paper provides a comprehensive review of Information Filtering Networks (IFNs), discussing their theoretical foundations and potential integration with deep learning architectures, which aligns with emerging trends and foundational research.
Attention Mechanisms Perspective: Exploring LLM Processing of Graph-Structured Data - Score: 15 (R=8, N=7) - Date: 2025-05-07 - Comment: The paper explores attention mechanisms in LLMs for graph-structured data, aligning with the criterion of analyzing LLM behavior and interpretability.
Learning Local Causal World Models with State Space Models and Attention - Score: 15 (R=8, N=7) - Date: 2025-05-06 - Comment: The paper explores causal discovery in State Space Models (SSMs), which is relevant to representation learning and architectural innovations, particularly in comparison to Transformers.
BiGSCoder: State Space Model for Code Understanding - Score: 15 (R=8, N=7) - Date: 2025-05-06 - Comment: The paper introduces BiGSCoder, a state-space model for code understanding, which provides insights into SSMs as an alternative to Transformers, aligning with architectural innovations.
On the expressivity of deep Heaviside networks - Score: 15 (R=8, N=7) - Date: 2025-05-02 - Comment: The paper provides theoretical insights into the expressivity of deep Heaviside networks, including VC dimensions and approximation rates, which aligns with foundational research in model architecture.

Model Compression and Efficiency (214)

From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning - Score: 20.0 (R=0, N=0) - Date: 2025-05-26 - Comment: Author match
Adaptive Cyclic Diffusion for Inference Scaling - Score: 20.0 (R=0, N=0) - Date: 2025-05-21 - Comment: Author match
Search-Based Correction of Reasoning Chains for Language Models - Score: 20.0 (R=0, N=0) - Date: 2025-05-19 - Comment: Author match
Improving Block-Wise LLM Quantization by 4-bit Block-Wise Optimal Float (BOF4): Analysis and Variations - Score: 18 (R=10, N=8) - Date: 2025-05-13 - Comment: The paper proposes a novel 4-bit quantization method (BOF4) for LLMs, which directly aligns with the 'Model Compression' criterion and introduces significant improvements.
Large Language Model Compression with Global Rank and Sparsity Optimization - Score: 18 (R=10, N=8) - Date: 2025-05-08 - Comment: Proposes a two-stage LLM compression method combining low-rank and sparse approximations with global optimization. This directly addresses foundational challenges in model compression and sparsity.
Self-orthogonalizing attractor neural networks emerging from the free energy principle - Score: 18 (R=9, N=9) - Date: 2025-05-30 - Comment: The paper presents a theory of self-organizing attractor networks emerging from the free energy principle, which aligns with emerging trends in theoretical work.
Quartet: Native FP4 Training Can Be Optimal for Large Language Models - Score: 18 (R=9, N=9) - Date: 2025-05-21 - Comment: The paper introduces Quartet, a novel approach for FP4 training in LLMs, which is relevant to model compression and efficiency.
Multi-Step Consistency Models: Fast Generation with Theoretical Guarantees - Score: 18 (R=9, N=9) - Date: 2025-05-05 - Comment: The paper provides theoretical guarantees for consistency models, which are an emerging trend in generative modeling. It offers foundational insights into the efficiency and theoretical underpinnings of these models, aligning with the 'Emerging Trends' criterion.
KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction - Score: 17 (R=9, N=8) - Date: 2025-05-31 - Comment: The paper introduces KVzip, a novel method for KV cache compression in LLMs, which aligns with the model compression criterion.
DenoiseRotator: Enhance Pruning Robustness for LLMs via Importance Concentration - Score: 17 (R=9, N=8) - Date: 2025-05-31 - Comment: The paper introduces DenoiseRotator, a method to enhance pruning robustness in LLMs, aligning with model compression.
Model-Preserving Adaptive Rounding - Score: 17 (R=9, N=8) - Date: 2025-05-31 - Comment: The paper introduces a new quantization algorithm, YAQA, which is relevant to model compression through quantization and efficiency improvements.
Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference - Score: 17 (R=9, N=8) - Date: 2025-05-31 - Comment: The paper focuses on unstructured sparsity for KV cache pruning in LLM inference, aligning with model compression and efficiency breakthroughs.
MAP: Revisiting Weight Decomposition for Low-Rank Adaptation - Score: 17 (R=9, N=8) - Date: 2025-05-30 - Comment: The paper introduces a novel framework for low-rank adaptation in model fine-tuning, relevant to model compression.
Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking - Score: 17 (R=9, N=8) - Date: 2025-05-30 - Comment: The paper discusses scalable and memory-efficient pretraining methods for LLMs, focusing on model compression and efficiency, which is highly relevant.
Almost Linear Convergence under Minimal Score Assumptions: Quantized Transition Diffusion - Score: 17 (R=9, N=8) - Date: 2025-05-29 - Comment: The paper introduces Quantized Transition Diffusion, a novel approach integrating data quantization with discrete diffusion dynamics, advancing theoretical foundations of diffusion-based generative modeling.
Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-05-29 - Comment: The paper addresses the challenge of finding the sparsest interpolating ReLU network using a novel training objective, which is relevant to model compression and sparsity.
LaX: Boosting Low-Rank Training of Foundation Models via Latent Crossing - Score: 17 (R=9, N=8) - Date: 2025-05-29 - Comment: The paper introduces Latent Crossing, a module to enhance low-rank models, relevant to model compression and low-rank approaches.
Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning - Score: 17 (R=9, N=8) - Date: 2025-05-29 - Comment: The paper explores 4-bit FP quantization for diffusion models, addressing challenges in model quantization, which is relevant to model compression and efficiency.
Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders - Score: 17 (R=9, N=8) - Date: 2025-05-28 - Comment: The paper introduces Mixture of Decoders (MxDs) for interpretable dense layer decomposition, relevant to model architecture and sparsity.
LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning - Score: 17 (R=9, N=8) - Date: 2025-05-28 - Comment: The paper introduces a novel low-rank adaptation method that aligns with the model compression criterion by focusing on parameter-efficient fine-tuning.
Efficient Large Language Model Inference with Neural Block Linearization - Score: 17 (R=9, N=8) - Date: 2025-05-28 - Comment: The paper introduces Neural Block Linearization, a novel framework for accelerating LLM inference, aligning with model compression and efficiency breakthroughs.
Sparsified State-Space Models are Efficient Highway Networks - Score: 17 (R=9, N=8) - Date: 2025-05-28 - Comment: The paper introduces a novel sparsification method for state-space models, which aligns with the model compression criterion by focusing on token pruning and efficiency improvements.
Test-Time Learning for Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-05-28 - Comment: The paper proposes a test-time learning paradigm for LLMs, which aligns with the large language models criterion by focusing on theoretical insights into LLM behavior.
ResSVD: Residual Compensated SVD for Large Language Model Compression - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper presents ResSVD, a new SVD-based method for LLM compression, focusing on reducing truncation loss and selective layer compression, which is relevant to model compression.
Foundations of Top-$k$ Decoding For Language Models - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper provides a theoretical framework for top-k decoding in LLMs, which is relevant to theoretical insights into LLM behavior.
Shifting AI Efficiency From Model-Centric to Data-Centric Compression - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper discusses a shift from model-centric to data-centric compression, which is relevant to model compression and efficiency.
FP4 All the Way: Fully Quantized Training of LLMs - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper demonstrates fully quantized training of LLMs, which is relevant to model compression and efficiency breakthroughs.
RefLoRA: Refactored Low-Rank Adaptation for Efficient Fine-Tuning of Large Models - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: RefLoRA proposes a method for efficient fine-tuning of large models using low-rank adaptation, relevant to model compression and efficiency.
AuroRA: Breaking Low-Rank Bottleneck of LoRA with Nonlinear Mapping - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper proposes AuroRA, which addresses the low-rank bottleneck in LoRA, relevant to model compression and efficiency.
LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper introduces a novel fine-tuning method for quantized LLMs, relevant to model compression and efficiency.
Neural Parameter Search for Slimmer Fine-Tuned Models and Better Transfer - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper introduces a novel pruning strategy for fine-tuned models, focusing on neural parameter search within low-rank subspaces, which aligns with the model compression criterion.
PLUMAGE: Probabilistic Low rank Unbiased Min Variance Gradient Estimator for Efficient Large Model Training - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper proposes a new low-rank gradient estimator for efficient large model training, relevant to model compression and efficiency.
ELDeR: Getting Efficient LLMs through Data-Driven Regularized Layer-wise Pruning - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: ELDeR introduces a novel paradigm for pruning LLMs, relevant to model compression and efficiency.
Generalized Fisher-Weighted SVD: Scalable Kronecker-Factored Fisher Approximation for Compressing Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-05-26 - Comment: The paper proposes a novel compression technique for LLMs using a generalized Fisher-weighted SVD, which is relevant to model compression.
SVD-Free Low-Rank Adaptive Gradient Optimization for Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-05-26 - Comment: The paper presents a novel low-rank optimization method for LLMs, which is relevant to model compression through low-rank approaches.
Stochastic Weight Sharing for Bayesian Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-05-26 - Comment: The paper presents a novel approach to compress Bayesian Neural Networks using stochastic weight-sharing quantization, which aligns with the interest in model compression and efficiency breakthroughs.
ECHO-LLaMA: Efficient Caching for High-Performance LLaMA Training - Score: 17 (R=9, N=8) - Date: 2025-05-26 - Comment: The paper introduces ECHO-LLaMA, focusing on efficient caching and computational efficiency in LLMs, aligning with model compression and efficiency breakthroughs.
From Compression to Expansion: A Layerwise Analysis of In-Context Learning - Score: 17 (R=9, N=8) - Date: 2025-05-23 - Comment: The paper conducts a layerwise analysis of in-context learning in large language models, aligning with the large language models criterion.
TRIM: Achieving Extreme Sparsity with Targeted Row-wise Iterative Metric-driven Pruning - Score: 17 (R=9, N=8) - Date: 2025-05-23 - Comment: The paper introduces a novel pruning method, TRIM, which is relevant to model compression and sparsity.
FPQVAR: Floating Point Quantization for Visual Autoregressive Model with FPGA Hardware Co-design - Score: 17 (R=9, N=8) - Date: 2025-05-23 - Comment: The paper presents a novel quantization framework for visual autoregressive models, which aligns with model compression through quantization and efficiency improvements.
Is Quantum Optimization Ready? An Effort Towards Neural Network Compression using Adiabatic Quantum Computing - Score: 17 (R=9, N=8) - Date: 2025-05-23 - Comment: The paper explores quantum optimization for neural network compression, specifically using adiabatic quantum computing for pruning and quantization.
NQKV: A KV Cache Quantization Scheme Based on Normal Distribution Characteristics - Score: 17 (R=9, N=8) - Date: 2025-05-23 - Comment: The paper introduces NQKV, a KV cache quantization scheme, which directly relates to model compression and efficiency.
Decouple and Orthogonalize: A Data-Free Framework for LoRA Merging - Score: 17 (R=9, N=8) - Date: 2025-05-22 - Comment: The paper introduces a data-free framework for LoRA merging, which is relevant to model compression and efficiency.
Beyond Hard and Soft: Hybrid Context Compression for Balancing Local and Global Information Retention - Score: 17 (R=9, N=8) - Date: 2025-05-22 - Comment: The paper proposes a hybrid context compression method for LLMs, which is relevant to model compression and efficiency in LLMs.
Efficient Differentiable Approximation of Generalized Low-rank Regularization - Score: 17 (R=9, N=8) - Date: 2025-05-22 - Comment: The paper proposes an efficient differentiable approximation for low-rank regularization, aligning with foundational research in model compression and efficiency.
The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning - Score: 17 (R=9, N=8) - Date: 2025-05-22 - Comment: The paper explores entropy minimization in LLM reasoning, which is relevant to theoretical insights into LLM behavior.
Saten: Sparse Augmented Tensor Networks for Post-Training Compression of Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-05-21 - Comment: The paper proposes sparse augmented tensor networks for post-training compression of LLMs, which is relevant to model compression.
Quaff: Quantized Parameter-Efficient Fine-Tuning under Outlier Spatial Stability Hypothesis - Score: 17 (R=9, N=8) - Date: 2025-05-21 - Comment: The paper introduces a quantized parameter-efficient fine-tuning framework, which aligns with model compression and efficiency.
Scaling Law for Quantization-Aware Training - Score: 17 (R=9, N=8) - Date: 2025-05-21 - Comment: The paper proposes a scaling law for quantization-aware training, which is relevant to model compression and efficiency.
Do Language Models Use Their Depth Efficiently? - Score: 17 (R=9, N=8) - Date: 2025-05-21 - Comment: The paper analyzes the efficiency of depth usage in LLMs, which is relevant to large language models and model architecture.
EfficientLLM: Efficiency in Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-05-21 - Comment: The paper presents a comprehensive study on efficiency techniques for LLMs, which is relevant to model compression and efficiency.
Breaking the Compression Ceiling: Data-Free Pipeline for Ultra-Efficient Delta Compression - Score: 17 (R=9, N=8) - Date: 2025-05-20 - Comment: The paper introduces UltraDelta, a data-free delta compression pipeline, which is relevant to model compression with a focus on sparsity and quantization.
FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference - Score: 17 (R=9, N=8) - Date: 2025-05-20 - Comment: The paper introduces FreeKV, a framework for efficient KV cache retrieval in LLMs, which aligns with the interest in model compression and efficiency.
A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone - Score: 17 (R=9, N=8) - Date: 2025-05-20 - Comment: The paper introduces a low-rank approach for efficient knowledge distillation, relevant to model compression and efficiency.
Qronos: Correcting the Past by Shaping the Future... in Post-Training Quantization - Score: 17 (R=9, N=8) - Date: 2025-05-19 - Comment: Qronos is a new post-training quantization algorithm, which is relevant to model compression.
RanDeS: Randomized Delta Superposition for Multi-Model Compression - Score: 17 (R=9, N=8) - Date: 2025-05-19 - Comment: The paper presents a novel approach to multi-model compression using randomized transformations, aligning with the core topic of model compression.
Addition is almost all you need: Compressing neural networks with double binary factorization - Score: 17 (R=9, N=8) - Date: 2025-05-19 - Comment: The paper introduces a novel method for model compression using Double Binary Factorization, which aligns with the model compression criterion focusing on sparsity, pruning, and quantization.
Phi: Leveraging Pattern-based Hierarchical Sparsity for High-Efficiency Spiking Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-05-19 - Comment: The paper proposes a framework for optimizing spiking neural networks using hierarchical sparsity, aligning with the model compression criterion.
A probabilistic framework for dynamic quantization - Score: 17 (R=9, N=8) - Date: 2025-05-16 - Comment: The paper proposes a probabilistic framework for dynamic quantization, which is relevant to model compression by introducing an input-adaptive rescaling method.
Parallel Scaling Law for Language Models - Score: 17 (R=9, N=8) - Date: 2025-05-16 - Comment: The paper introduces a new scaling paradigm for language models, which aligns with the interest in model architecture and efficiency improvements.
An Extra RMSNorm is All You Need for Fine Tuning to 1.58 Bits - Score: 17 (R=9, N=8) - Date: 2025-05-15 - Comment: The paper focuses on quantization for LLMs, specifically achieving ternary (1.58-bit) precision using RMS normalization and a gradual quantization schedule. This aligns with the 'Model Compression' criterion, particularly in sparsity and low-bit efficiency breakthroughs.
InfoPO: On Mutual Information Maximization for Large Language Model Alignment - Score: 17 (R=9, N=8) - Date: 2025-05-14 - Comment: The paper proposes InfoPO, a novel algorithm for aligning LLMs using preference data, which addresses foundational challenges in LLM alignment and optimization.
Efficient Unstructured Pruning of Mamba State-Space Models for Resource-Constrained Environments - Score: 17 (R=9, N=8) - Date: 2025-05-14 - Comment: The paper proposes a novel unstructured pruning framework for Mamba state-space models, which is relevant to model compression. The gradient-aware magnitude pruning and iterative pruning schedule are innovative contributions.
Blockbuster, Part 1: Block-level AI Operator Fusion - Score: 17 (R=9, N=8) - Date: 2025-05-14 - Comment: The paper introduces a novel operator fusion framework, which directly models data movement between memory tiers and achieves significant efficiency improvements. This aligns with the model compression and efficiency breakthroughs criterion.
ICE-Pruning: An Iterative Cost-Efficient Pruning Pipeline for Deep Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-05-13 - Comment: The paper introduces ICE-Pruning, a novel pruning pipeline for model compression, which is directly relevant to foundational research in model efficiency.
GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance - Score: 17 (R=9, N=8) - Date: 2025-05-13 - Comment: The paper introduces a novel quantization method for LLMs, which is highly relevant to model compression and efficiency. The integration of gradient information into the quantization objective is a significant contribution.
Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM - Score: 17 (R=9, N=8) - Date: 2025-05-12 - Comment: The paper focuses on sparsity-optimized data mapping for efficient LLM decoding, which aligns with the 'Model Compression' criterion, particularly in sparsity and KV cache optimization. It also touches on 'Large Language Models' by addressing efficiency in LLM inference.
How to Train Your Metamorphic Deep Neural Network - Score: 17 (R=9, N=8) - Date: 2025-05-12 - Comment: The paper proposes a training algorithm for neural metamorphosis, which aligns with model compression and efficiency breakthroughs.
DPQ-HD: Post-Training Compression for Ultra-Low Power Hyperdimensional Computing - Score: 17 (R=9, N=8) - Date: 2025-05-09 - Comment: The paper introduces a post-training compression algorithm for hyperdimensional computing, combining decomposition, pruning, and quantization. This aligns well with foundational research in model compression and efficiency.
Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry - Score: 17 (R=9, N=8) - Date: 2025-05-09 - Comment: This paper addresses sparse training and the Lottery Ticket Hypothesis, which is highly relevant to model compression and sparsity. The proposed method of aligning masks using weight symmetry is novel and provides significant insights.
Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning - Score: 17 (R=9, N=8) - Date: 2025-05-09 - Comment: The paper addresses mixed-precision quantization with novel techniques like sharpness-aware minimization and adaptive gradient alignment, which are directly relevant to model compression and efficiency.
ABKD: Pursuing a Proper Allocation of the Probability Mass in Knowledge Distillation via $\alpha$-$\beta$-Divergence - Score: 17 (R=9, N=8) - Date: 2025-05-08 - Comment: The paper proposes a novel framework for knowledge distillation using alpha-beta divergence, addressing challenges in balancing concentration effects. This aligns with the 'Model Compression' criterion, particularly in advancing theoretical understanding of distillation.
Grouped Sequency-arranged Rotation: Optimizing Rotation Transformation for Quantization for Free - Score: 17 (R=9, N=8) - Date: 2025-05-08 - Comment: This paper proposes a novel quantization method leveraging the Walsh-Hadamard transform and introduces Grouped Sequency-arranged Rotation (GSR), which directly addresses model compression and efficiency. The method is innovative and relevant to foundational research in model compression.
Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth - Score: 17 (R=9, N=8) - Date: 2025-05-08 - Comment: The paper introduces QR-Adaptor, a method for fine-tuning quantized models by jointly optimizing quantization and low-rank components. This aligns well with the model compression criterion, particularly in low-rank approaches and quantization.
APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design - Score: 17 (R=9, N=8) - Date: 2025-05-08 - Comment: The paper introduces a novel quantization method (APSQ) for partial sums in DNN accelerators, which aligns with the model compression criterion, particularly in energy efficiency and quantization innovations.
AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design - Score: 17 (R=9, N=8) - Date: 2025-05-08 - Comment: The paper proposes AccLLM, a framework for accelerating LLM inference with algorithm-hardware co-design, including pruning and quantization innovations, which aligns with model compression and efficiency breakthroughs.
Nonnegative Low-rank Matrix Recovery Can Have Spurious Local Minima - Score: 17 (R=9, N=8) - Date: 2025-05-07 - Comment: The paper investigates theoretical properties of low-rank matrix recovery with nonnegative constraints, which aligns with the model compression topic, particularly low-rank approaches.
SPAP: Structured Pruning via Alternating Optimization and Penalty Methods - Score: 17 (R=9, N=8) - Date: 2025-05-07 - Comment: This paper proposes SPAP, a structured pruning framework for LLMs, which aligns with the model compression criterion by introducing optimization-driven pruning methods.
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference - Score: 17 (R=9, N=8) - Date: 2025-05-07 - Comment: The paper proposes RetroInfer, a novel system for efficient long-context LLM inference, which aligns with model compression and efficiency topics.
Practical Efficiency of Muon for Pretraining - Score: 17 (R=9, N=8) - Date: 2025-05-06 - Comment: The paper studies Muon, a second-order optimizer, and demonstrates its efficiency in pretraining large models. This aligns with foundational research in model efficiency and optimization.
Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients - Score: 17 (R=9, N=8) - Date: 2025-05-06 - Comment: The paper introduces a novel framework for low-rank gradient projection (VLoRP) and provides theoretical analysis, aligning with the model compression criterion.
Don't be lazy: CompleteP enables compute-efficient deep transformers - Score: 17 (R=9, N=8) - Date: 2025-05-06 - Comment: The paper introduces a parameterization method for LLM training that improves compute efficiency and avoids lazy learning. This aligns with foundational research in model architecture and training dynamics.
ICQuant: Index Coding enables Low-bit LLM Quantization - Score: 17 (R=9, N=8) - Date: 2025-05-05 - Comment: The paper introduces ICQuant, a novel low-bit quantization framework addressing outliers in LLMs. This aligns closely with the 'Model Compression' criterion, particularly in advancing quantization techniques.
FineScope : Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation - Score: 17 (R=9, N=8) - Date: 2025-05-02 - Comment: This paper introduces FineScope, a framework for pruning and optimizing LLMs using sparse autoencoders. It aligns with the model compression criterion, particularly in sparsity and pruning, and offers methodological contributions.
FreqKV: Frequency Domain Key-Value Compression for Efficient Context Window Extension - Score: 17 (R=9, N=8) - Date: 2025-05-02 - Comment: The paper introduces a novel KV cache compression method for LLMs using frequency domain techniques, which aligns with the model compression criterion and efficiency breakthroughs.
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics - Score: 17 (R=9, N=8) - Date: 2025-05-02 - Comment: This paper introduces a novel low-bit optimizer with theoretical contributions addressing challenges in quantization, which aligns with the model compression criterion, particularly in sparsity and quantization.
Optimal Vector Compressed Sensing Using James Stein Shrinkage - Score: 17 (R=9, N=8) - Date: 2025-05-02 - Comment: The paper introduces SteinSense, a novel algorithm for vector compressed sensing that is provably optimal and highly scalable. This aligns with foundational research in model compression and efficiency.
AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization - Score: 17 (R=9, N=8) - Date: 2025-05-01 - Comment: The paper proposes a hybrid reasoning optimization framework for LLMs, which aligns with foundational research in LLM efficiency and adaptive reasoning strategies. It introduces a novel bi-level optimization approach.
Memorization and Knowledge Injection in Gated LLMs - Score: 17 (R=9, N=8) - Date: 2025-05-01 - Comment: The paper introduces a novel framework, MEGa, for continual learning in LLMs using gated low-rank weights, which aligns with the 'Model Compression' criterion due to its focus on low-rank approaches and efficiency. It also touches on foundational aspects of LLM behavior.
Efficient LLMs with AMP: Attention Heads and MLP Pruning - Score: 17 (R=9, N=8) - Date: 2025-05-01 - Comment: The paper proposes AMP, a structured pruning method targeting Attention Heads and MLPs in LLMs, which aligns with the 'Model Compression' criterion. It offers a novel pruning approach with significant efficiency improvements.
The Discovery Engine: A Framework for AI-Driven Synthesis and Navigation of Scientific Knowledge Landscapes - Score: 17 (R=8, N=9) - Date: 2025-05-26 - Comment: The Discovery Engine framework for AI-driven synthesis of scientific knowledge is a novel paradigm, relevant to AI for science.
AnchorAttention: Difference-Aware Sparse Attention with Stripe Granularity - Score: 16 (R=9, N=7) - Date: 2025-05-31 - Comment: The paper introduces AnchorAttention, a sparse attention mechanism for LLMs, focusing on model compression and efficiency.
Weight Spectra Induced Efficient Model Adaptation - Score: 16 (R=9, N=7) - Date: 2025-05-31 - Comment: The paper investigates structural changes in weight matrices during fine-tuning, which aligns with model compression and efficiency through low-rank approaches.
FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference - Score: 16 (R=9, N=7) - Date: 2025-05-31 - Comment: The paper introduces FlashFormer, a kernel for efficient low-batch inference in transformer-based LLMs, focusing on model compression and efficiency.
SlimLLM: Accurate Structured Pruning for Large Language Models - Score: 16 (R=9, N=7) - Date: 2025-05-31 - Comment: The paper presents SlimLLM, a structured pruning method for LLMs, which is relevant to model compression as it addresses pruning and efficiency in LLMs.
Lookahead Q-Cache: Achieving More Consistent KV Cache Eviction via Pseudo Query - Score: 16 (R=9, N=7) - Date: 2025-05-28 - Comment: The paper proposes a novel KV cache eviction framework for LLMs, which is relevant to model compression and efficiency improvements.
TrimR: Verifier-based Training-Free Thinking Compression for Efficient Test-Time Scaling - Score: 16 (R=9, N=7) - Date: 2025-05-26 - Comment: The paper proposes a verifier-based, training-free framework that compresses the thinking of large reasoning models for efficient test-time scaling, which is relevant to model compression and efficiency.
Is (Selective) Round-To-Nearest Quantization All You Need? - Score: 16 (R=9, N=7) - Date: 2025-05-22 - Comment: The paper revisits round-to-nearest (RTN) quantization for LLMs, which is directly relevant to model compression.
Layer-wise Quantization for Quantized Optimistic Dual Averaging - Score: 16 (R=9, N=7) - Date: 2025-05-21 - Comment: The paper introduces a layer-wise quantization framework, which is relevant to model compression and efficiency.
Through a Compressed Lens: Investigating the Impact of Quantization on LLM Explainability and Interpretability - Score: 16 (R=9, N=7) - Date: 2025-05-21 - Comment: The paper investigates the impact of quantization on LLM explainability and interpretability, which is relevant to model compression and efficiency.
Exploring Federated Pruning for Large Language Models - Score: 16 (R=9, N=7) - Date: 2025-05-20 - Comment: The paper discusses federated pruning for LLMs, which aligns with the interest in model compression and efficiency.
Does Low Rank Adaptation Lead to Lower Robustness against Training-Time Attacks? - Score: 16 (R=9, N=7) - Date: 2025-05-20 - Comment: The paper investigates the robustness of Low Rank Adaptation (LoRA) in LLMs, which is relevant to model compression and efficiency.
InfiJanice: Joint Analysis and In-situ Correction Engine for Quantization-Induced Math Degradation in Large Language Models - Score: 16 (R=9, N=7) - Date: 2025-05-19 - Comment: The paper focuses on quantization in LLMs, which is relevant to model compression, specifically addressing the degradation in mathematical reasoning accuracy due to quantization.
Dynamic Base model Shift for Delta Compression - Score: 16 (R=9, N=7) - Date: 2025-05-19 - Comment: The paper discusses delta compression in transformer models, which is relevant to model compression and efficiency.
Triangulating PL functions and the existence of efficient ReLU DNNs - Score: 16 (R=9, N=7) - Date: 2025-05-13 - Comment: The paper provides a theoretical proof for efficient ReLU DNNs and aligns with 'Model Architecture' by addressing the representation of piecewise linear functions.
Parameter-Efficient Fine-Tuning with Circulant and Diagonal Vectors - Score: 16 (R=9, N=7) - Date: 2025-05-02 - Comment: The paper proposes a parameter-efficient fine-tuning method using circulant and diagonal matrices, which aligns with model compression and efficiency topics.
Foundation Molecular Grammar: Multi-Modal Foundation Models Induce Interpretable Molecular Graph Languages - Score: 16 (R=8, N=8) - Date: 2025-05-30 - Comment: The paper proposes a new method for molecular modeling using multi-modal foundation models, which is relevant to foundational research in AI for science.
Out of the Shadows: Exploring a Latent Space for Neural Network Verification - Score: 16 (R=8, N=8) - Date: 2025-05-26 - Comment: The paper presents a novel latent space for neural network verification, which is relevant to model architecture and efficiency improvements.
Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation - Score: 16 (R=8, N=8) - Date: 2025-05-21 - Comment: The paper proposes a novel framework for log-augmented generation in LLMs, which is relevant to large language models.
RGNMR: A Gauss-Newton method for robust matrix completion with theoretical guarantees - Score: 16 (R=8, N=8) - Date: 2025-05-20 - Comment: The paper introduces a novel method for robust matrix completion with theoretical guarantees, relevant to model compression through low-rank approaches.
Establishing Linear Surrogate Regret Bounds for Convex Smooth Losses via Convolutional Fenchel-Young Losses - Score: 16 (R=8, N=8) - Date: 2025-05-15 - Comment: This paper introduces a novel convex smooth surrogate loss with linear regret bounds, which aligns with foundational research in optimization and theoretical efficiency improvements.
Certified Data Removal Under High-dimensional Settings - Score: 16 (R=8, N=8) - Date: 2025-05-13 - Comment: The paper proposes a high-dimensional unlearning algorithm with theoretical guarantees, which aligns with 'Emerging Trends' due to its novel approach to machine unlearning.
GraphComp: Extreme Error-bounded Compression of Scientific Data via Temporal Graph Autoencoders - Score: 16 (R=8, N=8) - Date: 2025-05-13 - Comment: The paper introduces a novel graph-based compression method using temporal graph autoencoders, which aligns with model compression and representation learning.
Morello: Compiling Fast Neural Networks with Dynamic Programming and Spatial Compression - Score: 16 (R=8, N=8) - Date: 2025-05-06 - Comment: The paper introduces a novel dynamic programming-based approach for optimizing neural network inference, which aligns with model compression and efficiency breakthroughs.
Domain-Aware Tensor Network Structure Search - Score: 15 (R=8, N=7) - Date: 2025-05-31 - Comment: The paper proposes a novel framework for tensor network structure search, which involves large language models and domain-aware prompting, relevant to model architecture and efficiency.
Scalable Complexity Control Facilitates Reasoning Ability of LLMs - Score: 15 (R=8, N=7) - Date: 2025-05-31 - Comment: The paper discusses complexity control in LLMs, which is relevant to foundational research in LLM behavior.
Sentinel: Attention Probing of Proxy Models for LLM Context Compression with an Understanding Perspective - Score: 15 (R=8, N=7) - Date: 2025-05-30 - Comment: The paper presents a lightweight framework for context compression in LLMs, which is relevant to model compression and efficiency.
Effective and Efficient One-pass Compression of Speech Foundation Models Using Sparsity-aware Self-pinching Gates - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper presents a novel approach for speech model compression, which aligns with the model compression criterion.
One Rank at a Time: Cascading Error Dynamics in Sequential Learning - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper provides an analysis of error dynamics in sequential learning through low-rank linear regression, which relates to model compression and low-rank approaches.
Look Within or Look Beyond? A Theoretical Comparison Between Parameter-Efficient and Full Fine-Tuning - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper provides a theoretical comparison between parameter-efficient and full fine-tuning, relevant to model architecture and efficiency.
Speculative Decoding Meets Quantization: Compatibility Evaluation and Hierarchical Framework Design - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper explores the integration of speculative decoding and quantization, relevant to model compression and efficiency.
Estimating the Effects of Sample Training Orders for Large Language Models without Retraining - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper presents a retraining-free framework for estimating the effects of sample training orders in LLMs, contributing to theoretical insights into LLM behavior.
Balanced Token Pruning: Accelerating Vision Language Models Beyond Local Optimization - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper introduces a novel token pruning method for vision-language models, which is relevant to model compression through pruning techniques.
Compressing Sine-Activated Low-Rank Adapters through Post-Training Quantization - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper investigates the application of sinusoidal transformations in low-rank adapters and their impact on post-training quantization, relevant to model compression and low-rank approaches.
TuneComp: Joint Fine-tuning and Compression for Large Foundation Models - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper proposes a joint fine-tuning and compression method, which aligns with the model compression criterion.
PolarGrad: A Class of Matrix-Gradient Optimizers from a Unifying Preconditioning Perspective - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper introduces a new class of matrix-gradient optimizers, which is relevant to model training dynamics and optimization.
Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper presents a new recurrent model, LrcSSM, for efficient sequence modeling, which aligns with the core topic of model architecture innovations.
Efficient Diffusion Models for Symmetric Manifolds - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper introduces efficient diffusion models for symmetric manifolds, focusing on algorithmic efficiency and training speed, which aligns with model compression and efficiency breakthroughs.
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper introduces R2R, a neural token routing method for efficient LLM inference, which is relevant to large language models and efficiency.
FCOS: A Two-Stage Recoverable Model Pruning Framework for Automatic Modulation Recognition - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper introduces a novel model pruning framework, which aligns with the model compression criterion.
EaqVLA: Encoding-aligned Quantization for Vision-Language-Action Models - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper proposes a quantization framework for Vision-Language-Action models, which is relevant to model compression.
Hardware-Efficient Attention for Fast Decoding - Score: 15 (R=8, N=7) - Date: 2025-05-28 - Comment: The paper proposes hardware-efficient attention mechanisms for LLMs, focusing on improving decoding efficiency, which is relevant to model compression and efficiency.
DeCAF: Decentralized Consensus-And-Factorization for Low-Rank Adaptation of Foundation Models - Score: 15 (R=8, N=7) - Date: 2025-05-28 - Comment: The paper presents a novel algorithm for low-rank adaptation in decentralized settings, which aligns with model compression and efficiency improvements.
SageAttention2++: A More Efficient Implementation of SageAttention2 - Score: 15 (R=8, N=7) - Date: 2025-05-28 - Comment: The paper focuses on efficiency improvements in attention mechanisms, which aligns with the model compression criterion.
EasyDistill: A Comprehensive Toolkit for Effective Knowledge Distillation of Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-05-28 - Comment: The paper presents EasyDistill, a toolkit for knowledge distillation of LLMs, which is relevant to model compression and efficiency.
Convergence of Clipped-SGD for Convex $(L_0,L_1)$-Smooth Optimization with Heavy-Tailed Noise - Score: 15 (R=8, N=7) - Date: 2025-05-28 - Comment: The paper provides convergence bounds for Clip-SGD under heavy-tailed noise, which is relevant to model compression and efficiency.
SeMe: Training-Free Language Model Merging via Semantic Alignment - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper introduces a novel, data-free, and training-free approach for merging language models using semantic alignment, which provides insights into the semantic structure of LMs.
Logic Gate Neural Networks are Good for Verification - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper introduces Logic Gate Networks, which are sparse architectures that improve verification, aligning with model architecture and compression topics.
ESLM: Risk-Averse Selective Language Modeling for Efficient Pretraining - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper introduces a risk-aware algorithm for efficient pretraining of LLMs, focusing on model compression and efficiency improvements.
Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper presents a novel KV cache compression framework for visual autoregressive models, which is relevant to model compression and efficiency.
Accelerating Prefilling for Long-Context LLMs via Sparse Pattern Sharing - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper proposes a sparse attention mechanism for long-context LLMs, which aligns with model compression and efficiency improvements.
FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: FlowCut proposes a new pruning framework for vision-language models based on information flow, which is relevant to model compression.
Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper evaluates agentic capabilities in compressed LLMs, focusing on compression impacts, relevant to model compression.
CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper introduces CoreMatching, a framework for co-adaptive sparse inference, relevant to model compression and efficiency.
Efficient Data Selection at Scale via Influence Distillation - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper introduces Influence Distillation for data selection in LLM training, which is relevant to understanding training dynamics and efficiency.
AmorLIP: Efficient Language-Image Pretraining via Amortization - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper proposes an efficient language-image pretraining framework using amortization, relevant to representation learning and model efficiency.
ALPS: Attention Localization and Pruning Strategy for Efficient Alignment of Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper proposes ALPS, an attention localization and pruning strategy for efficient LLM alignment, which is relevant to model compression and efficiency.
HD-PiSSA: High-Rank Distributed Orthogonal Adaptation - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper presents HD-PiSSA, a high-rank distributed adaptation method for LLMs, which is relevant to model compression and efficiency.
ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper introduces ThanoRA, a framework for multi-task low-rank adaptation, which is relevant to model compression and efficiency.
PacTrain: Pruning and Adaptive Sparse Gradient Compression for Efficient Collective Communication in Distributed Deep Learning - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper introduces PacTrain, a framework for pruning and sparse gradient compression, relevant to model compression.
Scalable Gaussian Processes with Low-Rank Deep Kernel Decomposition - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper introduces a scalable deep kernel representation for Gaussian processes, which aligns with model compression and efficiency through low-rank approaches.
Task Specific Pruning with LLM-Sieve: How Many Parameters Does Your Task Really Need? - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper presents a framework for task-specific pruning of LLMs, which aligns with model compression through pruning and efficiency improvements.
Beyond Discreteness: Finite-Sample Analysis of Straight-Through Estimator for Quantization - Score: 15 (R=8, N=7) - Date: 2025-05-26 - Comment: The paper provides a finite-sample analysis of the straight-through estimator for quantization, aligning with the model compression criterion.
NeuroTrails: Training with Dynamic Sparse Heads as the Key to Effective Ensembling - Score: 15 (R=8, N=7) - Date: 2025-05-26 - Comment: The paper introduces a sparse multi-head architecture with dynamic sparsity, which is relevant to model architecture and sparsity in model compression.
C-LoRA: Contextual Low-Rank Adaptation for Uncertainty Estimation in Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-05-26 - Comment: The paper introduces C-LoRA, a novel approach for uncertainty-aware fine-tuning of LLMs using contextual low-rank adaptation, which aligns with model compression and efficiency.
COUNTDOWN: Contextually Sparse Activation Filtering Out Unnecessary Weights in Down Projection - Score: 15 (R=8, N=7) - Date: 2025-05-26 - Comment: The paper proposes a method for sparse activation in LLMs, which is relevant to model compression and efficiency improvements.
NeUQI: Near-Optimal Uniform Quantization Parameter Initialization - Score: 15 (R=8, N=7) - Date: 2025-05-26 - Comment: NeUQI proposes a method for initializing quantization parameters, relevant to model compression and efficiency, particularly in the context of LLMs.
DASH: Input-Aware Dynamic Layer Skipping for Efficient LLM Inference with Markov Decision Policies - Score: 15 (R=8, N=7) - Date: 2025-05-26 - Comment: The paper proposes a dynamic layer-skipping framework for LLMs, relevant to model architecture and efficiency improvements.
Training Long-Context LLMs Efficiently via Chunk-wise Optimization - Score: 15 (R=8, N=7) - Date: 2025-05-23 - Comment: The paper introduces a memory-efficient training paradigm for long-context LLMs, which is relevant to model compression and efficiency.
HOFT: Householder Orthogonal Fine-tuning - Score: 15 (R=8, N=7) - Date: 2025-05-23 - Comment: The paper proposes a novel orthogonal fine-tuning method for foundation models, aligning with the model compression criterion.
EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning - Score: 15 (R=8, N=7) - Date: 2025-05-23 - Comment: The paper introduces EquivPruner, a method for pruning semantically equivalent actions in LLM reasoning, which relates to model compression and efficiency.
Large-Scale Bayesian Tensor Reconstruction: An Approximate Message Passing Solution - Score: 15 (R=8, N=7) - Date: 2025-05-23 - Comment: The paper presents a scalable Bayesian CPD algorithm, which is relevant to model compression and efficiency through low-rank approaches.
NAN: A Training-Free Solution to Coefficient Estimation in Model Merging - Score: 15 (R=8, N=7) - Date: 2025-05-23 - Comment: The paper discusses a training-free method for model merging, which is relevant to model efficiency and compression.
Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models - Score: 15 (R=8, N=7) - Date: 2025-05-23 - Comment: The paper presents a novel approach to restore safety in pruned LVLMs, aligning with Model Compression.
On the creation of narrow AI: hierarchy and nonlocality of neural network skills - Score: 15 (R=8, N=7) - Date: 2025-05-22 - Comment: The paper discusses challenges in creating narrow AI systems and explores representation learning and model compression through pruning, aligning with foundational research in these areas.
SSR: Speculative Parallel Scaling Reasoning in Test-time - Score: 15 (R=8, N=7) - Date: 2025-05-22 - Comment: The paper proposes a speculative parallel scaling framework for efficient test-time reasoning in large language models, which is relevant to LLM efficiency.
An Efficient Private GPT Never Autoregressively Decodes - Score: 15 (R=8, N=7) - Date: 2025-05-22 - Comment: The paper proposes an efficient private GPT that never autoregressively decodes, which is relevant to model compression as it addresses efficiency in secure inference through public decoding and secure verification.
The Energy Cost of Reasoning: Analyzing Energy Usage in LLMs with Test-time Compute - Score: 15 (R=8, N=7) - Date: 2025-05-21 - Comment: The paper explores test-time compute as a method to improve energy efficiency in LLMs, which is relevant to model efficiency and compression.
Optimizing Binary and Ternary Neural Network Inference on RRAM Crossbars using CIM-Explorer - Score: 15 (R=8, N=7) - Date: 2025-05-21 - Comment: The paper presents a toolkit for optimizing binary and ternary neural network inference on RRAM crossbars, which is relevant to model compression and efficiency.
Structured Agent Distillation for Large Language Model - Score: 15 (R=8, N=7) - Date: 2025-05-21 - Comment: The paper presents a framework for compressing LLM-based agents, focusing on reasoning fidelity and action consistency, which is relevant to model compression.
Scalable Bayesian Monte Carlo: fast uncertainty estimation beyond deep ensembles - Score: 15 (R=8, N=7) - Date: 2025-05-20 - Comment: The paper introduces scalable Bayesian Monte Carlo for uncertainty estimation, which is relevant to Model Compression as it offers improved uncertainty quantification and efficiency.
Fine-tuning Quantized Neural Networks with Zeroth-order Optimization - Score: 15 (R=8, N=7) - Date: 2025-05-20 - Comment: The paper proposes a novel approach for fine-tuning quantized neural networks, relevant to model compression.
Why Knowledge Distillation Works in Generative Models: A Minimal Working Explanation - Score: 15 (R=8, N=7) - Date: 2025-05-20 - Comment: The paper provides a minimal explanation for knowledge distillation in generative models, which is relevant to understanding model compression and efficiency.
Fractured Chain-of-Thought Reasoning - Score: 15 (R=8, N=7) - Date: 2025-05-20 - Comment: The paper introduces Fractured Sampling, an inference-time strategy for LLMs, which aligns with the interest in efficiency improvements in LLMs.
Efficient training for large-scale optical neural network using an evolutionary strategy and attention pruning - Score: 15 (R=8, N=7) - Date: 2025-05-20 - Comment: The paper proposes an efficient training algorithm for optical neural networks, focusing on pruning and optimization, relevant to model compression and efficiency.
Decentralized Arena: Towards Democratic and Scalable Automatic Evaluation of Language Models - Score: 15 (R=8, N=7) - Date: 2025-05-20 - Comment: The paper introduces Decentralized Arena, a framework for evaluating language models, which is relevant to Large Language Models as it offers a novel evaluation method leveraging collective intelligence.
Deep Unfolding with Kernel-based Quantization in MIMO Detection - Score: 15 (R=8, N=7) - Date: 2025-05-20 - Comment: The paper focuses on a novel kernel-based adaptive quantization framework for deep unfolding networks, which aligns with model compression through quantization.
Adversarially Robust Spiking Neural Networks with Sparse Connectivity - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper introduces a method for creating sparse and adversarially robust spiking neural networks, which is relevant to model compression and sparsity.
LoRASuite: Efficient LoRA Adaptation Across Large Language Model Upgrades - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper introduces LoRASuite for efficient adaptation of LoRA weights, aligning with the core topic of model compression and efficiency.
Adaptive parameter-efficient fine-tuning via Hessian-informed subset selection - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper proposes a Hessian-informed subset selection for parameter-efficient fine-tuning, which is relevant to model compression and efficiency.
Exploring Sparsity for Parameter Efficient Fine Tuning Using Wavelets - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper introduces a novel parameter-efficient fine-tuning method using wavelets, which aligns with the model compression criterion.
AltLoRA: Towards Better Gradient Approximation in Low-Rank Adaptation with Alternating Projections - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper proposes AltLoRA, an improvement in low-rank adaptation for LLMs, which aligns with model compression and efficiency.
SRLoRA: Subspace Recomposition in Low-Rank Adaptation via Importance-Based Fusion and Reinitialization - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper introduces a novel approach to enhance LoRA's expressiveness, which is relevant to model compression and efficiency.
Efficient Optimization with Orthogonality Constraint: a Randomized Riemannian Submanifold Method - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper proposes a randomized Riemannian submanifold method for optimization with orthogonality constraints, relevant to foundational research in optimization methods.
STAR: Stage-Wise Attention-Guided Token Reduction for Efficient Large Vision-Language Models Inference - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper introduces a token reduction framework for efficient inference in large vision-language models, which aligns with the model compression criterion.
Efficient Federated Class-Incremental Learning of Pre-Trained Models via Task-agnostic Low-rank Residual Adaptation - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper introduces a novel approach for federated class-incremental learning, focusing on low-rank residual adaptation, which is relevant to model compression.
Back to Square Roots: An Optimal Bound on the Matrix Factorization Error for Multi-Epoch Differentially Private SGD - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper introduces a new matrix factorization method for differentially private training, which is relevant to model compression and efficiency.
SepPrune: Structured Pruning for Efficient Deep Speech Separation - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper presents SepPrune, a structured pruning framework for speech separation, aligning with the core topic of model compression.
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper explores low-bit attention for inference and training, relevant to model compression and efficiency.
Flash Invariant Point Attention - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper introduces FlashIPA, a reformulation of IPA for efficient computation, relevant to model architecture and efficiency.
msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper introduces a novel technique for optimizing CNNs for TinyML, relevant to model compression and efficiency.
MID-L: Matrix-Interpolated Dropout Layer with Layer-wise Neuron Selection - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper introduces a novel dropout layer for efficient neural network computation, which aligns with the model compression criterion.
Memory-Efficient Orthogonal Fine-Tuning with Principal Subspace Adaptation - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper proposes MOFT for memory-efficient orthogonal fine-tuning, aligning with the core topic of model compression and efficiency.
SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper proposes a framework for improving long-context capabilities in LLMs, which aligns with foundational research in large language models.
SubGCache: Accelerating Graph-based RAG with Subgraph-level KV Cache - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper introduces SubGCache, a subgraph-level KV cache for accelerating graph-based RAG, aligning with the core topic of model compression.
SPAT: Sensitivity-based Multihead-attention Pruning on Time Series Forecasting Models - Score: 15 (R=8, N=7) - Date: 2025-05-14 - Comment: The paper introduces a structured pruning method for attention mechanisms in time series forecasting, which aligns with the model compression criterion, particularly sparsity and pruning.
Scaling Laws for Speculative Decoding - Score: 15 (R=8, N=7) - Date: 2025-05-14 - Comment: The paper explores speculative decoding techniques and establishes scaling laws for decoding efficiency in LLMs. This provides theoretical insights into LLM behavior and decoding efficiency, aligning with the LLM criterion.
Solving Nonlinear PDEs with Sparse Radial Basis Function Networks - Score: 15 (R=8, N=7) - Date: 2025-05-13 - Comment: The paper proposes a sparse radial basis function network for solving nonlinear PDEs, focusing on sparsity and adaptive feature selection. This aligns with the model compression criterion, particularly in sparsity and efficiency.
SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-05-13 - Comment: The paper introduces an adaptive routing framework for LLM inference, focusing on efficiency improvements through KV cache management and speculative decoding. This aligns with the model compression criterion, particularly in terms of algorithmic efficiency breakthroughs.
Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity - Score: 15 (R=8, N=7) - Date: 2025-05-13 - Comment: The paper introduces a private inference system leveraging activation sparsity in LLMs, which aligns with 'Model Compression' through its focus on efficiency improvements.
CAT Merging: A Training-Free Approach for Resolving Conflicts in Model Merging - Score: 15 (R=8, N=7) - Date: 2025-05-13 - Comment: The paper introduces a novel training-free framework for model merging, which aligns with foundational research in model architecture and efficiency. The focus on resolving conflicts in task vector accumulation is relevant to representation learning and model compression.
Efficient Parallelization of Message Passing Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-05-13 - Comment: The paper proposes an efficient parallelization framework for message passing neural networks, which is relevant to model architecture and efficiency improvements.
PRUNE: A Patching Based Repair Framework for Certifiable Unlearning of Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-05-13 - Comment: The paper proposes a novel patching-based framework for certifiable unlearning, which aligns with model compression and efficiency breakthroughs.
Lossless Compression of Large Language Model-Generated Text via Next-Token Prediction - Score: 15 (R=8, N=7) - Date: 2025-05-13 - Comment: The paper investigates lossless compression of LLM-generated text using next-token prediction, aligning with the model compression criterion through its focus on efficient compression techniques.
Importance Analysis for Dynamic Control of Balancing Parameter in a Simple Knowledge Distillation Setting - Score: 15 (R=8, N=7) - Date: 2025-05-13 - Comment: The paper discusses a mathematical rationale for dynamically adjusting the balancing parameter in knowledge distillation, which aligns with model compression and training dynamics in neural networks.
Beyond Low-rank Decomposition: A Shortcut Approach for Efficient On-Device Learning - Score: 15 (R=8, N=7) - Date: 2025-05-09 - Comment: The paper proposes a shortcut approach for efficient on-device learning, which is relevant to model compression and efficiency. The method offers significant memory and computational savings, making it a notable contribution.
Sparsity is All You Need: Rethinking Biological Pathway-Informed Approaches in Deep Learning - Score: 15 (R=8, N=7) - Date: 2025-05-08 - Comment: The paper critiques biologically-informed neural networks and highlights the role of sparsity, which aligns with the 'Representation Learning' and 'Model Compression' criteria, particularly in understanding sparsity's role in neural networks.
Large Language Model Partitioning for Low-Latency Inference at the Edge - Score: 15 (R=8, N=7) - Date: 2025-05-07 - Comment: The paper proposes a resource-aware partitioning algorithm for LLM inference, which aligns with model compression and efficiency topics.
Quantitative Analysis of Performance Drop in DeepSeek Model Quantization - Score: 15 (R=8, N=7) - Date: 2025-05-07 - Comment: The paper discusses quantization techniques for large models, which aligns with the model compression criterion. The introduction of DQ3_K_M as a novel 3-bit quantization method adds some methodological contribution.
Block Circulant Adapter for Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-05-02 - Comment: The paper introduces a block circulant matrix-based fine-tuning method for LLMs, which aligns with model compression and efficiency improvements. The use of Fourier transforms and circulant matrices is a novel approach to reduce storage and computation costs.
Optimizing Deep Neural Networks using Safety-Guided Self Compression - Score: 15 (R=8, N=7) - Date: 2025-05-02 - Comment: The paper introduces a safety-driven quantization framework, which aligns with foundational research in model compression and efficiency.
Scaling On-Device GPU Inference for Large Generative Models - Score: 15 (R=8, N=7) - Date: 2025-05-02 - Comment: The paper presents an optimized framework for on-device GPU inference for large generative models, which aligns with foundational research in model efficiency and compression.
Low-rank computation of the posterior mean in Multi-Output Gaussian Processes - Score: 15 (R=8, N=7) - Date: 2025-05-01 - Comment: The paper presents low-rank methods for efficient computation in multi-output Gaussian processes, aligning with the 'Model Compression' criterion due to its focus on low-rank approaches and efficiency.
Deep Learning Optimization Using Self-Adaptive Weighted Auxiliary Variables - Score: 15 (R=8, N=7) - Date: 2025-05-01 - Comment: The paper introduces a novel optimization framework using self-adaptive weighted auxiliary variables, which aligns with foundational research in training dynamics and optimization for neural networks.

High Performance Computing (33)

Equilibrium Propagation for Learning in Lagrangian Dynamical Systems - Score: 18 (R=9, N=9) - Date: 2025-05-15 - Comment: The paper extends Equilibrium Propagation to Lagrangian dynamical systems, introducing a novel training paradigm that challenges traditional backpropagation, aligning with emerging trends in foundational research.
Absolute Zero: Reinforced Self-play Reasoning with Zero Data - Score: 18 (R=9, N=9) - Date: 2025-05-07 - Comment: The paper introduces a self-evolving reasoning paradigm for LLMs, which aligns with the 'Large Language Models' criterion for foundational innovations in reasoning capabilities.
Agentic Neurodivergence as a Contingent Solution to the AI Alignment Problem - Score: 18 (R=9, N=9) - Date: 2025-05-07 - Comment: The paper provides a theoretical argument about AI alignment and introduces a novel perspective on misalignment as a strategy, which aligns with emerging trends in foundational AI research.
Are Large Language Models Reliable AI Scientists? Assessing Reverse-Engineering of Black-Box Systems - Score: 17 (R=9, N=8) - Date: 2025-05-26 - Comment: The paper provides theoretical insights into LLM behavior, specifically in reverse-engineering black-box systems, which aligns with the LLM criterion.
Pre-training Large Memory Language Models with Internal and External Knowledge - Score: 17 (R=9, N=8) - Date: 2025-05-22 - Comment: The paper introduces Large Memory Language Models with a novel pre-training approach, which is relevant to foundational research in LLMs.
Let LLMs Break Free from Overthinking via Self-Braking Tuning - Score: 17 (R=9, N=8) - Date: 2025-05-21 - Comment: The paper proposes a novel framework for reducing overthinking in LLMs, which is relevant to foundational research in LLM behavior.
Dense Communication between Language Models - Score: 17 (R=9, N=8) - Date: 2025-05-20 - Comment: The paper introduces a novel paradigm of direct dense vector communication between LLMs, which aligns with the core topic of Large Language Models and offers a new perspective on scaling for general intelligence.
Confabulation dynamics in a reservoir computer: Filling in the gaps with untrained attractors - Score: 17 (R=9, N=8) - Date: 2025-05-09 - Comment: The paper provides foundational insights into confabulation dynamics in reservoir computers, analyzing untrained attractors. This aligns with emerging trends and representation learning, offering theoretical contributions to understanding learning systems.
SwarmThinkers: Learning Physically Consistent Atomic KMC Transitions at Scale - Score: 17 (R=8, N=9) - Date: 2025-05-27 - Comment: SwarmThinkers introduces a novel reinforcement learning framework for atomic-scale simulation, relevant to AI for Science with a focus on foundational research.
Continuous Chain of Thought Enables Parallel Exploration and Reasoning - Score: 16 (R=8, N=8) - Date: 2025-05-31 - Comment: The paper explores continuous chain of thought for reasoning, which is relevant to large language models and introduces a novel supervision strategy.
Why Machine Learning Models Fail to Fully Capture Epistemic Uncertainty - Score: 16 (R=8, N=8) - Date: 2025-05-31 - Comment: The paper provides a theoretical analysis of epistemic uncertainty in machine learning models, which is relevant to emerging trends in understanding model uncertainty.
Hyperbolic-PDE GNN: Spectral Graph Neural Networks in the Perspective of A System of Hyperbolic Partial Differential Equations - Score: 16 (R=8, N=8) - Date: 2025-05-31 - Comment: The paper formulates message passing in GNNs as a system of hyperbolic PDEs, which is relevant to model architecture innovations and spectral GNNs.
Advanced long-term earth system forecasting by learning the small-scale nature - Score: 16 (R=8, N=8) - Date: 2025-05-27 - Comment: Triton addresses spectral bias in AI models for Earth system forecasting, relevant to AI for Science with a focus on foundational research.
An Iterative Framework for Generative Backmapping of Coarse Grained Proteins - Score: 16 (R=8, N=8) - Date: 2025-05-26 - Comment: The paper introduces a novel iterative framework for generative backmapping of proteins, aligning with AI for Science through foundational research in molecular modeling.
Strictly Constrained Generative Modeling via Split Augmented Langevin Sampling - Score: 16 (R=8, N=8) - Date: 2025-05-26 - Comment: The paper introduces a novel sampling algorithm for generative models, aligning with the emerging trends criterion.
Deep Koopman operator framework for causal discovery in nonlinear dynamical systems - Score: 16 (R=8, N=8) - Date: 2025-05-21 - Comment: The paper introduces a novel causal discovery algorithm using deep learning and Koopman operator methods, which is relevant to emerging trends in foundational research.
A Path to Universal Neural Cellular Automata - Score: 16 (R=8, N=8) - Date: 2025-05-20 - Comment: The paper explores neural cellular automata for universal computation, which is relevant to emerging trends in foundational research.
Hardware-Adaptive and Superlinear-Capacity Memristor-based Associative Memory - Score: 16 (R=8, N=8) - Date: 2025-05-20 - Comment: The paper presents a memristor-based associative memory system, which is relevant to Model Architecture as it introduces a new hardware-adaptive learning algorithm for associative memories.
Foundation model for mass spectrometry proteomics - Score: 16 (R=8, N=8) - Date: 2025-05-19 - Comment: The paper proposes a foundation model for mass spectrometry proteomics, which is relevant to AI for science and foundational model research.
Boltzmann Classifier: A Thermodynamic-Inspired Approach to Supervised Learning - Score: 16 (R=8, N=8) - Date: 2025-05-13 - Comment: The paper introduces the Boltzmann Classifier, which is a novel energy-based approach to supervised learning, aligning with emerging trends and foundational innovations.
Generative Discovery of Partial Differential Equations by Learning from Math Handbooks - Score: 16 (R=8, N=8) - Date: 2025-05-12 - Comment: The paper introduces a generative model for discovering PDEs, which aligns with AI for Science by addressing foundational research in scientific discovery. The use of generative models for PDE discovery is novel and impactful.
Learning and Transferring Physical Models through Derivatives - Score: 16 (R=8, N=8) - Date: 2025-05-05 - Comment: The paper introduces a novel approach, Derivative Learning (DERL), for modeling physical systems through partial derivatives and incremental knowledge transfer. This aligns with the 'AI for Science' criterion as it provides foundational insights into modeling physical systems.
Defining Foundation Models for Computational Science: A Call for Clarity and Rigor - Score: 15 (R=8, N=7) - Date: 2025-05-31 - Comment: The paper calls for clarity in defining foundation models for computational science, which is relevant to foundational research in AI for science.
Geometric Hyena Networks for Large-scale Equivariant Learning - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper introduces Geometric Hyena, an equivariant long-convolutional model for geometric systems, which is relevant to model architecture innovations.
AI Mathematician: Towards Fully Automated Frontier Mathematical Research - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper proposes an AI framework for mathematical research using LRMs, which is relevant to foundational research in LLMs.
Towards General Continuous Memory for Vision-Language Models - Score: 15 (R=8, N=7) - Date: 2025-05-26 - Comment: The paper introduces a novel continuous memory system for vision-language models, which relates to model architecture innovations.
Conformal Language Model Reasoning with Coherent Factuality - Score: 15 (R=8, N=7) - Date: 2025-05-22 - Comment: The paper introduces a method for ensuring coherent factuality in LLM reasoning, which is relevant to theoretical insights into LLM behavior.
Adversarial Testing in LLMs: Insights into Decision-Making Vulnerabilities - Score: 15 (R=8, N=7) - Date: 2025-05-20 - Comment: The paper presents an adversarial evaluation framework for LLMs, focusing on decision-making vulnerabilities, which is relevant to theoretical insights into LLM behavior.
On the Thinking-Language Modeling Gap in Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-05-20 - Comment: The paper discusses a new prompting technique to address biases in LLMs, which aligns with the interest in theoretical insights into LLM behavior.
On the Eligibility of LLMs for Counterfactual Reasoning: A Decompositional Study - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper provides a decompositional study on counterfactual reasoning in LLMs, which aligns with the interest in theoretical insights into LLM behavior.
Learning dynamically inspired invariant subspaces for Koopman and transfer operator approximation - Score: 15 (R=8, N=7) - Date: 2025-05-09 - Comment: This paper explores operator learning and invariant subspaces, which aligns with representation learning and foundational research into how systems encode information. The use of machine-learned basis functions for spectral properties is novel.
Is the end of Insight in Sight ? - Score: 15 (R=8, N=7) - Date: 2025-05-08 - Comment: The paper questions the explainability of AI by analyzing weight matrices in PINNs, suggesting a potential challenge to the goal of Explainable AI. This aligns with emerging trends challenging established assumptions.
Knowing You Don't Know: Learning When to Continue Search in Multi-round RAG through Self-Practicing - Score: 15 (R=8, N=7) - Date: 2025-05-07 - Comment: The paper proposes a framework for improving multi-round RAG systems, which aligns with foundational improvements in LLM behavior and self-awareness.

Representation Learning (234)

Structure-Aligned Protein Language Model - Score: 20.0 (R=0, N=0) - Date: 2025-05-23 - Comment: Author match
Self-Evolving Curriculum for LLM Reasoning - Score: 20.0 (R=0, N=0) - Date: 2025-05-21 - Comment: Author match
Contextures: Representations from Contexts - Score: 19 (R=10, N=9) - Date: 2025-05-07 - Comment: The paper introduces a novel theoretical framework for representation learning, directly addressing the 'Representation Learning' criterion with a focus on foundational insights.
Quiet Feature Learning in Algorithmic Tasks - Score: 18 (R=10, N=8) - Date: 2025-05-08 - Comment: The paper provides insights into representation learning by analyzing how features are encoded and emerge in Transformer-based models during training. This aligns closely with the 'Representation Learning' criterion, particularly in understanding training dynamics and feature learning.
When Shift Happens - Confounding Is to Blame - Score: 18 (R=9, N=9) - Date: 2025-05-28 - Comment: The paper provides theoretical insights into distribution shifts and hidden confounding, which aligns with emerging trends and foundational research.
Generative Distribution Embeddings - Score: 18 (R=9, N=9) - Date: 2025-05-26 - Comment: The paper introduces a new framework for learning representations of distributions, relevant to representation learning and generative paradigms.
Towards Non-Euclidean Foundation Models: Advancing AI Beyond Euclidean Frameworks - Score: 18 (R=9, N=9) - Date: 2025-05-21 - Comment: The paper explores non-Euclidean spaces for foundation models, which is a novel direction in model architecture and representation learning.
SAD Neural Networks: Divergent Gradient Flows and Asymptotic Optimality via o-minimal Structures - Score: 18 (R=9, N=9) - Date: 2025-05-15 - Comment: The paper provides theoretical insights into gradient flows in neural networks, leveraging o-minimal structures, which is highly relevant to training dynamics and foundational representation learning.
The Geometry of Meaning: Perfect Spacetime Representations of Hierarchical Structures - Score: 18 (R=9, N=9) - Date: 2025-05-15 - Comment: The paper proposes a novel geometric representation of hierarchical structures in 3D Minkowski spacetime, with connections to general relativity and field theory. This introduces a potentially transformative paradigm for representation learning.
Bidirectional predictive coding - Score: 17 (R=9, N=8) - Date: 2025-05-31 - Comment: The paper proposes a bidirectional predictive coding model, which is relevant to representation learning as it explores how the brain encodes information through generative and discriminative inference.
Walking the Weight Manifold: a Topological Approach to Conditioning Inspired by Neuromodulation - Score: 17 (R=9, N=8) - Date: 2025-05-31 - Comment: The paper introduces a topological approach to conditioning inspired by neuromodulation, which is relevant to model architecture and representation learning.
Navigating the Latent Space Dynamics of Neural Models - Score: 17 (R=9, N=8) - Date: 2025-05-31 - Comment: The paper presents a novel interpretation of neural models as dynamical systems on latent manifolds, relevant to representation learning and model analysis.
Representing local protein environments with atomistic foundation models - Score: 17 (R=9, N=8) - Date: 2025-05-30 - Comment: The paper introduces a novel representation for local protein environments using atomistic foundation models, aligning with foundational research in AI for Science.
In Dialogue with Intelligence: Rethinking Large Language Models as Collective Knowledge - Score: 17 (R=9, N=8) - Date: 2025-05-30 - Comment: The paper offers a theoretical re-framing of LLMs as dynamic instantiations of collective human knowledge, which aligns with foundational research in LLM behavior and interpretability.
An Augmentation-Aware Theory for Self-Supervised Contrastive Learning - Score: 17 (R=9, N=8) - Date: 2025-05-29 - Comment: The paper provides a theoretical framework for self-supervised contrastive learning, which aligns with representation learning.
Self-Organizing Visual Prototypes for Non-Parametric Representation Learning - Score: 17 (R=9, N=8) - Date: 2025-05-29 - Comment: The paper introduces a novel training technique for unsupervised visual feature learning, focusing on non-parametric representation learning, which aligns with the representation learning criterion.
Algorithms and SQ Lower Bounds for Robustly Learning Real-valued Multi-index Models - Score: 17 (R=9, N=8) - Date: 2025-05-28 - Comment: The paper provides a theoretical analysis of learning real-valued Multi-Index Models, which is relevant to representation learning and emerging trends.
Can Large Reasoning Models Self-Train? - Score: 17 (R=9, N=8) - Date: 2025-05-28 - Comment: The paper explores self-training in large reasoning models, which touches on foundational aspects of LLM behavior and interpretability.
Factual Self-Awareness in Language Models: Representation, Robustness, and Scaling - Score: 17 (R=9, N=8) - Date: 2025-05-28 - Comment: The paper investigates factual self-awareness in LLMs, providing theoretical insights into LLM behavior, aligning with the large language models criterion.
Who Reasons in the Large Language Models? - Score: 17 (R=9, N=8) - Date: 2025-05-28 - Comment: The paper investigates the reasoning capabilities of LLMs, providing insights into model behavior and interpretability, which is relevant to foundational research in LLMs.
Prot2Token: A Unified Framework for Protein Modeling via Next-Token Prediction - Score: 17 (R=9, N=8) - Date: 2025-05-28 - Comment: Prot2Token presents a unified framework for protein modeling using next-token prediction, aligning with foundational research in AI for science and representation learning.
Kernel Quantile Embeddings and Associated Probability Metrics - Score: 17 (R=9, N=8) - Date: 2025-05-28 - Comment: The paper introduces kernel quantile embeddings, which is relevant to representation learning and emerging trends.
The Coverage Principle: A Framework for Understanding Compositional Generalization - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper introduces the coverage principle for understanding compositional generalization, which is relevant to representation learning and emerging trends.
Grokking ExPLAIND: Unifying Model, Data, and Training Attribution to Study Model Behavior - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper presents a unified framework for model, data, and training attribution, which is relevant to understanding model behavior and training dynamics.
Token-Importance Guided Direct Preference Optimization - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper introduces a novel method for optimizing LLM outputs by focusing on token importance, which aligns with foundational research in LLM behavior and interpretability.
When fractional quasi p-norms concentrate - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper addresses the concentration of fractional quasi p-norms, relevant to emerging trends in theoretical work.
The Birth of Knowledge: Emergent Features across Time, Space, and Scale in Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper studies the emergence of interpretable features in LLMs using sparse autoencoders, which aligns with representation learning and provides insights into how deep networks encode information.
Temperature is All You Need for Generalization in Langevin Dynamics and other Markov Processes - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper analyzes the generalization gap in Langevin dynamics, providing theoretical insights into training dynamics and generalization.
Operator Learning for Schrödinger Equation: Unitarity, Error Bounds, and Time Generalization - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper addresses operator learning for the Schrödinger equation with theoretical guarantees, relevant to foundational research in AI for science.
Feature Preserving Shrinkage on Bayesian Neural Networks via the R2D2 Prior - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper proposes a novel R2D2-Net for Bayesian neural networks, focusing on feature-preserving shrinkage, which aligns with representation learning and model compression through sparsity and shrinkage methods.
Time to Spike? Understanding the Representational Power of Spiking Neural Networks in Discrete Time - Score: 17 (R=9, N=8) - Date: 2025-05-26 - Comment: The paper provides theoretical insights into the representational power of spiking neural networks, which aligns with the representation learning criterion. It explores the complexity of input space partitioning and compares SNNs with ANNs, contributing to foundational research.
The Nuclear Route: Sharp Asymptotics of ERM in Overparameterized Quadratic Networks - Score: 17 (R=9, N=8) - Date: 2025-05-26 - Comment: The paper provides a theoretical analysis of overparameterized quadratic networks, focusing on capacity control through low-rank structures, which is relevant to representation learning and model architecture.
Mixture of Low Rank Adaptation with Partial Parameter Sharing for Time Series Forecasting - Score: 17 (R=9, N=8) - Date: 2025-05-26 - Comment: The paper introduces a Mixture-of-Low-Rank Adaptation model for time series forecasting, aligning with model architecture and representation learning criteria.
Attention with Trained Embeddings Provably Selects Important Tokens - Score: 17 (R=9, N=8) - Date: 2025-05-26 - Comment: The paper provides theoretical insights into token embeddings and attention mechanisms, relevant to representation learning and model architecture.
Implicit Regularization of Infinitesimally-perturbed Gradient Descent Toward Low-dimensional Solutions - Score: 17 (R=9, N=8) - Date: 2025-05-23 - Comment: The paper studies implicit regularization in gradient descent, which is relevant to representation learning and training dynamics.
Understanding Prompt Tuning and In-Context Learning via Meta-Learning - Score: 17 (R=9, N=8) - Date: 2025-05-23 - Comment: The paper provides a theoretical understanding of prompt tuning and in-context learning via meta-learning, relevant to large language models and representation learning.
Latent Principle Discovery for Language Model Self-Improvement - Score: 17 (R=9, N=8) - Date: 2025-05-23 - Comment: The paper focuses on a novel method for self-improvement in language models by discovering latent principles, which aligns with foundational research in representation learning and LLM behavior.
Stochastic Forward-Forward Learning through Representational Dimensionality Compression - Score: 17 (R=9, N=8) - Date: 2025-05-23 - Comment: The paper introduces a novel goodness function for the Forward-Forward algorithm, contributing to representation learning with a focus on dimensionality compression.
Robust Invariant Representation Learning by Distribution Extrapolation - Score: 17 (R=9, N=8) - Date: 2025-05-23 - Comment: The paper proposes a novel extrapolation-based framework for invariant representation learning, which aligns with the representation learning criterion.
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space - Score: 17 (R=9, N=8) - Date: 2025-05-22 - Comment: The paper introduces a novel method for reasoning in LLMs using continuous concept space, which is relevant to theoretical insights into LLM behavior.
Mechanistic Insights into Grokking from the Embedding Layer - Score: 17 (R=9, N=8) - Date: 2025-05-22 - Comment: The paper provides mechanistic insights into grokking, focusing on embedding layers, which is relevant to representation learning and training dynamics.
Generalization Through Growth: Hidden Dynamics Controls Depth Dependence - Score: 17 (R=9, N=8) - Date: 2025-05-22 - Comment: The paper presents a unified framework for understanding depth dependence in neural networks, which aligns with representation learning and model architecture analysis.
Just One Layer Norm Guarantees Stable Extrapolation - Score: 17 (R=9, N=8) - Date: 2025-05-21 - Comment: The paper provides theoretical insights into the behavior of neural networks with Layer Norm, which aligns with representation learning and model architecture analysis.
Better Neural Network Expressivity: Subdividing the Simplex - Score: 17 (R=9, N=8) - Date: 2025-05-21 - Comment: The paper provides insights into the expressivity of ReLU neural networks, which is relevant to representation learning and model architecture.
New Evidence of the Two-Phase Learning Dynamics of Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-05-21 - Comment: The paper provides insights into the training dynamics of neural networks, specifically focusing on the two-phase learning dynamics, which aligns with the representation learning criterion.
Understanding Task Representations in Neural Networks via Bayesian Ablation - Score: 17 (R=9, N=8) - Date: 2025-05-20 - Comment: The paper introduces a novel probabilistic framework for interpreting latent task representations in neural networks, aligning with representation learning.
Deterministic Bounds and Random Estimates of Metric Tensors on Neuromanifolds - Score: 17 (R=9, N=8) - Date: 2025-05-20 - Comment: The paper provides deterministic bounds and random estimates of metric tensors on neuromanifolds, contributing to theoretical insights in representation learning.
Implicit bias produces neural scaling laws in learning curves, from perceptrons to deep networks - Score: 17 (R=9, N=8) - Date: 2025-05-20 - Comment: The paper uncovers dynamical scaling laws in learning curves, providing insights into representation learning and training dynamics.
Information Science Principles of Machine Learning: A Causal Chain Meta-Framework Based on Formalized Information Mapping - Score: 17 (R=9, N=8) - Date: 2025-05-20 - Comment: The paper presents a meta-framework for machine learning theory, addressing interpretability and ethical safety, which aligns with emerging trends in foundational research.
Causality-Inspired Robustness for Nonlinear Models via Representation Learning - Score: 17 (R=9, N=8) - Date: 2025-05-20 - Comment: The paper introduces a causality-inspired robustness method for nonlinear models via representation learning, contributing to theoretical insights in representation learning.
Harnessing the Universal Geometry of Embeddings - Score: 17 (R=9, N=8) - Date: 2025-05-19 - Comment: The paper presents a method for translating text embeddings without paired data, relevant to representation learning and foundational research in embeddings.
Unsupervised Invariant Risk Minimization - Score: 17 (R=9, N=8) - Date: 2025-05-19 - Comment: The paper proposes an unsupervised framework for invariant risk minimization, relevant to representation learning and foundational research.
Joint Embedding vs Reconstruction: Provable Benefits of Latent Space Prediction for Self Supervised Learning - Score: 17 (R=9, N=8) - Date: 2025-05-19 - Comment: The paper compares joint embedding and reconstruction in self-supervised learning, providing insights into representation learning paradigms.
Neural Thermodynamics I: Entropic Forces in Deep and Universal Representation Learning - Score: 17 (R=9, N=8) - Date: 2025-05-19 - Comment: The paper proposes a theory for understanding learning dynamics in neural networks, which aligns with the representation learning criterion.
Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors - Score: 17 (R=9, N=8) - Date: 2025-05-19 - Comment: The paper explores internal causal mechanisms in language models to predict out-of-distribution behaviors, offering theoretical insights into LLM behavior and interpretability.
Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis - Score: 17 (R=9, N=8) - Date: 2025-05-19 - Comment: The paper challenges the assumption that better performance implies better internal representations, aligning with the representation learning criterion.
Training NTK to Generalize with KARE - Score: 17 (R=9, N=8) - Date: 2025-05-19 - Comment: The paper proposes optimizing the neural tangent kernel explicitly, which aligns with representation learning and training dynamics.
Illusion or Algorithm? Investigating Memorization, Emergence, and Symbolic Processing in In-Context Learning - Score: 17 (R=9, N=8) - Date: 2025-05-19 - Comment: The paper investigates in-context learning in large-scale transformer models, providing insights into training dynamics and interpretability, relevant to representation learning and LLM behavior.
Learning Repetition-Invariant Representations for Polymer Informatics - Score: 17 (R=9, N=8) - Date: 2025-05-16 - Comment: The paper introduces a novel method for learning repetition-invariant representations in polymer informatics, which aligns with representation learning by providing insights into encoding information in deep networks.
Superposition Yields Robust Neural Scaling - Score: 17 (R=9, N=8) - Date: 2025-05-16 - Comment: The paper provides insights into neural scaling laws and representation superposition, which is relevant to representation learning and foundational research in LLMs.
Variational Rank Reduction Autoencoder - Score: 17 (R=9, N=8) - Date: 2025-05-15 - Comment: The paper introduces Variational Rank Reduction Autoencoders (VRRAEs), combining SVD-based regularization with VAEs, which is relevant to representation learning and autoencoder innovations.
An Analytical Characterization of Sloppiness in Neural Networks: Insights from Linear Models - Score: 17 (R=9, N=8) - Date: 2025-05-15 - Comment: This paper provides an analytical characterization of training dynamics in neural networks, focusing on low-dimensional manifolds and their geometry. It aligns closely with foundational research in representation learning and training dynamics.
Rapid Overfitting of Multi-Pass Stochastic Gradient Descent in Stochastic Convex Optimization - Score: 17 (R=9, N=8) - Date: 2025-05-14 - Comment: The paper studies the generalization behavior of multi-pass stochastic gradient descent (SGD) in stochastic convex optimization, providing theoretical insights into overfitting dynamics. This aligns with foundational research in training dynamics.
Iteratively reweighted kernel machines efficiently learn sparse functions - Score: 17 (R=9, N=8) - Date: 2025-05-14 - Comment: The paper explores sparse function learning using kernel machines, which aligns with representation learning through sparse methods. It provides theoretical insights into kernel methods, making it relevant to foundational research.
Super-fast rates of convergence for Neural Networks Classifiers under the Hard Margin Condition - Score: 17 (R=9, N=8) - Date: 2025-05-14 - Comment: The paper provides theoretical insights into the performance of deep neural networks under specific conditions, which aligns with foundational research in representation learning and training dynamics.
Lost in Transmission: When and Why LLMs Fail to Reason Globally - Score: 17 (R=9, N=8) - Date: 2025-05-14 - Comment: The paper introduces the BAPO model to analyze reasoning failures in LLMs, providing theoretical insights into LLM behavior and interpretability, which aligns with the LLM criterion.
Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders - Score: 17 (R=9, N=8) - Date: 2025-05-14 - Comment: The paper introduces Gradient Sparse Autoencoders (GradSAE) to identify influential latents, which is highly relevant to representation learning. The focus on causality and gradient-based methods adds theoretical depth.
Recovering Event Probabilities from Large Language Model Embeddings via Axiomatic Constraints - Score: 17 (R=9, N=8) - Date: 2025-05-14 - Comment: This paper explores recovering coherent event probabilities from LLM embeddings using an extended VAE, which aligns with foundational research in representation learning and theoretical insights into LLM behavior.
Analytic theory of dropout regularization - Score: 17 (R=9, N=8) - Date: 2025-05-13 - Comment: The paper provides an analytic theory of dropout regularization, offering theoretical insights into training dynamics and generalization error. This aligns with the representation learning criterion, particularly in understanding how networks encode information.
Learning from Samples: Inverse Problems over measures via Sharpened Fenchel-Young Losses - Score: 17 (R=9, N=8) - Date: 2025-05-13 - Comment: The paper introduces sharpened Fenchel-Young losses for inverse problems, which is a novel theoretical contribution relevant to representation learning and optimization.
Towards the Three-Phase Dynamics of Generalization Power of a DNN - Score: 17 (R=9, N=8) - Date: 2025-05-13 - Comment: The paper provides a theoretical analysis of the generalization dynamics in DNNs, which is highly relevant to representation learning and training dynamics. The discovery of three-phase dynamics is novel and insightful.
Symbolic Rule Extraction from Attention-Guided Sparse Representations in Vision Transformers - Score: 17 (R=9, N=8) - Date: 2025-05-13 - Comment: The paper proposes a framework for symbolic rule extraction from Vision Transformers, which aligns with representation learning and interpretability. The use of sparse concept layers and symbolic reasoning is novel and foundational.
IIKL: Isometric Immersion Kernel Learning with Riemannian Manifold for Geometric Preservation - Score: 17 (R=9, N=8) - Date: 2025-05-13 - Comment: The paper proposes a novel method for geometric representation learning using Riemannian manifolds, which aligns with the 'Representation Learning' criterion by addressing how data is encoded while preserving geometric properties.
Towards a Unified Representation Evaluation Framework Beyond Downstream Tasks - Score: 17 (R=9, N=8) - Date: 2025-05-12 - Comment: The paper introduces a unified framework for evaluating representations beyond downstream tasks, focusing on attributes like equivariance, invariance, and disentanglement. This is highly relevant to representation learning and offers a novel evaluation perspective.
Deep-ICE: The first globally optimal algorithm for empirical risk minimization of two-layer maxout and ReLU networks - Score: 17 (R=9, N=8) - Date: 2025-05-12 - Comment: The paper introduces a globally optimal algorithm for empirical risk minimization in two-layer maxout and ReLU networks, which is highly relevant to foundational research in representation learning and training dynamics. The coreset selection method for scaling the algorithm adds significant novelty.
Rethinking Graph Contrastive Learning through Relative Similarity Preservation - Score: 17 (R=9, N=8) - Date: 2025-05-12 - Comment: The paper proposes RELGCL, a novel graph contrastive learning framework that leverages relative similarity patterns. This aligns with representation learning and introduces a significant theoretical insight into graph learning.
Stochastic Variational Propagation: Local, Scalable and Efficient Alternative to Backpropagation - Score: 17 (R=9, N=8) - Date: 2025-05-09 - Comment: The paper proposes Stochastic Variational Propagation (SVP) as an alternative to backpropagation, introducing a probabilistic perspective to representation learning and scalability. This aligns well with foundational research in training dynamics.
Understanding In-context Learning of Addition via Activation Subspaces - Score: 17 (R=9, N=8) - Date: 2025-05-09 - Comment: This paper provides insights into in-context learning in transformers, focusing on activation subspaces and computational structures. It aligns well with representation learning and theoretical analysis of LLMs, offering novel insights into model behavior.
Rethinking Invariance in In-context Learning - Score: 17 (R=9, N=8) - Date: 2025-05-09 - Comment: The paper addresses invariance in in-context learning, a key capability of LLMs, and proposes a novel methodology (InvICL) with theoretical and empirical contributions, making it relevant to foundational LLM research.
Chain-of-Thought Tokens are Computer Program Variables - Score: 17 (R=9, N=8) - Date: 2025-05-09 - Comment: The paper investigates the role of chain-of-thought tokens in LLMs, providing insights into their function as variables, which is highly relevant to understanding LLM behavior and representation learning.
When Bad Data Leads to Good Models - Score: 17 (R=9, N=8) - Date: 2025-05-09 - Comment: This paper explores the impact of toxic data on LLM pretraining and its implications for representation geometry, which aligns with foundational insights into LLM behavior and representation learning.
Position: Foundation Models Need Digital Twin Representations - Score: 17 (R=9, N=8) - Date: 2025-05-08 - Comment: The position paper argues for digital twin representations as an alternative to token-based representations in foundation models. This aligns with emerging trends and challenges established assumptions in representation learning.
LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection - Score: 17 (R=9, N=8) - Date: 2025-05-08 - Comment: The paper proposes a theoretical framework for understanding fine-tuning dynamics in LLMs, leveraging PAC-Bayes bounds and NTK-based models. This aligns with foundational research in LLM behavior and generalization.
Sharpness-Aware Minimization with Z-Score Gradient Filtering for Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-05-07 - Comment: The paper introduces ZSharp, an improvement to Sharpness-Aware Minimization, which directly contributes to foundational research in optimization and generalization in neural networks.
What do Language Model Probabilities Represent? From Distribution Estimation to Response Prediction - Score: 17 (R=9, N=8) - Date: 2025-05-07 - Comment: The paper provides theoretical insights into the interpretation of LLM probabilities, aligning with the LLM behavior/interpretability criterion.
StablePCA: Learning Shared Representations across Multiple Sources via Minimax Optimization - Score: 17 (R=9, N=8) - Date: 2025-05-05 - Comment: The paper proposes StablePCA, a novel method for robust representation learning across multiple sources using minimax optimization. This aligns well with representation learning and introduces a theoretically grounded approach to address nonconvexity challenges.
On the generalization of language models from in-context learning and finetuning: a controlled study - Score: 17 (R=9, N=8) - Date: 2025-05-02 - Comment: The paper explores generalization differences between in-context learning and fine-tuning in LLMs, providing theoretical insights into LLM behavior and inductive biases, which aligns with the LLM criterion.
Empirical Evaluation of Progressive Coding for Sparse Autoencoders - Score: 17 (R=9, N=8) - Date: 2025-05-02 - Comment: The paper evaluates sparse autoencoders and introduces Matryoshka SAEs, which aligns with foundational research in representation learning and sparse methods.
Recursive KL Divergence Optimization: A Dynamic Framework for Representation Learning - Score: 17 (R=9, N=8) - Date: 2025-05-01 - Comment: The paper proposes a recursive KL divergence optimization framework for representation learning, which directly aligns with foundational research in representation learning and training dynamics.
Weakly-Supervised Contrastive Learning for Imprecise Class Labels - Score: 16 (R=9, N=7) - Date: 2025-05-29 - Comment: The paper proposes a weakly-supervised contrastive learning framework, which is relevant to representation learning, focusing on semantic similarity and graph-theoretic approaches.
Learning Shared Representations from Unpaired Data - Score: 16 (R=9, N=7) - Date: 2025-05-29 - Comment: The paper explores learning shared representations from unpaired data, contributing to foundational research in representation learning.
Pretrained LLMs Learn Multiple Types of Uncertainty - Score: 16 (R=9, N=7) - Date: 2025-05-28 - Comment: The paper studies how LLMs capture uncertainty, providing insights into model behavior and interpretability, relevant to foundational research in LLMs.
ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs - Score: 16 (R=9, N=7) - Date: 2025-05-26 - Comment: The paper introduces a novel method for efficient interpretability in LLMs, aligning with the LLM criterion.
An approach to identify the most semantically informative deep representations of text and images - Score: 16 (R=9, N=7) - Date: 2025-05-26 - Comment: The paper investigates deep representations in LLMs and vision transformers, aligning with the representation learning criterion.
Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs - Score: 16 (R=9, N=7) - Date: 2025-05-23 - Comment: The paper investigates the reversibility of machine unlearning in LLMs, providing theoretical insights into LLM behavior and interpretability.
Learning novel representations of variable sources from multi-modal Gaia data via autoencoders - Score: 16 (R=9, N=7) - Date: 2025-05-23 - Comment: The paper uses variational autoencoders to learn novel representations from Gaia data, relevant to representation learning and autoencoders.
Protoknowledge Shapes Behaviour of LLMs in Downstream Tasks: Memorization and Generalization with Knowledge Graphs - Score: 16 (R=9, N=7) - Date: 2025-05-22 - Comment: The paper explores the concept of protoknowledge in LLMs, focusing on how knowledge graphs are internalized and utilized, which aligns with foundational research in LLM behavior and interpretability.
Denoising Concept Vectors with Sparse Autoencoders for Improved Language Model Steering - Score: 16 (R=9, N=7) - Date: 2025-05-22 - Comment: The paper introduces a method for denoising concept vectors using sparse autoencoders, which is relevant to representation learning and language model steering.
An Explanation of Intrinsic Self-Correction via Linear Representations and Latent Concepts - Score: 16 (R=9, N=7) - Date: 2025-05-19 - Comment: The paper provides an explanation for intrinsic self-correction in language models, which is relevant to understanding LLM behavior and interpretability.
On Next-Token Prediction in LLMs: How End Goals Determine the Consistency of Decoding Algorithms - Score: 16 (R=9, N=7) - Date: 2025-05-19 - Comment: The paper studies next-token prediction in LLMs, relevant to foundational research in LLM behavior and interpretability.
LangVAE and LangSpace: Building and Probing for Language Model VAEs - Score: 16 (R=9, N=7) - Date: 2025-05-02 - Comment: The paper presents LangVAE, a framework for building and analyzing variational autoencoders (VAEs) on top of LLMs. This contributes to representation learning by enabling compact and semantically disentangled representations, making it relevant to foundational research.
Implicit Inversion turns CLIP into a Decoder - Score: 16 (R=8, N=8) - Date: 2025-05-31 - Comment: The paper explores the generative potential of CLIP without a decoder, offering insights into representation learning and model architecture.
Is Noise Conditioning Necessary? A Unified Theory of Unconditional Graph Diffusion Models - Score: 16 (R=8, N=8) - Date: 2025-05-31 - Comment: The paper challenges the necessity of noise conditioning in graph diffusion models, which is relevant to representation learning and model efficiency.
Maximum Likelihood Learning of Latent Dynamics Without Reconstruction - Score: 16 (R=8, N=8) - Date: 2025-05-30 - Comment: The paper introduces RP-GSSM, a novel unsupervised learning method for time series data, which is relevant to representation learning.
Directed Graph Grammars for Sequence-based Learning - Score: 16 (R=8, N=8) - Date: 2025-05-30 - Comment: The paper presents a grammar-based approach for sequence representation of DAGs, which is relevant to representation learning and offers a novel method for graph encoding.
Evaluating Training in Binarized Neural Networks Through the Lens of Algorithmic Information Theory - Score: 16 (R=8, N=8) - Date: 2025-05-28 - Comment: The paper applies algorithmic information theory to Binarized Neural Networks, offering a new perspective on training dynamics, relevant to representation learning.
Holes in Latent Space: Topological Signatures Under Adversarial Influence - Score: 16 (R=8, N=8) - Date: 2025-05-28 - Comment: The paper uses topological data analysis to study LLMs under adversarial conditions, offering insights into representational dynamics, relevant to representation learning.
Equivariant Representation Learning for Symmetry-Aware Inference with Guarantees - Score: 16 (R=8, N=8) - Date: 2025-05-27 - Comment: The paper introduces an equivariant representation learning framework with statistical learning guarantees, relevant to representation learning and symmetry-aware inference.
On the Mechanisms of Weak-to-Strong Generalization: A Theoretical Perspective - Score: 16 (R=8, N=8) - Date: 2025-05-27 - Comment: The paper provides a theoretical perspective on weak-to-strong generalization, offering insights into training dynamics and representation learning.
Convexified Message-Passing Graph Neural Networks - Score: 16 (R=8, N=8) - Date: 2025-05-27 - Comment: The paper introduces Convexified Message-Passing Graph Neural Networks, a novel framework that combines GNNs with convex optimization, aligning with representation learning.
Uncovering a Universal Abstract Algorithm for Modular Addition in Neural Networks - Score: 16 (R=8, N=8) - Date: 2025-05-27 - Comment: The paper proposes a universality hypothesis for neural networks solving modular addition, relevant to representation learning and interpretability.
Follow the Energy, Find the Path: Riemannian Metrics from Energy-Based Models - Score: 16 (R=8, N=8) - Date: 2025-05-27 - Comment: The paper proposes a method for deriving Riemannian metrics from energy-based models, which is a novel approach in representation learning.
Learning with Restricted Boltzmann Machines: Asymptotics of AMP and GD in High Dimensions - Score: 16 (R=8, N=8) - Date: 2025-05-26 - Comment: The paper provides a theoretical analysis of Restricted Boltzmann Machines (RBMs), studying the high-dimensional asymptotics of Approximate Message Passing (AMP) and gradient descent (GD), relevant to representation learning and emerging trends.
LaSER: How Learning Can Guide the Evolution of Equations - Score: 16 (R=8, N=8) - Date: 2025-05-23 - Comment: LaSER integrates learning with genetic programming for symbolic regression, relevant to representation learning and emerging trends.
Model Merging is Secretly Certifiable: Non-Vacuous Generalisation Bounds for Low-Shot Learning - Score: 16 (R=8, N=8) - Date: 2025-05-22 - Comment: The paper discusses model merging and provides non-vacuous generalization bounds for low-shot learning, which is relevant to representation learning and model architecture as it connects model fusion with generalization certificates.
HOPSE: Scalable Higher-Order Positional and Structural Encoder for Combinatorial Representations - Score: 16 (R=8, N=8) - Date: 2025-05-22 - Comment: The paper introduces a scalable higher-order encoder for combinatorial representations, which is relevant to model architecture innovations.
Out-of-Distribution Generalization of In-Context Learning: A Low-Dimensional Subspace Perspective - Score: 16 (R=8, N=8) - Date: 2025-05-21 - Comment: The paper provides theoretical insights into the OOD generalization of in-context learning, which is relevant to representation learning and LLMs.
A Probabilistic Perspective on Model Collapse - Score: 16 (R=8, N=8) - Date: 2025-05-21 - Comment: The paper provides a probabilistic perspective on model collapse, offering theoretical insights into training dynamics, which is relevant to representation learning.
Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens - Score: 16 (R=8, N=8) - Date: 2025-05-20 - Comment: The paper questions whether the semantics of intermediate tokens matter in reasoning models, which is relevant to representation learning and LLM behavior.
A Minimum Description Length Approach to Regularization in Neural Networks - Score: 16 (R=8, N=8) - Date: 2025-05-20 - Comment: The paper introduces a theoretically grounded regularization method using the Minimum Description Length principle, relevant to representation learning and model training dynamics.
Sinusoidal Initialization, Time for a New Start - Score: 16 (R=8, N=8) - Date: 2025-05-20 - Comment: The paper introduces Sinusoidal initialization, a novel deterministic method for neural network training, which is relevant to representation learning.
Redefining Neural Operators in $d+1$ Dimensions - Score: 16 (R=8, N=8) - Date: 2025-05-19 - Comment: The paper redefines neural operators in $d+1$ dimensions, which is relevant to emerging trends in representation learning and model architecture.
Layered Unlearning for Adversarial Relearning - Score: 16 (R=8, N=8) - Date: 2025-05-15 - Comment: The paper investigates post-training methods and introduces a novel unlearning algorithm, which aligns with 'Representation Learning' by exploring how model behavior and representations are modified. The focus on adversarial relearning adds theoretical depth.
Independent Component Analysis by Robust Distance Correlation - Score: 16 (R=8, N=8) - Date: 2025-05-15 - Comment: The paper proposes a robust ICA method (RICA) using distance correlation, which aligns with foundational research in representation learning and robust statistical methods.
SaFARi: State-Space Models for Frame-Agnostic Representation - Score: 16 (R=8, N=8) - Date: 2025-05-15 - Comment: The paper introduces a generalized framework for State-Space Models (SSMs) and extends the HiPPO approach, which aligns with representation learning and architectural innovations.
Mirror Mirror on the Wall, Have I Forgotten it All? A New Framework for Evaluating Machine Unlearning - Score: 16 (R=8, N=8) - Date: 2025-05-14 - Comment: The paper introduces a new theoretical framework for machine unlearning, which aligns with foundational research in representation learning and training dynamics. The focus on computational unlearning and its relationship with differential privacy provides theoretical insights.
Identifying Causal Direction via Variational Bayesian Compression - Score: 16 (R=8, N=8) - Date: 2025-05-13 - Comment: The paper proposes a method for identifying causal direction using variational Bayesian compression, which involves foundational insights into representation learning through succinctness and model fitness. This aligns well with the criteria for representation learning.
Learning curves theory for hierarchically compositional data with power-law distributed features - Score: 16 (R=8, N=8) - Date: 2025-05-13 - Comment: The paper provides a theoretical analysis of learning curves for hierarchically compositional data, which aligns with representation learning and emerging trends.
Neuro-Symbolic Concepts - Score: 16 (R=8, N=8) - Date: 2025-05-12 - Comment: The paper presents a concept-centric neuro-symbolic framework for continual learning and reasoning, which is relevant to representation learning due to its focus on compositional and efficient representations. The neuro-symbolic approach is novel and impactful.
Hypergraph Neural Sheaf Diffusion: A Symmetric Simplicial Set Framework for Higher-Order Learning - Score: 16 (R=8, N=8) - Date: 2025-05-12 - Comment: The paper introduces a novel framework for hypergraph learning using Neural Sheaf Diffusion, which aligns with representation learning and architectural innovations. The use of symmetric simplicial sets for higher-order learning is a significant theoretical contribution.
seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models - Score: 16 (R=8, N=8) - Date: 2025-05-07 - Comment: The paper introduces seq-JEPA, a self-supervised learning framework for invariant and equivariant representations. It aligns with representation learning and architectural innovations.
Robustly Invertible Nonlinear Dynamics and the BiLipREN: Contracting Neural Models with Contracting Inverses - Score: 16 (R=8, N=8) - Date: 2025-05-07 - Comment: The paper introduces BiLipREN, a robustly invertible recurrent neural model with theoretical guarantees. This aligns with foundational research in representation learning and dynamic models.
Teaching Models to Understand (but not Generate) High-risk Data - Score: 16 (R=8, N=8) - Date: 2025-05-07 - Comment: The paper proposes a pretraining paradigm (SLUNG) to handle high-risk data, which aligns with foundational research on LLM behavior and interpretability.
GeoERM: Geometry-Aware Multi-Task Representation Learning on Riemannian Manifolds - Score: 16 (R=8, N=8) - Date: 2025-05-07 - Comment: The paper introduces a geometry-aware MTL framework, which aligns with representation learning and foundational innovations in multi-task learning.
Low-Loss Space in Neural Networks is Continuous and Fully Connected - Score: 16 (R=8, N=8) - Date: 2025-05-06 - Comment: The paper provides theoretical insights into the loss landscape of neural networks, which is relevant to understanding training dynamics and over-parameterization in representation learning.
A dynamic view of the double descent - Score: 16 (R=8, N=8) - Date: 2025-05-06 - Comment: The paper provides a theoretical explanation for the double descent phenomenon using stochastic approximation and differential equations. This aligns with foundational research in training dynamics of neural networks.
Incorporating Inductive Biases to Energy-based Generative Models - Score: 16 (R=8, N=8) - Date: 2025-05-05 - Comment: The paper introduces a hybrid energy-based model that incorporates inductive biases, aligning with the 'Representation Learning' criterion. It provides foundational insights into improving generative modeling through statistical constraints.
Beyond Zero Initialization: Investigating the Impact of Non-Zero Initialization on LoRA Fine-Tuning Dynamics - Score: 15 (R=8, N=7) - Date: 2025-05-31 - Comment: The paper investigates the impact of non-zero initialization on LoRA fine-tuning dynamics, which is relevant to representation learning and training dynamics.
Diverse Prototypical Ensembles Improve Robustness to Subpopulation Shift - Score: 15 (R=8, N=7) - Date: 2025-05-31 - Comment: The paper introduces a novel ensemble method using a mixture of prototypical classifiers, which aligns with the representation learning criterion by focusing on how classifiers capture diverse features.
Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time - Score: 15 (R=8, N=7) - Date: 2025-05-30 - Comment: The paper proposes a satisficing alignment framework for LLMs, which is relevant to foundational research in LLM behavior and interpretability.
Gradient Methods with Online Scaling Part I. Theoretical Foundations - Score: 15 (R=8, N=7) - Date: 2025-05-30 - Comment: The paper establishes theoretical foundations for online scaled gradient methods, which aligns with representation learning and training dynamics.
Self-Error-Instruct: Generalizing from Errors for LLMs Mathematical Reasoning - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper presents a framework for improving LLMs' mathematical reasoning through error generalization, which is relevant to foundational research in LLMs.
Benignity of loss landscape with weight decay requires both large overparametrization and initialization - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper provides theoretical insights into the loss landscape with weight decay, relevant to training dynamics in neural networks.
Sparsification and Reconstruction from the Perspective of Representation Geometry - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper explores sparse autoencoders and representation geometry, which is relevant to representation learning.
A Closer Look at Multimodal Representation Collapse - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper investigates modality collapse in multimodal representation learning, which is relevant to representation learning.
Mitigating Overthinking in Large Reasoning Models via Manifold Steering - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper introduces Manifold Steering to mitigate overthinking in large reasoning models, which involves mechanistic interpretability and activation space analysis, aligning with representation learning insights.
Multiclass Loss Geometry Matters for Generalization of Gradient Descent in Separable Classification - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper provides theoretical insights into the generalization of gradient descent in multiclass classification, which is relevant to representation learning.
From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper explores multidimensional representations of propositional facts in LLMs, contributing to theoretical insights into LLM behavior.
Relevance-driven Input Dropout: an Explanation-guided Regularization Technique - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper introduces Relevance-driven Input Dropout, a novel data augmentation method for improving model generalization, which aligns with representation learning insights.
CellCLAT: Preserving Topology and Trimming Redundancy in Self-Supervised Cellular Contrastive Learning - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper introduces a novel framework for self-supervised topological deep learning, which is relevant to representation learning.
Efficient Identity and Position Graph Embedding via Spectral-Based Random Feature Aggregation - Score: 15 (R=8, N=7) - Date: 2025-05-28 - Comment: The paper introduces a spectral-based random feature aggregation method for graph embedding, relevant to representation learning.
Simple yet Effective Graph Distillation via Clustering - Score: 15 (R=8, N=7) - Date: 2025-05-28 - Comment: The paper presents a new graph distillation approach, which is relevant to model compression and representation learning.
Stochastic Preconditioning for Neural Field Optimization - Score: 15 (R=8, N=7) - Date: 2025-05-28 - Comment: The paper introduces a novel stochastic preconditioning method for neural field optimization, which is relevant to representation learning and training dynamics.
Beyond Demonstrations: Dynamic Vector Construction from Latent Representations - Score: 15 (R=8, N=7) - Date: 2025-05-28 - Comment: The paper proposes a method for dynamic vector construction from latent representations, relevant to representation learning.
Position: Mechanistic Interpretability Should Prioritize Feature Consistency in SAEs - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper discusses feature consistency in Sparse Autoencoders, relevant to representation learning and interpretability.
Variational Deep Learning via Implicit Regularization - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper explores implicit regularization in variational deep learning, which is relevant to representation learning and theoretical insights.
Model Stitching by Functional Latent Alignment - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper proposes Functional Latent Alignment for model stitching, which relates to representation learning and insights into how networks encode information.
AweDist: Attention-aware Embedding Distillation for New Input Token Embeddings - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper proposes AweDist for embedding distillation, which is relevant to representation learning and model efficiency.
Learning Optimal Multimodal Information Bottleneck Representations - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper proposes a novel framework for multimodal information bottleneck representations, which is relevant to representation learning.
Your Classifier Can Do More: Towards Bridging the Gaps in Classification, Robustness, and Generation - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper explores a joint energy-based model for classification, robustness, and generation, which is relevant to model architecture and representation learning.
Paying Alignment Tax with Contrastive Learning - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper introduces a contrastive learning framework for debiasing language models, relevant to representation learning.
Do Large Language Models (Really) Need Statistical Foundations? - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper discusses the need for statistical foundations in LLMs, providing theoretical insights into LLM behavior and interpretability.
Latent Mamba Operator for Partial Differential Equations - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper introduces the Latent Mamba Operator, which integrates state-space models with kernel integral formulations in neural operators, offering insights into representation learning and training dynamics.
Hierarchical Mamba Meets Hyperbolic Geometry: A New Paradigm for Structured Language Embeddings - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper proposes a new paradigm for structured language embeddings using hyperbolic geometry, which is relevant to representation learning.
On the Role of Label Noise in the Feature Learning Process - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper analyzes the role of label noise in feature learning, relevant to representation learning and training dynamics.
Mitigating Deceptive Alignment via Self-Monitoring - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper introduces a self-monitoring framework to mitigate deceptive alignment in LLMs, which is relevant to understanding LLM behavior and interpretability.
Does Representation Intervention Really Identify Desired Concepts and Elicit Alignment? - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper investigates representation intervention in LLMs, which is relevant to representation learning and theoretical insights into LLM behavior.
Mind The Gap: Deep Learning Doesn't Learn Deeply - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper investigates how neural networks learn algorithmic reasoning, relevant to representation learning and training dynamics.
Joint-stochastic-approximation Autoencoders with Application to Semi-supervised Learning - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper introduces Joint-stochastic-approximation autoencoders, which are relevant to foundational research in autoencoders and representation learning.
Hamiltonian Theory and Computation of Optimal Probability Density Control in High Dimensions - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper presents a theoretical framework for optimal probability density control using deep neural networks, which is relevant to representation learning and emerging trends in foundational research.
Next Token Perception Score: Analytical Assessment of your LLM Perception Skills - Score: 15 (R=8, N=7) - Date: 2025-05-26 - Comment: The paper introduces a metric for assessing LLM perception skills, relevant to theoretical insights into LLM behavior.
TULiP: Test-time Uncertainty Estimation via Linearization and Weight Perturbation - Score: 15 (R=8, N=7) - Date: 2025-05-23 - Comment: The paper proposes TULiP, a new method for uncertainty estimation using linearization and weight perturbation, which relates to representation learning and training dynamics.
Computing Exact Shapley Values in Polynomial Time for Product-Kernel Methods - Score: 15 (R=8, N=7) - Date: 2025-05-23 - Comment: The paper presents a method for computing exact Shapley values in polynomial time for product-kernel methods, relevant to representation learning and interpretability.
Omni TM-AE: A Scalable and Interpretable Embedding Model Using the Full Tsetlin Machine State Space - Score: 15 (R=8, N=7) - Date: 2025-05-23 - Comment: The paper introduces a novel embedding model using the Tsetlin Machine, which aligns with model architecture innovations and interpretability.
Small-to-Large Generalization: Data Influences Models Consistently Across Scale - Score: 15 (R=8, N=7) - Date: 2025-05-23 - Comment: The paper explores the influence of training data distribution on model behavior across scales, which is relevant to representation learning and training dynamics.
GradPCA: Leveraging NTK Alignment for Reliable Out-of-Distribution Detection - Score: 15 (R=8, N=7) - Date: 2025-05-22 - Comment: The paper introduces GradPCA, an OOD detection method leveraging NTK alignment, which is relevant to representation learning and model architecture analysis.
Last Layer Empirical Bayes - Score: 15 (R=8, N=7) - Date: 2025-05-22 - Comment: The paper introduces Last Layer Empirical Bayes, which provides insights into uncertainty quantification in neural networks, aligning with representation learning and model architecture.
Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing - Score: 15 (R=8, N=7) - Date: 2025-05-22 - Comment: The paper develops a framework for optimal retraining using approximate message passing, which is relevant to representation learning and training dynamics.
Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision - Score: 15 (R=8, N=7) - Date: 2025-05-22 - Comment: The paper introduces an energy-based approach for ranking chain-of-thought in LLMs, which provides insights into LLM behavior and aligns with representation learning.
GloSS over Toxicity: Understanding and Mitigating Toxicity in LLMs via Global Toxic Subspace - Score: 15 (R=8, N=7) - Date: 2025-05-21 - Comment: The paper investigates toxicity in LLMs and proposes a method for detoxification, which aligns with foundational research in understanding LLM behavior and interpretability.
Mechanistic Interpretability of GPT-like Models on Summarization Tasks - Score: 15 (R=8, N=7) - Date: 2025-05-21 - Comment: The paper focuses on mechanistic interpretability of GPT-like models, which is relevant to understanding LLM behavior and interpretability.
FOL-Pretrain: A complexity annotated corpus of first-order logic - Score: 15 (R=8, N=7) - Date: 2025-05-21 - Comment: The paper introduces a complexity-annotated corpus for first-order logic, which is relevant to large language models and representation learning.
High-Dimensional Analysis of Bootstrap Ensemble Classifiers - Score: 15 (R=8, N=7) - Date: 2025-05-21 - Comment: The paper provides a theoretical analysis of bootstrap ensemble classifiers, which is relevant to representation learning and model architecture analysis.
Time to Embed: Unlocking Foundation Models for Time Series with Channel Descriptions - Score: 15 (R=8, N=7) - Date: 2025-05-21 - Comment: The paper introduces a foundation embedding model for time series, which involves architectural innovations and representation learning.
Causal Cartographer: From Mapping to Reasoning Over Counterfactual Worlds - Score: 15 (R=8, N=7) - Date: 2025-05-21 - Comment: The paper introduces a framework for causal reasoning in LLMs, which aligns with foundational research in LLM behavior and interpretability.
Fast and close Shannon entropy approximation - Score: 15 (R=8, N=7) - Date: 2025-05-21 - Comment: The paper introduces a novel approximation method for Shannon entropy, which is foundational in information theory and machine learning. This aligns with the representation learning criterion as it provides insights into feature extraction and model efficiency.
Tokenization Constraints in LLMs: A Study of Symbolic and Arithmetic Reasoning Limits - Score: 15 (R=8, N=7) - Date: 2025-05-21 - Comment: The paper studies tokenization constraints in LLMs, which aligns with the large language model and representation learning criteria.
Contrastive Consolidation of Top-Down Modulations Achieves Sparsely Supervised Continual Learning - Score: 15 (R=8, N=7) - Date: 2025-05-21 - Comment: The paper introduces a novel task-modulated contrastive learning approach inspired by biological systems, which aligns with representation learning through contrastive methods.
Collaborative Unlabeled Data Optimization - Score: 15 (R=8, N=7) - Date: 2025-05-21 - Comment: The paper introduces a novel data-centric paradigm for optimizing unlabeled data, which is relevant to representation learning and emerging trends.
Textual Steering Vectors Can Improve Visual Understanding in Multimodal Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-05-21 - Comment: The paper explores textual steering vectors for MLLMs, which is relevant to representation learning and model architecture.
Generalized Category Discovery via Token Manifold Capacity Learning - Score: 15 (R=8, N=7) - Date: 2025-05-21 - Comment: The paper proposes a novel approach for generalized category discovery, which is relevant to representation learning.
Towards Comprehensive and Prerequisite-Free Explainer for Graph Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-05-21 - Comment: The paper proposes a novel explainer for GNNs, which is relevant to model architecture and representation learning.
ShortcutProbe: Probing Prediction Shortcuts for Learning Robust Models - Score: 15 (R=8, N=7) - Date: 2025-05-21 - Comment: The paper introduces a framework for spurious bias mitigation in deep learning models, which is relevant to representation learning.
Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations - Score: 15 (R=8, N=7) - Date: 2025-05-20 - Comment: The paper explores metacognitive capabilities in LLMs, relevant to large language models.
Self-Reinforced Graph Contrastive Learning - Score: 15 (R=8, N=7) - Date: 2025-05-20 - Comment: The paper introduces a novel framework for graph contrastive learning, which is relevant to representation learning through contrastive methods.
Parallel Layer Normalization for Universal Approximation - Score: 15 (R=8, N=7) - Date: 2025-05-20 - Comment: The paper explores the universal approximation theorem with normalization layers, providing insights into model architecture and representation learning.
When majority rules, minority loses: bias amplification of gradient descent - Score: 15 (R=8, N=7) - Date: 2025-05-20 - Comment: The paper provides theoretical insights into bias amplification in gradient descent, which is relevant to understanding training dynamics in neural networks.
Identifiability of Nonnegative Tucker Decompositions -- Part I: Theory - Score: 15 (R=8, N=7) - Date: 2025-05-20 - Comment: The paper provides theoretical insights into the identifiability of nonnegative Tucker decompositions, which is relevant to representation learning and model compression.
AdaDim: Dimensionality Adaptation for SSL Representational Dynamics - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper proposes AdaDim for SSL representational dynamics, aligning with the core topic of representation learning.
Multi-modal contrastive learning adapts to intrinsic dimensions of shared latent variables - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper studies multi-modal contrastive learning, which aligns with the representation learning criterion.
Hyperbolic Residual Quantization: Discrete Representations for Data with Latent Hierarchies - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper introduces Hyperbolic Residual Quantization for hierarchical data, which aligns with the representation learning criterion.
Ditch the Denoiser: Emergence of Noise Robustness in Self-Supervised Learning from Data Curriculum - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper presents a self-supervised learning framework for noise robustness, which is relevant to representation learning.
Structured Representation - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper discusses structured representation and invariant partitions, aligning with the core topic of representation learning.
When the Left Foot Leads to the Right Path: Bridging Initial Prejudice and Trainability - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper connects initial-guessing bias with trainability in neural networks, providing insights into training dynamics, which is relevant to representation learning.
Do different prompting methods yield a common task representation in language models? - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper investigates task representation in language models through different prompting methods, offering insights into LLM behavior and interpretability.
A Local Polyak-Lojasiewicz and Descent Lemma of Gradient Descent For Overparametrized Linear Models - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper provides a theoretical analysis of gradient descent for overparameterized linear models, which is relevant to representation learning and training dynamics in neural networks.
Steering Risk Preferences in Large Language Models by Aligning Behavioral and Neural Representations - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper proposes a method for steering risk preferences in LLMs by aligning behavioral and neural representations, which aligns with representation learning and LLM behavior insights.
Understanding Nonlinear Implicit Bias via Region Counts in Input Space - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper explores implicit bias in neural networks, which is relevant to representation learning as it provides insights into how deep networks encode information.
Graph Representational Learning: When Does More Expressivity Hurt Generalization? - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper provides theoretical insights into the expressivity and generalization of Graph Neural Networks, which relates to representation learning and model architecture analysis.
CUBIC: Concept Embeddings for Unsupervised Bias Identification using VLMs - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper introduces CUBIC, a method for unsupervised bias identification using VLMs, relevant to representation learning and foundational research.
ZEUS: Zero-shot Embeddings for Unsupervised Separation of Tabular Data - Score: 15 (R=8, N=7) - Date: 2025-05-16 - Comment: ZEUS introduces a zero-shot method for clustering tabular data, focusing on representation learning with a novel approach to unsupervised learning.
FlowVAT: Normalizing Flow Variational Inference with Affine-Invariant Tempering - Score: 15 (R=8, N=7) - Date: 2025-05-16 - Comment: The paper introduces FlowVAT, a novel method for variational inference using normalizing flows, which aligns with representation learning by addressing mode-seeking behavior and improving ELBO values.
Emergence of Structure in Ensembles of Random Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-05-16 - Comment: The paper presents a theoretical model for the emergence of structure in ensembles of random neural networks, which is relevant to understanding training dynamics and representation learning in neural networks.
The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think - Score: 15 (R=8, N=7) - Date: 2025-05-16 - Comment: The paper introduces a framework for analyzing and steering model reasoning, which aligns with the interest in theoretical insights into LLM behavior and interpretability.
Rethinking Circuit Completeness in Language Models: AND, OR, and ADDER Gates - Score: 15 (R=8, N=7) - Date: 2025-05-16 - Comment: The paper discusses circuit discovery in language models, focusing on mechanistic interpretability and logic gates, which aligns with large language models and representation learning.
Language Agents Mirror Human Causal Reasoning Biases. How Can We Help Them Think Like Scientists? - Score: 15 (R=8, N=7) - Date: 2025-05-15 - Comment: The paper examines causal reasoning biases in LLMs and proposes a test-time sampling method to address them, which provides insights into LLM behavior and interpretability.
Neural Multivariate Regression: Qualitative Insights from the Unconstrained Feature Model - Score: 15 (R=8, N=7) - Date: 2025-05-15 - Comment: The paper provides theoretical insights into neural multivariate regression using the Unconstrained Feature Model (UFM), which aligns with representation learning and training dynamics in neural networks.
A Multi-Task Foundation Model for Wireless Channel Representation Using Contrastive and Masked Autoencoder Learning - Score: 15 (R=8, N=7) - Date: 2025-05-15 - Comment: The paper proposes a transformer-based autoencoder for wireless channel representation, which aligns with 'Representation Learning' and 'Model Architecture' criteria. The use of contrastive and masked autoencoder learning adds methodological novelty.
Behind the Noise: Conformal Quantile Regression Reveals Emergent Representations - Score: 15 (R=8, N=7) - Date: 2025-05-14 - Comment: The paper presents a denoising method using conformal quantile regression that uncovers emergent representations, aligning with foundational research in representation learning and interpretability.
Manifold Learning with Normalizing Flows: Towards Regularity, Expressivity and Iso-Riemannian Geometry - Score: 15 (R=8, N=7) - Date: 2025-05-14 - Comment: This paper explores manifold learning with normalizing flows, addressing challenges in multi-modal data and proposing methods to improve regularity and expressivity. It aligns with representation learning by focusing on geometric structure and interpretability, making it relevant to foundational research.
The Correspondence Between Bounded Graph Neural Networks and Fragments of First-Order Logic - Score: 15 (R=8, N=7) - Date: 2025-05-14 - Comment: This paper provides a theoretical analysis of the expressive power of Graph Neural Networks (GNNs) by linking them to fragments of first-order logic. It aligns with foundational research in representation learning and model analysis.
ALPCAH: Subspace Learning for Sample-wise Heteroscedastic Data - Score: 15 (R=8, N=7) - Date: 2025-05-13 - Comment: The paper introduces a novel subspace learning method for heteroscedastic data, which aligns with representation learning and foundational methods in dimensionality reduction.
Feature Representation Transferring to Lightweight Models via Perception Coherence - Score: 15 (R=8, N=7) - Date: 2025-05-13 - Comment: The paper introduces a novel method for feature representation transfer using perception coherence, aligning with the 'Representation Learning' criterion by addressing how features are encoded and transferred.
Mask-PINNs: Regulating Feature Distributions in Physics-Informed Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-05-13 - Comment: The paper proposes Mask-PINNs to improve feature distribution stability in physics-informed neural networks, which aligns with representation learning and foundational methods.
InfoNCE is a Free Lunch for Semantically guided Graph Contrastive Learning - Score: 15 (R=8, N=7) - Date: 2025-05-13 - Comment: The paper proposes a semantically guided graph contrastive learning method, which aligns with representation learning through its focus on improving contrastive methods and addressing sampling bias.
Autoencoder-Based Hybrid Replay for Class-Incremental Learning - Score: 15 (R=8, N=7) - Date: 2025-05-12 - Comment: The paper introduces an autoencoder-based hybrid replay strategy for class-incremental learning, which aligns with foundational research in autoencoders and representation learning. The use of hybrid autoencoders for both discriminative and generative modeling is a novel contribution.
Register and CLS tokens yield a decoupling of local and global features in large ViTs - Score: 15 (R=8, N=7) - Date: 2025-05-12 - Comment: The paper analyzes the decoupling of local and global features in ViTs, providing insights into architectural behavior and interpretability.
ComPO: Preference Alignment via Comparison Oracles - Score: 15 (R=8, N=7) - Date: 2025-05-09 - Comment: The paper proposes a novel preference alignment method using comparison oracles, which aligns with foundational research in LLM behavior and interpretability. It provides theoretical insights into addressing limitations of existing alignment methods.
Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models - Score: 15 (R=8, N=7) - Date: 2025-05-09 - Comment: The paper introduces a probabilistic embedding approach for frozen vision-language models, which aligns with representation learning and uncertainty quantification, offering a novel post-hoc method.
Clustering with Communication: A Variational Framework for Single Cell Representation Learning - Score: 15 (R=8, N=7) - Date: 2025-05-09 - Comment: The paper introduces a variational autoencoder framework incorporating cell-cell communication signals, which aligns with representation learning. The use of biologically informed priors adds novelty to the generative modeling approach.
A probabilistic view on Riemannian machine learning models for SPD matrices - Score: 15 (R=8, N=7) - Date: 2025-05-06 - Comment: The paper provides a probabilistic framework for machine learning on SPD matrices, which aligns with foundational research in representation learning and theoretical modeling.
Surrogate to Poincaré inequalities on manifolds for dimension reduction in nonlinear feature spaces - Score: 15 (R=8, N=7) - Date: 2025-05-06 - Comment: The paper introduces a method for dimension reduction using Poincaré inequalities, which is relevant to representation learning and feature learning.
Adaptively Point-weighting Curriculum Learning - Score: 15 (R=8, N=7) - Date: 2025-05-06 - Comment: The paper proposes a novel curriculum learning algorithm with theoretical analysis, which aligns with foundational research in training dynamics and representation learning.
NeuRel-Attack: Neuron Relearning for Safety Disalignment in Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-05-01 - Comment: The paper identifies vulnerabilities in LLM safety alignment and proposes a method to induce disalignment, which aligns with the 'Large Language Models' criterion by providing insights into LLM behavior and interpretability.

Other Foundational Research (31)

Mean Flows for One-step Generative Modeling - Score: 20.0 (R=0, N=0) - Date: 2025-05-20 - Comment: Author match
SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training - Score: 18 (R=9, N=9) - Date: 2025-05-31 - Comment: The paper presents a thermodynamic view on SGD, offering a novel theoretical perspective on training dynamics.
Distribution free M-estimation - Score: 18 (R=9, N=9) - Date: 2025-05-31 - Comment: The paper characterizes when M-estimation problems are solvable without distribution assumptions, aligning with emerging trends in theoretical work.
Bridging Arbitrary and Tree Metrics via Differentiable Gromov Hyperbolicity - Score: 18 (R=9, N=9) - Date: 2025-05-28 - Comment: The paper presents a novel differentiable optimization framework for bridging arbitrary and tree metrics, which is a cutting-edge theoretical work challenging established assumptions.
Understanding Mode Connectivity via Parameter Space Symmetry - Score: 17 (R=9, N=8) - Date: 2025-05-31 - Comment: The paper investigates mode connectivity in neural networks using parameter space symmetry, providing theoretical insights into neural network behavior.
Multiple Descents in Deep Learning as a Sequence of Order-Chaos Transitions - Score: 17 (R=9, N=8) - Date: 2025-05-27 - Comment: The paper observes a novel 'multiple-descent' phenomenon in LSTM training, providing insights into training dynamics and phase transitions.
Learning to Choose or Choosing to Learn: Best-of-N vs. Supervised Fine-Tuning for Bit String Generation - Score: 17 (R=9, N=8) - Date: 2025-05-23 - Comment: The paper theoretically compares two methods for adapting large language models, focusing on the learning dynamics and convergence rates, which aligns with the Large Language Models criterion by providing theoretical insights into LLM behavior.
Neural Thermodynamic Laws for Large Language Model Training - Score: 17 (R=9, N=8) - Date: 2025-05-16 - Comment: The paper introduces Neural Thermodynamic Laws, offering theoretical insights into LLM training dynamics, which aligns with the interest in foundational research on LLMs.
Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-05-13 - Comment: The paper introduces a statistically consistent method for aligning LLMs, which aligns with foundational research in LLM alignment and theoretical insights.
Secrets of GFlowNets' Learning Behavior: A Theoretical Study - Score: 17 (R=9, N=8) - Date: 2025-05-06 - Comment: The paper provides a theoretical study on GFlowNets' learning behavior, contributing to foundational understanding of generative modeling dynamics.
New Statistical and Computational Results for Learning Junta Distributions - Score: 17 (R=8, N=9) - Date: 2025-05-12 - Comment: The paper studies learning junta distributions and connects it to the LPN problem, which aligns with 'Emerging Trends' by addressing a computationally significant theoretical problem.
A Graph Perspective to Probe Structural Patterns of Knowledge in Large Language Models - Score: 16 (R=9, N=7) - Date: 2025-05-27 - Comment: The paper investigates structural patterns of knowledge in LLMs from a graph perspective, which aligns with foundational research in understanding LLM behavior.
Emergence of Hebbian Dynamics in Regularized Non-Local Learners - Score: 16 (R=8, N=8) - Date: 2025-05-26 - Comment: The paper establishes a connection between SGD and Hebbian learning, which is relevant to emerging trends in learning dynamics.
Creative Preference Optimization - Score: 16 (R=8, N=8) - Date: 2025-05-21 - Comment: The paper proposes Creative Preference Optimization to enhance creativity in LLMs, which aligns with foundational research in LLMs by introducing a novel alignment method for creativity.
Unlearning vs. Obfuscation: Are We Truly Removing Knowledge? - Score: 16 (R=8, N=8) - Date: 2025-05-07 - Comment: The paper introduces a novel unlearning method for LLMs, which aligns with emerging trends in ethical AI and foundational aspects of LLM behavior.
HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking - Score: 16 (R=8, N=8) - Date: 2025-05-07 - Comment: The paper proposes a novel reasoning paradigm for LLMs using hierarchical planning, which aligns with foundational research on improving LLM reasoning capabilities.
ParamΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost - Score: 16 (R=8, N=8) - Date: 2025-05-01 - Comment: The paper proposes a novel method, ParamΔ, for post-training large language models at zero cost, which aligns with the 'Large Language Models' criterion by introducing a new perspective on leveraging model weights without additional training.
Bayesian Neural Scaling Laws Extrapolation with Prior-Fitted Networks - Score: 15 (R=8, N=7) - Date: 2025-05-31 - Comment: The paper explores Bayesian extrapolation of neural scaling laws, which is relevant to emerging trends in foundational model research.
Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper argues for a reassessment of uncertainty quantification in LLMs, proposing new research directions, which aligns with theoretical insights into LLM behavior.
Understanding (Un)Reliability of Steering Vectors in Language Models - Score: 15 (R=8, N=7) - Date: 2025-05-29 - Comment: The paper studies the reliability of steering vectors in language models, which is relevant to foundational research in LLMs.
Editing as Unlearning: Are Knowledge Editing Methods Strong Baselines for Large Language Model Unlearning? - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper explores the connection between knowledge editing and unlearning in LLMs, which is relevant to theoretical insights into LLM behavior.
Statistical inference for Linear Stochastic Approximation with Markovian Noise - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper provides a theoretical analysis of statistical inference for linear stochastic approximation under Markovian noise, which aligns with emerging trends in foundational research.
When Models Don't Collapse: On the Consistency of Iterative MLE - Score: 15 (R=8, N=7) - Date: 2025-05-27 - Comment: The paper provides theoretical insights into iterative MLE and model collapse, which aligns with emerging trends and foundational research.
New Tight Bounds for SGD without Variance Assumption: A Computer-Aided Lyapunov Analysis - Score: 15 (R=8, N=7) - Date: 2025-05-26 - Comment: The paper provides new theoretical bounds for SGD without a variance assumption, aligning with emerging trends in theoretical work.
Scalable Valuation of Human Feedback through Provably Robust Model Alignment - Score: 15 (R=8, N=7) - Date: 2025-05-26 - Comment: The paper proposes a new alignment loss for language models with a provable redescending property, which is relevant to large language models and theoretical insights into model behavior.
Understanding Pre-training and Fine-tuning from Loss Landscape Perspectives - Score: 15 (R=8, N=7) - Date: 2025-05-26 - Comment: The paper provides insights into the loss landscape of LLMs, relevant to understanding pretraining and fine-tuning dynamics.
InfiFPO: Implicit Model Fusion via Preference Optimization in Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-05-21 - Comment: The paper proposes InfiFPO, a preference optimization method for implicit model fusion in LLMs, which aligns with foundational research in LLMs by exploring model fusion techniques.
Learning by solving differential equations - Score: 15 (R=8, N=7) - Date: 2025-05-20 - Comment: The paper explores the use of higher-order ODE solvers in deep learning, which is a novel approach to improving training dynamics in neural networks.
On the $O(\frac{\sqrt{d}}{K^{1/4}})$ Convergence Rate of AdamW Measured by $\ell_1$ Norm - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper provides a theoretical analysis of the convergence rate of AdamW, which is relevant to understanding training dynamics in neural networks.
Revisiting Stochastic Approximation and Stochastic Gradient Descent - Score: 15 (R=8, N=7) - Date: 2025-05-19 - Comment: The paper provides new theoretical insights into stochastic approximation and SGD, which are foundational topics in optimization and machine learning.
Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach - Score: 15 (R=8, N=7) - Date: 2025-05-07 - Comment: The paper addresses calibration issues in LLMs and proposes a novel fine-tuning approach, aligning with the 'Large Language Models' criterion for foundational insights into model behavior.