
Personalized Monthly Topic Summary 2025/02

Metric	Value
Total Papers	323
Architecture and Training Dynamics	87
Efficiency, Compression, and Large-Scale Training	142
Representation Learning Theory and Structure	92
Memory Structures and Agent Memory Systems	0
World Models, Exploration, and Open-Ended Reinforcement Learning	2

Architecture and Training Dynamics (87)

Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization - Score: 19 (R=10, N=9) - Date: 2025-02-27 - Comment: Proposes Drop-Upcycling for training sparse Mixture of Experts (MoE) models, directly aligning with the 'Model Architecture' and 'Model Compression' criteria.
Tight Clusters Make Specialized Experts - Score: 19 (R=10, N=9) - Date: 2025-02-24 - Comment: The paper proposes an Adaptive Clustering router for Sparse Mixture-of-Experts (MoE), directly addressing foundational aspects of MoE architectures and improving their robustness and performance.
MoM: Linear Sequence Modeling with Mixture-of-Memories - Score: 19 (R=10, N=9) - Date: 2025-02-20 - Comment: The paper proposes Mixture-of-Memories (MoM), a novel architecture for linear sequence modeling inspired by neuroscience, which aligns with the model architecture criterion and introduces a new paradigm.
MeMo: Towards Language Models with Associative Memory Mechanisms - Score: 19 (R=10, N=9) - Date: 2025-02-19 - Comment: The paper proposes a novel architecture, MeMo, with associative memory mechanisms for LLMs, which aligns with the model architecture criterion by introducing a new paradigm for memorization and transparency.
In-context denoising with one-layer transformers: connections between attention and associative memory retrieval - Score: 19 (R=10, N=9) - Date: 2025-02-10 - Comment: Explores connections between attention mechanisms and associative memory in transformers within a theoretical framework, linking strongly to foundational representation learning and transformer behaviors.
Strassen Attention: Unlocking Compositional Abilities in Transformers Based on a New Lower Bound Method - Score: 19 (R=10, N=9) - Date: 2025-02-04 - Comment: This paper introduces 'Strassen attention' as a scalable mechanism addressing the limitations of current attention mechanisms, making significant contributions to Transformer architecture research.
CAMEx: Curvature-aware Merging of Experts - Score: 18 (R=10, N=8) - Date: 2025-02-27 - Comment: The paper introduces CAMEx, a novel curvature-aware merging protocol for Mixture-of-Experts (MoE) models, which aligns closely with the 'Model Architecture' and 'Representation Learning' criteria. It provides theoretical and empirical insights into expert merging, improving optimization and generalization.
BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference - Score: 18 (R=10, N=8) - Date: 2025-02-25 - Comment: BigMac introduces a communication-efficient MoE structure, directly aligning with architectural innovations in MoE and efficiency improvements.
A fast convergence algorithm based on binary integer programming for expert load balancing in MoE LLMs - Score: 18 (R=10, N=8) - Date: 2025-02-24 - Comment: The paper proposes a binary integer programming-based algorithm for expert load balancing in MoE architectures, directly addressing a key challenge in MoE training and efficiency.
LESA: Learnable LLM Layer Scaling-Up - Score: 18 (R=10, N=8) - Date: 2025-02-20 - Comment: LESA proposes a learnable method for scaling up LLM layers, which directly addresses architectural innovations and efficiency in LLM training.
MoBA: Mixture of Block Attention for Long-Context LLMs - Score: 18 (R=10, N=8) - Date: 2025-02-20 - Comment: The paper introduces Mixture of Block Attention (MoBA), which applies Mixture of Experts (MoE) principles to attention mechanisms in LLMs. This aligns closely with the 'Model Architecture' and 'Large Language Models' criteria, focusing on architectural innovation and efficiency improvements.
Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models - Score: 18 (R=10, N=8) - Date: 2025-02-19 - Comment: The paper introduces MoE-specific knowledge distillation methods, which directly align with the Mixture-of-Experts (MoE) topic and provide novel insights into leveraging non-activated experts.
Accurate Expert Predictions in MoE Inference via Cross-Layer Gate - Score: 18 (R=10, N=8) - Date: 2025-02-19 - Comment: The paper focuses on improving MoE inference efficiency through cross-layer gating and caching strategies, which directly aligns with the topic of Mixture-of-Experts and model efficiency.
Mixture of Tunable Experts - Behavior Modification of DeepSeek-R1 at Inference Time - Score: 18 (R=10, N=8) - Date: 2025-02-18 - Comment: The paper focuses on Mixture-of-Experts (MoE) and provides insights into the behavior and control of specific experts in LLMs, aligning closely with the 'Model Architecture' and 'Representation Learning' criteria.
Mixture of Decoupled Message Passing Experts with Entropy Constraint for General Node Classification - Score: 18 (R=10, N=8) - Date: 2025-02-13 - Comment: The paper proposes a Mixture-of-Experts (MoE) framework for node classification, which is highly relevant to model architecture and MoE research. The entropy constraint adds a novel perspective to MoE design.
Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline - Score: 18 (R=10, N=8) - Date: 2025-02-13 - Comment: The paper proposes Klotski, an efficient MoE inference engine, which directly aligns with the core topic of Mixture-of-Experts and efficiency improvements.
Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach - Score: 18 (R=10, N=8) - Date: 2025-02-13 - Comment: The paper addresses robustness in Mixture of Experts (MoE) models, which directly aligns with the model architecture criterion. The dual-model approach and robustness bounds are novel contributions.
MoENAS: Mixture-of-Expert based Neural Architecture Search for jointly Accurate, Fair, and Robust Edge Deep Neural Networks - Score: 18 (R=10, N=8) - Date: 2025-02-12 - Comment: The paper introduces MoENAS, a Mixture-of-Experts-based NAS method for edge DNNs, which aligns with architectural innovations and MoE research. It also addresses fairness and robustness, adding to its relevance.
MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing - Score: 18 (R=10, N=8) - Date: 2025-02-11 - Comment: The paper proposes MoETuner, an optimization framework for Mixture-of-Experts (MoE) models, directly addressing architectural challenges like token routing and load balancing, which is highly relevant to model architecture innovations.
LM2: Large Memory Models - Score: 18 (R=10, N=8) - Date: 2025-02-11 - Comment: The LM2 paper proposes a memory-augmented Transformer architecture, which is highly relevant to architectural innovations in LLMs and explores memory modules for enhanced reasoning capabilities.
Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient - Score: 18 (R=10, N=8) - Date: 2025-02-10 - Comment: Analyzes joint scaling laws for memory-efficient MoE models, directly addressing theoretical and computational efficiency, which is highly relevant to 'Mixture of Experts' and architectural principles.
Scaling Laws for Upcycling Mixture-of-Experts Language Models - Score: 18 (R=10, N=8) - Date: 2025-02-06 - Comment: Explores scaling laws for upcycling LLMs into MoE models, offering empirical insights into scaling efficiency. This aligns well with MoE-related architectural research and compression topics, particularly training efficiency.
Algebraic Machine Learning: Learning as computing an algebraic decomposition of a task - Score: 18 (R=9, N=9) - Date: 2025-02-28 - Comment: The paper proposes a novel algebraic foundation for machine learning, which is a cutting-edge theoretical contribution and aligns with emerging trends in foundational research.
General Reasoning Requires Learning to Reason from the Get-go - Score: 18 (R=9, N=9) - Date: 2025-02-27 - Comment: The paper discusses disentangling reasoning and knowledge in LLMs, aligning with 'Large Language Models' as it proposes foundational changes to pretraining and reasoning paradigms. The focus on reasoning priors and curriculum learning adds significant novelty.
Mechanistic PDE Networks for Discovery of Governing Equations - Score: 18 (R=9, N=9) - Date: 2025-02-26 - Comment: The paper proposes Mechanistic PDE Networks for discovering governing equations, which aligns with foundational research in AI for Science and introduces a novel architecture.
Towards Physics-Guided Foundation Models - Score: 18 (R=9, N=9) - Date: 2025-02-24 - Comment: The paper introduces the concept of physics-guided foundation models, which aligns with the 'AI for Science' criterion by proposing a new paradigm integrating physical knowledge into foundation models.
Independence Tests for Language Models - Score: 18 (R=9, N=9) - Date: 2025-02-19 - Comment: The paper introduces statistical tests for determining independence between model weights, which is a novel and foundational contribution to understanding model training dynamics.
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving - Score: 18 (R=9, N=9) - Date: 2025-02-12 - Comment: The paper introduces Goedel-Prover, a state-of-the-art LLM for automated theorem proving. It aligns with foundational research in LLMs, particularly in advancing their capabilities and training methodologies.
ECM: A Unified Electronic Circuit Model for Explaining the Emergence of In-Context Learning and Chain-of-Thought in Large Language Model - Score: 18 (R=9, N=9) - Date: 2025-02-06 - Comment: Proposes a novel explanatory framework for LLM dynamics (ICL and CoT) and models them analogously to electronic circuits. This aligns closely with theoretical studies on LLMs and is quite innovative in its formulation.
RiemannGFM: Learning a Graph Foundation Model from Riemannian Geometry - Score: 18 (R=9, N=9) - Date: 2025-02-06 - Comment: Proposes a foundational graph model drawing from Riemannian geometry and structural vocabulary, aligning well with model architecture and generalization across domains. Very novel in approach.
Hamming Attention Distillation: Binarizing Keys and Queries for Efficient Long-Context Transformers - Score: 18 (R=9, N=9) - Date: 2025-02-05 - Comment: The paper introduces a novel framework for binarizing keys and queries in transformer attention, focusing on compression and efficiency improvements.
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts - Score: 17 (R=9, N=8) - Date: 2025-02-28 - Comment: The paper proposes a test-time re-routing method for multimodal mixture-of-experts (MoE), which aligns well with the model architecture criterion, particularly for MoE innovations.
Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-28 - Comment: The paper identifies symbolic mechanisms in LLMs for abstract reasoning, aligning with the large language models criterion.
Finite State Automata Inside Transformers with Chain-of-Thought: A Mechanistic Study on State Tracking - Score: 17 (R=9, N=8) - Date: 2025-02-28 - Comment: The paper provides a mechanistic study of state tracking in Transformers with Chain-of-Thought, offering insights into model behavior and architecture, aligning with foundational research.
Forward-Cooperation-Backward (FCB) learning in a Multi-Encoding Uni-Decoding neural network architecture - Score: 17 (R=9, N=8) - Date: 2025-02-28 - Comment: The paper introduces a novel learning paradigm (Forward-Cooperation-Backward) and a new architecture (Multi-Encoding Uni-Decoding) with lateral synaptic connections, which aligns with the 'Model Architecture' criterion for architectural innovations.
HDEE: Heterogeneous Domain Expert Ensemble - Score: 17 (R=9, N=8) - Date: 2025-02-27 - Comment: The paper proposes HDEE, a heterogeneous domain expert ensemble, which aligns with the 'Model Architecture' criterion by exploring ensemble methods with domain-specific heterogeneity. It provides insights into efficient training and evaluation.
Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond - Score: 17 (R=9, N=8) - Date: 2025-02-27 - Comment: The paper introduces a gradient-based framework for LLM unlearning, which provides foundational insights into model behavior and optimization.
The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training - Score: 17 (R=9, N=8) - Date: 2025-02-27 - Comment: Proposes a novel blockwise learning rate strategy for Transformers, aligning with 'Large Language Models' and providing theoretical insights into training dynamics.
(Mis)Fitting: A Survey of Scaling Laws - Score: 17 (R=9, N=8) - Date: 2025-02-27 - Comment: The paper surveys scaling laws in foundation models, which is highly relevant to understanding LLM behavior and training dynamics.
Fourier Multi-Component and Multi-Layer Neural Networks: Unlocking High-Frequency Potential - Score: 17 (R=9, N=8) - Date: 2025-02-27 - Comment: Introduces a novel neural network architecture (FMMNN) with theoretical insights into its expressive power and optimization landscape, aligning with the 'Model Architecture' criterion.
A Theoretical Perspective: How to Prevent Model Collapse in Self-consuming Training Loops - Score: 17 (R=9, N=8) - Date: 2025-02-27 - Comment: This paper provides a theoretical analysis of Self-consuming Training Loops (STLs), addressing model collapse and recursive stability. It offers insights into the interplay between model architecture and data composition, which aligns with foundational research in model training dynamics and architecture behavior. The extension to transformers and in-context learning adds further relevance.
Graded Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-02-26 - Comment: The paper introduces graded neural networks, which propose a novel architectural framework with theoretical underpinnings, aligning with model architecture innovations.
Reasoning with Latent Thoughts: On the Power of Looped Transformers - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper introduces looped transformers for reasoning tasks and connects them to CoT reasoning, aligning with 'Model Architecture' and 'Large Language Models' criteria.
Neural Attention: A Novel Mechanism for Enhanced Expressive Power in Transformer Models - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper proposes Neural Attention as an enhancement to transformer models, which aligns with architectural innovations in transformers.
Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper presents Erwin, a hierarchical transformer for large-scale physical systems, combining tree-based algorithms with attention mechanisms. This aligns with the Model Architecture criterion, particularly in architectural innovations for scalability.
Muon is Scalable for LLM Training - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper shows that the Muon optimizer scales to LLM training and demonstrates its application in training a Mixture-of-Experts (MoE) model. This aligns closely with the 'Model Architecture' and 'Model Compression' criteria due to its focus on MoE and computational efficiency.
Entropy-Lens: The Information Signature of Transformer Computations - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper introduces an entropy-based framework to analyze transformer computations, aligning with foundational research in understanding LLM behavior and interpretability.
Linear Attention for Efficient Bidirectional Sequence Modeling - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper introduces LION, a framework for linear attention in bidirectional sequence modeling, which aligns with model architecture innovations and provides theoretical foundations for efficient transformers.
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper demonstrates that a linear decay-to-zero learning rate schedule outperforms other schedules for LLM training, aligning with 'Large Language Models' and 'Training Dynamics' criteria.
Ray-Tracing for Conditionally Activated Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: This paper introduces a novel hierarchical Mixture of Experts (MoE) architecture with dynamic activation, which is highly relevant to model architecture innovations and efficiency improvements.
Multiscale Byte Language Models -- A Hierarchical Architecture for Causal Million-Length Sequence Modeling - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: The paper introduces a hierarchical architecture for byte-level sequence modeling, which aligns with foundational research in model architecture and efficiency.
Which Attention Heads Matter for In-Context Learning? - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: This paper investigates the mechanisms behind in-context learning in LLMs, focusing on the role of specific attention heads. It provides theoretical insights into LLM behavior and training dynamics.
How Do LLMs Perform Two-Hop Reasoning in Context? - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: This paper provides theoretical insights into the training dynamics of transformers for two-hop reasoning, which aligns with understanding training dynamics and interpretability in LLMs.
Unraveling the Localized Latents: Learning Stratified Manifold Structures in LLM Embedding Space with Sparse Mixture-of-Experts - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: The paper investigates stratified manifold structures in LLM embedding spaces using a sparse Mixture-of-Experts (MoE) model, which aligns with representation learning and MoE analysis.
Neural Attention Search - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: The paper introduces Neural Attention Search (NAtS), a framework for reducing KV cache sizes in transformers, aligning with the 'Model Compression' and 'Model Architecture' criteria.
RingFormer: Rethinking Recurrent Transformer with Adaptive Level Signals - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: The paper introduces RingFormer, a recurrent Transformer with parameter-sharing and low-rank matrices, which aligns with the model architecture criterion and offers a novel approach to efficiency.
Understanding Generalization in Transformers: Error Bounds and Training Dynamics Under Benign and Harmful Overfitting - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The paper develops a generalization theory for transformers, addressing error bounds and training dynamics under overfitting scenarios. This aligns with foundational research on model architecture and training dynamics.
Zero Token-Driven Deep Thinking in LLMs: Unlocking the Full Potential of Existing Parameters via Cyclic Refinement - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The Zero Token Transformer introduces architectural innovations like parameter cycling and zero-token mechanisms, which align with the model architecture criterion.
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The paper introduces MUDD connections to improve Transformers, which is highly relevant to architectural innovations. The dynamic dense connections and their impact on efficiency are novel contributions.
LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: Provides theoretical insights into loss-to-loss scaling laws for LLMs, deeply relevant for foundational research into training dynamics and scalability.
Atom of Thoughts for Markov LLM Test-Time Scaling - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper introduces Atom of Thoughts (AoT) for test-time scaling in LLMs, which aligns with theoretical insights into LLM behavior. The Markovian reasoning framework adds methodological novelty.
Exact Upper and Lower Bounds for the Output Distribution of Neural Networks with Random Inputs - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper derives exact bounds for the output distribution of neural networks with stochastic inputs, which is foundational in terms of theoretical contributions to neural network behavior.
Approximation of Permutation Invariant Polynomials by Transformers: Efficient Construction in Column-Size - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The study explores approximation capabilities of transformers concerning column-symmetric polynomials, advancing theoretical understanding of model expressivity.
Statistical Query Hardness of Multiclass Linear Classification with Random Classification Noise - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper provides theoretical insights into the complexity of multiclass linear classification with random noise, which aligns with 'Emerging Trends' through its foundational focus.
Teleportation With Null Space Gradient Projection for Optimization Acceleration - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper introduces a novel optimization technique for advanced architectures like Transformers, aligning with 'Model Architecture' and 'Emerging Trends'.
The underlying structures of self-attention: symmetry, directionality, and emergent dynamics in Transformer training - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper provides a mathematical framework to analyze self-attention matrices, which aligns with foundational research on Transformer architectures and their training dynamics.
Spectral Journey: How Transformers Predict the Shortest Path - Score: 17 (R=9, N=8) - Date: 2025-02-14 - Comment: The paper studies how transformers predict shortest paths and provides insights into their internal representations, which aligns with foundational research into model behavior and architecture.
Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper introduces a novel approach to enhance Chain-of-Thought reasoning in LLMs using loop-aligned reasoning, contributing to foundational research on reasoning dynamics in LLMs.
LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: LASP-2 proposes a new sequence parallelism method for linear attention, which aligns with architectural innovations and efficiency improvements in transformer models.
Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper addresses training misalignment in LLMs for mathematical reasoning, proposing a novel loss function to improve test-time performance. This aligns with the criterion of theoretical insights into LLM behavior.
Revisiting Non-Acyclic GFlowNets in Discrete Environments - Score: 17 (R=9, N=8) - Date: 2025-02-12 - Comment: The paper revisits non-acyclic GFlowNets and provides theoretical insights, which align with emerging trends and foundational research in generative models.
LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation - Score: 17 (R=9, N=8) - Date: 2025-02-12 - Comment: The paper proposes a method to mitigate performance degradation in LLMs with extended context windows, focusing on theoretical insights into distribution drift and catastrophic forgetting. This aligns with the interest in foundational research on LLM behavior.
Enabling Autoregressive Models to Fill In Masked Tokens - Score: 17 (R=9, N=8) - Date: 2025-02-12 - Comment: The paper introduces MARIA, a novel architecture combining MLM and AR models for masked infilling, which aligns with foundational research in model architecture and LLM behavior.
A Multimodal PDE Foundation Model for Prediction and Scientific Text Descriptions - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper introduces a multimodal PDE foundation model integrating numerical and text modalities, which aligns with foundational research in AI for science and architecture-level innovations.
"Let the AI conspiracy begin..." Language Model coordination is just one inference-intervention away - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper introduces a novel method for steering LLM behavior by targeting specific attention heads, which aligns with foundational research into LLM interpretability and behavior.
MoFM: A Large-Scale Human Motion Foundation Model - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper introduces a motion foundation model (MoFM) inspired by LLMs, which aligns with foundational model architecture innovations and emerging trends in foundation models.
Deep Generative Models with Hard Linear Equality Constraints - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper proposes a probabilistic approach to enforce hard constraints in deep generative models, which aligns with foundational innovations in generative modeling.
Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper proposes a novel MoE-based approach for hierarchical meta-learning in dynamical system reconstruction, directly aligning with the 'Model Architecture' criterion and offering insights into MoE behavior.
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach - Score: 17 (R=9, N=8) - Date: 2025-02-10 - Comment: Proposes a novel recurrent depth mechanism for latent reasoning, an architectural innovation that is relevant and potentially foundational for test-time computation scaling.
Rank Also Matters: Hierarchical Configuration for Mixture of Adapter Experts in LLM Fine-Tuning - Score: 17 (R=9, N=8) - Date: 2025-02-07 - Comment: The paper proposes HILO, a hierarchical configuration for adapter experts and their rank in Mixture of Experts (MoE) fine-tuning in LLMs. This directly addresses architectural innovations and MoE-related efficiency improvements.
On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation - Score: 17 (R=9, N=8) - Date: 2025-02-06 - Comment: Focuses on zero-initialized attention and its theoretical ties to mixture-of-experts (MoE) models, investigating optimal prompts and gating factors. Provides both theoretical insights and experiments, aligning with the architectural and representation-learning criteria.
ReGNet: Reciprocal Space-Aware Long-Range Modeling and Multi-Property Prediction for Crystals - Score: 17 (R=9, N=8) - Date: 2025-02-06 - Comment: ReGNet introduces novel architectural concepts combining GNNs and Fourier-based reciprocal filters, along with an innovative MoE extension, aligning closely with model architecture advancements.
MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: Presents new methods for merging homogeneous and heterogeneous MoEs, directly aligning with the model architecture criterion, specifically innovations in Mixture-of-Experts.
Spectro-Riemannian Graph Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: Introduces a novel graph neural network framework unifying spectral and curvature signals, which aligns with architectural innovation and foundational model design.
Beyond Limited Data: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: The paper introduces an iterative self-play framework for theorem proving with LLMs, which aligns with foundational insights into training dynamics and use of LLMs for novel tasks.
Scaling Laws for Differentially Private Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: Examines scaling laws under differential privacy for LLMs, providing foundational insights into compute-privacy-utility tradeoffs, aligning well with the criterion for LLM theoretical contributions.
E2Former: A Linear-time Efficient and Equivariant Transformer for Scalable Molecular Modeling - Score: 17 (R=9, N=8) - Date: 2025-02-03 - Comment: The E2Former introduces a novel efficient and equivariant transformer architecture with significant computational speedups, aligning well with the 'Model Architecture' criterion and particularly Transformer-based innovations.

Efficiency, Compression, and Large-Scale Training (142)

Compression Scaling Laws: Unifying Sparsity and Quantization - Score: 19 (R=10, N=9) - Date: 2025-02-25 - Comment: The paper investigates compression scaling laws, unifying sparsity and quantization under a common framework, which directly aligns with model compression and provides theoretical insights.
PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models - Score: 19 (R=10, N=9) - Date: 2025-02-20 - Comment: The paper introduces a novel post-training quantization method for LLMs, achieving extremely low-bit quantization (1.61-bit) with innovative preprocessing and optimization techniques. This directly aligns with the 'Model Compression' criterion, particularly in quantization.
NestQuant: Nested Lattice Quantization for Matrix Products and LLMs - Score: 19 (R=10, N=9) - Date: 2025-02-20 - Comment: The paper introduces a novel quantization scheme (NestQuant) for LLMs, achieving state-of-the-art results in low-bit quantization. This directly aligns with the 'Model Compression' criterion, particularly in quantization.
HACK: Homomorphic Acceleration via Compression of the Key-Value Cache for Disaggregated LLM Inference - Score: 19 (R=10, N=9) - Date: 2025-02-07 - Comment: HACK introduces a compression framework for KV cache in disaggregated LLM inference, directly tackling model compression and efficiency-related challenges in LLM architecture.
TeZO: Empowering the Low-Rankness on the Temporal Dimension in the Zeroth-Order Optimization for Fine-tuning LLMs - Score: 19 (R=10, N=9) - Date: 2025-02-03 - Comment: The paper explores a novel method for memory-efficient fine-tuning of LLMs by leveraging low-rankness across temporal dimensions and employing Canonical Polyadic Decomposition (CPD). This is closely tied to the 'Model Compression' criterion.
Delta Decompression for MoE-based LLMs Compression - Score: 18 (R=10, N=8) - Date: 2025-02-25 - Comment: The paper focuses on a novel compression method for MoE-based LLMs, aligning with the 'Model Compression' criterion. It introduces delta decompression and low-rank SVD techniques, which are foundational contributions.
DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance - Score: 18 (R=10, N=8) - Date: 2025-02-25 - Comment: The paper proposes a novel KV cache compression method for LLMs, which directly aligns with model compression and efficiency breakthroughs.
Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression - Score: 18 (R=10, N=8) - Date: 2025-02-25 - Comment: The paper introduces a framework for joint structured pruning and quantization, which aligns with foundational research in model compression.
BaKlaVa -- Budgeted Allocation of KV cache for Long-context Inference - Score: 18 (R=10, N=8) - Date: 2025-02-20 - Comment: The paper introduces BaKlaVa, a method for optimizing KV-cache memory allocation in LLMs, which directly addresses model compression and efficiency in LLM inference.
RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models - Score: 18 (R=10, N=8) - Date: 2025-02-14 - Comment: The paper proposes a quantization-aware fine-tuning approach for LLMs, which is highly relevant to model compression and efficiency.
AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference - Score: 18 (R=10, N=8) - Date: 2025-02-07 - Comment: AttentionPredictor offers a learning-based method for KV cache compression by predicting attention scores, advancing efficiency techniques for LLMs. This is relevant to model compression and efficiency breakthroughs.
Choose Your Model Size: Any Compression by a Single Gradient Descent - Score: 18 (R=10, N=8) - Date: 2025-02-05 - Comment: ACIP provides a novel approach to model compression that needs only a single gradient descent run and utilizes sparsity and low-rank techniques, which directly matches the model compression criterion.
AdaSVD: Adaptive Singular Value Decomposition for Large Language Models - Score: 18 (R=10, N=8) - Date: 2025-02-04 - Comment: The paper focuses on model compression techniques for LLMs using adaptive SVD, aligning closely with the relevance criteria of sparsity, low-rank approaches, and theoretical efficiency breakthroughs.
MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization - Score: 18 (R=10, N=8) - Date: 2025-02-04 - Comment: Proposes a quantization framework (MQuant) for multimodal LLMs, addressing efficiency challenges. This directly aligns with model compression and quantization strategies.
Low-Rank Adapting Models for Sparse Autoencoders - Score: 18 (R=10, N=8) - Date: 2025-02-03 - Comment: This paper attempts to improve sparse autoencoders by combining low-rank adaptation with interpretability-driven design, making it directly relevant to topics like sparsity, low-rank techniques, and sparse autoencoders.
Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models - Score: 18 (R=10, N=8) - Date: 2025-02-03 - Comment: This paper addresses KV cache compression in LLMs, explicitly aligning with the model compression criterion. The introduction of AQUA-KV for adaptive quantization is relevant and demonstrates novel efficiency improvements.
Norm-Bounded Low-Rank Adaptation - Score: 18 (R=10, N=8) - Date: 2025-02-03 - Comment: Proposes NB-LoRA for parameter-efficient fine-tuning with norm bounds on adaptation matrices. This paper aligns closely with model compression, particularly low-rank adaptation techniques, making it highly relevant.
Compression Barriers for Autoregressive Transformers - Score: 18 (R=9, N=9) - Date: 2025-02-25 - Comment: The paper provides theoretical insights into the compression barriers for autoregressive Transformers, directly addressing model compression and efficiency.
A General Error-Theoretical Analysis Framework for Constructing Compression Strategies - Score: 18 (R=9, N=9) - Date: 2025-02-25 - Comment: The paper introduces a theoretical framework for constructing compression strategies, which aligns with foundational research in model compression and efficiency.
Bitnet.cpp: Efficient Edge Inference for Ternary LLMs - Score: 18 (R=9, N=9) - Date: 2025-02-18 - Comment: Introduces Bitnet.cpp, a system enabling efficient edge inference for ternary LLMs, which is directly relevant to model compression and efficiency topics.
On the Emergence of Thinking in LLMs I: Searching for the Right Intuition - Score: 18 (R=9, N=9) - Date: 2025-02-11 - Comment: The paper explores a novel RL-based framework for reasoning in LLMs, which aligns with theoretical insights into LLM behavior and introduces emergent reasoning capabilities.
Algorithmic causal structure emerging through compression - Score: 18 (R=9, N=9) - Date: 2025-02-07 - Comment: The paper links causality and compression through algorithmic complexity, which relates to compression and theoretical insights into causality in AI. It introduces novel perspectives and foundational insights.
Toward Neurosymbolic Program Comprehension - Score: 18 (R=9, N=9) - Date: 2025-02-05 - Comment: The paper advocates for neurosymbolic research blending DL and symbolic methods, introducing an emerging trend challenging the parameter-heavy model paradigm.
Longer Attention Span: Increasing Transformer Context Length with Sparse Graph Processing Techniques - Score: 18 (R=9, N=9) - Date: 2025-02-05 - Comment: Proposes sparse graph processing techniques to increase Transformer context length, aligning well with efficiency breakthroughs and sparsity advancements in transformers.
Pushing the Limits of BFP on Narrow Precision LLM Inference - Score: 18 (R=9, N=9) - Date: 2025-02-04 - Comment: The paper proposes hardware-efficient optimizations using a block floating point (BFP) framework for narrow-precision LLM inference, providing novel insights into compression techniques.
Brain-inspired sparse training enables Transformers and LLMs to perform as fully connected - Score: 18 (R=9, N=9) - Date: 2025-02-03 - Comment: The use of brain-inspired sparse training and dynamic sparse connectivity in transformer-based models is directly relevant to sparsity and model compression, fitting well with foundational contributions in efficiency and architecture.
Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge - Score: 17 (R=9, N=8) - Date: 2025-02-28 - Comment: The paper proposes Layer-Aware Task Arithmetic (LATA) to disentangle task-specific and instruction-following knowledge in LLMs, which aligns with foundational insights into LLM behavior.
Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts - Score: 17 (R=9, N=8) - Date: 2025-02-28 - Comment: The paper introduces COMET, a fine-grained communication-computation overlapping system for MoE, which aligns with architectural efficiency improvements in MoE systems.
HALO: Hardware-aware quantization with low critical-path-delay weights for LLM acceleration - Score: 17 (R=9, N=8) - Date: 2025-02-28 - Comment: The paper introduces a hardware-aware quantization framework (HALO) for LLM acceleration, aligning with the model compression criterion.
LORENZA: Enhancing Generalization in Low-Rank Gradient LLM Training via Efficient Zeroth-Order Adaptive SAM - Score: 17 (R=9, N=8) - Date: 2025-02-28 - Comment: The paper introduces a low-rank gradient optimization method (LORENZA) and provides theoretical insights into its efficiency for LLMs, aligning with the model compression criterion.
Global law of conjugate kernel random matrices with heavy-tailed weights - Score: 17 (R=9, N=8) - Date: 2025-02-26 - Comment: The paper studies the spectral behavior of kernel random matrices with heavy-tailed weights, which provides theoretical insights into neural network training dynamics and aligns with foundational research.
AMPO: Active Multi-Preference Optimization - Score: 17 (R=9, N=8) - Date: 2025-02-26 - Comment: The paper proposes a novel multi-preference optimization framework for LLM alignment, which aligns with foundational research in LLM training and optimization techniques.
Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations - Score: 17 (R=9, N=8) - Date: 2025-02-26 - Comment: The paper introduces Jacobian Sparse Autoencoders (JSAEs) to sparsify computations in LLMs, which aligns with foundational research in representation learning and sparsity. It also provides efficient methods for computing Jacobians.
SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference - Score: 17 (R=9, N=8) - Date: 2025-02-26 - Comment: The paper introduces SpargeAttn, a universal sparse attention mechanism, which aligns with foundational research in model compression and sparse attention techniques.
Optimal Brain Apoptosis - Score: 17 (R=9, N=8) - Date: 2025-02-26 - Comment: The paper introduces a novel pruning method, Optimal Brain Apoptosis (OBA), which advances parameter importance estimation using the Hessian matrix. This aligns closely with the model compression criterion, particularly in pruning and efficiency improvements.
C-LoRA: Continual Low-Rank Adaptation for Pre-trained Models - Score: 17 (R=9, N=8) - Date: 2025-02-26 - Comment: The paper proposes C-LoRA, a novel extension of Low-Rank Adaptation (LoRA) for continual learning, which aligns with the model compression and efficiency criteria. The use of a learnable routing matrix for task adaptation is a significant methodological contribution.
PICASO: Permutation-Invariant Context Composition with State Space Models - Score: 17 (R=9, N=8) - Date: 2025-02-26 - Comment: The paper introduces a novel method for permutation-invariant context composition using state space models, which aligns with foundational research in efficient context representation for LLMs.
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve? - Score: 17 (R=9, N=8) - Date: 2025-02-26 - Comment: The paper discusses the Lottery LLM Hypothesis, which is highly relevant to model compression and foundational insights into LLM capabilities.
CoKV: Optimizing KV Cache Allocation via Cooperative Game - Score: 17 (R=9, N=8) - Date: 2025-02-26 - Comment: The paper proposes CoKV, a novel method for optimizing KV cache allocation in LLMs using cooperative game theory. This directly addresses efficiency and memory challenges in LLMs, aligning with the model compression criterion.
LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper introduces a memory-efficient draft model with a constant-sized KV cache and novel attention methods, which aligns with the model compression and efficiency criteria.
Distributional Scaling Laws for Emergent Capabilities - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper explores emergent capabilities in LLMs and provides theoretical insights into scaling laws and random seed effects, aligning with the 'Large Language Models' criterion.
Low-rank bias, weight decay, and model merging in neural networks - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper explores low-rank structures in neural networks induced by weight decay, which aligns with foundational research in model compression and efficiency.
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper focuses on 4-bit training stability and introduces Stable-SPAM, which aligns with model compression and efficiency breakthroughs.
When Can We Solve the Weighted Low Rank Approximation Problem in Truly Subquadratic Time? - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper focuses on weighted low-rank approximation, which aligns with the model compression topic, particularly low-rank approaches.
DISC: Dynamic Decomposition Improves LLM Inference Scaling - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper introduces a novel method for dynamic decomposition in LLM inference, which aligns with foundational research in efficiency and scaling techniques for LLMs.
Exact Recovery of Sparse Binary Vectors from Generalized Linear Measurements - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper addresses sparse binary vector recovery with theoretical guarantees, which aligns with the sparsity and efficiency criteria in model compression.
Signal Collapse in One-Shot Pruning: When Sparse Models Fail to Distinguish Neural Representations - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper identifies signal collapse in one-shot pruning and proposes a novel method to address it, aligning with model compression and sparsity topics.
Rotate, Clip, and Partition: Towards W2A4KV4 Quantization by Integrating Rotation and Learnable Non-uniform Quantizer - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper introduces RCP, a QAT approach for extreme compression of LLMs, including W2A4KV4 quantization. This aligns with the Model Compression criterion, particularly in advancing quantization techniques.
Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper proposes Cache-Craft, a system for managing and reusing KV caches in RAG-based systems, which aligns with the model compression criterion by addressing efficiency and computational redundancy.
Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing - Score: 17 (R=9, N=8) - Date: 2025-02-24 - Comment: The paper introduces a novel dynamic pruning framework for LLMs, which aligns with model compression and efficiency breakthroughs.
LightThinker: Thinking Step-by-Step Compression - Score: 17 (R=9, N=8) - Date: 2025-02-24 - Comment: The paper introduces LightThinker, a method for compressing intermediate reasoning steps in LLMs, aligning with model compression and efficiency breakthroughs. The approach is novel in its dynamic compression inspired by human cognition.
SVDq: 1.25-bit and 410x Key Cache Compression for LLM Attention - Score: 17 (R=9, N=8) - Date: 2025-02-24 - Comment: This paper presents SVDq, a novel mixed-precision quantization method for KV cache compression in LLMs, achieving significant compression ratios with theoretical and empirical validation. It aligns closely with the model compression criterion.
Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference - Score: 17 (R=9, N=8) - Date: 2025-02-24 - Comment: The paper proposes a novel round-level attention mechanism to reduce KV cache memory usage in LLMs, aligning with the 'Model Compression' criterion by addressing efficiency in inference.
More for Keys, Less for Values: Adaptive KV Cache Quantization - Score: 17 (R=9, N=8) - Date: 2025-02-24 - Comment: The paper proposes KV-AdaQuant, a mixed-precision quantization framework for KV cache in LLMs, with theoretical insights into quantization error propagation. It aligns well with the model compression criterion.
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: Proposes a unified sparse attention framework for efficient LLM serving, addressing both computational and memory efficiency. This aligns well with model compression and sparsity criteria.
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: The paper introduces Multi-head Latent Attention (MLA) and proposes a novel fine-tuning method for transitioning from MHA to MLA, which aligns with the Model Compression criterion due to its focus on KV cache compression and efficiency.
Fundamental Limitations in Defending LLM Finetuning APIs - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: The paper discusses fundamental limitations in defending LLM fine-tuning APIs, providing theoretical insights into LLM security and robustness, which aligns with foundational research in LLM behavior.
Dynamic Low-Rank Sparse Adaptation for Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: Presents a novel method for integrating low-rank adaptation with sparsity in LLMs, addressing efficiency and performance degradation. This aligns closely with model compression and sparsity criteria.
Determining Layer-wise Sparsity for Large Language Models Through a Theoretical Perspective - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: This paper addresses layer-wise sparsity in LLMs, providing a theoretical perspective and a novel sparsity allocation method. It directly aligns with model compression and efficiency breakthroughs.
PEARL: Towards Permutation-Resilient LLMs - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: The paper introduces PEARL, a novel framework for improving LLM robustness to input permutations using distributionally robust optimization. This aligns with foundational research in LLM behavior and training dynamics.
Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: The paper presents a recurrent language model architecture optimized for efficiency, which aligns with the Model Architecture and Model Compression criteria.
Towards Efficient Automatic Self-Pruning of Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: The paper introduces an automatic self-pruning framework for LLMs, which aligns closely with the 'Model Compression' criterion, particularly in pruning and efficiency improvements.
Weighted Low-rank Approximation via Stochastic Gradient Descent on Manifolds - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: The paper addresses weighted low-rank approximation using stochastic gradient descent on manifolds, which is relevant to model compression and efficiency.
RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: The paper introduces a two-stage KV cache compression strategy for LLMs, which is highly relevant to model compression and efficiency improvements in large language models.
MaskPrune: Mask-based LLM Pruning for Layer-wise Uniform Structures - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: The MaskPrune method introduces a novel structured pruning approach for LLMs, focusing on uniformity across layers, which is highly relevant to model compression.
NVR: Vector Runahead on NPUs for Sparse Memory Access - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: NVR addresses cache misses in sparse DNN workloads with a novel prefetching mechanism, aligning with the model compression criterion through its focus on sparsity and hardware efficiency.
On the Duality between Gradient Transformations and Adapters - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: The paper explores the duality between gradient transformations and adapters, providing insights into memory-efficient training. This aligns with the 'Model Compression' criterion, particularly in efficiency improvements.
ETS: Efficient Tree Search for Inference-Time Scaling - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: The paper introduces Efficient Tree Search (ETS), which optimizes KV cache sharing during inference-time scaling, aligning with the model compression criterion through its focus on memory efficiency and algorithmic improvements.
LSR-Adapt: Ultra-Efficient Parameter Tuning with Matrix Low Separation Rank Kernel Adaptation - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: The paper proposes a novel low-separation-rank kernel for parameter-efficient fine-tuning, which aligns with the model compression criterion and introduces a structural innovation.
Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: The paper focuses on improving inference efficiency for long-context LLMs by introducing a novel activation-aware approach for key-value retrieval. This aligns with the 'Model Compression' criterion, specifically in the context of KV cache optimization.
Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: The paper introduces LoRAM, a memory-efficient LoRA training scheme for LLMs, which aligns with the model compression criterion and offers a novel approach to efficiency.
The Self-Improvement Paradox: Can Language Models Bootstrap Reasoning Capabilities without External Scaffolding? - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: The paper explores self-improvement in LLMs, focusing on generating synthetic data autonomously, which aligns with foundational research in LLM behavior and training dynamics.
Electron flow matching for generative reaction mechanism prediction obeying conservation laws - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The paper introduces FlowER, a generative framework for reaction mechanism prediction that enforces conservation laws, aligning with AI for Science by addressing foundational challenges in chemical modeling.
GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The paper proposes a fully quantized training framework for LLM fine-tuning, which aligns with model compression and efficiency topics. It introduces a novel integer-based approach for on-device fine-tuning.
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The HeadInfer method introduces a memory-efficient inference strategy for LLMs by offloading the KV cache head-wise, which aligns with the model compression and efficiency criterion.
Sens-Merging: Sensitivity-Guided Parameter Balancing for Merging Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The paper introduces a sensitivity-guided method for merging LLMs, which aligns with foundational research in LLM architecture and efficiency improvements.
QuZO: Quantized Zeroth-Order Fine-Tuning for Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The paper introduces a novel quantized zeroth-order fine-tuning framework for LLMs, which aligns with the model compression criterion, specifically addressing low-precision training and optimization.
Optimal Brain Iterative Merging: Mitigating Interference in LLM Merging - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The paper proposes a novel method for mitigating interference in LLM merging, which aligns with foundational research in model compression and efficiency.
Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The paper introduces Tactic, a sparse attention mechanism for long-context LLMs, which aligns with the model compression criterion by addressing efficiency in attention mechanisms.
GoRA: Gradient-driven Adaptive Low Rank Adaptation - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The paper proposes GoRA, a novel gradient-driven adaptive low-rank adaptation method, which directly aligns with model compression and efficiency topics, particularly low-rank approaches.
AdaSplash: Adaptive Sparse Flash Attention - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: AdaSplash improves sparse attention mechanisms, directly impacting Transformer efficiency and aligning well with topics like sparsity and low-rank adaptations.
Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper proposes a learning-based system for parallel decoding in LLMs, which aligns with foundational research in efficiency and decoding innovations.
An Efficient Row-Based Sparse Fine-Tuning - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper proposes a sparse fine-tuning framework based on pruning, which is highly relevant to model compression and efficiency research.
Large Language-Geometry Model: When LLM meets Equivariance - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper proposes a novel framework integrating E(3)-equivariance with LLM capabilities for handling 3D physical systems. It introduces architectural innovations aligning with foundational AI for Science.
Efficient Long-Decoding Inference with Reasoning-Aware Attention Sparsity - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper proposes a reasoning-aware attention sparsity method for efficient long-decoding inference, which is highly relevant to foundational research in LLM efficiency and sparsity.
CacheFocus: Dynamic Cache Re-Positioning for Efficient Retrieval-Augmented Generation - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper introduces a novel cache management approach, addressing foundational challenges in model efficiency for long-context LLMs.
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper discusses a hardware-aligned sparse attention mechanism, relevant to 'Model Compression' due to its sparsity and efficiency focus.
CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: Introduces a low-rank activation mechanism to pre-train LLMs more efficiently, aligning with the model compression criterion and foundational improvements in training efficiency.
Weighted quantization using MMD: From mean field to mean shift via gradient flows - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper introduces a novel quantization method using MMD and gradient flows, which aligns with the model compression criterion. The proposed MSIP algorithm and its theoretical grounding add significant novelty.
Dynamic Chain-of-Thought: Towards Adaptive Deep Reasoning - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper proposes a dynamic chain-of-thought reasoning framework, which aligns with foundational research in adaptive reasoning and efficiency in LLMs.
QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: This paper introduces a novel framework for efficient KV cache optimization in LLMs, which is relevant to 'Model Compression'.
Scalable First-order Method for Certifying Optimal k-Sparse GLMs - Score: 17 (R=9, N=8) - Date: 2025-02-14 - Comment: The paper proposes a scalable first-order method for certifying optimality in sparse GLMs, which directly relates to the model compression criterion through its focus on sparsity and efficient optimization techniques.
LoRA Training Provably Converges to a Low-Rank Global Minimum or It Fails Loudly (But it Probably Won't Fail) - Score: 17 (R=9, N=8) - Date: 2025-02-14 - Comment: The paper provides a theoretical analysis of LoRA training dynamics, which aligns with the model compression criterion, specifically low-rank approaches. It offers foundational insights into why LoRA training converges effectively.
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU - Score: 17 (R=9, N=8) - Date: 2025-02-14 - Comment: The paper introduces a novel framework for handling extremely long context lengths in LLMs, addressing efficiency and memory challenges. This aligns with the 'Large Language Models' and 'Model Compression' criteria.
Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation - Score: 17 (R=9, N=8) - Date: 2025-02-14 - Comment: The paper proposes a memory-efficient pruning strategy (Skrr) for text encoders in text-to-image diffusion models, which aligns with model compression and sparsity techniques.
Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper explores emergent value systems in LLMs and proposes a new research agenda called utility engineering. This aligns with the 'Large Language Models (LLMs)' criterion, focusing on theoretical insights into LLM behavior and interpretability.
Scalable Thermodynamic Second-order Optimization - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper proposes a scalable second-order optimization method leveraging thermodynamic computers, which aligns with model efficiency and optimization breakthroughs.
Training-Free Restoration of Pruned Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper proposes a training-free method for restoring pruned neural networks, which aligns with foundational research on model compression and sparsity.
LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper introduces LowRA, a framework for ultra-low-bit LoRA fine-tuning of LLMs, which aligns with the model compression criterion, specifically quantization and efficiency breakthroughs.
On Mechanistic Circuits for Extractive Question-Answering - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper explores mechanistic circuits in extractive QA tasks, providing insights into the interplay between parametric memory and retrieved context. It aligns with foundational research in understanding LLM behavior and interpretability.
Towards Efficient Optimizer Design for LLM via Structured Fisher Approximation with a Low-Rank Extension - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper introduces efficient optimizers for LLMs using structured Fisher approximation with a low-rank extension. This aligns with foundational research in model efficiency and optimization, making it highly relevant.
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper provides insights into how LLMs learn reasoning structures, emphasizing the importance of structure over content in Chain-of-Thought reasoning. This aligns with foundational research on LLM behavior and training dynamics.
Online Scheduling for LLM Inference with KV Cache Constraints - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper addresses KV cache constraints in LLM inference, which is directly relevant to model compression and efficiency. The theoretical scheduling algorithms and empirical results add novelty.
Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper introduces a novel dataset pruning method based on difficulty and uncertainty, aligning with the model compression criterion.
Exploring Model Invariance with Discrete Search for Ultra-Low-Bit Quantization - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper proposes a framework for ultra-low-bit quantization, which aligns with model compression and efficiency improvements. The discrete search algorithm for permutation invariance is a novel contribution.
LoCA: Location-Aware Cosine Adaptation for Parameter-Efficient Fine-Tuning - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper introduces a novel frequency-domain parameter-efficient fine-tuning method (LoCA) that builds on low-rank adaptation (LoRA). This aligns with the 'Model Compression' criterion, particularly in low-rank approaches and efficiency breakthroughs.
HRP: High-Rank Preheating for Superior LoRA Initialization - Score: 17 (R=9, N=8) - Date: 2025-02-12 - Comment: The paper introduces a novel initialization method for LoRA, which directly contributes to low-rank adaptation and aligns with model compression and efficiency breakthroughs.
Private Low-Rank Approximation for Covariance Matrices, Dyson Brownian Motion, and Eigenvalue-Gap Bounds for Gaussian Perturbations - Score: 17 (R=9, N=8) - Date: 2025-02-12 - Comment: The paper introduces a novel approach to low-rank approximation with differential privacy, leveraging Dyson Brownian motion. This aligns with the model compression topic, particularly low-rank approaches, and provides theoretical insights.
Harnessing Language's Fractal Geometry with Recursive Inference Scaling - Score: 17 (R=9, N=8) - Date: 2025-02-12 - Comment: The paper introduces Recursive Inference Scaling (RINS), which provides theoretical insights into scaling laws and inference methods for LLMs, aligning with foundational research in LLM behavior.
A Memory Efficient Randomized Subspace Optimization Method for Training Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-12 - Comment: The paper introduces a randomized subspace optimization method for training LLMs, addressing memory efficiency challenges. This aligns with model compression and efficiency criteria and provides strong theoretical contributions.
Position: Episodic Memory is the Missing Piece for Long-Term LLM Agents - Score: 17 (R=9, N=8) - Date: 2025-02-12 - Comment: The paper discusses episodic memory for LLMs, which aligns with emerging trends and foundational research in LLM behavior and long-term memory integration.
Model Fusion via Neuron Transplantation - Score: 17 (R=9, N=8) - Date: 2025-02-12 - Comment: The paper introduces a novel model fusion technique called Neuron Transplantation, which aligns with model compression and efficiency breakthroughs by reducing memory and inference costs.
Matryoshka Quantization - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper introduces Matryoshka Quantization, a novel multi-scale quantization technique, which aligns with the 'Model Compression' criterion due to its focus on quantization and efficiency improvements.
Systematic Outliers in Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper investigates systematic outliers in LLMs, providing theoretical insights into their formation and impact, which aligns with foundational research in LLM behavior and interpretability.
Calibrating LLMs with Information-Theoretic Evidential Deep Learning - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper discusses a novel method (IB-EDL) for calibrating LLMs using an information bottleneck, which aligns with the 'Large Language Models' criterion by providing theoretical insights into improving LLM trustworthiness and uncertainty estimation.
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper proposes Adaptive Parallel Encoding (APE) for efficient context-augmented generation, which is relevant to model compression and efficiency improvements in LLMs.
Distinguishing Cause from Effect with Causal Velocity Models - Score: 17 (R=9, N=8) - Date: 2025-02-10 - Comment: The causal velocity model for bivariate SCMs offers a novel parametrization and theoretical insights, aligning with 'Emerging Trends' for causal modeling in foundational AI research.
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations - Score: 17 (R=9, N=8) - Date: 2025-02-10 - Comment: This paper introduces QuEST, a quantization-aware training method that demonstrates stable training with 1-bit weights and activations. This directly aligns with the criterion on model compression breakthroughs.
No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces - Score: 17 (R=9, N=8) - Date: 2025-02-10 - Comment: The isotropic model merging framework introduces innovative techniques for task-specific model integration, offering novel insights into representation alignment and efficiency in merged models.
Tighter sparse variational Gaussian processes - Score: 17 (R=9, N=8) - Date: 2025-02-10 - Comment: Introduces a tighter sparse variational Gaussian process, relevant for sparsity and representation learning. Strong theoretical contribution in GP optimization.
KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference - Score: 17 (R=9, N=8) - Date: 2025-02-10 - Comment: Presents KV cache quantization for LLM inference, directly aligning with model compression and efficiency while offering insights into layer-wise sensitivity and optimization.
Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing - Score: 17 (R=9, N=8) - Date: 2025-02-10 - Comment: Relevant to Large Language Models and sparsity. Discusses novel techniques for memory-efficient model merging using sparseness in experts, addressing efficiency and storage concerns.
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-10 - Comment: Introduces a backtracking method for LLM reasoning improvement, falling squarely within insights into reasoning processes and mechanisms, specifically for LLMs.
Probe-Free Low-Rank Activation Intervention - Score: 17 (R=9, N=8) - Date: 2025-02-07 - Comment: Proposes a probe-free low-rank activation intervention for inference-time steering of LLMs, which aligns with criterion 4 as it introduces innovations in LLM interpretability leveraging low-rank techniques.
Advancing Weight and Channel Sparsification with Enhanced Saliency - Score: 17 (R=9, N=8) - Date: 2025-02-07 - Comment: Proposes a novel dynamic sparse training paradigm enhancing saliency-based sparsification strategies. Highly relevant to model compression (criterion 3), specifically with advancements in pruning and sparsity.
Bilevel ZOFO: Bridging Parameter-Efficient and Zeroth-Order Techniques for Efficient LLM Fine-Tuning and Meta-Training - Score: 17 (R=9, N=8) - Date: 2025-02-07 - Comment: Introduces a bilevel optimization framework combining parameter-efficient tuning with zeroth-order methods, aligning with model compression (criterion 3) and efficient methods for fine-tuning LLMs (criterion 4).
An Augmented Backward-Corrected Projector Splitting Integrator for Dynamical Low-Rank Training - Score: 17 (R=9, N=8) - Date: 2025-02-06 - Comment: This paper proposes a novel low-rank training method, combining theoretical robustness with efficiency. It aligns well with the model compression and low-rank criteria.
Leveraging the true depth of LLMs - Score: 17 (R=9, N=8) - Date: 2025-02-06 - Comment: Focuses on the architectural efficiency improvement of LLMs by decoupling layers for parallel evaluation, aligning closely with the 'Model Compression' topic and offering insights into computational optimization without retraining.
Theoretical Guarantees for Low-Rank Compression of Deep Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-02-06 - Comment: The paper provides theoretical insights into low-rank compression, aligning well with the model compression criterion by focusing on foundational recovery guarantees.
ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization - Score: 17 (R=9, N=8) - Date: 2025-02-06 - Comment: The ParetoQ framework addresses low-bit quantization, which is directly relevant to model compression. It provides new insights into scaling laws and transitions in quantized representations, showing clear foundational value.
EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization - Score: 17 (R=9, N=8) - Date: 2025-02-05 - Comment: The paper proposes EasySpec, which includes innovations in speculative decoding and optimizes multi-GPU utilization through layer-parallelism and KV cache calibration. This aligns with the topic of model compression and efficiency breakthroughs.
When Dimensionality Hurts: The Role of LLM Embedding Compression for Noisy Regression Tasks - Score: 17 (R=9, N=8) - Date: 2025-02-05 - Comment: Explores embedding compression in LLMs via autoencoders, addressing sparsity and efficiency in noisy tasks, which ties into representation learning and compression strategies.
Reasoning Bias of Next Token Prediction Training - Score: 17 (R=9, N=8) - Date: 2025-02-05 - Comment: This study explores the reasoning biases in next-token prediction training and contrasts it with other methodologies, providing insights into LLM training strategies.
CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing - Score: 17 (R=9, N=8) - Date: 2025-02-05 - Comment: Introduces a token-level collaborative inference framework for LLMs aiming to optimize inference efficiency, aligning well with the Model Compression criteria.
A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: The paper introduces a novel inference-time scaling method using particle-based Monte Carlo techniques for LLMs, offering possible breakthroughs in efficiency and robustness, relevant to inference optimization.
One-step full gradient suffices for low-rank fine-tuning, provably and efficiently - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: This paper investigates low-rank fine-tuning and provides both theoretical and empirical insights into improving LoRA using spectral initialization. This is directly relevant to compression/efficiency breakthroughs.
Nearly Lossless Adaptive Bit Switching - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: Proposes nearly lossless bit-switching quantization and addresses inter-precision interference with theoretical contributions, relevant to model compression.
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: Presents a framework for evaluating reasoning capabilities of LLMs under complexity scaling, providing theoretical insights into their limits, aligning with foundational LLM research.
RandLoRA: Full-rank parameter-efficient fine-tuning of large models - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: RandLoRA proposes significant advancements in parameter-efficient methods by addressing the limitations of low-rank adaptations in fine-tuning using full-rank optimization. Relevant to compression and efficiency topics like low-rank approaches.
Symmetric Pruning of Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: Addresses theoretical insights into pruning methods for Large Language Models (LLMs), directly matching the model compression topic and offering improvements to existing techniques.
Memory-Efficient Fine-Tuning of Transformers via Token Selection - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: Introduces TokenTune for memory-efficient fine-tuning of transformer models using token selection, which aligns with model compression and efficiency breakthroughs.
Pivoting Factorization: A Compact Meta Low-Rank Representation of Sparsity for Efficient Inference in Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-03 - Comment: The paper introduces Pivoting Factorization, which directly applies to low-rank compression in large language models, making it highly relevant to model compression and efficiency techniques.

Representation Learning Theory and Structure (92)

The Computational Advantage of Depth: Learning High-Dimensional Hierarchical Functions with Gradient Descent - Score: 19 (R=10, N=9) - Date: 2025-02-20 - Comment: The paper provides theoretical insights into the computational advantages of depth in neural networks, aligning closely with representation learning and training dynamics.
Consistency of augmentation graph and network approximability in contrastive learning - Score: 19 (R=10, N=9) - Date: 2025-02-07 - Comment: The work addresses contrastive learning by providing new theoretical insights into augmentation graph consistency and neural approximability, making it a significant foundational contribution to representation learning.
Constrained belief updates explain geometric structures in transformer representations - Score: 19 (R=10, N=9) - Date: 2025-02-05 - Comment: Touches on representation learning by analyzing the geometric structures and constrained Bayesian belief updates in transformer representations, providing foundational insights into encoder-decoder mechanisms.
Unveiling the Mechanisms of Explicit CoT Training: How Chain-of-Thought Enhances Reasoning Generalization - Score: 18 (R=10, N=8) - Date: 2025-02-10 - Comment: The paper investigates the mechanism of explicit Chain-of-Thought (CoT) training, which aligns with understanding LLM training dynamics and behaviors, directly addressing foundational insights for reasoning enhancement.
Learning with Exact Invariances in Polynomial Time - Score: 18 (R=9, N=9) - Date: 2025-02-28 - Comment: The paper provides a polynomial-time algorithm for learning with exact invariances, which is a cutting-edge theoretical contribution relevant to representation learning.
Do we really need the Rademacher complexities? - Score: 18 (R=9, N=9) - Date: 2025-02-24 - Comment: The paper challenges the reliance on Rademacher complexities for learning problems and introduces a novel universality result, which aligns with foundational research in representation learning.
Approximating Latent Manifolds in Neural Networks via Vanishing Ideals - Score: 18 (R=9, N=9) - Date: 2025-02-24 - Comment: The paper connects manifold learning with computational algebra using vanishing ideals, proposing a novel architecture for latent manifold approximation. It aligns well with representation learning and architectural innovation.
Breaking the bonds of generative artificial intelligence by minimizing the maximum entropy - Score: 18 (R=9, N=9) - Date: 2025-02-20 - Comment: This paper introduces a new paradigm for generative AI based on the minimal maximum entropy principle, which aligns with foundational research in representation learning and generative paradigms.
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity - Score: 18 (R=9, N=9) - Date: 2025-02-19 - Comment: The paper explores the limits of embedding space capacity, which is relevant to representation learning and compression. The focus on theoretical limits and optimization is highly novel.
System Message Generation for User Preferences using Open-Source Models - Score: 18 (R=9, N=9) - Date: 2025-02-18 - Comment: The paper introduces Inverse Flow for generative models, which aligns with foundational research in representation learning and generative paradigms. The proposed methods (IFM and ICM) are novel and impactful.
A Power Transform - Score: 18 (R=9, N=9) - Date: 2025-02-18 - Comment: The novel power transform framework connects across loss functions, activations, and kernels, offering a significant theoretical contribution to foundational methods like representation learning.
Representation and Interpretation in Artificial and Natural Computing - Score: 18 (R=9, N=9) - Date: 2025-02-17 - Comment: The paper discusses representation and modes of computing, touching on theoretical aspects of computing beyond Turing Machines. It aligns with emerging trends and foundational research.
A novel approach to data generation in generative model - Score: 18 (R=9, N=9) - Date: 2025-02-17 - Comment: The paper introduces the Convergent Fusion Paradigm (CFP) theory, which redefines data generation in generative models and offers a novel geometric framework, aligning with foundational research in representation learning and generative modeling.
Solvable Dynamics of Self-Supervised Word Embeddings and the Emergence of Analogical Reasoning - Score: 18 (R=9, N=9) - Date: 2025-02-17 - Comment: The paper provides analytical solutions for self-supervised word embedding dynamics, offering foundational insights into representation learning and training dynamics.
When do neural networks learn world models? - Score: 18 (R=9, N=9) - Date: 2025-02-14 - Comment: The paper provides theoretical insights into when neural networks learn world models, which aligns with representation learning and foundational research into training dynamics.
From Kernels to Features: A Multi-Scale Adaptive Theory of Feature Learning - Score: 18 (R=9, N=9) - Date: 2025-02-06 - Comment: Presents a theoretical framework that bridges kernel-based and feature-adaptive learning, contributing to representation learning through a multi-scale theoretical approach. Highly relevant to model understanding and feature learning.
Optimal Spectral Transitions in High-Dimensional Multi-Index Models - Score: 18 (R=9, N=9) - Date: 2025-02-05 - Comment: Introduces spectral methods for a theoretical problem rooted in high-dimensional reconstruction, closely aligning with Representation Learning and fundamental computational limits.
How Memory in Optimization Algorithms Implicitly Modifies the Loss - Score: 18 (R=9, N=9) - Date: 2025-02-05 - Comment: This work analyzes how memory in optimization algorithms implicitly modifies the loss landscape, providing new insights into optimization dynamics, which aligns strongly with representation learning and training dynamics.
Neural Collapse Beyond the Unconstrained Features Model: Landscape, Dynamics, and Generalization in the Mean-Field Regime - Score: 18 (R=9, N=9) - Date: 2025-02-04 - Comment: Explores Neural Collapse, providing theoretical insights into training dynamics and representation learning. This is highly aligned with foundational research.
A theoretical framework for overfitting in energy-based modeling - Score: 18 (R=9, N=9) - Date: 2025-02-03 - Comment: Develops a theoretical framework for overfitting in energy-based generative models, exploring spectral learning dynamics. Matches foundational research in representation learning.
An Invitation to Neuroalgebraic Geometry - Score: 18 (R=9, N=9) - Date: 2025-02-03 - Comment: The paper introduces a new theoretical framework connecting algebraic geometry and machine learning, specifically targeting neural networks. This aligns with 'Representation Learning' as it provides unique insights on the expressivity and training dynamics of neural networks.
Your contrastive learning problem is secretly a distribution alignment problem - Score: 17 (R=9, N=8) - Date: 2025-02-28 - Comment: The paper reframes contrastive learning as a distribution alignment problem using optimal transport, providing theoretical insights into representation learning. This aligns closely with foundational research in representation learning.
Self-Training Elicits Concise Reasoning in Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-28 - Comment: The paper proposes methods to elicit concise reasoning in LLMs, which aligns with foundational research in LLM behavior and training dynamics.
Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-28 - Comment: This paper introduces Representation Engineering (RepE) as a novel paradigm for controlling LLM behavior by manipulating internal representations. It aligns closely with the 'Representation Learning' and 'Large Language Models' criteria, offering theoretical insights and a comprehensive framework for a new direction in LLM research.
Do Large Language Models Know How Much They Know? - Score: 17 (R=9, N=8) - Date: 2025-02-28 - Comment: The paper investigates whether LLMs can assess the scope of their own knowledge, which aligns with foundational research in LLM behavior and interpretability.
Norm Growth and Stability Challenges in Localized Sequential Knowledge Editing - Score: 17 (R=9, N=8) - Date: 2025-02-27 - Comment: The paper investigates challenges in localized sequential knowledge editing for LLMs, focusing on stability and norm growth. This aligns with foundational research in LLM behavior and interpretability.
Consistent Amortized Clustering via Generative Flow Networks - Score: 17 (R=9, N=8) - Date: 2025-02-27 - Comment: The paper proposes a novel framework for amortized clustering using Generative Flow Networks, which contributes to representation learning and foundational clustering methods.
Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases - Score: 17 (R=9, N=8) - Date: 2025-02-27 - Comment: The paper investigates pre-pretraining on formal languages to improve linguistic biases in LLMs, which provides insights into foundational aspects of LLM behavior and interpretability.
FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge - Score: 17 (R=9, N=8) - Date: 2025-02-27 - Comment: The paper introduces a novel unlearning method (KLUE) for faithful forgetting in LLMs, which aligns with foundational research on LLM behavior and interpretability.
Unveiling and Causalizing CoT: A Causal Pespective - Score: 17 (R=9, N=8) - Date: 2025-02-26 - Comment: The paper explores causal perspectives on Chain-of-Thought reasoning in LLMs, providing theoretical insights into reasoning mechanisms. This aligns with foundational research in LLM behavior and interpretability.
How Do Large Language Monkeys Get Their Power (Laws)? - Score: 17 (R=9, N=8) - Date: 2025-02-26 - Comment: The paper provides theoretical insights into power law scaling in large language models, which aligns with foundational research in LLM behavior and interpretability.
Function-Space Learning Rates - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper introduces a novel concept of function-space learning rates and proposes FLeRM, a method for hyperparameter transfer across model scales. This aligns with the Representation Learning and Model Architecture criteria, as it provides insights into training dynamics and scaling behavior.
Towards Auto-Regressive Next-Token Prediction: In-Context Learning Emerges from Generalization - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper provides theoretical insights into in-context learning and generalization in LLMs, aligning with the 'Large Language Models' criterion.
Forecasting Rare Language Model Behaviors - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper introduces a method to forecast rare LLM behaviors, which provides theoretical insights into LLM behavior and interpretability.
The Role of Sparsity for Length Generalization in Transformers - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper investigates the role of sparsity in length generalization for transformers, which aligns with 'Representation Learning' and 'Model Architecture' criteria due to its theoretical insights into transformer behavior.
Sequence-level Large Language Model Training with Contrastive Preference Optimization - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper introduces a contrastive preference optimization procedure for sequence-level LLM training, which aligns with foundational research in LLM training dynamics.
UniDyG: A Unified and Effective Representation Learning Approach for Large Dynamic Graphs - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper proposes UniDyG, a unified representation learning approach for dynamic graphs, which aligns with representation learning and introduces a novel Fourier Graph Attention mechanism.
Toward a Flexible Framework for Linear Representation Hypothesis Using Maximum Likelihood Estimation - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper introduces a flexible framework for linear representation hypothesis using maximum likelihood estimation, which aligns with representation learning and provides a principled approach to concept directions.
A Gap Between the Gaussian RKHS and Neural Networks: An Infinite-Center Asymptotic Analysis - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper investigates the gap between Gaussian RKHS and neural networks, providing theoretical insights into function spaces. This aligns with 'Representation Learning' and foundational research.
An explainable transformer circuit for compositional generalization - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper provides mechanistic insights into compositional generalization in transformers, which aligns with understanding and interpretability of model architectures.
Generalization Guarantees for Representation Learning via Data-Dependent Gaussian Mixture Priors - Score: 17 (R=9, N=8) - Date: 2025-02-24 - Comment: The paper provides theoretical insights into generalization error bounds for representation learning using data-dependent Gaussian mixture priors. It aligns well with the representation learning criterion, offering foundational contributions.
Fréchet Cumulative Covariance Net for Deep Nonlinear Sufficient Dimension Reduction with Random Objects - Score: 17 (R=9, N=8) - Date: 2025-02-24 - Comment: The paper introduces a novel statistical dependence measure (FCCov) and a nonlinear sufficient dimension reduction framework, which aligns with representation learning by focusing on encoding essential features of high-dimensional data. The theoretical contributions and convergence guarantees add to its relevance.
Analyze the Neurons, not the Embeddings: Understanding When and Where LLM Representations Align with Humans - Score: 17 (R=9, N=8) - Date: 2025-02-24 - Comment: The paper analyzes neuron-level representations in LLMs and their alignment with human concepts, contributing to interpretability and representation learning. This aligns with foundational research in representation learning.
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers - Score: 17 (R=9, N=8) - Date: 2025-02-24 - Comment: The paper provides insights into how LLMs encode contextual information, particularly focusing on the role of punctuation and token-level analysis, which aligns with interpretability in LLMs.
Understanding SGD with Exponential Moving Average: A Case Study in Linear Regression - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: The paper provides theoretical insights into the effectiveness of Exponential Moving Average (EMA) in SGD, which aligns with the training dynamics in neural networks under representation learning.
Zero loss guarantees and explicit minimizers for generic overparametrized Deep Learning networks - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: The paper provides theoretical insights into overparameterized deep learning networks, focusing on zero loss guarantees and training dynamics, which aligns with the Representation Learning criterion.
Towards a Learning Theory of Representation Alignment - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: The paper provides a learning-theoretic perspective on representation alignment, which aligns closely with the 'Representation Learning' criterion, particularly in understanding how representations are encoded and aligned.
Concept Layers: Enhancing Interpretability and Intervenability via LLM Conceptualization - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: The paper introduces Concept Layers to enhance interpretability and intervenability in LLMs, which aligns with foundational research in model architecture and interpretability.
Benefits of Early Stopping in Gradient Descent for Overparameterized Logistic Regression - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: The paper explores the implicit bias and regularization effects of early stopping in gradient descent for overparameterized logistic regression, which provides insights into training dynamics and representation learning.
Unveiling the Magic of Code Reasoning through Hypothesis Decomposition and Amendment - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: The paper introduces a novel reasoning pipeline for LLMs, focusing on hypothesis decomposition and amendment, which aligns with foundational research in LLM reasoning and interpretability.
Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The paper introduces a novel framework for self-organizing knowledge networks using graph reasoning and LLMs, which aligns with emerging trends and foundational research in knowledge representation.
Stability-based Generalization Bounds for Variational Inference - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The paper develops stability-based generalization bounds for variational inference, which aligns with foundational research in representation learning and theoretical insights into training dynamics.
Symmetric Rank-One Quasi-Newton Methods for Deep Learning Using Cubic Regularization - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The paper explores a novel quasi-Newton method for deep learning optimization, which aligns with foundational research in training dynamics and representation learning. The use of cubic regularization and indefinite Hessian approximations is a notable theoretical contribution.
Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: Provides a mechanistic interpretability analysis of fine-tuning in LLMs and proposes novel circuit-aware LoRA adaptations for performance gains.
Neural Interpretable Reasoning - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper introduces a novel framework for interpretable reasoning in neural networks, which aligns with representation learning and interpretability. The Markovian property and neural re-parametrization add theoretical depth.
Does Editing Provide Evidence for Localization? - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper critically examines interpretability in LLMs by analyzing the evidence provided by localized edits, which aligns with foundational research in LLM behavior and interpretability.
Sparse Autoencoder Features for Classifications and Transferability - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: Sparse autoencoders are explored for feature learning, which relates closely to 'Representation Learning' and 'Model Compression', particularly given the focus on sparsity and transferable features.
The Rotary Position Embedding May Cause Dimension Inefficiency in Attention Heads for Long-Distance Retrieval - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper investigates the Rotary Position Embedding (RoPE) and its inefficiencies in long-distance retrieval, which aligns with foundational research on LLM behavior and interpretability.
The Representation and Recall of Interwoven Structured Knowledge in LLMs: A Geometric and Layered Analysis - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: Analyzes multi-layer and geometric encoding patterns in LLMs, offering insights into representation dynamics, which strongly aligns with foundational research criteria.
Controlling Neural Collapse Enhances Out-of-Distribution Detection and Transfer Learning - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper explores Neural Collapse and its impact on OOD detection and generalization, providing theoretical insights into representation learning and training dynamics.
A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper investigates grounding mechanisms in LLMs using a novel dataset, which aligns with foundational research in LLM behavior and interpretability.
Fenchel-Young Variational Learning - Score: 17 (R=9, N=8) - Date: 2025-02-17 - Comment: The paper proposes Fenchel-Young Variational Learning, a generalization of variational methods with new theoretical insights and applications to latent-variable models, aligning with foundational research in representation learning and autoencoders.
Prediction hubs are context-informed frequent tokens in LLMs - Score: 17 (R=9, N=8) - Date: 2025-02-17 - Comment: The paper explores hubness in LLMs and provides theoretical and empirical insights into token prediction behavior, aligning with foundational research on LLM behavior and interpretability.
On Space Folds of ReLU Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-02-17 - Comment: The paper provides a quantitative analysis of space folding in ReLU networks, offering foundational insights into neural network behavior and representation learning.
On the Importance of Embedding Norms in Self-Supervised Learning - Score: 17 (R=9, N=8) - Date: 2025-02-14 - Comment: This paper provides theoretical insights into the role of embedding norms in self-supervised learning, which aligns with representation learning and training dynamics in neural networks.
Unsupervised categorization of similarity measures - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper explores unsupervised categorization of similarity measures through representation learning, which aligns with foundational research in representation learning. The focus on independent metric spaces is novel.
RomanLens: Latent Romanization and its role in Multilinguality in LLMs - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper provides theoretical insights into multilingual representation in LLMs, specifically the role of latent romanization, which aligns with the criterion of understanding LLM behavior and interpretability.
LUNAR: LLM Unlearning via Neural Activation Redirection - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper introduces a novel unlearning methodology for LLMs, which aligns with the criterion of theoretical insights into LLM behavior. The use of neural activation redirection is innovative.
Neurons Speak in Ranges: Breaking Free from Discrete Neuronal Attribution - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper introduces a novel framework (NeuronLens) for interpreting and manipulating neuron activations in LLMs, addressing polysemanticity. This aligns with the 'Large Language Models (LLMs)' criterion, focusing on interpretability and internal mechanisms.
Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More - Score: 17 (R=9, N=8) - Date: 2025-02-12 - Comment: The paper introduces a novel training paradigm (MEAP) for LLMs that integrates Masked Language Modeling into Next-Token Prediction, which aligns with foundational research in representation learning and training dynamics of neural networks.
When More is Less: Understanding Chain-of-Thought Length in LLMs - Score: 17 (R=9, N=8) - Date: 2025-02-12 - Comment: The paper provides theoretical insights into Chain-of-Thought (CoT) reasoning in LLMs, including optimal CoT length and noise susceptibility. This aligns with the LLM behavior/interpretability criterion.
No Trick, No Treat: Pursuits and Challenges Towards Simulation-free Training of Neural Samplers - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper explores simulation-free training of neural samplers and analyzes mode collapse, which aligns with foundational research in representation learning and training dynamics.
Emergent Response Planning in LLM - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper identifies emergent planning behaviors in LLMs, focusing on how hidden representations encode future outputs. This aligns with 'Representation Learning' and provides theoretical insights into LLM behavior.
Learning Task Representations from In-Context Learning - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper explores how tasks are encoded in in-context learning within LLMs, focusing on attention heads and task vectors. This aligns with the 'Representation Learning' criterion, as it provides insights into how information is encoded in deep networks.
SEER: Self-Explainability Enhancement of Large Language Models' Representations - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper proposes SEER, a method to enhance LLM explainability by disentangling representations, which aligns with representation learning and interpretability of LLMs.
Implicit Bias of SignGD and Adam on Multiclass Separable Data - Score: 17 (R=9, N=8) - Date: 2025-02-10 - Comment: The paper characterizes implicit biases of optimization algorithms (SignGD and Adam) in multiclass classification, contributing to foundational research in training dynamics of neural networks.
Extracting and Understanding the Superficial Knowledge in Alignment - Score: 17 (R=9, N=8) - Date: 2025-02-10 - Comment: The paper explores the concept of 'superficial knowledge' in alignment for LLMs, addressing interpretability and alignment transfer, which is a relevant topic in investigating LLM behavior.
In Praise of Stubbornness: The Case for Cognitive-Dissonance-Aware Knowledge Updates in LLMs - Score: 17 (R=9, N=8) - Date: 2025-02-10 - Comment: Proposes cognitive-dissonance-aware knowledge updates in LLMs, aligning with insights into LLM behavior and robustness, which makes it highly relevant.
Sparse Autoencoders for Hypothesis Generation - Score: 17 (R=9, N=8) - Date: 2025-02-10 - Comment: Introduces sparse autoencoders for interpretable feature generation, which resonates with 'Representation Learning' in foundational research, especially around sparsity and interpretability in embeddings.
Distribution learning via neural differential equations: minimal energy regularization and approximation theory - Score: 17 (R=9, N=8) - Date: 2025-02-07 - Comment: Proposes minimal energy regularization and theoretical analysis for neural ODEs in distribution learning, which ties to foundational developments in representation learning and efficient approximation methods.
LLM Alignment as Retriever Optimization: An Information Retrieval Perspective - Score: 17 (R=9, N=8) - Date: 2025-02-07 - Comment: This paper proposes an alignment strategy for LLMs based on Information Retrieval principles. The focus on LLM behavioral alignment via a novel optimization method fits within the 'Large Language Models (LLMs)' and 'Representation Learning' criteria.
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning - Score: 17 (R=9, N=8) - Date: 2025-02-06 - Comment: The paper proposes a novel hybrid representation using latent and text tokens for improving reasoning in LLMs. This approach aligns with representation learning and architectural insights into language models.
Multi-level Supervised Contrastive Learning - Score: 17 (R=9, N=8) - Date: 2025-02-05 - Comment: The paper introduces a novel supervised contrastive learning method, which is directly aligned with foundational research in representation learning.
BRIDLE: Generalized Self-supervised Learning with Quantization - Score: 17 (R=9, N=8) - Date: 2025-02-05 - Comment: Proposes a framework combining residual quantization with self-supervised learning, very relevant to Representation Learning and training methodologies.
Enhancing Generalization via Sharpness-Aware Trajectory Matching for Dataset Condensation - Score: 17 (R=9, N=8) - Date: 2025-02-05 - Comment: Proposes a novel sharpness-aware trajectory matching method for dataset condensation, aligning with fundamental principles of representation learning. The approach shows promise for enhancing generalization.
Discovering Chunks in Neural Embeddings for Interpretability - Score: 17 (R=9, N=8) - Date: 2025-02-05 - Comment: Introduces a novel framework for interpreting neural embeddings by identifying 'chunks', contributing to representation learning and interpretability of networks.
What is a Number, That a Large Language Model May Know It? - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: Examines numerical representation in LLMs and blends cognitive science approaches, highly relevant to foundational representation learning in LLMs.
Activation by Interval-wise Dropout: A Simple Way to Prevent Neural Networks from Plasticity Loss - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: Introduces AID, a novel dropout variant targeting plasticity loss, aligning with foundational research in representation learning and training dynamics.
LLM Safety Alignment is Divergence Estimation in Disguise - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: The theoretical perspective connecting LLM safety alignment to divergence estimation offers a foundational insight into behavior and interpretability, which aligns well with LLM theoretical insights.
A Communication Framework for Compositional Generation - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: Explores compositional encodings in learned representations using a communication game framework, directly tying into representation learning and advancing insights on compositionality.
Self-Supervised Learning Using Nonlinear Dependence - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: Presents a novel self-supervised technique leveraging nonlinear dependency, tying closely to representation learning and enriching feature encoding, which is foundational.
A Metric for the Balance of Information in Graph Learning - Score: 17 (R=9, N=8) - Date: 2025-02-03 - Comment: Introduces a metric (NNRD) to balance structural and feature information in graph learning. Contains fundamental insights into managing representation biases in molecular graph data.

World Models, Exploration, and Open-Ended Reinforcement Learning (2)

Machine-generated text detection prevents language model collapse - Score: 17 (R=9, N=8) - Date: 2025-02-24 - Comment: The paper discusses the issue of model collapse in LLMs and proposes a novel methodology to prevent it using machine-generated text detection. This aligns with the 'Large Language Models' criterion, focusing on foundational insights into training dynamics and behavior.
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper explores foundational aspects of LLMs by introducing a novel benchmark (CounterMATH) and focusing on counterexample-driven reasoning, which aligns with the 'Large Language Models' criterion for theoretical insights into LLM behavior.