
Personalized Monthly Topic Summary 2025/02

Metric | Value
Total Papers | 792
Model Architecture | 164
Model Compression and Efficiency | 260
High Performance Computing | 40
Representation Learning | 297
Other Foundational Research | 31

Model Architecture (164)

  1. Fractal Generative Models - Score: 20.0 (R=0, N=0) - Date: 2025-02-25 - Comment: Author match

  2. OBELiX: A Curated Dataset of Crystal Structures and Experimentally Measured Ionic Conductivities for Lithium Solid-State Electrolytes - Score: 20.0 (R=0, N=0) - Date: 2025-02-21 - Comment: Author match

  3. Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization - Score: 19 (R=10, N=9) - Date: 2025-02-27 - Comment: Proposes Drop-Upcycling for training sparse Mixture of Experts (MoE) models, directly aligning with the 'Model Architecture' and 'Model Compression' criteria.

  4. Tight Clusters Make Specialized Experts - Score: 19 (R=10, N=9) - Date: 2025-02-24 - Comment: The paper proposes an Adaptive Clustering router for Sparse Mixture-of-Experts (MoE), directly addressing foundational aspects of MoE architectures and improving their robustness and performance.

  5. MoM: Linear Sequence Modeling with Mixture-of-Memories - Score: 19 (R=10, N=9) - Date: 2025-02-20 - Comment: The paper proposes Mixture-of-Memories (MoM), a novel architecture for linear sequence modeling inspired by neuroscience, which aligns with the model architecture criterion and introduces a new paradigm.

  6. MeMo: Towards Language Models with Associative Memory Mechanisms - Score: 19 (R=10, N=9) - Date: 2025-02-19 - Comment: The paper proposes a novel architecture, MeMo, with associative memory mechanisms for LLMs, which aligns with the model architecture criterion by introducing a new paradigm for memorization and transparency.

  7. In-context denoising with one-layer transformers: connections between attention and associative memory retrieval - Score: 19 (R=10, N=9) - Date: 2025-02-10 - Comment: Explores connections between attention mechanisms and associative memory in transformers within a theoretical framework, linking strongly to foundational representation learning and transformer behaviors.

  8. Strassen Attention: Unlocking Compositional Abilities in Transformers Based on a New Lower Bound Method - Score: 19 (R=10, N=9) - Date: 2025-02-04 - Comment: This paper introduces 'Strassen attention' as a scalable mechanism addressing the limitations of current attention mechanisms, making significant contributions to Transformer architecture research.

  9. CAMEx: Curvature-aware Merging of Experts - Score: 18 (R=10, N=8) - Date: 2025-02-27 - Comment: The paper introduces CAMEx, a novel curvature-aware merging protocol for Mixture-of-Experts (MoE) models, which aligns closely with the 'Model Architecture' and 'Representation Learning' criteria. It provides theoretical and empirical insights into expert merging, improving optimization and generalization.

  10. BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference - Score: 18 (R=10, N=8) - Date: 2025-02-25 - Comment: BigMac introduces a communication-efficient MoE structure, directly aligning with architectural innovations in MoE and efficiency improvements.

  11. A fast convergence algorithm based on binary integer programming for expert load balancing in MoE LLMs - Score: 18 (R=10, N=8) - Date: 2025-02-24 - Comment: The paper proposes a binary integer programming-based algorithm for expert load balancing in MoE architectures, directly addressing a key challenge in MoE training and efficiency.

  12. LESA: Learnable LLM Layer Scaling-Up - Score: 18 (R=10, N=8) - Date: 2025-02-20 - Comment: LESA proposes a learnable method for scaling up LLM layers, which directly addresses architectural innovations and efficiency in LLM training.

  13. MoBA: Mixture of Block Attention for Long-Context LLMs - Score: 18 (R=10, N=8) - Date: 2025-02-20 - Comment: The paper introduces Mixture of Block Attention (MoBA), which applies Mixture of Experts (MoE) principles to attention mechanisms in LLMs. This aligns closely with the 'Model Architecture' and 'Large Language Models' criteria, focusing on architectural innovation and efficiency improvements.

  14. Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models - Score: 18 (R=10, N=8) - Date: 2025-02-19 - Comment: The paper introduces MoE-specific knowledge distillation methods, which directly align with the Mixture-of-Experts (MoE) topic and provide novel insights into leveraging non-activated experts.

  15. Accurate Expert Predictions in MoE Inference via Cross-Layer Gate - Score: 18 (R=10, N=8) - Date: 2025-02-19 - Comment: The paper focuses on improving MoE inference efficiency through cross-layer gating and caching strategies, which directly aligns with the topic of Mixture-of-Experts and model efficiency.

  16. Mixture of Tunable Experts - Behavior Modification of DeepSeek-R1 at Inference Time - Score: 18 (R=10, N=8) - Date: 2025-02-18 - Comment: The paper focuses on Mixture-of-Experts (MoE) and provides insights into the behavior and control of specific experts in LLMs, aligning closely with the 'Model Architecture' and 'Representation Learning' criteria.

  17. Mixture of Decoupled Message Passing Experts with Entropy Constraint for General Node Classification - Score: 18 (R=10, N=8) - Date: 2025-02-13 - Comment: The paper proposes a Mixture-of-Experts (MoE) framework for node classification, which is highly relevant to model architecture and MoE research. The entropy constraint adds a novel perspective to MoE design.

  18. Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline - Score: 18 (R=10, N=8) - Date: 2025-02-13 - Comment: The paper proposes Klotski, an efficient MoE inference engine, which directly aligns with the core topic of Mixture-of-Experts and efficiency improvements.

  19. Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach - Score: 18 (R=10, N=8) - Date: 2025-02-13 - Comment: The paper addresses robustness in Mixture of Experts (MoE) models, which directly aligns with the model architecture criterion. The dual-model approach and robustness bounds are novel contributions.

  20. MoENAS: Mixture-of-Expert based Neural Architecture Search for jointly Accurate, Fair, and Robust Edge Deep Neural Networks - Score: 18 (R=10, N=8) - Date: 2025-02-12 - Comment: The paper introduces MoENAS, a Mixture-of-Experts-based NAS method for edge DNNs, which aligns with architectural innovations and MoE research. It also addresses fairness and robustness, adding to its relevance.

  21. MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing - Score: 18 (R=10, N=8) - Date: 2025-02-11 - Comment: The paper proposes MoETuner, an optimization framework for Mixture-of-Experts (MoE) models, directly addressing architectural challenges like token routing and load balancing, which is highly relevant to model architecture innovations.

  22. LM2: Large Memory Models - Score: 18 (R=10, N=8) - Date: 2025-02-11 - Comment: The LM2 paper proposes a memory-augmented Transformer architecture, which is highly relevant to architectural innovations in LLMs and explores memory modules for enhanced reasoning capabilities.

  23. Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient - Score: 18 (R=10, N=8) - Date: 2025-02-10 - Comment: Analyzes joint scaling laws for memory-efficient MoE models, directly addressing theoretical and computational efficiency, which is highly relevant to 'Mixture of Experts' and architectural principles.

  24. Scaling Laws for Upcycling Mixture-of-Experts Language Models - Score: 18 (R=10, N=8) - Date: 2025-02-06 - Comment: Explores scaling laws for upcycling LLMs into MoE models, offering empirical insights into scaling efficiency. This aligns well with MoE-related architectural research and compression topics, particularly training efficiency.

  25. General Reasoning Requires Learning to Reason from the Get-go - Score: 18 (R=9, N=9) - Date: 2025-02-27 - Comment: The paper discusses disentangling reasoning and knowledge in LLMs, aligning with 'Large Language Models' as it proposes foundational changes to pretraining and reasoning paradigms. The focus on reasoning priors and curriculum learning adds significant novelty.

  26. Mechanistic PDE Networks for Discovery of Governing Equations - Score: 18 (R=9, N=9) - Date: 2025-02-26 - Comment: The paper proposes Mechanistic PDE Networks for discovering governing equations, which aligns with foundational research in AI for Science and introduces a novel architecture.

  27. Independence Tests for Language Models - Score: 18 (R=9, N=9) - Date: 2025-02-19 - Comment: The paper introduces statistical tests for determining independence between model weights, which is a novel and foundational contribution to understanding model training dynamics.

  28. RiemannGFM: Learning a Graph Foundation Model from Riemannian Geometry - Score: 18 (R=9, N=9) - Date: 2025-02-06 - Comment: Proposes a foundational graph model drawing from Riemannian geometry and structural vocabulary, aligning well with model architecture and generalization across domains. Very novel in approach.

  29. Hamming Attention Distillation: Binarizing Keys and Queries for Efficient Long-Context Transformers - Score: 18 (R=9, N=9) - Date: 2025-02-05 - Comment: The paper introduces a novel framework for binarizing keys and queries in transformer attention, focusing on compression and efficiency improvements.

  30. Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks - Score: 17 (R=10, N=7) - Date: 2025-02-25 - Comment: The paper evaluates MoE LLMs, focusing on expert contributions and gating network behavior, which directly aligns with the model architecture topic, particularly MoE analysis.

  31. (GG) MoE vs. MLP on Tabular Data - Score: 17 (R=10, N=7) - Date: 2025-02-07 - Comment: The paper introduces GG MoE with Gumbel-Softmax gating, exploring an innovative Mixture-of-Experts (MoE) model for efficiency in tabular data representation. It directly aligns with the MoE-specific architectural topic.

  32. R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts - Score: 17 (R=9, N=8) - Date: 2025-02-28 - Comment: The paper proposes a test-time re-routing method for multimodal mixture-of-experts (MoE), which aligns well with the model architecture criterion, particularly for MoE innovations.

  33. Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-28 - Comment: The paper identifies symbolic mechanisms in LLMs for abstract reasoning, aligning with the large language models criterion.

  34. Finite State Automata Inside Transformers with Chain-of-Thought: A Mechanistic Study on State Tracking - Score: 17 (R=9, N=8) - Date: 2025-02-28 - Comment: The paper provides a mechanistic study of state tracking in Transformers with Chain-of-Thought, offering insights into model behavior and architecture, aligning with foundational research.

  35. Forward-Cooperation-Backward (FCB) learning in a Multi-Encoding Uni-Decoding neural network architecture - Score: 17 (R=9, N=8) - Date: 2025-02-28 - Comment: The paper introduces a novel learning paradigm (Forward-Cooperation-Backward) and a new architecture (Multi-Encoding Uni-Decoding) with lateral synaptic connections, which aligns with the 'Model Architecture' criterion for architectural innovations.

  36. HDEE: Heterogeneous Domain Expert Ensemble - Score: 17 (R=9, N=8) - Date: 2025-02-27 - Comment: The paper proposes HDEE, a heterogeneous domain expert ensemble, which aligns with the 'Model Architecture' criterion by exploring ensemble methods with domain-specific heterogeneity. It provides insights into efficient training and evaluation.

  37. The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training - Score: 17 (R=9, N=8) - Date: 2025-02-27 - Comment: Proposes a novel blockwise learning rate strategy for Transformers, aligning with 'Large Language Models' and providing theoretical insights into training dynamics.

  38. (Mis)Fitting: A Survey of Scaling Laws - Score: 17 (R=9, N=8) - Date: 2025-02-27 - Comment: The paper surveys scaling laws in foundation models, which is highly relevant to understanding LLM behavior and training dynamics.

  39. Fourier Multi-Component and Multi-Layer Neural Networks: Unlocking High-Frequency Potential - Score: 17 (R=9, N=8) - Date: 2025-02-27 - Comment: Introduces a novel neural network architecture (FMMNN) with theoretical insights into its expressive power and optimization landscape, aligning with the 'Model Architecture' criterion.

  40. A Theoretical Perspective: How to Prevent Model Collapse in Self-consuming Training Loops - Score: 17 (R=9, N=8) - Date: 2025-02-27 - Comment: This paper provides a theoretical analysis of Self-consuming Training Loops (STLs), addressing model collapse and recursive stability. It offers insights into the interplay between model architecture and data composition, which aligns with foundational research in model training dynamics and architecture behavior. The extension to transformers and in-context learning adds further relevance.

  41. Graded Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-02-26 - Comment: The paper introduces graded neural networks, which propose a novel architectural framework with theoretical underpinnings, aligning with model architecture innovations.

  42. Reasoning with Latent Thoughts: On the Power of Looped Transformers - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper introduces looped transformers for reasoning tasks and connects them to CoT reasoning, aligning with 'Model Architecture' and 'Large Language Models' criteria.

  43. Neural Attention: A Novel Mechanism for Enhanced Expressive Power in Transformer Models - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper proposes Neural Attention as an enhancement to transformer models, which aligns with architectural innovations in transformers.

  44. Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper presents Erwin, a hierarchical transformer for large-scale physical systems, combining tree-based algorithms with attention mechanisms. This aligns with the Model Architecture criterion, particularly in architectural innovations for scalability.

  45. Muon is Scalable for LLM Training - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper introduces Muon, a scalable optimizer for LLM training, and demonstrates its application in training a Mixture-of-Experts (MoE) model. This aligns closely with the 'Model Architecture' and 'Model Compression' criteria due to its focus on MoE and computational efficiency.

  46. Entropy-Lens: The Information Signature of Transformer Computations - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper introduces an entropy-based framework to analyze transformer computations, aligning with foundational research in understanding LLM behavior and interpretability.

  47. Linear Attention for Efficient Bidirectional Sequence Modeling - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper introduces LION, a framework for linear attention in bidirectional sequence modeling, which aligns with model architecture innovations and provides theoretical foundations for efficient transformers.

  48. Ray-Tracing for Conditionally Activated Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: This paper introduces a novel hierarchical Mixture of Experts (MoE) architecture with dynamic activation, which is highly relevant to model architecture innovations and efficiency improvements.

  49. Multiscale Byte Language Models -- A Hierarchical Architecture for Causal Million-Length Sequence Modeling - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: The paper introduces a hierarchical architecture for byte-level sequence modeling, which aligns with foundational research in model architecture and efficiency.

  50. Which Attention Heads Matter for In-Context Learning? - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: This paper investigates the mechanisms behind in-context learning in LLMs, focusing on the role of specific attention heads. It provides theoretical insights into LLM behavior and training dynamics.

  51. How Do LLMs Perform Two-Hop Reasoning in Context? - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: This paper provides theoretical insights into the training dynamics of transformers for two-hop reasoning, which aligns with understanding training dynamics and interpretability in LLMs.

  52. Unraveling the Localized Latents: Learning Stratified Manifold Structures in LLM Embedding Space with Sparse Mixture-of-Experts - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: The paper investigates stratified manifold structures in LLM embedding spaces using a sparse Mixture-of-Experts (MoE) model, which aligns with representation learning and MoE analysis.

  53. Neural Attention Search - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: The paper introduces Neural Attention Search (NAtS), a framework for reducing KV cache sizes in transformers, aligning with the 'Model Compression' and 'Model Architecture' criteria.

  54. RingFormer: Rethinking Recurrent Transformer with Adaptive Level Signals - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: The paper introduces RingFormer, a recurrent Transformer with parameter-sharing and low-rank matrices, which aligns with the model architecture criterion and offers a novel approach to efficiency.

  55. Understanding Generalization in Transformers: Error Bounds and Training Dynamics Under Benign and Harmful Overfitting - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The paper develops a generalization theory for transformers, addressing error bounds and training dynamics under overfitting scenarios. This aligns with foundational research on model architecture and training dynamics.

  56. Zero Token-Driven Deep Thinking in LLMs: Unlocking the Full Potential of Existing Parameters via Cyclic Refinement - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The Zero Token Transformer introduces architectural innovations like parameter cycling and zero-token mechanisms, which align with the model architecture criterion.

  57. MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The paper introduces MUDD connections to improve Transformers, which is highly relevant to architectural innovations. The dynamic dense connections and their impact on efficiency are novel contributions.

  58. LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: Provides theoretical insights into loss-to-loss scaling laws for LLMs, deeply relevant for foundational research into training dynamics and scalability.

  59. Approximation of Permutation Invariant Polynomials by Transformers: Efficient Construction in Column-Size - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The study explores approximation capabilities of transformers concerning column-symmetric polynomials, advancing theoretical understanding of model expressivity.

  60. Teleportation With Null Space Gradient Projection for Optimization Acceleration - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper introduces a novel optimization technique for advanced architectures like Transformers, aligning with 'Model Architecture' and 'Emerging Trends'.

  61. The underlying structures of self-attention: symmetry, directionality, and emergent dynamics in Transformer training - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper provides a mathematical framework to analyze self-attention matrices, which aligns with foundational research on Transformer architectures and their training dynamics.

  62. Spectral Journey: How Transformers Predict the Shortest Path - Score: 17 (R=9, N=8) - Date: 2025-02-14 - Comment: The paper studies how transformers predict shortest paths and provides insights into their internal representations, which aligns with foundational research into model behavior and architecture.

  63. Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper introduces a novel approach to enhance Chain-of-Thought reasoning in LLMs using loop-aligned reasoning, contributing to foundational research on reasoning dynamics in LLMs.

  64. LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: LASP-2 proposes a new sequence parallelism method for linear attention, which aligns with architectural innovations and efficiency improvements in transformer models.

  65. LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation - Score: 17 (R=9, N=8) - Date: 2025-02-12 - Comment: The paper proposes a method to mitigate performance degradation in LLMs with extended context windows, focusing on theoretical insights into distribution drift and catastrophic forgetting. This aligns with the interest in foundational research on LLM behavior.

  66. Enabling Autoregressive Models to Fill In Masked Tokens - Score: 17 (R=9, N=8) - Date: 2025-02-12 - Comment: The paper introduces MARIA, a novel architecture combining MLM and AR models for masked infilling, which aligns with foundational research in model architecture and LLM behavior.

  67. A Multimodal PDE Foundation Model for Prediction and Scientific Text Descriptions - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper introduces a multimodal PDE foundation model integrating numerical and text modalities, which aligns with foundational research in AI for science and architecture-level innovations.

  68. "Let the AI conspiracy begin..." Language Model coordination is just one inference-intervention away - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper introduces a novel method for steering LLM behavior by targeting specific attention heads, which aligns with foundational research into LLM interpretability and behavior.

  69. MoFM: A Large-Scale Human Motion Foundation Model - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper introduces a motion foundation model (MoFM) inspired by LLMs, which aligns with foundational model architecture innovations and emerging trends in foundation models.

  70. Deep Generative Models with Hard Linear Equality Constraints - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper proposes a probabilistic approach to enforce hard constraints in deep generative models, which aligns with foundational innovations in generative modeling.

  71. Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper proposes a novel MoE-based approach for hierarchical meta-learning in dynamical system reconstruction, directly aligning with the 'Model Architecture' criterion and offering insights into MoE behavior.

  72. Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach - Score: 17 (R=9, N=8) - Date: 2025-02-10 - Comment: Proposes a novel recurrent depth mechanism for latent reasoning, exploring architectural innovation - relevant and potentially foundational for test-time computation scaling.

  73. Rank Also Matters: Hierarchical Configuration for Mixture of Adapter Experts in LLM Fine-Tuning - Score: 17 (R=9, N=8) - Date: 2025-02-07 - Comment: The paper proposes HILO, a hierarchical configuration for adapter experts and their rank in Mixture of Experts (MoE) fine-tuning in LLMs. This directly addresses architectural innovations and MoE-related efficiency improvements.

  74. On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation - Score: 17 (R=9, N=8) - Date: 2025-02-06 - Comment: Focuses on zero-initialized attention and its theoretical ties to mixture-of-experts (MoE) models, investigating optimal prompts and gating factors. Provides both theoretical insights and experiments, aligning with the architectural and representation-learning criteria.

  75. ReGNet: Reciprocal Space-Aware Long-Range Modeling and Multi-Property Prediction for Crystals - Score: 17 (R=9, N=8) - Date: 2025-02-06 - Comment: ReGNet introduces novel architectural concepts combining GNNs and Fourier-based reciprocal filters, along with an innovative MoE extension, aligning closely with model architecture advancements.

  76. MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: Presents new methods for merging homogeneous and heterogeneous MoEs, directly aligning with the model architecture criterion, specifically innovations in Mixture-of-Experts.

  77. Spectro-Riemannian Graph Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: Introduces a novel graph neural network framework unifying spectral and curvature signals, which aligns with architectural innovation and foundational model design.

  78. Beyond Limited Data: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: The paper introduces an iterative self-play framework for theorem proving with LLMs, which aligns with foundational insights into training dynamics and use of LLMs for novel tasks.

  79. E2Former: A Linear-time Efficient and Equivariant Transformer for Scalable Molecular Modeling - Score: 17 (R=9, N=8) - Date: 2025-02-03 - Comment: The E2Former introduces a novel efficient and equivariant transformer architecture with significant computational speedups, aligning well with the 'Model Architecture' criterion and particularly Transformer-based innovations.

  80. Geometric Kolmogorov-Arnold Superposition Theorem - Score: 17 (R=8, N=9) - Date: 2025-02-25 - Comment: The paper extends the Kolmogorov-Arnold Superposition Theorem to incorporate equivariance and invariance, which is a significant theoretical contribution relevant to model architecture.

  81. The Empirical Impact of Reducing Symmetries on the Performance of Deep Ensembles and MoE - Score: 16 (R=9, N=7) - Date: 2025-02-25 - Comment: The paper investigates the impact of reducing symmetries on MoE and introduces a novel architecture (MoIE), aligning with the 'Model Architecture' criterion.

  82. The geometry of BERT - Score: 16 (R=9, N=7) - Date: 2025-02-18 - Comment: The paper provides a theoretical analysis of BERT's attention mechanism and internal geometry, which aligns with foundational research in Transformer interpretability.

  83. From Layers to States: A State Space Model Perspective to Deep Neural Network Layer Dynamics - Score: 16 (R=9, N=7) - Date: 2025-02-18 - Comment: This paper introduces a state-space model layer aggregation for deep networks. It aligns with 'Model Architecture', offering insights into layer dynamics and integration of SSM techniques.

  84. An Analysis for Reasoning Bias of Language Models with Small Initialization - Score: 16 (R=9, N=7) - Date: 2025-02-10 - Comment: The paper examines how initialization scale affects reasoning bias in LLMs, aligning closely with training dynamics and theoretical understanding of LLM behavior, which are foundational research topics.

  85. Masked Generative Nested Transformers with Decode Time Scaling - Score: 16 (R=9, N=7) - Date: 2025-02-04 - Comment: Focuses on decode-time scaling in nested transformers for visual generation tasks, aligning closely with the criteria on compute efficiency and transformer architecture innovations.

  86. Scalable Equilibrium Sampling with Sequential Boltzmann Generators - Score: 16 (R=8, N=8) - Date: 2025-02-26 - Comment: The paper introduces Sequential Boltzmann Generators with a Transformer-based normalizing flow, which is relevant to foundational research in generative modeling and architecture-level innovations.

  87. Sparks of cognitive flexibility: self-guided context inference for flexible stimulus-response mapping by attentional routing - Score: 16 (R=8, N=8) - Date: 2025-02-24 - Comment: The paper proposes a novel neural network (WiNN) for flexible stimulus-response mapping using attentional routing, which aligns with the 'Representation Learning' criterion by addressing training dynamics and adaptability.

  88. Deep Tree Tensor Networks for Image Recognition - Score: 16 (R=8, N=8) - Date: 2025-02-17 - Comment: The paper introduces a novel architecture, Deep Tree Tensor Networks (DTTN), which focuses on tensor networks and their application to feature interactions. This aligns with the 'Model Architecture' criterion, particularly in architectural innovations.

  89. Life-Code: Central Dogma Modeling with Multi-Omics Sequence Unification - Score: 16 (R=8, N=8) - Date: 2025-02-12 - Comment: The paper introduces a multi-omics framework with architectural innovations like codon tokenizers and hybrid long-sequence models, aligning with foundational research in AI for science.

  90. Automatic Annotation Augmentation Boosts Translation between Molecules and Natural Language - Score: 16 (R=8, N=8) - Date: 2025-02-11 - Comment: The paper introduces a framework for augmenting molecular annotations using LLMs, which aligns with the 'AI for Science' criterion by proposing a novel generative paradigm for molecular modeling.

  91. "Short-length" Adversarial Training Helps LLMs Defend "Long-length" Jailbreak Attacks: Theoretical and Empirical Evidence - Score: 16 (R=8, N=8) - Date: 2025-02-07 - Comment: The paper introduces adversarial training strategies for LLM robustness, with theoretical analysis and experimental validation. This contributes to foundational understanding in the behavior and robustness of LLMs.

  92. Learning the RoPEs: Better 2D and 3D Position Encodings with STRING - Score: 16 (R=8, N=8) - Date: 2025-02-05 - Comment: The paper significantly extends Rotary Position Encodings (RoPEs) into the domain of 2D and 3D position encodings using STRING. This aligns directly with foundational model architectures and contributes theoretical advancements.

  93. mPOLICE: Provable Enforcement of Multi-Region Affine Constraints in Deep Neural Networks - Score: 16 (R=8, N=8) - Date: 2025-02-05 - Comment: Introduces mPOLICE to handle multi-region constraints in neural networks, relevant to model architecture and training efficiency.

  94. Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers - Score: 16 (R=8, N=8) - Date: 2025-02-05 - Comment: The work presents theoretical bounds on scratchpad lengths in chain-of-thought reasoning for transformers, providing fundamental insights into LLM training dynamics and architectural limitations.

  95. Mass-Editing Memory with Attention in Transformers: A cross-lingual exploration of knowledge - Score: 16 (R=8, N=8) - Date: 2025-02-05 - Comment: Explores knowledge editing in multilingual LLMs with attention mechanisms, advancing architectural insights in LLMs and their interpretability.

  96. Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging - Score: 16 (R=8, N=8) - Date: 2025-02-05 - Comment: Proposes 'Soup-of-Experts,' a model architecture leveraging expert combinations via parameter averaging, which might be an interesting take on MoE-like approaches.

  97. GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation - Score: 16 (R=8, N=8) - Date: 2025-02-04 - Comment: Proposes a novel graph-enhanced retrieval-augmented generation model that builds on foundational architecture concepts like graph neural networks, which aligns with model architecture relevance.

  98. Understanding Why Adam Outperforms SGD: Gradient Heterogeneity in Transformers - Score: 16 (R=8, N=8) - Date: 2025-02-04 - Comment: The paper provides insight into why Adam outperforms SGD in transformer training, contributing to foundational understanding of optimizer behavior in model training dynamics.

  99. BCAT: A Block Causal Transformer for PDE Foundation Models for Fluid Dynamics - Score: 16 (R=8, N=8) - Date: 2025-02-03 - Comment: The paper introduces a PDE foundation model with a novel block causal Transformer architecture, offering significant advancements in dynamic spatiotemporal modeling, which aligns with model architecture innovation.

  100. How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines - Score: 15 (R=9, N=6) - Date: 2025-02-18 - Comment: This paper offers a broad survey of scaling laws, which encompass topics such as sparse models and mixture-of-experts. This aligns with the 'Model Architecture' and 'Representation Learning' criteria as it touches on foundational and theoretical aspects of scaling models.

  101. LangProBe: a Language Programs Benchmark - Score: 15 (R=8, N=7) - Date: 2025-02-28 - Comment: The paper introduces a benchmark for evaluating language program architectures and optimization strategies, which is relevant to foundational research in LLMs and model architecture.

  102. Teasing Apart Architecture and Initial Weights as Sources of Inductive Bias in Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-02-28 - Comment: The paper explores the role of architecture and initial weights as sources of inductive bias, which aligns with the model architecture criterion.

  103. Variation Matters: from Mitigating to Embracing Zero-Shot NAS Ranking Function Variation - Score: 15 (R=8, N=7) - Date: 2025-02-28 - Comment: The paper addresses variation in zero-shot NAS ranking functions, which is relevant to architectural optimization and efficiency.

  104. Sliding Window Attention Training for Efficient Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-02-27 - Comment: The paper proposes a sliding window attention mechanism to improve efficiency in LLMs, which aligns with foundational research in model architecture and efficiency improvements.

  105. Revisiting Convolution Architecture in the Realm of DNA Foundation Models - Score: 15 (R=8, N=7) - Date: 2025-02-27 - Comment: The paper revisits CNNs for DNA foundation models, proposing ConvNova with architectural innovations like dilated and gated convolutions. It aligns with the 'Model Architecture' criterion by challenging the dominance of Transformers and SSMs.

  106. MixLLM: Dynamic Routing in Mixed Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-02-27 - Comment: Proposes a dynamic routing system for mixed LLMs, which aligns with 'Model Architecture' through its focus on dynamic systems and efficiency improvements.

  107. A Priori Generalizability Estimate for a CNN - Score: 15 (R=8, N=7) - Date: 2025-02-26 - Comment: The paper introduces a novel diagnostic tool using singular value decomposition for CNNs, which aligns with foundational research in model architecture and generalization analysis.

  108. Quantifying Logical Consistency in Transformers via Query-Key Alignment - Score: 15 (R=8, N=7) - Date: 2025-02-25 - Comment: The paper proposes a novel evaluation strategy for logical reasoning in Transformers, which is relevant to understanding LLM behavior and interpretability.

  109. To Share or Not to Share: Investigating Weight Sharing in Variational Graph Autoencoders - Score: 15 (R=8, N=7) - Date: 2025-02-25 - Comment: The investigation of weight sharing in variational graph autoencoders (VGAE) aligns with representation learning and architectural analysis.

  110. Hierarchical Residuals Exploit Brain-Inspired Compositionality - Score: 15 (R=8, N=7) - Date: 2025-02-25 - Comment: The introduction of Hierarchical Residual Networks (HiResNets) provides architectural innovation inspired by biological systems, aligning with the model architecture criterion.

  111. A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models - Score: 15 (R=8, N=7) - Date: 2025-02-25 - Comment: The paper introduces a stronger mixture of low-rank experts for fine-tuning foundation models, aligning with the 'Model Compression' and 'Model Architecture' criteria.

  112. Generalized Attention Flow: Feature Attribution for Transformer Models via Maximum Flow - Score: 15 (R=8, N=7) - Date: 2025-02-25 - Comment: The paper introduces a novel feature attribution method for Transformers, which aligns with the model architecture topic, specifically analysis of existing architectures.

  113. On the Robustness of Transformers against Context Hijacking for Linear Classification - Score: 15 (R=8, N=7) - Date: 2025-02-24 - Comment: The paper provides theoretical insights into the robustness of transformers against context hijacking, which aligns with the analysis of transformer architectures.

  114. AttentionEngine: A Versatile Framework for Efficient Attention Mechanisms on Diverse Hardware Platforms - Score: 15 (R=8, N=7) - Date: 2025-02-24 - Comment: The paper introduces a framework for optimizing attention mechanisms across hardware platforms, which aligns with model architecture innovations, particularly in the context of Transformers.

  115. Unveiling Reasoning Thresholds in Language Models: Scaling, Fine-Tuning, and Interpretability through Attention Maps - Score: 15 (R=8, N=7) - Date: 2025-02-24 - Comment: The paper investigates reasoning thresholds in LLMs and provides insights into scaling and interpretability through attention maps, contributing to foundational understanding of LLM behavior.

  116. Prompt-to-Leaderboard - Score: 15 (R=8, N=7) - Date: 2025-02-21 - Comment: The paper introduces a method for prompt-specific evaluation of LLMs, which aligns with foundational research in LLM behavior and evaluation.

  117. seqKAN: Sequence processing with Kolmogorov-Arnold Networks - Score: 15 (R=8, N=7) - Date: 2025-02-21 - Comment: The paper introduces a new architecture for sequence processing based on Kolmogorov-Arnold Networks, which aligns with the Model Architecture criterion.

  118. Reward Models Identify Consistency, Not Causality - Score: 15 (R=8, N=7) - Date: 2025-02-21 - Comment: The analysis of reward models prioritizing structural consistency over causal correctness provides theoretical insights into LLM alignment and reasoning quality.

  119. EpMAN: Episodic Memory AttentioN for Generalizing to Longer Contexts - Score: 15 (R=8, N=7) - Date: 2025-02-21 - Comment: The paper proposes EpMAN, a method for improving long-context processing in LLMs, which aligns with the Large Language Models criterion due to its focus on architectural innovation for handling long contexts.

  120. Tuning Algorithmic and Architectural Hyperparameters in Graph-Based Semi-Supervised Learning with Provable Guarantees - Score: 15 (R=8, N=7) - Date: 2025-02-19 - Comment: The paper studies hyperparameter tuning in graph-based semi-supervised learning with provable guarantees, which aligns with foundational research in graph neural networks and architectural innovations.

  121. Spiking Vision Transformer with Saccadic Attention - Score: 15 (R=8, N=7) - Date: 2025-02-19 - Comment: The paper proposes a Spiking Vision Transformer with a novel Saccadic Spike Self-Attention mechanism, which aligns with architectural innovations in Transformers. The focus on spatio-temporal interactions is relevant to foundational research.

  122. Towards Mechanistic Interpretability of Graph Transformers via Attention Graphs - Score: 15 (R=8, N=7) - Date: 2025-02-19 - Comment: The paper introduces Attention Graphs for mechanistic interpretability of Graph Transformers, which aligns with representation learning and interpretability. The network science perspective adds novelty.

  123. Meta-Statistical Learning: Supervised Learning of Statistical Inference - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: The paper introduces a novel framework for statistical inference using Transformer-based architectures, which aligns with representation learning and architectural insights. The use of permutation-invariant Transformers is particularly relevant.

  124. DATA: Decomposed Attention-based Task Adaptation for Rehearsal-Free Continual Learning - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: The paper introduces a decomposed attention-based task adaptation method for continual learning, which is relevant to foundational research in representation learning and model efficiency.

  125. Error Bound Analysis for the Regularized Loss of Deep Linear Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: The paper provides theoretical insights into the optimization landscape of deep linear networks, which aligns with foundational research in training dynamics and optimization.

  126. AdaGC: Improving Training Stability for Large Language Model Pretraining - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: The paper proposes AdaGC, an adaptive gradient clipping framework for LLM pretraining, which is relevant to foundational research in training stability and optimization.

  127. Heterogeneous Resource Allocation with Multi-task Learning for Wireless Networks - Score: 15 (R=8, N=7) - Date: 2025-02-17 - Comment: The paper proposes a multi-task learning framework with conditional computation, aligning with architectural innovations like conditional/dynamic networks.

  128. The Ann Arbor Architecture for Agent-Oriented Programming - Score: 15 (R=8, N=7) - Date: 2025-02-17 - Comment: The paper introduces a conceptual framework for agent-oriented programming of LLMs, which provides a novel perspective on in-context learning and aligns with foundational research on LLMs.

  129. Biologically Plausible Brain Graph Transformer - Score: 15 (R=8, N=7) - Date: 2025-02-14 - Comment: The paper introduces a biologically plausible brain graph transformer, which aligns with architectural innovations and representation learning by encoding small-world properties of brain graphs.

  130. Auditing Prompt Caching in Language Model APIs - Score: 15 (R=8, N=7) - Date: 2025-02-12 - Comment: The paper audits prompt caching in LLM APIs, which provides insights into LLM behavior and architecture, aligning with foundational research in LLMs.

  131. EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification - Score: 15 (R=8, N=7) - Date: 2025-02-12 - Comment: The paper proposes a method to mitigate saturation effects in gradient-based circuit identification for transformer models, which aligns with interpretability and mechanistic insights into LLMs.

  132. ENFORCE: Exact Nonlinear Constrained Learning with Adaptive-depth Neural Projection - Score: 15 (R=8, N=7) - Date: 2025-02-11 - Comment: The paper introduces ENFORCE, a novel neural network architecture for exact nonlinear constrained learning, which includes an adaptive-depth neural projection module. This aligns with the 'Model Architecture' criterion due to its architectural innovation.

  133. EquiTabPFN: A Target-Permutation Equivariant Prior Fitted Networks - Score: 15 (R=8, N=7) - Date: 2025-02-11 - Comment: The paper introduces a novel equivariant model for tabular data, which aligns with architectural innovations and addresses a fundamental property in model design.

  134. Powerformer: A Transformer with Weighted Causal Attention for Time-series Forecasting - Score: 15 (R=8, N=7) - Date: 2025-02-11 - Comment: The paper introduces Powerformer, a Transformer variant with weighted causal attention, which aligns with architectural innovations in Transformers and their adaptation to time-series data.

  135. Mol-MoE: Training Preference-Guided Routers for Molecule Generation - Score: 15 (R=8, N=7) - Date: 2025-02-11 - Comment: The paper introduces Mol-MoE, a mixture-of-experts architecture for molecule generation. It aligns with the 'Model Architecture' criterion due to its focus on MoE and its novel routing mechanism.

  136. Mechanistic Understandings of Representation Vulnerabilities and Engineering Robust Vision Transformers - Score: 15 (R=8, N=7) - Date: 2025-02-10 - Comment: This paper investigates vulnerabilities in Vision Transformers and introduces a neural defense mechanism. Its insights into ViT behavior and robustness align with the 'analysis on existing architectures' criterion.

  137. On the Expressive Power of Subgraph Graph Neural Networks for Graphs with Bounded Cycles - Score: 15 (R=8, N=7) - Date: 2025-02-07 - Comment: The paper provides a theoretical analysis of subgraph-based GNN architectures, which aligns with studies on architecture-level innovations. It contributes to understanding GNN limitations and expressive power.

  138. CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally - Score: 15 (R=8, N=7) - Date: 2025-02-07 - Comment: Explores compositional limitations and cross-modal alignment of CLIP, proposing an architectural modification using linear transformation to address this issue.

  139. Maximizing the Position Embedding for Vision Transformers with Global Average Pooling - Score: 15 (R=8, N=7) - Date: 2025-02-06 - Comment: The paper introduces a novel method to enhance position embeddings in vision transformers. This aligns with architectural insights and analysis on transformers.

  140. On the Expressivity of Selective State-Space Layers: A Multivariate Polynomial Approach - Score: 15 (R=8, N=7) - Date: 2025-02-05 - Comment: The paper analyzes selective state-space layers, contributing theoretical insights into efficient sequence modeling, which has relevance to foundational architectural research.

  141. Fast Vision Mamba: Pooling Spatial Dimensions for Accelerated Processing - Score: 15 (R=8, N=7) - Date: 2025-02-04 - Comment: Introduces architectural enhancements for State Space Models in vision tasks, which aligns with the architectural innovation criterion. The technique improves efficiency but does not significantly advance foundational model theory.

  142. The role of positional encodings in the ARC benchmark - Score: 15 (R=8, N=7) - Date: 2025-02-04 - Comment: Analyzes the role of positional encodings in transformer-based tasks. Directly related to foundational aspects of model architectures and improvements in encoder-decoder setups.

  143. Understanding Oversmoothing in GNNs as Consensus in Opinion Dynamics - Score: 15 (R=8, N=7) - Date: 2025-02-04 - Comment: Connects GNN oversmoothing with opinion dynamics and proposes a new model that addresses oversmoothing issues. Relevant for understanding and innovating architectures.

  144. Designing a Conditional Prior Distribution for Flow-Based Generative Models - Score: 15 (R=7, N=8) - Date: 2025-02-14 - Comment: The paper proposes a novel approach to designing conditional prior distributions for flow-based generative models, which is relevant to architectural innovations in generative modeling.

  145. Mol-LLM: Generalist Molecular LLM with Improved Graph Utilization - Score: 15 (R=7, N=8) - Date: 2025-02-06 - Comment: Develops a molecular LLM with a unique multimodal training approach that improves structure understanding. This offers theoretical insights into model architecture for molecular tasks.

  146. FragmentNet: Adaptive Graph Fragmentation for Graph-to-Sequence Molecular Representation Learning - Score: 15 (R=7, N=8) - Date: 2025-02-04 - Comment: Explores fragmentation in graph molecular representation learning with architectural innovations like VQVAE-GCN and transformers. Moderately related to innovative architectures.

  147. Understanding the Uncertainty of LLM Explanations: A Perspective Based on Reasoning Topology - Score: 14 (R=7, N=7) - Date: 2025-02-25 - Comment: The paper introduces a novel framework for quantifying uncertainty in LLM explanations using a reasoning topology perspective. While it provides insights into LLM behavior, it focuses on interpretability and reasoning rather than foundational breakthroughs in LLM training or architecture.

  148. Revealing and Mitigating Over-Attention in Knowledge Editing - Score: 14 (R=7, N=7) - Date: 2025-02-21 - Comment: The paper addresses over-attention in knowledge editing for LLMs, which aligns with the Large Language Models criterion due to its focus on improving model behavior and interpretability.

  149. RAG-Gym: Optimizing Reasoning and Search Agents with Process Supervision - Score: 14 (R=7, N=7) - Date: 2025-02-20 - Comment: The paper introduces a novel framework for retrieval-augmented generation (RAG) and proposes a new agent architecture, which is relevant to model architecture innovations but leans towards application-driven improvements.

  150. LLM4GNAS: A Large Language Model Based Toolkit for Graph Neural Architecture Search - Score: 14 (R=7, N=7) - Date: 2025-02-18 - Comment: The paper introduces a toolkit for Graph Neural Architecture Search (GNAS) using LLMs, which aligns with 'Model Architecture' through its focus on automating GNN design.

  151. Amortized In-Context Bayesian Posterior Estimation - Score: 14 (R=7, N=7) - Date: 2025-02-11 - Comment: The paper explores amortized Bayesian posterior estimation using transformers and permutation-invariant architectures, which touches on representation learning and architectural analysis.

  152. GNNs Getting ComFy: Community and Feature Similarity Guided Rewiring - Score: 14 (R=7, N=7) - Date: 2025-02-10 - Comment: The paper proposes new strategies for graph optimization and rewiring to enhance GNNs, especially by increasing alignment between label and community structures. It contributes to graph-related architectural refinements, which are relevant to model architecture analysis.

  153. Transformers and Their Roles as Time Series Foundation Models - Score: 14 (R=7, N=7) - Date: 2025-02-06 - Comment: The analysis on Transformers as time series foundation models touches on theoretical aspects of autoregressive properties and generalization bounds. Though tied to time series, it offers insights into Transformer architecture and training dynamics.

  154. Beyond Topological Self-Explainable GNNs: A Formal Explainability Perspective - Score: 14 (R=7, N=7) - Date: 2025-02-06 - Comment: Analyzes explainable GNNs with formal perspectives and proposes Dual-Channel GNNs, introducing a thoughtful architectural modification. It weakly aligns with the broader interest in architectural innovations.

  155. LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models - Score: 14 (R=7, N=7) - Date: 2025-02-05 - Comment: LV-XAttn introduces cross-attention mechanisms for LLMs to handle long visual inputs efficiently, focusing on distributed attention and model architecture.

  156. Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment - Score: 14 (R=7, N=7) - Date: 2025-02-03 - Comment: Proposes enhancements to speculative decoding via LLM-based judging, underscoring efficiency in autoregressive generation. Ties to LLM behavior but lacks foundational contributions to architecture or theory.

  157. NeoBERT: A Next-Generation BERT - Score: 13 (R=7, N=6) - Date: 2025-02-28 - Comment: NeoBERT introduces architectural advancements for bidirectional models, focusing on pretraining and fine-tuning improvements. While it is relevant to model architecture, it lacks groundbreaking insights into foundational architectural innovations.

  158. A recurrent vision transformer shows signatures of primate visual attention - Score: 13 (R=7, N=6) - Date: 2025-02-18 - Comment: The paper discusses an architectural innovation with a recurrent mechanism in vision transformers, which aligns with the 'Model Architecture' criterion.

  159. Understanding LLMs' Fluid Intelligence Deficiency: An Analysis of the ARC Task - Score: 13 (R=7, N=6) - Date: 2025-02-13 - Comment: The paper analyzes LLMs' deficiencies in fluid intelligence, which provides theoretical insights into LLM behavior. However, it focuses on a specific task (ARC) and does not propose architectural or training innovations.

  160. Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities - Score: 13 (R=7, N=6) - Date: 2025-02-11 - Comment: This paper evaluates LLM capabilities using model tampering attacks, providing insights into robustness and unlearning methods. While it touches on LLM behavior, the focus is on evaluation techniques rather than foundational breakthroughs in LLM training or architecture.

  161. Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization - Score: 13 (R=7, N=6) - Date: 2025-02-10 - Comment: The paper studies uncertainty-based routing in SLMs versus LLMs, highlighting architectural efficiency for on-device setups, which aligns partially with compression/dynamic routing interests.

  162. Investigating the Robustness of Deductive Reasoning with Large Language Models - Score: 13 (R=7, N=6) - Date: 2025-02-10 - Comment: The paper examines the robustness of deductive reasoning in LLMs, focusing on logical deduction tasks and perturbation robustness. It offers empirical insights but lacks fundamental architectural or theoretical advancements.

  163. Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration - Score: 13 (R=7, N=6) - Date: 2025-02-05 - Comment: Addresses object hallucination in vision-language models using attention calibration techniques, which are relevant to the interpretability of LLMs, but does not explore foundational architectural transformations.

  164. Distributionally Robust Direct Preference Optimization - Score: 13 (R=7, N=6) - Date: 2025-02-05 - Comment: Addresses alignment of LLMs with human preferences under distribution shifts, which is tangentially related but not specifically foundational to LLM architectural or theoretical breakthroughs.

Model Compression and Efficiency (260)

  1. Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models - Score: 20.0 (R=0, N=0) - Date: 2025-02-12 - Comment: Author match

  2. A physics-based data-driven model for CO$_2$ gas diffusion electrodes to drive automated laboratories - Score: 20.0 (R=0, N=0) - Date: 2025-02-11 - Comment: Author match

  3. Compression Scaling Laws: Unifying Sparsity and Quantization - Score: 19 (R=10, N=9) - Date: 2025-02-25 - Comment: The paper investigates compression scaling laws, unifying sparsity and quantization under a common framework, which directly aligns with model compression and provides theoretical insights.

  4. PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models - Score: 19 (R=10, N=9) - Date: 2025-02-20 - Comment: The paper introduces a novel post-training quantization method for LLMs, achieving extremely low-bit quantization (1.61-bit) with innovative preprocessing and optimization techniques. This directly aligns with the 'Model Compression' criterion, particularly in quantization.

  5. NestQuant: Nested Lattice Quantization for Matrix Products and LLMs - Score: 19 (R=10, N=9) - Date: 2025-02-20 - Comment: The paper introduces a novel quantization scheme (NestQuant) for LLMs, achieving state-of-the-art results in low-bit quantization. This directly aligns with the 'Model Compression' criterion, particularly in quantization.

  6. HACK: Homomorphic Acceleration via Compression of the Key-Value Cache for Disaggregated LLM Inference - Score: 19 (R=10, N=9) - Date: 2025-02-07 - Comment: HACK introduces a compression framework for KV cache in disaggregated LLM inference, directly tackling model compression and efficiency-related challenges in LLM architecture.

  7. TeZO: Empowering the Low-Rankness on the Temporal Dimension in the Zeroth-Order Optimization for Fine-tuning LLMs - Score: 19 (R=10, N=9) - Date: 2025-02-03 - Comment: The paper explores a novel method for memory-efficient fine-tuning of LLMs by leveraging low-rankness across temporal dimensions and employing Canonical Polyadic Decomposition (CPD). This is closely tied to the 'Model Compression' criterion.

  8. Delta Decompression for MoE-based LLMs Compression - Score: 18 (R=10, N=8) - Date: 2025-02-25 - Comment: The paper focuses on a novel compression method for MoE-based LLMs, aligning with the 'Model Compression' criterion. It introduces delta decompression and low-rank SVD techniques, which are foundational contributions.

  9. DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance - Score: 18 (R=10, N=8) - Date: 2025-02-25 - Comment: The paper proposes a novel KV cache compression method for LLMs, which directly aligns with model compression and efficiency breakthroughs.

  10. Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression - Score: 18 (R=10, N=8) - Date: 2025-02-25 - Comment: The paper introduces a framework for joint structured pruning and quantization, which aligns with foundational research in model compression.

  11. BaKlaVa -- Budgeted Allocation of KV cache for Long-context Inference - Score: 18 (R=10, N=8) - Date: 2025-02-20 - Comment: The paper introduces BaKlaVa, a method for optimizing KV-cache memory allocation in LLMs, which directly addresses model compression and efficiency in LLM inference.

  12. RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models - Score: 18 (R=10, N=8) - Date: 2025-02-14 - Comment: The paper proposes a quantization-aware fine-tuning approach for LLMs, which is highly relevant to model compression and efficiency.

  13. AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference - Score: 18 (R=10, N=8) - Date: 2025-02-07 - Comment: AttentionPredictor offers a learning-based method for KV cache compression by predicting attention scores, advancing efficiency techniques for LLMs. This is relevant to model compression and efficiency breakthroughs.

  14. Choose Your Model Size: Any Compression by a Single Gradient Descent - Score: 18 (R=10, N=8) - Date: 2025-02-05 - Comment: ACIP provides a novel approach to model compression that uses a single gradient descent run together with sparsity and low-rank techniques, which directly matches the model compression criterion.

  15. AdaSVD: Adaptive Singular Value Decomposition for Large Language Models - Score: 18 (R=10, N=8) - Date: 2025-02-04 - Comment: The paper focuses on model compression techniques for LLMs using adaptive SVD, aligning closely with the relevance criteria of sparsity, low-rank approaches, and theoretical efficiency breakthroughs.

  16. MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization - Score: 18 (R=10, N=8) - Date: 2025-02-04 - Comment: Proposes a quantization framework (MQuant) for multimodal LLMs, addressing efficiency challenges. This directly aligns with model compression and quantization strategies.

  17. Low-Rank Adapting Models for Sparse Autoencoders - Score: 18 (R=10, N=8) - Date: 2025-02-03 - Comment: This paper attempts to improve sparse autoencoders by combining low-rank adaptation with interpretability-driven design, making it directly relevant to topics like sparsity, low-rank techniques, and sparse autoencoders.

  18. Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models - Score: 18 (R=10, N=8) - Date: 2025-02-03 - Comment: This paper addresses KV cache compression in LLMs, explicitly aligning with the model compression criterion. The introduction of AQUA-KV for adaptive quantization is relevant and demonstrates novel efficiency improvements.

  19. Norm-Bounded Low-Rank Adaptation - Score: 18 (R=10, N=8) - Date: 2025-02-03 - Comment: Proposes NB-LoRA for parameter-efficient fine-tuning with norm bounds on adaptation matrices. This paper aligns closely with model compression, particularly low-rank adaptation techniques, making it highly relevant.

  20. Compression Barriers for Autoregressive Transformers - Score: 18 (R=9, N=9) - Date: 2025-02-25 - Comment: The paper provides theoretical insights into the compression barriers for autoregressive Transformers, directly addressing model compression and efficiency.

  21. A General Error-Theoretical Analysis Framework for Constructing Compression Strategies - Score: 18 (R=9, N=9) - Date: 2025-02-25 - Comment: The paper introduces a theoretical framework for constructing compression strategies, which aligns with foundational research in model compression and efficiency.

  22. Bitnet.cpp: Efficient Edge Inference for Ternary LLMs - Score: 18 (R=9, N=9) - Date: 2025-02-18 - Comment: Introduces Bitnet.cpp, a system enabling efficient edge inference for ternary LLMs, which is directly relevant to model compression and efficiency.

  23. On the Emergence of Thinking in LLMs I: Searching for the Right Intuition - Score: 18 (R=9, N=9) - Date: 2025-02-11 - Comment: The paper explores a novel RL-based framework for reasoning in LLMs, which aligns with theoretical insights into LLM behavior and introduces emergent reasoning capabilities.

  24. Algorithmic causal structure emerging through compression - Score: 18 (R=9, N=9) - Date: 2025-02-07 - Comment: The paper links causality and compression through algorithmic complexity, which relates to compression and theoretical insights into causality in AI. It introduces novel perspectives and foundational insights.

  25. Longer Attention Span: Increasing Transformer Context Length with Sparse Graph Processing Techniques - Score: 18 (R=9, N=9) - Date: 2025-02-05 - Comment: Proposes sparse graph processing techniques to increase Transformer context length, aligning well with efficiency breakthroughs and sparsity advancements in transformers.

  26. Pushing the Limits of BFP on Narrow Precision LLM Inference - Score: 18 (R=9, N=9) - Date: 2025-02-04 - Comment: The paper proposes hardware-efficient optimizations using a BFP framework for LLMs, providing novel insights into compression techniques.

  27. Brain-inspired sparse training enables Transformers and LLMs to perform as fully connected - Score: 18 (R=9, N=9) - Date: 2025-02-03 - Comment: The use of brain-inspired sparse training and dynamic sparse connectivity in transformer-based models is directly relevant to sparsity and model compression, fitting well with foundational contributions in efficiency and architecture.

  28. Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge - Score: 17 (R=9, N=8) - Date: 2025-02-28 - Comment: The paper proposes Layer-Aware Task Arithmetic (LATA) to disentangle task-specific and instruction-following knowledge in LLMs, which aligns with foundational insights into LLM behavior.

  29. HALO: Hardware-aware quantization with low critical-path-delay weights for LLM acceleration - Score: 17 (R=9, N=8) - Date: 2025-02-28 - Comment: The paper introduces a hardware-aware quantization framework (HALO) for LLM acceleration, aligning with the model compression criterion.

  30. LORENZA: Enhancing Generalization in Low-Rank Gradient LLM Training via Efficient Zeroth-Order Adaptive SAM - Score: 17 (R=9, N=8) - Date: 2025-02-28 - Comment: The paper introduces a low-rank gradient optimization method (LORENZA) and provides theoretical insights into its efficiency for LLMs, aligning with the model compression criterion.

  31. Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations - Score: 17 (R=9, N=8) - Date: 2025-02-26 - Comment: The paper introduces Jacobian Sparse Autoencoders (JSAEs) to sparsify computations in LLMs, which aligns with foundational research in representation learning and sparsity. It also provides efficient methods for computing Jacobians.

  32. SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference - Score: 17 (R=9, N=8) - Date: 2025-02-26 - Comment: The paper introduces SpargeAttn, a universal sparse attention mechanism, which aligns with foundational research in model compression and sparse attention techniques.

  33. Optimal Brain Apoptosis - Score: 17 (R=9, N=8) - Date: 2025-02-26 - Comment: The paper introduces a novel pruning method, Optimal Brain Apoptosis (OBA), which advances parameter importance estimation using the Hessian matrix. This aligns closely with the model compression criterion, particularly in pruning and efficiency improvements.

  34. C-LoRA: Continual Low-Rank Adaptation for Pre-trained Models - Score: 17 (R=9, N=8) - Date: 2025-02-26 - Comment: The paper proposes C-LoRA, a novel extension of Low-Rank Adaptation (LoRA) for continual learning, which aligns with the model compression and efficiency criteria. The use of a learnable routing matrix for task adaptation is a significant methodological contribution.

  35. PICASO: Permutation-Invariant Context Composition with State Space Models - Score: 17 (R=9, N=8) - Date: 2025-02-26 - Comment: The paper introduces a novel method for permutation-invariant context composition using state space models, which aligns with foundational research in efficient context representation for LLMs.

  36. The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve? - Score: 17 (R=9, N=8) - Date: 2025-02-26 - Comment: The paper discusses the Lottery LLM Hypothesis, which is highly relevant to model compression and foundational insights into LLM capabilities.

  37. CoKV: Optimizing KV Cache Allocation via Cooperative Game - Score: 17 (R=9, N=8) - Date: 2025-02-26 - Comment: The paper proposes CoKV, a novel method for optimizing KV cache allocation in LLMs using cooperative game theory. This directly addresses efficiency and memory challenges in LLMs, aligning with the model compression criterion.

  38. LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper introduces a memory-efficient draft model with a constant-sized KV cache and novel attention methods, which aligns with the model compression and efficiency criteria.

  39. Low-rank bias, weight decay, and model merging in neural networks - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper explores low-rank structures in neural networks induced by weight decay, which aligns with foundational research in model compression and efficiency.

  40. Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper focuses on 4-bit training stability and introduces Stable-SPAM, which aligns with model compression and efficiency breakthroughs.

  41. When Can We Solve the Weighted Low Rank Approximation Problem in Truly Subquadratic Time? - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper focuses on weighted low-rank approximation, which aligns with the model compression topic, particularly low-rank approaches.

  42. DISC: Dynamic Decomposition Improves LLM Inference Scaling - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper introduces a novel method for dynamic decomposition in LLM inference, which aligns with foundational research in efficiency and scaling techniques for LLMs.

  43. Exact Recovery of Sparse Binary Vectors from Generalized Linear Measurements - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper addresses sparse binary vector recovery with theoretical guarantees, which aligns with the sparsity and efficiency criteria in model compression.

  44. Signal Collapse in One-Shot Pruning: When Sparse Models Fail to Distinguish Neural Representations - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper identifies signal collapse in one-shot pruning and proposes a novel method to address it, aligning with model compression and sparsity topics.

  45. Rotate, Clip, and Partition: Towards W2A4KV4 Quantization by Integrating Rotation and Learnable Non-uniform Quantizer - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper introduces RCP, a QAT approach for extreme compression of LLMs, including W2A4KV4 quantization. This aligns with the Model Compression criterion, particularly in advancing quantization techniques.

  46. Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper proposes Cache-Craft, a system for managing and reusing KV caches in RAG-based systems, which aligns with the model compression criterion by addressing efficiency and computational redundancy.

  47. Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing - Score: 17 (R=9, N=8) - Date: 2025-02-24 - Comment: The paper introduces a novel dynamic pruning framework for LLMs, which aligns with model compression and efficiency breakthroughs.

  48. LightThinker: Thinking Step-by-Step Compression - Score: 17 (R=9, N=8) - Date: 2025-02-24 - Comment: The paper introduces LightThinker, a method for compressing intermediate reasoning steps in LLMs, aligning with model compression and efficiency breakthroughs. The approach is novel in its dynamic compression inspired by human cognition.

  49. SVDq: 1.25-bit and 410x Key Cache Compression for LLM Attention - Score: 17 (R=9, N=8) - Date: 2025-02-24 - Comment: This paper presents SVDq, a novel mixed-precision quantization method for KV cache compression in LLMs, achieving significant compression ratios with theoretical and empirical validation. It aligns closely with the model compression criterion.

  50. Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference - Score: 17 (R=9, N=8) - Date: 2025-02-24 - Comment: The paper proposes a novel round-level attention mechanism to reduce KV cache memory usage in LLMs, aligning with the 'Model Compression' criterion by addressing efficiency in inference.

  51. More for Keys, Less for Values: Adaptive KV Cache Quantization - Score: 17 (R=9, N=8) - Date: 2025-02-24 - Comment: The paper proposes KV-AdaQuant, a mixed-precision quantization framework for KV cache in LLMs, with theoretical insights into quantization error propagation. It aligns well with the model compression criterion.

  52. LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: Proposes a unified sparse attention framework for efficient LLM serving, addressing both computational and memory efficiency. This aligns well with model compression and sparsity criteria.

  53. Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: The paper introduces Multi-head Latent Attention (MLA) and proposes a novel fine-tuning method for transitioning from MHA to MLA, which aligns with the Model Compression criterion due to its focus on KV cache compression and efficiency.

  54. Dynamic Low-Rank Sparse Adaptation for Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: Presents a novel method for integrating low-rank adaptation with sparsity in LLMs, addressing efficiency and performance degradation. This aligns closely with model compression and sparsity criteria.

  55. Determining Layer-wise Sparsity for Large Language Models Through a Theoretical Perspective - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: This paper addresses layer-wise sparsity in LLMs, providing a theoretical perspective and a novel sparsity allocation method. It directly aligns with model compression and efficiency breakthroughs.

  56. PEARL: Towards Permutation-Resilient LLMs - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: The paper introduces PEARL, a novel framework for improving LLM robustness to input permutations using distributionally robust optimization. This aligns with foundational research in LLM behavior and training dynamics.

  57. Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: The paper presents a recurrent language model architecture optimized for efficiency, which aligns with the Model Architecture and Model Compression criteria.

  58. Towards Efficient Automatic Self-Pruning of Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: The paper introduces an automatic self-pruning framework for LLMs, which aligns closely with the 'Model Compression' criterion, particularly in pruning and efficiency improvements.

  59. Weighted Low-rank Approximation via Stochastic Gradient Descent on Manifolds - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: The paper addresses weighted low-rank approximation using stochastic gradient descent on manifolds, which is relevant to model compression and efficiency.

  60. RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: The paper introduces a two-stage KV cache compression strategy for LLMs, which is highly relevant to model compression and efficiency improvements in large language models.

  61. MaskPrune: Mask-based LLM Pruning for Layer-wise Uniform Structures - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: The MaskPrune method introduces a novel structured pruning approach for LLMs, focusing on uniformity across layers, which is highly relevant to model compression.

  62. NVR: Vector Runahead on NPUs for Sparse Memory Access - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: NVR addresses cache misses in sparse DNN workloads with a novel prefetching mechanism, aligning with the model compression criterion through its focus on sparsity and hardware efficiency.

  63. On the Duality between Gradient Transformations and Adapters - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: The paper explores the duality between gradient transformations and adapters, providing insights into memory-efficient training. This aligns with the 'Model Compression' criterion, particularly in efficiency improvements.

  64. ETS: Efficient Tree Search for Inference-Time Scaling - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: The paper introduces Efficient Tree Search (ETS), which optimizes KV cache sharing during inference-time scaling, aligning with the model compression criterion through its focus on memory efficiency and algorithmic improvements.

  65. LSR-Adapt: Ultra-Efficient Parameter Tuning with Matrix Low Separation Rank Kernel Adaptation - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: The paper proposes a novel low-separation-rank kernel for parameter-efficient fine-tuning, which aligns with the model compression criterion and introduces a structural innovation.

  66. Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: The paper focuses on improving inference efficiency for long-context LLMs by introducing a novel activation-aware approach for key-value retrieval. This aligns with the 'Model Compression' criterion, specifically in the context of KV cache optimization.

  67. Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: The paper introduces LoRAM, a memory-efficient LoRA training scheme for LLMs, which aligns with the model compression criterion and offers a novel approach to efficiency.

  68. Electron flow matching for generative reaction mechanism prediction obeying conservation laws - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The paper introduces FlowER, a generative framework for reaction mechanism prediction that enforces conservation laws, aligning with AI for Science by addressing foundational challenges in chemical modeling.

  69. GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The paper proposes a fully quantized training framework for LLM fine-tuning, which aligns with model compression and efficiency topics. It introduces a novel integer-based approach for on-device fine-tuning.

  70. HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The HeadInfer method introduces a memory-efficient inference strategy for LLMs by offloading the KV cache head-wise, which aligns with the model compression and efficiency criterion.

  71. Sens-Merging: Sensitivity-Guided Parameter Balancing for Merging Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The paper introduces a sensitivity-guided method for merging LLMs, which aligns with foundational research in LLM architecture and efficiency improvements.

  72. QuZO: Quantized Zeroth-Order Fine-Tuning for Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The paper introduces a novel quantized zeroth-order fine-tuning framework for LLMs, which aligns with the model compression criterion, specifically addressing low-precision training and optimization.

  73. Optimal Brain Iterative Merging: Mitigating Interference in LLM Merging - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The paper proposes a novel method for mitigating interference in LLM merging, which aligns with foundational research in model compression and efficiency.

  74. Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The paper introduces Tactic, a sparse attention mechanism for long-context LLMs, which aligns with the model compression criterion by addressing efficiency in attention mechanisms.

  75. GoRA: Gradient-driven Adaptive Low Rank Adaptation - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The paper proposes GoRA, a novel gradient-driven adaptive low-rank adaptation method, which directly aligns with model compression and efficiency topics, particularly low-rank approaches.

  76. AdaSplash: Adaptive Sparse Flash Attention - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: AdaSplash improves sparse attention mechanisms, directly impacting Transformer efficiency and aligning well with topics like sparsity and low-rank adaptations.

  77. An Efficient Row-Based Sparse Fine-Tuning - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper proposes a sparse fine-tuning framework based on pruning, which is highly relevant to model compression and efficiency research.

  78. Efficient Long-Decoding Inference with Reasoning-Aware Attention Sparsity - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper proposes a reasoning-aware attention sparsity method for efficient long-decoding inference, which is highly relevant to foundational research in LLM efficiency and sparsity.

  79. CacheFocus: Dynamic Cache Re-Positioning for Efficient Retrieval-Augmented Generation - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper introduces a novel cache management approach, addressing foundational challenges in model efficiency for long-context LLMs.

  80. Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper discusses a hardware-aligned sparse attention mechanism, relevant to 'Model Compression' due to its sparsity and efficiency focus.

  81. CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: Introduces a low-rank activation mechanism to pre-train LLMs more efficiently, aligning with the model compression criterion and with foundational improvements in training efficiency.

  82. Weighted quantization using MMD: From mean field to mean shift via gradient flows - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper introduces a novel quantization method using MMD and gradient flows, which aligns with the model compression criterion. The proposed MSIP algorithm and its theoretical grounding add significant novelty.

  83. Dynamic Chain-of-Thought: Towards Adaptive Deep Reasoning - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper proposes a dynamic chain-of-thought reasoning framework, which aligns with foundational research in adaptive reasoning and efficiency in LLMs.

  84. QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: This paper introduces a novel framework for efficient KV cache optimization in LLMs, which is relevant to 'Model Compression'.

  85. Scalable First-order Method for Certifying Optimal k-Sparse GLMs - Score: 17 (R=9, N=8) - Date: 2025-02-14 - Comment: The paper proposes a scalable first-order method for certifying optimality in sparse GLMs, which directly relates to the model compression criterion through its focus on sparsity and efficient optimization techniques.

  86. LoRA Training Provably Converges to a Low-Rank Global Minimum or It Fails Loudly (But it Probably Won't Fail) - Score: 17 (R=9, N=8) - Date: 2025-02-14 - Comment: The paper provides a theoretical analysis of LoRA training dynamics, which aligns with the model compression criterion, specifically low-rank approaches. It offers foundational insights into why LoRA training converges effectively.

  87. InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU - Score: 17 (R=9, N=8) - Date: 2025-02-14 - Comment: The paper introduces a novel framework for handling extremely long context lengths in LLMs, addressing efficiency and memory challenges. This aligns with the 'Large Language Models' and 'Model Compression' criteria.

  88. Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation - Score: 17 (R=9, N=8) - Date: 2025-02-14 - Comment: The paper proposes a memory-efficient pruning strategy (Skrr) for text encoders in text-to-image diffusion models, which aligns with model compression and sparsity techniques.

  89. Scalable Thermodynamic Second-order Optimization - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper proposes a scalable second-order optimization method leveraging thermodynamic computers, which aligns with model efficiency and optimization breakthroughs.

  90. Training-Free Restoration of Pruned Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper proposes a training-free method for restoring pruned neural networks, which aligns with foundational research on model compression and sparsity.

  91. LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper introduces LowRA, a framework for ultra-low-bit LoRA fine-tuning of LLMs, which aligns with the model compression criterion, specifically quantization and efficiency breakthroughs.

  92. Towards Efficient Optimizer Design for LLM via Structured Fisher Approximation with a Low-Rank Extension - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper introduces efficient optimizers for LLMs using structured Fisher approximation with a low-rank extension. This aligns with foundational research in model efficiency and optimization, making it highly relevant.

  93. LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper provides insights into how LLMs learn reasoning structures, emphasizing the importance of structure over content in Chain-of-Thought reasoning. This aligns with foundational research on LLM behavior and training dynamics.

  94. Online Scheduling for LLM Inference with KV Cache Constraints - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper addresses KV cache constraints in LLM inference, which is directly relevant to model compression and efficiency. The theoretical scheduling algorithms and empirical results add novelty.

  95. Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper introduces a novel dataset pruning method based on difficulty and uncertainty, aligning with the model compression criterion.

  96. Exploring Model Invariance with Discrete Search for Ultra-Low-Bit Quantization - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper proposes a framework for ultra-low-bit quantization, which aligns with model compression and efficiency improvements. The discrete search algorithm for permutation invariance is a novel contribution.

  97. LoCA: Location-Aware Cosine Adaptation for Parameter-Efficient Fine-Tuning - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper introduces a novel frequency-domain parameter-efficient fine-tuning method (LoCA) that builds on low-rank adaptation (LoRA). This aligns with the 'Model Compression' criterion, particularly in low-rank approaches and efficiency breakthroughs.

  98. HRP: High-Rank Preheating for Superior LoRA Initialization - Score: 17 (R=9, N=8) - Date: 2025-02-12 - Comment: The paper introduces a novel initialization method for LoRA, which directly contributes to low-rank adaptation and aligns with model compression and efficiency breakthroughs.

  99. Private Low-Rank Approximation for Covariance Matrices, Dyson Brownian Motion, and Eigenvalue-Gap Bounds for Gaussian Perturbations - Score: 17 (R=9, N=8) - Date: 2025-02-12 - Comment: The paper introduces a novel approach to low-rank approximation with differential privacy, leveraging Dyson Brownian motion. This aligns with the model compression topic, particularly low-rank approaches, and provides theoretical insights.

  100. A Memory Efficient Randomized Subspace Optimization Method for Training Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-12 - Comment: The paper introduces a randomized subspace optimization method for training LLMs, addressing memory efficiency challenges. This aligns with model compression and efficiency criteria and provides strong theoretical contributions.

  101. Model Fusion via Neuron Transplantation - Score: 17 (R=9, N=8) - Date: 2025-02-12 - Comment: The paper introduces a novel model fusion technique called Neuron Transplantation, which aligns with model compression and efficiency breakthroughs by reducing memory and inference costs.

  102. Matryoshka Quantization - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper introduces Matryoshka Quantization, a novel multi-scale quantization technique, which aligns with the 'Model Compression' criterion due to its focus on quantization and efficiency improvements.

  103. Calibrating LLMs with Information-Theoretic Evidential Deep Learning - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper discusses a novel method (IB-EDL) for calibrating LLMs using an information bottleneck, which aligns with the 'Large Language Models' criterion by providing theoretical insights into improving LLM trustworthiness and uncertainty estimation.

  104. APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper proposes Adaptive Parallel Encoding (APE) for efficient context-augmented generation, which is relevant to model compression and efficiency improvements in LLMs.

  105. QuEST: Stable Training of LLMs with 1-Bit Weights and Activations - Score: 17 (R=9, N=8) - Date: 2025-02-10 - Comment: This paper introduces QuEST, a quantization-aware training method that demonstrates stable training with 1-bit weights and activations. This directly aligns with the criterion on model compression breakthroughs.

  106. No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces - Score: 17 (R=9, N=8) - Date: 2025-02-10 - Comment: The isotropic model merging framework introduces innovative techniques for task-specific model integration, offering novel insights into representation alignment and efficiency in merged models.

  107. Tighter sparse variational Gaussian processes - Score: 17 (R=9, N=8) - Date: 2025-02-10 - Comment: Introduces a tighter sparse variational Gaussian process, relevant for sparsity and representation learning. Strong theoretical contribution in GP optimization.

  108. KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference - Score: 17 (R=9, N=8) - Date: 2025-02-10 - Comment: Presents KV cache quantization for LLM inference, directly aligning with model compression and efficiency while offering insights into layer-wise sensitivity and optimization.

  109. Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing - Score: 17 (R=9, N=8) - Date: 2025-02-10 - Comment: Relevant to Large Language Models and sparsity. Discusses novel techniques for memory-efficient model merging using sparseness in experts, addressing efficiency and storage concerns.

  110. Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-10 - Comment: Introduces a backtracking method for LLM reasoning improvement, falling squarely within insights into reasoning processes and mechanisms, specifically for LLMs.

  111. Probe-Free Low-Rank Activation Intervention - Score: 17 (R=9, N=8) - Date: 2025-02-07 - Comment: Proposes a probe-free low-rank activation intervention for inference-time steering of LLMs, which aligns with criterion 4 as it introduces innovations in LLM interpretability leveraging low-rank techniques.

  112. Advancing Weight and Channel Sparsification with Enhanced Saliency - Score: 17 (R=9, N=8) - Date: 2025-02-07 - Comment: Proposes a novel dynamic sparse training paradigm enhancing saliency-based sparsification strategies. Highly relevant to model compression (criterion 3), specifically with advancements in pruning and sparsity.

  113. Bilevel ZOFO: Bridging Parameter-Efficient and Zeroth-Order Techniques for Efficient LLM Fine-Tuning and Meta-Training - Score: 17 (R=9, N=8) - Date: 2025-02-07 - Comment: Introduces a bilevel optimization framework combining parameter-efficient tuning with zeroth-order methods, aligning with model compression (criterion 3) and efficient methods for fine-tuning LLMs (criterion 4).

  114. An Augmented Backward-Corrected Projector Splitting Integrator for Dynamical Low-Rank Training - Score: 17 (R=9, N=8) - Date: 2025-02-06 - Comment: This paper proposes a novel low-rank training method, combining theoretical robustness with efficiency. It aligns well with the model compression and low-rank criteria.

  115. Leveraging the true depth of LLMs - Score: 17 (R=9, N=8) - Date: 2025-02-06 - Comment: Focuses on the architectural efficiency improvement of LLMs by decoupling layers for parallel evaluation, aligning closely with the 'Model Compression' topic and offering insights into computational optimization without retraining.

  116. Theoretical Guarantees for Low-Rank Compression of Deep Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-02-06 - Comment: The paper provides theoretical insights into low-rank compression, aligning well with the model compression criterion by focusing on foundational recovery guarantees.

  117. ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization - Score: 17 (R=9, N=8) - Date: 2025-02-06 - Comment: The ParetoQ framework addresses low-bit quantization, which is directly relevant to model compression. It provides new insights into scaling laws and transitions in quantized representations, showing clear foundational value.

  118. EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization - Score: 17 (R=9, N=8) - Date: 2025-02-05 - Comment: The paper proposes EasySpec, which includes innovations in speculative decoding and optimizes multi-GPU utilization through layer-parallelism and KV cache calibration. This aligns with the topic of model compression and efficiency breakthroughs.

  119. When Dimensionality Hurts: The Role of LLM Embedding Compression for Noisy Regression Tasks - Score: 17 (R=9, N=8) - Date: 2025-02-05 - Comment: Explores embedding compression in LLMs via autoencoders, addressing sparsity and efficiency in noisy tasks, which ties into representation learning and compression strategies.

  120. Reasoning Bias of Next Token Prediction Training - Score: 17 (R=9, N=8) - Date: 2025-02-05 - Comment: This study explores the reasoning biases in next-token prediction training and contrasts it with other methodologies, providing insights into LLM training strategies.

  121. CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing - Score: 17 (R=9, N=8) - Date: 2025-02-05 - Comment: Introduces a token-level collaborative inference framework for LLMs aiming to optimize inference efficiency, aligning well with the Model Compression criteria.

  122. A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: The paper introduces a novel inference-time scaling method using particle-based Monte Carlo techniques for LLMs, offering possible breakthroughs in efficiency and robustness, relevant to inference optimization.

  123. One-step full gradient suffices for low-rank fine-tuning, provably and efficiently - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: This paper investigates low-rank fine-tuning and provides both theoretical and empirical insights into improving LoRA using spectral initialization. This is directly relevant to compression/efficiency breakthroughs.

  124. Nearly Lossless Adaptive Bit Switching - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: Proposes nearly lossless bit-switching quantization and addresses inter-precision interference with theoretical contributions, relevant to model compression.

  125. RandLoRA: Full-rank parameter-efficient fine-tuning of large models - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: RandLoRA advances parameter-efficient fine-tuning by using full-rank updates to address the limitations of low-rank adaptation. Relevant to compression and efficiency topics such as low-rank approaches.

  126. Symmetric Pruning of Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: Provides theoretical insights into pruning methods for Large Language Models (LLMs), directly matching the model compression topic and offering improvements to existing techniques.

  127. Memory-Efficient Fine-Tuning of Transformers via Token Selection - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: Introduces TokenTune for memory-efficient fine-tuning of transformer models using token selection, which aligns with model compression and efficiency breakthroughs.

  128. Pivoting Factorization: A Compact Meta Low-Rank Representation of Sparsity for Efficient Inference in Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-03 - Comment: The paper introduces Pivoting Factorization, which directly applies to low-rank compression in large language models, making it highly relevant to model compression and efficiency techniques.

  129. The Gradient of Algebraic Model Counting - Score: 17 (R=8, N=9) - Date: 2025-02-26 - Comment: The paper extends the semiring view of algebraic model counting from inference to learning by generalizing gradients and backpropagation across semirings. This theoretical contribution aligns with emerging trends in foundational research.

  130. Near-Optimal Approximations for Bayesian Inference in Function Space - Score: 17 (R=8, N=9) - Date: 2025-02-26 - Comment: The paper proposes a scalable Bayesian inference algorithm in function space, which is a foundational contribution to probabilistic modeling and aligns with emerging trends in theoretical research.

  131. Learning Is a Kan Extension - Score: 17 (R=8, N=9) - Date: 2025-02-20 - Comment: The paper provides a theoretical foundation by linking error minimization algorithms to Kan extensions, which is a novel and cutting-edge theoretical contribution relevant to emerging trends.

  132. Computational-Statistical Tradeoffs at the Next-Token Prediction Barrier: Autoregressive and Imitation Learning under Misspecification - Score: 17 (R=8, N=9) - Date: 2025-02-19 - Comment: The paper provides theoretical insights into error amplification in next-token prediction and its computational-statistical tradeoffs, aligning with emerging trends in foundational research.

  133. Global Universal Scaling and Ultra-Small Parameterization in Machine Learning Interatomic Potentials with Super-Linearity - Score: 17 (R=8, N=9) - Date: 2025-02-12 - Comment: The paper introduces a physics-informed MLIP model with ultra-small parameterization and scalability, aligning with foundational research in efficiency and sparsity. It incorporates physical constraints, which is a novel approach.

  134. Variational decision diagrams for quantum-inspired machine learning applications - Score: 17 (R=8, N=9) - Date: 2025-02-07 - Comment: The introduction of Variational Decision Diagrams (VDDs) as a graph-based structure for quantum-inspired machine learning is highly relevant to emerging trends in foundational machine learning techniques. This paper offers novel insights into decision diagram-based ansatz alternatives.

  135. A Periodic Bayesian Flow for Material Generation - Score: 17 (R=8, N=9) - Date: 2025-02-05 - Comment: Introduces a periodic Bayesian flow for generative modeling of crystal structures, incorporating foundational elements of generative paradigms in material science.

  136. Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective - Score: 17 (R=8, N=9) - Date: 2025-02-04 - Comment: Applies a second-order optimization perspective to gradient alignment in physics-informed neural networks, making it relevant to foundational optimization and efficiency issues.

  137. On Pruning State-Space LLMs - Score: 16 (R=9, N=7) - Date: 2025-02-27 - Comment: The paper explores pruning methods for state-space models (SSMs) as an alternative to transformer-based LLMs, aligning with the 'Model Compression' criterion. It provides insights into pruning techniques and their effects on SSMs, which is relevant to foundational research in efficiency.

  138. Systematic Weight Evaluation for Pruning Large Language Models: Enhancing Performance and Sustainability - Score: 16 (R=9, N=7) - Date: 2025-02-25 - Comment: The paper proposes a systematic weight evaluation for pruning LLMs, aligning with 'Model Compression'. It emphasizes sustainability and efficiency, which are relevant contributions.

  139. Sparsity May Be All You Need: Sparse Random Parameter Adaptation - Score: 16 (R=9, N=7) - Date: 2025-02-25 - Comment: The paper proposes a sparse parameter adaptation method for LLM fine-tuning, which aligns with sparsity and efficiency in model compression.

  140. Pruning as a Defense: Reducing Memorization in Large Language Models - Score: 16 (R=9, N=7) - Date: 2025-02-25 - Comment: The paper investigates pruning as a method to reduce memorization in LLMs, which aligns with the model compression and sparsity criteria.

  141. Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis - Score: 16 (R=9, N=7) - Date: 2025-02-20 - Comment: The paper benchmarks post-training quantization (PTQ) for LLMs, providing a comprehensive taxonomy and evaluation. This aligns with the 'Model Compression' criterion, offering insights into quantization strategies.

  142. Keep what you need : extracting efficient subnetworks from large audio representation models - Score: 16 (R=9, N=7) - Date: 2025-02-19 - Comment: The paper introduces a method for extracting efficient subnetworks from large audio models using sparsity-inducing losses. This aligns with model compression topics like pruning and sparsity.

  143. LoRE-Merging: Exploring Low-Rank Estimation For Large Language Model Merging - Score: 16 (R=9, N=7) - Date: 2025-02-18 - Comment: Introduces low-rank estimation techniques for model merging, advancing low-rank methodologies closely tied to compression and scaling topics.

  144. On multi-token prediction for efficient LLM inference - Score: 16 (R=9, N=7) - Date: 2025-02-14 - Comment: The paper investigates multi-token prediction for efficient LLM inference, which aligns with foundational research in model efficiency and training dynamics.

  145. MPIC: Position-Independent Multimodal Context Caching System for Efficient MLLM Serving - Score: 16 (R=9, N=7) - Date: 2025-02-05 - Comment: The paper introduces a position-independent caching system for Multimodal Large Language Model (MLLM) inference, specifically addressing efficiency in KV cache management. This aligns well with the model compression and efficiency domain.

  146. Can LLMs Maintain Fundamental Abilities under KV Cache Compression? - Score: 16 (R=9, N=7) - Date: 2025-02-05 - Comment: This directly explores the effects of KV cache compression on LLM capabilities, aligning with the topic of model compression and efficiency research.

  147. Efficient Reasoning with Hidden Thinking - Score: 16 (R=9, N=7) - Date: 2025-02-04 - Comment: The Heima framework introduces hidden latent reasoning representations for CoT reasoning in LLMs, aligning closely with efficiency breakthroughs in reasoning methods.

  148. Scalable Signature Kernel Computations for Long Time Series via Local Neumann Series Expansions - Score: 16 (R=8, N=8) - Date: 2025-02-28 - Comment: The paper introduces a novel method for scalable signature kernel computations, which aligns with foundational research in efficiency and algorithmic breakthroughs.

  149. Extremely Greedy Equivalence Search - Score: 16 (R=8, N=8) - Date: 2025-02-28 - Comment: The paper proposes an improvement to the Greedy Equivalence Search algorithm, which aligns with foundational research in model efficiency and algorithmic innovation.

  150. Optimal Approximate Matrix Multiplication over Sliding Windows - Score: 16 (R=8, N=8) - Date: 2025-02-27 - Comment: Presents a novel algorithm for approximate matrix multiplication in sliding windows with theoretical guarantees, aligning with 'Model Compression' and efficiency breakthroughs.

  151. Optimal Stochastic Trace Estimation in Generative Modeling - Score: 16 (R=8, N=8) - Date: 2025-02-27 - Comment: The paper proposes an improved stochastic trace estimator for generative modeling, which aligns with foundational research in efficiency and optimization methods.

  152. Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning - Score: 16 (R=8, N=8) - Date: 2025-02-26 - Comment: The paper explores the optimal scaling of test-time compute for LLM reasoning, introducing a Thinking-Optimal Scaling strategy. This aligns with foundational research on LLM behavior and efficiency.

  153. LeanKAN: A Parameter-Lean Kolmogorov-Arnold Network Layer with Improved Memory Efficiency and Convergence Behavior - Score: 16 (R=8, N=8) - Date: 2025-02-26 - Comment: The paper introduces LeanKAN, a parameter-efficient alternative to Kolmogorov-Arnold networks, with improved memory efficiency and convergence. It aligns with foundational research in model architecture and efficiency.

  154. Knowledge Distillation with Training Wheels - Score: 16 (R=8, N=8) - Date: 2025-02-26 - Comment: The paper proposes a novel framework for knowledge distillation with test-time teacher assistance, which aligns with foundational research in model compression and training dynamics.

  155. A Fokker-Planck-Based Loss Function that Bridges Dynamics with Density Estimation - Score: 16 (R=8, N=8) - Date: 2025-02-26 - Comment: The paper introduces a loss function derived from the Fokker-Planck equation that bridges dynamics with density estimation, which aligns with emerging trends in foundational research.

  156. When to Forget? Complexity Trade-offs in Machine Unlearning - Score: 16 (R=8, N=8) - Date: 2025-02-25 - Comment: The paper analyzes machine unlearning and provides theoretical bounds on unlearning complexity. This aligns with the Emerging Trends criterion, as it challenges established assumptions in model retraining.

  157. CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter - Score: 16 (R=8, N=8) - Date: 2025-02-25 - Comment: The paper introduces CORAL, a framework for speculative decoding in LLMs, addressing training-inference misalignment and efficiency. This aligns with the Model Compression criterion, particularly in improving inference efficiency.

  158. Dynamic Parallel Tree Search for Efficient LLM Reasoning - Score: 16 (R=8, N=8) - Date: 2025-02-25 - Comment: The paper introduces a novel parallelism framework for efficient LLM reasoning, which aligns with foundational research in efficiency and LLM behavior.

  159. Since Faithfulness Fails: The Performance Limits of Neural Causal Discovery - Score: 16 (R=8, N=8) - Date: 2025-02-25 - Comment: The paper critiques neural causal discovery methods, identifying fundamental limitations, which aligns with emerging trends in challenging established assumptions.

  160. Confidence Estimation via Sequential Likelihood Mixing - Score: 16 (R=8, N=8) - Date: 2025-02-21 - Comment: The paper provides a framework for constructing confidence sets with theoretical insights, which aligns with foundational research in emerging trends.

  161. FlexTok: Resampling Images into 1D Token Sequences of Flexible Length - Score: 16 (R=8, N=8) - Date: 2025-02-20 - Comment: FlexTok introduces a novel approach to image tokenization with variable-length sequences, which aligns with model architecture innovations through its hierarchical and semantic compression.

  162. Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning - Score: 16 (R=8, N=8) - Date: 2025-02-20 - Comment: The paper combines LLMs and symbolic reasoning for mathematical theorem proving, which aligns with foundational research in LLM reasoning capabilities.

  163. Stepwise Perplexity-Guided Refinement for Efficient Chain-of-Thought Reasoning in Large Language Models - Score: 16 (R=8, N=8) - Date: 2025-02-20 - Comment: The paper proposes a method to improve Chain-of-Thought reasoning efficiency in LLMs, which aligns with foundational research in LLM behavior and training dynamics.

  164. Scalable Model Merging with Progressive Layer-wise Distillation - Score: 16 (R=8, N=8) - Date: 2025-02-19 - Comment: The paper proposes a novel layer-wise distillation method for scalable model merging, which is relevant to model compression and efficiency. The theoretical insights into data-agnostic algorithms add to its novelty.

  165. SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs - Score: 16 (R=8, N=8) - Date: 2025-02-19 - Comment: The paper focuses on accelerating LLM token generation on CPUs using sparsity and AMX, which aligns with model compression and efficiency topics. The use of unstructured sparsity in attention computation is novel.

  166. Efficient Neural SDE Training using Wiener-Space Cubature - Score: 16 (R=8, N=8) - Date: 2025-02-19 - Comment: The paper introduces a novel training technique for neural SDEs using Wiener-space cubature, which is relevant to efficiency improvements and foundational methods.

  167. On the Query Complexity of Verifier-Assisted Language Generation - Score: 16 (R=8, N=8) - Date: 2025-02-18 - Comment: This paper explores verifier-assisted constrained generation, offering novel mathematical insights and advancing theoretical understanding of inference-time algorithms. Relevant to foundational LLM research.

  168. APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs - Score: 16 (R=8, N=8) - Date: 2025-02-18 - Comment: The paper introduces APB, a framework for accelerating long-context inference in LLMs, which is relevant to model efficiency and compression with significant speedup contributions.

  169. Low-Rank Thinning - Score: 16 (R=8, N=8) - Date: 2025-02-18 - Comment: The paper introduces a low-rank analysis for sub-Gaussian thinning, which has implications for model compression and efficiency, particularly in attention mechanisms.

  170. MaZO: Masked Zeroth-Order Optimization for Multi-Task Fine-Tuning of Large Language Models - Score: 16 (R=8, N=8) - Date: 2025-02-18 - Comment: The paper introduces MaZO, a novel framework for multi-task fine-tuning of LLMs using zeroth-order optimization, which aligns with the 'Model Compression' criterion due to its focus on memory-efficient optimization and parameter-level innovations.

  171. Learning the Exact Time Integration Algorithm for Initial Value Problems by Randomized Neural Networks - Score: 16 (R=8, N=8) - Date: 2025-02-18 - Comment: The paper introduces a method for learning time integration algorithms using randomized neural networks, which is foundational in terms of algorithmic innovation and efficiency.

  172. STAR: Spectral Truncation and Rescale for Model Merging - Score: 16 (R=8, N=8) - Date: 2025-02-17 - Comment: The paper introduces STAR, a method for model merging that addresses merging conflicts through spectral truncation and rescaling. This aligns with model efficiency and compression criteria.

  173. Data-Adaptive Low-Rank Sparse Subspace Clustering - Score: 16 (R=8, N=8) - Date: 2025-02-17 - Comment: The paper proposes a data-adaptive low-rank sparse subspace clustering algorithm, which aligns with foundational research in representation learning and sparsity.

  174. What is a Sketch-and-Precondition Derivation for Low-Rank Approximation? Inverse Power Error or Inverse Power Estimation? - Score: 16 (R=8, N=8) - Date: 2025-02-13 - Comment: The paper introduces a novel sketch-and-precondition framework for low-rank approximation, which aligns with the criterion of foundational research in model compression and efficiency.

  175. Low Tensor-Rank Adaptation of Kolmogorov-Arnold Networks - Score: 16 (R=8, N=8) - Date: 2025-02-11 - Comment: The paper proposes low tensor-rank adaptation for Kolmogorov-Arnold networks, which aligns with low-rank approaches in model compression and foundational architecture innovations.

  176. Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning - Score: 16 (R=8, N=8) - Date: 2025-02-06 - Comment: Explores zeroth-order optimization to improve memory efficiency for LLM fine-tuning, contributing key insights to model compression and efficiency. The novel layer-wise divergence-driven adaptation adds a theoretical layer to existing fine-tuning approaches.

  177. eagle: early approximated gradient based learning rate estimator - Score: 16 (R=8, N=8) - Date: 2025-02-04 - Comment: The EAGLE optimizer introduces a novel optimization method featuring adaptive switching, relevant to training dynamics and efficiency.

  178. Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning - Score: 16 (R=8, N=8) - Date: 2025-02-04 - Comment: The paper analyzes LLM reasoning performance theoretically, using new error decomposition techniques, which aligns with the LLM behavior/interpretability criterion.

  179. Demystifying MPNNs: Message Passing as Merely Efficient Matrix Multiplication - Score: 16 (R=8, N=8) - Date: 2025-02-04 - Comment: Provides theoretical analysis relating Message Passing Neural Networks to efficient matrix multiplication, contributing to foundational understanding of graph neural networks.

  180. Redefining Machine Unlearning: A Conformal Prediction-Motivated Approach - Score: 16 (R=8, N=8) - Date: 2025-02-04 - Comment: The paper redefines machine unlearning metrics with a novel conformal prediction approach and proposes an improved unlearning framework, contributing significantly to model compression and interpretability.

  181. Recommendations from Sparse Comparison Data: Provably Fast Convergence for Nonconvex Matrix Factorization - Score: 15 (R=8, N=7) - Date: 2025-02-28 - Comment: The paper provides theoretical analysis for nonconvex matrix factorization in sparse data regimes, which is relevant to foundational research in representation learning and efficiency.

  182. Mixtraining: A Better Trade-Off Between Compute and Performance - Score: 15 (R=8, N=7) - Date: 2025-02-28 - Comment: The paper proposes a novel training framework combining SSL and SL, which aligns with foundational research in training dynamics and efficiency.

  183. END: Early Noise Dropping for Efficient and Effective Context Denoising - Score: 15 (R=8, N=7) - Date: 2025-02-27 - Comment: The paper introduces Early Noise Dropping (END), which provides insights into how LLMs process noisy contexts and improves efficiency. This aligns with foundational research on LLM behavior and interpretability.

  184. FCoT-VL: Advancing Text-oriented Large Vision-Language Models with Efficient Visual Token Compression - Score: 15 (R=8, N=7) - Date: 2025-02-27 - Comment: The paper proposes a token compression framework for vision-language models, which aligns with model compression and efficiency improvements.

  185. A General Framework to Enhance Fine-tuning-based LLM Unlearning - Score: 15 (R=8, N=7) - Date: 2025-02-26 - Comment: The paper proposes a framework for enhancing fine-tuning-based LLM unlearning, which aligns with foundational research in LLM training and optimization.

  186. Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation - Score: 15 (R=8, N=7) - Date: 2025-02-25 - Comment: The paper introduces a low-rank and sparse model merging technique for multi-lingual speech tasks, which aligns with sparsity and efficiency in model compression.

  187. Verification of Bit-Flip Attacks against Quantized Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-02-25 - Comment: The paper introduces a verification framework for bit-flip attacks on quantized neural networks, which aligns with model compression and theoretical insights into quantization.

  188. Verifying Quantized Graph Neural Networks is PSPACE-complete - Score: 15 (R=8, N=7) - Date: 2025-02-25 - Comment: The paper investigates the verification of quantized GNNs, which aligns with model compression and theoretical insights into quantization.

  189. R$^3$Mem: Bridging Memory Retention and Retrieval via Reversible Compression - Score: 15 (R=8, N=7) - Date: 2025-02-25 - Comment: The paper introduces a memory network for LLMs with reversible compression, aligning with 'Model Compression' and 'Large Language Models' criteria.

  190. FairKV: Balancing Per-Head KV Cache for Fast Multi-GPU Inference - Score: 15 (R=8, N=7) - Date: 2025-02-25 - Comment: The paper focuses on KV cache compression and introduces FairKV, which addresses load imbalance in multi-GPU inference. This aligns with the Model Compression criterion, particularly in the context of efficiency improvements.

  191. Investigating the Impact of Quantization Methods on the Safety and Reliability of Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-02-25 - Comment: The paper evaluates quantization methods for LLMs with a focus on safety and reliability, which aligns with the model compression and efficiency criteria.

  192. The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer - Score: 15 (R=8, N=7) - Date: 2025-02-24 - Comment: The paper analyzes reasoning efficiency in LLMs, providing insights into reasoning length and performance, which aligns with theoretical insights into LLM behavior.

  193. PIP-KAG: Mitigating Knowledge Conflicts in Knowledge-Augmented Generation via Parametric Pruning - Score: 15 (R=8, N=7) - Date: 2025-02-24 - Comment: The paper introduces a pruning-based approach to mitigate knowledge conflicts in knowledge-augmented generation, which aligns with model compression and efficiency improvements.

  194. FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling - Score: 15 (R=8, N=7) - Date: 2025-02-21 - Comment: The paper proposes a speculative sampling framework for LLMs, focusing on efficiency improvements, which aligns with the Model Compression criterion.

  195. Data-Efficient Pretraining with Group-Level Data Influence Modeling - Score: 15 (R=8, N=7) - Date: 2025-02-21 - Comment: The paper introduces a novel data-efficient pretraining method, which aligns with foundational research in representation learning and data utility modeling.

  196. PLPHP: Per-Layer Per-Head Vision Token Pruning for Efficient Large Vision-Language Models - Score: 15 (R=8, N=7) - Date: 2025-02-21 - Comment: The paper proposes a fine-grained token pruning method for large vision-language models, which aligns with model compression and efficiency improvements.

  197. HPS: Hard Preference Sampling for Human Preference Alignment - Score: 15 (R=8, N=7) - Date: 2025-02-21 - Comment: The HPS framework for human preference alignment in LLMs introduces a novel training loss and sampling strategy, relevant to LLM alignment and optimization.

  198. Revisiting Privacy, Utility, and Efficiency Trade-offs when Fine-Tuning Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-02-20 - Comment: The paper revisits trade-offs in privacy, utility, and efficiency during LLM fine-tuning, providing insights into efficient fine-tuning methods like LoRA. This aligns with model compression and efficiency.

  199. GPU Memory Usage Optimization for Backward Propagation in Deep Network Training - Score: 15 (R=8, N=7) - Date: 2025-02-19 - Comment: The paper focuses on memory optimization during backward propagation, which aligns with model compression and efficiency topics. The dynamic programming algorithm for checkpoint selection is a novel contribution.

  200. Efficient and Effective Prompt Tuning via Prompt Decomposition and Compressed Outer Product - Score: 15 (R=8, N=7) - Date: 2025-02-19 - Comment: The paper proposes a novel prompt tuning method using prompt decomposition and compressed outer product, which aligns with model compression and efficiency topics. It introduces a new approach to reduce memory usage and computational costs.

  201. Minimal Ranks, Maximum Confidence: Parameter-efficient Uncertainty Quantification for LoRA - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: The paper proposes a Bayesian variant of LoRA for uncertainty quantification, which aligns with foundational research in model compression and parameter-efficient methods.

  202. Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: The TATA framework for teaching LLMs adaptive reasoning strategies is relevant to foundational research on LLM behavior and training improvements, particularly its aptitude-aware data selection component.

  203. Continual Quantization-Aware Pre-Training: When to transition from 16-bit to 1.58-bit pre-training for BitNet language models? - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: This paper focuses on continual quantization-aware pretraining with foundational implications for model compression, aligning well with efficiency improvements in large-scale models.

  204. Towards Reasoning Ability of Small Language Models - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: The paper systematically studies reasoning abilities in small language models, which is relevant to foundational research in LLM behavior and interpretability.

  205. Uncertainty-Aware Search and Value Models: Mitigating Search Scaling Flaws in LLMs - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: The paper addresses uncertainty-aware search in LLMs, which is relevant to foundational research in LLM behavior and inference optimization.

  206. Diversified Sampling Improves Scaling LLM inference - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: The paper introduces a novel sampling technique to improve LLM inference by enhancing diversity, which aligns with foundational research in LLM efficiency and inference optimization.

  207. GRIFFIN: Effective Token Alignment for Faster Speculative Decoding - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: The paper proposes a novel token alignment strategy for speculative decoding in LLMs, which is relevant to model efficiency and compression. The improvements in decoding speed and alignment are notable.

  208. Towards Watermarking of Open-Source LLMs - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: Explores watermarking for open-source LLMs with considerations of durability and robustness, contributing to foundational security in LLM frameworks.

  209. Enhancing Multilingual LLM Pretraining with Model-Based Data Selection - Score: 15 (R=8, N=7) - Date: 2025-02-17 - Comment: The paper discusses model-based data selection for multilingual LLM pretraining, which aligns with foundational research in LLM efficiency and dataset curation.

  210. Conditional Latent Coding with Learnable Synthesized Reference for Deep Image Compression - Score: 15 (R=8, N=7) - Date: 2025-02-17 - Comment: The paper introduces a conditional latent coding method for image compression, which aligns with model compression criteria through its focus on efficient coding and dictionary-based synthesis.

  211. Cost-Saving LLM Cascades with Early Abstention - Score: 15 (R=8, N=7) - Date: 2025-02-14 - Comment: The paper explores cost-saving strategies for LLM cascades with early abstention, which provides insights into efficiency and decision-making in LLMs, aligning with model compression and efficiency.

  212. Loss Landscape Analysis for Reliable Quantized ML Models for Scientific Sensing - Score: 15 (R=8, N=7) - Date: 2025-02-13 - Comment: The paper explores loss landscape analysis in the context of quantized ML models, which aligns with model compression and robustness. The focus on quantization and robustness trade-offs is relevant to foundational research in efficiency.

  213. Column-wise Quantization of Weights and Partial Sums for Accurate and Efficient Compute-In-Memory Accelerators - Score: 15 (R=8, N=7) - Date: 2025-02-13 - Comment: The paper introduces a column-wise quantization method for compute-in-memory accelerators, which aligns with the model compression criterion, particularly in quantization and efficiency improvements.

  214. No Data, No Optimization: A Lightweight Method To Disrupt Neural Networks With Sign-Flips - Score: 15 (R=8, N=7) - Date: 2025-02-13 - Comment: The paper introduces a lightweight method to disrupt neural networks via sign-flips, which aligns with the model compression criterion, particularly in sparsity and robustness.

  215. Gradient Based Method for the Fusion of Lattice Quantizers - Score: 15 (R=8, N=7) - Date: 2025-02-13 - Comment: The paper proposes novel algorithms for lattice quantization, which aligns with model compression and efficiency topics. The focus on gradient-based methods and high-dimensional settings adds theoretical depth.

  216. Quantification of model error for inverse problems in the Weak Neural Variational Inference framework - Score: 15 (R=8, N=7) - Date: 2025-02-12 - Comment: The paper extends the Weak Neural Variational Inference framework to quantify model errors in PDE-based inverse problems, which aligns with foundational research in AI for Science.

  217. XAMBA: Enabling Efficient State Space Models on Resource-Constrained Neural Processing Units - Score: 15 (R=8, N=7) - Date: 2025-02-12 - Comment: The paper introduces XAMBA, a framework for optimizing state-space models on NPUs, which aligns with foundational research in model efficiency and compression.

  218. When, Where and Why to Average Weights? - Score: 15 (R=8, N=7) - Date: 2025-02-11 - Comment: The paper evaluates weight averaging techniques, which is relevant to training dynamics and efficiency improvements in neural networks.

  219. Spectral-factorized Positive-definite Curvature Learning for NN Training - Score: 15 (R=8, N=7) - Date: 2025-02-11 - Comment: The paper introduces a novel Riemannian optimization approach for curvature learning in neural network training, which aligns with foundational research in training dynamics and efficiency improvements.

  220. Compressing Model with Few Class-Imbalance Samples: An Out-of-Distribution Expedition - Score: 15 (R=8, N=7) - Date: 2025-02-11 - Comment: The paper proposes a framework for few-sample model compression with class imbalance, which aligns with model compression and sparsity topics.

  221. Speeding up Speculative Decoding via Approximate Verification - Score: 15 (R=8, N=7) - Date: 2025-02-10 - Comment: SPRINTER speeds up speculative decoding in LLMs through approximate verification, which aligns with model compression and efficiency.

  222. Efficient Few-Shot Continual Learning in Vision-Language Models - Score: 15 (R=8, N=7) - Date: 2025-02-07 - Comment: Proposes a low-rank adaptation method for continual learning in vision-language models, offering resource-efficient structural updates. Relevant to model compression (criterion 3) and structured sparsity innovations.

  223. TQ-DiT: Efficient Time-Aware Quantization for Diffusion Transformers - Score: 15 (R=8, N=7) - Date: 2025-02-07 - Comment: Proposes quantization strategies (MRQ and TGQ) tailored for Diffusion Transformers, fitting well under model compression (criterion 3) with a focus on low-bit quantization innovations.

  224. Adaptive Semantic Prompt Caching with VectorQ - Score: 15 (R=8, N=7) - Date: 2025-02-07 - Comment: The paper introduces VectorQ, a framework for adaptive semantic prompt caching which focuses on improving inference efficiency for LLMs. This relates to the model compression criterion, particularly in dealing with KV cache and efficiency concerns.

  225. Efficient Image Restoration via Latent Consistency Flow Matching - Score: 15 (R=8, N=7) - Date: 2025-02-07 - Comment: Presents an efficient latent-space image restoration model emphasizing computational reduction, relevant to model compression and efficiency research.

  226. Activation-Informed Merging of Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-02-05 - Comment: AIM introduces an activation-informed merging strategy for LLMs and incorporates principles from model compression, aligning well with efficiency and foundational innovation criteria.

  227. VLA-Cache: Towards Efficient Vision-Language-Action Model via Adaptive Token Caching in Robotic Manipulation - Score: 15 (R=8, N=7) - Date: 2025-02-05 - Comment: Proposes KV-cache optimizations for vision-language-action models in robotic manipulation, relevant to model compression and efficiency.

  228. Compact Rule-Based Classifier Learning via Gradient Descent - Score: 15 (R=8, N=7) - Date: 2025-02-04 - Comment: The proposed compact rule-based classifier is relevant for model efficiency and aligns with topics like sparsity and gradient-based optimization innovations.

  229. Trading Inference-Time Compute for Adversarial Robustness - Score: 15 (R=8, N=7) - Date: 2025-02-04 - Comment: The paper explores compute scaling for adversarial robustness, contributing to inference-time optimization and resilience, with implications for LLM efficiency.

  230. Beyond Worst-Case Dimensionality Reduction for Sparse Vectors - Score: 15 (R=7, N=8) - Date: 2025-02-28 - Comment: The paper provides theoretical insights into dimensionality reduction for sparse vectors, which is relevant to representation learning and sparsity but focuses on a specific mathematical framework.

  231. Connections between Schedule-Free Optimizers, AdEMAMix, and Accelerated SGD Variants - Score: 15 (R=7, N=8) - Date: 2025-02-05 - Comment: The paper establishes connections between advanced optimizers and SGD variants, which contributes novel theoretical insights into training dynamics.

  232. Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation - Score: 15 (R=7, N=8) - Date: 2025-02-05 - Comment: Provides theoretical benefits of reasoning paradigms in LLMs, examining metastable dynamics of CoT reasoning, relevant to understanding LLM inference processes.

  233. Process Reinforcement through Implicit Rewards - Score: 15 (R=7, N=8) - Date: 2025-02-04 - Comment: Focuses on dense process reinforcement in LLMs during reasoning tasks and introduces implicit reward techniques. Its contribution to LLM optimization may be of interest.

  234. Refining Adaptive Zeroth-Order Optimization at Ease - Score: 15 (R=7, N=8) - Date: 2025-02-04 - Comment: This paper introduces a novel zeroth-order optimization method with theoretical variance-aware convergence analysis, which provides insights into efficiency optimization relevant to compression and training dynamics.

  235. A single-loop SPIDER-type stochastic subgradient method for expectation-constrained nonconvex nonsmooth optimization - Score: 15 (R=7, N=8) - Date: 2025-02-04 - Comment: The proposed stochastic subgradient method for constrained optimization involves novel penalty models, contributing to optimization efficiency and theoretical advances.

  236. Binary Neural Networks for Large Language Model: A Survey - Score: 14 (R=8, N=6) - Date: 2025-02-27 - Comment: The paper surveys binary quantization techniques for LLMs, aligning with the 'Model Compression' criterion. It provides a comprehensive review of binary quantization methods, which is relevant to efficiency improvements.

  237. Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones? - Score: 14 (R=7, N=7) - Date: 2025-02-28 - Comment: The paper proposes a novel distillation pipeline for LLMs, focusing on reward learning and reinforcement learning, which partially aligns with foundational research in LLM behavior.

  238. Set and functional prediction: randomness, exchangeability, and conformal - Score: 14 (R=7, N=7) - Date: 2025-02-27 - Comment: The paper explores conformal prediction and its efficiency, which could have implications for foundational research in prediction and uncertainty quantification.

  239. Model-Free Adversarial Purification via Coarse-To-Fine Tensor Network Representation - Score: 14 (R=7, N=7) - Date: 2025-02-26 - Comment: The paper proposes a novel tensor network decomposition for adversarial purification, which aligns with foundational research in model robustness and efficiency.

  240. Scaling LLM Pre-training with Vocabulary Curriculum - Score: 14 (R=7, N=7) - Date: 2025-02-26 - Comment: The paper proposes a vocabulary curriculum for LLM pretraining, which offers insights into training efficiency and tokenization but does not introduce a major architectural or theoretical breakthrough.

  241. An Improved Privacy and Utility Analysis of Differentially Private SGD with Bounded Domain and Smooth Losses - Score: 14 (R=7, N=7) - Date: 2025-02-26 - Comment: The paper provides an improved privacy and utility analysis for DPSGD, which is relevant to foundational research in model training dynamics and efficiency.

  242. Learning Backbones: Sparsifying Graphs through Zero Forcing for Effective Graph-Based Learning - Score: 14 (R=7, N=7) - Date: 2025-02-26 - Comment: The paper introduces a novel graph sparsification framework using zero-forcing, which is relevant to model compression and sparsity.

  243. General Uncertainty Estimation with Delta Variances - Score: 14 (R=7, N=7) - Date: 2025-02-21 - Comment: The Delta Variances method for epistemic uncertainty quantification provides a unified perspective on related methods, relevant to model interpretability and efficiency.

  244. Temporal Misalignment and Probabilistic Neurons - Score: 14 (R=7, N=7) - Date: 2025-02-21 - Comment: The paper discusses spiking neural networks and ANN-SNN conversion, focusing on energy efficiency and temporal dynamics, which aligns with Model Compression and emerging trends.

  245. Dynamic Activation with Knowledge Distillation for Energy-Efficient Spiking NN Ensembles - Score: 14 (R=7, N=7) - Date: 2025-02-21 - Comment: The paper introduces a novel energy-efficient spiking neural network ensemble, which partially aligns with 'Model Compression' through its focus on energy efficiency and dynamic activation.

  246. PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection - Score: 14 (R=7, N=7) - Date: 2025-02-18 - Comment: PRISM introduces a training-free method for data selection, leveraging intrinsic properties of MLLMs, making it somewhat relevant in the context of model efficiency and data pruning.

  247. CoT-Valve: Length-Compressible Chain-of-Thought Tuning - Score: 14 (R=7, N=7) - Date: 2025-02-14 - Comment: The paper introduces CoT-Valve, a method for dynamically controlling reasoning chain lengths in LLMs, which aligns with efficiency and interpretability in LLMs. However, it is more of an optimization strategy than a foundational breakthrough.

  248. EdgeEar: Efficient and Accurate Ear Recognition for Edge Devices - Score: 14 (R=7, N=7) - Date: 2025-02-13 - Comment: The paper introduces a lightweight model for ear recognition using low-rank approximations, which aligns with the criterion of model compression and efficiency.

  249. Learning Accurate, Efficient, and Interpretable MLPs on Multiplex Graphs via Node-wise Multi-View Ensemble Distillation - Score: 14 (R=7, N=7) - Date: 2025-02-11 - Comment: The paper introduces a knowledge distillation framework for multiplex graphs, which aligns with efficiency improvements and representation learning in graph-based models.

  250. Training Set Reconstruction from Differentially Private Forests: How Effective is DP? - Score: 14 (R=7, N=7) - Date: 2025-02-11 - Comment: The paper introduces a reconstruction attack on differentially private random forests, which aligns with the 'Model Compression' criterion by exploring the trade-offs between privacy guarantees and model utility.

  251. Flopping for FLOPs: Leveraging equivariance for computational efficiency - Score: 14 (R=7, N=7) - Date: 2025-02-10 - Comment: Explores network equivariance for computational efficiency, which aligns partially with model architecture innovations and computational efficiency, albeit with a narrow focus.

  252. Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation - Score: 14 (R=7, N=7) - Date: 2025-02-06 - Comment: The proposed SpecPrefill framework reduces time-to-first-token (TTFT) in LLM inference through lightweight, training-free token importance estimation, which could be influential for model efficiency but is task-focused without foundational breakthroughs.

  253. Transolver++: An Accurate Neural Solver for PDEs on Million-Scale Geometries - Score: 14 (R=7, N=7) - Date: 2025-02-05 - Comment: Transolver++ enhances PDE solving with scalable architectures, touching upon efficient model architecture and parallelism.

  254. Efficient rule induction by ignoring pointless rules - Score: 14 (R=7, N=7) - Date: 2025-02-04 - Comment: The paper introduces a new approach for inductive logic programming rule induction, which aligns well with model efficiency innovations but is primarily focused on specific applications.

  255. SCU: An Efficient Machine Unlearning Scheme for Deep Learning Enabled Semantic Communications - Score: 13 (R=7, N=6) - Date: 2025-02-28 - Comment: The paper proposes a machine unlearning scheme for semantic communications, focusing on mutual information minimization and contrastive compensation, which aligns partially with 'Model Compression' for efficiency-related innovations.

  256. SparseTransX: Efficient Training of Translation-Based Knowledge Graph Embeddings Using Sparse Matrix Operations - Score: 13 (R=7, N=6) - Date: 2025-02-25 - Comment: The paper focuses on efficient training of knowledge graph embeddings using sparse matrix operations, which is relevant to model compression and sparsity.

  257. When Compression Meets Model Compression: Memory-Efficient Double Compression for Large Language Models - Score: 13 (R=7, N=6) - Date: 2025-02-24 - Comment: The paper introduces a double compression framework for LLMs, combining quantization and pruning. While it addresses memory efficiency, the contributions appear incremental and lack significant theoretical breakthroughs.

  258. EvoP: Robust LLM Inference via Evolutionary Pruning - Score: 13 (R=7, N=6) - Date: 2025-02-24 - Comment: The paper introduces EvoP, an evolutionary pruning framework for LLMs. While it addresses pruning, the contributions are more focused on practical efficiency rather than foundational theoretical insights.

  259. HSI: A Holistic Style Injector for Arbitrary Style Transfer - Score: 13 (R=7, N=6) - Date: 2025-02-10 - Comment: This paper proposes a style transfer module focused on computational efficiency, which is relevant to compression and sparsity research because it optimizes attention-based mechanisms.

  260. Robust Federated Finetuning of LLMs via Alternating Optimization of LoRA - Score: 13 (R=7, N=6) - Date: 2025-02-05 - Comment: Focuses on LoRA, a parameter-efficient fine-tuning method, which aligns with model compression techniques like low-rank adaptation. However, it is applied in a federated learning setup, slightly diluting relevance.

High Performance Computing (40)

  1. Forgotten Polygons: Multimodal Large Language Models are Shape-Blind - Score: 20.0 (R=0, N=0) - Date: 2025-02-25 - Comment: Author match

  2. Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path? - Score: 20.0 (R=0, N=0) - Date: 2025-02-24 - Comment: Author match

  3. Intuitive physics understanding emerges from self-supervised pretraining on natural videos - Score: 20.0 (R=0, N=0) - Date: 2025-02-18 - Comment: Author match

  4. Shaping Inductive Bias in Diffusion Models through Frequency-Based Noise Control - Score: 20.0 (R=0, N=0) - Date: 2025-02-17 - Comment: Author match

  5. Monte Carlo Tree Diffusion for System 2 Planning - Score: 20.0 (R=0, N=0) - Date: 2025-02-13 - Comment: Author match

  6. Toward Neurosymbolic Program Comprehension - Score: 18 (R=9, N=9) - Date: 2025-02-05 - Comment: The paper advocates for neurosymbolic research blending DL and symbolic methods, introducing an emerging trend challenging the parameter-heavy model paradigm.

  7. Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts - Score: 17 (R=9, N=8) - Date: 2025-02-28 - Comment: The paper introduces Comet, a fine-grained computation-communication overlapping system for MoE, which aligns with architectural efficiency improvements in MoE systems.

  8. Global law of conjugate kernel random matrices with heavy-tailed weights - Score: 17 (R=9, N=8) - Date: 2025-02-26 - Comment: The paper studies the spectral behavior of kernel random matrices with heavy-tailed weights, which provides theoretical insights into neural network training dynamics and aligns with foundational research.

  9. AMPO: Active Multi-Preference Optimization - Score: 17 (R=9, N=8) - Date: 2025-02-26 - Comment: The paper proposes a novel multi-preference optimization framework for LLM alignment, which aligns with foundational research in LLM training and optimization techniques.

  10. Distributional Scaling Laws for Emergent Capabilities - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper explores emergent capabilities in LLMs and provides theoretical insights into scaling laws and random seed effects, aligning with the 'Large Language Models' criterion.

  11. Fundamental Limitations in Defending LLM Finetuning APIs - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: The paper discusses fundamental limitations in defending LLM fine-tuning APIs, providing theoretical insights into LLM security and robustness, which aligns with foundational research in LLM behavior.

  12. The Self-Improvement Paradox: Can Language Models Bootstrap Reasoning Capabilities without External Scaffolding? - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: The paper explores self-improvement in LLMs, focusing on generating synthetic data autonomously, which aligns with foundational research in LLM behavior and training dynamics.

  13. Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper proposes a learning-based system for parallel decoding in LLMs, which aligns with foundational research in efficiency and decoding innovations.

  14. Large Language-Geometry Model: When LLM meets Equivariance - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper proposes a novel framework integrating E(3)-equivariance with LLM capabilities for handling 3D physical systems. It introduces architectural innovations aligning with foundational AI for Science.

  15. Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper explores emergent value systems in LLMs and proposes a new research agenda called utility engineering. This aligns with the 'Large Language Models (LLMs)' criterion, focusing on theoretical insights into LLM behavior and interpretability.

  16. On Mechanistic Circuits for Extractive Question-Answering - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper explores mechanistic circuits in extractive QA tasks, providing insights into the interplay between parametric memory and retrieved context. It aligns with foundational research in understanding LLM behavior and interpretability.

  17. Harnessing Language's Fractal Geometry with Recursive Inference Scaling - Score: 17 (R=9, N=8) - Date: 2025-02-12 - Comment: The paper introduces Recursive Inference Scaling (RINS), which provides theoretical insights into scaling laws and inference methods for LLMs, aligning with foundational research in LLM behavior.

  18. Position: Episodic Memory is the Missing Piece for Long-Term LLM Agents - Score: 17 (R=9, N=8) - Date: 2025-02-12 - Comment: The paper discusses episodic memory for LLMs, which aligns with emerging trends and foundational research in LLM behavior and long-term memory integration.

  19. Systematic Outliers in Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper investigates systematic outliers in LLMs, providing theoretical insights into their formation and impact, which aligns with foundational research in LLM behavior and interpretability.

  20. Distinguishing Cause from Effect with Causal Velocity Models - Score: 17 (R=9, N=8) - Date: 2025-02-10 - Comment: The causal velocity model for bivariate SCMs offers a novel parametrization and theoretical insights, aligning with 'Emerging Trends' for causal modeling in foundational AI research.

  21. ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: Presents a framework for evaluating reasoning capabilities of LLMs under complexity scaling, providing theoretical insights into their limits, aligning with foundational LLM research.

  22. Asymptotics of Non-Convex Generalized Linear Models in High-Dimensions: A proof of the replica formula - Score: 17 (R=8, N=9) - Date: 2025-02-28 - Comment: The paper rigorously validates statistical physics predictions for non-convex GLMs, aligning with the emerging trends criterion.

  23. Ansatz-free Hamiltonian learning with Heisenberg-limited scaling - Score: 17 (R=8, N=9) - Date: 2025-02-18 - Comment: The study introduces Heisenberg-limited precision in Hamiltonian learning, which falls under 'Emerging Trends' for foundational quantum system insights.

  24. Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon - Score: 16 (R=9, N=7) - Date: 2025-02-13 - Comment: The paper critiques LLM evaluation methods and introduces a meta-evaluation framework to detect overfitting. This aligns with the criterion of theoretical insights into LLM behavior.

  25. Aligning Compound AI Systems via System-level DPO - Score: 16 (R=8, N=8) - Date: 2025-02-26 - Comment: The paper introduces a system-level alignment method for compound AI systems, which is relevant to emerging trends in foundational AI research.

  26. A Non-Asymptotic Theory of Seminorm Lyapunov Stability: From Deterministic to Stochastic Iterative Algorithms - Score: 16 (R=8, N=8) - Date: 2025-02-21 - Comment: The paper provides a theoretical framework for seminorm-contractive operators and iterative algorithms, which aligns with the Emerging Trends criterion due to its foundational theoretical contributions.

  27. Numerical Schemes for Signature Kernels - Score: 16 (R=8, N=8) - Date: 2025-02-13 - Comment: The paper introduces advanced numerical schemes for signature kernels, which are relevant to representation learning and efficiency. The theoretical convergence and GPU-parallelization aspects enhance its foundational contribution.

  28. Negative Dependence as a toolbox for machine learning: review and new developments - Score: 16 (R=8, N=8) - Date: 2025-02-12 - Comment: The paper reviews negative dependence as a machine learning methodology and explores its applications, including neural networks, which aligns with emerging trends in foundational research.

  29. Generating Symbolic World Models via Test-time Scaling of Large Language Models - Score: 16 (R=8, N=8) - Date: 2025-02-10 - Comment: The work addresses the use of PDDL for state transition modeling in LLMs, presenting a novel system for symbolic world modeling that links planning with model-generation tasks, aligning partially with large model behavior insights.

  30. WaferLLM: A Wafer-Scale LLM Inference System - Score: 16 (R=8, N=8) - Date: 2025-02-10 - Comment: The work proposes novel architectures and strategies for wafer-scale LLM inference, which relates closely to model efficiency and architecture-level innovations.

  31. Temperature-Annealed Boltzmann Generators - Score: 16 (R=8, N=8) - Date: 2025-02-03 - Comment: Proposes temperature-annealed Boltzmann generators for efficient sampling in molecular systems, demonstrating innovations in energy efficiency, which aligns with AI for science criteria focused on foundational molecular modeling.

  32. SkipPipe: Partial and Reordered Pipelining Framework for Training LLMs in Heterogeneous Networks - Score: 15 (R=8, N=7) - Date: 2025-02-28 - Comment: The paper proposes a novel pipeline training framework for LLMs, which aligns with foundational research in model efficiency and training dynamics.

  33. Self-rewarding correction for mathematical reasoning - Score: 15 (R=8, N=7) - Date: 2025-02-28 - Comment: The paper proposes a self-rewarding correction mechanism for mathematical reasoning in LLMs, which aligns with foundational insights into LLM behavior and self-correction mechanisms.

  34. From RAG to Memory: Non-Parametric Continual Learning for Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-02-21 - Comment: The paper introduces a novel framework for non-parametric continual learning in LLMs, which aligns with the 'Large Language Models' criterion, particularly in advancing memory and retrieval mechanisms.

  35. CER: Confidence Enhanced Reasoning in LLMs - Score: 15 (R=8, N=7) - Date: 2025-02-21 - Comment: The paper proposes a confidence-aware reasoning framework for LLMs, which aligns with foundational research in LLM behavior and reasoning.

  36. Evaluating the Paperclip Maximizer: Are RL-Based Language Models More Likely to Pursue Instrumental Goals? - Score: 15 (R=8, N=7) - Date: 2025-02-19 - Comment: The paper investigates instrumental convergence in RL-trained LLMs, which provides theoretical insights into LLM behavior and alignment challenges, aligning with foundational research in LLMs.

  37. Smoothing Out Hallucinations: Mitigating LLM Hallucination with Smoothed Knowledge Distillation - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: The paper addresses mitigating hallucination in LLMs via knowledge distillation, which connects to understanding and improving LLM behavior.

  38. Process Reward Models for LLM Agents: Practical Framework and Directions - Score: 15 (R=8, N=7) - Date: 2025-02-17 - Comment: The paper introduces a framework for training LLM agents with process reward models, which aligns with foundational research in LLM training and optimization.

  39. Blending Optimal Control and Biologically Plausible Learning for Noise-Robust Physical Neural Networks - Score: 15 (R=7, N=8) - Date: 2025-02-27 - Comment: This paper explores training methods for physical neural networks (PNNs) by blending optimal control and biologically plausible learning. It aligns with 'Emerging Trends' by proposing a novel training paradigm for neuromorphic systems.

  40. Jackpot! Alignment as a Maximal Lottery - Score: 14 (R=7, N=7) - Date: 2025-02-04 - Comment: Applies maximal lottery-based probabilistic social choice to LLM alignment, which is novel in exploring the intersection of RLHF and social choice theory.

Representation Learning (297)

  1. Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models - Score: 20.0 (R=0, N=0) - Date: 2025-02-21 - Comment: Author match

  2. In-Context Parametric Inference: Point or Distribution Estimators? - Score: 20.0 (R=0, N=0) - Date: 2025-02-18 - Comment: Author match

  3. Layer by Layer: Uncovering Hidden Representations in Language Models - Score: 20.0 (R=0, N=0) - Date: 2025-02-05 - Comment: Author match

  4. The Computational Advantage of Depth: Learning High-Dimensional Hierarchical Functions with Gradient Descent - Score: 19 (R=10, N=9) - Date: 2025-02-20 - Comment: The paper provides theoretical insights into the computational advantages of depth in neural networks, aligning closely with representation learning and training dynamics.

  5. Consistency of augmentation graph and network approximability in contrastive learning - Score: 19 (R=10, N=9) - Date: 2025-02-07 - Comment: The work addresses contrastive learning by providing new theoretical insights into augmentation graph consistency and neural approximability, making it a significant foundational contribution to representation learning.

  6. Constrained belief updates explain geometric structures in transformer representations - Score: 19 (R=10, N=9) - Date: 2025-02-05 - Comment: Touches on representation learning by analyzing the geometric structures and constrained Bayesian belief updates in transformer representations, providing foundational insights into encoder-decoder mechanisms.

  7. Unveiling the Mechanisms of Explicit CoT Training: How Chain-of-Thought Enhances Reasoning Generalization - Score: 18 (R=10, N=8) - Date: 2025-02-10 - Comment: The paper investigates the mechanism of explicit Chain-of-Thought (CoT) training, which aligns with understanding LLM training dynamics and behaviors, directly addressing foundational insights for reasoning enhancement.

  8. Learning with Exact Invariances in Polynomial Time - Score: 18 (R=9, N=9) - Date: 2025-02-28 - Comment: The paper provides a polynomial-time algorithm for learning with exact invariances, which is a cutting-edge theoretical contribution relevant to representation learning.

  9. Do we really need the Rademacher complexities? - Score: 18 (R=9, N=9) - Date: 2025-02-24 - Comment: The paper challenges the reliance on Rademacher complexities for learning problems and introduces a novel universality result, which aligns with foundational research in representation learning.

  10. Approximating Latent Manifolds in Neural Networks via Vanishing Ideals - Score: 18 (R=9, N=9) - Date: 2025-02-24 - Comment: The paper connects manifold learning with computational algebra using vanishing ideals, proposing a novel architecture for latent manifold approximation. It aligns well with representation learning and architectural innovation.

  11. Breaking the bonds of generative artificial intelligence by minimizing the maximum entropy - Score: 18 (R=9, N=9) - Date: 2025-02-20 - Comment: This paper introduces a new paradigm for generative AI based on the minimal maximum entropy principle, which aligns with foundational research in representation learning and generative paradigms.

  12. Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity - Score: 18 (R=9, N=9) - Date: 2025-02-19 - Comment: The paper explores the limits of embedding space capacity, which is relevant to representation learning and compression. The focus on theoretical limits and optimization is highly novel.

  13. System Message Generation for User Preferences using Open-Source Models - Score: 18 (R=9, N=9) - Date: 2025-02-18 - Comment: The paper introduces Inverse Flow for generative models, which aligns with foundational research in representation learning and generative paradigms. The proposed methods (IFM and ICM) are novel and impactful.

  14. A Power Transform - Score: 18 (R=9, N=9) - Date: 2025-02-18 - Comment: The novel power transform framework connects across loss functions, activations, and kernels, offering a significant theoretical contribution to foundational methods like representation learning.

  15. Representation and Interpretation in Artificial and Natural Computing - Score: 18 (R=9, N=9) - Date: 2025-02-17 - Comment: The paper discusses representation and modes of computing, touching on theoretical aspects of computing beyond Turing Machines. It aligns with emerging trends and foundational research.

  16. A novel approach to data generation in generative model - Score: 18 (R=9, N=9) - Date: 2025-02-17 - Comment: The paper introduces the Convergent Fusion Paradigm (CFP) theory, which redefines data generation in generative models and offers a novel geometric framework, aligning with foundational research in representation learning and generative modeling.

  17. Solvable Dynamics of Self-Supervised Word Embeddings and the Emergence of Analogical Reasoning - Score: 18 (R=9, N=9) - Date: 2025-02-17 - Comment: The paper provides analytical solutions for self-supervised word embedding dynamics, offering foundational insights into representation learning and training dynamics.

  18. When do neural networks learn world models? - Score: 18 (R=9, N=9) - Date: 2025-02-14 - Comment: The paper provides theoretical insights into when neural networks learn world models, which aligns with representation learning and foundational research into training dynamics.

  19. From Kernels to Features: A Multi-Scale Adaptive Theory of Feature Learning - Score: 18 (R=9, N=9) - Date: 2025-02-06 - Comment: Presents a theoretical framework that bridges kernel-based and feature-adaptive learning, contributing to representation learning through a multi-scale theoretical approach. Highly relevant to model understanding and feature learning.

  20. Optimal Spectral Transitions in High-Dimensional Multi-Index Models - Score: 18 (R=9, N=9) - Date: 2025-02-05 - Comment: Introduces spectral methods for a theoretical problem rooted in high-dimensional reconstruction, closely aligning with Representation Learning and fundamental computational limits.

  21. How Memory in Optimization Algorithms Implicitly Modifies the Loss - Score: 18 (R=9, N=9) - Date: 2025-02-05 - Comment: This work analyzes how memory in optimization algorithms implicitly modifies the loss landscape, providing new insights into optimization dynamics, which aligns strongly with representation learning and training dynamics.

  22. Neural Collapse Beyond the Unconstrained Features Model: Landscape, Dynamics, and Generalization in the Mean-Field Regime - Score: 18 (R=9, N=9) - Date: 2025-02-04 - Comment: Explores Neural Collapse, providing theoretical insights into training dynamics and representation learning. This is highly aligned with foundational research.

  23. A theoretical framework for overfitting in energy-based modeling - Score: 18 (R=9, N=9) - Date: 2025-02-03 - Comment: Develops a theoretical framework for overfitting in energy-based generative models, exploring spectral learning dynamics. Matches foundational research in representation learning.

  24. An Invitation to Neuroalgebraic Geometry - Score: 18 (R=9, N=9) - Date: 2025-02-03 - Comment: The paper introduces a new theoretical framework connecting algebraic geometry and machine learning, specifically targeting neural networks. This aligns with 'Representation Learning' as it provides unique insights on the expressivity and training dynamics of neural networks.

  25. Your contrastive learning problem is secretly a distribution alignment problem - Score: 17 (R=9, N=8) - Date: 2025-02-28 - Comment: The paper reframes contrastive learning as a distribution alignment problem using optimal transport, providing theoretical insights into representation learning. This aligns closely with foundational research in representation learning.

  26. Self-Training Elicits Concise Reasoning in Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-28 - Comment: The paper proposes methods to elicit concise reasoning in LLMs, which aligns with foundational research in LLM behavior and training dynamics.

  27. Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-28 - Comment: This paper introduces Representation Engineering (RepE) as a novel paradigm for controlling LLM behavior by manipulating internal representations. It aligns closely with the 'Representation Learning' and 'Large Language Models' criteria, offering theoretical insights and a comprehensive framework for a new direction in LLM research.

  28. Do Large Language Models Know How Much They Know? - Score: 17 (R=9, N=8) - Date: 2025-02-28 - Comment: The paper investigates whether LLMs can assess the scope of their own knowledge, which aligns with foundational research in LLM behavior and interpretability.

  29. Norm Growth and Stability Challenges in Localized Sequential Knowledge Editing - Score: 17 (R=9, N=8) - Date: 2025-02-27 - Comment: The paper investigates challenges in localized sequential knowledge editing for LLMs, focusing on stability and norm growth. This aligns with foundational research in LLM behavior and interpretability.

  30. Consistent Amortized Clustering via Generative Flow Networks - Score: 17 (R=9, N=8) - Date: 2025-02-27 - Comment: The paper proposes a novel framework for amortized clustering using Generative Flow Networks, which contributes to representation learning and foundational clustering methods.

  31. Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases - Score: 17 (R=9, N=8) - Date: 2025-02-27 - Comment: The paper investigates pre-pretraining on formal languages to improve linguistic biases in LLMs, which provides insights into foundational aspects of LLM behavior and interpretability.

  32. FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge - Score: 17 (R=9, N=8) - Date: 2025-02-27 - Comment: The paper introduces a novel unlearning method (KLUE) for faithful forgetting in LLMs, which aligns with foundational research on LLM behavior and interpretability.

  33. Unveiling and Causalizing CoT: A Causal Perspective - Score: 17 (R=9, N=8) - Date: 2025-02-26 - Comment: The paper explores causal perspectives on Chain-of-Thought reasoning in LLMs, providing theoretical insights into reasoning mechanisms. This aligns with foundational research in LLM behavior and interpretability.

  34. How Do Large Language Monkeys Get Their Power (Laws)? - Score: 17 (R=9, N=8) - Date: 2025-02-26 - Comment: The paper provides theoretical insights into power law scaling in large language models, which aligns with foundational research in LLM behavior and interpretability.

  35. Function-Space Learning Rates - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper introduces a novel concept of function-space learning rates and proposes FLeRM, a method for hyperparameter transfer across model scales. This aligns with the Representation Learning and Model Architecture criteria, as it provides insights into training dynamics and scaling behavior.

  36. Towards Auto-Regressive Next-Token Prediction: In-Context Learning Emerges from Generalization - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper provides theoretical insights into in-context learning and generalization in LLMs, aligning with the 'Large Language Models' criterion.

  37. Forecasting Rare Language Model Behaviors - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper introduces a method to forecast rare LLM behaviors, which provides theoretical insights into LLM behavior and interpretability.

  38. The Role of Sparsity for Length Generalization in Transformers - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper investigates the role of sparsity in length generalization for transformers, which aligns with 'Representation Learning' and 'Model Architecture' criteria due to its theoretical insights into transformer behavior.

  39. Sequence-level Large Language Model Training with Contrastive Preference Optimization - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper introduces a contrastive preference optimization procedure for sequence-level LLM training, which aligns with foundational research in LLM training dynamics.

  40. UniDyG: A Unified and Effective Representation Learning Approach for Large Dynamic Graphs - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper proposes UniDyG, a unified representation learning approach for dynamic graphs, which aligns with representation learning and introduces a novel Fourier Graph Attention mechanism.

  41. Toward a Flexible Framework for Linear Representation Hypothesis Using Maximum Likelihood Estimation - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper introduces a flexible framework for linear representation hypothesis using maximum likelihood estimation, which aligns with representation learning and provides a principled approach to concept directions.

  42. A Gap Between the Gaussian RKHS and Neural Networks: An Infinite-Center Asymptotic Analysis - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper investigates the gap between Gaussian RKHS and neural networks, providing theoretical insights into function spaces. This aligns with 'Representation Learning' and foundational research.

  43. An explainable transformer circuit for compositional generalization - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper provides mechanistic insights into compositional generalization in transformers, which aligns with understanding and interpretability of model architectures.

  44. Generalization Guarantees for Representation Learning via Data-Dependent Gaussian Mixture Priors - Score: 17 (R=9, N=8) - Date: 2025-02-24 - Comment: The paper provides theoretical insights into generalization error bounds for representation learning using data-dependent Gaussian mixture priors. It aligns well with the representation learning criterion, offering foundational contributions.

  45. Fréchet Cumulative Covariance Net for Deep Nonlinear Sufficient Dimension Reduction with Random Objects - Score: 17 (R=9, N=8) - Date: 2025-02-24 - Comment: The paper introduces a novel statistical dependence measure (FCCov) and a nonlinear sufficient dimension reduction framework, which aligns with representation learning by focusing on encoding essential features of high-dimensional data. The theoretical contributions and convergence guarantees add to its relevance.

  46. Analyze the Neurons, not the Embeddings: Understanding When and Where LLM Representations Align with Humans - Score: 17 (R=9, N=8) - Date: 2025-02-24 - Comment: The paper analyzes neuron-level representations in LLMs and their alignment with human concepts, contributing to interpretability and representation learning. This aligns with foundational research in representation learning.

  47. LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers - Score: 17 (R=9, N=8) - Date: 2025-02-24 - Comment: The paper provides insights into how LLMs encode contextual information, particularly focusing on the role of punctuation and token-level analysis, which aligns with interpretability in LLMs.

  48. Understanding SGD with Exponential Moving Average: A Case Study in Linear Regression - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: The paper provides theoretical insights into the effectiveness of Exponential Moving Average (EMA) in SGD, which aligns with the training dynamics in neural networks under representation learning.

  49. Zero loss guarantees and explicit minimizers for generic overparametrized Deep Learning networks - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: The paper provides theoretical insights into overparameterized deep learning networks, focusing on zero loss guarantees and training dynamics, which aligns with the Representation Learning criterion.

  50. Towards a Learning Theory of Representation Alignment - Score: 17 (R=9, N=8) - Date: 2025-02-21 - Comment: The paper provides a learning-theoretic perspective on representation alignment, which aligns closely with the 'Representation Learning' criterion, particularly in understanding how representations are encoded and aligned.

  51. Concept Layers: Enhancing Interpretability and Intervenability via LLM Conceptualization - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: The paper introduces Concept Layers to enhance interpretability and intervenability in LLMs, which aligns with foundational research in model architecture and interpretability.

  52. Benefits of Early Stopping in Gradient Descent for Overparameterized Logistic Regression - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: The paper explores the implicit bias and regularization effects of early stopping in gradient descent for overparameterized logistic regression, which provides insights into training dynamics and representation learning.

  53. Unveiling the Magic of Code Reasoning through Hypothesis Decomposition and Amendment - Score: 17 (R=9, N=8) - Date: 2025-02-20 - Comment: The paper introduces a novel reasoning pipeline for LLMs, focusing on hypothesis decomposition and amendment, which aligns with foundational research in LLM reasoning and interpretability.

  54. Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The paper introduces a novel framework for self-organizing knowledge networks using graph reasoning and LLMs, which aligns with emerging trends and foundational research in knowledge representation.

  55. Stability-based Generalization Bounds for Variational Inference - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The paper develops stability-based generalization bounds for variational inference, which aligns with foundational research in representation learning and theoretical insights into training dynamics.

  56. Symmetric Rank-One Quasi-Newton Methods for Deep Learning Using Cubic Regularization - Score: 17 (R=9, N=8) - Date: 2025-02-19 - Comment: The paper explores a novel quasi-Newton method for deep learning optimization, which aligns with foundational research in training dynamics and representation learning. The use of cubic regularization and indefinite Hessian approximations is a notable theoretical contribution.

  57. Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: Provides a mechanistic interpretability analysis of fine-tuning in LLMs and proposes novel circuit-aware LoRA adaptations for performance gains.

  58. Neural Interpretable Reasoning - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper introduces a novel framework for interpretable reasoning in neural networks, which aligns with representation learning and interpretability. The Markovian property and neural re-parametrization add theoretical depth.

  59. Does Editing Provide Evidence for Localization? - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper critically examines interpretability in LLMs by analyzing the evidence provided by localized edits, which aligns with foundational research in LLM behavior and interpretability.

  60. Sparse Autoencoder Features for Classifications and Transferability - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: Sparse autoencoders are explored for feature learning, which relates closely to 'Representation Learning' and 'Model Compression', particularly given the focus on sparsity and transferable features.

  61. The Rotary Position Embedding May Cause Dimension Inefficiency in Attention Heads for Long-Distance Retrieval - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper investigates the Rotary Position Embedding (RoPE) and its inefficiencies in long-distance retrieval, which aligns with foundational research on LLM behavior and interpretability.

  62. The Representation and Recall of Interwoven Structured Knowledge in LLMs: A Geometric and Layered Analysis - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: Analyzes multi-layer and geometric encoding patterns in LLMs, offering insights into representation dynamics, which strongly aligns with foundational research criteria.

  63. Controlling Neural Collapse Enhances Out-of-Distribution Detection and Transfer Learning - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper explores Neural Collapse and its impact on OOD detection and generalization, providing theoretical insights into representation learning and training dynamics.

  64. A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper investigates grounding mechanisms in LLMs using a novel dataset, which aligns with foundational research in LLM behavior and interpretability.

  65. Fenchel-Young Variational Learning - Score: 17 (R=9, N=8) - Date: 2025-02-17 - Comment: The paper proposes Fenchel-Young Variational Learning, a generalization of variational methods with new theoretical insights and applications to latent-variable models, aligning with foundational research in representation learning and autoencoders.

  66. Prediction hubs are context-informed frequent tokens in LLMs - Score: 17 (R=9, N=8) - Date: 2025-02-17 - Comment: The paper explores hubness in LLMs and provides theoretical and empirical insights into token prediction behavior, aligning with foundational research on LLM behavior and interpretability.

  67. On Space Folds of ReLU Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-02-17 - Comment: The paper provides a quantitative analysis of space folding in ReLU networks, offering foundational insights into neural network behavior and representation learning.

  68. On the Importance of Embedding Norms in Self-Supervised Learning - Score: 17 (R=9, N=8) - Date: 2025-02-14 - Comment: This paper provides theoretical insights into the role of embedding norms in self-supervised learning, which aligns with representation learning and training dynamics in neural networks.

  69. Unsupervised categorization of similarity measures - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper explores unsupervised categorization of similarity measures through representation learning, which aligns with foundational research in representation learning. The focus on independent metric spaces is novel.

  70. RomanLens: Latent Romanization and its role in Multilinguality in LLMs - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper provides theoretical insights into multilingual representation in LLMs, specifically the role of latent romanization, which aligns with the criterion of understanding LLM behavior and interpretability.

  71. LUNAR: LLM Unlearning via Neural Activation Redirection - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper introduces a novel unlearning methodology for LLMs, which aligns with the criterion of theoretical insights into LLM behavior. The use of neural activation redirection is innovative.

  72. Neurons Speak in Ranges: Breaking Free from Discrete Neuronal Attribution - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper introduces a novel framework (NeuronLens) for interpreting and manipulating neuron activations in LLMs, addressing polysemanticity. This aligns with the 'Large Language Models (LLMs)' criterion, focusing on interpretability and internal mechanisms.

  73. Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More - Score: 17 (R=9, N=8) - Date: 2025-02-12 - Comment: The paper introduces a novel training paradigm (MEAP) for LLMs that integrates Masked Language Modeling into Next-Token Prediction, which aligns with foundational research in representation learning and training dynamics of neural networks.

  74. When More is Less: Understanding Chain-of-Thought Length in LLMs - Score: 17 (R=9, N=8) - Date: 2025-02-12 - Comment: The paper provides theoretical insights into Chain-of-Thought (CoT) reasoning in LLMs, including optimal CoT length and noise susceptibility. This aligns with the LLM behavior/interpretability criterion.

  75. No Trick, No Treat: Pursuits and Challenges Towards Simulation-free Training of Neural Samplers - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper explores simulation-free training of neural samplers and analyzes mode collapse, which aligns with foundational research in representation learning and training dynamics.

  76. Emergent Response Planning in LLM - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper identifies emergent planning behaviors in LLMs, focusing on how hidden representations encode future outputs. This aligns with 'Representation Learning' and provides theoretical insights into LLM behavior.

  77. Learning Task Representations from In-Context Learning - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper explores how tasks are encoded in in-context learning within LLMs, focusing on attention heads and task vectors. This aligns with the 'Representation Learning' criterion, as it provides insights into how information is encoded in deep networks.

  78. SEER: Self-Explainability Enhancement of Large Language Models' Representations - Score: 17 (R=9, N=8) - Date: 2025-02-11 - Comment: The paper proposes SEER, a method to enhance LLM explainability by disentangling representations, which aligns with representation learning and interpretability of LLMs.

  79. Implicit Bias of SignGD and Adam on Multiclass Separable Data - Score: 17 (R=9, N=8) - Date: 2025-02-10 - Comment: The paper characterizes implicit biases of optimization algorithms (SignGD and Adam) in multiclass classification, contributing to foundational research in training dynamics of neural networks.

  80. Extracting and Understanding the Superficial Knowledge in Alignment - Score: 17 (R=9, N=8) - Date: 2025-02-10 - Comment: The paper explores the concept of 'superficial knowledge' in alignment for LLMs, addressing interpretability and alignment transfer, which is a relevant topic in investigating LLM behavior.

  81. In Praise of Stubbornness: The Case for Cognitive-Dissonance-Aware Knowledge Updates in LLMs - Score: 17 (R=9, N=8) - Date: 2025-02-10 - Comment: Proposes cognitive-dissonance-aware knowledge updates in LLMs, aligning with insights into LLM behavior and robustness, which makes it highly relevant.

  82. Sparse Autoencoders for Hypothesis Generation - Score: 17 (R=9, N=8) - Date: 2025-02-10 - Comment: Introduces sparse autoencoders for interpretable feature generation, which resonates with 'Representation Learning' in foundational research, especially around sparsity and interpretability in embeddings.

  83. Distribution learning via neural differential equations: minimal energy regularization and approximation theory - Score: 17 (R=9, N=8) - Date: 2025-02-07 - Comment: Proposes minimal energy regularization and theoretical analysis for neural ODEs in distribution learning, which ties to foundational developments in representation learning and efficient approximation methods.

  84. LLM Alignment as Retriever Optimization: An Information Retrieval Perspective - Score: 17 (R=9, N=8) - Date: 2025-02-07 - Comment: This paper proposes an alignment strategy for LLMs based on Information Retrieval principles. The focus on LLM behavioral alignment via a novel optimization method fits within the 'Large Language Models (LLMs)' and 'Representation Learning' criteria.

  85. Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning - Score: 17 (R=9, N=8) - Date: 2025-02-06 - Comment: The paper proposes a novel hybrid representation using latent and text tokens for improving reasoning in LLMs. This approach aligns with representation learning and architectural insights into language models.

  86. Multi-level Supervised Contrastive Learning - Score: 17 (R=9, N=8) - Date: 2025-02-05 - Comment: The paper introduces a novel supervised contrastive learning method, which is directly aligned with foundational research in representation learning.

  87. BRIDLE: Generalized Self-supervised Learning with Quantization - Score: 17 (R=9, N=8) - Date: 2025-02-05 - Comment: Proposes a framework combining residual quantization with self-supervised learning, very relevant to Representation Learning and training methodologies.

  88. Enhancing Generalization via Sharpness-Aware Trajectory Matching for Dataset Condensation - Score: 17 (R=9, N=8) - Date: 2025-02-05 - Comment: Proposes a novel sharpness-aware trajectory matching method for dataset condensation, which aligns with fundamental principles of representation learning. The approach shows promise for enhancing generalization.

  89. Discovering Chunks in Neural Embeddings for Interpretability - Score: 17 (R=9, N=8) - Date: 2025-02-05 - Comment: Introduces a novel framework for interpreting neural embeddings by identifying 'chunks', contributing to representation learning and interpretability of networks.

  90. What is a Number, That a Large Language Model May Know It? - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: Examines numerical representation in LLMs and blends cognitive science approaches, highly relevant to foundational representation learning in LLMs.

  91. Activation by Interval-wise Dropout: A Simple Way to Prevent Neural Networks from Plasticity Loss - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: The AID method introduces a novel dropout variation targeting training dynamics, aligning with foundational representation learning and training dynamics research.

  92. LLM Safety Alignment is Divergence Estimation in Disguise - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: The theoretical perspective connecting LLM safety alignment to divergence estimation offers a foundational insight into behavior and interpretability, which aligns well with LLM theoretical insights.

  93. A Comunication Framework for Compositional Generation - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: Explores compositional encodings in learned representations using a communication game framework, directly tying into representation learning and advancing insights on compositionality.

  94. Self-Supervised Learning Using Nonlinear Dependence - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: Presents a novel self-supervised technique leveraging nonlinear dependency, tying closely to representation learning and enriching feature encoding, which is foundational.

  95. A Metric for the Balance of Information in Graph Learning - Score: 17 (R=9, N=8) - Date: 2025-02-03 - Comment: Introduces a metric (NNRD) to balance structural and feature information in graph learning. Contains fundamental insights into managing representation biases in molecular graph data.

  96. Effective Field Neural Network - Score: 17 (R=8, N=9) - Date: 2025-02-26 - Comment: The paper introduces Effective Field Neural Networks (EFNNs), inspired by field theory, to model many-body interactions. This aligns with foundational research in representation learning and emerging trends, as it proposes a novel paradigm for encoding domain knowledge into neural networks.

  97. Do Graph Diffusion Models Accurately Capture and Generate Substructure Distributions? - Score: 17 (R=8, N=9) - Date: 2025-02-05 - Comment: Investigates expressivity limits in graph diffusion models, which ties into foundational representation learning with implications for architecture analysis.

  98. Al-Khwarizmi: Discovering Physical Laws with Foundation Models - Score: 17 (R=8, N=9) - Date: 2025-02-05 - Comment: The paper introduces Al-Khwarizmi for automated physical law discovery with foundation models, aligning with AI for Science innovations and novel generative paradigms.

  99. Building Bridges, Not Walls -- Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution - Score: 17 (R=8, N=9) - Date: 2025-02-03 - Comment: Unifies attribution methodologies and contributes to enhancing interpretability in AI systems; aligns with emerging trends in foundational understanding.

  100. Applications of Statistical Field Theory in Deep Learning - Score: 16 (R=9, N=7) - Date: 2025-02-27 - Comment: This paper provides a review of statistical field theory applied to deep learning, which could offer theoretical insights into representation learning and training dynamics.

  101. What are Models Thinking about? Understanding Large Language Model Hallucinations "Psychology" through Model Inner State Analysis - Score: 16 (R=9, N=7) - Date: 2025-02-20 - Comment: The paper analyzes internal states of LLMs to understand hallucinations, which provides theoretical insights into LLM behavior and interpretability.

  102. Language Models Can Predict Their Own Behavior - Score: 16 (R=9, N=7) - Date: 2025-02-20 - Comment: The paper explores the ability of LLMs to predict their own behavior using internal representations, which aligns with interpretability and efficiency in LLMs.

  103. The Validation Gap: A Mechanistic Analysis of How Language Models Compute Arithmetic but Fail to Validate It - Score: 16 (R=9, N=7) - Date: 2025-02-18 - Comment: The paper provides a mechanistic analysis of error detection in LLMs, focusing on arithmetic validation, which aligns with interpretability and foundational research in LLM behavior.

  104. A Mathematics Framework of Artificial Shifted Population Risk and Its Further Understanding Related to Consistency Regularization - Score: 16 (R=9, N=7) - Date: 2025-02-18 - Comment: The proposed mathematical framework for understanding consistency regularization in data augmentation contributes to theoretical insights and training dynamics, aligning with 'Representation Learning'.

  105. Can Large Language Models Understand Intermediate Representations? - Score: 16 (R=9, N=7) - Date: 2025-02-13 - Comment: The paper investigates LLMs' understanding of intermediate representations, which is relevant to foundational research in LLM behavior and interpretability. The focus on control flow and execution reasoning adds depth.

  106. PH-VAE: A Polynomial Hierarchical Variational Autoencoder Towards Disentangled Representation Learning - Score: 16 (R=9, N=7) - Date: 2025-02-06 - Comment: The paper develops a hierarchical VAE with polynomial divergence, which directly relates to representation learning and disentangled representation, a relevant foundational topic.

  107. Modular Training of Neural Networks aids Interpretability - Score: 16 (R=9, N=7) - Date: 2025-02-05 - Comment: Presents modular training to improve interpretability and simplifies learned functions, aligning with foundational aspects of representation learning.

  108. A Revisit of Total Correlation in Disentangled Variational Auto-Encoder with Partial Disentanglement - Score: 16 (R=9, N=7) - Date: 2025-02-05 - Comment: Addresses foundational aspects of representation learning by proposing a partially disentangled VAE through novel extensions like the Partial Correlation term. This directly aligns with insights into how latent variables are structured and encoded, a key topic in representation learning.

  109. GRADIEND: Monosemantic Feature Learning within Neural Networks Applied to Gender Debiasing of Transformer Models - Score: 16 (R=9, N=7) - Date: 2025-02-04 - Comment: Introduces an encoding-decoding mechanism for gender debiasing in transformer models, directly relevant to representation learning in foundational transformer-based research.

  110. Topological Autoencoders++: Fast and Accurate Cycle-Aware Dimensionality Reduction - Score: 16 (R=8, N=8) - Date: 2025-02-28 - Comment: The paper proposes a novel topology-aware dimensionality reduction method with theoretical analysis, aligning with foundational research in representation learning.

  111. Identifiable Multi-View Causal Discovery Without Non-Gaussianity - Score: 16 (R=8, N=8) - Date: 2025-02-28 - Comment: The paper proposes a novel approach to causal discovery in multi-view SEMs, which aligns with representation learning and introduces theoretical advancements in causal modeling.

  112. INFO-SEDD: Continuous Time Markov Chains as Scalable Information Metrics Estimators - Score: 16 (R=8, N=8) - Date: 2025-02-27 - Comment: The paper introduces a novel method for estimating information-theoretic quantities using Continuous-Time Markov Chains, which could have implications for representation learning and foundational methods in information theory.

  113. Golden Ratio Mixing of Real and Synthetic Data for Stabilizing Generative Model Training - Score: 16 (R=8, N=8) - Date: 2025-02-26 - Comment: The paper investigates generative model training stability and proposes a novel weighting scheme, which aligns with foundational research in representation learning and training dynamics.

  114. C-3DPO: Constrained Controlled Classification for Direct Preference Optimization - Score: 16 (R=8, N=8) - Date: 2025-02-26 - Comment: The paper proposes a novel constrained classification framework for preference optimization, which provides theoretical insights into DPO-style algorithms. This aligns with foundational research in representation learning.

  115. The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence - Score: 16 (R=8, N=8) - Date: 2025-02-25 - Comment: The paper explores refusal mechanisms in LLMs using gradient-based representation engineering, which provides insights into LLM behavior and interpretability.

  116. Provable Benefits of Unsupervised Pre-training and Transfer Learning via Single-Index Models - Score: 16 (R=8, N=8) - Date: 2025-02-25 - Comment: The paper provides theoretical insights into the benefits of unsupervised pre-training and transfer learning, aligning with foundational research in representation learning.

  117. Brain-Model Evaluations Need the NeuroAI Turing Test - Score: 16 (R=8, N=8) - Date: 2025-02-25 - Comment: The paper proposes a NeuroAI Turing Test framework, which introduces a novel paradigm for evaluating models based on representational convergence, aligning with foundational research in representation learning.

  118. Category-free Out-of-Distribution Node Detection with Feature Resonance - Score: 16 (R=8, N=8) - Date: 2025-02-25 - Comment: The paper proposes a novel framework for OOD node detection in graphs using feature resonance, which aligns with representation learning and introduces a theoretically grounded approach.

  119. Implicit Bias of Gradient Descent for Non-Homogeneous Deep Networks - Score: 16 (R=8, N=8) - Date: 2025-02-25 - Comment: The paper provides theoretical insights into the implicit bias of gradient descent for non-homogeneous deep networks, aligning with 'Representation Learning' and training dynamics.

  120. Learning to Reason from Feedback at Test-Time - Score: 16 (R=8, N=8) - Date: 2025-02-25 - Comment: The paper introduces a novel test-time optimization paradigm for LLMs to utilize feedback effectively, which aligns with foundational research in LLM behavior and reasoning.

  121. Solving Inverse Problems with Deep Linear Neural Networks: Global Convergence Guarantees for Gradient Descent with Weight Decay - Score: 16 (R=8, N=8) - Date: 2025-02-24 - Comment: The paper provides theoretical guarantees for deep linear networks solving inverse problems, contributing to foundational understanding of training dynamics and representation learning.

  122. A Tale of Two Structures: Do LLMs Capture the Fractal Complexity of Language? - Score: 16 (R=8, N=8) - Date: 2025-02-24 - Comment: The paper investigates whether LLMs capture the fractal complexity of language, which aligns with theoretical insights into LLM behavior and interpretability.

  123. Generalization Error of $f$-Divergence Stabilized Algorithms via Duality - Score: 16 (R=8, N=8) - Date: 2025-02-21 - Comment: The paper explores generalization error with $f$-divergence regularization, providing theoretical insights into optimization, which aligns with foundational research in representation learning.

  124. Generalization Certificates for Adversarially Robust Bayesian Linear Regression - Score: 16 (R=8, N=8) - Date: 2025-02-21 - Comment: The paper introduces adversarially robust Bayesian linear regression and provides theoretical guarantees, aligning with the Emerging Trends criterion due to its foundational contributions to robustness in machine learning.

  125. Refining embeddings with fill-tuning: data-efficient generalised performance improvements for materials foundation models - Score: 16 (R=8, N=8) - Date: 2025-02-20 - Comment: The paper introduces 'fill-tuning' for improving embeddings in foundation models, which aligns with representation learning and offers a novel methodology for general performance improvement.

  126. SPEX: Scaling Feature Interaction Explanations for LLMs - Score: 16 (R=8, N=8) - Date: 2025-02-20 - Comment: SPEX introduces a novel sparse Fourier transform-based method for scalable feature interaction explanations in LLMs, aligning with the representation learning criterion by addressing how models encode and interact with input features.

  127. Mixup Regularization: A Probabilistic Perspective - Score: 16 (R=8, N=8) - Date: 2025-02-20 - Comment: The paper introduces a probabilistic perspective on mixup regularization, which is relevant to representation learning and provides a novel theoretical framework for conditional density estimation.

  128. Generalization error bound for denoising score matching under relaxed manifold assumption - Score: 16 (R=8, N=8) - Date: 2025-02-20 - Comment: The paper provides theoretical bounds for denoising score matching under relaxed manifold assumptions, which aligns with foundational research in representation learning.

  129. Towards Invariance to Node Identifiers in Graph Neural Networks - Score: 16 (R=8, N=8) - Date: 2025-02-20 - Comment: The paper proposes a regularization method to enforce invariance to node identifiers in GNNs, which is relevant to representation learning and introduces a novel theoretical perspective.

  130. How Expressive are Knowledge Graph Foundation Models? - Score: 16 (R=8, N=8) - Date: 2025-02-20 - Comment: The paper studies the expressive power of Knowledge Graph Foundation Models (KGFMs) and proposes richer motifs for relation representation, aligning with representation learning and foundational model analysis.

  131. Random Forest Autoencoders for Guided Representation Learning - Score: 16 (R=8, N=8) - Date: 2025-02-20 - Comment: The paper proposes Random Forest Autoencoders for supervised visualization, combining autoencoders with random forests. This aligns with the 'Representation Learning' criterion, particularly in guided feature learning.

  132. A Neural Difference-of-Entropies Estimator for Mutual Information - Score: 16 (R=8, N=8) - Date: 2025-02-19 - Comment: The paper introduces a neural difference-of-entropies estimator for mutual information, which is relevant to representation learning and foundational research in information theory.

  133. Learning the symmetric group: large from small - Score: 16 (R=8, N=8) - Date: 2025-02-19 - Comment: The paper explores generalization in learning symmetric groups, which is an emerging trend in foundational research. The method of scaling tasks from small to large groups is a novel theoretical contribution.

  134. Unveiling Mode Connectivity in Graph Neural Networks - Score: 16 (R=8, N=8) - Date: 2025-02-19 - Comment: The paper investigates mode connectivity in GNNs, which provides theoretical insights into optimization dynamics and loss landscapes, aligning with representation learning and emerging trends.

  135. Generalized Kernel Inducing Points by Duality Gap for Dataset Distillation - Score: 16 (R=8, N=8) - Date: 2025-02-19 - Comment: The paper extends Kernel Inducing Points for dataset distillation, which aligns with foundational research in representation learning and efficiency improvements.

  136. Stability Bounds for Smooth Optimal Transport Maps and their Statistical Implications - Score: 16 (R=8, N=8) - Date: 2025-02-19 - Comment: The paper provides stability bounds for optimal transport maps, which is a theoretical contribution relevant to foundational research in representation learning and optimization.

  137. An Interpretable Automated Mechanism Design Framework with Large Language Models - Score: 16 (R=8, N=8) - Date: 2025-02-19 - Comment: The paper explores mechanism design using LLMs and introduces a novel framework for code generation and interpretability, which aligns with foundational research in LLM behavior and interpretability.

  138. SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models - Score: 16 (R=8, N=8) - Date: 2025-02-18 - Comment: This paper uses Sparse Autoencoders (SAEs) to interpret instruction-following in LLMs, connecting both 'Representation Learning' and interpretability of models.

  139. Shortcuts and Identifiability in Concept-based Models from a Neuro-Symbolic Lens - Score: 16 (R=8, N=8) - Date: 2025-02-18 - Comment: The paper studies concept-based models and reasoning shortcuts, which aligns with representation learning and interpretability. The theoretical conditions for identifiability add significant novelty.

  140. How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training - Score: 16 (R=8, N=8) - Date: 2025-02-18 - Comment: The exploration of knowledge circuit evolution in LLMs aligns with 'Representation Learning', focusing on interpretability and continual pre-training insights.

  141. Generalization of the Gibbs algorithm with high probability at low temperatures - Score: 16 (R=8, N=8) - Date: 2025-02-18 - Comment: The paper addresses generalization bounds for the Gibbs algorithm with a focus on flat minima, relevant to emerging foundational insights in optimization.

  142. On Vanishing Gradients, Over-Smoothing, and Over-Squashing in GNNs: Bridging Recurrent and Graph Learning - Score: 16 (R=8, N=8) - Date: 2025-02-18 - Comment: The paper provides a unification of over-smoothing, over-squashing, and vanishing gradients in GNNs with theoretical insights and proposes improvements. Foundational relevance for graph neural networks.

  143. Balancing the Scales: A Theoretical and Algorithmic Framework for Learning from Imbalanced Data - Score: 16 (R=8, N=8) - Date: 2025-02-17 - Comment: The paper introduces a novel theoretical framework for learning from imbalanced data, including a new margin loss function and learning guarantees, which aligns with foundational research in representation learning.

  144. Estimation of the Learning Coefficient Using Empirical Loss - Score: 16 (R=8, N=8) - Date: 2025-02-17 - Comment: The paper proposes a novel method for estimating the learning coefficient using empirical loss, which contributes to theoretical insights into model generalization. This aligns with foundational research in representation learning.

  145. Generalizability through Explainability: Countering Overfitting with Counterfactual Examples - Score: 16 (R=8, N=8) - Date: 2025-02-14 - Comment: The paper introduces CF-Reg, a novel regularization method leveraging counterfactual examples to mitigate overfitting. This aligns with representation learning and training dynamics in neural networks.

  146. Improving Deep Regression with Tightness - Score: 16 (R=8, N=8) - Date: 2025-02-14 - Comment: The paper introduces a theoretical explanation for improving deep regression by reducing conditional entropy and proposes novel regularization strategies. This aligns with representation learning and training dynamics.

  147. New Bounds for Sparse Variational Gaussian Processes - Score: 16 (R=8, N=8) - Date: 2025-02-14 - Comment: The paper introduces a tighter bound for sparse variational Gaussian processes, aligning with the 'Representation Learning' criterion due to its focus on improving foundational methods.

  148. ReTreever: Tree-based Coarse-to-Fine Representations for Retrieval - Score: 16 (R=8, N=8) - Date: 2025-02-13 - Comment: The paper introduces a tree-based hierarchical representation for document retrieval, which aligns with representation learning and efficiency improvements. The hierarchical structure and optimization for retrieval performance are novel.

  149. The Observational Partial Order of Causal Structures with Latent Variables - Score: 16 (R=8, N=8) - Date: 2025-02-13 - Comment: The paper provides a theoretical analysis of causal structures with latent variables, which aligns with emerging trends in foundational research.

  150. MAGELLAN: Metacognitive predictions of learning progress guide autotelic LLM agents in large goal spaces - Score: 16 (R=8, N=8) - Date: 2025-02-12 - Comment: The paper proposes a metacognitive framework for goal prioritization in LLM agents, which aligns with emerging trends in LLM behavior and interpretability. The focus on metacognitive learning is novel and impactful.

  151. Understanding the Generalization Error of Markov algorithms through Poissonization - Score: 16 (R=8, N=8) - Date: 2025-02-12 - Comment: The paper provides a theoretical framework for analyzing generalization error in Markov algorithms using Poissonization, which contributes to understanding training dynamics in neural networks.

  152. Prot2Chat: Protein LLM with Early Fusion of Sequence and Structure - Score: 16 (R=8, N=8) - Date: 2025-02-12 - Comment: The paper introduces a protein LLM framework integrating sequence and structure, which aligns with foundational research in AI for science and multimodal representation learning.

  153. Prompt-Driven Continual Graph Learning - Score: 16 (R=8, N=8) - Date: 2025-02-11 - Comment: The paper introduces a prompt-driven framework for continual graph learning, which aligns with the 'Emerging Trends' criterion by proposing a novel hierarchical prompting mechanism for dynamic graph tasks.

  154. Circuit-tuning: A Mechanistic Approach for Identifying Parameter Redundancy and Fine-tuning Neural Networks - Score: 16 (R=8, N=8) - Date: 2025-02-11 - Comment: The paper introduces circuit-tuning, a mechanistic approach for fine-tuning neural networks, which aligns with foundational research in training dynamics and interpretability.

  155. On the Computability of Multiclass PAC Learning - Score: 16 (R=8, N=8) - Date: 2025-02-11 - Comment: The paper focuses on theoretical insights into PAC learning, which aligns with foundational research in representation learning, particularly in understanding training dynamics and learnability.

  156. Rethinking Oversmoothing in Graph Neural Networks: A Rank-Based Perspective - Score: 16 (R=8, N=8) - Date: 2025-02-10 - Comment: This paper analyzes oversmoothing in GNNs using a rank-based perspective, which could be highly relevant for representation learning and training dynamics in graph structures.

  157. Position-aware Automatic Circuit Discovery - Score: 16 (R=8, N=8) - Date: 2025-02-10 - Comment: The paper proposes position-aware circuit discovery for understanding LLM mechanisms, introducing improvements to circuit analysis. This ties into interpretability and underlying structural insights, relevant for foundational LLM analysis.

  158. TruthFlow: Truthful LLM Generation via Representation Flow Correction - Score: 16 (R=8, N=8) - Date: 2025-02-10 - Comment: TruthFlow introduces a novel representation correction technique for LLMs, showing potential as a foundational method for controlling LLM behavior. This aligns with theoretical insights into LLM representation and interpretability.

  159. It's All in The [MASK]: Simple Instruction-Tuning Enables BERT-like Masked Language Models As Generative Classifiers - Score: 16 (R=8, N=8) - Date: 2025-02-07 - Comment: Presents a unique approach to repurposing BERT-like MLM heads for generative classification. Aligns with representation learning and explores architectural utility without traditional task heads.

  160. Rethinking Approximate Gaussian Inference in Classification - Score: 16 (R=8, N=8) - Date: 2025-02-06 - Comment: This research simplifies Gaussian inference in classification tasks with an alternative learning objective, making it aligned with advancements in representation learning and uncertainty quantification.

  161. Signature Reconstruction from Randomized Signatures - Score: 16 (R=8, N=8) - Date: 2025-02-06 - Comment: Connects representation learning with algebraic reconstruction using ODE-driven systems, offering insights into foundational methods for feature extraction. This framework is relatively novel.

  162. Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting - Score: 16 (R=8, N=8) - Date: 2025-02-06 - Comment: The study proposes a novel weighting scheme for fine-tuning models to mitigate catastrophic forgetting, providing relevant contributions to representation learning, especially in training dynamics.

  163. Networks with Finite VC Dimension: Pro and Contra - Score: 16 (R=8, N=8) - Date: 2025-02-06 - Comment: The paper discusses VC dimension and its implications on approximation and empirical errors in neural networks. Its focus on theoretical trade-offs and high-dimensional geometry aligns well with foundational representation learning insights.

  164. ReMiDi: Reconstruction of Microstructure Using a Differentiable Diffusion MRI Simulator - Score: 16 (R=8, N=8) - Date: 2025-02-05 - Comment: The paper focuses on a novel reconstruction method with representation encoding using autoencoders, which aligns with the Representation Learning and Model Architecture criteria.

  165. Self-supervised Subgraph Neural Network With Deep Reinforcement Walk Exploration - Score: 16 (R=8, N=8) - Date: 2025-02-05 - Comment: Proposes self-supervised SGNNs which utilize reinforcement for exploring subgraph structures, relevant to representation learning through graph methods.

  166. Comply: Learning Sentences with Complex Weights inspired by Fruit Fly Olfaction - Score: 16 (R=8, N=8) - Date: 2025-02-05 - Comment: Biologically inspired neural network for encoding, focusing on sentence representation learning with sparse contextual embeddings.

  167. Representations Shape Weak-to-Strong Generalization: Theoretical Insights and Empirical Predictions - Score: 16 (R=8, N=8) - Date: 2025-02-04 - Comment: Offers theoretical insights into weak-to-strong generalization using representation kernels, providing a fresh perspective relevant to representation learning.

  168. HoP: Homeomorphic Polar Learning for Hard Constrained Optimization - Score: 16 (R=8, N=8) - Date: 2025-02-04 - Comment: Proposes a constrained optimization approach embedding homeomorphic mapping into neural networks, relevant to efficiency and representation learning with a novel L2O formulation.

  169. LLM Program Optimization via Retrieval Augmented Search - Score: 16 (R=8, N=8) - Date: 2025-02-04 - Comment: Proposes a novel black-box adaptation method (Retrieval Augmented Search) utilizing LLMs for program optimization, which includes theoretical contributions relevant to LLM efficiency and interpretability.

  170. Error Slice Discovery via Manifold Compactness - Score: 16 (R=8, N=8) - Date: 2025-02-03 - Comment: The paper introduces a novel metric for evaluating coherence in error slice discovery using manifold compactness and optimizes for risk and coherence simultaneously, which aligns with criteria around representation learning and training dynamics.

  171. Neural SDEs as a Unified Approach to Continuous-Domain Sequence Modeling - Score: 16 (R=8, N=8) - Date: 2025-02-03 - Comment: Proposes Neural SDEs for sequence modeling from a continuous-time perspective, offering foundational insights into time-series representation learning. Novel approach to intrinsic modeling dynamics.

  172. Sanity Checking Causal Representation Learning on a Simple Real-World System - Score: 15 (R=8, N=7) - Date: 2025-02-28 - Comment: The paper evaluates causal representation learning methods on a real-world system, highlighting reproducibility challenges, which aligns with foundational research in representation learning.

  173. Erasing Without Remembering: Safeguarding Knowledge Forgetting in Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-02-28 - Comment: The paper explores machine unlearning in LLMs, introducing a benchmark (UGBench) and a perturbation-based method (PERMU) to enhance unlearning generalization. This aligns with the 'Large Language Models' criterion for theoretical insights into LLM behavior.

  174. Incremental Learning with Repetition via Pseudo-Feature Projection - Score: 15 (R=8, N=7) - Date: 2025-02-28 - Comment: The paper introduces a novel exemplar-free incremental learning method (Horde) with dynamic feature extractor alignment, which aligns with 'Representation Learning' for insights into training dynamics and feature learning.

  175. Obtaining Example-Based Explanations from Deep Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-02-28 - Comment: The paper proposes example-based explanations for deep neural networks, which aligns with representation learning and interpretability.

  176. Accurate and Scalable Graph Neural Networks via Message Invariance - Score: 15 (R=8, N=7) - Date: 2025-02-28 - Comment: The paper introduces a novel concept of message invariance to address computational challenges in GNNs, which aligns with representation learning and training dynamics in neural networks.

  177. Spectral Analysis of Representational Similarity with Limited Neurons - Score: 15 (R=8, N=7) - Date: 2025-02-28 - Comment: The paper provides a theoretical framework for representational similarity measures using Random Matrix Theory, which aligns with the representation learning criterion.

  178. Tell me why: Visual foundation models as self-explainable classifiers - Score: 15 (R=8, N=7) - Date: 2025-02-28 - Comment: The paper introduces a novel prototypical architecture for interpretability in visual foundation models, which aligns with representation learning and architectural insights.

  179. Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation - Score: 15 (R=8, N=7) - Date: 2025-02-27 - Comment: The paper evaluates LLMs' ability to falsify solutions, which is a novel perspective on reasoning and interpretability in LLMs.

  180. Invariance Pair-Guided Learning: Enhancing Robustness in Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-02-27 - Comment: The paper introduces a novel training approach to enhance robustness in neural networks, which aligns with representation learning and training dynamics.

  181. Investigating Generalization of One-shot LLM Steering Vectors - Score: 15 (R=8, N=7) - Date: 2025-02-27 - Comment: The paper investigates steering vectors for LLMs, which aligns with 'Representation Learning' as it explores how LLMs encode and control behaviors. The focus on one-shot optimization and generalization adds novelty.

  182. Mechanistic Understanding of Language Models in Syntactic Code Completion - Score: 15 (R=8, N=7) - Date: 2025-02-27 - Comment: The paper investigates the mechanistic understanding of language models in syntactic code completion, which aligns with interpretability and training dynamics in LLMs.

  183. An Overview of Large Language Models for Statisticians - Score: 15 (R=8, N=7) - Date: 2025-02-26 - Comment: The paper explores the intersection of statistics and LLMs, focusing on uncertainty quantification, interpretability, and fairness. This aligns with foundational research in LLM behavior and interpretability.

  184. Synthetic Text Generation for Training Large Language Models via Gradient Matching - Score: 15 (R=8, N=7) - Date: 2025-02-26 - Comment: The paper proposes a theoretically rigorous approach for generating synthetic text for LLM training, which aligns with foundational research in representation learning and LLM training dynamics.

  185. Hallucination Detection in LLMs Using Spectral Features of Attention Maps - Score: 15 (R=8, N=7) - Date: 2025-02-26 - Comment: The paper introduces a novel method for hallucination detection in LLMs using spectral features of attention maps. This aligns with the criteria for theoretical insights into LLM behavior and interpretability, making it relevant to foundational research.

  186. CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought - Score: 15 (R=8, N=7) - Date: 2025-02-25 - Comment: The paper proposes a framework for uncertainty quantification in LLMs using Chain-of-Thought reasoning, which aligns with foundational research in LLM behavior and interpretability.

  187. Subsampling Graphs with GNN Performance Guarantees - Score: 15 (R=8, N=7) - Date: 2025-02-25 - Comment: The paper introduces a graph subsampling method with theoretical guarantees, which aligns with representation learning and efficiency in GNNs.

  188. Are Sparse Autoencoders Useful? A Case Study in Sparse Probing - Score: 15 (R=8, N=7) - Date: 2025-02-25 - Comment: The paper evaluates sparse autoencoders for probing LLM activations, which aligns with representation learning and interpretability criteria.

  189. Subspace Recovery in Winsorized PCA: Insights into Accuracy and Robustness - Score: 15 (R=8, N=7) - Date: 2025-02-25 - Comment: The paper explores Winsorized PCA for robust subspace recovery, which aligns with 'Representation Learning' through its focus on theoretical robustness and accuracy in feature space.

  190. Understanding the Emergence of Multimodal Representation Alignment - Score: 15 (R=8, N=7) - Date: 2025-02-25 - Comment: The paper investigates the emergence of multimodal representation alignment, which aligns with representation learning and provides insights into training dynamics.

  191. Graph Self-Supervised Learning with Learnable Structural and Positional Encodings - Score: 15 (R=8, N=7) - Date: 2025-02-25 - Comment: The paper proposes a GNN framework with enhanced structural and positional encodings, which aligns with foundational research in representation learning.

  192. Patterns Over Principles: The Fragility of Inductive Reasoning in LLMs under Noisy Observations - Score: 15 (R=8, N=7) - Date: 2025-02-25 - Comment: The paper evaluates inductive reasoning in LLMs under noisy observations, which provides insights into LLM behavior and interpretability, aligning with the LLM topic.

  193. NeurFlow: Interpreting Neural Networks through Neuron Groups and Functional Interactions - Score: 15 (R=8, N=7) - Date: 2025-02-25 - Comment: The paper proposes a framework for interpreting neural networks through neuron groups and functional interactions, which aligns with foundational research in representation learning and interpretability.

  194. Towards Understanding Gradient Flow Dynamics of Homogeneous Neural Networks Beyond the Origin - Score: 15 (R=8, N=7) - Date: 2025-02-25 - Comment: The paper provides insights into gradient flow dynamics of homogeneous neural networks, which aligns with 'Representation Learning' due to its focus on training dynamics and sparsity structure.

  195. CoME: An Unlearning-based Approach to Conflict-free Model Editing - Score: 15 (R=8, N=7) - Date: 2025-02-25 - Comment: The paper introduces a model editing framework for LLMs, which aligns with 'Model Compression' and 'Representation Learning' due to its focus on unlearning and knowledge updates.

  196. MaxSup: Overcoming Representation Collapse in Label Smoothing - Score: 15 (R=8, N=7) - Date: 2025-02-25 - Comment: The paper addresses representation collapse in label smoothing, which is relevant to 'Representation Learning'. The proposed MaxSup method offers a novel approach to regularization.

  197. Scale-Free Graph-Language Models - Score: 15 (R=8, N=7) - Date: 2025-02-24 - Comment: The paper proposes a graph-language model integrating graph generation and text embedding with a scale-free structural prior, which aligns with the 'Representation Learning' criterion by addressing foundational aspects of graph-based learning.

  198. Curvature Corrected Nonnegative Manifold Data Factorization - Score: 15 (R=8, N=7) - Date: 2025-02-24 - Comment: The paper introduces a novel geometry-aware method for manifold-valued data factorization, which aligns with representation learning through its focus on low-rank approximations and interpretable factors.

  199. The Multi-Faceted Monosemanticity in Multimodal Representations - Score: 15 (R=8, N=7) - Date: 2025-02-24 - Comment: The paper explores interpretability in multimodal models, particularly CLIP, and introduces a novel categorization of features, aligning with representation learning and interpretability.

  200. Disentangled Latent Spaces for Reduced Order Models using Deterministic Autoencoders - Score: 15 (R=8, N=7) - Date: 2025-02-21 - Comment: The paper explores disentangled latent spaces using deterministic autoencoders, which aligns with the 'Representation Learning' criterion, particularly in feature learning and interpretability.

  201. ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification - Score: 15 (R=8, N=7) - Date: 2025-02-21 - Comment: The paper proposes a novel self-verification framework for LLMs, which aligns with foundational research on LLM behavior and interpretability. The structured curriculum and confidence-aware decoding mechanism are notable contributions.

  202. Affinity and Diversity: A Unified Metric for Demonstration Selection via Internal Representations - Score: 15 (R=8, N=7) - Date: 2025-02-21 - Comment: The paper proposes a unified metric for demonstration selection in in-context learning, leveraging internal representations. This aligns with representation learning and provides insights into ICL dynamics.

  203. Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information - Score: 15 (R=8, N=7) - Date: 2025-02-21 - Comment: The discovery of Temporal Heads in LLMs and their role in encoding temporal knowledge provides insights into model interpretability and internal representation learning.

  204. Rectified Lagrangian for Out-of-Distribution Detection in Modern Hopfield Networks - Score: 15 (R=8, N=7) - Date: 2025-02-21 - Comment: The paper proposes a rectified Lagrangian for out-of-distribution detection in modern Hopfield networks, which aligns with representation learning and foundational innovations in neural network dynamics.

  205. Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images - Score: 15 (R=8, N=7) - Date: 2025-02-20 - Comment: The paper introduces a novel finetuning objective (S-VCO) for vision-language models, which aligns with representation learning and architectural insights.

  206. Flow-based generative models as iterative algorithms in probability space - Score: 15 (R=8, N=7) - Date: 2025-02-20 - Comment: The paper provides a theoretical framework for flow-based generative models, which aligns with foundational research in generative modeling and representation learning.

  207. Task Shift: From Classification to Regression in Overparameterized Linear Models - Score: 15 (R=8, N=7) - Date: 2025-02-20 - Comment: The paper investigates task shift from classification to regression in overparameterized linear models, providing theoretical insights into generalization and interpolation. This aligns with foundational research in representation learning.

  208. Enhanced uncertainty quantification variational autoencoders for the solution of Bayesian inverse problems - Score: 15 (R=8, N=7) - Date: 2025-02-19 - Comment: The paper proposes a novel loss function for variational autoencoders in Bayesian inverse problems, which aligns with foundational research in representation learning and generative models.

  209. Asymptotic Optimism of Random-Design Linear and Kernel Regression Models - Score: 15 (R=8, N=7) - Date: 2025-02-19 - Comment: The paper provides theoretical insights into model complexity measures and compares neural networks with kernel models, which aligns with representation learning and training dynamics.

  210. B-cos LM: Efficiently Transforming Pre-trained Language Models for Improved Explainability - Score: 15 (R=8, N=7) - Date: 2025-02-19 - Comment: The paper introduces B-cos LMs for improved explainability in language models, which is relevant to representation learning and interpretability. The adaptation of B-cos networks to NLP tasks is a novel contribution.

  211. RM-PoT: Reformulating Mathematical Problems and Solving via Program of Thoughts - Score: 15 (R=8, N=7) - Date: 2025-02-19 - Comment: The paper proposes a framework for reformulating mathematical problems to improve LLM reasoning, which aligns with LLM behavior/interpretability.

  212. Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities? - Score: 15 (R=8, N=7) - Date: 2025-02-19 - Comment: The paper critiques test-time scaling in LLMs and proposes a novel method for improving scalability, which aligns with the LLM behavior/interpretability criterion.

  213. On the kernel learning problem - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: The paper addresses kernel learning with a novel variational problem, which aligns with foundational research in representation learning and multiscale data structures.

  214. ReLearn: Unlearning via Learning for Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: The paper proposes a method for unlearning in LLMs, which is relevant to foundational research in LLM behavior and interpretability, particularly in preserving linguistic coherence.

  215. Logarithmic Width Suffices for Robust Memorization - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: The paper provides a theoretical analysis of memorization capability in neural networks with respect to robust conditions. This aligns with 'Representation Learning', as it explores training dynamics and capacity in feedforward networks.

  216. Revisiting Weak-to-Strong Generalization in Theory and Practice: Reverse KL vs. Forward KL - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: The paper explores reverse KL divergence for weak-to-strong generalization, which provides theoretical insights into optimization and generalization in LLMs.

  217. Deep Incomplete Multi-view Learning via Cyclic Permutation of VAEs - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: The paper introduces a cyclic permutation method for incomplete multi-view data in variational autoencoders, relevant to 'Representation Learning' via multi-view generative modeling.

  218. Unlocking the Power of Function Vectors for Characterizing and Mitigating Catastrophic Forgetting in Continual Instruction Tuning - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: The paper addresses catastrophic forgetting in LLMs using function vectors, which aligns with representation learning and training dynamics, offering theoretical insights.

  219. The Relationship between No-Regret Learning and Online Conformal Prediction - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: Discusses theoretical links between no-regret learning and online conformal prediction, which is well-aligned with foundational research insights into ML algorithms.

  220. Learning Identifiable Structures Helps Avoid Bias in DNN-based Supervised Causal Learning - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: Focuses on causal discovery with a bias-free approach to DNN architectures, connecting strongly to representation learning and structured prediction.

  221. ReReLRP - Remembering and Recognizing Tasks with LRP - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: The paper addresses catastrophic forgetting using Layerwise Relevance Propagation (LRP), which aligns with 'Representation Learning' and provides insights into memory efficiency and explainability.

  222. Why is prompting hard? Understanding prompts on binary sequence predictors - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: This paper provides statistical and empirical analysis of prompting and sheds light on LLM behavior and training paradigms, aligning with theoretical insights on LLM interpretability.

  223. Superpose Singular Features for Model Merging - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: The work introduces a novel approach to model merging using singular value decomposition, which may have implications for foundational work on model architecture and model compression.

  224. LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: The paper proposes a novel framework for feature selection using LLMs, which aligns with 'Representation Learning' through its focus on integrating domain-specific reasoning into feature selection.

  225. MixMin: Finding Data Mixtures via Convex Minimization - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: The paper introduces a method (MixMin) for optimizing data mixtures, which aligns with foundational research in data efficiency and representation learning.

  226. Revisiting Generalization Power of a DNN in Terms of Symbolic Interactions - Score: 15 (R=8, N=7) - Date: 2025-02-17 - Comment: The paper provides a novel perspective on DNN generalization by analyzing symbolic interactions, which aligns with representation learning and training dynamics.

  227. Elastic Representation: Mitigating Spurious Correlations for Group Robustness - Score: 15 (R=8, N=7) - Date: 2025-02-17 - Comment: Elastic Representation introduces a novel method to mitigate spurious correlations, aligning with representation learning and sparsity-related methods.

  228. Trust Me, I Know the Way: Predictive Uncertainty in the Presence of Shortcut Learning - Score: 15 (R=8, N=7) - Date: 2025-02-14 - Comment: The paper discusses predictive uncertainty in neural networks in the context of shortcut learning. It provides theoretical insights into representation learning, making it relevant to foundational research.

  229. Neural Force Field: Learning Generalized Physical Representation from a Few Examples - Score: 15 (R=8, N=7) - Date: 2025-02-14 - Comment: This paper introduces Neural Force Field (NFF), a physics-inspired representation learning framework using Neural ODEs. It aligns with the representation learning criterion by focusing on interpretable and generalizable representations of physical dynamics, which is foundational.

  230. Self-Consistency of the Internal Reward Models Improves Self-Rewarding Language Models - Score: 15 (R=8, N=7) - Date: 2025-02-14 - Comment: The paper introduces a framework to improve self-rewarding LLMs by enhancing consistency among internal reward models. It aligns with foundational research on LLM behavior and interpretability.

  231. Automated Consistency Analysis of LLMs - Score: 15 (R=8, N=7) - Date: 2025-02-13 - Comment: The paper introduces a framework for consistency analysis of LLMs, which aligns with theoretical insights into LLM behavior and interpretability.

  232. Dataset Ownership Verification in Contrastive Pre-trained Models - Score: 15 (R=8, N=7) - Date: 2025-02-12 - Comment: The paper proposes a dataset ownership verification method for contrastive pre-trained models, which aligns with representation learning and provides novel insights into embedding space relationships.

  233. Variational Learning Induces Adaptive Label Smoothing - Score: 15 (R=8, N=7) - Date: 2025-02-12 - Comment: The paper connects variational learning to adaptive label smoothing, providing insights into handling overconfident predictions. This aligns with representation learning and training dynamics in neural networks.

  234. Does Training on Synthetic Data Make Models Less Robust? - Score: 15 (R=8, N=7) - Date: 2025-02-12 - Comment: The paper investigates the robustness of LLMs trained on synthetic data, providing insights into LLM behavior and interpretability. This aligns with the interest in foundational research on LLMs.

  235. Related Knowledge Perturbation Matters: Rethinking Multiple Pieces of Knowledge Editing in Same-Subject - Score: 15 (R=8, N=7) - Date: 2025-02-12 - Comment: The paper addresses knowledge editing in LLMs and introduces a benchmark for Same-Subject Related Knowledge Editing, which aligns with foundational research in LLM behavior and interpretability.

  236. iLOCO: Distribution-Free Inference for Feature Interactions - Score: 15 (R=8, N=7) - Date: 2025-02-11 - Comment: The paper introduces iLOCO, a model-agnostic metric for feature interactions, which is relevant to representation learning and interpretability.

  237. Propagation of Chaos for Mean-Field Langevin Dynamics and its Application to Model Ensemble - Score: 15 (R=8, N=7) - Date: 2025-02-11 - Comment: The paper provides theoretical insights into the optimization complexity of mean-field Langevin dynamics and introduces a model ensemble strategy with guarantees, which aligns with foundational research in representation learning and training dynamics.

  238. Diagonal Symmetrization of Neural Network Solvers for the Many-Electron Schrödinger Equation - Score: 15 (R=8, N=7) - Date: 2025-02-11 - Comment: This paper explores incorporating diagonal symmetries into neural networks for many-body quantum problems. It provides theoretical insights into symmetrization and its computational-statistical tradeoffs, which align with foundational research in representation learning and AI for science.

  239. Noise Sensitivity of Hierarchical Functions and Deep Learning Lower Bounds in General Product Measures - Score: 15 (R=8, N=7) - Date: 2025-02-10 - Comment: Explores noise sensitivity and hierarchical structure, with potential implications for deep learning theory, particularly for representation learning and the complexity of gradient descent.

  240. Preference Optimization via Contrastive Divergence: Your Reward Model is Secretly an NLL Estimator - Score: 15 (R=8, N=7) - Date: 2025-02-10 - Comment: The paper formulates preference optimization through contrastive divergence with theoretical and algorithmic contributions, overlapping with theoretical insights into training and representation learning.

  241. No Images, No Problem: Retaining Knowledge in Continual VQA with Questions-Only Memory - Score: 15 (R=8, N=7) - Date: 2025-02-10 - Comment: Novel approach to continual learning in VQA using attention distillation and question-only memory. Relevant to representation learning and efficient memory methods.

  242. PerPO: Perceptual Preference Optimization via Discriminative Rewarding - Score: 15 (R=8, N=7) - Date: 2025-02-10 - Comment: Presents a novel optimization method for aligning multimodal LLMs' perception, which could contribute to representation learning innovations.

  243. Multiple Invertible and Partial-Equivariant Function for Latent Vector Transformation to Enhance Disentanglement in VAEs - Score: 15 (R=8, N=7) - Date: 2025-02-07 - Comment: This paper introduces a novel method (Multiple Invertible and Partial-Equivariant Transformation) aimed at improving disentanglement in VAEs. Its focus aligns with foundational research in representation learning, specifically with insights into how deep networks encode information (criterion 1).

  244. Building Bridges between Regression, Clustering, and Classification - Score: 15 (R=8, N=7) - Date: 2025-02-06 - Comment: This paper introduces a new strategy for regression tasks by linking them with clustering and classification through a target encoder and prediction decoder, which could be relevant to representation learning.

  245. Avoiding spurious sharpness minimization broadens applicability of SAM - Score: 15 (R=8, N=7) - Date: 2025-02-05 - Comment: Proposes Functional-SAM, which refines sharpness minimization to improve applicability across NLP and LLM domains. This contributes to training dynamics and generalization in large models, aligning with insights into optimization techniques for foundational models.

  246. BARE: Combining Base and Instruction-Tuned Language Models for Better Synthetic Data Generation - Score: 15 (R=8, N=7) - Date: 2025-02-05 - Comment: Presents a method (BARE) to improve synthetic data generation by combining base and post-tuned models, potentially relevant to insights into foundation models and representation learning.

  247. Lifelong Sequential Knowledge Editing without Model Degradation - Score: 15 (R=8, N=7) - Date: 2025-02-04 - Comment: This paper introduces a method for long-term knowledge editing in large models, focusing on preventing model degradation and overfitting. It touches on an architecture-level improvement via norm-constrained methods, relevant to representation learning and model architecture.

  248. The dark deep side of DeepSeek: Fine-tuning attacks against the safety alignment of CoT-enabled models - Score: 15 (R=8, N=7) - Date: 2025-02-04 - Comment: Examines vulnerabilities in Chain of Thought reasoning for safety alignment of LLMs, theoretically relevant to interpretability and robustness.

  249. Can We Predict the Effect of Prompts? - Score: 15 (R=8, N=7) - Date: 2025-02-03 - Comment: The paper proposes predictive prompt analysis for LLMs via sparse autoencoders, which aligns with representation learning enhancements and interpretability-related advances.

  250. Student-t processes as infinite-width limits of posterior Bayesian neural networks - Score: 15 (R=7, N=8) - Date: 2025-02-07 - Comment: The paper introduces Student-t processes in Bayesian neural networks as a generalization of Gaussian processes, offering significant theoretical insights into uncertainty estimation. It aligns moderately with representation learning for theoretical analysis.

  251. Taking a Big Step: Large Learning Rates in Denoising Score Matching Prevent Memorization - Score: 15 (R=7, N=8) - Date: 2025-02-06 - Comment: Investigates implicit regularization in diffusion models via large learning rates, which is a relevant topic in representation learning as it addresses training dynamics, though slightly tangential due to its application to diffusion models.

  252. Poisson Hierarchical Indian Buffet Processes for Within and Across Group Sharing of Latent Features-With Indications for Microbiome Species Sampling Models - Score: 15 (R=7, N=8) - Date: 2025-02-05 - Comment: The paper discusses a model for latent feature sharing (relevant to representation learning) with an emphasis on sparse methods and training dynamics. However, its focus on applications like microbiome analysis slightly dilutes its relevance to foundational research.

  253. Enhance Learning Efficiency of Oblique Decision Tree via Feature Concatenation - Score: 15 (R=7, N=8) - Date: 2025-02-04 - Comment: Proposes advancements to Oblique Decision Trees (ODT), improving their efficiency and generalization. The focus on sparsity and representation alignment qualifies it for foundational relevance.

  254. What is causal about causal models and representations? - Score: 15 (R=7, N=8) - Date: 2025-02-04 - Comment: The paper rigorously investigates the conceptual foundations of causal models, connecting to causal representation learning and theory, but lacks direct application to neural architectures.

  255. Understanding Generalization in Physics Informed Models through Affine Variety Dimensions - Score: 15 (R=7, N=8) - Date: 2025-02-04 - Comment: By analyzing generalization in physics-informed machine learning models, this work advances theoretical understanding relevant to AI for science.

  256. Unraveling Zeroth-Order Optimization through the Lens of Low-Dimensional Structured Perturbations - Score: 15 (R=7, N=8) - Date: 2025-02-03 - Comment: This work provides a new theoretical framework for Zeroth-Order Optimization, including structured perturbations and connections to generalization, aligning with 'Representation Learning' and efficiency methods.

  257. Representation Engineering for Large-Language Models: Survey and Research Challenges - Score: 14 (R=8, N=6) - Date: 2025-02-26 - Comment: The paper surveys representation engineering for large language models, which aligns with foundational research in representation learning and interpretability.

  258. The impact of conformer quality on learned representations of molecular conformer ensembles - Score: 14 (R=8, N=6) - Date: 2025-02-20 - Comment: The paper investigates the impact of conformer quality on 3D representation learning models, which aligns with the 'Representation Learning' criterion. It provides insights into how input quality affects learned representations, making it relevant. However, the novelty is moderate as it primarily raises practical considerations rather than introducing groundbreaking methods.

  259. Hierarchical Sparse Bayesian Multitask Model with Scalable Inference for Microbiome Analysis - Score: 14 (R=8, N=6) - Date: 2025-02-05 - Comment: The paper focuses on a hierarchical Bayesian multitask learning model and sparsity, offering insights into representation learning, specifically shared sparsity structures across tasks.

  260. Walking the Web of Concept-Class Relationships in Incrementally Trained Interpretable Models - Score: 14 (R=7, N=7) - Date: 2025-02-28 - Comment: The paper proposes MuCIL for interpretable models in incremental learning, which aligns with representation learning and interpretability but is more niche.

  261. Generalized Exponentiated Gradient Algorithms Using the Euler Two-Parameter Logarithm - Score: 14 (R=7, N=7) - Date: 2025-02-26 - Comment: The paper introduces a new class of gradient algorithms using generalized entropies and deformed logarithms. It provides theoretical insights into optimization methods, which could have implications for representation learning.

  262. PLS-based approach for fair representation learning - Score: 14 (R=7, N=7) - Date: 2025-02-25 - Comment: The paper proposes a PLS-based approach for fair representation learning, introducing fairness constraints in dimensionality reduction. This aligns with the Representation Learning criterion, particularly in the context of fair feature learning.

  263. On Theoretical Limits of Learning with Label Differential Privacy - Score: 14 (R=7, N=7) - Date: 2025-02-21 - Comment: The paper explores theoretical limits of learning with label differential privacy, providing foundational insights into privacy-preserving learning.

  264. DivIL: Unveiling and Addressing Over-Invariance for Out-of-Distribution Generalization - Score: 14 (R=7, N=7) - Date: 2025-02-19 - Comment: The paper proposes a method to address over-invariance in invariant learning, which is relevant to representation learning and training dynamics.

  265. Revealing Bias Formation in Deep Neural Networks Through the Geometric Mechanisms of Human Visual Decoupling - Score: 14 (R=7, N=7) - Date: 2025-02-18 - Comment: Proposes geometric analysis for bias formation in DNNs, providing insights into representation learning influenced by visual decoupling mechanisms.

  266. Cognitive Neural Architecture Search Reveals Hierarchical Entailment - Score: 14 (R=7, N=7) - Date: 2025-02-18 - Comment: The paper explores a neural architecture search optimized for brain-alignment and analyses representational hierarchies, linking it to foundational research on model architecture and representation learning.

  267. Neuron Platonic Intrinsic Representation From Dynamics Using Contrastive Learning - Score: 14 (R=7, N=7) - Date: 2025-02-18 - Comment: This work applies contrastive learning to detect intrinsic neuron representations, aligning with 'Representation Learning' and exploring interpretability via neuronal dynamics. However, it is somewhat domain-specific.

  268. Enhancing Performance of Explainable AI Models with Constrained Concept Refinement - Score: 14 (R=7, N=7) - Date: 2025-02-11 - Comment: The paper introduces a framework for constrained concept refinement to improve explainable AI models, which is relevant to representation learning and interpretability.

  269. Right Time to Learn: Promoting Generalization via Bio-inspired Spacing Effect in Knowledge Distillation - Score: 14 (R=7, N=7) - Date: 2025-02-11 - Comment: The paper introduces a novel knowledge distillation strategy inspired by biological learning, which aligns with representation learning through its focus on training dynamics and generalization improvements.

  270. Learning low-dimensional representations of ensemble forecast fields using autoencoder-based methods - Score: 14 (R=7, N=7) - Date: 2025-02-10 - Comment: Proposes representation learning techniques for ensemble forecasts using autoencoders. Relevant to dimensionality reduction and representation learning.

  271. Finding Pegasus: Enhancing Unsupervised Anomaly Detection in High-Dimensional Data using a Manifold-Based Approach - Score: 14 (R=7, N=7) - Date: 2025-02-07 - Comment: Proposes combining anomaly detection algorithms to enhance recall by leveraging insights about high-dimensional manifolds, relevant to representation learning.

  272. T-SCEND: Test-time Scalable MCTS-enhanced Diffusion Model - Score: 14 (R=7, N=7) - Date: 2025-02-05 - Comment: The T-SCEND framework targets reasoning tasks using enhanced diffusion models with better energy-based training. It offers methodological innovations related to tuning and optimization processes for complex tasks.

  273. Learning Hyperparameters via a Data-Emphasized Variational Objective - Score: 14 (R=7, N=7) - Date: 2025-02-05 - Comment: The paper proposes learning hyperparameters via a variational objective, touching on theoretical insights in model training dynamics which aligns with representation learning.

  274. TReMu: Towards Neuro-Symbolic Temporal Reasoning for LLM-Agents with Memory in Multi-Session Dialogues - Score: 14 (R=7, N=7) - Date: 2025-02-04 - Comment: Proposes a neuro-symbolic approach to enhance temporal reasoning in LLM agents for multi-session dialogues, touching on foundational interpretability improvements.

  275. A Statistical Learning Perspective on Semi-dual Adversarial Neural Optimal Transport Solvers - Score: 14 (R=7, N=7) - Date: 2025-02-04 - Comment: The paper investigates neural OT solvers theoretically, potentially offering insights relevant to representation learning and optimization, though slightly peripheral to foundational model advances.

  276. Advanced Weakly-Supervised Formula Exploration for Neuro-Symbolic Mathematical Reasoning - Score: 14 (R=7, N=7) - Date: 2025-02-04 - Comment: Proposes a neuro-symbolic framework and addresses reasoning with weak supervision, moderately relevant to representation learning but more niche.

  277. Estimating LLM Uncertainty with Logits - Score: 14 (R=7, N=7) - Date: 2025-02-04 - Comment: Proposes a novel framework for estimating token-level uncertainty in LLMs using logits, addressing fundamental interpretability and reliability concerns, with some ties to representation learning.

  278. Fantastic Multi-Task Gradient Updates and How to Find Them In a Cone - Score: 14 (R=7, N=7) - Date: 2025-02-04 - Comment: Proposes a new method (ConicGrad) for resolving gradient conflicts in multi-task learning, touching on optimization dynamics but not directly on foundational advances in representation learning or architecture design.

  279. Learning Sheaf Laplacian Optimizing Restriction Maps - Score: 14 (R=7, N=7) - Date: 2025-02-04 - Comment: Presents a new method for inferring sheaf Laplacians with potential implications for representation learning but remains focused on mathematical frameworks.

  280. No Foundations without Foundations -- Why semi-mechanistic models are essential for regulatory biology - Score: 14 (R=7, N=7) - Date: 2025-02-04 - Comment: This paper provides a semi-mechanistic framework that involves variational autoencoders and structural causal models, which has ties to representation learning and causal abstraction, albeit primarily within the context of regulatory biology.

  281. Locality-aware Surrogates for Gradient-based Black-box Optimization - Score: 14 (R=7, N=7) - Date: 2025-02-04 - Comment: Proposes locality-aware surrogate models for gradient-based black-box optimization, touching on optimization methods but only peripherally on representation learning.

  282. Scalable Multi-phase Word Embedding Using Conjunctive Propositional Clauses - Score: 14 (R=7, N=7) - Date: 2025-02-04 - Comment: This paper focuses on a novel approach to constructing scalable and interpretable embeddings using Tsetlin Machines, which ties to representation learning. However, its application to sentiment analysis tilts it partially toward applied NLP.

  283. Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming - Score: 14 (R=7, N=7) - Date: 2025-02-04 - Comment: This paper develops constitutional classifiers for defending against jailbreaks using synthetic rule-based data, contributing insights into LLM reliability and interpretability.

  284. Contrast-Aware Calibration for Fine-Tuned CLIP: Leveraging Image-Text Alignment - Score: 14 (R=7, N=7) - Date: 2025-02-03 - Comment: Proposes a contrast-aware calibration method for vision-language models like CLIP, focusing on fine-tuning dynamics and addressing misalignment issues, which partially matches the representation learning criterion.

  285. BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning - Score: 14 (R=7, N=7) - Date: 2025-02-03 - Comment: Introduces a probabilistic framework and reinforcement learning-inspired approach to enhance reasoning in LLMs but primarily focuses on process improvements and does not delve into foundational insights about architectures or representation learning.

  286. Do Sparse Autoencoders Generalize? A Case Study of Answerability - Score: 13 (R=7, N=6) - Date: 2025-02-28 - Comment: The paper focuses on sparse autoencoders (SAEs) and their generalization properties, which aligns with the representation learning criterion. However, the focus on 'answerability' datasets makes it slightly application-driven.

  287. Revisiting Self-Consistency from Dynamic Distributional Alignment Perspective on Answer Aggregation - Score: 13 (R=7, N=6) - Date: 2025-02-28 - Comment: The paper reframes self-consistency in reasoning as a dynamic distributional alignment problem, which provides insights into LLM behavior but does not introduce foundational changes to LLMs.

  288. A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models - Score: 13 (R=7, N=6) - Date: 2025-02-26 - Comment: The survey focuses on interpretability for multimodal foundation models, which aligns with foundational research in understanding model behavior but lacks direct methodological contributions.

  289. Large Language Models and Mathematical Reasoning Failures - Score: 13 (R=7, N=6) - Date: 2025-02-18 - Comment: The paper analyzes reasoning failures in LLMs, which aligns with the criterion of theoretical insights into LLM behavior. However, it focuses on empirical evaluation rather than introducing new methods or theories.

  290. MuSC: Improving Complex Instruction Following with Multi-granularity Self-Contrastive Training - Score: 13 (R=7, N=6) - Date: 2025-02-18 - Comment: MuSC proposes a novel multi-granularity self-contrastive training regime relevant to LLM instruction alignment, though it leans more on practical enhancements than theoretical innovation.

  291. ADO: Automatic Data Optimization for Inputs in LLM Prompts - Score: 13 (R=7, N=6) - Date: 2025-02-18 - Comment: The paper explores input data optimization for LLM prompts, which aligns with 'Representation Learning' through its focus on improving input representations.

  292. An Empirical Analysis of Uncertainty in Large Language Model Evaluations - Score: 13 (R=7, N=6) - Date: 2025-02-18 - Comment: The paper examines uncertainty in LLM evaluations, which touches on interpretability and reliability of foundational models. It remains empirical without groundbreaking theoretical insights.

  293. Fast Proxies for LLM Robustness Evaluation - Score: 13 (R=7, N=6) - Date: 2025-02-18 - Comment: The paper is about fast proxy metrics for evaluating LLM robustness against adversarial attacks. While the topic of robustness is related to foundational work, this specific contribution seems more empirical and evaluation-focused without deep theoretical insights.

  294. Sign-Symmetry Learning Rules are Robust Fine-Tuners - Score: 13 (R=7, N=6) - Date: 2025-02-11 - Comment: The paper explores biologically inspired learning rules for fine-tuning neural networks, which aligns with 'Representation Learning' as it revisits alternative training mechanisms.

  295. Self-Regulation and Requesting Interventions - Score: 13 (R=7, N=6) - Date: 2025-02-10 - Comment: This paper proposes a self-regulation mechanism for LLMs, which aligns with research into model behavior and interpretability. However, the approach relies heavily on reinforcement learning and task-specific interventions, limiting its relevance to foundational LLM research.

  296. Analyzing Similarity Metrics for Data Selection for Language Model Pretraining - Score: 13 (R=7, N=6) - Date: 2025-02-05 - Comment: The paper analyzes similarity metrics for data selection in LLM pretraining, examining embedding models and their impact. While not proposing architectural changes, it adds theoretical insights into data curation for large-scale models, subtly aligning with foundational LLM research.

  297. Principal Components for Neural Network Initialization - Score: 13 (R=7, N=6) - Date: 2025-02-04 - Comment: Introduces PCA-based strategies for neural network initialization, which aligns with representation-related insights but lacks a transformative innovation.

Other Foundational Research (31)

  1. Reversal Blessing: Thinking Backward May Outpace Thinking Forward in Multi-choice Questions - Score: 20.0 (R=0, N=0) - Date: 2025-02-26 - Comment: Author match

  2. Algebraic Machine Learning: Learning as computing an algebraic decomposition of a task - Score: 18 (R=9, N=9) - Date: 2025-02-28 - Comment: The paper proposes a novel algebraic foundation for machine learning, which is a cutting-edge theoretical contribution and aligns with emerging trends in foundational research.

  3. Towards Physics-Guided Foundation Models - Score: 18 (R=9, N=9) - Date: 2025-02-24 - Comment: The paper introduces the concept of physics-guided foundation models, which aligns with the 'AI for Science' criterion by proposing a new paradigm integrating physical knowledge into foundation models.

  4. Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving - Score: 18 (R=9, N=9) - Date: 2025-02-12 - Comment: The paper introduces Goedel-Prover, a state-of-the-art LLM for automated theorem proving. It aligns with foundational research in LLMs, particularly in advancing their capabilities and training methodologies.

  5. ECM: A Unified Electronic Circuit Model for Explaining the Emergence of In-Context Learning and Chain-of-Thought in Large Language Model - Score: 18 (R=9, N=9) - Date: 2025-02-06 - Comment: Proposes a novel explanatory framework for LLM dynamics (ICL and CoT) and models them analogously to electronic circuits. This aligns closely with theoretical studies on LLMs and is quite innovative in its formulation.

  6. Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond - Score: 17 (R=9, N=8) - Date: 2025-02-27 - Comment: The paper introduces a gradient-based framework for LLM unlearning, which provides foundational insights into model behavior and optimization.

  7. Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs - Score: 17 (R=9, N=8) - Date: 2025-02-25 - Comment: The paper demonstrates that a linear decay-to-zero learning rate schedule outperforms other schedules for LLM training, aligning with 'Large Language Models' and 'Training Dynamics' criteria.

  8. Machine-generated text detection prevents language model collapse - Score: 17 (R=9, N=8) - Date: 2025-02-24 - Comment: The paper discusses the issue of model collapse in LLMs and proposes a novel methodology to prevent it using machine-generated text detection. This aligns with the 'Large Language Models' criterion, focusing on foundational insights into training dynamics and behavior.

  9. Atom of Thoughts for Markov LLM Test-Time Scaling - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper introduces Atom of Thoughts (AoT) for test-time scaling in LLMs, which aligns with theoretical insights into LLM behavior. The Markovian reasoning framework adds methodological novelty.

  10. Exact Upper and Lower Bounds for the Output Distribution of Neural Networks with Random Inputs - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper derives exact bounds for the output distribution of neural networks with stochastic inputs, which is foundational in terms of theoretical contributions to neural network behavior.

  11. Statistical Query Hardness of Multiclass Linear Classification with Random Classification Noise - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper provides theoretical insights into the complexity of multiclass linear classification with random noise, which aligns with 'Emerging Trends' through its foundational focus.

  12. One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs - Score: 17 (R=9, N=8) - Date: 2025-02-18 - Comment: The paper explores foundational aspects of LLMs by introducing a novel benchmark (CounterMATH) and focusing on counterexample-driven reasoning, which aligns with the 'Large Language Models' criterion for theoretical insights into LLM behavior.

  13. Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning - Score: 17 (R=9, N=8) - Date: 2025-02-13 - Comment: The paper addresses training misalignment in LLMs for mathematical reasoning, proposing a novel loss function to improve test-time performance. This aligns with the criterion of theoretical insights into LLM behavior.

  14. Revisiting Non-Acyclic GFlowNets in Discrete Environments - Score: 17 (R=9, N=8) - Date: 2025-02-12 - Comment: The paper revisits non-acyclic GFlowNets and provides theoretical insights, which align with emerging trends and foundational research in generative models.

  15. Scaling Laws for Differentially Private Language Models - Score: 17 (R=9, N=8) - Date: 2025-02-04 - Comment: Examines scaling laws under differential privacy for LLMs, providing foundational insights into compute-privacy-utility tradeoffs, aligning well with the criterion for LLM theoretical contributions.

  16. Are all models wrong? Fundamental limits in distribution-free empirical model falsification - Score: 17 (R=8, N=9) - Date: 2025-02-11 - Comment: The paper explores fundamental limits in model class risk and empirical model falsification, which aligns with emerging trends in theoretical machine learning and foundational research.

  17. A Theory for Conditional Generative Modeling on Multiple Data Sources - Score: 16 (R=8, N=8) - Date: 2025-02-21 - Comment: The theoretical analysis of multi-source training in conditional generative modeling provides foundational insights into generative model training dynamics.

  18. Continuous Diffusion Model for Language Modeling - Score: 16 (R=8, N=8) - Date: 2025-02-18 - Comment: This paper introduces a continuous diffusion model for language modeling with connections to statistical manifolds, providing theoretical innovations in generative modeling for discrete data.

  19. ContinuouSP: Generative Model for Crystal Structure Prediction with Invariance and Continuity - Score: 16 (R=8, N=8) - Date: 2025-02-05 - Comment: Proposes a novel generative model for crystal structure prediction with invariance and continuity, potentially relevant under AI for Science with foundational elements.

  20. Local minima of the empirical risk in high dimension: General theorems and convex examples - Score: 16 (R=8, N=8) - Date: 2025-02-05 - Comment: The paper provides insights into the geometry of empirical risk landscapes, particularly for two-layer neural networks. This aligns with foundational training dynamics and high-dimensional learning principles.

  21. Global Framework for Simultaneous Emulation Across the Nuclear Landscape - Score: 15 (R=8, N=7) - Date: 2025-02-28 - Comment: The paper introduces a hierarchical framework combining Bayesian neural networks with ab initio calculations for nuclear emulation, which aligns with 'AI for Science' for foundational research in generative paradigms.

  22. Capturing Nuanced Preferences: Preference-Aligned Distillation for Small Language Models - Score: 15 (R=8, N=7) - Date: 2025-02-21 - Comment: The paper introduces a nuanced distillation framework for aligning small language models with human preferences, which aligns with foundational improvements in LLM training.

  23. Multi-Faceted Studies on Data Poisoning can Advance LLM Development - Score: 15 (R=8, N=7) - Date: 2025-02-21 - Comment: The paper discusses data poisoning in the context of LLMs, offering insights into data-model interactions, which aligns with foundational research in LLM behavior.

  24. Why Safeguarded Ships Run Aground? Aligned Large Language Models' Safety Mechanisms Tend to Be Anchored in The Template Region - Score: 15 (R=8, N=7) - Date: 2025-02-20 - Comment: The paper investigates safety alignment vulnerabilities in LLMs, providing theoretical insights into their behavior under adversarial conditions.

  25. Navigating the Helpfulness-Truthfulness Trade-Off with Uncertainty-Aware Instruction Fine-Tuning - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: The paper examines a novel uncertainty-aware instruction fine-tuning paradigm for LLMs, focusing on balancing helpfulness and truthfulness, providing theoretical insights relevant to LLM behavior.

  26. K-Edit: Language Model Editing with Contextual Knowledge Awareness - Score: 15 (R=8, N=7) - Date: 2025-02-18 - Comment: The paper discusses knowledge-based model editing for LLMs, which aligns with the criterion of theoretical insights into LLM behavior. The use of knowledge graphs for contextual consistency adds methodological novelty.

  27. Iterative Deepening Sampling for Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-02-11 - Comment: The paper proposes an iterative deepening sampling algorithm to enhance self-correction in LLMs, which contributes to foundational insights into training dynamics and reasoning capabilities.

  28. First-ish Order Methods: Hessian-aware Scalings of Gradient Descent - Score: 15 (R=8, N=7) - Date: 2025-02-07 - Comment: Proposes Hessian-aware scaling to improve gradient descent. Focuses on optimization dynamics, a topic of interest in foundational research on training dynamics for neural networks.

  29. LIBRA: Measuring Bias of Large Language Model from a Local Context - Score: 15 (R=8, N=7) - Date: 2025-02-05 - Comment: The study introduces a framework for measuring biases in LLMs, focusing on local context, revealing new insights into LLM behavior beyond application.

  30. RIGNO: A Graph-based framework for robust and accurate operator learning for PDEs on arbitrary domains - Score: 15 (R=8, N=7) - Date: 2025-02-04 - Comment: RIGNO utilizes GNNs for operator learning in PDEs, a novel framework that aligns with emerging trends in neural operators for scientific modeling, suggesting relevance to AI for science.

  31. Nonasymptotic CLT and Error Bounds for Two-Time-Scale Stochastic Approximation - Score: 15 (R=7, N=8) - Date: 2025-02-17 - Comment: The paper provides theoretical insights into two-time-scale stochastic approximation with non-asymptotic CLT and error bounds, which could be relevant to foundational research in optimization and training dynamics.