
Personalized Monthly Topic Summary 2025/07

Metric	Value
Total Papers	486
Model Architecture	120
Model Compression and Efficiency	119
High Performance Computing	32
Representation Learning	188
Other Foundational Research	27

Model Architecture (120)

Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization - Score: 20.0 (R=0, N=0) - Date: 2025-07-23 - Comment: Author match
AlphaGo Moment for Model Architecture Discovery - Score: 19 (R=10, N=9) - Date: 2025-07-25 - Comment: The paper introduces ASI-Arch, an autonomous system for neural architecture discovery, which is a significant innovation in model architecture.
Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning - Score: 19 (R=10, N=9) - Date: 2025-07-16 - Comment: The paper provides a structural diagnosis of LLMs' limitations in symbolic reasoning, offering theoretical insights into their architectural limits. This aligns with the criteria for foundational research in LLMs.
A unified framework on the universal approximation of transformer-type architectures - Score: 19 (R=10, N=9) - Date: 2025-07-01 - Comment: The paper provides a unified theoretical framework for the universal approximation property of transformer-type architectures, which is a significant contribution to model architecture analysis.
Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging - Score: 19 (R=10, N=9) - Date: 2025-07-01 - Comment: The paper presents a novel MoE compression framework, which is highly relevant to model compression and MoE.
Apple Intelligence Foundation Language Models: Tech Report 2025 - Score: 18 (R=10, N=8) - Date: 2025-07-21 - Comment: The paper discusses architectural innovations in foundation models, including a novel Parallel-Track Mixture-of-Experts transformer, which is relevant to model architecture and LLMs.
The Recursive Coherence Principle: A Formal Constraint on Scalable Intelligence, Alignment, and Reasoning Architecture - Score: 18 (R=9, N=9) - Date: 2025-07-23 - Comment: The Recursive Coherence Principle introduces a new theoretical framework for scalable intelligence and reasoning architecture, aligning with emerging trends in foundational AI research.
Estimating Interventional Distributions with Uncertain Causal Graphs through Meta-Learning - Score: 18 (R=9, N=9) - Date: 2025-07-09 - Comment: The paper introduces a meta-learning approach for Bayesian causal inference, a cutting-edge theoretical contribution aligned with emerging trends.
Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence - Score: 18 (R=9, N=9) - Date: 2025-07-01 - Comment: The paper argues for a shift from statistical learning to exact learning for general intelligence, which is relevant to emerging trends.
AI's Euclid's Elements Moment: From Language Models to Computable Thought - Score: 18 (R=9, N=9) - Date: 2025-07-01 - Comment: The paper presents a theoretical framework for understanding AI development, which aligns with emerging trends and foundational research.
MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention - Score: 17 (R=9, N=8) - Date: 2025-07-31 - Comment: The paper presents MoCHA, which uses a sparse Mixture of Experts Connectors module, aligning with the model architecture criterion.
Transformers as Unrolled Inference in Probabilistic Laplacian Eigenmaps: An Interpretation and Potential Improvements - Score: 17 (R=9, N=8) - Date: 2025-07-29 - Comment: This paper provides a probabilistic interpretation of transformers, offering insights into their structure and potential improvements, which aligns with model architecture analysis.
MeLA: A Metacognitive LLM-Driven Architecture for Automatic Heuristic Design - Score: 17 (R=9, N=8) - Date: 2025-07-29 - Comment: The paper introduces a novel architecture, MeLA, which uses a metacognitive framework to evolve prompts for LLMs, aligning with the Large Language Models criterion.
EcoTransformer: Attention without Multiplication - Score: 17 (R=9, N=8) - Date: 2025-07-29 - Comment: EcoTransformer proposes a new Transformer architecture that reduces computational costs, aligning with model architecture innovations.
Enhancing Materials Discovery with Valence Constrained Design in Generative Modeling - Score: 17 (R=9, N=8) - Date: 2025-07-29 - Comment: CrysVCD integrates chemical rules into generative modeling for materials discovery, aligning with AI for Science through foundational research in molecular modeling.
Innovator: Scientific Continued Pretraining with Fine-grained MoE Upcycling - Score: 17 (R=9, N=8) - Date: 2025-07-28 - Comment: The paper introduces a novel approach to convert a dense LLM into a Mixture-of-Experts model, which is relevant to model architecture and large language models.
The New LLM Bottleneck: A Systems Perspective on Latent Attention and Mixture-of-Experts - Score: 17 (R=9, N=8) - Date: 2025-07-22 - Comment: The paper discusses the implications of Mixture-of-Experts and Latent Attention on system design for LLMs, aligning with the model architecture and LLM criteria.
Change of Thought: Adaptive Test-Time Computation - Score: 17 (R=9, N=8) - Date: 2025-07-21 - Comment: The paper presents a novel SELF-Transformer architecture that iteratively refines attention weights, contributing to model architecture innovation.
Training Transformers with Enforced Lipschitz Constants - Score: 17 (R=9, N=8) - Date: 2025-07-18 - Comment: The paper explores training Transformers with enforced Lipschitz constants, providing insights into model architecture and training dynamics.
DASViT: Differentiable Architecture Search for Vision Transformer - Score: 17 (R=9, N=8) - Date: 2025-07-18 - Comment: The paper introduces DASViT, a differentiable architecture search for Vision Transformers, which aligns with model architecture innovations by exploring novel designs for ViTs.
Compact Vision Transformer by Reduction of Kernel Complexity - Score: 17 (R=9, N=8) - Date: 2025-07-18 - Comment: The paper introduces a compact vision transformer with kernel complexity reduction, which is relevant to model architecture and compression.
Mixture of Raytraced Experts - Score: 17 (R=9, N=8) - Date: 2025-07-17 - Comment: The paper introduces a Mixture of Experts architecture with dynamic expert selection, directly relevant to model architecture innovations.
SToFM: a Multi-scale Foundation Model for Spatial Transcriptomics - Score: 17 (R=9, N=8) - Date: 2025-07-17 - Comment: The paper introduces a multi-scale foundation model for spatial transcriptomics, which is relevant to AI for science and foundational model research.
SystolicAttention: Fusing FlashAttention within a Single Systolic Array - Score: 17 (R=9, N=8) - Date: 2025-07-16 - Comment: The paper proposes a novel systolic array architecture for executing FlashAttention, focusing on architectural innovations for efficiency. This aligns with the model architecture and compression criteria.
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation - Score: 17 (R=9, N=8) - Date: 2025-07-15 - Comment: The paper introduces Mixture-of-Recursions (MoR), a novel framework combining parameter sharing and adaptive computation, which is highly relevant to model architecture innovations.
From Sequence to Structure: Uncovering Substructure Reasoning in Transformers - Score: 17 (R=9, N=8) - Date: 2025-07-15 - Comment: The paper explores how transformers can perform substructure extraction from graph data, which aligns with the model architecture criterion.
A Random Matrix Theory Perspective on the Learning Dynamics of Multi-head Latent Attention - Score: 17 (R=9, N=8) - Date: 2025-07-15 - Comment: The paper provides a theoretical analysis of multi-head latent attention in transformers using random matrix theory, which is relevant to model architecture and representation learning.
On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment - Score: 17 (R=9, N=8) - Date: 2025-07-11 - Comment: The paper discusses the computational challenges in filtering for AI alignment, providing theoretical insights into LLM behavior.
Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation - Score: 17 (R=9, N=8) - Date: 2025-07-10 - Comment: The paper introduces a decoder-hybrid-decoder architecture with efficient memory sharing, contributing to model architecture innovations.
Differential Mamba - Score: 17 (R=9, N=8) - Date: 2025-07-09 - Comment: The paper introduces a novel differential mechanism for the Mamba architecture, which aligns with foundational research in model architecture.
Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition - Score: 17 (R=9, N=8) - Date: 2025-07-09 - Comment: The paper discusses a novel MoE architecture for ASR, focusing on shared routing decisions, which provides insights into MoE architectures.
RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs - Score: 17 (R=9, N=8) - Date: 2025-07-08 - Comment: The paper introduces RefineX, a novel framework for refining pre-training data in LLMs, which aligns with foundational research in LLM pretraining and data quality improvement.
Knowledge Protocol Engineering: A New Paradigm for AI in Domain-Specific Knowledge Work - Score: 17 (R=9, N=8) - Date: 2025-07-04 - Comment: The paper introduces Knowledge Protocol Engineering, a new paradigm for AI in domain-specific knowledge work, which could be considered an emerging trend with potential foundational impact.
Fast and Simplex: 2-Simplicial Attention in Triton - Score: 17 (R=9, N=8) - Date: 2025-07-04 - Comment: The paper investigates the 2-simplicial Transformer, aligning with the model architecture criterion by exploring a novel attention mechanism.
Toward a Robust and Generalizable Metamaterial Foundation Model - Score: 17 (R=9, N=8) - Date: 2025-07-04 - Comment: The paper introduces a Bayesian transformer-based foundation model for metamaterials, aligning with the AI for Science criterion by offering a new generative paradigm for material design.
Energy-Based Transformers are Scalable Learners and Thinkers - Score: 17 (R=9, N=8) - Date: 2025-07-04 - Comment: The paper introduces Energy-Based Transformers, a new class of models that improve learning and inference, relevant to model architecture innovations.
A Scalable and Quantum-Accurate Foundation Model for Biomolecular Force Field via Linearly Tensorized Quadrangle Attention - Score: 17 (R=9, N=8) - Date: 2025-07-04 - Comment: The paper introduces a novel neural network for biomolecular simulations, aligning with the AI for Science criterion by providing a new architecture-level innovation for biomolecular modeling.
GradMetaNet: An Equivariant Architecture for Learning on Gradients - Score: 17 (R=9, N=8) - Date: 2025-07-03 - Comment: The paper introduces GradMetaNet, a novel architecture for learning on gradients, focusing on equivariant design and efficient gradient representation, which aligns with representation learning and model architecture criteria.
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact - Score: 17 (R=9, N=8) - Date: 2025-07-02 - Comment: The paper discusses foundational aspects of AGI, focusing on cognitive and architectural foundations, which aligns with emerging trends and theoretical insights into AI.
Testing the spin-bath view of self-attention: A Hamiltonian analysis of GPT-2 Transformer - Score: 17 (R=9, N=8) - Date: 2025-07-02 - Comment: The paper provides a theoretical analysis of the attention mechanism in GPT-2 using a physics-based framework, offering insights into LLM behavior and interpretability.
Towards Building Private LLMs: Exploring Multi-Node Expert Parallelism on Apple Silicon for Mixture-of-Experts Large Language Model - Score: 17 (R=9, N=8) - Date: 2025-07-01 - Comment: The paper explores multi-node expert parallelism for Mixture-of-Experts LLMs, which is relevant to model architecture and efficiency.
Transition Matching: Scalable and Flexible Generative Modeling - Score: 17 (R=9, N=8) - Date: 2025-07-01 - Comment: The paper introduces Transition Matching, a novel generative paradigm that unifies diffusion/flow models and continuous AR generation, which aligns with emerging trends in foundational research.
Masked Gated Linear Unit - Score: 17 (R=9, N=8) - Date: 2025-07-01 - Comment: The paper introduces Masked Gated Linear Units, a new variant of the gated linear unit, aligning with the model architecture criterion.
Generalized Linear Mode Connectivity for Transformers - Score: 17 (R=9, N=8) - Date: 2025-07-01 - Comment: The paper presents a generalized framework for understanding linear mode connectivity in Transformers, addressing symmetries in parameter space. This aligns with the core topic of model architecture, providing theoretical insights into the geometry of neural network loss landscapes.
Residual Matrix Transformers: Scaling the Size of the Residual Stream - Score: 17 (R=9, N=8) - Date: 2025-07-01 - Comment: The introduction of Residual Matrix Transformers (RMT) offers a novel architectural modification to transformers, focusing on efficiency and scaling, which is relevant to model architecture and compression.
Unified Multimodal Understanding via Byte-Pair Visual Encoding - Score: 17 (R=8, N=9) - Date: 2025-07-01 - Comment: The paper applies byte-pair encoding to visual tokens to improve unified multimodal understanding, which is relevant to model architecture innovations.
Mixture of Length and Pruning Experts for Knowledge Graphs Reasoning - Score: 16 (R=9, N=7) - Date: 2025-07-29 - Comment: The paper introduces a mixture-of-experts framework for knowledge graph reasoning, relevant to model architecture with a focus on MoE.
DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models - Score: 16 (R=9, N=7) - Date: 2025-07-15 - Comment: The paper reviews the evolution of large AI models and describes DeepSeek's algorithmic contributions, such as its Mixture-of-Experts design, which aligns with the model architecture criterion.
Transformers Don't In-Context Learn Least Squares Regression - Score: 16 (R=9, N=7) - Date: 2025-07-15 - Comment: The paper provides theoretical insights into the behavior of transformers in in-context learning, which aligns with the interest in understanding LLM behavior and interpretability.
Leveraging Manifold Embeddings for Enhanced Graph Transformer Representations and Learning - Score: 16 (R=9, N=7) - Date: 2025-07-11 - Comment: The paper introduces a Riemannian mixture-of-experts layer in graph transformers, which is relevant to model architecture innovations.
FlexOlmo: Open Language Models for Flexible Data Use - Score: 16 (R=9, N=7) - Date: 2025-07-10 - Comment: FlexOlmo employs a mixture-of-experts (MoE) architecture, which is relevant to model architecture innovations.
MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models - Score: 16 (R=9, N=7) - Date: 2025-07-10 - Comment: The paper introduces a Mixture of Experts (MoE) model for time-series forecasting, focusing on integrating time and frequency domain features. This aligns with the core topic of model architecture, specifically MoE.
Efficient Training of Large-Scale AI Models Through Federated Mixture-of-Experts: A System-Level Approach - Score: 16 (R=9, N=7) - Date: 2025-07-09 - Comment: The paper discusses a system-level approach to federated training of Mixture-of-Experts (MoE) models, which aligns with the core topic of model architecture, specifically MoE.
Neural Inhibition Improves Dynamic Routing and Mixture of Experts - Score: 16 (R=9, N=7) - Date: 2025-07-08 - Comment: The paper explores neural inhibition in dynamic routing and Mixture of Experts, contributing to model architecture innovations.
LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing - Score: 16 (R=9, N=7) - Date: 2025-07-02 - Comment: The paper presents a modular MoE framework integrating LoRA experts, which is relevant to model architecture and compression.
GASPnet: Global Agreement to Synchronize Phases - Score: 16 (R=8, N=8) - Date: 2025-07-23 - Comment: The paper proposes a novel mechanism inspired by neuroscience for visual binding in neural networks, which is relevant to model architecture innovations.
Flow Equivariant Recurrent Neural Networks - Score: 16 (R=8, N=8) - Date: 2025-07-22 - Comment: The paper extends equivariant network theory to recurrent neural networks, which is a novel approach in model architecture.
Kodezi Chronos: A Debugging-First Language Model for Repository-Scale, Memory-Driven Code Understanding - Score: 16 (R=8, N=8) - Date: 2025-07-18 - Comment: The paper introduces a new architecture for code understanding and debugging, which is relevant to model architecture innovations.
Optimizers Qualitatively Alter Solutions And We Should Leverage This - Score: 16 (R=8, N=8) - Date: 2025-07-17 - Comment: The paper discusses the role of optimizers in influencing the qualitative properties of learned solutions, which is relevant to understanding training dynamics in neural networks.
(Almost) Free Modality Stitching of Foundation Models - Score: 16 (R=8, N=8) - Date: 2025-07-15 - Comment: The paper proposes a novel method for aligning multi-modal models using hypernetworks, which is relevant to model architecture innovations.
Meta-autoencoders: An approach to discovery and representation of relationships between dynamically evolving classes - Score: 16 (R=8, N=8) - Date: 2025-07-15 - Comment: The paper introduces the concept of meta-autoencoders, which is relevant to model architecture and representation learning.
A document is worth a structured record: Principled inductive bias design for document recognition - Score: 16 (R=8, N=8) - Date: 2025-07-14 - Comment: The paper proposes a novel perspective on document recognition with a base transformer architecture, aligning with model architecture innovations.
Solving the Hubbard model with Neural Quantum States - Score: 16 (R=8, N=8) - Date: 2025-07-04 - Comment: The paper uses transformer-based architectures for neural quantum states, providing insights into representation learning and model architecture in quantum systems.
Autoregressive Image Generation with Linear Complexity: A Spatial-Aware Decay Perspective - Score: 16 (R=8, N=8) - Date: 2025-07-03 - Comment: The paper introduces LASAD, a novel attention mechanism for autoregressive image generation, focusing on spatial-aware decay, which is relevant to model architecture and efficiency.
Synchronization of mean-field models on the circle - Score: 15 (R=8, N=7) - Date: 2025-07-31 - Comment: The paper provides theoretical insights into synchronization in mean-field models, with an application to self-attention dynamics in transformers, which aligns with the interest in model architecture analysis.
Detection Transformers Under the Knife: A Neuroscience-Inspired Approach to Ablations - Score: 15 (R=8, N=7) - Date: 2025-07-30 - Comment: The paper provides insights into the internal components of detection transformers, contributing to model architecture analysis.
Reservoir Computation with Networks of Differentiating Neuron Ring Oscillators - Score: 15 (R=8, N=7) - Date: 2025-07-30 - Comment: The paper introduces a novel approach to reservoir computing using differentiating neurons, which is relevant to emerging trends in machine learning architectures.
Kolmogorov Arnold Network Autoencoder in Medicine - Score: 15 (R=8, N=7) - Date: 2025-07-29 - Comment: The paper discusses Kolmogorov-Arnold Network autoencoders, which place learnable activation functions on network edges, an architectural departure from standard MLPs.
Smooth Reading: Bridging the Gap of Recurrent LLM to Self-Attention LLM on Long-Context Tasks - Score: 15 (R=8, N=7) - Date: 2025-07-28 - Comment: The paper discusses a novel method for improving the performance of Recurrent LLMs on long-context tasks, which aligns with the interest in foundational research on LLM architectures.
Mammo-Mamba: A Hybrid State-Space and Transformer Architecture with Sequential Mixture of Experts for Multi-View Mammography - Score: 15 (R=8, N=7) - Date: 2025-07-24 - Comment: The paper introduces a novel architecture combining state-space models, transformers, and a sequential mixture of experts, which aligns with model architecture innovations.
DNT: a Deeply Normalized Transformer that can be trained by Momentum SGD - Score: 15 (R=8, N=7) - Date: 2025-07-24 - Comment: The paper introduces a new Transformer variant, DNT, which can be trained with momentum SGD, aligning with the model architecture criterion.
Pixel-Resolved Long-Context Learning for Turbulence at Exascale: Resolving Small-scale Eddies Toward the Viscous Limit - Score: 15 (R=8, N=7) - Date: 2025-07-23 - Comment: The paper presents a new architecture for turbulence modeling using transformers, which is relevant to model architecture innovations.
Scaling Linear Attention with Sparse State Expansion - Score: 15 (R=8, N=7) - Date: 2025-07-23 - Comment: The paper introduces Sparse State Expansion (SSE) for linear attention, which is relevant to model architecture innovations, particularly in improving efficiency and scalability of Transformers.
Differential Multimodal Transformers - Score: 15 (R=8, N=7) - Date: 2025-07-23 - Comment: The paper focuses on extending the Differential Attention mechanism to a multimodal transformer model, which aligns with the interest in model architecture innovations.
Probing Information Distribution in Transformer Architectures through Entropy Analysis - Score: 15 (R=8, N=7) - Date: 2025-07-22 - Comment: The paper uses entropy analysis to probe information distribution in Transformer architectures, aligning with the model architecture criterion.
Beyond Model Base Selection: Weaving Knowledge to Master Fine-grained Neural Network Design - Score: 15 (R=8, N=7) - Date: 2025-07-22 - Comment: The paper discusses a new approach to neural network design, focusing on model architecture and refinement, which is relevant to model architecture innovations.
Universal crystal material property prediction via multi-view geometric fusion in graph transformers - Score: 15 (R=8, N=7) - Date: 2025-07-22 - Comment: The paper introduces a novel multi-view graph transformer framework with a mixture of experts router for crystal material property prediction, aligning with the model architecture criterion.
DIVER-0 : A Fully Channel Equivariant EEG Foundation Model - Score: 15 (R=8, N=7) - Date: 2025-07-22 - Comment: The paper presents a novel EEG foundation model with architectural innovations like full spatio-temporal attention and new positional encoding methods, aligning with model architecture criteria.
Kolmogorov Arnold Networks (KANs) for Imbalanced Data -- An Empirical Perspective - Score: 15 (R=8, N=7) - Date: 2025-07-21 - Comment: The paper empirically evaluates Kolmogorov-Arnold Networks (KANs), a recent architectural alternative to MLPs, on imbalanced data, which is relevant to model architecture.
OmniVec2 -- A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning - Score: 15 (R=8, N=7) - Date: 2025-07-21 - Comment: The paper presents a novel transformer-based architecture for multimodal and multitask learning, which is relevant to model architecture innovations.
FormulaOne: Measuring the Depth of Algorithmic Reasoning Beyond Competitive Programming - Score: 15 (R=8, N=7) - Date: 2025-07-18 - Comment: The paper introduces a benchmark, FormulaOne, to test the depth of algorithmic reasoning in AI models, which aligns with emerging trends in AI research.
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy - Score: 15 (R=8, N=7) - Date: 2025-07-18 - Comment: The paper proposes an approximately orthogonal fine-tuning strategy for Vision Transformers, focusing on parameter-efficient fine-tuning and generalization, relevant to model architecture.
Making Language Model a Hierarchical Classifier and Generator - Score: 15 (R=8, N=7) - Date: 2025-07-18 - Comment: The paper proposes a hierarchical decoder architecture for language models, which aligns with the core topic of model architecture, specifically focusing on architectural innovations.
An Memory-Efficient Framework for Deformable Transformer with Neural Architecture Search - Score: 15 (R=8, N=7) - Date: 2025-07-17 - Comment: The paper focuses on optimizing Deformable Attention Transformers for hardware efficiency, which aligns with model compression and efficiency breakthroughs.
Langevin Flows for Modeling Neural Latent Dynamics - Score: 15 (R=8, N=7) - Date: 2025-07-16 - Comment: The paper introduces LangevinFlow, a sequential Variational Auto-Encoder with a novel approach to modeling neural latent dynamics, relevant to model architecture.
MMOne: Representing Multiple Modalities in One Scene - Score: 15 (R=8, N=7) - Date: 2025-07-16 - Comment: The paper proposes a framework for multimodal scene representation, focusing on architectural innovations to handle modality conflicts. This aligns with the model architecture criterion.
Graph World Model - Score: 15 (R=8, N=7) - Date: 2025-07-15 - Comment: The paper introduces a graph world model that supports multi-modal information, which aligns with model architecture innovations.
Cameras as Relative Positional Encoding - Score: 15 (R=8, N=7) - Date: 2025-07-15 - Comment: The paper discusses techniques for conditioning transformers on camera geometry, which is relevant to model architecture innovations.
Some Super-approximation Rates of ReLU Neural Networks for Korobov Functions - Score: 15 (R=8, N=7) - Date: 2025-07-15 - Comment: The paper examines approximation rates of ReLU neural networks, which is relevant to model architecture analysis through theoretical insights into neural network expressivity.
Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving - Score: 15 (R=8, N=7) - Date: 2025-07-15 - Comment: The paper discusses a novel architecture for LLM serving, focusing on post-transformer models and memory efficiency, which aligns with model architecture and compression topics.
MixLoRA-DSI: Dynamically Expandable Mixture-of-LoRA Experts for Rehearsal-Free Generative Retrieval over Dynamic Corpora - Score: 15 (R=8, N=7) - Date: 2025-07-15 - Comment: The paper presents MixLoRA-DSI, a framework for generative retrieval using a mixture of Low-Rank Adaptation experts, which is relevant to model architecture and efficiency.
Explainable AI in Genomics: Transcription Factor Binding Site Prediction with Mixture of Experts - Score: 15 (R=8, N=7) - Date: 2025-07-15 - Comment: The paper introduces a novel Mixture of Experts approach for TFBS prediction, which is relevant to model architecture innovations.
BioAnalyst: A Foundation Model for Biodiversity - Score: 15 (R=8, N=7) - Date: 2025-07-15 - Comment: The paper introduces BioAnalyst, a foundation model for biodiversity, which is relevant to AI for Science through foundational research in ecological modeling.
Zero-Shot Neural Architecture Search with Weighted Response Correlation - Score: 15 (R=8, N=7) - Date: 2025-07-15 - Comment: The paper introduces a novel zero-shot NAS method with a new proxy for estimating architecture performance, which is relevant to model architecture and efficiency.
AbbIE: Autoregressive Block-Based Iterative Encoder for Efficient Sequence Modeling - Score: 15 (R=8, N=7) - Date: 2025-07-14 - Comment: The paper introduces a novel recursive generalization of the Transformer architecture, relevant to model architecture innovations.
Space filling positionality and the Spiroformer - Score: 15 (R=8, N=7) - Date: 2025-07-14 - Comment: The paper proposes the Spiroformer, a transformer model for geometric domains, which is relevant to model architecture.
Admissibility of Stein Shrinkage for Batch Normalization in the Presence of Adversarial Attacks - Score: 15 (R=8, N=7) - Date: 2025-07-14 - Comment: The paper applies Stein shrinkage to batch normalization, which is a novel approach to improve training dynamics in neural networks.
Re-Bottleneck: Latent Re-Structuring for Neural Audio Autoencoders - Score: 15 (R=8, N=7) - Date: 2025-07-11 - Comment: The paper introduces a novel framework for modifying the bottleneck of a pre-trained autoencoder, which aligns with the topic of model architecture and representation learning.
Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning - Score: 15 (R=8, N=7) - Date: 2025-07-11 - Comment: The paper proposes a novel framework, DTME-MTL, for transformer-based multi-task learning, focusing on token space manipulation to resolve gradient conflicts. This relates to model architecture as it offers insights into transformer adaptability.
Modern Methods in Associative Memory - Score: 15 (R=8, N=7) - Date: 2025-07-09 - Comment: The paper discusses Associative Memories and their connection to modern AI architectures like Transformers, which aligns with foundational research in model architecture.
LoRA-Augmented Generation (LAG) for Knowledge-Intensive Language Tasks - Score: 15 (R=8, N=7) - Date: 2025-07-09 - Comment: The paper proposes LoRA-Augmented Generation for efficient selection and combination of task-specific adapters, which is relevant to model architecture innovations.
Critiques of World Models - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper critiques world models and proposes a new architecture for a general-purpose world model, which aligns with the interest in model architecture innovations.
Bridging KAN and MLP: MJKAN, a Hybrid Architecture with Both Efficiency and Expressiveness - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper proposes MJKAN, a hybrid architecture combining KANs and MLPs, which aligns with model architecture innovations.
Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper proposes a novel framework for stereo matching using Mixture-of-Experts, which is relevant to model architecture innovations.
Scaling Context Requires Rethinking Attention - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper introduces 'power attention', a novel architectural layer for sequence modeling, which aligns with the interest in architectural innovations.
Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper proposes a method for determining feature dimensions in linear attention, contributing to model architecture and efficiency.
Quantifying Cross-Attention Interaction in Transformers for Interpreting TCR-pMHC Binding - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper introduces a method for interpreting cross-attention mechanisms in transformers, which is relevant to model architecture and interpretability of transformers.
Linear Attention with Global Context: A Multipole Attention Mechanism for Vision and Physics - Score: 15 (R=8, N=7) - Date: 2025-07-04 - Comment: The paper introduces a new attention mechanism, Multipole Attention Neural Operator (MANO), which is relevant to model architecture innovations.
MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent - Score: 15 (R=8, N=7) - Date: 2025-07-04 - Comment: The paper introduces a novel agent workflow, MemAgent, for handling long-text tasks with linear complexity, which aligns with the interest in foundational research on LLMs and architecture-level innovations.
Towards Foundation Auto-Encoders for Time-Series Anomaly Detection - Score: 15 (R=8, N=7) - Date: 2025-07-03 - Comment: The paper proposes Foundation Auto-Encoders for time-series anomaly detection, which involves foundational research in autoencoders and generative models.
Decomposing Prediction Mechanisms for In-Context Recall - Score: 15 (R=8, N=7) - Date: 2025-07-03 - Comment: The paper explores prediction mechanisms in transformers, which aligns with the interest in understanding how deep networks encode information.
Long-Sequence Memory with Temporal Kernels and Dense Hopfield Functionals - Score: 15 (R=8, N=7) - Date: 2025-07-03 - Comment: The paper introduces a novel energy functional for long-sequence memory, which could have implications for transformer architectures and long-sequence modeling.
NN-Former: Rethinking Graph Structure in Neural Architecture Representation - Score: 15 (R=8, N=7) - Date: 2025-07-02 - Comment: The paper proposes a novel predictor combining GNNs and transformers for neural architecture representation, which aligns with the Model Architecture criterion.
SAFER: Probing Safety in Reward Models with Sparse Autoencoder - Score: 15 (R=8, N=7) - Date: 2025-07-02 - Comment: The paper introduces a novel framework using sparse autoencoders to interpret and improve reward models in LLMs, which aligns with representation learning and model architecture insights.
Mixture of Reasonings: Teach Large Language Models to Reason with Adaptive Strategies - Score: 15 (R=8, N=7) - Date: 2025-07-02 - Comment: The paper introduces a framework for embedding diverse reasoning strategies into LLMs, which is relevant to model architecture innovations.
Flexible Language Modeling in Continuous Space with Transformer-based Autoregressive Flows - Score: 15 (R=8, N=7) - Date: 2025-07-02 - Comment: The paper explores a novel framework for language modeling in continuous space using transformer-based autoregressive flows, aligning with the Model Architecture criterion.
Failure by Interference: Language Models Make Balanced Parentheses Errors When Faulty Mechanisms Overshadow Sound Ones - Score: 15 (R=8, N=7) - Date: 2025-07-02 - Comment: The paper investigates the mechanisms behind errors in language models and introduces a method to improve model performance, which aligns with the large language models criterion.
Model Fusion via Neuron Interpolation - Score: 15 (R=8, N=7) - Date: 2025-07-02 - Comment: The paper introduces a novel neuron-centric model fusion algorithm, which aligns with foundational research in model architecture.
BWLer: Barycentric Weight Layer Elucidates a Precision-Conditioning Tradeoff for PINNs - Score: 15 (R=8, N=7) - Date: 2025-07-01 - Comment: The paper introduces a novel layer for improving precision in physics-informed neural networks, which aligns with model architecture innovations.
A Systematic Study of Compositional Syntactic Transformer Language Models - Score: 15 (R=8, N=7) - Date: 2025-07-01 - Comment: The paper focuses on compositional syntactic transformer language models, which aligns with the interest in model architecture, particularly transformers. It provides a unified framework and empirical evaluation, contributing to the understanding of transformer-based models.

Model Compression and Efficiency (119)

FGFP: A Fractional Gaussian Filter and Pruning for Deep Neural Networks Compression - Score: 17 (R=9, N=8) - Date: 2025-07-31 - Comment: The paper introduces a novel framework for model compression using fractional Gaussian filters and pruning, which aligns with interests in model compression and efficiency.
MSQ: Memory-Efficient Bit Sparsification Quantization - Score: 17 (R=9, N=8) - Date: 2025-07-31 - Comment: The paper proposes a novel quantization method, MSQ, which addresses memory efficiency and training complexity, aligning with the model compression criterion.
MemShare: Memory Efficient Inference for Large Reasoning Models through KV Cache Reuse - Score: 17 (R=9, N=8) - Date: 2025-07-30 - Comment: The paper introduces MemShare, a novel KV cache management approach for memory efficiency in large reasoning models, which aligns with model compression through KV cache reuse.
TransPrune: Token Transition Pruning for Efficient Large Vision-Language Model - Score: 17 (R=9, N=8) - Date: 2025-07-29 - Comment: The paper introduces a novel token pruning method for large vision-language models, focusing on efficiency improvements through token transition analysis, which aligns with model compression and efficiency breakthroughs.
Enhancing Large Multimodal Models with Adaptive Sparsity and KV Cache Compression - Score: 17 (R=9, N=8) - Date: 2025-07-29 - Comment: The paper proposes an adaptive search algorithm for sparsity and KV cache compression in large multimodal models, relevant to model compression.
WEEP: A Differentiable Nonconvex Sparse Regularizer via Weakly-Convex Envelope - Score: 17 (R=9, N=8) - Date: 2025-07-29 - Comment: The paper introduces WEEP, a novel differentiable sparse regularizer, which aligns with the representation learning and model compression criteria by addressing sparsity and differentiability.
HCAttention: Extreme KV Cache Compression via Heterogeneous Attention Computing for LLMs - Score: 17 (R=9, N=8) - Date: 2025-07-29 - Comment: The paper presents a new method for KV cache compression in LLMs, which is relevant to model compression and efficiency.
Linearly Convergent Algorithms for Nonsmooth Problems with Unknown Smooth Pieces - Score: 17 (R=9, N=8) - Date: 2025-07-28 - Comment: The paper introduces a linearly convergent algorithm for optimizing piecewise smooth functions, which is a significant theoretical contribution in optimization.
Query Efficient Structured Matrix Learning - Score: 17 (R=9, N=8) - Date: 2025-07-28 - Comment: The paper discusses learning structured matrix approximations, which is relevant to model compression and efficiency.
Probably Approximately Correct Causal Discovery - Score: 17 (R=9, N=8) - Date: 2025-07-28 - Comment: The paper introduces the PACC Discovery framework, extending PAC learning principles to causal discovery, which is a novel theoretical contribution.
The Right to be Forgotten in Pruning: Unveil Machine Unlearning on Sparse Models - Score: 17 (R=9, N=8) - Date: 2025-07-28 - Comment: The paper introduces a novel concept of 'un-pruning' in sparse models, which is relevant to model compression and sparsity.
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm - Score: 17 (R=9, N=8) - Date: 2025-07-25 - Comment: The paper provides a theoretical foundation for GPTQ quantization by relating it to Babai's nearest plane algorithm, relevant to model compression.
Revisiting LLM Reasoning via Information Bottleneck - Score: 17 (R=9, N=8) - Date: 2025-07-25 - Comment: The paper presents a theoretical characterization of LLM reasoning using the information bottleneck principle, offering insights into LLM behavior.
Squeeze10-LLM: Squeezing LLMs' Weights by 10 Times via a Staged Mixed-Precision Quantization Method - Score: 17 (R=9, N=8) - Date: 2025-07-25 - Comment: The paper presents a novel quantization method for LLMs, focusing on model compression through a staged mixed-precision approach, which aligns with the model compression criteria.
Dataset Distillation as Data Compression: A Rate-Utility Perspective - Score: 17 (R=9, N=8) - Date: 2025-07-24 - Comment: The paper proposes a joint rate-utility optimization method for dataset distillation, relevant to model compression and efficiency.
SiLQ: Simple Large Language Model Quantization-Aware Training - Score: 17 (R=9, N=8) - Date: 2025-07-24 - Comment: The paper presents a quantization-aware training approach for large language models, which is relevant to model compression and efficiency.
LINR-PCGC: Lossless Implicit Neural Representations for Point Cloud Geometry Compression - Score: 17 (R=9, N=8) - Date: 2025-07-22 - Comment: The paper focuses on model compression through a novel lossless point cloud geometry compression method using implicit neural representations, which aligns with the model compression criterion.
LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-07-22 - Comment: The paper proposes a new KV cache optimization paradigm for LLMs, which is relevant to model compression and efficiency improvements.
IPPRO: Importance-based Pruning with PRojective Offset for Magnitude-indifferent Structural Pruning - Score: 17 (R=9, N=8) - Date: 2025-07-22 - Comment: The paper introduces a novel pruning strategy that challenges the traditional magnitude-based pruning, which aligns with the model compression criterion.
PoTPTQ: A Two-step Power-of-Two Post-training for LLMs - Score: 17 (R=9, N=8) - Date: 2025-07-17 - Comment: The paper proposes a novel quantization framework for LLMs, focusing on efficiency and compression, which aligns with model compression criteria.
Multi-Trigger Poisoning Amplifies Backdoor Vulnerabilities in LLMs - Score: 17 (R=9, N=8) - Date: 2025-07-16 - Comment: The paper explores vulnerabilities in LLMs through multi-trigger poisoning, which provides insights into LLM behavior and security, aligning with foundational research in LLMs.
First-Order Error Matters: Accurate Compensation for Quantized Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-07-16 - Comment: The paper introduces FOEM, a novel post-training quantization method that incorporates first-order gradient terms, which is relevant to model compression and efficiency breakthroughs.
Quantize-then-Rectify: Efficient VQ-VAE Training - Score: 17 (R=9, N=8) - Date: 2025-07-15 - Comment: The paper presents a novel framework for efficient VQ-VAE training, which is relevant to model compression through quantization and efficiency improvements.
Compression Method for Deep Diagonal State Space Model Based on $H^2$ Optimal Reduction - Score: 17 (R=9, N=8) - Date: 2025-07-15 - Comment: The paper proposes a novel compression method for deep diagonal state space models using $H^2$ optimal reduction, which aligns with the model compression criterion.
BitParticle: Partializing Sparse Dual-Factors to Build Quasi-Synchronizing MAC Arrays for Energy-efficient DNNs - Score: 17 (R=9, N=8) - Date: 2025-07-15 - Comment: The paper proposes a novel MAC unit design leveraging bit-level sparsity, which aligns with the model compression criterion.
On Information Geometry and Iterative Optimization in Model Compression: Operator Factorization - Score: 17 (R=9, N=8) - Date: 2025-07-15 - Comment: The paper explores information geometry for model compression, focusing on operator factorization and iterative optimization, which is relevant to model compression techniques.
Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-07-15 - Comment: The paper introduces a method for pruning neurons in LLMs to enhance generalization, which is relevant to model compression and LLM behavior.
Lizard: An Efficient Linearization Framework for Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-07-15 - Comment: The paper proposes a linearization framework for LLMs, addressing memory and computational bottlenecks, which is relevant to model compression and architecture.
Greedy Low-Rank Gradient Compression for Distributed Learning with Convergence Guarantees - Score: 17 (R=9, N=8) - Date: 2025-07-14 - Comment: The paper focuses on low-rank gradient compression, which is relevant to model compression and efficiency breakthroughs.
BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity - Score: 17 (R=9, N=8) - Date: 2025-07-14 - Comment: The paper presents a novel MoE architecture, BlockFFN, focusing on activation sparsity and efficient training, which is relevant to model architecture and compression.
Compress Any Segment Anything Model (SAM) - Score: 17 (R=9, N=8) - Date: 2025-07-14 - Comment: The paper proposes a novel compression algorithm for SAM models, which is relevant to model compression through a new hyper-compression method.
Low-rank Momentum Factorization for Memory Efficient Training - Score: 17 (R=9, N=8) - Date: 2025-07-14 - Comment: The paper proposes a low-rank momentum factorization method for memory-efficient training, relevant to model compression and efficiency.
UnIT: Scalable Unstructured Inference-Time Pruning for MAC-efficient Neural Inference on MCUs - Score: 17 (R=9, N=8) - Date: 2025-07-11 - Comment: The paper introduces UnIT, a method for unstructured inference-time pruning, which aligns with interests in model compression and efficiency.
CCQ: Convolutional Code for Extreme Low-bit Quantization in LLMs - Score: 17 (R=9, N=8) - Date: 2025-07-11 - Comment: The paper presents a novel quantization method for LLMs, focusing on extreme low-bit quantization, which aligns with the model compression criterion.
Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts - Score: 17 (R=9, N=8) - Date: 2025-07-11 - Comment: The paper explores sparse adapters for scalable merging of parameter-efficient experts, which is relevant to model compression and efficiency by studying sparse methods and modular architectures.
The Primacy of Magnitude in Low-Rank Adaptation - Score: 17 (R=9, N=8) - Date: 2025-07-10 - Comment: The paper discusses a magnitude-driven initialization scheme for low-rank adaptation, contributing to model compression and efficiency.
PERK: Long-Context Reasoning as Parameter-Efficient Test-Time Learning - Score: 17 (R=9, N=8) - Date: 2025-07-10 - Comment: The paper introduces PERK, a parameter-efficient method for long-context reasoning using low-rank adapters, which aligns with model compression and efficiency breakthroughs.
SingLoRA: Low Rank Adaptation Using a Single Matrix - Score: 17 (R=9, N=8) - Date: 2025-07-09 - Comment: The paper proposes SingLoRA, a novel approach to low-rank adaptation, relevant to model compression and efficiency.
Activation Steering for Chain-of-Thought Compression - Score: 17 (R=9, N=8) - Date: 2025-07-08 - Comment: The paper introduces Activation-Steered Compression (ASC), a novel inference-time technique for compressing chains of thought in LLMs by modifying hidden representations, aligning with model compression and efficiency breakthroughs.
SOSAE: Self-Organizing Sparse AutoEncoder - Score: 17 (R=9, N=8) - Date: 2025-07-08 - Comment: The paper introduces a Self-Organizing Sparse AutoEncoder, which is relevant to representation learning and model compression through sparsity.
any4: Learned 4-bit Numeric Representation for LLMs - Score: 17 (R=9, N=8) - Date: 2025-07-08 - Comment: The paper presents a learned 4-bit quantization method for LLMs, contributing to model compression and efficiency.
DOTResize: Reducing LLM Width via Discrete Optimal Transport-based Neuron Merging - Score: 17 (R=9, N=8) - Date: 2025-07-08 - Comment: DOTResize presents a novel approach to model compression using Discrete Optimal Transport, relevant to model compression and efficiency.
IMPACT: Importance-Aware Activation Space Reconstruction - Score: 17 (R=9, N=8) - Date: 2025-07-08 - Comment: The paper presents a novel approach to model compression by focusing on activation space reconstruction rather than weight reconstruction, which aligns with the model compression criterion. It introduces a new framework, IMPACT, that optimizes low-rank approximations to preserve accuracy, which is a significant theoretical contribution.
BLaST: High Performance Inference and Pretraining using BLock Sparse Transformers - Score: 17 (R=9, N=8) - Date: 2025-07-08 - Comment: The paper introduces BLaST, a method for sparsification in Transformers, which aligns with model compression and efficiency breakthroughs.
Position: A Theory of Deep Learning Must Include Compositional Sparsity - Score: 17 (R=9, N=8) - Date: 2025-07-04 - Comment: The paper discusses the role of compositional sparsity in deep learning, which aligns with the representation learning criterion, focusing on how deep networks encode information.
Sparse Gaussian Processes: Structured Approximations and Power-EP Revisited - Score: 17 (R=9, N=8) - Date: 2025-07-04 - Comment: The paper discusses structured approximations in sparse Gaussian processes, which is relevant to representation learning and model compression through sparsity and efficiency improvements.
Automatic Rank Determination for Low-Rank Adaptation via Submodular Function Maximization - Score: 17 (R=9, N=8) - Date: 2025-07-03 - Comment: The paper proposes a new method for rank determination in low-rank adaptation, which is relevant to model compression and efficiency.
Parsimonious Gaussian mixture models with piecewise-constant eigenvalue profiles - Score: 17 (R=9, N=8) - Date: 2025-07-03 - Comment: The paper introduces a new family of parsimonious Gaussian mixture models, which aligns with representation learning through its focus on model efficiency and low-rank approaches.
DBellQuant: Breaking the Bell with Double-Bell Transformation for LLMs Post Training Binarization - Score: 17 (R=9, N=8) - Date: 2025-07-03 - Comment: The paper presents a novel quantization framework for LLMs, which is relevant to model compression and efficiency.
A Theory of Inference Compute Scaling: Reasoning through Directed Stochastic Skill Search - Score: 17 (R=9, N=8) - Date: 2025-07-02 - Comment: The paper introduces a framework for inference compute scaling in LLMs, providing theoretical insights into LLM behavior, aligning with the Large Language Models criterion.
Towards Efficient and Accurate Spiking Neural Networks via Adaptive Bit Allocation - Score: 17 (R=9, N=8) - Date: 2025-07-01 - Comment: The paper presents an adaptive bit allocation strategy for spiking neural networks, which aligns with model compression and efficiency breakthroughs.
Spectra 1.1: Scaling Laws and Efficient Inference for Ternary Language Models - Score: 17 (R=9, N=8) - Date: 2025-07-01 - Comment: The paper introduces ternary language models and novel quantization methods, aligning with the model compression criterion.
Agent4S: The Transformation of Research Paradigms from the Perspective of Large Language Models - Score: 17 (R=8, N=9) - Date: 2025-07-01 - Comment: The paper proposes a new paradigm, Agent4S, for automating research workflows using LLMs, which is an emerging trend in AI for Science.
Reviving Your MNEME: Predicting The Side Effects of LLM Unlearning and Fine-Tuning via Sparse Model Diffing - Score: 16 (R=9, N=7) - Date: 2025-07-30 - Comment: The paper presents MNEME, a framework for detecting side effects in LLMs using sparse model diffing, contributing to theoretical insights into LLM behavior.
Are LLM Belief Updates Consistent with Bayes' Theorem? - Score: 16 (R=9, N=7) - Date: 2025-07-25 - Comment: The paper investigates LLM belief updates in relation to Bayes' theorem, providing theoretical insights into LLM behavior.
CompLeak: Deep Learning Model Compression Exacerbates Privacy Leakage - Score: 16 (R=9, N=7) - Date: 2025-07-24 - Comment: The paper examines how compression techniques such as pruning and quantization can exacerbate privacy leakage, which is relevant to model compression and efficiency breakthroughs.
DFQ-ViT: Data-Free Quantization for Vision Transformers without Fine-tuning - Score: 16 (R=9, N=7) - Date: 2025-07-22 - Comment: The paper focuses on data-free quantization for Vision Transformers, which aligns with the model compression criterion.
A Sparsity Predicting Approach for Large Language Models via Activation Pattern Clustering - Score: 16 (R=9, N=7) - Date: 2025-07-22 - Comment: The paper presents a clustering-based approach to predict activation sparsity in LLMs, which aligns with model compression and efficiency improvements.
IAM: Efficient Inference through Attention Mapping between Different-scale LLMs - Score: 16 (R=9, N=7) - Date: 2025-07-17 - Comment: The paper introduces the IAM framework for efficient inference in LLMs, which aligns with model compression by proposing a method to reduce resource consumption and improve efficiency.
QuarterMap: Efficient Post-Training Token Pruning for Visual State Space Models - Score: 16 (R=9, N=7) - Date: 2025-07-15 - Comment: The paper focuses on model compression through post-training token pruning, which aligns with the model compression criteria.
Compactor: Calibrated Query-Agnostic KV Cache Compression with Approximate Leverage Scores - Score: 16 (R=9, N=7) - Date: 2025-07-14 - Comment: The paper presents Compactor, a KV cache compression strategy, which is relevant to model compression, particularly in LLMs.
Krul: Efficient State Restoration for Multi-turn Conversations with Dynamic Cross-layer KV Sharing - Score: 16 (R=9, N=7) - Date: 2025-07-14 - Comment: The paper discusses efficient state restoration in LLMs using dynamic KV cache compression, which aligns with model compression and efficiency breakthroughs.
Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs - Score: 16 (R=9, N=7) - Date: 2025-07-11 - Comment: The paper proposes a training-free spatio-temporal token merging method for video LLMs, which is relevant to model compression and efficiency improvements.
Fishers for Free? Approximating the Fisher Information Matrix by Recycling the Squared Gradient Accumulator - Score: 16 (R=8, N=8) - Date: 2025-07-28 - Comment: The paper explores approximating the Fisher Information Matrix using existing gradient accumulators, which is relevant to model efficiency and compression.
Demonstration of Efficient Predictive Surrogates for Large-scale Quantum Processors - Score: 16 (R=8, N=8) - Date: 2025-07-24 - Comment: The paper introduces predictive surrogates for quantum processors, which is relevant to AI for Science with a focus on foundational research in quantum modeling.
ReDi: Rectified Discrete Flow - Score: 16 (R=8, N=8) - Date: 2025-07-23 - Comment: The paper introduces a novel method for discrete flow-based models, focusing on efficient data synthesis, which aligns with interests in model efficiency and generative paradigms.
Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs - Score: 16 (R=8, N=8) - Date: 2025-07-11 - Comment: The paper explores dynamic architecture adaptation for LLMs, which is relevant to model architecture innovations and efficiency improvements.
Mathematical artificial data for operator learning - Score: 16 (R=8, N=8) - Date: 2025-07-10 - Comment: The paper introduces the Mathematical Artificial Data (MAD) framework, a new paradigm for operator learning, which could be considered a novel approach in AI for Science.
Cascade: Token-Sharded Private LLM Inference - Score: 16 (R=8, N=8) - Date: 2025-07-08 - Comment: The paper proposes a new multi-party inference protocol for LLMs, focusing on privacy and efficiency, which is relevant to large language models and model compression.
DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer - Score: 16 (R=8, N=8) - Date: 2025-07-08 - Comment: The paper presents a novel masked autoregressive image generation framework with a deep compression hybrid tokenizer, which is relevant to model compression and efficiency.
Test-Time Scaling with Reflective Generative Model - Score: 16 (R=8, N=8) - Date: 2025-07-03 - Comment: The paper introduces a reflective generative model with a self-supervised process reward model, which is relevant to model architecture and efficiency improvements.
Tensor Decomposition Networks for Fast Machine Learning Interatomic Potential Computations - Score: 16 (R=8, N=8) - Date: 2025-07-03 - Comment: The paper introduces tensor decomposition networks for efficient computation in machine learning interatomic potentials, aligning with model compression and efficiency.
Tensor Train Quantum State Tomography using Compressed Sensing - Score: 16 (R=8, N=8) - Date: 2025-07-01 - Comment: The paper introduces a low-rank approach to quantum state tomography, which is relevant to model compression and efficiency.
Regularizing Subspace Redundancy of Low-Rank Adaptation - Score: 15 (R=8, N=7) - Date: 2025-07-29 - Comment: The paper proposes a method to regularize subspace redundancy in low-rank adaptation, which is relevant to model compression and efficiency.
Diagonally-Weighted Generalized Method of Moments Estimation for Gaussian Mixture Modeling - Score: 15 (R=8, N=7) - Date: 2025-07-29 - Comment: The paper proposes a diagonally-weighted generalized method of moments (GMM) estimator for Gaussian mixture modeling, which involves low-rank approaches and efficiency improvements, aligning with model compression.
Frequency-Aware Autoregressive Modeling for Efficient High-Resolution Image Synthesis - Score: 15 (R=8, N=7) - Date: 2025-07-29 - Comment: SparseVAR introduces a novel approach to reduce computational overhead in high-resolution image synthesis, aligning with model compression through sparsity.
Iterative Pretraining Framework for Interatomic Potentials - Score: 15 (R=8, N=7) - Date: 2025-07-29 - Comment: The paper proposes an iterative pretraining framework for interatomic potentials, relevant to AI for science with a focus on foundational research in molecular modeling.
Nonconvex Optimization Framework for Group-Sparse Feedback Linear-Quadratic Optimal Control. II: Non-Penalty Approach - Score: 15 (R=8, N=7) - Date: 2025-07-29 - Comment: The paper focuses on nonconvex optimization for group-sparse feedback, which is relevant to model compression through sparsity and optimization techniques.
CLoRA: Parameter-Efficient Continual Learning with Low-Rank Adaptation - Score: 15 (R=8, N=7) - Date: 2025-07-29 - Comment: The paper explores low-rank adaptation for continual learning, focusing on parameter efficiency, which aligns with model compression and efficiency breakthroughs.
Sparse-mode Dynamic Mode Decomposition for Disambiguating Local and Global Structures - Score: 15 (R=8, N=7) - Date: 2025-07-29 - Comment: The paper introduces sparse-mode dynamic mode decomposition, focusing on sparsity and feature extraction, which aligns with representation learning and model compression.
DeltaLLM: A Training-Free Framework Exploiting Temporal Sparsity for Efficient Edge LLM Inference - Score: 15 (R=8, N=7) - Date: 2025-07-29 - Comment: The paper presents a framework for efficient LLM inference on edge devices by exploiting temporal sparsity, relevant to model compression and efficiency.
Perfect Clustering in Very Sparse Diverse Multiplex Networks - Score: 15 (R=8, N=7) - Date: 2025-07-28 - Comment: The paper presents a tensor-based methodology for perfect clustering in sparse networks, contributing to foundational research in network models.
ProGMLP: A Progressive Framework for GNN-to-MLP Knowledge Distillation with Efficient Trade-offs - Score: 15 (R=8, N=7) - Date: 2025-07-28 - Comment: The paper presents a framework for knowledge distillation from GNNs to MLPs, which involves model compression and efficiency.
C2G-KD: PCA-Constrained Generator for Data-Free Knowledge Distillation - Score: 15 (R=8, N=7) - Date: 2025-07-25 - Comment: The paper introduces a data-free knowledge distillation framework using PCA-constrained generators, which aligns with model compression and efficiency.
Enhancing Quantization-Aware Training on Edge Devices via Relative Entropy Coreset Selection and Cascaded Layer Correction - Score: 15 (R=8, N=7) - Date: 2025-07-25 - Comment: The paper enhances quantization-aware training on edge devices, focusing on model compression through coreset selection and layer correction.
Incentivised Orchestrated Training Architecture (IOTA): A Technical Primer for Release - Score: 15 (R=8, N=7) - Date: 2025-07-25 - Comment: The paper introduces a new architecture for distributed training of LLMs, focusing on scalability and efficiency, which aligns with model architecture and compression topics.
HydraOpt: Navigating the Efficiency-Performance Trade-off of Adapter Merging - Score: 15 (R=8, N=7) - Date: 2025-07-24 - Comment: The paper introduces a new model merging technique, HydraOpt, which is relevant to model compression through low-rank approaches.
Reducing GPU Memory Fragmentation via Spatio-Temporal Planning for Efficient Large-Scale Model Training - Score: 15 (R=8, N=7) - Date: 2025-07-23 - Comment: The paper presents a GPU memory allocator to reduce fragmentation in large-scale model training, which is relevant to model efficiency and compression.
ToFe: Lagged Token Freezing and Reusing for Efficient Vision Transformer Inference - Score: 15 (R=8, N=7) - Date: 2025-07-23 - Comment: The paper proposes a novel token freezing and reusing framework for efficient vision transformer inference, which is relevant to model compression and efficiency.
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization - Score: 15 (R=8, N=7) - Date: 2025-07-22 - Comment: The paper presents LAPO, a length-adaptive policy optimization framework that helps large reasoning models internalize reasoning efficiency, which is relevant to the large language models criterion.
DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization - Score: 15 (R=8, N=7) - Date: 2025-07-18 - Comment: The paper addresses post-training quantization in diffusion models, focusing on computational efficiency and quantization error, relevant to model compression.
RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization - Score: 15 (R=8, N=7) - Date: 2025-07-17 - Comment: The paper introduces a Riemannian framework for LoRA optimization, which is relevant to model compression and efficiency.
Effective Fine-Tuning of Vision Transformers with Low-Rank Adaptation for Privacy-Preserving Image Classification - Score: 15 (R=8, N=7) - Date: 2025-07-17 - Comment: The paper discusses low-rank adaptation for vision transformers, which is relevant to model compression and efficiency.
Protenix-Mini: Efficient Structure Predictor via Compact Architecture, Few-Step Diffusion and Switchable pLM - Score: 15 (R=8, N=7) - Date: 2025-07-17 - Comment: The paper presents a compact architecture for protein structure prediction, focusing on model efficiency and architectural pruning, which aligns with model compression and architecture innovation.
Elk: Exploring the Efficiency of Inter-core Connected AI Chips with Deep Learning Compiler Techniques - Score: 15 (R=8, N=7) - Date: 2025-07-16 - Comment: The paper discusses a DL compiler framework for AI chips, focusing on efficiency and architecture-level innovations, which aligns with model compression and architecture.
Multiple Choice Learning of Low Rank Adapters for Language Modeling - Score: 15 (R=8, N=7) - Date: 2025-07-15 - Comment: The paper proposes a method for language modeling using low-rank adapters, which aligns with the model compression criterion.
Think Clearly: Improving Reasoning via Redundant Token Pruning - Score: 15 (R=8, N=7) - Date: 2025-07-15 - Comment: The paper introduces a method for improving reasoning in LLMs via token pruning, which is relevant to model compression and efficiency.
ConsNoTrainLoRA: Data-driven Weight Initialization of Low-rank Adapters using Constraints - Score: 15 (R=8, N=7) - Date: 2025-07-14 - Comment: The paper introduces a data-driven weight initialization method for low-rank adapters, which is relevant to model compression and efficiency improvements.
Efficient Parametric SVD of Koopman Operator for Stochastic Dynamical Systems - Score: 15 (R=8, N=7) - Date: 2025-07-11 - Comment: The paper proposes a scalable method for learning singular functions of the Koopman operator, which is relevant to model compression through low-rank approximation.
DAF: An Efficient End-to-End Dynamic Activation Framework for on-Device DNN Training - Score: 15 (R=8, N=7) - Date: 2025-07-11 - Comment: The paper introduces a dynamic activation framework for efficient on-device DNN training, which is relevant to model compression through activation quantization.
QS4D: Quantization-aware training for efficient hardware deployment of structured state-space sequential models - Score: 15 (R=8, N=7) - Date: 2025-07-09 - Comment: The paper discusses quantization-aware training for structured state-space models, relevant to model compression and efficiency.
Beyond Communication Overhead: A Multilevel Monte Carlo Approach for Mitigating Compression Bias in Distributed Learning - Score: 15 (R=8, N=7) - Date: 2025-07-09 - Comment: The paper introduces a novel Multilevel Monte Carlo compression scheme, which is relevant to model compression and efficiency.
DANCE: Resource-Efficient Neural Architecture Search with Data-Aware and Continuous Adaptation - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper introduces DANCE, a resource-efficient neural architecture search method, which is relevant to model architecture and efficiency.
LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper introduces a method for efficient fine-tuning via subnet localization, which relates to model compression and efficiency.
SmartThinker: Learning to Compress and Preserve Reasoning by Step-Level Length Control - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper proposes a framework for compressing reasoning models, which aligns with model compression and efficiency.
Efficient Perplexity Bound and Ratio Matching in Discrete Diffusion Language Models - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper introduces new theoretical results for discrete diffusion models, which is relevant to foundational research in model architecture and efficiency.
Normalized Iterative Hard Thresholding for Tensor Recovery - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper proposes a tensor extension of NIHT for low-rank tensor recovery, which is relevant to model compression and efficiency.
MPX: Mixed Precision Training for JAX - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper introduces a mixed-precision training toolbox for JAX, which is relevant to model compression and efficiency.
HGCA: Hybrid GPU-CPU Attention for Long Context LLM Inference - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper presents a hybrid GPU-CPU attention mechanism for long-context LLM inference, focusing on inference efficiency.
SPEAR: Structured Pruning for Spiking Neural Networks via Synaptic Operation Estimation and Reinforcement Learning - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper introduces a novel SNN pruning framework, which is relevant to model compression through pruning.
Efficient Certified Reasoning for Binarized Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper presents a scalable approach to certified reasoning for Binarized Neural Networks, which is relevant to model compression and efficiency.
DiceHuBERT: Distilling HuBERT with a Self-Supervised Learning Objective - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper presents a novel distillation framework for compressing HuBERT, which is relevant to model compression and efficiency.
Beyond Token Pruning: Operation Pruning in Vision-Language Models - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper proposes a novel operation pruning method for vision-language models, relevant to model compression and efficiency.
Learning few-step posterior samplers by unfolding and distillation of diffusion models - Score: 15 (R=8, N=7) - Date: 2025-07-04 - Comment: The paper proposes a novel framework integrating deep unfolding and model distillation for diffusion models, which aligns with foundational research in model architecture and efficiency.
Continual Gradient Low-Rank Projection Fine-Tuning for LLMs - Score: 15 (R=8, N=7) - Date: 2025-07-04 - Comment: The paper presents a novel training strategy for continual learning in LLMs using low-rank projection, aligning with the model compression criterion.
Adaptive Iterative Soft-Thresholding Algorithm with the Median Absolute Deviation - Score: 15 (R=8, N=7) - Date: 2025-07-04 - Comment: The paper provides a theoretical analysis of the adaptive Iterative Soft-Thresholding Algorithm, which is relevant to model compression techniques like sparsity and thresholding.
Gradient-based Fine-Tuning through Pre-trained Model Regularization - Score: 15 (R=8, N=7) - Date: 2025-07-02 - Comment: The paper proposes a gradient-based fine-tuning method with regularization, which is relevant to model compression and efficiency.
Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model - Score: 15 (R=8, N=7) - Date: 2025-07-01 - Comment: The paper proposes a novel algorithm to improve token efficiency in large reasoning models, which aligns with the large language models criterion.
Efficient Algorithms for Learning and Compressing Monophonic Halfspaces in Graphs - Score: 15 (R=8, N=7) - Date: 2025-07-01 - Comment: The paper presents efficient algorithms for learning and compressing monophonic halfspaces in graphs, which is relevant to model compression and efficiency.

High Performance Computing (32)

Torsional-GFN: a conditional conformation generator for small molecules - Score: 20.0 (R=0, N=0) - Date: 2025-07-17 - Comment: Author match
Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety - Score: 20.0 (R=0, N=0) - Date: 2025-07-16 - Comment: Author match
AI Mother Tongue: Self-Emergent Communication in MARL via Endogenous Symbol Systems - Score: 18 (R=9, N=9) - Date: 2025-07-16 - Comment: The paper introduces a novel framework for emergent communication in multi-agent systems, which challenges existing assumptions and introduces new paradigms, aligning with emerging trends.
AI Should Sense Better, Not Just Scale Bigger: Adaptive Sensing as a Paradigm Shift - Score: 18 (R=9, N=9) - Date: 2025-07-11 - Comment: The paper advocates for adaptive sensing as a paradigm shift, which is an emerging trend challenging established assumptions.
Hebbian Physics Networks: A Self-Organizing Computational Architecture Based on Local Physical Laws - Score: 18 (R=9, N=9) - Date: 2025-07-02 - Comment: The paper introduces a novel computational architecture, Hebbian Physics Networks, which is grounded in non-equilibrium thermodynamics and offers a new perspective on modeling complex dynamical systems. This aligns with the emerging trends criterion.
Quantum-Informed Machine Learning for Chaotic Systems - Score: 17 (R=9, N=8) - Date: 2025-07-29 - Comment: The paper introduces a quantum-informed machine learning framework, which is an emerging trend in AI for science, offering a new paradigm for learning chaotic systems.
Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding - Score: 17 (R=9, N=8) - Date: 2025-07-28 - Comment: The paper introduces a novel attention mechanism and system co-design for LLMs, relevant to model architecture and efficiency improvements.
The Other Mind: How Language Models Exhibit Human Temporal Cognition - Score: 17 (R=9, N=8) - Date: 2025-07-22 - Comment: The paper explores temporal cognition in large language models, providing theoretical insights into LLM behavior, which aligns with the large language models criterion.
Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential - Score: 17 (R=9, N=8) - Date: 2025-07-17 - Comment: The paper introduces a novel framework for multi-token prediction in LLMs, which aligns with the large language models criterion by proposing a new method for improving inference speed.
Neurosymbolic Reasoning Shortcuts under the Independence Assumption - Score: 17 (R=9, N=8) - Date: 2025-07-16 - Comment: The paper discusses the limitations of the independence assumption in neurosymbolic predictors, providing theoretical insights into model behavior. This aligns with emerging trends in foundational research.
Defining neurosymbolic AI - Score: 17 (R=9, N=8) - Date: 2025-07-16 - Comment: The paper provides a formal definition of neurosymbolic AI, a foundational contribution that may introduce new paradigms.
KELPS: A Framework for Verified Multi-Language Autoformalization via Semantic-Syntactic Alignment - Score: 17 (R=9, N=8) - Date: 2025-07-14 - Comment: The paper introduces a framework for autoformalization in multiple languages, which is relevant to foundational research in LLMs.
Helix Parallelism: Rethinking Sharding Strategies for Interactive Multi-Million-Token LLM Decoding - Score: 17 (R=9, N=8) - Date: 2025-07-11 - Comment: The paper presents Helix Parallelism, a new execution strategy for LLMs, which is relevant to model architecture and efficiency improvements.
A Dynamical Systems Perspective on the Analysis of Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-07-08 - Comment: The paper uses dynamical systems to analyze neural networks, providing insights into training dynamics and architecture, aligning with representation learning and model architecture.
Exploring Core and Periphery Precepts in Biological and Artificial Intelligence: An Outcome-Based Perspective - Score: 17 (R=8, N=9) - Date: 2025-07-08 - Comment: The paper introduces 'core and periphery' principles for intelligent systems, which could be a novel theoretical framework in AI.
On-the-Fly Fine-Tuning of Foundational Neural Network Potentials: A Bayesian Neural Network Approach - Score: 16 (R=9, N=7) - Date: 2025-07-21 - Comment: The paper introduces a Bayesian neural network approach for fine-tuning foundational models in molecular modeling, which aligns with foundational research in AI for Science.
Higher-Order Kuramoto Oscillator Network for Dense Associative Memory - Score: 16 (R=8, N=8) - Date: 2025-07-30 - Comment: The paper introduces a higher-order Kuramoto oscillator network for dense associative memory, which is relevant to emerging trends in foundational research.
Merge Kernel for Bayesian Optimization on Permutation Space - Score: 16 (R=8, N=8) - Date: 2025-07-18 - Comment: The paper introduces a novel Merge Kernel for Bayesian Optimization, which is a foundational contribution to optimization methods.
Mpemba Effect in Large-Language Model Training Dynamics: A Minimal Analysis of the Valley-River model - Score: 16 (R=8, N=8) - Date: 2025-07-08 - Comment: The paper connects LLM training dynamics to the Mpemba effect, providing theoretical insights into learning rate schedules, which is relevant to training dynamics in neural networks.
Scientific Machine Learning of Chaotic Systems Discovers Governing Equations for Neural Populations - Score: 16 (R=8, N=8) - Date: 2025-07-08 - Comment: The paper presents a method for discovering governing equations in chaotic systems, relevant to foundational research in AI for Science.
A New Perspective On AI Safety Through Control Theory Methodologies - Score: 16 (R=8, N=8) - Date: 2025-07-01 - Comment: The paper discusses AI safety through control theory, which is an emerging trend challenging established assumptions in AI safety.
Learning Stochastic Multiscale Models - Score: 16 (R=8, N=8) - Date: 2025-07-01 - Comment: The paper proposes a method for learning stochastic multiscale models, which is relevant to AI for Science and emerging trends in modeling complex systems.
DEM-NeRF: A Neuro-Symbolic Method for Scientific Discovery through Physics-Informed Simulation - Score: 15 (R=8, N=7) - Date: 2025-07-30 - Comment: The paper presents a neuro-symbolic framework combining neural networks with symbolic equations for scientific discovery, which involves foundational research in AI for science.
Communication-Efficient Distributed Training for Collaborative Flat Optima Recovery in Deep Learning - Score: 15 (R=8, N=7) - Date: 2025-07-29 - Comment: The paper discusses a new algorithm for distributed training focusing on flat optima recovery, which is relevant to emerging trends in model training dynamics.
Learning Stochastic Hamiltonian Systems via Stochastic Generating Function Neural Network - Score: 15 (R=8, N=7) - Date: 2025-07-22 - Comment: The paper introduces a novel neural network model using an autoencoder framework for learning stochastic Hamiltonian systems, which aligns with representation learning and model architecture criteria.
A universal augmentation framework for long-range electrostatics in machine learning interatomic potentials - Score: 15 (R=8, N=7) - Date: 2025-07-22 - Comment: The paper introduces a new method for integrating long-range electrostatics into MLIPs, which is relevant to foundational research in AI for Science.
SPARQL Query Generation with LLMs: Measuring the Impact of Training Data Memorization and Knowledge Injection - Score: 15 (R=8, N=7) - Date: 2025-07-21 - Comment: The paper discusses the impact of training data memorization and knowledge injection in LLMs, which relates to theoretical insights into LLM behavior.
Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime - Score: 15 (R=8, N=7) - Date: 2025-07-16 - Comment: The paper provides insights into the convergence of SGD in the smooth interpolation regime, which is relevant to training dynamics in neural networks.
Scaling Laws for Optimal Data Mixtures - Score: 15 (R=8, N=7) - Date: 2025-07-15 - Comment: The paper proposes a systematic method for determining optimal data mixtures using scaling laws, which is relevant to foundational research in large language models.
Coding Triangle: How Does Large Language Model Understand Code? - Score: 15 (R=8, N=7) - Date: 2025-07-09 - Comment: The paper evaluates LLMs' understanding of code, which is relevant to large language models, focusing on theoretical insights into LLM behavior.
Spooky Action at a Distance: Normalization Layers Enable Side-Channel Spatial Communication - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper discusses the use of normalization layers in neural networks, providing insights into their role in spatial communication, which is relevant to model architecture analysis.
What Neuroscience Can Teach AI About Learning in Continuously Changing Environments - Score: 15 (R=8, N=7) - Date: 2025-07-04 - Comment: The paper explores insights from neuroscience for AI learning, which aligns with the Emerging Trends criterion by challenging established assumptions in AI learning.

Representation Learning (188)

FACT: the Features At Convergence Theorem for neural networks - Score: 19 (R=10, N=9) - Date: 2025-07-09 - Comment: The paper presents the Features At Convergence Theorem (FACT) for neural networks, providing insights into how neural networks learn and represent features, aligning with representation learning.
The Price equation reveals a universal force-metric-bias law of algorithmic learning and natural selection - Score: 18 (R=9, N=9) - Date: 2025-07-25 - Comment: The paper presents a universal force-metric-bias law that unifies various learning algorithms, offering a theoretical insight into learning dynamics, which is relevant to representation learning.
A statistical physics framework for optimal learning - Score: 18 (R=9, N=9) - Date: 2025-07-11 - Comment: The paper presents a statistical physics framework for optimal learning, which is relevant to representation learning and emerging trends in foundational research.
Proof of a perfect platonic representation hypothesis - Score: 18 (R=9, N=9) - Date: 2025-07-03 - Comment: The paper provides a theoretical insight into representation learning by proving the Platonic Representation Hypothesis for deep linear networks.
Amorphous Solid Model of Vectorial Hopfield Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-07-31 - Comment: The paper introduces a novel extension of the Hopfield model with connections to amorphous solid physics, which is relevant to representation learning and emerging trends.
DO-EM: Density Operator Expectation Maximization - Score: 17 (R=9, N=8) - Date: 2025-07-31 - Comment: The paper presents a novel Expectation-Maximization framework for density operator models, which is foundational research in quantum generative modeling, aligning with emerging trends.
Teaching the Teacher: Improving Neural Network Distillability for Symbolic Regression via Jacobian Regularization - Score: 17 (R=9, N=8) - Date: 2025-07-31 - Comment: The paper introduces a novel training paradigm using Jacobian regularization to improve neural network distillability, which is relevant to representation learning and model compression.
Better Together: Cross and Joint Covariances Enhance Signal Detectability in Undersampled Data - Score: 17 (R=9, N=8) - Date: 2025-07-31 - Comment: The paper discusses the use of covariance matrices to detect shared signals in high-dimensional data, which aligns with representation learning and foundational research in understanding data encoding.
Hyperbolic Genome Embeddings - Score: 17 (R=9, N=8) - Date: 2025-07-30 - Comment: The paper introduces hyperbolic CNNs for genomic sequence modeling, contributing to representation learning with a novel approach.
What Does it Mean for a Neural Network to Learn a "World Model"? - Score: 17 (R=9, N=8) - Date: 2025-07-30 - Comment: The paper proposes criteria for neural networks to learn a "world model", offering theoretical insights into representation learning.
Shapley Uncertainty in Natural Language Generation - Score: 17 (R=9, N=8) - Date: 2025-07-30 - Comment: The paper introduces a Shapley-based uncertainty metric for LLMs, which provides theoretical insights into LLM behavior and interpretability.
EvoSLD: Automated Neural Scaling Law Discovery With Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-07-30 - Comment: The paper introduces EvoSLD, an automated framework for discovering scaling laws using evolutionary algorithms and LLMs, which aligns with foundational research in LLMs and representation learning.
The Geometry of Harmfulness in LLMs through Subconcept Probing - Score: 17 (R=9, N=8) - Date: 2025-07-30 - Comment: The paper explores the geometry of harmfulness in LLMs through subconcept probing, providing theoretical insights into LLM behavior and interpretability.
Dimer-Enhanced Optimization: A First-Order Approach to Escaping Saddle Points in Neural Network Training - Score: 17 (R=9, N=8) - Date: 2025-07-29 - Comment: The paper introduces Dimer-Enhanced Optimization, a novel first-order method inspired by physics to escape saddle points in neural network training, aligning with representation learning and training dynamics.
The wall confronting large language models - Score: 17 (R=9, N=8) - Date: 2025-07-29 - Comment: The paper discusses theoretical limitations of LLMs, which aligns with the interest in theoretical insights into LLM behavior.
Feature learning is decoupled from generalization in high capacity neural networks - Score: 17 (R=9, N=8) - Date: 2025-07-29 - Comment: The paper examines feature learning in neural networks, providing insights into representation learning and generalization, which aligns with foundational research in representation learning.
State evolution beyond first-order methods I: Rigorous predictions and finite-sample guarantees - Score: 17 (R=9, N=8) - Date: 2025-07-29 - Comment: The paper provides a theoretical framework for state evolution in high-dimensional nonconvex optimization, which is relevant to representation learning and emerging trends.
SIDE: Sparse Information Disentanglement for Explainable Artificial Intelligence - Score: 17 (R=9, N=8) - Date: 2025-07-28 - Comment: The paper introduces a method for sparse information disentanglement in neural networks, contributing to explainable AI and representation learning.
CLEAR: Unlearning Spurious Style-Content Associations with Contrastive LEarning with Anti-contrastive Regularization - Score: 17 (R=9, N=8) - Date: 2025-07-28 - Comment: The paper proposes a contrastive learning framework to disentangle task-relevant and task-irrelevant features, which is relevant to representation learning.
Principled Multimodal Representation Learning - Score: 17 (R=9, N=8) - Date: 2025-07-24 - Comment: The paper proposes a novel framework for multimodal representation learning, addressing challenges in simultaneous alignment of multiple modalities, which aligns with representation learning.
Self-Contradiction as Self-Improvement: Mitigating the Generation-Understanding Gap in MLLMs - Score: 17 (R=9, N=8) - Date: 2025-07-23 - Comment: The paper discusses a novel approach to improving multimodal large language models by addressing self-contradiction, which aligns with foundational research in LLM behavior and interpretability.
Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework - Score: 17 (R=9, N=8) - Date: 2025-07-23 - Comment: The paper introduces a novel method for detecting pre-training data in LLMs using neuron activation patterns, which is relevant to understanding LLM behavior and interpretability.
Depth Gives a False Sense of Privacy: LLM Internal States Inversion - Score: 17 (R=9, N=8) - Date: 2025-07-23 - Comment: The paper challenges assumptions about LLM internal state privacy with inversion attacks, which is relevant to theoretical insights into LLM behavior.
Learning without training: The implicit dynamics of in-context learning - Score: 17 (R=9, N=8) - Date: 2025-07-23 - Comment: The paper provides theoretical insights into in-context learning dynamics in LLMs, which is highly relevant to understanding LLM behavior and representation learning.
Better Models and Algorithms for Learning Ising Models from Dynamics - Score: 17 (R=9, N=8) - Date: 2025-07-22 - Comment: The paper presents a new algorithm for learning Ising models from dynamics, which is a fundamental research area in representation learning.
Rethinking Memorization Measures and their Implications in Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-07-22 - Comment: The paper provides theoretical insights into memorization in LLMs, which aligns with the interest in understanding LLM behavior and interpretability.
Large Language Models as Innovators: A Framework to Leverage Latent Space Exploration for Novelty Discovery - Score: 17 (R=9, N=8) - Date: 2025-07-21 - Comment: The paper introduces a model-agnostic latent-space ideation framework for LLMs, which aligns with foundational research in LLM behavior and interpretability.
Provable Low-Frequency Bias of In-Context Learning of Representations - Score: 17 (R=9, N=8) - Date: 2025-07-21 - Comment: The paper provides a theoretical explanation of in-context learning in LLMs, focusing on representation learning and offering new insights into the mechanisms of ICL.
Causal Language Control in Multilingual Transformers via Sparse Feature Steering - Score: 17 (R=9, N=8) - Date: 2025-07-21 - Comment: The paper explores sparse feature steering in multilingual transformers, which aligns with representation learning and model architecture insights, particularly in sparse methods and transformer layers.
Probabilistic Soundness Guarantees in LLM Reasoning Chains - Score: 17 (R=9, N=8) - Date: 2025-07-18 - Comment: The paper introduces a novel probabilistic framework for error detection in LLM reasoning chains, which aligns with foundational research in LLM behavior and interpretability.
Understanding the Evolution of the Neural Tangent Kernel at the Edge of Stability - Score: 17 (R=9, N=8) - Date: 2025-07-18 - Comment: The paper provides insights into the training dynamics of neural networks by examining the behavior of Neural Tangent Kernels (NTKs) at the Edge of Stability (EoS). This aligns with the representation learning criterion as it enhances understanding of how deep networks encode information.
Cluster Contrast for Unsupervised Visual Representation Learning - Score: 17 (R=9, N=8) - Date: 2025-07-17 - Comment: The paper introduces a novel approach to unsupervised visual representation learning by combining contrastive learning and clustering, aligning with the representation learning criterion.
Composing Linear Layers from Irreducibles - Score: 17 (R=9, N=8) - Date: 2025-07-17 - Comment: The paper explores the compositional structure of linear layers using geometric primitives, which aligns with foundational research in model architecture and representation learning.
Tracing the Path to Grokking: Embeddings, Dropout, and Network Activation - Score: 17 (R=9, N=8) - Date: 2025-07-17 - Comment: The paper provides insights into neural network training dynamics and introduces metrics related to sparsity and embedding similarity, which are relevant to representation learning.
Memorization Sinks: Isolating Memorization during LLM Training - Score: 17 (R=9, N=8) - Date: 2025-07-15 - Comment: The paper introduces MemSinks, a new paradigm for isolating memorization in LLMs, which aligns with foundational research in LLM behavior and interpretability.
Algorithm Development in Neural Networks: Insights from the Streaming Parity Task - Score: 17 (R=9, N=8) - Date: 2025-07-15 - Comment: The paper provides insights into the learning dynamics of RNNs, which is relevant to representation learning and understanding neural network behavior.
A Generalization Theory for Zero-Shot Prediction - Score: 17 (R=9, N=8) - Date: 2025-07-15 - Comment: The paper presents a theoretical framework for zero-shot prediction, contributing to foundational research in representation learning.
Continuous-Time Signal Decomposition: An Implicit Neural Generalization of PCA and ICA - Score: 17 (R=9, N=8) - Date: 2025-07-15 - Comment: The paper generalizes PCA and ICA for continuous-time signals, which is relevant to representation learning and introduces a novel theoretical framework.
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability? - Score: 17 (R=9, N=8) - Date: 2025-07-14 - Comment: The paper discusses the limitations of causal abstraction in mechanistic interpretability, which is relevant to representation learning as it explores how information is encoded in neural networks.
From Language to Logic: A Bi-Level Framework for Structured Reasoning - Score: 17 (R=9, N=8) - Date: 2025-07-14 - Comment: The paper introduces a novel bi-level framework for structured reasoning, which aligns with foundational research in representation learning and LLM behavior.
Position: We Need An Algorithmic Understanding of Generative AI - Score: 17 (R=9, N=8) - Date: 2025-07-11 - Comment: The paper proposes AlgEval, a framework for understanding the algorithms that LLMs learn, which aligns with the interest in theoretical insights into LLM behavior.
Neural Concept Verifier: Scaling Prover-Verifier Games via Concept Encodings - Score: 17 (R=9, N=8) - Date: 2025-07-11 - Comment: The paper introduces a new framework, Neural Concept Verifier, which combines Prover-Verifier Games with concept encodings for interpretable, nonlinear classification. This aligns with representation learning as it provides insights into how deep networks encode information.
Neural networks leverage nominally quantum and post-quantum representations - Score: 17 (R=9, N=8) - Date: 2025-07-11 - Comment: The paper discusses how neural networks discover quantum and post-quantum representations, contributing to representation learning insights.
A Principled Framework for Multi-View Contrastive Learning - Score: 17 (R=9, N=8) - Date: 2025-07-10 - Comment: The paper proposes a principled framework for multi-view contrastive learning, addressing limitations in current methods and providing theoretical extensions, relevant to representation learning.
Denoising Multi-Beta VAE: Representation Learning for Disentanglement and Generation - Score: 17 (R=9, N=8) - Date: 2025-07-10 - Comment: The paper introduces a novel generative modeling framework for disentangled representation learning, which is a foundational contribution to representation learning.
Instance-Wise Monotonic Calibration by Constrained Transformation - Score: 17 (R=9, N=8) - Date: 2025-07-10 - Comment: The paper proposes a novel monotonic post-hoc calibration method designed to be expressive, robust, and interpretable, which is relevant to representation learning.
Exploring Task Performance with Interpretable Models via Sparse Auto-Encoders - Score: 17 (R=9, N=8) - Date: 2025-07-10 - Comment: The paper explores representation learning through sparse autoencoders, focusing on extracting monosemantic features from LLM neurons, which aligns with insights into how deep networks encode information.
KPFlow: An Operator Perspective on Dynamic Collapse Under Gradient Descent Training of Recurrent Networks - Score: 17 (R=9, N=8) - Date: 2025-07-10 - Comment: The paper provides a theoretical framework for understanding gradient descent in recurrent networks, focusing on representation learning and training dynamics.
SPARC: Concept-Aligned Sparse Autoencoders for Cross-Model and Cross-Modal Interpretability - Score: 17 (R=9, N=8) - Date: 2025-07-10 - Comment: The paper introduces SPARC, a framework for cross-model and cross-modal interpretability using sparse autoencoders, aligning with representation learning and interpretability.
Causal Foundation Models: Disentangling Physics from Instrument Properties - Score: 17 (R=9, N=8) - Date: 2025-07-09 - Comment: The paper presents a causally-motivated foundation model using a dual-encoder architecture and structured contrastive learning, which aligns with representation learning and model architecture criteria.
Beyond Scaling Curves: Internal Dynamics of Neural Networks Through the NTK Lens - Score: 17 (R=9, N=8) - Date: 2025-07-08 - Comment: The paper provides insights into neural network dynamics through the NTK lens, contributing to representation learning and understanding training dynamics.
Intervening to learn and compose disentangled representations - Score: 17 (R=9, N=8) - Date: 2025-07-08 - Comment: The paper proposes a novel approach to learning disentangled representations in generative models, relevant to representation learning.
Implicit Regularisation in Diffusion Models: An Algorithm-Dependent Generalisation Analysis - Score: 17 (R=9, N=8) - Date: 2025-07-08 - Comment: The paper provides a general theory of algorithm-dependent generalization for diffusion models, focusing on implicit regularization, which is relevant to emerging trends in theoretical work.
Simplifying Graph Neural Kernels: from Stacking Layers to Collapsed Structure - Score: 17 (R=9, N=8) - Date: 2025-07-08 - Comment: The paper proposes a simplified graph neural tangent kernel, which is relevant to model architecture and representation learning.
Dyn-O: Building Structured World Models with Object-Centric Representations - Score: 17 (R=9, N=8) - Date: 2025-07-08 - Comment: The paper introduces Dyn-O, an object-centric world model, which is relevant to model architecture and representation learning.
Latent Thermodynamic Flows: Unified Representation Learning and Generative Modeling of Temperature-Dependent Behaviors from Limited Data - Score: 17 (R=9, N=8) - Date: 2025-07-08 - Comment: The paper introduces a framework integrating representation learning and generative modeling, which aligns with foundational research in representation learning.
Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-07-04 - Comment: The paper discusses universal dynamics in compute-optimally trained neural networks, providing insights into training dynamics, which is relevant to representation learning.
Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training - Score: 17 (R=9, N=8) - Date: 2025-07-03 - Comment: The paper presents BBoxER, a black-box optimization method for LLM post-training, focusing on privacy and generalization, which aligns with foundational research in LLMs.
GPT, But Backwards: Exactly Inverting Language Model Outputs - Score: 17 (R=9, N=8) - Date: 2025-07-03 - Comment: The paper introduces a novel algorithm for reconstructing inputs from LLM outputs, which could provide insights into LLM behavior and interpretability.
Description of the Training Process of Neural Networks via Ergodic Theorem : Ghost nodes - Score: 17 (R=9, N=8) - Date: 2025-07-02 - Comment: The paper introduces a novel framework for understanding and accelerating the training of neural networks via an ergodic perspective, which aligns with the representation learning and model architecture criteria.
The language of time: a language model perspective on time-series foundation models - Score: 17 (R=9, N=8) - Date: 2025-07-02 - Comment: The paper investigates the representation learning mechanisms of time-series foundation models, which aligns with the core topic of representation learning.
On the Predictive Power of Representation Dispersion in Language Models - Score: 17 (R=9, N=8) - Date: 2025-07-01 - Comment: The paper explores the link between representation dispersion and language model performance, which aligns with the representation learning criterion.
On Universality of Non-Separable Approximate Message Passing Algorithms - Score: 17 (R=9, N=8) - Date: 2025-07-01 - Comment: The paper provides theoretical insights into Approximate Message Passing algorithms, focusing on universality for non-separable algorithms, which aligns with representation learning and training dynamics.
The Hidden Link Between RLHF and Contrastive Learning - Score: 17 (R=9, N=8) - Date: 2025-07-01 - Comment: The paper explores the connection between RLHF and contrastive learning, providing insights into representation learning through mutual information maximization.
Representation biases: will we achieve complete understanding by analyzing representations? - Score: 16 (R=9, N=7) - Date: 2025-07-31 - Comment: The paper discusses representation biases in neural networks, which aligns with the topic of representation learning by providing insights into how deep networks encode information.
Memorization in Fine-Tuned Large Language Models - Score: 16 (R=9, N=7) - Date: 2025-07-29 - Comment: The paper investigates memorization in fine-tuned LLMs, providing insights into LLM behavior and interpretability.
Explainable Mapper: Charting LLM Embedding Spaces Using Perturbation-Based Explanation and Verification Agents - Score: 16 (R=9, N=7) - Date: 2025-07-25 - Comment: The paper explores the topological structures of LLM embedding spaces, which aligns with representation learning and provides insights into LLM behavior.
Neural Tangent Kernels and Fisher Information Matrices for Simple ReLU Networks with Random Hidden Weights - Score: 16 (R=9, N=7) - Date: 2025-07-25 - Comment: The paper discusses neural tangent kernels and Fisher information matrices for ReLU networks, which aligns with representation learning by providing insights into how networks encode information.
Self-similarity Analysis in Deep Neural Networks - Score: 16 (R=9, N=7) - Date: 2025-07-25 - Comment: The paper investigates self-similarity in deep neural networks, providing insights into feature representation and training dynamics.
Reactivation: Empirical NTK Dynamics Under Task Shifts - Score: 16 (R=9, N=7) - Date: 2025-07-23 - Comment: The paper provides an empirical analysis of Neural Tangent Kernel (NTK) dynamics in continual learning, which is relevant to understanding training dynamics in neural networks.
SDSC: A Structure-Aware Metric for Semantic Signal Representation Learning - Score: 16 (R=9, N=7) - Date: 2025-07-22 - Comment: The paper proposes a new structure-aware metric for self-supervised representation learning in time series, aligning with the representation learning criterion.
Exploiting Primacy Effect To Improve Large Language Models - Score: 16 (R=9, N=7) - Date: 2025-07-21 - Comment: The paper explores the primacy effect in LLMs, providing insights into LLM behavior and interpretability, which is relevant to foundational research in LLMs.
Finite-Dimensional Gaussian Approximation for Deep Neural Networks: Universality in Random Weights - Score: 16 (R=9, N=7) - Date: 2025-07-18 - Comment: The paper studies Gaussian approximation for deep neural networks with random weights, contributing to the understanding of training dynamics and representation learning.
Reasoning-Finetuning Repurposes Latent Representations in Base Models - Score: 16 (R=9, N=7) - Date: 2025-07-18 - Comment: The paper explores how reasoning-finetuning repurposes latent representations in base models, which aligns with representation learning by providing insights into how deep networks encode information.
How does Labeling Error Impact Contrastive Learning? A Perspective from Data Dimensionality Reduction - Score: 16 (R=9, N=7) - Date: 2025-07-16 - Comment: The paper investigates the impact of labeling error on contrastive learning, which is relevant to representation learning and provides theoretical insights.
Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces - Score: 16 (R=9, N=7) - Date: 2025-07-15 - Comment: The paper investigates the latent space geometry of LLMs, which is relevant to understanding LLM behavior and representation learning.
Why is Your Language Model a Poor Implicit Reward Model? - Score: 16 (R=9, N=7) - Date: 2025-07-11 - Comment: The paper investigates the generalization gap between implicit and explicit reward models in LLMs, providing theoretical insights into LLM behavior.
Stable Preference Optimization for LLMs: A Bilevel Approach Beyond Direct Preference Optimization - Score: 16 (R=9, N=7) - Date: 2025-07-11 - Comment: The paper provides theoretical insights into Direct Preference Optimization for LLMs, which aligns with the interest in foundational research on LLM behavior and interpretability.
On the Effect of Uncertainty on Layer-wise Inference Dynamics - Score: 16 (R=9, N=7) - Date: 2025-07-10 - Comment: The paper investigates the effect of uncertainty on inference dynamics in LLMs, providing theoretical insights into LLM behavior and interpretability.
Entropy-Memorization Law: Evaluating Memorization Difficulty of Data in LLMs - Score: 16 (R=9, N=7) - Date: 2025-07-09 - Comment: The paper investigates memorization in LLMs, providing theoretical insights into LLM behavior and interpretability.
L-VAE: Variational Auto-Encoder with Learnable Beta for Disentangled Representation - Score: 16 (R=9, N=7) - Date: 2025-07-04 - Comment: The paper proposes L-VAE, a novel model for disentangled representation learning, which is relevant to representation learning and autoencoders.
Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer - Score: 16 (R=9, N=7) - Date: 2025-07-04 - Comment: The paper investigates latent chain-of-thought reasoning in transformers, aligning with the representation learning criterion by exploring how reasoning structures emerge in models.
Use Sparse Autoencoders to Discover Unknown Concepts, Not to Act on Known Concepts - Score: 16 (R=9, N=7) - Date: 2025-07-01 - Comment: The paper provides insights into the use of sparse autoencoders for discovering unknown concepts, which aligns with the representation learning criterion.
Semantic-guided Diverse Decoding for Large Language Model - Score: 16 (R=9, N=7) - Date: 2025-07-01 - Comment: The paper introduces a novel method for diverse decoding in LLMs, focusing on semantic diversity, which aligns with foundational research in LLM behavior.
Not All Explanations for Deep Learning Phenomena Are Equally Valuable - Score: 16 (R=9, N=7) - Date: 2025-07-01 - Comment: The paper discusses the value of understanding deep learning phenomena like double descent and the lottery ticket hypothesis, which aligns with representation learning insights.
Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training - Score: 16 (R=9, N=7) - Date: 2025-07-01 - Comment: The paper investigates layer importance in LLMs for mathematical reasoning, providing insights into LLM behavior and interpretability.
Quantum Geometry of Data - Score: 16 (R=8, N=8) - Date: 2025-07-30 - Comment: The paper explores quantum geometry in data representation, which is relevant to emerging trends in representation learning.
Dynamics-Informed Reservoir Computing with Visibility Graphs - Score: 16 (R=8, N=8) - Date: 2025-07-28 - Comment: The paper introduces a novel framework for reservoir computing, which is a foundational research area in representation learning.
Central limit theorems for the eigenvalues of graph Laplacians on data clouds - Score: 16 (R=8, N=8) - Date: 2025-07-28 - Comment: The paper provides theoretical insights into the eigenvalues of graph Laplacians, which is relevant to foundational research in representation learning.
Euclidean Distance Deflation Under High-Dimensional Heteroskedastic Noise - Score: 16 (R=8, N=8) - Date: 2025-07-25 - Comment: The paper addresses the correction of Euclidean distances under heteroskedastic noise, providing theoretical insights into noise estimation and correction.
ChronoSelect: Robust Learning with Noisy Labels via Dynamics Temporal Memory - Score: 16 (R=8, N=8) - Date: 2025-07-25 - Comment: The paper introduces a novel framework for learning with noisy labels, focusing on training dynamics and memory architecture, which is relevant to representation learning.
On the Interaction of Compressibility and Adversarial Robustness - Score: 16 (R=8, N=8) - Date: 2025-07-24 - Comment: The paper explores the interaction between compressibility and adversarial robustness, providing insights into representation learning and model compression.
Should Bias Always be Eliminated? A Principled Framework to Use Data Bias for OOD Generation - Score: 16 (R=8, N=8) - Date: 2025-07-24 - Comment: The paper presents a framework that leverages data bias for out-of-distribution generation, providing theoretical insights, which aligns with emerging trends in challenging established assumptions.
Analogy making as amortised model construction - Score: 16 (R=8, N=8) - Date: 2025-07-23 - Comment: The paper discusses analogy making as a method for constructing internal models, which is relevant to representation learning and introduces a novel framework for model construction using analogies.
MAP Estimation with Denoisers: Convergence Rates and Guarantees - Score: 16 (R=8, N=8) - Date: 2025-07-22 - Comment: The paper provides theoretical justification for using denoisers in MAP estimation, which is relevant to foundational research in representation learning.
Generative Distribution Distillation - Score: 16 (R=8, N=8) - Date: 2025-07-22 - Comment: The paper proposes a generative approach to knowledge distillation, which is a novel method in representation learning.
FactorHD: A Hyperdimensional Computing Model for Multi-Object Multi-Class Representation and Factorization - Score: 16 (R=8, N=8) - Date: 2025-07-17 - Comment: The paper introduces a novel HDC model for efficient representation and factorization, which is relevant to emerging trends in neuro-symbolic AI.
Newfluence: Boosting Model interpretability and Understanding in High Dimensions - Score: 16 (R=8, N=8) - Date: 2025-07-17 - Comment: The paper introduces Newfluence, an alternative to influence functions for model interpretability in high dimensions, which is relevant to representation learning.
Einstein Fields: A Neural Perspective To Computational General Relativity - Score: 16 (R=8, N=8) - Date: 2025-07-17 - Comment: The paper introduces a neural representation for computational general relativity, which is relevant to AI for Science with a focus on foundational research.
BioScore: A Foundational Scoring Function For Diverse Biomolecular Complexes - Score: 16 (R=8, N=8) - Date: 2025-07-16 - Comment: The paper introduces BioScore, a foundational scoring function for biomolecular complexes, which aligns with AI for Science and representation learning.
Single-pass Adaptive Image Tokenization for Minimum Program Search - Score: 16 (R=8, N=8) - Date: 2025-07-11 - Comment: The paper introduces a single-pass adaptive tokenizer inspired by Kolmogorov Complexity, which is relevant to representation learning.
Mutual Information Free Topological Generalization Bounds via Stability - Score: 16 (R=8, N=8) - Date: 2025-07-10 - Comment: The paper introduces topological generalization bounds free of mutual information terms, focusing on algorithmic stability, which is relevant to emerging trends in learning theory.
OrbitAll: A Unified Quantum Mechanical Representation Deep Learning Framework for All Molecular Systems - Score: 16 (R=8, N=8) - Date: 2025-07-08 - Comment: The paper introduces a deep learning framework for quantum mechanical representation, which is relevant to AI for Science and foundational research in molecular modeling.
BoltzNCE: Learning Likelihoods for Boltzmann Generation with Stochastic Interpolants and Noise Contrastive Estimation - Score: 16 (R=8, N=8) - Date: 2025-07-02 - Comment: The paper proposes a novel method for learning likelihoods in Boltzmann generation using stochastic interpolants and noise contrastive estimation, which aligns with the AI for Science criterion.
Disentangled Feature Importance - Score: 16 (R=8, N=8) - Date: 2025-07-02 - Comment: The paper introduces Disentangled Feature Importance, a novel method for feature importance quantification, which aligns with representation learning.
Neural Langevin Machine: a local asymmetric learning rule can be creative - Score: 16 (R=8, N=8) - Date: 2025-07-01 - Comment: The paper presents a new generative model, the neural Langevin machine, which is relevant to representation learning and emerging trends.
GViT: Representing Images as Gaussians for Visual Recognition - Score: 16 (R=8, N=8) - Date: 2025-07-01 - Comment: The paper introduces a novel image representation method using Gaussians with a ViT classifier, which relates to representation learning and model architecture.
Riemannian-Geometric Fingerprints of Generative Models - Score: 16 (R=8, N=8) - Date: 2025-07-01 - Comment: The paper proposes a new geometric approach to understanding generative models' fingerprints, which is relevant to representation learning and emerging trends.
Consistency of Feature Attribution in Deep Learning Architectures for Multi-Omics - Score: 15 (R=8, N=7) - Date: 2025-07-31 - Comment: The paper investigates feature attribution methods in deep learning architectures, which relates to representation learning and interpretability, a foundational aspect of understanding model behavior.
Subgrid BoostCNN: Efficient Boosting of Convolutional Networks via Gradient-Guided Feature Selection - Score: 15 (R=8, N=7) - Date: 2025-07-31 - Comment: The paper introduces a novel framework for boosting CNN performance by integrating dynamic feature selection, which aligns with representation learning and model architecture innovation.
RCR-AF: Enhancing Model Generalization via Rademacher Complexity Reduction Activation Function - Score: 15 (R=8, N=7) - Date: 2025-07-31 - Comment: The paper introduces a novel activation function, RCR-AF, which enhances model robustness and generalization by controlling model sparsity and capacity. This aligns with the representation learning criterion.
Meaning-infused grammar: Gradient Acceptability Shapes the Geometric Representations of Constructions in LLMs - Score: 15 (R=8, N=7) - Date: 2025-07-31 - Comment: The paper investigates the internal representations in LLMs, which aligns with the interest in representation learning and theoretical insights into LLM behavior.
Stacked SVD or SVD stacked? A Random Matrix Theory perspective on data integration - Score: 15 (R=8, N=7) - Date: 2025-07-31 - Comment: The paper provides a theoretical analysis of data integration methods using random matrix theory, which is relevant to representation learning and foundational research.
When Truthful Representations Flip Under Deceptive Instructions? - Score: 15 (R=8, N=7) - Date: 2025-07-31 - Comment: The paper investigates how deceptive instructions affect internal representations in LLMs, providing insights into LLM behavior and interpretability.
Weight-Parameterization in Continuous Time Deep Neural Networks for Surrogate Modeling - Score: 15 (R=8, N=7) - Date: 2025-07-30 - Comment: The paper investigates weight parameterization strategies in continuous-time deep learning models, which relates to representation learning and model architecture. It provides insights into training dynamics and efficiency improvements.
Unlocking Interpretability for RF Sensing: A Complex-Valued White-Box Transformer - Score: 15 (R=8, N=7) - Date: 2025-07-30 - Comment: The paper introduces a complex-valued white-box transformer for RF sensing, focusing on interpretability and feature extraction, which aligns with the interest in model architecture and representation learning.
Hierarchical Stochastic Differential Equation Models for Latent Manifold Learning in Neural Time Series - Score: 15 (R=8, N=7) - Date: 2025-07-30 - Comment: The paper proposes a novel hierarchical stochastic differential equation model for latent manifold learning, which aligns with representation learning and offers insights into neural time series.
Torque-based Graph Surgery: Enhancing Graph Neural Networks with Hierarchical Rewiring - Score: 15 (R=8, N=7) - Date: 2025-07-30 - Comment: The paper proposes a novel torque-driven hierarchical rewiring strategy for GNNs, which enhances representation learning by dynamically modulating message passing. This aligns with representation learning and model architecture innovation.
Deep Polynomial Chaos Expansion - Score: 15 (R=8, N=7) - Date: 2025-07-30 - Comment: The paper introduces Deep Polynomial Chaos Expansion, combining PCE with probabilistic circuits for high-dimensional input spaces, which relates to representation learning and efficiency improvements.
Contrast-CAT: Contrasting Activations for Enhanced Interpretability in Transformer-based Text Classifiers - Score: 15 (R=8, N=7) - Date: 2025-07-30 - Comment: The paper proposes Contrast-CAT, a novel method for enhancing interpretability in transformer-based text classifiers, which relates to representation learning and model architecture.
Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning - Score: 15 (R=8, N=7) - Date: 2025-07-29 - Comment: The paper focuses on representation-level task saliency in multi-task learning, which is relevant to representation learning by exploring task interactions and shared representation learning.
Bag of Coins: A Statistical Probe into Neural Confidence Structures - Score: 15 (R=8, N=7) - Date: 2025-07-29 - Comment: The paper introduces a novel statistical probe for understanding neural confidence structures, relevant to representation learning and model architecture analysis.
DeepJIVE: Learning Joint and Individual Variation Explained from Multimodal Data Using Deep Learning - Score: 15 (R=8, N=7) - Date: 2025-07-29 - Comment: The paper presents DeepJIVE, a deep-learning approach for multimodal data integration, focusing on representation learning by uncovering joint and individual variations.
Quantizing Text-attributed Graphs for Semantic-Structural Integration - Score: 15 (R=8, N=7) - Date: 2025-07-29 - Comment: The paper introduces STAG, a novel framework for quantizing graph structural information, aligning with representation learning and model architecture.
CircuitProbe: Dissecting Spatiotemporal Visual Semantics with Circuit Tracing - Score: 15 (R=8, N=7) - Date: 2025-07-28 - Comment: The paper provides insights into the internal reasoning mechanisms of large vision-language models, which is relevant to understanding model behavior and representation learning.
Sticking to the Mean: Detecting Sticky Tokens in Text Embedding Models - Score: 15 (R=8, N=7) - Date: 2025-07-25 - Comment: The paper investigates 'sticky tokens' in text embedding models, which relates to representation learning by analyzing how these models encode information.
Logical Characterizations of GNNs with Mean Aggregation - Score: 15 (R=8, N=7) - Date: 2025-07-25 - Comment: The paper provides insights into the expressive power of GNNs with mean aggregation, which aligns with representation learning by analyzing how these networks encode information.
GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs - Score: 15 (R=8, N=7) - Date: 2025-07-25 - Comment: The paper introduces a novel inference-time steering method for LLMs and VLMs, which involves modifying internal activations and aligns with representation learning and model architecture criteria.
Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility - Score: 15 (R=8, N=7) - Date: 2025-07-24 - Comment: The paper discusses how large learning rates can achieve robustness and compressibility, providing insights into representation learning and model efficiency.
C3RL: Rethinking the Combination of Channel-independence and Channel-mixing from Representation Learning - Score: 15 (R=8, N=7) - Date: 2025-07-24 - Comment: The paper introduces a novel representation learning framework combining channel-mixing and channel-independence strategies, relevant to representation learning.
From Black Box to Biomarker: Sparse Autoencoders for Interpreting Speech Models of Parkinson's Disease - Score: 15 (R=8, N=7) - Date: 2025-07-24 - Comment: The paper applies sparse autoencoders to interpret speech models for Parkinson's disease, aligning with representation learning and sparse methods.
Semantic-Aware Gaussian Process Calibration with Structured Layerwise Kernels for Deep Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-07-23 - Comment: The paper introduces a new framework for Gaussian Process calibration in neural networks, which is relevant to representation learning and model architecture analysis.
Improving the Generation of VAEs with High Dimensional Latent Spaces by the use of Hyperspherical Coordinates - Score: 15 (R=8, N=7) - Date: 2025-07-23 - Comment: The paper proposes a novel parameterization of VAE latent spaces using hyperspherical coordinates, relevant to representation learning and improving generative models.
The Tsetlin Machine Goes Deep: Logical Learning and Reasoning With Graphs - Score: 15 (R=8, N=7) - Date: 2025-07-22 - Comment: The paper introduces the Graph Tsetlin Machine, which is an architectural innovation in graph representation learning, aligning with the model architecture criterion.
Better Training Data Attribution via Better Inverse Hessian-Vector Products - Score: 15 (R=8, N=7) - Date: 2025-07-22 - Comment: The paper introduces a new algorithm for training data attribution based on improved inverse Hessian-vector products, which offers insights relevant to representation learning.
Inverse Scaling in Test-Time Compute - Score: 15 (R=8, N=7) - Date: 2025-07-22 - Comment: The paper investigates inverse scaling in test-time compute for large reasoning models (LRMs), providing insights into LLM behavior and interpretability.
Unsupervised Ground Metric Learning - Score: 15 (R=8, N=7) - Date: 2025-07-18 - Comment: The paper explores unsupervised metric learning, focusing on algorithmic and modeling aspects, which relates to representation learning. It introduces a novel approach to learning ground metrics using optimal transport and Mahalanobis-like distances, contributing to foundational research in metric learning.
Insights into a radiology-specialised multimodal large language model with sparse autoencoders - Score: 15 (R=8, N=7) - Date: 2025-07-18 - Comment: The paper discusses the use of sparse autoencoders for interpretability in a multimodal LLM, which aligns with representation learning and model architecture insights.
Logit Arithmetic Elicits Long Reasoning Capabilities Without Training - Score: 15 (R=8, N=7) - Date: 2025-07-18 - Comment: The paper proposes a method to elicit long reasoning capabilities in large models without training, which aligns with representation learning by exploring how models can inherently possess reasoning abilities.
CytoSAE: Interpretable Cell Embeddings for Hematology - Score: 15 (R=8, N=7) - Date: 2025-07-17 - Comment: The paper introduces a sparse autoencoder for interpretable cell embeddings, which is relevant to representation learning and model architecture.
Probing for Arithmetic Errors in Language Models - Score: 15 (R=8, N=7) - Date: 2025-07-17 - Comment: The paper investigates internal activations in language models to detect arithmetic errors, which aligns with representation learning by exploring how models encode information.
Sparse Autoencoders for Sequential Recommendation Models: Interpretation and Flexible Control - Score: 15 (R=8, N=7) - Date: 2025-07-17 - Comment: The paper applies sparse autoencoders to sequential recommendation models, focusing on interpretability and control, which is relevant to representation learning.
Incorporating Fairness Constraints into Archetypal Analysis - Score: 15 (R=8, N=7) - Date: 2025-07-17 - Comment: The paper proposes Fair Archetypal Analysis, which modifies Archetypal Analysis to incorporate fairness constraints, aligning with the representation learning criterion.
SynCoGen: Synthesizable 3D Molecule Generation via Joint Reaction and Coordinate Modeling - Score: 15 (R=8, N=7) - Date: 2025-07-17 - Comment: The paper presents a framework for synthesizable 3D molecule generation, which is relevant to foundational research in AI for Science, particularly in molecular modeling.
CLID-MU: Cross-Layer Information Divergence Based Meta Update Strategy for Learning with Noisy Labels - Score: 15 (R=8, N=7) - Date: 2025-07-17 - Comment: The paper introduces a novel meta-learning strategy for learning with noisy labels, which is a foundational aspect of representation learning.
Enforcing Latent Euclidean Geometry in Single-Cell VAEs for Manifold Interpolation - Score: 15 (R=8, N=7) - Date: 2025-07-17 - Comment: The paper introduces a novel training framework for VAEs to enforce Euclidean geometry in latent spaces, which is relevant to representation learning and model architecture.
A Group Theoretic Analysis of the Symmetries Underlying Base Addition and Their Learnability by Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-07-16 - Comment: The paper provides a group theoretic analysis of symmetries in base addition and explores neural networks' ability to learn these symmetries, which is relevant to representation learning and theoretical insights into neural network behavior.
Emergence of Hierarchical Emotion Organization in Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-07-16 - Comment: The paper analyzes hierarchical emotion organization in LLMs, which provides insights into LLM behavior and interpretability, aligning with the core topic of large language models.
Disentangling Neural Disjunctive Normal Form Models - Score: 15 (R=8, N=7) - Date: 2025-07-15 - Comment: The paper focuses on a new disentanglement method for neural DNF models, which aligns with representation learning by addressing how networks encode information.
Enhancing Chain-of-Thought Reasoning with Critical Representation Fine-tuning - Score: 15 (R=8, N=7) - Date: 2025-07-15 - Comment: The paper introduces Critical Representation Fine-Tuning (CRFT) for enhancing reasoning tasks, which is relevant to representation learning through representation-level optimization.
Text-Driven Causal Representation Learning for Source-Free Domain Generalization - Score: 15 (R=8, N=7) - Date: 2025-07-15 - Comment: The paper introduces a novel method for causal representation learning, which is relevant to representation learning.
Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition - Score: 15 (R=8, N=7) - Date: 2025-07-15 - Comment: The paper studies task generalization in LLMs through interpretability techniques, which aligns with the large language models criterion.
A Pre-training Framework for Relational Data with Information-theoretic Principles - Score: 15 (R=8, N=7) - Date: 2025-07-15 - Comment: The paper introduces a pre-training framework for relational data using information-theoretic principles, which is relevant to representation learning.
A Mixture of Linear Corrections Generates Secure Code - Score: 15 (R=8, N=7) - Date: 2025-07-15 - Comment: The paper discusses a mixture of linear corrections for secure code generation, which involves representation learning and aligns with foundational research.
Filter Equivariant Functions: A symmetric account of length-general extrapolation on lists - Score: 15 (R=8, N=7) - Date: 2025-07-14 - Comment: The paper introduces filter equivariant functions, which is a novel concept in representation learning, focusing on extrapolation in list functions.
Ranked Set Sampling-Based Multilayer Perceptron: Improving Generalization via Variance-Based Bounds - Score: 15 (R=8, N=7) - Date: 2025-07-14 - Comment: The paper introduces a new method for improving generalization in MLPs through variance reduction, which relates to representation learning and training dynamics.
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation - Score: 15 (R=8, N=7) - Date: 2025-07-14 - Comment: The paper explores using vision foundation models as visual tokenizers for image generation, which involves representation learning and architectural innovation.
PanMatch: Unleashing the Potential of Large Vision Models for Unified Matching Models - Score: 15 (R=8, N=7) - Date: 2025-07-14 - Comment: The paper presents a foundation model for correspondence matching using a unified framework, which involves architectural innovation and representation learning.
scE²TM: Toward Interpretable Single-Cell Embedding via Topic Modeling - Score: 15 (R=8, N=7) - Date: 2025-07-14 - Comment: The paper focuses on interpretable single-cell embedding via topic modeling, which is a form of representation learning. It introduces a new interpretability evaluation benchmark, contributing to foundational research in representation learning.
Str-GCL: Structural Commonsense Driven Graph Contrastive Learning - Score: 15 (R=8, N=7) - Date: 2025-07-11 - Comment: The paper proposes a novel framework for graph contrastive learning by integrating structural commonsense, contributing to representation learning.
Does Data Scaling Lead to Visual Compositional Generalization? - Score: 15 (R=8, N=7) - Date: 2025-07-10 - Comment: The paper discusses compositional generalization in vision models, focusing on representational structure, which is relevant to representation learning.
Can Interpretation Predict Behavior on Unseen Data? - Score: 15 (R=8, N=7) - Date: 2025-07-10 - Comment: The paper explores interpretability as a tool for predicting out-of-distribution model behavior, which aligns with theoretical insights into model behavior.
Mitigating Shortcut Learning with InterpoLated Learning - Score: 15 (R=8, N=7) - Date: 2025-07-09 - Comment: The paper introduces InterpoLated Learning to mitigate shortcut learning, which is relevant to representation learning and training dynamics in neural networks.
Explainable Hierarchical Deep Learning Neural Networks (Ex-HiDeNN) - Score: 15 (R=8, N=7) - Date: 2025-07-09 - Comment: The paper proposes a novel approach for discovering interpretable expressions using a hierarchical deep learning architecture, relevant to representation learning and model architecture.
Pseudo-likelihood produces associative memories able to generalize, even for asymmetric couplings - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper discusses pseudo-likelihood in energy-based models, contributing to representation learning and generalization.
Meta-Learning Transformers to Improve In-Context Generalization - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper explores meta-learning transformers for improved in-context generalization, which is relevant to representation learning and model architecture.
Reason to Rote: Rethinking Memorization in Reasoning - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper investigates memorization in LLMs, providing insights into LLM behavior and interpretability, relevant to foundational research in LLMs.
Recovering Plasticity of Neural Networks via Soft Weight Rescaling - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper introduces a method to recover plasticity in neural networks, which is relevant to representation learning and training dynamics.
LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper introduces a contrastive decoding method for LLMs, which is relevant to foundational research in LLM behavior and interpretability.
Tractable Representation Learning with Probabilistic Circuits - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper presents a novel framework for representation learning with probabilistic circuits, which aligns with the core topic of representation learning.
Return of the Latent Space COWBOYS: Re-thinking the use of VAEs for Bayesian Optimisation of Structured Spaces - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper explores a decoupled approach for Bayesian optimization using VAEs, which is relevant to model architecture and representation learning.
LLMs model how humans induce logically structured rules - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper explores how LLMs model human logical rule induction, which is relevant to foundational research in large language models and representation learning.
How Overconfidence in Initial Choices and Underconfidence Under Criticism Modulate Change of Mind in Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-07-08 - Comment: The paper explores LLMs' confidence mechanisms, providing insights into LLM behavior and interpretability, relevant to foundational research in LLMs.
Clarifying Before Reasoning: A Coq Prover with Structural Context - Score: 15 (R=8, N=7) - Date: 2025-07-04 - Comment: The paper explores improving task clarity to enhance reasoning in LLMs, which aligns with foundational research in LLM behavior and interpretability.
How Do Vision-Language Models Process Conflicting Information Across Modalities? - Score: 15 (R=8, N=7) - Date: 2025-07-03 - Comment: The paper explores how vision-language models process conflicting information, providing insights into model behavior and interpretability, relevant to representation learning and LLMs.
Token Communication in the Era of Large Models: An Information Bottleneck-Based Approach - Score: 15 (R=8, N=7) - Date: 2025-07-03 - Comment: The paper introduces a generative information bottleneck principle for token communication, which aligns with representation learning and efficiency improvements.
Analysis of Muon's Convergence and Critical Batch Size - Score: 15 (R=8, N=7) - Date: 2025-07-03 - Comment: The paper provides a theoretical analysis of the recently proposed Muon optimizer, including its convergence and critical batch size, which is relevant to representation learning and training dynamics in neural networks.
ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks - Score: 15 (R=8, N=7) - Date: 2025-07-03 - Comment: The paper explores vulnerabilities in in-context learning of LLMs and proposes a defense mechanism, which is relevant to LLM behavior and interpretability.
Enhancing Interpretability in Generative Modeling: Statistically Disentangled Latent Spaces Guided by Generative Factors in Scientific Datasets - Score: 15 (R=8, N=7) - Date: 2025-07-02 - Comment: The paper introduces a novel architecture for disentangling latent spaces in generative models, aligning with the Representation Learning criterion.
DFReg: A Physics-Inspired Framework for Global Weight Distribution Regularization in Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-07-02 - Comment: The paper introduces a novel regularization method inspired by physics for neural networks, which relates to representation learning and model architecture.
Towards Undistillable Models by Minimizing Conditional Mutual Information - Score: 15 (R=8, N=7) - Date: 2025-07-02 - Comment: The paper proposes a method to create undistillable models by minimizing conditional mutual information, which aligns with the Representation Learning criterion.
The Trilemma of Truth in Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-07-01 - Comment: The paper introduces a method for probing the veracity of LLMs, which aligns with theoretical insights into LLM behavior and interpretability.
Emergent musical properties of a transformer under contrastive self-supervised learning - Score: 15 (R=8, N=7) - Date: 2025-07-01 - Comment: The paper explores emergent properties of transformers under contrastive self-supervised learning, which aligns with representation learning and model architecture.
Towards the Training of Deeper Predictive Coding Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-07-01 - Comment: The paper addresses training challenges in predictive coding networks, which aligns with the representation learning criterion.
AICO: Feature Significance Tests for Supervised Learning - Score: 15 (R=8, N=7) - Date: 2025-07-01 - Comment: The paper develops model-agnostic significance tests for feature importance, which aligns with foundational research in representation learning.
Token Activation Map to Visually Explain Multimodal LLMs - Score: 15 (R=8, N=7) - Date: 2025-07-01 - Comment: The paper proposes a method for explaining multimodal LLMs, which aligns with foundational research in LLM behavior and interpretability.
Intervening in Black Box: Concept Bottleneck Model for Enhancing Human Neural Network Mutual Understanding - Score: 15 (R=8, N=7) - Date: 2025-07-01 - Comment: The paper introduces a Concept Bottleneck Model to enhance interpretability in neural networks, which aligns with representation learning and model architecture analysis.
Towards Text-free Graph Foundation Models: Rethinking Multi-Domain Graph Contrastive Learning - Score: 15 (R=8, N=7) - Date: 2025-07-01 - Comment: The paper discusses a novel multi-domain graph contrastive learning framework, which is relevant to representation learning.

Other Foundational Research (27)

LEDOM: An Open and Fundamental Reverse Language Model - Score: 18 (R=9, N=9) - Date: 2025-07-03 - Comment: The paper introduces LEDOM, a reverse language model, which is a novel approach in the context of foundational models and LLMs.
LLM-Crowdsourced: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-07-31 - Comment: The paper proposes a novel benchmark-free paradigm in which LLMs evaluate one another, which aligns with foundational research on evaluating LLM capabilities.
Navigation through Non-Compact Symmetric Spaces: a mathematical perspective on Cartan Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-07-24 - Comment: The paper expands on the mathematical structures of Cartan Neural Networks, providing insights into their geometric properties, which aligns with emerging trends in theoretical work.
Language Generation in the Limit: Noise, Loss, and Feedback - Score: 17 (R=9, N=8) - Date: 2025-07-22 - Comment: The paper explores theoretical aspects of language generation in the limit, which aligns with emerging trends in foundational research.
KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? - Score: 17 (R=9, N=8) - Date: 2025-07-16 - Comment: The paper investigates the role of chain-of-thought in LLM reasoning, providing insights into the implicit structures in mathematical reasoning. This aligns with foundational research in LLMs.
Functional Neural Wavefunction Optimization - Score: 17 (R=9, N=8) - Date: 2025-07-16 - Comment: The paper introduces a novel framework for optimization in variational quantum Monte Carlo, which involves neural network wavefunctions. This aligns with foundational research in AI for Science, focusing on theoretical insights rather than applications.
Generalized and Unified Equivalences between Hardness and Pseudoentropy - Score: 17 (R=9, N=8) - Date: 2025-07-09 - Comment: The paper provides a unified pseudoentropy characterization, which is relevant to emerging trends in theoretical work challenging established assumptions.
Predicting mutational effects on protein binding from folding energy - Score: 17 (R=9, N=8) - Date: 2025-07-09 - Comment: The paper proposes a novel transfer-learning approach for predicting protein binding effects, which aligns with foundational research in AI for Science, particularly in molecular modeling.
Theoretical Modeling of LLM Self-Improvement Training Dynamics Through Solver-Verifier Gap - Score: 17 (R=9, N=8) - Date: 2025-07-02 - Comment: The paper models the training dynamics of LLM self-improvement, providing theoretical insights into LLM behavior, which is relevant to foundational research in LLMs.
What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models - Score: 16 (R=9, N=7) - Date: 2025-07-10 - Comment: The paper evaluates foundation models using inductive bias probes, which aligns with the core topic of large language models and their theoretical insights.
The Riemannian Geometry associated to Gradient Flows of Linear Convolutional Networks - Score: 16 (R=9, N=7) - Date: 2025-07-10 - Comment: The paper provides theoretical insights into the gradient flow of linear convolutional networks, contributing to the understanding of training dynamics in neural networks.
Scale-Consistent Learning for Partial Differential Equations - Score: 16 (R=8, N=8) - Date: 2025-07-28 - Comment: The paper proposes a scale-consistent learning approach for PDEs, which is relevant to emerging trends in AI for science and foundational research.
Minimalist Concept Erasure in Generative Models - Score: 16 (R=8, N=8) - Date: 2025-07-21 - Comment: The paper introduces a novel minimalist concept erasure method in generative models, which involves theoretical insights into model behavior and optimization. This aligns with the emerging trends criterion.
Optimization Guarantees for Square-Root Natural-Gradient Variational Inference - Score: 16 (R=8, N=8) - Date: 2025-07-11 - Comment: The paper provides novel convergence guarantees for natural-gradient variational-Gaussian inference, which is relevant to emerging trends in theoretical work.
On the Hardness of Unsupervised Domain Adaptation: Optimal Learners and Information-Theoretic Perspective - Score: 16 (R=8, N=8) - Date: 2025-07-10 - Comment: The paper provides a theoretical perspective on the hardness of unsupervised domain adaptation, introducing an information-theoretic quantity to evaluate learning difficulty, which aligns with emerging trends in foundational research.
Foundation Model Self-Play: Open-Ended Strategy Innovation via Foundation Models - Score: 16 (R=8, N=8) - Date: 2025-07-10 - Comment: The paper introduces Foundation-Model Self-Play, leveraging foundation models for strategy innovation, which is a novel approach in the context of large language models.
Simple Convergence Proof of Adam From a Sign-like Descent Perspective - Score: 16 (R=8, N=8) - Date: 2025-07-09 - Comment: The paper provides a novel convergence proof for the Adam optimizer, offering theoretical insights into its behavior, which is relevant to training dynamics in neural networks.
What is an "Abstract Reasoner"? Revisiting Experiments and Arguments about Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-07-31 - Comment: The paper revisits the concept of 'abstract reasoning' in LLMs, which aligns with the interest in theoretical insights into LLM behavior.
Multi-state Protein Design with DynamicMPNN - Score: 15 (R=8, N=7) - Date: 2025-07-30 - Comment: The paper introduces DynamicMPNN, a model for multi-state protein design, which is relevant to foundational research in AI for Science.
Bayesian symbolic regression: Automated equation discovery from a physicists' perspective - Score: 15 (R=8, N=7) - Date: 2025-07-29 - Comment: The paper discusses a probabilistic approach to symbolic regression, offering a theoretical perspective that aligns with emerging trends in foundational research.
Improving LLMs' Generalized Reasoning Abilities by Graph Problems - Score: 15 (R=8, N=7) - Date: 2025-07-24 - Comment: The paper introduces a new approach to enhance LLMs' reasoning abilities using graph problems, which aligns with the large language models criterion.
C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning - Score: 15 (R=8, N=7) - Date: 2025-07-23 - Comment: The paper proposes a self-improving framework for multimodal data and model co-evolution, which is relevant to emerging trends in model training dynamics.
All-atom inverse protein folding through discrete flow matching - Score: 15 (R=8, N=7) - Date: 2025-07-22 - Comment: The paper presents a generative model for inverse protein folding, which is relevant to AI for Science with a focus on foundational research in protein modeling.
Bayesian Double Descent - Score: 15 (R=8, N=7) - Date: 2025-07-11 - Comment: The paper provides a Bayesian perspective on the double descent phenomenon, offering theoretical insights into training dynamics in neural networks.
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful - Score: 15 (R=8, N=7) - Date: 2025-07-10 - Comment: The paper revisits small batch size training for language models, providing insights into training dynamics and optimizer settings, which is relevant to training dynamics in neural networks.
On Design Principles for Private Adaptive Optimizers - Score: 15 (R=8, N=7) - Date: 2025-07-03 - Comment: The paper provides theoretical insights into private adaptive optimizers, which could impact model training dynamics.
Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections - Score: 15 (R=8, N=7) - Date: 2025-07-02 - Comment: The paper provides a theoretical framework bridging SFT and preference learning in LLM post-training, which is relevant to foundational research in LLMs.