Personalized Monthly Topic Summary 2025/06
| Metric | Value |
|---|---|
| Total Papers | 634 |
| Model Architecture | 187 |
| Model Compression and Efficiency | 161 |
| High Performance Computing | 43 |
| Representation Learning | 216 |
| Other Foundational Research | 27 |
Model Architecture (187)
-
Whole-Body Conditioned Egocentric Video Prediction - Score: 20.0 (R=0, N=0) - Date: 2025-06-27 - Comment: Author match
-
Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models - Score: 19 (R=10, N=9) - Date: 2025-06-25 - Comment: The paper introduces a new Mixture-of-Experts architecture, Chain-of-Experts, which is highly relevant to model architecture innovations.
-
SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification - Score: 18 (R=10, N=8) - Date: 2025-06-24 - Comment: The paper analyzes vulnerabilities in MoE-based LLMs, which is directly relevant to understanding MoE architectures.
-
Load Balancing Mixture of Experts with Similarity Preserving Routers - Score: 18 (R=10, N=8) - Date: 2025-06-18 - Comment: The paper addresses load balancing in Sparse Mixture of Experts (MoE) models, which is directly relevant to model architecture and efficiency.
-
On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks - Score: 18 (R=10, N=8) - Date: 2025-06-02 - Comment: The paper provides a theoretical study on the expressive power of Mixture-of-Experts (MoE), directly relevant to model architecture.
-
Forward Target Propagation: A Forward-Only Approach to Global Error Credit Assignment via Local Losses - Score: 18 (R=9, N=9) - Date: 2025-06-16 - Comment: The paper introduces Forward Target Propagation, a novel approach to error credit assignment, relevant to model architecture and training dynamics.
-
Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers - Score: 18 (R=9, N=9) - Date: 2025-06-13 - Comment: The paper provides a theoretical foundation for understanding out-of-context reasoning in transformers, which aligns with theoretical insights into LLM behavior.
-
Transformers Meet In-Context Learning: A Universal Approximation Theory - Score: 18 (R=9, N=9) - Date: 2025-06-06 - Comment: The paper develops a universal approximation theory for transformers in in-context learning, providing theoretical insights into LLM behavior.
-
HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts - Score: 18 (R=9, N=9) - Date: 2025-06-02 - Comment: The paper introduces HELM, a hyperbolic LLM with Mixture-of-Curvature Experts, relevant to LLM architecture innovations.
-
SPADE: Spatial Transcriptomics and Pathology Alignment Using a Mixture of Data Experts for an Expressive Latent Space - Score: 17 (R=9, N=8) - Date: 2025-06-30 - Comment: The paper introduces SPADE, a foundation model using a mixture-of-data experts technique for image representation learning, aligning with the Representation Learning and Model Architecture criteria.
-
TableMoE: Neuro-Symbolic Routing for Structured Expert Reasoning in Multimodal Table Understanding - Score: 17 (R=9, N=8) - Date: 2025-06-27 - Comment: The paper presents TableMoE, a neuro-symbolic Mixture-of-Experts architecture for multimodal table understanding, which is relevant to MoE and architectural innovations.
-
NaLaFormer: Norm-Aware Linear Attention for Transformer Models - Score: 17 (R=9, N=8) - Date: 2025-06-27 - Comment: The paper introduces a novel Norm-Aware Linear Attention mechanism for transformers, which is relevant to model architecture innovations.
-
Learning to Skip the Middle Layers of Transformers - Score: 17 (R=9, N=8) - Date: 2025-06-27 - Comment: The paper proposes a novel architecture for Transformers that dynamically skips middle layers, which is relevant to model architecture innovations.
-
A foundation model with multi-variate parallel attention to generate neuronal activity - Score: 17 (R=9, N=8) - Date: 2025-06-26 - Comment: The paper introduces a novel self-attention mechanism and a generative foundation model, relevant to model architecture and foundational model research.
-
Uncovering Conceptual Blindspots in Generative Image Models Using Sparse Autoencoders - Score: 17 (R=9, N=8) - Date: 2025-06-25 - Comment: The paper uses sparse autoencoders to identify conceptual blindspots in generative models, relevant to representation learning and model architecture analysis.
-
From memories to maps: Mechanisms of in context reinforcement learning in transformers - Score: 17 (R=9, N=8) - Date: 2025-06-25 - Comment: The paper explores representation learning in transformers, focusing on in-context reinforcement learning and memory mechanisms, which aligns with the representation learning criterion. It provides insights into how deep networks encode information and the role of memory in learning, which is foundational research.
-
RCStat: A Statistical Framework for using Relative Contextualization in Transformers - Score: 17 (R=9, N=8) - Date: 2025-06-25 - Comment: The paper introduces RCStat, a framework for using relative contextualization in transformers, which relates to model architecture and compression through key-value compression.
-
In-Context Occam's Razor: How Transformers Prefer Simpler Hypotheses on the Fly - Score: 17 (R=9, N=8) - Date: 2025-06-25 - Comment: The paper provides theoretical insights into how transformers prefer simpler hypotheses, aligning with the core topic of theoretical insights into LLM behavior.
-
On the Existence of Universal Simulators of Attention - Score: 17 (R=9, N=8) - Date: 2025-06-24 - Comment: The paper explores the theoretical ability of transformer architectures to simulate attention mechanisms, aligning with the model architecture criterion.
-
SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation - Score: 17 (R=9, N=8) - Date: 2025-06-24 - Comment: The paper presents a compression framework for MoE models, aligning with the model compression criterion.
-
The 4th Dimension for Scaling Model Size - Score: 17 (R=9, N=8) - Date: 2025-06-24 - Comment: The paper explores a new dimension for scaling model size, which is relevant to foundational research in model architecture and scaling.
-
Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection - Score: 17 (R=9, N=8) - Date: 2025-06-24 - Comment: The paper discusses scaling state space models with Mixture-of-Experts, directly aligning with the model architecture criterion.
-
Beyond instruction-conditioning, MoTE: Mixture of Task Experts for Multi-task Embedding Models - Score: 17 (R=9, N=8) - Date: 2025-06-24 - Comment: The paper introduces a Mixture of Task Experts (MoTE) transformer block, which is directly relevant to model architecture, specifically MoE.
-
GTA: Grouped-head latenT Attention - Score: 17 (R=9, N=8) - Date: 2025-06-24 - Comment: The paper introduces Grouped-Head Latent Attention (GTA), a novel attention mechanism that reduces memory usage and computational complexity in LLMs, aligning with the model compression criterion.
-
From Concepts to Components: Concept-Agnostic Attention Module Discovery in Transformers - Score: 17 (R=9, N=8) - Date: 2025-06-23 - Comment: The paper introduces a concept-agnostic attention module discovery method in transformers, relevant to model architecture and interpretability.
-
Feedback-driven recurrent quantum neural network universality - Score: 17 (R=9, N=8) - Date: 2025-06-23 - Comment: The paper presents a recurrent quantum neural network architecture with theoretical guarantees, which is relevant to emerging trends in quantum computing and neural networks.
-
Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs - Score: 17 (R=9, N=8) - Date: 2025-06-18 - Comment: The paper presents Ring-lite, a Mixture-of-Experts (MoE)-based LLM optimized via reinforcement learning, which aligns with model architecture by focusing on MoE and efficiency.
-
MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation - Score: 17 (R=9, N=8) - Date: 2025-06-18 - Comment: The paper proposes MoORE, a novel model MoE-ization strategy for multi-task adaptation, relevant to model architecture and efficiency.
-
MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models - Score: 17 (R=9, N=8) - Date: 2025-06-18 - Comment: The paper introduces MoTE, a memory-efficient approach for Mixture-of-Experts models, relevant to model compression and architecture.
-
Less is More: Undertraining Experts Improves Model Upcycling - Score: 17 (R=9, N=8) - Date: 2025-06-18 - Comment: The paper challenges assumptions in model upcycling and discusses MoE layers, aligning with the model architecture criterion.
-
Toward a Graph Foundation Model: Pre-Training Transformers With Random Walks - Score: 17 (R=9, N=8) - Date: 2025-06-18 - Comment: The paper proposes a graph foundation model using Transformers, which is relevant to model architecture and representation learning.
-
Transformers Learn Faster with Semantic Focus - Score: 17 (R=9, N=8) - Date: 2025-06-18 - Comment: The paper studies sparse attention in transformers, focusing on learnability and generalization, which is relevant to model architecture and representation learning.
-
What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers - Score: 17 (R=9, N=8) - Date: 2025-06-17 - Comment: The paper investigates the training dynamics of Transformers, focusing on the abrupt learning phenomenon and representation collapse, which aligns with representation learning and insights into how deep networks encode information.
-
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention - Score: 17 (R=9, N=8) - Date: 2025-06-17 - Comment: The paper introduces a hybrid Mixture-of-Experts architecture with a novel attention mechanism, which is relevant to model architecture and efficiency.
-
Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization - Score: 17 (R=9, N=8) - Date: 2025-06-17 - Comment: The paper introduces a novel Mixture-of-Experts (MoE) approach for dynamic token-wise KV optimization in transformers, which aligns with the model architecture and model compression criteria.
-
Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization - Score: 17 (R=9, N=8) - Date: 2025-06-17 - Comment: The paper introduces the Mixture of Cognitive Reasoners architecture, which is relevant to model architecture and offers insights into modular reasoning with brain-like specialization.
-
Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts - Score: 17 (R=9, N=8) - Date: 2025-06-17 - Comment: The paper presents a Sparse Interpolated Mixture-of-Experts (SIMoE) method, which is relevant to model architecture innovations, specifically MoE.
-
Constant Bit-size Transformers Are Turing Complete - Score: 17 (R=9, N=8) - Date: 2025-06-17 - Comment: The paper proves that constant bit-size transformers are Turing complete, providing theoretical insights into transformer models.
-
Long-Short Alignment for Effective Long-Context Modeling in LLMs - Score: 17 (R=9, N=8) - Date: 2025-06-16 - Comment: The paper proposes a new perspective on length generalization in LLMs, focusing on long-short alignment, relevant to LLM behavior and architecture.
-
HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data - Score: 17 (R=9, N=8) - Date: 2025-06-16 - Comment: The paper introduces HEIST, a hierarchical graph transformer-based foundation model for spatial transcriptomics and proteomics data, which aligns with the AI for Science criterion focusing on foundational research in molecular/protein modeling.
-
A Framework for Non-Linear Attention via Modern Hopfield Networks - Score: 17 (R=9, N=8) - Date: 2025-06-16 - Comment: The paper proposes a framework for non-linear attention via Modern Hopfield Networks, which aligns with the model architecture criterion.
-
Sequential-Parallel Duality in Prefix Scannable Models - Score: 17 (R=9, N=8) - Date: 2025-06-13 - Comment: The paper discusses a broad class of neural sequence models and introduces Prefix-Scannable Models, which is relevant to model architecture innovations.
-
e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs - Score: 17 (R=9, N=8) - Date: 2025-06-11 - Comment: The paper discusses test-time scaling and in-context exploration for LLMs, which relates to foundational research in LLM architecture and theoretical insights.
-
PropMEND: Hypernetworks for Knowledge Propagation in LLMs - Score: 17 (R=9, N=8) - Date: 2025-06-11 - Comment: The paper introduces a novel hypernetwork-based approach for improving knowledge propagation in LLMs, which is relevant to foundational research in LLM behavior and architecture.
-
SeerAttention-R: Sparse Attention Adaptation for Long Reasoning - Score: 17 (R=9, N=8) - Date: 2025-06-11 - Comment: The paper introduces SeerAttention-R, a sparse attention framework, which is relevant to model architecture and efficiency through sparse methods.
-
SWAT-NN: Simultaneous Weights and Architecture Training for Neural Networks in a Latent Space - Score: 17 (R=9, N=8) - Date: 2025-06-11 - Comment: The paper proposes a novel approach to simultaneously optimize neural network architecture and weights, relevant to model architecture innovations.
-
Graph-KV: Breaking Sequence via Injecting Structural Biases into Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-06-10 - Comment: Graph-KV introduces structural biases into LLMs, which is relevant to model architecture and efficiency improvements.
-
Spark Transformer: Reactivating Sparsity in FFN and Attention - Score: 17 (R=9, N=8) - Date: 2025-06-10 - Comment: The paper introduces a novel transformer architecture with activation sparsity, aligning with model architecture and model compression criteria.
-
SMAR: Soft Modality-Aware Routing Strategy for MoE-based Multimodal Large Language Models Preserving Language Capabilities - Score: 17 (R=9, N=8) - Date: 2025-06-10 - Comment: The paper introduces SMAR, a routing strategy for MoE-based multimodal models, which is relevant to model architecture, specifically MoE.
-
Transformative or Conservative? Conservation laws for ResNets and Transformers - Score: 17 (R=9, N=8) - Date: 2025-06-09 - Comment: The paper explores conservation laws in ResNets and Transformers, providing theoretical insights into these architectures.
-
A Theoretical Study of (Hyper) Self-Attention through the Lens of Interactions: Representation, Training, Generalization - Score: 17 (R=9, N=8) - Date: 2025-06-09 - Comment: The paper provides a theoretical study of self-attention, a core component of modern neural architectures, and introduces new modules like HyperFeatureAttention and HyperAttention, which are relevant to model architecture innovations.
-
MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-06-09 - Comment: The paper introduces a heterogeneous mixture of adapters for LLM fine-tuning, relevant to MoE and model architecture.
-
Contextually Guided Transformers via Low-Rank Adaptation - Score: 17 (R=9, N=8) - Date: 2025-06-09 - Comment: The paper proposes a modification to Transformer architecture for context encoding, which is relevant to model architecture innovations.
-
Projectable Models: One-Shot Generation of Small Specialized Transformers from Large Ones - Score: 17 (R=9, N=8) - Date: 2025-06-09 - Comment: The paper explores generating smaller specialized models from large transformers, relevant to model compression and efficiency.
-
CoFrNets: Interpretable Neural Architecture Inspired by Continued Fractions - Score: 17 (R=9, N=8) - Date: 2025-06-09 - Comment: The paper proposes a novel neural architecture inspired by continued fractions, which is relevant to model architecture innovations.
-
When can in-context learning generalize out of task distribution? - Score: 17 (R=9, N=8) - Date: 2025-06-09 - Comment: The paper investigates conditions for in-context learning to generalize out of task distribution, which is relevant to large language models and theoretical insights.
-
Kinetics: Rethinking Test-Time Scaling Laws - Score: 17 (R=9, N=8) - Date: 2025-06-06 - Comment: The paper proposes a new scaling paradigm centered on sparse attention, which is relevant to model architecture and efficiency.
-
On the Convergence of Gradient Descent on Learning Transformers with Residual Connections - Score: 17 (R=9, N=8) - Date: 2025-06-06 - Comment: The paper analyzes the convergence of gradient descent on Transformers with residual connections, providing insights into model architecture and training dynamics.
-
Log-Linear Attention - Score: 17 (R=9, N=8) - Date: 2025-06-06 - Comment: The paper introduces log-linear attention, a novel attention mechanism balancing efficiency and expressiveness, relevant to model architecture.
-
NOBLE -- Neural Operator with Biologically-informed Latent Embeddings to Capture Experimental Variability in Biological Neuron Models - Score: 17 (R=9, N=8) - Date: 2025-06-06 - Comment: The paper introduces NOBLE, a neural operator framework for modeling biological neurons, which aligns with foundational research in AI for Science, focusing on new generative paradigms and architecture-level innovations.
-
Attention-Only Transformers via Unrolled Subspace Denoising - Score: 17 (R=9, N=8) - Date: 2025-06-05 - Comment: The paper proposes a fully interpretable transformer architecture using only self-attention operators, which is relevant to model architecture innovations.
-
Memory-Efficient and Privacy-Preserving Collaborative Training for Mixture-of-Experts LLMs - Score: 17 (R=9, N=8) - Date: 2025-06-04 - Comment: The paper introduces a privacy-preserving collaborative training framework for MoE LLMs, which is relevant to model architecture and LLMs.
-
Asymptotics of SGD in Sequence-Single Index Models and Single-Layer Attention Networks - Score: 17 (R=9, N=8) - Date: 2025-06-04 - Comment: The paper provides theoretical insights into the dynamics of SGD in sequence models and attention networks, relevant to representation learning and model architecture.
-
HATA: Trainable and Hardware-Efficient Hash-Aware Top-k Attention for Scalable Large Model Inference - Score: 17 (R=9, N=8) - Date: 2025-06-04 - Comment: The paper introduces a novel attention mechanism for LLMs, which is relevant to foundational research in model architecture and efficiency.
-
Esoteric Language Models - Score: 17 (R=9, N=8) - Date: 2025-06-03 - Comment: The paper introduces a new family of models, Eso-LMs, which combines AR and MDM paradigms and introduces KV caching for MDMs, aligning with foundational research in LLM architecture.
-
SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model - Score: 17 (R=9, N=8) - Date: 2025-06-03 - Comment: The paper introduces a Mixture of Experts (MoE) model for DNA representation learning, which is relevant to both representation learning and model architecture.
-
Latent Structured Hopfield Network for Semantic Association and Retrieval - Score: 17 (R=9, N=8) - Date: 2025-06-03 - Comment: The paper introduces a biologically inspired framework integrating Hopfield networks into an autoencoder architecture, which aligns with representation learning and model architecture.
-
Incorporating Hierarchical Semantics in Sparse Autoencoder Architectures - Score: 17 (R=9, N=8) - Date: 2025-06-03 - Comment: The paper introduces a modified sparse autoencoder architecture incorporating hierarchical semantics, relevant to representation learning and model architecture.
-
Attention Retrieves, MLP Memorizes: Disentangling Trainable Components in the Transformer - Score: 17 (R=9, N=8) - Date: 2025-06-03 - Comment: The paper analyzes the roles of attention and MLP in Transformers, providing insights into model architecture.
-
Blending Complementary Memory Systems in Hybrid Quadratic-Linear Transformers - Score: 17 (R=9, N=8) - Date: 2025-06-03 - Comment: The paper discusses hybrid memory architectures in transformers, which is relevant to model architecture innovations.
-
Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis - Score: 17 (R=9, N=8) - Date: 2025-06-02 - Comment: The paper explores the interpretability of Mixture-of-Experts (MoE) models, providing insights into their architecture and efficiency, which aligns with the model architecture criterion.
-
Differential Gated Self-Attention - Score: 17 (R=9, N=8) - Date: 2025-06-02 - Comment: The paper proposes a novel input-dependent gating mechanism for self-attention in Transformers, which is relevant to model architecture innovations.
-
Diversity of Transformer Layers: One Aspect of Parameter Scaling Laws - Score: 17 (R=9, N=8) - Date: 2025-06-02 - Comment: The paper investigates the diversity of Transformer layers and their impact on parameter scaling laws, providing insights into model architecture and theoretical understanding of Transformers.
-
Utility-Driven Speculative Decoding for Mixture-of-Experts - Score: 16 (R=9, N=7) - Date: 2025-06-27 - Comment: The paper discusses speculative decoding in Mixture-of-Experts (MoE) models, focusing on optimizing token throughput and reducing data movement, which aligns with the Model Architecture criterion.
-
Enhancing Large Language Models through Structured Reasoning - Score: 16 (R=9, N=7) - Date: 2025-06-26 - Comment: The paper proposes structured reasoning to enhance LLMs, which is relevant to foundational research in LLM architecture and theoretical insights.
-
Exploring Speaker Diarization with Mixture of Experts - Score: 16 (R=9, N=7) - Date: 2025-06-18 - Comment: The paper explores speaker diarization with a Mixture of Experts (MoE) approach, which aligns with model architecture by introducing MoE in speaker diarization.
-
Single-Example Learning in a Mixture of GPDMs with Latent Geometries - Score: 16 (R=9, N=7) - Date: 2025-06-18 - Comment: The paper presents a mixture-of-experts framework using Gaussian process dynamical models, which aligns with the model architecture criterion.
-
Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources? - Score: 16 (R=9, N=7) - Date: 2025-06-17 - Comment: The paper investigates whether Mixture-of-Experts models can surpass dense LLMs under equal resources, relevant to model architecture and offering insights into MoE performance.
-
MoTE: Mixture of Task-specific Experts for Pre-Trained Model-Based Class-incremental Learning - Score: 16 (R=9, N=7) - Date: 2025-06-16 - Comment: The paper proposes MoTE, a mixture of task-specific experts framework, aligning with the Model Architecture criterion focusing on Mixture-of-Experts.
-
Adaptive Preconditioners Trigger Loss Spikes in Adam - Score: 16 (R=9, N=7) - Date: 2025-06-06 - Comment: The paper investigates the mechanism behind loss spikes in the Adam optimizer, providing insights into training dynamics in neural networks.
-
Tug-of-war between idiom's figurative and literal meanings in LLMs - Score: 16 (R=9, N=7) - Date: 2025-06-03 - Comment: The paper provides mechanistic insights into how LLMs process idioms, which aligns with the interest in theoretical insights into LLM behavior.
-
Mixture of Experts Provably Detect and Learn the Latent Cluster Structure in Gradient-Based Learning - Score: 16 (R=9, N=7) - Date: 2025-06-03 - Comment: The paper provides a theoretical study on the sample and runtime complexity of MoE, aligning with foundational research in model architecture.
-
QuKAN: A Quantum Circuit Born Machine approach to Quantum Kolmogorov Arnold Networks - Score: 16 (R=8, N=8) - Date: 2025-06-30 - Comment: The paper explores Quantum Kolmogorov Arnold Networks, introducing a novel architecture in quantum machine learning, which is relevant to model architecture innovations.
-
Unveiling Causal Reasoning in Large Language Models: Reality or Mirage? - Score: 16 (R=8, N=8) - Date: 2025-06-27 - Comment: The paper examines causal reasoning in LLMs, introducing a new benchmark and a method to enhance reasoning capabilities, which is relevant to theoretical insights into LLM behavior.
-
Finding Clustering Algorithms in the Transformer Architecture - Score: 16 (R=8, N=8) - Date: 2025-06-25 - Comment: The paper explores the implementation of clustering algorithms within the transformer architecture, providing insights into model architecture and representation learning.
-
BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning - Score: 16 (R=8, N=8) - Date: 2025-06-23 - Comment: The paper addresses limitations in the SFT + RL paradigm for small language models, proposing a novel method that could impact LLM training strategies.
-
Generating Directed Graphs with Dual Attention and Asymmetric Encoding - Score: 16 (R=8, N=8) - Date: 2025-06-23 - Comment: The paper presents a novel generative model for directed graphs using dual attention and asymmetric encoding, relevant to model architecture innovations.
-
SeqPE: Transformer with Sequential Position Encoding - Score: 16 (R=8, N=8) - Date: 2025-06-17 - Comment: The paper introduces SeqPE, a novel position encoding framework for transformers, which is relevant to model architecture and efficiency.
-
QiMeng-Attention: SOTA Attention Operator is generated by SOTA Attention Algorithm - Score: 16 (R=8, N=8) - Date: 2025-06-17 - Comment: The paper introduces QiMeng-Attention, a self-optimizing paradigm for generating high-performance attention operators, which is relevant to model architecture and efficiency.
-
Brewing Knowledge in Context: Distillation Perspectives on In-Context Learning - Score: 16 (R=8, N=8) - Date: 2025-06-16 - Comment: The paper offers a theoretical perspective on in-context learning as knowledge distillation, relevant to understanding LLM behavior.
-
Tversky Neural Networks: Psychologically Plausible Deep Learning with Differentiable Tversky Similarity - Score: 16 (R=8, N=8) - Date: 2025-06-16 - Comment: The paper introduces a novel neural network building block based on Tversky's similarity, which is relevant to model architecture innovations.
-
Foundation Models for Causal Inference via Prior-Data Fitted Networks - Score: 16 (R=8, N=8) - Date: 2025-06-13 - Comment: The paper introduces a framework for training foundation models for causal inference, which aligns with foundational research in AI for science.
-
Geometric Regularity in Deterministic Sampling of Diffusion-based Generative Models - Score: 16 (R=8, N=8) - Date: 2025-06-13 - Comment: The paper reveals geometric regularity in diffusion-based generative models, which aligns with emerging trends in theoretical insights.
-
Edit Flows: Flow Matching with Edit Operations - Score: 16 (R=8, N=8) - Date: 2025-06-11 - Comment: The paper proposes Edit Flows, a non-autoregressive model using edit operations, which is relevant to model architecture innovations.
-
Protriever: End-to-End Differentiable Protein Homology Search for Fitness Prediction - Score: 16 (R=8, N=8) - Date: 2025-06-11 - Comment: The paper introduces a new framework for protein homology search, which is foundational research in molecular modeling.
-
Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion - Score: 16 (R=8, N=8) - Date: 2025-06-11 - Comment: The paper explains the superior performance of masking diffusion in discrete diffusion models, which is relevant to emerging trends in model architecture.
-
MEMOIR: Lifelong Model Editing with Minimal Overwrite and Informed Retention for LLMs - Score: 16 (R=8, N=8) - Date: 2025-06-10 - Comment: The paper introduces a novel framework for lifelong model editing in LLMs, which aligns with foundational research in LLM behavior and architecture.
-
United Minds or Isolated Agents? Exploring Coordination of LLMs under Cognitive Load Theory - Score: 16 (R=8, N=8) - Date: 2025-06-10 - Comment: The paper introduces a novel multi-agent framework for LLMs based on Cognitive Load Theory, which aligns with foundational research in LLM behavior and architecture.
-
RhoDARTS: Differentiable Quantum Architecture Search with Density Matrix Simulations - Score: 16 (R=8, N=8) - Date: 2025-06-05 - Comment: The paper introduces a novel differentiable Quantum Architecture Search algorithm, which aligns with the Model Architecture criterion by proposing a new method for identifying effective quantum neural network architectures.
-
Purifying Shampoo: Investigating Shampoo's Heuristics by Decomposing its Preconditioner - Score: 16 (R=8, N=8) - Date: 2025-06-05 - Comment: The paper investigates heuristics in the Shampoo optimization algorithm, focusing on Kronecker-factorization-based training algorithms, which is relevant to model architecture and efficiency improvements.
-
A Foundation Model for Spatial Proteomics - Score: 16 (R=8, N=8) - Date: 2025-06-05 - Comment: The paper presents KRONOS, a foundation model for spatial proteomics, which aligns with the AI for Science criterion by introducing a new generative paradigm for spatial proteomics analysis.
-
PMNO: A novel physics guided multi-step neural operator predictor for partial differential equations - Score: 16 (R=8, N=8) - Date: 2025-06-03 - Comment: The paper presents a novel neural operator architecture for PDEs, which is relevant to foundational research in AI for science.
-
Instruction Learning Paradigms: A Dual Perspective on White-box and Black-box LLMs - Score: 15 (R=8, N=7) - Date: 2025-06-30 - Comment: The paper presents a novel framework merging white-box and black-box LLMs, which could offer insights into LLM behavior and architecture.
-
Multiple Streams of Relation Extraction: Enriching and Recalling in Transformers - Score: 15 (R=8, N=7) - Date: 2025-06-27 - Comment: The paper investigates how LLMs learn and recall relation information during fine-tuning, providing insights into LLM behavior and interpretability.
-
Mastering Multiple-Expert Routing: Realizable $H$-Consistency and Strong Guarantees for Learning to Defer - Score: 15 (R=8, N=7) - Date: 2025-06-26 - Comment: The paper introduces novel surrogate loss functions and algorithms for learning to defer with multiple experts, which is relevant to model architecture and efficiency.
-
Argumentative Ensembling for Robust Recourse under Model Multiplicity - Score: 15 (R=8, N=7) - Date: 2025-06-26 - Comment: The paper addresses model multiplicity and proposes an argumentative ensembling method, which is relevant to model architecture and theoretical insights into model behavior.
-
Any-Order GPT as Masked Diffusion Model: Decoupling Formulation and Architecture - Score: 15 (R=8, N=7) - Date: 2025-06-26 - Comment: The paper explores masked diffusion models within a decoder-only framework, offering insights into architectural influences and paradigm differences, which aligns with the model architecture criterion.
-
DualEquiNet: A Dual-Space Hierarchical Equivariant Network for Large Biomolecules - Score: 15 (R=8, N=7) - Date: 2025-06-26 - Comment: The paper introduces a novel network architecture for biomolecular modeling, which aligns with the AI for Science criterion, focusing on foundational research in molecular modeling.
-
The Effect of Depth on the Expressivity of Deep Linear State-Space Models - Score: 15 (R=8, N=7) - Date: 2025-06-25 - Comment: The paper investigates the expressivity of deep linear state-space models, which aligns with foundational research in model architecture by exploring the effects of depth and width.
-
On the algorithmic construction of deep ReLU networks - Score: 15 (R=8, N=7) - Date: 2025-06-25 - Comment: The paper explores the algorithmic construction of deep ReLU networks, providing insights into model architecture and expressivity, aligning with the core topic of model architecture.
-
Steering Conceptual Bias via Transformer Latent-Subspace Activation - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper explores steering conceptual bias in LLMs via latent-subspace activation, which is relevant to model architecture and LLM behavior.
-
Focus Your Attention: Towards Data-Intuitive Lightweight Vision Transformers - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper proposes a novel lightweight Vision Transformer architecture, which is relevant to model architecture innovations.
-
Shift Happens: Mixture of Experts based Continual Adaptation in Federated Learning - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper introduces a mixture of experts framework for federated learning, which is relevant to model architecture, specifically MoE.
-
DDOT: A Derivative-directed Dual-decoder Ordinary Differential Equation Transformer for Dynamic System Modeling - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper introduces DDOT, a novel transformer-based model for reconstructing ODEs, which aligns with the model architecture criterion by proposing a new architecture for dynamic system modeling.
-
Mechanistic Interpretability in the Presence of Architectural Obfuscation - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper examines the impact of architectural obfuscation on mechanistic interpretability, relevant to model architecture and LLMs.
-
HalluRNN: Mitigating Hallucinations via Recurrent Cross-Layer Reasoning in Large Vision-Language Models - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper introduces an architecture-level solution to mitigate hallucinations in LVLMs, which aligns with the interest in model architecture innovations.
-
MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: The paper introduces a framework for multimodal reasoning with dynamic multi-expert aggregation, which is relevant to model architecture innovations.
-
Simulating Correlated Electrons with Symmetry-Enforced Normalizing Flows - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: The paper presents a method using normalizing flows for simulating correlated electrons, which is relevant to AI for Science through foundational research in molecular modeling.
-
Exploring and Improving Initialization for Deep Graph Neural Networks: A Signal Propagation Perspective - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: The paper introduces a new initialization method for deep GNNs to enhance signal propagation, which is relevant to model architecture innovations.
-
Mesh-Informed Neural Operator : A Transformer Generative Approach - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: The paper introduces a new generative model architecture, the Mesh-Informed Neural Operator, which is relevant to model architecture innovations.
-
Relational Deep Learning: Challenges, Foundations and Next-Generation Architectures - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: The paper provides a comprehensive review of relational deep learning, discussing foundational challenges and architectural advances, which aligns with model architecture and emerging trends.
-
FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: FLAME introduces a federated learning framework using Sparse Mixture-of-Experts, relevant to model architecture and compression.
-
T-SHRED: Symbolic Regression for Regularization and Model Discovery with Transformer Shallow Recurrent Decoders - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: T-SHRED introduces transformers for temporal encoding and symbolic regression for model regularization, relevant to model architecture and representation learning.
-
MoR: Better Handling Diverse Queries with a Mixture of Sparse, Dense, and Human Retrievers - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: The paper discusses a mixture of retrievers for retrieval-augmented generation, which is relevant to model architecture through the use of mixture models.
-
From Bytes to Ideas: Language Modeling with Autoregressive U-Nets - Score: 15 (R=8, N=7) - Date: 2025-06-18 - Comment: The paper introduces an autoregressive U-Net for language modeling, which is relevant to model architecture innovations.
-
GenerationPrograms: Fine-grained Attribution with Executable Programs - Score: 15 (R=8, N=7) - Date: 2025-06-18 - Comment: The paper introduces a new framework for improving attribution in LLMs, which aligns with the large language models criterion.
-
Object-Centric Neuro-Argumentative Learning - Score: 15 (R=8, N=7) - Date: 2025-06-18 - Comment: The paper introduces a novel architecture combining neural and symbolic components, which aligns with model architecture innovations.
-
ResNets Are Deeper Than You Think - Score: 15 (R=8, N=7) - Date: 2025-06-18 - Comment: The paper provides insights into the function space of residual networks, which is relevant to understanding model architecture.
-
Equivariance Everywhere All At Once: A Recipe for Graph Foundation Models - Score: 15 (R=8, N=7) - Date: 2025-06-18 - Comment: The paper presents a recipe for graph foundation models, aligning with the foundation model criterion.
-
Distinct Computations Emerge From Compositional Curricula in In-Context Learning - Score: 15 (R=8, N=7) - Date: 2025-06-17 - Comment: The paper explores how compositional curricula affect in-context learning in transformers, which is relevant to representation learning and model architecture.
-
Breaking Thought Patterns: A Multi-Dimensional Reasoning Framework for LLMs - Score: 15 (R=8, N=7) - Date: 2025-06-17 - Comment: The paper proposes a novel framework combining MoE and CoT reasoning for LLMs, which is relevant to model architecture and LLMs.
-
MEraser: An Effective Fingerprint Erasure Approach for Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-06-17 - Comment: The paper presents MEraser, a method for removing backdoor-based fingerprints from LLMs, which is relevant to foundational research in LLMs.
-
Group then Scale: Dynamic Mixture-of-Experts Multilingual Language Model - Score: 15 (R=8, N=7) - Date: 2025-06-17 - Comment: The paper proposes a dynamic mixture-of-experts model for multilingual LLMs, addressing the curse of multilinguality, which is relevant to model architecture.
-
Model Merging for Knowledge Editing - Score: 15 (R=8, N=7) - Date: 2025-06-17 - Comment: The paper proposes a model merging framework for knowledge editing in LLMs, which is relevant to foundational research in LLMs.
-
An Attention-based Spatio-Temporal Neural Operator for Evolving Physics - Score: 15 (R=8, N=7) - Date: 2025-06-16 - Comment: The paper introduces a novel architecture combining attention mechanisms for spatio-temporal interactions, which aligns with the model architecture criterion.
-
STRCMP: Integrating Graph Structural Priors with Language Models for Combinatorial Optimization - Score: 15 (R=8, N=7) - Date: 2025-06-16 - Comment: The paper introduces a novel framework integrating graph structural priors with LLMs for combinatorial optimization, aligning with foundational research in model architecture and LLMs.
-
Principled Approaches for Extending Neural Architectures to Function Spaces for Operator Learning - Score: 15 (R=8, N=7) - Date: 2025-06-13 - Comment: The paper discusses extending neural architectures to function spaces for operator learning, which is relevant to foundational research in model architecture.
-
Self-Adapting Language Models - Score: 15 (R=8, N=7) - Date: 2025-06-13 - Comment: The paper introduces a framework for self-adapting LLMs, which is relevant to foundational research in LLM architecture and behavior.
-
Lattice Climber Attack: Adversarial attacks for randomized mixtures of classifiers - Score: 15 (R=8, N=7) - Date: 2025-06-13 - Comment: The paper discusses adversarial attacks on mixtures of classifiers, which is relevant to mixture-of-experts and model robustness.
-
Box-Constrained Softmax Function and Its Application for Post-Hoc Calibration - Score: 15 (R=8, N=7) - Date: 2025-06-13 - Comment: The paper introduces a novel box-constrained softmax function, which is a foundational contribution to model architecture and calibration methods.
-
Pisces: An Auto-regressive Foundation Model for Image Understanding and Generation - Score: 15 (R=8, N=7) - Date: 2025-06-13 - Comment: The paper introduces a novel decoupled visual encoding architecture for a multimodal foundation model, which relates to model architecture innovations.
-
AlphaFold Database Debiasing for Robust Inverse Folding - Score: 15 (R=8, N=7) - Date: 2025-06-11 - Comment: The paper introduces a Debiasing Structure AutoEncoder for protein structure modeling, which is relevant to foundational research in AI for Science.
-
SEMA: a Scalable and Efficient Mamba like Attention via Token Localization and Averaging - Score: 15 (R=8, N=7) - Date: 2025-06-11 - Comment: The paper introduces a new attention mechanism, SEMA, which is relevant to model architecture innovations.
-
Vision Transformers Don't Need Trained Registers - Score: 15 (R=8, N=7) - Date: 2025-06-10 - Comment: The paper proposes a training-free approach to improve Vision Transformers, relevant to model architecture.
-
Return of ChebNet: Understanding and Improving an Overlooked GNN on Long Range Tasks - Score: 15 (R=8, N=7) - Date: 2025-06-10 - Comment: The paper revisits ChebNet, a foundational GNN architecture, and proposes improvements for long-range tasks, aligning with model architecture analysis.
-
CoCoA-Mix: Confusion-and-Confidence-Aware Mixture Model for Context Optimization - Score: 15 (R=8, N=7) - Date: 2025-06-10 - Comment: The paper proposes a mixture model for context optimization, which is related to model architecture innovations.
-
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis - Score: 15 (R=8, N=7) - Date: 2025-06-10 - Comment: The paper presents STARFlow, a scalable generative model using normalizing flows and Transformers, which aligns with the model architecture criterion.
-
ENMA: Tokenwise Autoregression for Generative Neural PDE Operators - Score: 15 (R=8, N=7) - Date: 2025-06-09 - Comment: The paper introduces a generative neural operator for PDEs, which is relevant to AI for Science as it proposes a new generative paradigm for modeling spatio-temporal dynamics.
-
Flow-Attentional Graph Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-06-09 - Comment: The paper introduces flow attention in GNNs, which is an architectural innovation relevant to model architecture analysis.
-
Tensor-to-Tensor Models with Fast Iterated Sum Features - Score: 15 (R=8, N=7) - Date: 2025-06-09 - Comment: The paper introduces a novel tensor-to-tensor layer for neural networks, which is relevant to model architecture as it proposes a new layer type with efficiency improvements.
-
Dynamic Mixture of Progressive Parameter-Efficient Expert Library for Lifelong Robot Learning - Score: 15 (R=8, N=7) - Date: 2025-06-09 - Comment: The paper proposes a dynamic mixture of experts for lifelong robot learning, focusing on parameter-efficient learning and knowledge sharing, which is relevant to model architecture and efficiency.
-
On Measuring Long-Range Interactions in Graph Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-06-09 - Comment: The paper formalizes long-range interactions in graph neural networks, which is relevant to model architecture analysis and provides theoretical insights.
-
Topology-aware Neural Flux Prediction Guided by Physics - Score: 15 (R=8, N=7) - Date: 2025-06-09 - Comment: The paper proposes a novel framework for GNNs to capture topological differences, which is relevant to model architecture.
-
UniPTMs: The First Unified Multi-type PTM Site Prediction Model via Master-Slave Architecture-Based Multi-Stage Fusion Strategy and Hierarchical Contrastive Loss - Score: 15 (R=8, N=7) - Date: 2025-06-09 - Comment: The paper introduces a novel architecture for PTM site prediction, which involves architectural innovations like a 'Master-Slave' dual-path collaborative architecture.
-
Mixture-of-Experts Meets In-Context Reinforcement Learning - Score: 15 (R=8, N=7) - Date: 2025-06-09 - Comment: The paper introduces a mixture-of-experts framework for in-context reinforcement learning, which is relevant to model architecture innovations like MoE.
-
Exploring Diffusion Transformer Designs via Grafting - Score: 15 (R=8, N=7) - Date: 2025-06-06 - Comment: The paper explores diffusion transformer designs via grafting, which aligns with model architecture innovations, particularly in the context of transformers.
-
Power Law Guided Dynamic Sifting for Efficient Attention - Score: 15 (R=8, N=7) - Date: 2025-06-06 - Comment: The paper proposes SiftAttention, an efficient attention method for LLMs, which aligns with model architecture innovations by improving memory bandwidth usage.
-
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training - Score: 15 (R=8, N=7) - Date: 2025-06-06 - Comment: The paper introduces MesaNet, a sequence modeling approach with a novel layer derived from an in-context regression objective, contributing to model architecture innovations.
-
Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts - Score: 15 (R=8, N=7) - Date: 2025-06-06 - Comment: The paper introduces Diagonal Batching to improve parallelism in Recurrent Memory Transformers, aligning with model architecture and efficiency improvements.
-
The Oversmoothing Fallacy: A Misguided Narrative in GNN Research - Score: 15 (R=8, N=7) - Date: 2025-06-06 - Comment: The paper challenges the oversmoothing narrative in GNN research, providing insights into deep GNN architectures, which aligns with the model architecture criterion.
-
HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation - Score: 15 (R=8, N=7) - Date: 2025-06-06 - Comment: The paper introduces HMAR, a new image generation algorithm, which aligns with model architecture innovations, particularly in auto-regressive models.
-
Half-Layered Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-06-06 - Comment: The paper proposes a new architectural concept with 'half' layers, which is relevant to model architecture innovations.
-
Rethinking the Stability-Plasticity Trade-off in Continual Learning from an Architectural Perspective - Score: 15 (R=8, N=7) - Date: 2025-06-05 - Comment: The paper addresses the stability-plasticity trade-off in continual learning from an architectural perspective, which aligns with the model architecture criterion.
-
RadialRouter: Structured Representation for Efficient and Robust Large Language Models Routing - Score: 15 (R=8, N=7) - Date: 2025-06-05 - Comment: The paper introduces RadialRouter, a novel framework for LLM routing using a Transformer-based backbone, which aligns with the interest in model architecture innovations.
-
ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices - Score: 15 (R=8, N=7) - Date: 2025-06-05 - Comment: The paper proposes a new method for positional encoding in Transformers, which is relevant to model architecture innovations.
-
Out-of-Distribution Graph Models Merging - Score: 15 (R=8, N=7) - Date: 2025-06-05 - Comment: The paper proposes a novel approach to merge out-of-distribution graph models using a MoE module, which is relevant to model architecture and representation learning.
-
Multi-Exit Kolmogorov-Arnold Networks: enhancing accuracy and parsimony - Score: 15 (R=8, N=7) - Date: 2025-06-05 - Comment: The paper introduces multi-exit KANs, enhancing model architecture by enabling predictions at multiple depths, which is relevant to architectural innovations.
-
Bridging Neural ODE and ResNet: A Formal Error Bound for Safety Verification - Score: 15 (R=8, N=7) - Date: 2025-06-05 - Comment: The paper establishes a formal relationship between neural ODEs and ResNets, which is relevant to model architecture analysis.
-
Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper proposes Sparse-vDiT, a framework for accelerating video diffusion transformers using sparse attention, which relates to model compression and efficiency.
-
Sheaves Reloaded: A Directional Awakening - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper introduces Directed Sheaf Neural Networks, which relates to model architecture innovations.
-
On Universality Classes of Equivariant Networks - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper investigates the universality classes of equivariant networks, which relates to model architecture by exploring the expressivity and approximation power of these networks.
-
A Tale of Two Symmetries: Exploring the Loss Landscape of Equivariant Models - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper provides a theoretical analysis of the loss landscape of equivariant models, which aligns with foundational research in model architecture.
-
Revisiting LRP: Positional Attribution as the Missing Ingredient for Transformer Explainability - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper reformulates LRP for Transformer explainability by incorporating positional encoding, which is relevant to model architecture and explainability.
-
Quotient Network -- A Network Similar to ResNet but Learning Quotients - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper introduces a new network architecture called Quotient Network, which is relevant to model architecture innovations.
-
Unlocking Personalized Knowledge in Federated Large Language Model: The Power of Mixture of Experts - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper discusses the use of Mixture of Experts in federated learning, which is relevant to model architecture and sparsity.
-
From Local Cues to Global Percepts: Emergent Gestalt Organization in Self-Supervised Vision Models - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper explores how Vision Transformers trained with Masked Autoencoding exhibit Gestalt-like perception, which relates to representation learning and model architecture analysis.
-
MOFGPT: Generative Design of Metal-Organic Frameworks using Language Models - Score: 15 (R=8, N=7) - Date: 2025-06-03 - Comment: The paper presents a generative design framework for MOFs using LLMs, which is relevant to AI for Science with a focus on foundational research in molecular modeling.
-
Control-R: Towards controllable test-time scaling - Score: 15 (R=8, N=7) - Date: 2025-06-03 - Comment: The paper introduces a novel test-time approach for reasoning in LRMs, which aligns with model architecture innovations through conditional networks.
-
PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations - Score: 15 (R=8, N=7) - Date: 2025-06-02 - Comment: The paper introduces a new transformer architecture for physics simulations, which aligns with model architecture innovations.
-
Mixture-of-Experts for Personalized and Semantic-Aware Next Location Prediction - Score: 15 (R=8, N=7) - Date: 2025-06-02 - Comment: The paper introduces a novel framework NextLocMoE using Mixture-of-Experts (MoE) for next location prediction, focusing on personalized and semantic-aware predictions. It provides insights into MoE architecture, which aligns with the model architecture criterion.
-
Cross-Attention Speculative Decoding - Score: 15 (R=8, N=7) - Date: 2025-06-02 - Comment: The paper presents a cross-attention-based speculative decoding model, which is relevant to model architecture innovations in LLMs.
-
Mixpert: Mitigating Multimodal Learning Conflicts with Efficient Mixture-of-Vision-Experts - Score: 15 (R=8, N=7) - Date: 2025-06-02 - Comment: The paper introduces Mixpert, a mixture-of-vision-experts architecture, which is relevant to model architecture innovations, particularly in multimodal learning.
-
Weisfeiler and Leman Follow the Arrow of Time: Expressive Power of Message Passing in Temporal Event Graphs - Score: 15 (R=8, N=7) - Date: 2025-06-02 - Comment: The paper introduces a novel message passing scheme for temporal graph neural networks, focusing on the expressive power of TGNNs, which is relevant to model architecture analysis.
-
Graph Flow Matching: Enhancing Image Generation with Neighbor-Aware Flow Fields - Score: 15 (R=8, N=7) - Date: 2025-06-02 - Comment: The paper proposes Graph Flow Matching, enhancing image generation with a novel architecture that combines flow matching with graph neural networks, relevant to model architecture innovations.
-
Cartan Networks: Group theoretical Hyperbolic Deep Learning - Score: 15 (R=8, N=7) - Date: 2025-06-02 - Comment: The paper introduces a novel class of hyperbolic deep learning architectures, which is relevant to model architecture innovations.
-
Mamba Knockout for Unraveling Factual Information Flow - Score: 15 (R=8, N=7) - Date: 2025-06-02 - Comment: The paper explores information flow in Mamba SSM-based language models, relevant to understanding LLM behavior.
-
Revisiting Uncertainty Estimation and Calibration of Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-06-02 - Comment: The paper evaluates uncertainty estimation in LLMs, focusing on Mixture-of-Experts architectures, which is relevant to large language models and model architecture.
Model Compression and Efficiency (161)
-
RL for Reasoning by Adaptively Revealing Rationales - Score: 20.0 (R=0, N=0) - Date: 2025-06-24 - Comment: Author match
-
AlphaEvolve: A coding agent for scientific and algorithmic discovery - Score: 18 (R=9, N=9) - Date: 2025-06-17 - Comment: The paper introduces AlphaEvolve, an evolutionary coding agent for scientific and algorithmic discovery, which is relevant to AI for Science and emerging trends.
-
QuickSilver -- Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization - Score: 17 (R=9, N=8) - Date: 2025-06-30 - Comment: QuickSilver introduces a modular framework for LLM inference optimization, including KV Cache Skipping and Adaptive Matryoshka Quantization, which are relevant to model compression and efficiency.
-
Probabilistic Optimality for Inference-time Scaling - Score: 17 (R=9, N=8) - Date: 2025-06-30 - Comment: The paper proposes a probabilistic framework for inference-time scaling in LLMs, providing a theoretical foundation for efficient scaling, which is relevant to model efficiency and LLM behavior.
-
Score-Based Model for Low-Rank Tensor Recovery - Score: 17 (R=9, N=8) - Date: 2025-06-30 - Comment: The paper proposes a score-based model for low-rank tensor recovery, which is relevant to model compression and low-rank approaches.
-
Data Efficacy for Language Model Training - Score: 17 (R=9, N=8) - Date: 2025-06-30 - Comment: The paper introduces a new paradigm, DELT, for data efficacy in language model training, which is foundational for improving training dynamics and efficiency in LLMs.
-
Linearity-based neural network compression - Score: 17 (R=9, N=8) - Date: 2025-06-27 - Comment: The paper introduces a novel linearity-based compression method for neural networks, which aligns with the model compression criterion through its focus on reducing model size.
-
Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning - Score: 17 (R=9, N=8) - Date: 2025-06-27 - Comment: The paper proposes a novel approach to continual learning using sparse mixture-of-rank adaptive learning, which is relevant to model architecture and efficiency.
-
DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs - Score: 17 (R=9, N=8) - Date: 2025-06-26 - Comment: The paper introduces a novel framework for dual sparsity in LLMs, focusing on pruning and activation sparsity, which aligns with the model compression criterion.
-
First-Order Sparse Convex Optimization: Better Rates with Sparse Updates - Score: 17 (R=9, N=8) - Date: 2025-06-25 - Comment: The paper discusses sparse updates in convex optimization, which aligns with the model compression topic through sparsity and efficiency improvements.
-
Safe Pruning LoRA: Robust Distance-Guided Pruning for Safety Alignment in Adaptation of LLMs - Score: 17 (R=9, N=8) - Date: 2025-06-25 - Comment: The paper introduces a novel pruning-based approach for improving safety alignment in LLMs, which aligns with model compression and efficiency breakthroughs.
-
CommVQ: Commutative Vector Quantization for KV Cache Compression - Score: 17 (R=9, N=8) - Date: 2025-06-24 - Comment: The paper presents a novel method for KV cache compression using commutative vector quantization, which is relevant to model compression and efficiency breakthroughs.
-
UltraSketchLLM: Saliency-Driven Sketching for Ultra-Low Bit LLM Compression - Score: 17 (R=9, N=8) - Date: 2025-06-24 - Comment: The paper presents UltraSketchLLM, a novel framework for ultra-low bit compression of LLMs, aligning with the model compression criterion.
-
Revisiting LoRA through the Lens of Parameter Redundancy: Spectral Encoding Helps - Score: 17 (R=9, N=8) - Date: 2025-06-23 - Comment: The paper introduces SeLoRA, a novel approach to reduce parameter redundancy in LoRA, contributing to model compression and efficiency.
-
Joint Tensor-Train Parameterization for Efficient and Expressive Low-Rank Adaptation - Score: 17 (R=9, N=8) - Date: 2025-06-23 - Comment: The paper proposes a novel tensor-train-guided adaptation framework for low-rank adaptation, relevant to model compression through low-rank approaches.
-
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights - Score: 17 (R=9, N=8) - Date: 2025-06-23 - Comment: The paper introduces a novel approach to parameter-efficient fine-tuning of LLMs using prompt-conditioned parameter generation, which is relevant to model compression and efficiency.
-
Attribution-guided Pruning for Compression, Circuit Discovery, and Targeted Correction in LLMs - Score: 17 (R=9, N=8) - Date: 2025-06-17 - Comment: The paper presents attribution-guided pruning for LLMs, focusing on model compression and interpretability, which aligns with model compression and efficiency breakthroughs.
-
Training Neural Networks by Optimizing Neuron Positions - Score: 17 (R=9, N=8) - Date: 2025-06-17 - Comment: The paper proposes a parameter-efficient neural architecture by optimizing neuron positions, which is relevant to model compression and efficiency.
-
AdaLRS: Loss-Guided Adaptive Learning Rate Search for Efficient Foundation Model Pretraining - Score: 17 (R=9, N=8) - Date: 2025-06-17 - Comment: The paper proposes AdaLRS, an adaptive learning rate search algorithm for foundation model pretraining, which is relevant to foundational research in model training dynamics.
-
MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-06-17 - Comment: The paper introduces a novel linear-space probabilistic framework for achieving (N:M)-sparsity in LLMs, which is relevant to model compression through sparsity and pruning.
-
Meta Pruning via Graph Metanetworks : A Meta Learning Framework for Network Pruning - Score: 17 (R=9, N=8) - Date: 2025-06-17 - Comment: The paper proposes a novel meta-learning framework for network pruning, which is relevant to model compression and introduces a new approach using metanetworks.
-
FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation - Score: 17 (R=9, N=8) - Date: 2025-06-16 - Comment: The paper introduces FIMA-Q, a post-training quantization method for Vision Transformers using Fisher Information Matrix approximation, contributing to model compression.
-
Dynamic Sparse Training of Diagonally Sparse Networks - Score: 17 (R=9, N=8) - Date: 2025-06-16 - Comment: The paper proposes a novel structured sparse-to-sparse training method, which aligns with the model compression criterion.
-
Boost Post-Training Quantization via Null Space Optimization for Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-06-16 - Comment: The paper introduces a novel approach to post-training quantization for large language models using null space optimization, which is a significant contribution to model compression.
-
AWP: Activation-Aware Weight Pruning and Quantization with Projected Gradient Descent - Score: 17 (R=9, N=8) - Date: 2025-06-13 - Comment: The paper focuses on model compression through activation-aware weight pruning and quantization, which aligns with the model compression criterion.
-
sparseGeoHOPCA: A Geometric Solution to Sparse Higher-Order PCA Without Covariance Estimation - Score: 17 (R=9, N=8) - Date: 2025-06-11 - Comment: The paper presents sparseGeoHOPCA, a novel framework for sparse higher-order PCA, which is relevant to model compression and representation learning.
-
Diversity-Guided MLP Reduction for Efficient Large Vision Transformers - Score: 17 (R=9, N=8) - Date: 2025-06-11 - Comment: The paper proposes a method for MLP reduction in vision transformers, which is relevant to model compression and efficiency.
-
Optimization over Sparse Support-Preserving Sets: Two-Step Projection with Global Optimality Guarantees - Score: 17 (R=9, N=8) - Date: 2025-06-11 - Comment: The paper presents a new sparse optimization algorithm with global optimality guarantees, relevant to model compression through sparsity.
-
Draft-based Approximate Inference for LLMs - Score: 17 (R=9, N=8) - Date: 2025-06-11 - Comment: The paper proposes a novel framework for approximate LLM inference using draft models, which relates to model efficiency and compression. It introduces new methods for inference acceleration, aligning with foundational research in LLMs.
-
Highly Compressed Tokenizer Can Generate Without Training - Score: 17 (R=9, N=8) - Date: 2025-06-11 - Comment: The paper discusses a highly compressed tokenizer with vector quantization, which relates to model compression and efficiency. It introduces a novel approach to image generation without training a generative model, aligning with foundational research in representation learning.
-
Hyperpruning: Efficient Search through Pruned Variants of Recurrent Neural Networks Leveraging Lyapunov Spectrum - Score: 17 (R=9, N=8) - Date: 2025-06-10 - Comment: The paper introduces a novel Lyapunov Spectrum-based metric for efficient pruning in recurrent neural networks, aligning with the model compression criterion.
-
Squeeze3D: Your 3D Generation Model is Secretly an Extreme Neural Compressor - Score: 17 (R=9, N=8) - Date: 2025-06-10 - Comment: The paper presents a novel framework for 3D data compression using pre-trained generative models, aligning with model compression and efficiency.
-
MiniCPM4: Ultra-Efficient LLMs on End Devices - Score: 17 (R=9, N=8) - Date: 2025-06-10 - Comment: The paper introduces an efficient LLM for end devices, focusing on model architecture and efficiency improvements.
-
Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance - Score: 17 (R=9, N=8) - Date: 2025-06-10 - Comment: The paper introduces a new inference scaling paradigm for LLM safety assurance, which is relevant to foundational research in LLMs.
-
Cartridges: Lightweight and general-purpose long context representations via self-study - Score: 17 (R=9, N=8) - Date: 2025-06-09 - Comment: The paper introduces a method for efficient long context representations, relevant to model compression and efficiency.
-
BAQ: Efficient Bit Allocation Quantization for Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-06-09 - Comment: The paper proposes an efficient bit allocation quantization method for LLMs, which is relevant to model compression and efficiency.
-
LFA applied to CNNs: Efficient Singular Value Decomposition of Convolutional Mappings by Local Fourier Analysis - Score: 17 (R=9, N=8) - Date: 2025-06-09 - Comment: The paper proposes a novel approach for efficient singular value decomposition of convolutional mappings, which is relevant to model compression and efficiency.
-
The Generative Leap: Sharp Sample Complexity for Efficiently Learning Gaussian Multi-Index Models - Score: 17 (R=9, N=8) - Date: 2025-06-09 - Comment: The paper discusses efficient learning of Gaussian Multi-index models, focusing on representation learning and sample complexity, which aligns with foundational research in representation learning.
-
Inference-Time Hyper-Scaling with KV Cache Compression - Score: 17 (R=9, N=8) - Date: 2025-06-06 - Comment: The paper introduces Dynamic Memory Sparsification for KV cache compression, aligning with model compression and efficiency improvements.
-
FPTQuant: Function-Preserving Transforms for LLM Quantization - Score: 17 (R=9, N=8) - Date: 2025-06-06 - Comment: The paper introduces FPTQuant, a novel approach for LLM quantization, contributing to model compression with function-preserving transforms.
-
Efficient Knowledge Editing via Minimal Precomputation - Score: 17 (R=9, N=8) - Date: 2025-06-05 - Comment: The paper provides theoretical insights into reducing precomputation in knowledge editing for LLMs, aligning with the LLMs criterion.
-
PoLAR: Polar-Decomposed Low-Rank Adapter Representation - Score: 17 (R=9, N=8) - Date: 2025-06-04 - Comment: The paper proposes a novel low-rank adaptation method for large-scale models, which aligns with model compression and efficiency breakthroughs.
-
QKV Projections Require a Fraction of Their Memory - Score: 17 (R=9, N=8) - Date: 2025-06-04 - Comment: The paper proposes a novel tensor compression technique for QKV projections in attention layers, relevant to model compression.
-
Computational Thresholds in Multi-Modal Learning via the Spiked Matrix-Tensor Model - Score: 17 (R=9, N=8) - Date: 2025-06-04 - Comment: The paper explores a novel approach to multi-modal learning using a spiked matrix-tensor model, which provides insights into training dynamics and inference in high-dimensional settings.
-
Data Pruning by Information Maximization - Score: 17 (R=9, N=8) - Date: 2025-06-04 - Comment: The paper presents a novel data pruning method, InfoMax, which is relevant to model compression through coreset selection and sparsification techniques.
-
VUSA: Virtually Upscaled Systolic Array Architecture to Exploit Unstructured Sparsity in AI Acceleration - Score: 17 (R=9, N=8) - Date: 2025-06-04 - Comment: The paper introduces a novel systolic-array architecture that exploits unstructured sparsity, which is relevant to model compression and efficiency.
-
MLorc: Momentum Low-rank Compression for Large Language Model Adaptation - Score: 17 (R=9, N=8) - Date: 2025-06-03 - Comment: The paper proposes a novel memory-efficient training paradigm for LLMs using momentum low-rank compression, aligning with foundational research in model compression.
-
Unified Scaling Laws for Compressed Representations - Score: 17 (R=9, N=8) - Date: 2025-06-03 - Comment: The paper explores the interplay between scaling laws and compression formats, relevant to model compression and efficiency.
-
Protocol Models: Scaling Decentralized Training with Communication-Efficient Model Parallelism - Score: 17 (R=9, N=8) - Date: 2025-06-03 - Comment: The paper proposes a novel compression algorithm for decentralized training, relevant to model compression and efficiency.
-
Uni-LoRA: One Vector is All You Need - Score: 17 (R=9, N=8) - Date: 2025-06-03 - Comment: The paper presents Uni-LoRA, a framework for parameter-efficient fine-tuning, relevant to model compression and efficiency.
-
Ultra-Quantisation: Efficient Embedding Search via 1.58-bit Encodings - Score: 17 (R=9, N=8) - Date: 2025-06-03 - Comment: The paper introduces a novel quantization method for efficient embedding search, which aligns with model compression and efficiency.
-
FLoE: Fisher-Based Layer Selection for Efficient Sparse Adaptation of Low-Rank Experts - Score: 17 (R=9, N=8) - Date: 2025-06-03 - Comment: The paper proposes a novel PEFT framework for LLMs using MoE-based low-rank adaptation, relevant to model architecture and compression.
-
It Takes a Good Model to Train a Good Model: Generalized Gaussian Priors for Optimized LLMs - Score: 17 (R=9, N=8) - Date: 2025-06-03 - Comment: The paper proposes a framework for LLM optimization using generalized Gaussian priors, which aligns with model compression and efficiency.
-
ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration - Score: 17 (R=9, N=8) - Date: 2025-06-02 - Comment: The paper focuses on KV cache compression, a relevant topic under model compression, introducing a novel method for low-rank compression with minimal performance loss.
-
GradPower: Powering Gradients for Faster Language Model Pre-Training - Score: 17 (R=9, N=8) - Date: 2025-06-02 - Comment: The paper introduces GradPower, a gradient-transformation technique for accelerating language model pre-training, which is relevant to efficiency improvements in LLMs.
-
SALE: Low-bit Estimation for Efficient Sparse Attention in Long-context LLM Prefilling - Score: 17 (R=9, N=8) - Date: 2025-06-02 - Comment: The paper proposes SALE, a sparse attention method for LLMs, focusing on efficiency improvements through quantization and sparse attention, aligning with the model compression criterion.
-
Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution - Score: 17 (R=9, N=8) - Date: 2025-06-02 - Comment: The paper proposes a novel method, SEFT, for fine-tuning sparse LLMs, which is relevant to model compression and efficiency.
-
TSENOR: Highly-Efficient Algorithm for Finding Transposable N:M Sparse Masks - Score: 17 (R=9, N=8) - Date: 2025-06-02 - Comment: The paper presents a novel approach to network pruning with transposable N:M sparsity, which is relevant to model compression.
-
DenseLoRA: Dense Low-Rank Adaptation of Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-06-02 - Comment: DenseLoRA enhances parameter efficiency in low-rank adaptation of LLMs, which is relevant to model compression and large language models.
-
DLP: Dynamic Layerwise Pruning in Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-06-02 - Comment: The paper proposes Dynamic Layerwise Pruning for LLMs, relevant to model compression.
-
Projected Compression: Trainable Projection for Efficient Transformer Compression - Score: 16 (R=9, N=7) - Date: 2025-06-30 - Comment: The paper proposes Projected Compression, a novel model compression technique for Transformers, which aligns with the core topic of model compression.
-
DipSVD: Dual-importance Protected SVD for Efficient LLM Compression - Score: 16 (R=9, N=7) - Date: 2025-06-26 - Comment: The paper proposes a dual-level importance protection mechanism for SVD-based compression, which is relevant to model compression through low-rank approaches.
-
Orthogonal Soft Pruning for Efficient Class Unlearning - Score: 16 (R=9, N=7) - Date: 2025-06-26 - Comment: The paper presents a novel class-aware soft pruning framework, which aligns with the model compression criterion, focusing on pruning techniques.
-
UProp: Investigating the Uncertainty Propagation of LLMs in Multi-Step Agentic Decision-Making - Score: 16 (R=9, N=7) - Date: 2025-06-24 - Comment: The paper introduces UProp, a framework for uncertainty propagation in LLMs, which aligns with the large language models criterion by providing theoretical insights into LLM behavior.
-
BASE-Q: Bias and Asymmetric Scaling Enhanced Rotational Quantization for Large Language Models - Score: 16 (R=9, N=7) - Date: 2025-06-23 - Comment: The paper introduces a new quantization method for LLMs, which is relevant to model compression and efficiency improvements.
-
DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration - Score: 16 (R=9, N=7) - Date: 2025-06-16 - Comment: The paper introduces a dynamic sparse attention mechanism for LLMs, which aligns with the model compression and LLM criteria.
-
Tina: Tiny Reasoning Models via LoRA - Score: 16 (R=9, N=7) - Date: 2025-06-13 - Comment: The paper discusses low-rank adaptation (LoRA) for efficient reasoning in language models, which is relevant to model compression and efficiency.
-
The Geometries of Truth Are Orthogonal Across Tasks - Score: 16 (R=9, N=7) - Date: 2025-06-11 - Comment: The paper discusses the task-dependent nature of 'geometries of truth' in LLMs, providing theoretical insights into LLM behavior.
-
BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing - Score: 16 (R=9, N=7) - Date: 2025-06-05 - Comment: The paper focuses on model compression through quantization and weight indexing, which aligns with the model compression criterion.
-
Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem - Score: 16 (R=9, N=7) - Date: 2025-06-05 - Comment: The paper discusses critique fine-tuning to enhance reasoning in LLMs, which is relevant to understanding and improving LLM behavior, aligning with foundational research in LLMs.
-
Earley-Driven Dynamic Pruning for Efficient Structured Decoding - Score: 16 (R=9, N=7) - Date: 2025-06-03 - Comment: The paper proposes a dynamic pruning strategy for efficient structured decoding, relevant to model compression.
-
Towards an Optimal Control Perspective of ResNet Training - Score: 16 (R=8, N=8) - Date: 2025-06-27 - Comment: The paper proposes a training formulation for ResNets reflecting an optimal control problem, which could lead to a theory-grounded layer pruning strategy.
-
Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference - Score: 16 (R=8, N=8) - Date: 2025-06-27 - Comment: The paper presents a novel Bayesian low-rank adaptation method for LLMs, which aligns with the Model Compression criterion through its focus on low-rank approaches.
-
Non-equilibrium Annealed Adjoint Sampler - Score: 16 (R=8, N=8) - Date: 2025-06-24 - Comment: The paper introduces a novel SOC-based diffusion sampler, which is relevant to emerging trends in foundational research.
-
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective - Score: 16 (R=8, N=8) - Date: 2025-06-24 - Comment: The paper proposes a novel prompt design paradigm for LLMs, which aligns with the large language models criterion by challenging conventional wisdom in LLM prompting.
-
Sharp Generalization Bounds for Foundation Models with Asymmetric Randomized Low-Rank Adapters - Score: 16 (R=8, N=8) - Date: 2025-06-18 - Comment: The paper provides theoretical insights into LoRA with asymmetric randomized low-rank adapters, which is relevant to model compression and efficiency.
-
Spectral Estimation with Free Decompression - Score: 16 (R=8, N=8) - Date: 2025-06-16 - Comment: The paper introduces a novel method for spectral estimation using free decompression, which is a theoretical advancement in handling large matrices.
-
Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs - Score: 16 (R=8, N=8) - Date: 2025-06-13 - Comment: The paper proposes a novel token pruning method for MLLMs, aligning with model compression and efficiency breakthroughs.
-
Interior-Point Vanishing Problem in Semidefinite Relaxations for Neural Network Verification - Score: 16 (R=8, N=8) - Date: 2025-06-13 - Comment: The paper addresses a fundamental challenge in neural network verification using semidefinite relaxations, which is relevant to understanding model behavior and efficiency.
-
Accelerating Constrained Sampling: A Large Deviations Approach - Score: 16 (R=8, N=8) - Date: 2025-06-10 - Comment: The paper discusses a theoretical approach to improve sampling methods, which is relevant to foundational research in model efficiency.
-
Constrained Sampling for Language Models Should Be Easy: An MCMC Perspective - Score: 16 (R=8, N=8) - Date: 2025-06-09 - Comment: The paper proposes a constrained sampling framework for language models, which is relevant to large language models and introduces a novel MCMC-based approach.
-
PCDVQ: Enhancing Vector Quantization for Large Language Models via Polar Coordinate Decoupling - Score: 16 (R=8, N=8) - Date: 2025-06-09 - Comment: The paper proposes a novel vector quantization framework for LLMs, which is relevant to model compression techniques.
-
Leveraging Coordinate Momentum in SignSGD and Muon: Memory-Optimized Zero-Order - Score: 16 (R=8, N=8) - Date: 2025-06-06 - Comment: The paper introduces zero-order optimization methods for fine-tuning LLMs, which is relevant to model compression and efficiency breakthroughs.
-
Retrieval-Augmented Generation as Noisy In-Context Learning: A Unified Theory and Risk Bounds - Score: 16 (R=8, N=8) - Date: 2025-06-04 - Comment: The paper provides a theoretical framework for retrieval-augmented generation, offering insights into LLM behavior, which aligns with the core topics.
-
TAH-QUANT: Effective Activation Quantization in Pipeline Parallelism over Slow Network - Score: 16 (R=8, N=8) - Date: 2025-06-03 - Comment: The paper introduces TAH-Quant, a novel activation quantization framework, relevant to model compression.
-
zip2zip: Inference-Time Adaptive Vocabularies for Language Models via Token Compression - Score: 16 (R=8, N=8) - Date: 2025-06-03 - Comment: The paper introduces a framework for adaptive vocabularies in language models via token compression, which is relevant to model compression.
-
R-KV: Redundancy-aware KV Cache Compression for Training-Free Reasoning Models Acceleration - Score: 16 (R=8, N=8) - Date: 2025-06-02 - Comment: The paper proposes a novel KV cache compression method for reasoning models, relevant to model compression.
-
LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs - Score: 15 (R=8, N=7) - Date: 2025-06-30 - Comment: The paper introduces LLaVA-Scissor, a token compression strategy for video LLMs, aligning with the Model Compression criteria.
-
The Cost of Avoiding Backpropagation - Score: 15 (R=8, N=7) - Date: 2025-06-30 - Comment: The paper provides a comprehensive analysis of forward-mode automatic differentiation and zero-order optimization, which is relevant to model training dynamics and efficiency.
-
Model State Arithmetic for Machine Unlearning - Score: 15 (R=8, N=7) - Date: 2025-06-27 - Comment: The paper proposes a new algorithm for machine unlearning, which is relevant to foundational research in large language models and model efficiency.
-
Lower Bounds on the Size of Markov Equivalence Classes - Score: 15 (R=8, N=7) - Date: 2025-06-27 - Comment: The paper provides theoretical insights into the size of Markov equivalence classes, which is relevant to emerging trends in theoretical work.
-
Characterization and Mitigation of Training Instabilities in Microscaling Formats - Score: 15 (R=8, N=7) - Date: 2025-06-27 - Comment: The paper investigates training instabilities in low-precision formats, which is relevant to Model Compression through its focus on precision and efficiency in training.
-
Q-resafe: Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-06-26 - Comment: The paper addresses safety risks and quantization-aware safety patching for quantized LLMs, which aligns with model compression and efficiency breakthroughs.
-
COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees - Score: 15 (R=8, N=7) - Date: 2025-06-26 - Comment: The paper proposes an uncertainty-guarding selection framework for foundation models, relevant to foundational model research.
-
Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-06-25 - Comment: The paper focuses on pre-training strategies to improve quantization in LLMs, relevant to model compression and efficiency.
-
Dual-Forward Path Teacher Knowledge Distillation: Bridging the Capacity Gap Between Teacher and Student - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper proposes a novel knowledge distillation method, which is relevant to model compression and efficiency.
-
Quantum-Classical Hybrid Quantized Neural Network - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper presents a quantum-classical hybrid quantized neural network, which is relevant to foundational research in model compression and quantization.
-
Log-Normal Multiplicative Dynamics for Stable Low-Precision Training of Large Networks - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper introduces a novel Log-Normal Multiplicative Dynamics algorithm for stable low-precision training, which relates to model compression and efficiency.
-
Faster Low-Rank Approximation and Kernel Ridge Regression via the Block-Nyström Method - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper introduces a novel low-rank approximation method, aligning with the model compression criterion.
-
One Sample is Enough to Make Conformal Prediction Robust - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: The paper proposes a novel approach to robust conformal prediction, which aligns with model compression and efficiency by reducing computational costs.
-
Mitigating Over-Squashing in Graph Neural Networks by Spectrum-Preserving Sparsification - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: The paper proposes a method for mitigating over-squashing in GNNs using spectrum-preserving sparsification, which is relevant to model architecture and sparsity.
-
A Scalable Factorization Approach for High-Order Structured Tensor Recovery - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: The paper presents a scalable factorization approach for tensor recovery, which is relevant to model compression and efficiency through tensor decompositions.
-
LazyEviction: Lagged KV Eviction with Attention Pattern Observation for Efficient Long Reasoning - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: LazyEviction addresses KV cache size in LLMs, relevant to model compression and efficiency, particularly in maintaining reasoning performance.
-
From Local Interactions to Global Operators: Scalable Gaussian Process Operator for Physical Systems - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: The paper introduces a novel scalable Gaussian Process Operator leveraging sparsity and structural information, relevant to model compression and efficiency.
-
SPECS: Faster Test-Time Scaling through Speculative Drafts - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: The paper proposes a latency-aware test-time scaling method for LLMs, which is relevant to model efficiency and compression.
-
MadaKV: Adaptive Modality-Perception KV Cache Eviction for Efficient Multimodal Long-Context Inference - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: The paper introduces MadaKV, a KV cache eviction strategy for multimodal LLMs, relevant to model compression and efficiency.
-
Optimizing Length Compression in Large Reasoning Models - Score: 15 (R=8, N=7) - Date: 2025-06-18 - Comment: The paper introduces a method for optimizing length compression in large reasoning models, which relates to model compression through efficiency improvements.
-
S$^4$C: Speculative Sampling with Syntactic and Semantic Coherence for Efficient Inference of Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-06-18 - Comment: The paper proposes a speculative sampling framework that speeds up LLM inference, which is relevant to efficiency improvements in large language models.
-
Knowledge Compression via Question Generation: Enhancing Multihop Document Retrieval without Fine-tuning - Score: 15 (R=8, N=7) - Date: 2025-06-18 - Comment: The paper focuses on a novel method for knowledge compression, which aligns with the model compression criterion.
-
MobiEdit: Resource-efficient Knowledge Editing for Personalized On-device LLMs - Score: 15 (R=8, N=7) - Date: 2025-06-18 - Comment: The paper focuses on resource-efficient knowledge editing for LLMs on mobile devices, which relates to model compression and efficiency.
-
TensorSLM: Energy-efficient Embedding Compression of Sub-billion Parameter Language Models on Low-end Devices - Score: 15 (R=8, N=7) - Date: 2025-06-17 - Comment: The paper focuses on embedding compression using Tensor-Train Decomposition, which is relevant to model compression techniques.
-
CALM: Consensus-Aware Localized Merging for Multi-Task Learning - Score: 15 (R=8, N=7) - Date: 2025-06-17 - Comment: The paper proposes CALM, a method for model merging in multi-task learning, which is relevant to model architecture and efficiency.
-
Dynamic Context-oriented Decomposition for Task-aware Low-rank Adaptation with Less Forgetting and Faster Convergence - Score: 15 (R=8, N=7) - Date: 2025-06-17 - Comment: The paper introduces a novel low-rank adaptation method for LLMs, which is relevant to model compression and efficiency.
-
Multipole Attention for Efficient Long Context Reasoning - Score: 15 (R=8, N=7) - Date: 2025-06-17 - Comment: The paper introduces Multipole Attention for efficient long-context reasoning, which is relevant to model architecture and efficiency improvements.
-
Flexible Realignment of Language Models - Score: 15 (R=8, N=7) - Date: 2025-06-17 - Comment: The paper proposes a flexible realignment framework for language models, which is relevant to foundational research in LLMs.
-
Beyond Sin-Squared Error: Linear-Time Entrywise Uncertainty Quantification for Streaming PCA - Score: 15 (R=8, N=7) - Date: 2025-06-17 - Comment: The paper proposes a novel statistical inference framework for streaming PCA, focusing on uncertainty quantification and efficient estimation, which is relevant to representation learning and efficiency.
-
LARGO: Low-Rank Regulated Gradient Projection for Robust Parameter Efficient Fine-Tuning - Score: 15 (R=8, N=7) - Date: 2025-06-17 - Comment: The paper proposes a low-rank regulated gradient projection algorithm for robust parameter-efficient fine-tuning, relevant to model compression and efficiency.
-
Scaling Probabilistic Circuits via Monarch Matrices - Score: 15 (R=8, N=7) - Date: 2025-06-17 - Comment: The paper introduces a novel sparse and structured parameterization for probabilistic circuits using Monarch matrices, which is relevant to model compression and efficiency.
-
Training-free LLM Merging for Multi-task Learning - Score: 15 (R=8, N=7) - Date: 2025-06-17 - Comment: The paper presents a method for merging LLMs for multi-task learning, which involves model-wise and layer-wise pruning, relevant to model compression.
-
Efficient Network Automatic Relevance Determination - Score: 15 (R=8, N=7) - Date: 2025-06-17 - Comment: The paper proposes Network Automatic Relevance Determination (NARD) for modeling sparse relationships, which is relevant to representation learning and model compression.
-
Why Do Some Inputs Break Low-Bit LLM Quantization? - Score: 15 (R=8, N=7) - Date: 2025-06-17 - Comment: The paper analyzes low-bit quantization errors in LLMs, which aligns with the model compression criterion.
-
LCD: Advancing Extreme Low-Bit Clustering for Large Language Models via Knowledge Distillation - Score: 15 (R=8, N=7) - Date: 2025-06-17 - Comment: The paper presents a novel approach to low-bit clustering for LLMs using knowledge distillation, which is relevant to model compression and efficiency.
-
FlexQuant: A Flexible and Efficient Dynamic Precision Switching Framework for LLM Quantization - Score: 15 (R=8, N=7) - Date: 2025-06-17 - Comment: The paper proposes FlexQuant, a dynamic precision-switching framework for LLM quantization, which is relevant to model compression and efficiency.
-
TruncQuant: Truncation-Ready Quantization for DNNs with Flexible Weight Bit Precision - Score: 15 (R=8, N=7) - Date: 2025-06-16 - Comment: The paper introduces a novel quantization method, TruncQuant, which is relevant to model compression and efficiency.
-
PolyMicros: Bootstrapping a Foundation Model for Polycrystalline Material Structure - Score: 15 (R=8, N=7) - Date: 2025-06-16 - Comment: The paper introduces a foundation model for polycrystalline materials, which is relevant to foundational research in AI for Science.
-
GenFT: A Generative Parameter-Efficient Fine-Tuning Method for Pretrained Foundation Models - Score: 15 (R=8, N=7) - Date: 2025-06-16 - Comment: The paper proposes GenFT, a generative parameter-efficient fine-tuning method for pretrained foundation models, which is a novel approach in model adaptation.
-
Slimming Down LLMs Without Losing Their Minds - Score: 15 (R=8, N=7) - Date: 2025-06-13 - Comment: The paper investigates parameter-efficient methods for fine-tuning LLMs, which is relevant to model compression and efficiency.
-
TreeLoRA: Efficient Continual Learning via Layer-Wise LoRAs Guided by a Hierarchical Gradient-Similarity Tree - Score: 15 (R=8, N=7) - Date: 2025-06-13 - Comment: The paper introduces TreeLoRA, a novel approach for efficient continual learning in large pre-trained models, which is relevant to model compression and efficiency.
-
MAC: An Efficient Gradient Preconditioning using Mean Activation Approximated Curvature - Score: 15 (R=8, N=7) - Date: 2025-06-11 - Comment: The paper introduces an efficient gradient preconditioning method for training neural networks, which relates to model architecture and optimization. It provides a novel approach to improving training efficiency.
-
On Reasoning Strength Planning in Large Reasoning Models - Score: 15 (R=8, N=7) - Date: 2025-06-11 - Comment: The paper explores reasoning strength in large reasoning models, providing insights into LLM behavior, which is relevant to foundational research in LLMs.
-
LoRMA: Low-Rank Multiplicative Adaptation for LLMs - Score: 15 (R=8, N=7) - Date: 2025-06-10 - Comment: The paper proposes a low-rank adaptation method for LLMs, relevant to model compression and efficiency.
-
PrunePEFT: Iterative Hybrid Pruning for Parameter-Efficient Fine-tuning of LLMs - Score: 15 (R=8, N=7) - Date: 2025-06-10 - Comment: The paper introduces a novel pruning approach for parameter-efficient fine-tuning of LLMs, relevant to model compression.
-
Improving Memory Efficiency for Training KANs via Meta Learning - Score: 15 (R=8, N=7) - Date: 2025-06-10 - Comment: The paper proposes a novel method for improving memory efficiency in KANs, which aligns with foundational research in model compression and efficiency.
-
Mind the Gap: Removing the Discretization Gap in Differentiable Logic Gate Networks - Score: 15 (R=8, N=7) - Date: 2025-06-10 - Comment: The paper addresses model compression by reducing the discretization gap in logic gate networks, aligning with the model compression criterion.
-
Less is More: some Computational Principles based on Parcimony, and Limitations of Natural Intelligence - Score: 15 (R=8, N=7) - Date: 2025-06-10 - Comment: The paper discusses principles of natural intelligence that could inspire more efficient AI systems, aligning with emerging trends in AI efficiency.
-
Certified Unlearning for Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-06-10 - Comment: The paper introduces a novel method for certified machine unlearning, which is a foundational research area related to model efficiency and privacy.
-
Flexible Operator Fusion for Fast Sparse Transformer with Diverse Masking on GPU - Score: 15 (R=8, N=7) - Date: 2025-06-09 - Comment: The paper proposes a framework for optimizing sparse Transformers, which is relevant to model compression and efficiency improvements.
-
RETENTION: Resource-Efficient Tree-Based Ensemble Model Acceleration with Content-Addressable Memory - Score: 15 (R=8, N=7) - Date: 2025-06-09 - Comment: The paper introduces a framework for accelerating tree-based ensemble models, which is relevant to model compression and efficiency.
-
Pruning Spurious Subgraphs for Graph Out-of-Distribution Generalization - Score: 15 (R=8, N=7) - Date: 2025-06-09 - Comment: The paper proposes a pruning-based method for graph out-of-distribution generalization, which is relevant to model compression and efficiency.
-
Route-and-Reason: Scaling Large Language Model Reasoning with Reinforced Model Router - Score: 15 (R=8, N=7) - Date: 2025-06-09 - Comment: The paper proposes a framework for collaborative reasoning in LLMs, which is relevant to LLM architecture and efficiency.
-
Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation - Score: 15 (R=8, N=7) - Date: 2025-06-09 - Comment: The paper proposes a progressive training strategy for low-rank adaptation, focusing on improving model generalization and pruning, which is relevant to model compression and efficiency.
-
Tight analyses of first-order methods with error feedback - Score: 15 (R=8, N=7) - Date: 2025-06-06 - Comment: The paper provides a tight analysis of error feedback methods in distributed learning, focusing on compression and efficiency, which is relevant to model compression.
-
HALoS: Hierarchical Asynchronous Local SGD over Slow Networks for Geo-Distributed Large Language Model Training - Score: 15 (R=8, N=7) - Date: 2025-06-06 - Comment: The paper proposes HALoS, a hierarchical asynchronous optimization framework for LLM training, which aligns with model architecture innovations and efficiency improvements.
-
Guided Speculative Inference for Efficient Test-Time Alignment of LLMs - Score: 15 (R=8, N=7) - Date: 2025-06-05 - Comment: The paper proposes a novel algorithm for efficient test-time alignment of LLMs, which aligns with the large language models criterion by providing theoretical insights into LLM behavior.
-
Out-of-Vocabulary Sampling Boosts Speculative Decoding - Score: 15 (R=8, N=7) - Date: 2025-06-05 - Comment: The paper introduces a novel out-of-vocabulary sampler for speculative decoding, which is relevant to model compression and efficiency by enabling extreme pruning of vocabulary.
-
Not All Tokens Are Meant to Be Forgotten - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper introduces a framework for targeted information forgetting in LLMs, which is relevant to model compression and efficiency.
-
StreamBP: Memory-Efficient Exact Backpropagation for Long Sequence Training of LLMs - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper introduces StreamBP, a memory-efficient backpropagation method for training LLMs on long sequences, which relates to model compression and efficiency.
-
WeightLoRA: Keep Only Necessary Adapters - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper proposes WeightLoRA, a method for adaptive selection of LoRA heads, which is relevant to model compression and parameter-efficient fine-tuning techniques.
-
FlexiSAGA: A Flexible Systolic Array GEMM Accelerator for Sparse and Dense Processing - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper presents a flexible AI hardware accelerator for sparse and dense processing, which is relevant to model compression.
-
Compiler Optimization via LLM Reasoning for Efficient Model Serving - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper explores compiler optimization using LLM reasoning, which is relevant to large language models and efficiency improvements.
-
Taming LLMs by Scaling Learning Rates with Gradient Grouping - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper introduces an optimizer wrapper for LLMs, which aligns with foundational research in model optimization and efficiency.
-
LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper introduces LIFT, a sparse fine-tuning method for LLMs, which relates to model compression and efficiency.
-
Quantitative Error Feedback for Quantization Noise Reduction of Filtering over Graphs - Score: 15 (R=8, N=7) - Date: 2025-06-03 - Comment: The paper introduces an error feedback framework for quantization noise reduction, relevant to model compression.
-
Flexible Mixed Precision Quantization for Learned Image Compression - Score: 15 (R=8, N=7) - Date: 2025-06-03 - Comment: The paper proposes a flexible mixed precision quantization method, which is relevant to model compression through quantization.
-
Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers - Score: 15 (R=8, N=7) - Date: 2025-06-03 - Comment: The paper presents REFORM, a novel framework for efficient long-context processing in transformers, which relates to model compression and efficiency.
-
Mamba Drafters for Speculative Decoding - Score: 15 (R=8, N=7) - Date: 2025-06-03 - Comment: The paper introduces a novel drafter for speculative decoding in LLMs, focusing on efficiency improvements, aligning with model compression and efficiency breakthroughs.
-
Rapid yet accurate Tile-circuit and device modeling for Analog In-Memory Computing - Score: 15 (R=8, N=7) - Date: 2025-06-03 - Comment: The paper models analog in-memory computing for neural networks, which is relevant to model compression and efficiency.
-
Inference Acceleration of Autoregressive Normalizing Flows by Selective Jacobi Decoding - Score: 15 (R=8, N=7) - Date: 2025-06-02 - Comment: The paper proposes a method for accelerating inference in autoregressive normalizing flows, focusing on efficiency improvements, which aligns with model compression and efficiency.
-
Knockoff-Guided Compressive Sensing: A Statistical Machine Learning Framework for Support-Assured Signal Recovery - Score: 15 (R=8, N=7) - Date: 2025-06-02 - Comment: The paper introduces a novel framework for compressive sensing with theoretical guarantees, relevant to model compression.
-
Model Unlearning via Sparse Autoencoder Subspace Guided Projections - Score: 15 (R=8, N=7) - Date: 2025-06-02 - Comment: The paper introduces a novel framework for model unlearning using sparse autoencoder subspace guided projections, which is relevant to model compression and efficiency.
-
Mind the Gap: A Practical Attack on GGUF Quantization - Score: 15 (R=8, N=7) - Date: 2025-06-02 - Comment: The paper discusses a practical attack on quantization methods, which is relevant to model compression.
High Performance Computing (43)
-
The Singapore Consensus on Global AI Safety Research Priorities - Score: 20.0 (R=0, N=0) - Date: 2025-06-27 - Comment: Author match
-
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity - Score: 20.0 (R=0, N=0) - Date: 2025-06-10 - Comment: Author match
-
FORT: Forward-Only Regression Training of Normalizing Flows - Score: 20.0 (R=0, N=0) - Date: 2025-06-04 - Comment: Author match
-
An entropy-optimal path to humble AI - Score: 18 (R=9, N=9) - Date: 2025-06-24 - Comment: The paper introduces a novel framework for entropy-optimizing Boltzmann machines, which could be considered an emerging trend in foundational AI research.
-
Random Matrix Theory for Deep Learning: Beyond Eigenvalues of Linear Models - Score: 18 (R=9, N=9) - Date: 2025-06-17 - Comment: The paper extends Random Matrix Theory to address challenges in deep learning, aligning with the emerging trends criterion.
-
Evolutionary Developmental Biology Can Serve as the Conceptual Foundation for a New Design Paradigm in Artificial Intelligence - Score: 18 (R=9, N=9) - Date: 2025-06-17 - Comment: The paper discusses a new design paradigm for AI based on evolutionary developmental biology, which is relevant to emerging trends in AI.
-
Farseer: A Refined Scaling Law in Large Language Models - Score: 18 (R=9, N=9) - Date: 2025-06-13 - Comment: The paper introduces a refined scaling law for LLMs, which is relevant to large language models and offers theoretical insights into their behavior.
-
A Conjecture on a Fundamental Trade-Off between Certainty and Scope in Symbolic and Generative AI - Score: 18 (R=9, N=9) - Date: 2025-06-13 - Comment: The paper introduces a conjecture on a fundamental trade-off in AI systems, which aligns with emerging trends and theoretical insights.
-
Language Modeling by Language Models - Score: 17 (R=9, N=8) - Date: 2025-06-26 - Comment: The paper discusses a novel approach to discovering language model architectures using a multi-agent LLM system, which aligns with foundational research in LLMs.
-
An ab initio foundation model of wavefunctions that accurately describes chemical bond breaking - Score: 17 (R=9, N=8) - Date: 2025-06-26 - Comment: The paper presents a novel transferable wavefunction model, Orbformer, which is foundational research in molecular modeling, aligning with AI for Science.
-
Learning Physical Systems: Symplectification via Gauge Fixing in Dirac Structures - Score: 17 (R=9, N=8) - Date: 2025-06-24 - Comment: The paper introduces a novel framework, Presymplectification Networks, which addresses foundational limitations in physics-informed deep learning by embedding constrained systems into a higher-dimensional manifold. This aligns with the emerging trends criterion as it challenges established assumptions in modeling physical systems.
-
Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-06-24 - Comment: The paper explores a retrieval-free knowledge attribution method for LLMs, which aligns with the interest in theoretical insights into LLM behavior.
-
Evolutionary chemical learning in dimerization networks - Score: 17 (R=9, N=8) - Date: 2025-06-18 - Comment: The paper introduces a novel framework for chemical learning using dimerization networks, which is relevant to AI for Science with a focus on foundational research.
-
Machine Mirages: Defining the Undefined - Score: 17 (R=9, N=8) - Date: 2025-06-18 - Comment: The paper discusses 'machine mirages', a new class of cognitive aberrations in multimodal machine intelligence systems, which could be considered an emerging trend challenging established assumptions.
-
Large Language Models and Emergence: A Complex Systems Perspective - Score: 17 (R=9, N=8) - Date: 2025-06-16 - Comment: The paper examines the emergent capabilities of LLMs from a complex systems perspective, aligning with the Large Language Models criterion focusing on theoretical insights.
-
NoLoCo: No-all-reduce Low Communication Training Method for Large Models - Score: 17 (R=9, N=8) - Date: 2025-06-13 - Comment: The paper proposes a novel optimization method, NoLoCo, for low communication training of large models, which aligns with model compression and efficiency breakthroughs.
-
The Universality Lens: Why Even Highly Over-Parametrized Models Learn Well - Score: 17 (R=9, N=8) - Date: 2025-06-10 - Comment: The paper provides a theoretical explanation for why over-parameterized models generalize well, aligning with emerging trends in understanding model behavior.
-
Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning - Score: 17 (R=9, N=8) - Date: 2025-06-09 - Comment: The paper provides insights into the training dynamics of language models, which is relevant to understanding LLM behavior.
-
Dense Associative Memory with Epanechnikov Energy - Score: 17 (R=8, N=9) - Date: 2025-06-13 - Comment: The paper proposes a novel energy function for Dense Associative Memory networks, which aligns with emerging trends in foundational research by introducing a new paradigm for memory storage.
-
EvoLM: In Search of Lost Language Model Training Dynamics - Score: 16 (R=9, N=7) - Date: 2025-06-23 - Comment: The paper presents EvoLM, a model suite for analyzing language model training dynamics, providing insights into LLM behavior and training processes.
-
Mixtures of Neural Cellular Automata: A Stochastic Framework for Growth Modelling and Self-Organization - Score: 16 (R=8, N=8) - Date: 2025-06-26 - Comment: The paper introduces Mixtures of Neural Cellular Automata, a novel framework for modeling stochastic dynamical systems, which is relevant to emerging trends in model architecture.
-
Global Convergence of Adjoint-Optimized Neural PDEs - Score: 16 (R=8, N=8) - Date: 2025-06-17 - Comment: The paper studies the global convergence of adjoint-optimized neural PDEs, contributing to the theoretical understanding of neural network PDE models, which is relevant to AI for Science.
-
Solving Inverse Problems in Stochastic Self-Organising Systems through Invariant Representations - Score: 16 (R=8, N=8) - Date: 2025-06-16 - Comment: The paper introduces a novel inverse modeling method for stochastic self-organizing systems, aligning with the Emerging Trends criterion focusing on cutting-edge theoretical work.
-
Delayformer: spatiotemporal transformation for predicting high-dimensional dynamics - Score: 16 (R=8, N=8) - Date: 2025-06-16 - Comment: The paper introduces Delayformer, a novel framework for time-series prediction using spatiotemporal transformation, which is a new paradigm in time-series modeling.
-
Mathesis: Towards Formal Theorem Proving from Natural Languages - Score: 16 (R=8, N=8) - Date: 2025-06-10 - Comment: The paper presents a novel approach to formal theorem proving from natural languages, which could be relevant to foundational research in LLMs.
-
Learning-at-Criticality in Large Language Models for Quantum Field Theory and Beyond - Score: 16 (R=8, N=8) - Date: 2025-06-05 - Comment: The paper introduces a novel reinforcement learning scheme for LLMs in quantum field theory, which is relevant to foundational research in AI for science.
-
GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling - Score: 15 (R=8, N=7) - Date: 2025-06-30 - Comment: The paper proposes GPAS, a technique to improve training dynamics in LLMs, aligning with the Large Language Models criterion.
-
Tensor-Parallelism with Partially Synchronized Activations - Score: 15 (R=8, N=7) - Date: 2025-06-25 - Comment: The paper introduces a novel approach to tensor-parallelism in LLMs, which reduces communication overhead, aligning with model architecture and efficiency improvements.
-
ML-Master: Towards AI-for-AI via Integration of Exploration and Reasoning - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: The paper discusses a novel AI4AI agent that integrates exploration and reasoning, which is an emerging trend in AI development, potentially leading to foundational changes in AI system design.
-
SLR: An Automated Synthesis Framework for Scalable Logical Reasoning - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: The paper introduces a framework for scalable logical reasoning in LLMs, which is relevant to understanding and improving LLM behavior.
-
Don't Make It Up: Preserving Ignorance Awareness in LLM Fine-Tuning - Score: 15 (R=8, N=7) - Date: 2025-06-18 - Comment: The paper proposes SEAT, a fine-tuning approach for LLMs that preserves ignorance awareness, which aligns with the large language models criterion by offering insights into LLM behavior.
-
Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences - Score: 15 (R=8, N=7) - Date: 2025-06-18 - Comment: The paper introduces a method for training long sequences in LLMs, which aligns with the large language models criterion.
-
A Hybrid Neural Network -- Polynomial Series Scheme for Learning Invariant Manifolds of Discrete Dynamical Systems - Score: 15 (R=8, N=7) - Date: 2025-06-18 - Comment: The paper proposes a hybrid neural network and polynomial series scheme for learning invariant manifolds, which is relevant to model architecture innovations.
-
Tracing LLM Reasoning Processes with Strategic Games: A Framework for Planning, Revision, and Resource-Constrained Decision Making - Score: 15 (R=8, N=7) - Date: 2025-06-16 - Comment: The paper discusses a framework for evaluating LLMs' reasoning processes, which aligns with the Large Language Models criterion focusing on theoretical insights into LLM behavior.
-
Textual Bayes: Quantifying Uncertainty in LLM-Based Systems - Score: 15 (R=8, N=7) - Date: 2025-06-13 - Comment: The paper introduces a Bayesian approach to uncertainty quantification in LLMs, which is relevant to foundational research in LLM behavior and interpretability.
-
Thermodynamically Consistent Latent Dynamics Identification for Parametric Systems - Score: 15 (R=8, N=7) - Date: 2025-06-11 - Comment: The paper proposes a framework for reduced-order modeling using autoencoders, which is relevant to representation learning and model efficiency.
-
KScope: A Framework for Characterizing the Knowledge Status of Language Models - Score: 15 (R=8, N=7) - Date: 2025-06-10 - Comment: The paper proposes a framework for characterizing LLM knowledge, which could provide theoretical insights into LLM behavior.
-
A projection-based framework for gradient-free and parallel learning - Score: 15 (R=8, N=7) - Date: 2025-06-09 - Comment: The paper presents a gradient-free and parallel learning framework, which is relevant to emerging trends in training dynamics and efficiency.
-
SPRINT: Enabling Interleaved Planning and Parallelized Execution in Reasoning Models - Score: 15 (R=8, N=7) - Date: 2025-06-09 - Comment: The paper introduces a framework for parallelizing reasoning in large models, which is relevant to large language models and efficiency improvements.
-
Associative Memory and Generative Diffusion in the Zero-noise Limit - Score: 15 (R=8, N=7) - Date: 2025-06-06 - Comment: The paper explores connections between generative diffusion and associative memory models, which is relevant to emerging trends in foundational research.
-
The Future of Continual Learning in the Era of Foundation Models: Three Key Directions - Score: 15 (R=8, N=7) - Date: 2025-06-05 - Comment: The paper discusses the future of continual learning in the context of foundation models, touching on the need for continual pre-training and compositionality, which is relevant to foundational model research.
-
Trade-offs in Data Memorization via Strong Data Processing Inequalities - Score: 15 (R=8, N=7) - Date: 2025-06-03 - Comment: The paper discusses data memorization in LLMs, providing theoretical insights into memorization and learning.
-
Conservation-preserved Fourier Neural Operator through Adaptive Correction - Score: 15 (R=8, N=7) - Date: 2025-06-02 - Comment: The paper introduces an adaptive correction approach for Fourier Neural Operators, which is relevant to AI for Science as it addresses foundational issues in modeling physical systems.
Representation Learning (216)
-
Diffuse and Disperse: Image Generation with Representation Regularization - Score: 20.0 (R=0, N=0) - Date: 2025-06-11 - Comment: Author match
-
Why Neural Network Can Discover Symbolic Structures with Gradient-based Training: An Algebraic and Geometric Foundation for Neurosymbolic Reasoning - Score: 19 (R=10, N=9) - Date: 2025-06-30 - Comment: The paper develops a theoretical framework for understanding how neural networks can discover symbolic structures, which is highly relevant to representation learning and theoretical insights.
-
Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks - Score: 19 (R=10, N=9) - Date: 2025-06-10 - Comment: The paper introduces Alternating Gradient Flows, a framework for understanding feature learning dynamics in neural networks, which is highly relevant to representation learning.
-
Unfolding Generative Flows with Koopman Operators: Fast and Interpretable Sampling - Score: 18 (R=9, N=9) - Date: 2025-06-30 - Comment: The paper proposes a novel approach integrating Koopman operator theory with generative flows, offering insights into representation learning and model architecture.
-
Structured and Informed Probabilistic Modeling with the Thermodynamic Kolmogorov-Arnold Model - Score: 18 (R=9, N=9) - Date: 2025-06-18 - Comment: The paper introduces a novel probabilistic model inspired by classical representation theorems, relevant to emerging trends in generative modeling.
-
Learning geometry and topology via multi-chart flows - Score: 18 (R=9, N=9) - Date: 2025-06-02 - Comment: The paper introduces a method for learning geometry and topology via multi-chart flows, which is relevant to representation learning and emerging trends in foundational research.
-
Hyperspherical Variational Autoencoders Using Efficient Spherical Cauchy Distribution - Score: 17 (R=9, N=8) - Date: 2025-06-27 - Comment: The paper proposes a novel VAE architecture using a spherical Cauchy distribution, which is relevant to representation learning and model architecture.
-
Can Gradient Descent Simulate Prompting? - Score: 17 (R=9, N=8) - Date: 2025-06-27 - Comment: The paper explores a method for meta-training LMs such that gradient updates emulate the effects of conditioning on new information, offering insights into the generalization capabilities of gradient-based learning.
-
Global Convergence of Iteratively Reweighted Least Squares for Robust Subspace Recovery - Score: 17 (R=9, N=8) - Date: 2025-06-26 - Comment: The paper provides theoretical insights into Iteratively Reweighted Least Squares (IRLS) for robust subspace recovery, which is relevant to representation learning.
-
Cross-Layer Discrete Concept Discovery for Interpreting Language Models - Score: 17 (R=9, N=8) - Date: 2025-06-26 - Comment: The paper introduces a framework for interpreting language models using vector quantization, relevant to representation learning and model architecture analysis.
-
Who Does What in Deep Learning? Multidimensional Game-Theoretic Attribution of Function of Neural Units - Score: 17 (R=9, N=8) - Date: 2025-06-25 - Comment: The paper presents a game-theoretic framework for understanding neural unit contributions, relevant to representation learning and model interpretability.
-
Riemannian generative decoder - Score: 17 (R=9, N=8) - Date: 2025-06-25 - Comment: The paper presents a Riemannian generative decoder, which is a novel approach in representation learning, focusing on manifold-valued latents.
-
Optimization-Induced Dynamics of Lipschitz Continuity in Neural Networks - Score: 17 (R=9, N=8) - Date: 2025-06-24 - Comment: The paper provides a theoretical framework for understanding the dynamics of Lipschitz continuity in neural networks, relevant to representation learning.
-
These are Not All the Features You are Looking For: A Fundamental Bottleneck In Supervised Pretraining - Score: 17 (R=9, N=8) - Date: 2025-06-24 - Comment: The paper discusses a fundamental bottleneck in representation learning, specifically the 'information saturation bottleneck', which aligns with the representation learning criterion.
-
In-Context Learning Strategies Emerge Rationally - Score: 17 (R=9, N=8) - Date: 2025-06-24 - Comment: The paper provides a theoretical framework for understanding in-context learning strategies in Transformers, which is highly relevant to representation learning.
-
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens - Score: 17 (R=9, N=8) - Date: 2025-06-23 - Comment: The paper introduces a novel framework for vision-language models that augments decoding with latent visual tokens, aligning with representation learning by exploring how models encode and manipulate information without explicit image generation.
-
Variational Learning of Disentangled Representations - Score: 17 (R=9, N=8) - Date: 2025-06-23 - Comment: The paper introduces a new variational framework for disentangled representations, which is relevant to representation learning.
-
Latent Concept Disentanglement in Transformer-based Language Models - Score: 17 (R=9, N=8) - Date: 2025-06-23 - Comment: The paper examines latent concept disentanglement in transformer-based language models, which is relevant to representation learning and LLM behavior.
-
SlepNet: Spectral Subgraph Representation Learning for Neural Dynamics - Score: 17 (R=9, N=8) - Date: 2025-06-23 - Comment: The paper introduces SlepNet, a novel GCN architecture using Slepian bases for spectral subgraph representation learning, contributing to representation learning and architectural innovation.
-
Can structural correspondences ground real world representational content in Large Language Models? - Score: 17 (R=9, N=8) - Date: 2025-06-23 - Comment: The paper explores the representational capacities of LLMs, which aligns with the interest in theoretical insights into LLM behavior.
-
On the Theoretical Understanding of Identifiable Sparse Autoencoders and Beyond - Score: 17 (R=9, N=8) - Date: 2025-06-23 - Comment: The paper provides theoretical insights into sparse autoencoders, focusing on conditions for identifiability and improving feature reconstruction, which is relevant to representation learning.
-
Scientifically-Interpretable Reasoning Network (ScIReN): Uncovering the Black-Box of Nature - Score: 17 (R=9, N=8) - Date: 2025-06-18 - Comment: The paper proposes a scientifically-interpretable reasoning network, which is relevant to AI for Science with a focus on foundational research.
-
Logical Expressiveness of Graph Neural Networks with Hierarchical Node Individualization - Score: 17 (R=9, N=8) - Date: 2025-06-18 - Comment: The paper proposes Hierarchical Ego Graph Neural Networks (HEGNNs) with a focus on logical expressiveness and graph isomorphism, which is relevant to model architecture and representation learning.
-
'Memory States' from Almost Nothing: Representing and Computing in a Non-associative Algebra - Score: 17 (R=9, N=8) - Date: 2025-06-18 - Comment: The paper presents a non-associative algebraic framework for representation and computation, which aligns with representation learning.
-
Contrastive Self-Supervised Learning As Neural Manifold Packing - Score: 17 (R=9, N=8) - Date: 2025-06-17 - Comment: The paper introduces a novel framework for contrastive self-supervised learning, aligning with the representation learning criterion.
-
Align-then-Unlearn: Embedding Alignment for LLM Unlearning - Score: 17 (R=9, N=8) - Date: 2025-06-17 - Comment: The paper proposes a novel framework for unlearning in LLMs using embedding alignment, which is relevant to foundational research in LLM behavior and interpretability.
-
PDEfuncta: Spectrally-Aware Neural Representation for PDE Solution Modeling - Score: 17 (R=9, N=8) - Date: 2025-06-17 - Comment: The paper introduces Global Fourier Modulation for neural representations, which is relevant to representation learning and architectural innovations.
-
Interpretable representation learning of quantum data enabled by probabilistic variational autoencoders - Score: 17 (R=9, N=8) - Date: 2025-06-16 - Comment: The paper focuses on representation learning using variational autoencoders (VAEs) for quantum data, introducing modifications to improve interpretability and representation of quantum states.
-
How Visual Representations Map to Language Feature Space in Multimodal LLMs - Score: 17 (R=9, N=8) - Date: 2025-06-16 - Comment: The paper explores the alignment of visual and linguistic representations in multimodal models, focusing on representation learning and the use of sparse autoencoders.
-
Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods - Score: 17 (R=9, N=8) - Date: 2025-06-13 - Comment: The paper provides a theoretical study of in-context learning on structured manifolds, which is relevant to representation learning and provides foundational insights.
-
Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization - Score: 17 (R=9, N=8) - Date: 2025-06-13 - Comment: The paper focuses on representation learning by decomposing MLP activations into interpretable features using semi-nonnegative matrix factorization, which aligns with insights into how deep networks encode information.
-
Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning - Score: 17 (R=9, N=8) - Date: 2025-06-13 - Comment: The paper uses causal representation learning to uncover latent capabilities of language models, aligning with representation learning and providing theoretical insights into LLM behavior.
-
Probabilistic Variational Contrastive Learning - Score: 17 (R=9, N=8) - Date: 2025-06-13 - Comment: The paper introduces a probabilistic approach to contrastive learning, which is relevant to representation learning and offers a new theoretical perspective.
-
Unsupervised Elicitation of Language Models - Score: 17 (R=9, N=8) - Date: 2025-06-13 - Comment: The paper presents an unsupervised algorithm for fine-tuning language models, which is relevant to large language models and representation learning.
-
Resa: Transparent Reasoning Models via SAEs - Score: 17 (R=9, N=8) - Date: 2025-06-13 - Comment: The paper introduces Resa, a reasoning model using sparse autoencoder tuning, which aligns with representation learning and insights into how deep networks encode information.
-
Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations - Score: 17 (R=9, N=8) - Date: 2025-06-11 - Comment: The paper provides theoretical insights into task vectors in in-context learning, which relates to representation learning and understanding how deep networks encode information.
-
InfoDPCCA: Information-Theoretic Dynamic Probabilistic Canonical Correlation Analysis - Score: 17 (R=9, N=8) - Date: 2025-06-11 - Comment: The paper introduces InfoDPCCA, a novel framework for representation learning using an information-theoretic approach, which aligns with the representation learning criterion.
-
Leveraging chaos in the training of artificial neural networks - Score: 17 (R=9, N=8) - Date: 2025-06-11 - Comment: The paper explores the dynamics of neural network training with large learning rates, providing insights into training dynamics and the role of chaos, which aligns with representation learning.
-
Uncovering the Functional Roles of Nonlinearity in Memory - Score: 17 (R=9, N=8) - Date: 2025-06-10 - Comment: The paper investigates the role of nonlinearity in memory for recurrent networks, providing insights into representation learning.
-
Identifiable Object Representations under Spatial Ambiguities - Score: 17 (R=9, N=8) - Date: 2025-06-10 - Comment: The paper presents a novel approach to object representation learning, addressing spatial ambiguities with theoretical guarantees.
-
Training Superior Sparse Autoencoders for Instruct Models - Score: 17 (R=9, N=8) - Date: 2025-06-10 - Comment: The paper focuses on sparse autoencoders for mechanistic interpretability in LLMs, aligning with representation learning and model architecture criteria.
-
Variational Supervised Contrastive Learning - Score: 17 (R=9, N=8) - Date: 2025-06-10 - Comment: The paper introduces a novel approach to supervised contrastive learning, which is relevant to representation learning by addressing limitations in embedding distribution and generalization.
-
Towards an Explainable Comparison and Alignment of Feature Embeddings - Score: 17 (R=9, N=8) - Date: 2025-06-09 - Comment: The paper proposes a framework for comparing and aligning feature embeddings, relevant to representation learning.
-
Learning Along the Arrow of Time: Hyperbolic Geometry for Backward-Compatible Representation Learning - Score: 17 (R=9, N=8) - Date: 2025-06-09 - Comment: The paper introduces a hyperbolic geometry approach for backward-compatible representation learning, which is relevant to representation learning and introduces a novel perspective.
-
Neural Collapse in Cumulative Link Models for Ordinal Regression: An Analysis with Unconstrained Feature Model - Score: 17 (R=9, N=8) - Date: 2025-06-09 - Comment: The paper analyzes neural collapse in ordinal regression, providing theoretical insights into deep learning behavior, which is relevant to representation learning.
-
Grokking Beyond the Euclidean Norm of Model Parameters - Score: 17 (R=9, N=8) - Date: 2025-06-09 - Comment: The paper discusses grokking in neural networks, focusing on regularization and over-parameterization, which is relevant to representation learning and training dynamics.
-
Learning normalized image densities via dual score matching - Score: 17 (R=9, N=8) - Date: 2025-06-06 - Comment: The paper presents a new framework for learning normalized energy models inspired by diffusion generative models, contributing to representation learning with a novel dual score matching objective.
-
Sample Complexity and Representation Ability of Test-time Scaling Paradigms - Score: 17 (R=9, N=8) - Date: 2025-06-06 - Comment: The paper provides theoretical insights into test-time scaling paradigms for LLMs, which is relevant to understanding LLM behavior.
-
There Was Never a Bottleneck in Concept Bottleneck Models - Score: 17 (R=9, N=8) - Date: 2025-06-06 - Comment: The paper proposes Minimal Concept Bottleneck Models, which aligns with representation learning by introducing a new method for interpretability and information bottleneck.
-
Sparse Autoencoders, Again? - Score: 17 (R=9, N=8) - Date: 2025-06-06 - Comment: The paper revisits sparse autoencoders, proposing a hybrid model that addresses weaknesses in canonical SAEs and VAEs, contributing to representation learning.
-
KOALA++: Efficient Kalman-Based Optimization of Neural Networks with Gradient-Covariance Products - Score: 17 (R=9, N=8) - Date: 2025-06-06 - Comment: The paper introduces KOALA++, a Kalman-based optimization algorithm that models structured gradient uncertainty, which is relevant to representation learning and training dynamics in neural networks.
-
Self-Supervised Contrastive Learning is Approximately Supervised Contrastive Learning - Score: 17 (R=9, N=8) - Date: 2025-06-06 - Comment: The paper provides theoretical insights into self-supervised contrastive learning, which is relevant to representation learning.
-
CARL: Causality-guided Architecture Representation Learning for an Interpretable Performance Predictor - Score: 17 (R=9, N=8) - Date: 2025-06-05 - Comment: The paper proposes a causality-guided architecture representation learning method, which is relevant to representation learning and model architecture analysis.
-
Do Neural Networks Need Gradient Descent to Generalize? A Theoretical Study - Score: 17 (R=9, N=8) - Date: 2025-06-05 - Comment: The paper provides theoretical insights into the generalization abilities of neural networks, which is relevant to representation learning.
-
Models of Heavy-Tailed Mechanistic Universality - Score: 17 (R=9, N=8) - Date: 2025-06-05 - Comment: The paper introduces a model to explore heavy-tailed behavior in neural networks, which aligns with representation learning by providing insights into training dynamics and model behavior.
-
Non-Asymptotic Length Generalization - Score: 17 (R=9, N=8) - Date: 2025-06-04 - Comment: The paper provides a theoretical framework for length generalization, which is relevant to emerging trends in foundational research.
-
Probing Neural Topology of Large Language Models - Score: 17 (R=9, N=8) - Date: 2025-06-04 - Comment: The paper introduces a method for uncovering the functional connectivity topology of LLM neurons, which aligns with the interest in understanding LLM behavior and interpretability.
-
Unpacking Softmax: How Temperature Drives Representation Collapse, Compression, and Generalization - Score: 17 (R=9, N=8) - Date: 2025-06-03 - Comment: The paper provides insights into the role of the softmax function in representation learning, which is relevant to understanding training dynamics in neural networks.
-
Model Reprogramming Demystified: A Neural Tangent Kernel Perspective - Score: 17 (R=9, N=8) - Date: 2025-06-03 - Comment: The paper provides a theoretical analysis of Model Reprogramming using the Neural Tangent Kernel framework, which aligns with representation learning insights.
-
Slow Feature Analysis as Variational Inference Objective - Score: 17 (R=9, N=8) - Date: 2025-06-03 - Comment: The paper provides a novel probabilistic interpretation of Slow Feature Analysis through variational inference, which aligns with representation learning insights.
-
Disentangling Granularity: An Implicit Inductive Bias in Factorized VAEs - Score: 17 (R=9, N=8) - Date: 2025-06-02 - Comment: The paper explores implicit inductive bias in factorized VAEs, contributing to representation learning by uncovering disentangling granularity as a bias influencing disentanglement performance.
-
Hyperbolic Dataset Distillation - Score: 17 (R=9, N=8) - Date: 2025-06-02 - Comment: The paper introduces Hyperbolic Dataset Distillation (HDD), which is relevant to representation learning and model compression by addressing dataset distillation in hyperbolic space.
-
A Mathematical Perspective On Contrastive Learning - Score: 17 (R=9, N=8) - Date: 2025-06-02 - Comment: The paper provides a mathematical perspective on contrastive learning, focusing on representation learning and introducing novel probabilistic loss functions and metrics.
-
Characterising the Inductive Biases of Neural Networks on Boolean Data - Score: 17 (R=9, N=8) - Date: 2025-06-02 - Comment: The paper provides an analytical case study on the inductive biases of neural networks on Boolean data, which is relevant to representation learning and emerging trends.
-
Representational Difference Explanations - Score: 17 (R=9, N=8) - Date: 2025-06-02 - Comment: The paper introduces Representational Difference Explanations (RDX), a method for comparing learned representations, which aligns with representation learning by providing insights into how models encode information.
-
Interpretable Representation Learning for Additive Rule Ensembles - Score: 16 (R=9, N=7) - Date: 2025-06-27 - Comment: The paper introduces a novel method for interpretable representation learning by extending classical rule ensembles with learnable sparse linear transformations, which aligns with representation learning and sparse methods.
-
Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs - Score: 16 (R=9, N=7) - Date: 2025-06-26 - Comment: The paper uses cognitive models to interpret value trade-offs in LLMs, providing insights into LLM behavior and interpretability.
-
Disentangled representations of microscopy images - Score: 16 (R=9, N=7) - Date: 2025-06-26 - Comment: The paper proposes a disentangled representation learning methodology, which is relevant to representation learning.
-
Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training - Score: 16 (R=9, N=7) - Date: 2025-06-24 - Comment: The paper explores how LLMs acquire algorithmic abstractions during code training, which relates to understanding LLM behavior and representation learning.
-
Sparse Feature Coactivation Reveals Composable Semantic Modules in Large Language Models - Score: 16 (R=9, N=7) - Date: 2025-06-24 - Comment: The paper explores the modular organization of knowledge in LLMs using sparse autoencoders, which aligns with the representation learning criterion by providing insights into how LLMs encode information.
-
FaithfulSAE: Towards Capturing Faithful Features with Sparse Autoencoders without External Dataset Dependencies - Score: 16 (R=9, N=7) - Date: 2025-06-24 - Comment: The paper introduces FaithfulSAE, a method for training sparse autoencoders on synthetic datasets, which aligns with the representation learning criterion by improving the interpretability of model-internal features.
-
Measuring (a Sufficient) World Model in LLMs: A Variance Decomposition Framework - Score: 16 (R=9, N=7) - Date: 2025-06-23 - Comment: The paper introduces a framework to evaluate the robustness of LLMs' world models, which aligns with the interest in theoretical insights into LLM behavior.
-
Large Language Models are Near-Optimal Decision-Makers with a Non-Human Learning Behavior - Score: 16 (R=9, N=7) - Date: 2025-06-23 - Comment: The paper provides insights into the decision-making behavior of LLMs, which aligns with the interest in understanding LLM behavior and interpretability.
-
Formal Models of Active Learning from Contrastive Examples - Score: 16 (R=9, N=7) - Date: 2025-06-23 - Comment: The paper proposes a theoretical framework for learning from contrastive examples, relevant to representation learning.
-
Adaptive Task Vectors for Large Language Models - Score: 16 (R=9, N=7) - Date: 2025-06-05 - Comment: The paper proposes Adaptive Task Vectors for LLMs, which is relevant to the Large Language Models criterion by addressing theoretical insights into LLM behavior and improving generalization capabilities.
-
On Designing Diffusion Autoencoders for Efficient Generation and Representation Learning - Score: 16 (R=9, N=7) - Date: 2025-06-03 - Comment: The paper discusses diffusion autoencoders for efficient generation and representation learning, which aligns with foundational research in representation learning.
-
Exploring Image Generation via Mutually Exclusive Probability Spaces and Local Correlation Hypothesis - Score: 16 (R=8, N=8) - Date: 2025-06-30 - Comment: The paper proposes new theoretical frameworks for probabilistic generative models, which could be relevant to representation learning and emerging trends in generative paradigms.
-
Potemkin Understanding in Large Language Models - Score: 16 (R=8, N=8) - Date: 2025-06-27 - Comment: The paper introduces a framework to evaluate LLMs' understanding, which is relevant to theoretical insights into LLM behavior.
-
Stochastic Parameter Decomposition - Score: 16 (R=8, N=8) - Date: 2025-06-27 - Comment: The paper introduces Stochastic Parameter Decomposition, a method for decomposing neural networks, which aligns with representation learning and model interpretability.
-
scMamba: A Scalable Foundation Model for Single-Cell Multi-Omics Integration Beyond Highly Variable Feature Selection - Score: 16 (R=8, N=8) - Date: 2025-06-27 - Comment: The paper presents scMamba, a foundation model for single-cell multi-omics integration, which is relevant to foundational research in AI for science.
-
ProxelGen: Generating Proteins as 3D Densities - Score: 16 (R=8, N=8) - Date: 2025-06-25 - Comment: The paper introduces a new protein generative model using 3D densities, relevant to AI for Science with a focus on foundational research in molecular modeling.
-
Simulation-Free Differential Dynamics through Neural Conservation Laws - Score: 16 (R=8, N=8) - Date: 2025-06-24 - Comment: The paper presents a novel framework for training diffusion processes using Neural Conservation Laws, which aligns with the representation learning criterion by introducing a new method for modeling dynamics.
-
Generative Modeling of Full-Atom Protein Conformations using Latent Diffusion on Graph Embeddings - Score: 16 (R=8, N=8) - Date: 2025-06-23 - Comment: The paper presents a generative model for protein conformations using latent diffusion, which is relevant to AI for Science with a focus on foundational research in molecular modeling.
-
GeoRecon: Graph-Level Representation Learning for 3D Molecules via Reconstruction-Based Pretraining - Score: 16 (R=8, N=8) - Date: 2025-06-17 - Comment: The paper introduces a novel graph-level pretraining framework for molecular representation learning, focusing on global structural features, which is relevant to AI for Science and representation learning.
-
Distributional Training Data Attribution - Score: 16 (R=8, N=8) - Date: 2025-06-17 - Comment: The paper introduces distributional training data attribution, providing new insights into how randomness affects model outputs, which is relevant to representation learning.
-
GrokAlign: Geometric Characterisation and Acceleration of Grokking - Score: 16 (R=8, N=8) - Date: 2025-06-17 - Comment: The paper provides insights into the training dynamics of deep networks, specifically grokking, which is relevant to representation learning.
-
Provably Learning from Language Feedback - Score: 16 (R=8, N=8) - Date: 2025-06-13 - Comment: The paper formalizes the Learning from Language Feedback problem and introduces a new complexity measure, which aligns with emerging trends in theoretical insights.
-
VQC-MLPNet: An Unconventional Hybrid Quantum-Classical Architecture for Scalable and Robust Quantum Machine Learning - Score: 16 (R=8, N=8) - Date: 2025-06-13 - Comment: The paper introduces a novel hybrid quantum-classical architecture, VQC-MLPNet, which enhances representation capabilities and training stability, aligning with representation learning and model architecture criteria.
-
AdversariaL attacK sAfety aLIgnment(ALKALI): Safeguarding LLMs through GRACE: Geometric Representation-Aware Contrastive Enhancement- Introducing Adversarial Vulnerability Quality Index (AVQI) - Score: 16 (R=8, N=8) - Date: 2025-06-11 - Comment: The paper introduces a new framework for safeguarding LLMs against adversarial attacks, providing insights into LLM behavior and interpretability.
-
Explaining, Fast and Slow: Abstraction and Refinement of Provable Explanations - Score: 16 (R=8, N=8) - Date: 2025-06-11 - Comment: The paper proposes an abstraction-refinement technique for computing provable explanations, which is relevant to representation learning and model interpretability.
-
Improved Scaling Laws in Linear Regression via Data Reuse - Score: 16 (R=8, N=8) - Date: 2025-06-11 - Comment: The paper provides theoretical insights into improving scaling laws in linear regression, which is relevant to representation learning and model efficiency.
-
Schauder Bases for $C[0, 1]$ Using ReLU, Softplus and Two Sigmoidal Functions - Score: 16 (R=8, N=8) - Date: 2025-06-10 - Comment: The paper constructs Schauder bases using ReLU and other functions, which is relevant to representation learning and foundational research.
-
Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models - Score: 16 (R=8, N=8) - Date: 2025-06-10 - Comment: The paper introduces a multi-agent framework for reasoning in LLMs, which could provide insights into LLM behavior and interpretability.
-
Similarity Matching Networks: Hebbian Learning and Convergence Over Multiple Time Scales - Score: 16 (R=8, N=8) - Date: 2025-06-09 - Comment: The paper provides a convergence analysis of a biologically-plausible neural network for dimensionality reduction, relevant to representation learning and emerging trends.
-
Aligning Latent Spaces with Flow Priors - Score: 16 (R=8, N=8) - Date: 2025-06-06 - Comment: The paper introduces a novel framework for aligning latent spaces using flow-based generative models, which is relevant to representation learning.
-
NIMO: a Nonlinear Interpretable MOdel - Score: 16 (R=8, N=8) - Date: 2025-06-06 - Comment: The paper introduces NIMO, a model combining neural networks with linear models for interpretability, aligning with model architecture innovations.
-
DrSR: LLM based Scientific Equation Discovery with Dual Reasoning from Data and Experience - Score: 16 (R=8, N=8) - Date: 2025-06-06 - Comment: The paper introduces DrSR, a framework for symbolic regression using LLMs, which is relevant to representation learning and LLMs.
-
Temporal horizons in forecasting: a performance-learnability trade-off - Score: 16 (R=8, N=8) - Date: 2025-06-05 - Comment: The paper analyzes the trade-off in training horizons for autoregressive models, providing theoretical insights into model training dynamics, relevant to representation learning.
-
Reason from Future: Reverse Thought Chain Enhances LLM Reasoning - Score: 16 (R=8, N=8) - Date: 2025-06-05 - Comment: The paper proposes a novel reasoning paradigm called Reason from Future, which enhances LLM reasoning and aligns with foundational research in LLM behavior and interpretability.
-
Manipulating 3D Molecules in a Fixed-Dimensional SE(3)-Equivariant Latent Space - Score: 16 (R=8, N=8) - Date: 2025-06-04 - Comment: The paper introduces a VAE for 3D molecules with SE(3)-equivariant latent space, relevant to AI for science and representation learning.
-
Learning DNF through Generalized Fourier Representations - Score: 16 (R=8, N=8) - Date: 2025-06-03 - Comment: The paper introduces a generalized Fourier representation for learning DNF, which aligns with foundational research in representation learning.
-
Unlocking the Power of Rehearsal in Continual Learning: A Theoretical Perspective - Score: 16 (R=8, N=8) - Date: 2025-06-03 - Comment: The paper provides a theoretical analysis of rehearsal-based continual learning, which is relevant to representation learning.
-
Overfitting has a limitation: a model-independent generalization error bound based on Rényi entropy - Score: 16 (R=8, N=8) - Date: 2025-06-03 - Comment: The paper introduces a model-independent generalization error bound based on Rényi entropy, which is relevant to emerging trends in theoretical work.
-
From Invariant Representations to Invariant Data: Provable Robustness to Spurious Correlations via Noisy Counterfactual Matching - Score: 16 (R=8, N=8) - Date: 2025-06-02 - Comment: The paper introduces a method for robustness to spurious correlations using invariant data pairs, which relates to representation learning by focusing on invariant representations.
-
Representation Consistency for Accurate and Coherent LLM Answer Aggregation - Score: 15 (R=8, N=7) - Date: 2025-06-30 - Comment: The paper introduces a method for test-time scaling in LLMs using representation consistency, relevant to LLM behavior and interpretability.
-
Towards Understanding the Cognitive Habits of Large Reasoning Models - Score: 15 (R=8, N=7) - Date: 2025-06-30 - Comment: The paper evaluates cognitive habits in large reasoning models, providing insights into LLM behavior and interpretability.
-
TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence - Score: 15 (R=8, N=7) - Date: 2025-06-27 - Comment: TRIDENT introduces a novel framework for molecular representation learning by integrating multiple modalities, which is relevant to foundational research in AI for Science.
-
Distilling Normalizing Flows - Score: 15 (R=8, N=7) - Date: 2025-06-27 - Comment: The paper presents novel knowledge distillation techniques for normalizing flows, which aligns with representation learning by exploring how information is encoded and transferred within models.
-
Transferring disentangled representations: bridging the gap between synthetic and real images - Score: 15 (R=8, N=7) - Date: 2025-06-27 - Comment: The paper explores transferring disentangled representations from synthetic to real images, which aligns with Representation Learning through its focus on disentangled representation learning.
-
Lost in Retraining: Roaming the Parameter Space of Exponential Families Under Closed-Loop Learning - Score: 15 (R=8, N=7) - Date: 2025-06-26 - Comment: The paper explores closed-loop learning dynamics in exponential families, which aligns with representation learning by examining how models encode information and the training dynamics.
-
Self-Supervised Graph Learning via Spectral Bootstrapping and Laplacian-Based Augmentations - Score: 15 (R=8, N=7) - Date: 2025-06-26 - Comment: The paper presents a novel self-supervised graph learning framework using spectral bootstrapping, which is relevant to representation learning.
-
SEED: A Structural Encoder for Embedding-Driven Decoding in Time Series Prediction with LLMs - Score: 15 (R=8, N=7) - Date: 2025-06-26 - Comment: The paper introduces a structural encoder for embedding-driven decoding, which aligns with representation learning and model architecture insights.
-
Discrepancy-Aware Graph Mask Auto-Encoder - Score: 15 (R=8, N=7) - Date: 2025-06-25 - Comment: The paper introduces a new graph auto-encoder focusing on discrepancy-aware representation learning, which aligns with the core topic of representation learning.
-
Inference-Time Reward Hacking in Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-06-25 - Comment: The paper addresses reward hacking in LLMs, which is relevant to understanding LLM behavior and interpretability.
-
Thought Anchors: Which LLM Reasoning Steps Matter? - Score: 15 (R=8, N=7) - Date: 2025-06-25 - Comment: The paper analyzes reasoning steps in large language models, focusing on interpretability and thought anchors, which aligns with foundational research in understanding LLM behavior.
-
Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper presents a multimodal framework unifying visual understanding and generation, which involves representation learning and model architecture insights.
-
A Set-to-Set Distance Measure in Hyperbolic Space - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper proposes a hyperbolic set-to-set distance measure, which aligns with the representation learning criterion by providing a novel method for computing dissimilarity in hyperbolic space.
-
Structured Kolmogorov-Arnold Neural ODEs for Interpretable Learning and Symbolic Discovery of Nonlinear Dynamics - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper introduces a novel framework, SKANODEs, which integrates structured state-space modeling with Neural ODEs for interpretable learning and symbolic discovery of nonlinear dynamics. This aligns with the representation learning criterion as it provides insights into how deep networks encode information.
-
Quantifying Uncertainty in the Presence of Distribution Shifts - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper proposes a Bayesian framework for uncertainty estimation under distribution shifts, which is relevant to foundational research in representation learning.
-
Understanding Reasoning in Thinking Language Models via Steering Vectors - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper explores steering vectors for controlling reasoning in LLMs, which is relevant to understanding LLM behavior and interpretability.
-
SE-Merging: A Self-Enhanced Approach for Dynamic Model Merging - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper delves into the mechanism behind model merging from a representation perspective, which aligns with the interest in representation learning and model architecture.
-
Enhancing VICReg: Random-Walk Pairing for Improved Generalization and Better Global Semantics Capturing - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper enhances VICReg, a self-supervised learning method, which is relevant to representation learning.
-
Pathwise Explanation of ReLU Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper proposes a novel pathwise explanation method for ReLU neural networks, which aligns with the representation learning criterion by providing insights into the decision-making process of neural networks.
-
How Alignment Shrinks the Generative Horizon - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper investigates how alignment affects the generative horizon of LLMs, providing insights into LLM behavior and interpretability.
-
Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper discusses a novel reinforcement learning framework for aligning frozen LLMs without changing their parameters, which is relevant to foundational research in LLM behavior and interpretability.
-
Flatness After All? - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper discusses the relationship between the curvature of the loss function at minima and generalization, providing insights into training dynamics in neural networks, aligning with the representation learning criterion.
-
HIDE and Seek: Detecting Hallucinations in Language Models via Decoupled Representations - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper proposes a method for detecting hallucinations in language models via decoupled representations, which aligns with the interest in theoretical insights into LLM behavior.
-
CLOUD: A Scalable and Physics-Informed Foundation Model for Crystal Representation Learning - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper introduces a transformer-based framework for crystal representation learning, which aligns with foundational research in representation learning and model architecture. It also integrates physical principles, which is relevant to AI for Science.
-
Step-by-Step Reasoning Attack: Revealing 'Erased' Knowledge in Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper introduces a reasoning attack to reveal 'erased' knowledge in LLMs, which provides insights into LLM behavior and interpretability.
-
Differentiable neural network representation of multi-well, locally-convex potentials - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper proposes a differentiable neural network representation for multi-well potentials, which is relevant to foundational research in representation learning.
-
Metapath-based Hyperbolic Contrastive Learning for Heterogeneous Graph Embedding - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: The paper focuses on hyperbolic contrastive learning for heterogeneous graph embedding, which aligns with representation learning through contrastive methods.
-
Subspace-Boosted Model Merging - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: The paper introduces Subspace Boosting for model merging, which is relevant to representation learning and model architecture as it provides insights into task vector space and merging efficacy.
-
Random feature approximation for general spectral methods - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: The paper discusses random feature approximation and its generalization properties, which is relevant to representation learning and theoretical insights into neural networks.
-
Probing the Robustness of Large Language Models Safety to Latent Perturbations - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: The paper discusses safety alignment in LLMs, focusing on robustness to latent perturbations, which is relevant to theoretical insights into LLM behavior.
-
One Period to Rule Them All: Identifying Critical Learning Periods in Deep Networks - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: The paper identifies critical learning periods in deep networks, which is relevant to training dynamics and representation learning.
-
CORAL: Disentangling Latent Representations in Long-Tailed Diffusion - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: The paper investigates diffusion models in long-tailed distributions and proposes a method to improve latent representation separation, contributing to representation learning.
-
NeuronSeek: On Stability and Expressivity of Task-driven Neurons - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: The paper introduces NeuronSeek, a framework for task-driven neurons using tensor decomposition, relevant to representation learning and model architecture.
-
What Do Latent Action Models Actually Learn? - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: The paper provides theoretical insights into latent action models, connecting them to principal component analysis and discussing data augmentation strategies, which aligns with representation learning.
-
Expressive Score-Based Priors for Distribution Matching with Geometry-Preserving Regularization - Score: 15 (R=8, N=7) - Date: 2025-06-18 - Comment: The paper introduces a novel approach to distribution matching using score-based priors, relevant to representation learning.
-
AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs - Score: 15 (R=8, N=7) - Date: 2025-06-18 - Comment: The paper introduces a novel weight decay method for LLMs, aligning with the large language models criterion.
-
Can Large Language Models Improve Spectral Graph Neural Networks? - Score: 15 (R=8, N=7) - Date: 2025-06-18 - Comment: The paper explores the use of LLMs to improve Spectral Graph Neural Networks, relevant to large language models and representation learning.
-
A Variational Information Theoretic Approach to Out-of-Distribution Detection - Score: 15 (R=8, N=7) - Date: 2025-06-18 - Comment: The paper presents a variational information-theoretic approach to out-of-distribution detection, which is relevant to representation learning.
-
Sampling from Your Language Model One Byte at a Time - Score: 15 (R=8, N=7) - Date: 2025-06-18 - Comment: The paper addresses tokenization issues in language models, which is relevant to large language models and their interpretability.
-
Alignment Quality Index (AQI) : Beyond Refusals: AQI as an Intrinsic Alignment Diagnostic via Latent Geometry, Cluster Divergence, and Layer wise Pooled Representations - Score: 15 (R=8, N=7) - Date: 2025-06-18 - Comment: The paper introduces the Alignment Quality Index (AQI), an intrinsic alignment diagnostic based on latent geometry, cluster divergence, and layer-wise pooled representations, which is relevant to representation learning and understanding LLM behavior.
-
Quantifying Structure in CLIP Embeddings: A Statistical Framework for Concept Interpretation - Score: 15 (R=8, N=7) - Date: 2025-06-18 - Comment: The paper introduces a statistical framework for interpreting CLIP embeddings, relevant to representation learning.
-
Understanding Learning Invariance in Deep Linear Networks - Score: 15 (R=8, N=7) - Date: 2025-06-17 - Comment: The paper provides theoretical insights into learning invariance in deep linear networks, which aligns with the representation learning criterion.
-
Because we have LLMs, we Can and Should Pursue Agentic Interpretability - Score: 15 (R=8, N=7) - Date: 2025-06-17 - Comment: The paper discusses agentic interpretability for LLMs, which aligns with the interest in theoretical insights into LLM behavior.
-
Human-like Forgetting Curves in Deep Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-06-17 - Comment: The study explores human-like forgetting curves in neural networks, contributing to representation learning by examining information retention and catastrophic forgetting.
-
Improving Large Language Model Safety with Contrastive Representation Learning - Score: 15 (R=8, N=7) - Date: 2025-06-16 - Comment: The paper proposes a defense framework using contrastive representation learning to improve LLM safety, aligning with representation learning.
-
Understanding Input Selectivity in Mamba: Impact on Approximation Power, Memorization, and Associative Recall Capacity - Score: 15 (R=8, N=7) - Date: 2025-06-16 - Comment: The paper provides a theoretical analysis of input selectivity in Mamba, a state-space model, contributing to understanding model architecture and representation learning.
-
GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-06-13 - Comment: The paper introduces a novel framework for unlearning in LLMs, which aligns with foundational research in LLM behavior and interpretability.
-
On the role of non-linear latent features in bipartite generative neural networks - Score: 15 (R=8, N=7) - Date: 2025-06-13 - Comment: The paper explores the role of non-linear latent features in bipartite generative neural networks, focusing on architectural choices and activation functions, which aligns with the interest in model architecture and representation learning.
-
DynaSubVAE: Adaptive Subgrouping for Scalable and Robust OOD Detection - Score: 15 (R=8, N=7) - Date: 2025-06-13 - Comment: The paper introduces a novel framework, DynaSubVAE, which performs representation learning and adaptive OOD detection, aligning with representation learning and model architecture.
-
Optimizing Latent Dimension Allocation in Hierarchical VAEs: Balancing Attenuation and Information Retention for OOD Detection - Score: 15 (R=8, N=7) - Date: 2025-06-13 - Comment: The paper introduces a framework for optimizing latent dimension allocation in hierarchical VAEs, which is relevant to model architecture and representation learning.
-
Propositional Logic for Probing Generalization in Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-06-11 - Comment: The paper investigates the generalization behavior of neural architectures in propositional logic tasks, which is relevant to representation learning and model architecture.
-
BioLangFusion: Multimodal Fusion of DNA, mRNA, and Protein Language Models - Score: 15 (R=8, N=7) - Date: 2025-06-11 - Comment: The paper presents a multimodal fusion approach for molecular language models, which is relevant to AI for science. It introduces new fusion techniques for integrating pre-trained models, contributing to foundational research in molecular modeling.
-
Dense Retrievers Can Fail on Simple Queries: Revealing The Granularity Dilemma of Embeddings - Score: 15 (R=8, N=7) - Date: 2025-06-11 - Comment: The paper addresses the granularity dilemma in embeddings, which is relevant to representation learning. It provides insights into the limitations of text encoders and proposes data generation strategies for improvement.
-
Text Embeddings Should Capture Implicit Semantics, Not Just Surface Meaning - Score: 15 (R=8, N=7) - Date: 2025-06-11 - Comment: The paper argues for a shift in text embedding research to focus on implicit semantics, which is relevant to representation learning.
-
An Adaptive Method Stabilizing Activations for Enhanced Generalization - Score: 15 (R=8, N=7) - Date: 2025-06-11 - Comment: The paper introduces AdaAct, an optimization algorithm for stabilizing activations, which is relevant to representation learning and model architecture.
-
What makes an Ensemble (Un) Interpretable? - Score: 15 (R=8, N=7) - Date: 2025-06-11 - Comment: The paper analyzes the interpretability of ensemble models using computational complexity theory, which is relevant to model architecture analysis.
-
Aligning Proteins and Language: A Foundation Model for Protein Retrieval - Score: 15 (R=8, N=7) - Date: 2025-06-11 - Comment: The paper proposes a foundation model for protein retrieval using contrastive learning, aligning with foundational research in AI for science.
-
Generative Modeling of Weights: Generalization or Memorization? - Score: 15 (R=8, N=7) - Date: 2025-06-10 - Comment: The paper examines generative modeling of neural network weights, which is relevant to representation learning and model architecture by exploring the synthesis of model weights.
-
Language Embedding Meets Dynamic Graph: A New Exploration for Neural Architecture Representation Learning - Score: 15 (R=8, N=7) - Date: 2025-06-10 - Comment: The paper explores neural architecture representation learning with a novel framework integrating language embedding and dynamic graph representation.
-
Rao-Blackwellised Reparameterisation Gradients - Score: 15 (R=8, N=7) - Date: 2025-06-10 - Comment: The paper proposes a new gradient estimator for models with latent Gaussian variables, which is related to representation learning and training dynamics.
-
InverseScope: Scalable Activation Inversion for Interpreting Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-06-10 - Comment: The paper introduces InverseScope for interpreting LLM activations, aligning with the large language models criterion.
-
Multiple Object Stitching for Unsupervised Representation Learning - Score: 15 (R=8, N=7) - Date: 2025-06-10 - Comment: The paper proposes a method for unsupervised representation learning in multi-object images, which is relevant to representation learning.
-
Half-AVAE: Adversarial-Enhanced Factorized and Structured Encoder-Free VAE for Underdetermined Independent Component Analysis - Score: 15 (R=8, N=7) - Date: 2025-06-10 - Comment: The paper advances the VAE framework for ICA, focusing on enhancing latent variable independence, aligning with representation learning.
-
Learning Robust Heterogeneous Graph Representations via Contrastive-Reconstruction under Sparse Semantics - Score: 15 (R=8, N=7) - Date: 2025-06-10 - Comment: The paper introduces a novel framework for heterogeneous graph representation learning, aligning with representation learning criteria.
-
Rescaled Influence Functions: Accurate Data Attribution in High Dimension - Score: 15 (R=8, N=7) - Date: 2025-06-10 - Comment: The paper introduces rescaled influence functions for data attribution, which is relevant to representation learning and training dynamics.
-
Transferring Features Across Language Models With Model Stitching - Score: 15 (R=8, N=7) - Date: 2025-06-10 - Comment: The paper explores feature transfer across language models using model stitching, which is relevant to representation learning and model architecture.
-
What Is Seen Cannot Be Unseen: The Disruptive Effect of Knowledge Conflict on Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-06-10 - Comment: The paper provides insights into the behavior of large language models under knowledge conflict, which is relevant to understanding LLM behavior and interpretability.
-
NeurNCD: Novel Class Discovery via Implicit Neural Representation - Score: 15 (R=8, N=7) - Date: 2025-06-10 - Comment: The paper introduces a framework for novel class discovery using implicit neural representation, which is relevant to representation learning.
-
Model-Driven Graph Contrastive Learning - Score: 15 (R=8, N=7) - Date: 2025-06-09 - Comment: The paper proposes a model-driven graph contrastive learning framework, which is relevant to representation learning and introduces a novel approach.
-
Evaluating Neuron Explanations: A Unified Framework with Sanity Checks - Score: 15 (R=8, N=7) - Date: 2025-06-09 - Comment: The paper provides a unified framework for evaluating neuron explanations, which aligns with representation learning by offering insights into how neural networks encode information.
-
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties - Score: 15 (R=8, N=7) - Date: 2025-06-09 - Comment: The paper introduces reasoning graphs to understand large reasoning models, which is relevant to representation learning as it provides insights into model behavior and interpretability.
-
Learning to Weight Parameters for Data Attribution - Score: 15 (R=8, N=7) - Date: 2025-06-09 - Comment: The paper presents a method for learning parameter importance weights for data attribution, which relates to representation learning by improving understanding of how models use training data.
-
Robust Moment Identification for Nonlinear PDEs via a Neural ODE Approach - Score: 15 (R=8, N=7) - Date: 2025-06-06 - Comment: The paper proposes a framework using Neural ODEs for learning reduced-order moment dynamics, which is relevant to representation learning and emerging trends.
-
Semi-Implicit Variational Inference via Kernelized Path Gradient Descent - Score: 15 (R=8, N=7) - Date: 2025-06-06 - Comment: The paper proposes a kernelized KL divergence estimator for semi-implicit variational inference, which is relevant to representation learning and efficiency.
-
Identifying and Understanding Cross-Class Features in Adversarial Training - Score: 15 (R=8, N=7) - Date: 2025-06-06 - Comment: The paper studies adversarial training through class-wise feature attribution, which is relevant to representation learning and training dynamics.
-
Towards Reasonable Concept Bottleneck Models - Score: 15 (R=8, N=7) - Date: 2025-06-06 - Comment: The paper introduces Concept Reasoning Models (CREAM), a family of concept bottleneck models that encode concept relationships, which is relevant to representation learning.
-
Aligning Multimodal Representations through an Information Bottleneck - Score: 15 (R=8, N=7) - Date: 2025-06-06 - Comment: The paper discusses aligning multimodal representations using an information bottleneck, which is relevant to representation learning.
-
Exploring bidirectional bounds for minimax-training of Energy-based models - Score: 15 (R=8, N=7) - Date: 2025-06-06 - Comment: The paper explores training dynamics in energy-based models, which aligns with representation learning. It introduces bidirectional bounds to stabilize training, offering theoretical insights.
-
Hierarchical Implicit Neural Emulators - Score: 15 (R=8, N=7) - Date: 2025-06-06 - Comment: The paper introduces a multiscale implicit neural emulator for neural PDE solvers, which aligns with foundational research in representation learning by enhancing long-term prediction accuracy through hierarchical state representations.
-
You Only Train Once - Score: 15 (R=8, N=7) - Date: 2025-06-06 - Comment: The paper introduces a novel method for optimizing loss weight hyperparameters in one shot, which relates to representation learning by improving training dynamics.
-
Relational reasoning and inductive bias in transformers trained on a transitive inference task - Score: 15 (R=8, N=7) - Date: 2025-06-06 - Comment: The paper investigates relational reasoning in transformers, which is relevant to understanding transformer behavior and representation learning.
-
Learning equivariant models by discovering symmetries with learnable augmentations - Score: 15 (R=8, N=7) - Date: 2025-06-05 - Comment: The paper introduces a method for learning symmetries in data, which is relevant to representation learning and model architecture.
-
Revisiting Unbiased Implicit Variational Inference - Score: 15 (R=8, N=7) - Date: 2025-06-05 - Comment: The paper revisits unbiased implicit variational inference, focusing on improving training routines and proposing a refined approach. It aligns with representation learning by providing insights into training dynamics and optimization methods.
-
Adapting Rule Representation With Four-Parameter Beta Distribution for Learning Classifier Systems - Score: 15 (R=8, N=7) - Date: 2025-06-05 - Comment: The paper introduces a novel rule representation using a four-parameter beta distribution, which is relevant to representation learning.
-
EpiCoDe: Boosting Model Performance Beyond Training with Extrapolation and Contrastive Decoding - Score: 15 (R=8, N=7) - Date: 2025-06-05 - Comment: The paper introduces a method to boost LLM performance in data-scarcity scenarios, focusing on model extrapolation and contrastive decoding. It provides theoretical insights into contrastive decoding, relevant to representation learning.
-
Random at First, Fast at Last: NTK-Guided Fourier Pre-Processing for Tabular DL - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper proposes a novel pre-processing method for tabular deep learning, focusing on representation learning and efficiency, which aligns with the core topics.
-
Through a Steerable Lens: Magnifying Neural Network Interpretability via Phase-Based Extrapolation - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper introduces a novel framework for visualizing neural network decision mechanisms, focusing on interpretability and representation learning.
-
Constrained Sliced Wasserstein Embedding - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper introduces a constrained learning approach to optimize slicing directions for Sliced Wasserstein distances, which is relevant to representation learning and efficiency.
-
Quantifying task-relevant representational similarity using decision variable correlation - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper introduces a method to quantify task-relevant representational similarity, which is relevant to representation learning.
-
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper introduces a method for learning image-text alignment using cycle consistency, focusing on representation learning, which aligns with the core topics.
-
Towards Better Generalization and Interpretability in Unsupervised Concept-Based Models - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper introduces a novel unsupervised concept-based model for image classification, focusing on representation learning and interpretability, which aligns with the core topics.
-
Towards Unsupervised Training of Matching-based Graph Edit Distance Solver via Preference-aware GAN - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper introduces a novel unsupervised GAN-based framework for graph edit distance computation, which is foundational in representation learning.
-
Johnny: Structuring Representation Space to Enhance Machine Abstract Reasoning Ability - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper proposes a novel representation space-based framework for abstract reasoning, which aligns with representation learning.
-
From Words to Waves: Analyzing Concept Formation in Speech and Text-Based Foundation Models - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper analyzes concept formation in speech and text-based foundation models, which is relevant to representation learning.
-
Less is More: Local Intrinsic Dimensions of Contextual Language Models - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper explores the geometric properties of contextual latent embeddings in LLMs, which is relevant to representation learning.
-
Generalization in VAE and Diffusion Models: A Unified Information-Theoretic Analysis - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper provides a unified information-theoretic analysis of VAE and diffusion models, which is relevant to representation learning.
-
Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness - Score: 15 (R=8, N=7) - Date: 2025-06-03 - Comment: The paper introduces a hybrid non-Euclidean optimization method with insights into training dynamics, which is relevant to representation learning.
-
Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods - Score: 15 (R=8, N=7) - Date: 2025-06-03 - Comment: The paper provides a theoretical analysis of overadaptation in supervised fine-tuning, which aligns with foundational research in representation learning.
-
Connecting Neural Models Latent Geometries with Relative Geodesic Representations - Score: 15 (R=8, N=7) - Date: 2025-06-03 - Comment: The paper explores the latent geometries of neural models, which is relevant to representation learning.
-
Self-supervised Latent Space Optimization with Nebula Variational Coding - Score: 15 (R=8, N=7) - Date: 2025-06-03 - Comment: The paper proposes a variational inference model for optimizing latent manifolds, which is relevant to representation learning.
-
Visual Sparse Steering: Improving Zero-shot Image Classification with Sparsity Guided Steering Vectors - Score: 15 (R=8, N=7) - Date: 2025-06-03 - Comment: The paper introduces a method for zero-shot image classification using sparse features, aligning with representation learning through sparse methods.
-
Weight-Space Linear Recurrent Neural Networks - Score: 15 (R=8, N=7) - Date: 2025-06-03 - Comment: The paper introduces a new framework for sequence modeling with weight-space learning, which is relevant to representation learning.
-
Understanding Model Reprogramming for CLIP via Decoupling Visual Prompts - Score: 15 (R=8, N=7) - Date: 2025-06-03 - Comment: The paper introduces a decoupling-and-reweighting framework for model reprogramming, which provides insights into representation learning.
-
Boosting Bot Detection via Heterophily-Aware Representation Learning and Prototype-Guided Cluster Discovery - Score: 15 (R=8, N=7) - Date: 2025-06-03 - Comment: The paper introduces a heterophily-aware representation learning framework for bot detection, which is relevant to representation learning.
-
Concept-Centric Token Interpretation for Vector-Quantized Generative Models - Score: 15 (R=8, N=7) - Date: 2025-06-03 - Comment: The paper introduces a novel approach for interpreting vector-quantized generative models (VQGMs), which aligns with representation learning and model architecture analysis.
-
Decoding Dense Embeddings: Sparse Autoencoders for Interpreting and Discretizing Dense Retrieval - Score: 15 (R=8, N=7) - Date: 2025-06-03 - Comment: The paper proposes a novel interpretability framework using Sparse Autoencoders, which aligns with representation learning and model architecture analysis.
-
The Road to Generalizable Neuro-Symbolic Learning Should be Paved with Foundation Models - Score: 15 (R=8, N=7) - Date: 2025-06-02 - Comment: The paper discusses the role of foundation models in neuro-symbolic learning, offering a new perspective on integrating symbolic programs with foundation models, aligning with emerging trends.
-
Rethinking Continual Learning with Progressive Neural Collapse - Score: 15 (R=8, N=7) - Date: 2025-06-02 - Comment: The paper proposes Progressive Neural Collapse (ProNC) for continual learning, which is relevant to representation learning and model architecture.
-
NeuronTune: Towards Self-Guided Spurious Bias Mitigation - Score: 15 (R=8, N=7) - Date: 2025-06-02 - Comment: NeuronTune addresses spurious bias in neural networks by intervening in the model's internal decision process, which is relevant to representation learning.
-
The Rich and the Simple: On the Implicit Bias of Adam and SGD - Score: 15 (R=8, N=7) - Date: 2025-06-02 - Comment: The paper investigates the implicit bias of Adam and SGD, which is relevant to representation learning.
-
BIRD: Behavior Induction via Representation-structure Distillation - Score: 15 (R=8, N=7) - Date: 2025-06-02 - Comment: The paper presents BIRD, a framework for transferring aligned behavior in models, which is relevant to representation learning and robustness.
-
Towards Understanding The Calibration Benefits of Sharpness-Aware Minimization - Score: 15 (R=8, N=7) - Date: 2025-06-02 - Comment: The paper discusses Sharpness-Aware Minimization (SAM) and its variant CSAM, which are relevant to representation learning as they provide insights into training dynamics and model calibration.
Other Foundational Research (27)
-
Augmenting LLMs' Reasoning by Reinforcing Abstract Thinking - Score: 20.0 (R=0, N=0) - Date: 2025-06-10 - Comment: Author match
-
Engineering Sentience - Score: 18 (R=9, N=9) - Date: 2025-06-26 - Comment: The paper discusses the concept of engineering sentience in AI, which is an emerging trend challenging established assumptions about AI capabilities.
-
Floating-Point Neural Networks Are Provably Robust Universal Approximators - Score: 18 (R=9, N=9) - Date: 2025-06-23 - Comment: The paper introduces a new interval universal approximation (IUA) theorem for floating-point neural networks, providing theoretical insights into their approximation capabilities, which is relevant to foundational research in neural networks.
-
Phase transition of descending phase retrieval algorithms - Score: 17 (R=9, N=8) - Date: 2025-06-24 - Comment: The paper explores theoretical limits of descending phase retrieval algorithms, which aligns with emerging trends in foundational research.
-
PhysiX: A Foundation Model for Physics Simulations - Score: 17 (R=9, N=8) - Date: 2025-06-24 - Comment: The paper introduces PhysiX, a foundation model for physics simulations, which is relevant to foundational research in AI for Science and introduces a new generative paradigm.
-
Mirror Descent Using the Tempesta Generalized Multi-parametric Logarithms - Score: 17 (R=9, N=8) - Date: 2025-06-18 - Comment: The paper develops a new class of Mirror Descent algorithms, which aligns with the emerging trends criterion.
-
LLM Cannot Discover Causality, and Should Be Restricted to Non-Decisional Support in Causal Discovery - Score: 17 (R=9, N=8) - Date: 2025-06-03 - Comment: The paper provides theoretical insights into the limitations of LLMs in causal discovery, aligning with the criteria for foundational research in LLM behavior.
-
Variational Learning Finds Flatter Solutions at the Edge of Stability - Score: 16 (R=9, N=7) - Date: 2025-06-17 - Comment: The paper analyzes the implicit regularization of variational learning through the Edge of Stability framework, contributing to understanding training dynamics in neural networks.
-
Language Models over Canonical Byte-Pair Encodings - Score: 16 (R=9, N=7) - Date: 2025-06-10 - Comment: The paper addresses canonicality in token-level language models, which is relevant to foundational research in LLMs.
-
Why Gradients Rapidly Increase Near the End of Training - Score: 16 (R=9, N=7) - Date: 2025-06-04 - Comment: The paper provides insights into the training dynamics of LLMs, which aligns with foundational research in training dynamics.
-
THE-Tree: Can Tracing Historical Evolution Enhance Scientific Verification and Reasoning? - Score: 16 (R=8, N=8) - Date: 2025-06-30 - Comment: The paper presents THE-Tree, a framework for scientific verification and reasoning, which is relevant to AI for Science with a focus on foundational research.
-
Prover Agent: An Agent-based Framework for Formal Mathematical Proofs - Score: 16 (R=8, N=8) - Date: 2025-06-26 - Comment: The paper presents a novel AI agent for automated theorem proving, integrating LLMs with a formal proof assistant, relevant to LLM theoretical insights.
-
Lifting Data-Tracing Machine Unlearning to Knowledge-Tracing for Foundation Models - Score: 16 (R=8, N=8) - Date: 2025-06-16 - Comment: The paper discusses knowledge-tracing for foundation models, which is a novel perspective on model unlearning and relevant to foundational model research.
-
Sharper Convergence Rates for Nonconvex Optimisation via Reduction Mappings - Score: 16 (R=8, N=8) - Date: 2025-06-11 - Comment: The paper discusses a framework for understanding optimization landscapes, which is relevant to emerging trends in theoretical work.
-
The Consistency Hypothesis in Uncertainty Quantification for Large Language Models - Score: 15 (R=8, N=7) - Date: 2025-06-30 - Comment: The paper examines the consistency hypothesis in uncertainty quantification for LLMs, which is relevant to theoretical insights into LLM behavior.
-
A Complete Loss Landscape Analysis of Regularized Deep Matrix Factorization - Score: 15 (R=8, N=7) - Date: 2025-06-26 - Comment: The paper provides a comprehensive study of the loss landscape of regularized deep matrix factorization, contributing to the understanding of training dynamics in neural networks.
-
Controlled Generation with Equivariant Variational Flow Matching - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper derives a controlled generation objective within Variational Flow Matching, which is relevant to foundational research in generative modeling.
-
KAG-Thinker: Teaching Large Language Models to Think with Human-like Reasoning Process - Score: 15 (R=8, N=7) - Date: 2025-06-24 - Comment: The paper introduces KAG-Thinker, a reasoning framework for LLMs, which aligns with the interest in theoretical insights into LLM behavior.
-
Mathematical Proof as a Litmus Test: Revealing Failure Modes of Advanced Large Reasoning Models - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: The paper uses mathematical proofs to reveal failure modes in large reasoning models, providing insights into LLM behavior and limitations.
-
From General to Targeted Rewards: Surpassing GPT-4 in Open-Ended Long-Context Generation - Score: 15 (R=8, N=7) - Date: 2025-06-23 - Comment: The paper introduces a new framework for long-context generation in LLMs, which is relevant to theoretical insights into LLM behavior.
-
Foundation Model Insights and a Multi-Model Approach for Superior Fine-Grained One-shot Subset Selection - Score: 15 (R=8, N=7) - Date: 2025-06-18 - Comment: The paper investigates foundation models for subset selection, which aligns with the foundation model criterion.
-
Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study - Score: 15 (R=8, N=7) - Date: 2025-06-17 - Comment: The paper introduces a cognitive framework to evaluate LLMs' learning abilities, which aligns with the interest in theoretical insights into LLM behavior.
-
BLUR: A Bi-Level Optimization Approach for LLM Unlearning - Score: 15 (R=8, N=7) - Date: 2025-06-11 - Comment: The paper proposes a novel bi-level optimization approach for LLM unlearning, which is relevant to foundational research in LLMs.
-
Explicit Preference Optimization: No Need for an Implicit Reward Model - Score: 15 (R=8, N=7) - Date: 2025-06-10 - Comment: The paper introduces a new framework for preference optimization in LLMs, aligning with the large language models criterion.
-
Cross-Entropy Games for Language Models: From Implicit Knowledge to General Capability Measures - Score: 15 (R=8, N=7) - Date: 2025-06-10 - Comment: The paper discusses a novel framework for evaluating LLM capabilities, which aligns with the large language models criterion.
-
Protein Inverse Folding From Structure Feedback - Score: 15 (R=8, N=7) - Date: 2025-06-04 - Comment: The paper presents a novel approach to protein inverse folding using structure feedback, which is foundational in AI for science.
-
Existing Large Language Model Unlearning Evaluations Are Inconclusive - Score: 15 (R=8, N=7) - Date: 2025-06-03 - Comment: The paper critiques existing LLM unlearning evaluations, which is relevant to large language models.