Personalized Monthly Topic Summary 2026/03

Metric	Value
Total Papers	23
Architecture and Training Dynamics	9
Efficiency, Compression, and Large-Scale Training	5
Representation Learning Theory and Structure	9
Memory Structures and Agent Memory Systems	0
World Models, Exploration, and Open-Ended Reinforcement Learning	0

Architecture and Training Dynamics (9)

The Geometric Cost of Normalization: Affine Bounds on the Bayesian Complexity of Neural Networks - Score: 19 (R=10, N=9) - Date: 2026-03-31 - Comment: Strong foundational theory linking normalization geometry to exact LLC/Bayesian complexity changes, with a crisp LayerNorm vs RMSNorm distinction and supporting experiments; unusually clean and relevant for a monthly digest.
Rethinking Language Model Scaling under Transferable Hypersphere Optimization - Score: 18 (R=10, N=8) - Date: 2026-03-31 - Comment: Strong keep: proposes a coherent hypersphere-parameterization scaling framework with theory, transferable hyperparameter laws, stability results, and the new SqrtGate MoE mechanism; broad foundational significance.
Universal Approximation Constraints of Narrow ResNets: The Tunnel Effect - Score: 16 (R=9, N=7) - Date: 2026-03-31 - Comment: Good foundational theory on expressivity limits of narrow ResNets, with explicit approximation constraints and regime-dependent bounds tied to skip/residual balance. Strong enough to keep as architecture theory.
Next-Token Prediction and Regret Minimization - Score: 16 (R=8, N=8) - Date: 2026-03-31 - Comment: A strong theoretical analysis connecting next-token prediction to low-regret online decision-making, with especially relevant insight on bounded-context limitations tied to transformer-style models.
A Tight Expressivity Hierarchy for GNN-Based Entity Resolution in Master Data Management - Score: 16 (R=8, N=8) - Date: 2026-03-31 - Comment: This is a strong foundational architecture/theory paper: it gives tight expressivity separations for specific MPNN adaptations and identifies minimal message-passing mechanisms needed for different predicates, which is unusually crisp and general.
Preconditioned Attention: Enhancing Efficiency in Transformers - Score: 16 (R=9, N=7) - Date: 2026-03-31 - Comment: Introduces a principled drop-in attention variant motivated by conditioning analysis, with a clear architectural contribution and broad applicability across transformer settings; strong enough for monthly inclusion.
Temporal Credit Is Free - Score: 17 (R=9, N=8) - Date: 2026-03-31 - Comment: Keep due to a bold and potentially important claim about eliminating temporal Jacobian propagation in online recurrent learning, with a unifying normalization rule and large efficiency gains.
Can We Change the Stroke Size for Easier Diffusion? - Score: 15 (R=8, N=7) - Date: 2026-03-31 - Comment: Studies diffusion training dynamics in the low-SNR regime through a principled stroke-size intervention, with both theoretical and empirical analysis. This is a distinctive foundational contribution on generative model optimization and is strong enough to keep.
DSO: Dual-Scale Neural Operators for Stable Long-term Fluid Dynamics Forecasting - Score: 15 (R=8, N=7) - Date: 2026-03-31 - Comment: Proposes a clear architectural mechanism in neural operators by explicitly separating local and global dynamics, with strong evidence for improved long-horizon stability. This is a substantive foundational architecture contribution worth keeping.

Efficiency, Compression, and Large-Scale Training (5)

High dimensional theory of two-phase optimizers - Score: 18 (R=10, N=8) - Date: 2026-03-31 - Comment: Strong theoretical paper giving high-dimensional analysis of two-phase optimizers and distributed/local-update noise tradeoffs; foundational and likely to matter beyond the specific LA-DiLoCo setup.
KVSculpt: KV Cache Compression as Distillation - Score: 18 (R=10, N=8) - Date: 2026-03-31 - Comment: Keep for a novel and conceptually clean view of KV compression as distillation/continuous optimization rather than eviction or merging, plus adaptive per-layer/head budgeting; strong methodological contribution.
TurboAngle: Near-Lossless KV Cache Compression via Uniform Angle Quantization - Score: 17 (R=9, N=8) - Date: 2026-03-31 - Comment: A strong efficiency paper: near-lossless KV-cache compression with a distinctive angle-domain quantization scheme and useful per-layer precision allocation, delivering impressive practical compression without calibration data.
Heddle: A Distributed Orchestration System for Agentic RL Rollout - Score: 15 (R=8, N=7) - Date: 2026-03-31 - Comment: Strong systems contribution for agentic RL rollout with trajectory-centric scheduling, placement, and adaptive parallelism; the long-tail execution framing and sizable throughput gains make it monthly-digest worthy.
daVinci-LLM: Towards the Science of Pretraining - Score: 15 (R=8, N=7) - Date: 2026-03-31 - Comment: A substantial open pretraining science paper with 200+ controlled ablations on data processing and curriculum; broad foundational relevance and unusual transparency make it a strong monthly keep.

Representation Learning Theory and Structure (9)

Expectation Error Bounds for Transfer Learning in Linear Regression and Linear Neural Networks - Score: 16 (R=8, N=8) - Date: 2026-03-31 - Comment: Provides exact and non-vacuous transfer-learning generalization results in canonical linear settings, making it a strong theoretical paper with clear foundational value.
On the Asymptotics of Self-Supervised Pre-training: Two-Stage M-Estimation and Representation Symmetry - Score: 17 (R=9, N=8) - Date: 2026-03-31 - Comment: Strong foundational theory for self-supervised pretraining: a sharp two-stage M-estimation asymptotic framework that explicitly handles symmetry/identifiability and characterizes downstream risk. This is the kind of core representation-learning theory worth surfacing monthly.
On the Loss Landscape Geometry of Regularized Deep Matrix Factorization: Uniqueness and Sharpness - Score: 17 (R=9, N=8) - Date: 2026-03-31 - Comment: Strong theoretical analysis of regularized deep matrix factorization, with notable results on near-generic uniqueness and Hessian sharpness structure. A solid foundational training-dynamics/representation-learning paper for the monthly digest.
Stop Probing, Start Coding: Why Linear Probes and Sparse Autoencoders Fail at Compositional Generalisation - Score: 17 (R=9, N=8) - Date: 2026-03-31 - Comment: Strong representation-learning paper clarifying why linear probes and SAEs fail under compositional shifts, with a compelling diagnosis that dictionary learning is the core bottleneck under superposition.
Spectral Signatures of Data Quality: Eigenvalue Tail Index as a Diagnostic for Label Noise in Neural Networks - Score: 15 (R=8, N=7) - Date: 2026-03-31 - Comment: Identifies a sharp and interpretable spectral signature of data quality at the bottleneck layer, with strong empirical evidence and a useful theoretical link to spiked random matrix behavior; strong fit for monthly selection.
Semantic Interaction Information mediates compositional generalization in latent space - Score: 15 (R=8, N=7) - Date: 2026-03-31 - Comment: Strong representation-learning paper on compositional generalization, latent-variable interactions, and JEPA-style disentangling of inference versus embeddings; the framework and analysis are foundational enough for monthly selection.
Geometry-aware similarity metrics for neural representations on Riemannian and statistical manifolds - Score: 16 (R=8, N=8) - Date: 2026-03-31 - Comment: Presents a mathematically grounded framework for comparing intrinsic geometry of neural representations, offering a novel and broadly useful representation-analysis tool worthy of monthly inclusion.
The Geometry of Harmful Intent: Training-Free Anomaly Detection via Angular Deviation in LLM Residual Streams - Score: 16 (R=8, N=8) - Date: 2026-03-31 - Comment: Training-free residual-stream geometry for harmful-intent detection is a strong representation-level contribution with clear mechanistic findings, especially the persistence under refusal ablation and compact angular structure across model variants.
The Price of Meaning: Why Every Semantic Memory System Forgets - Score: 16 (R=8, N=8) - Date: 2026-03-31 - Comment: A strong theoretical paper on semantic memory and representation geometry, with broad implications: it formalizes an interference-forgetting tradeoff and derives nontrivial impossibility-style results rather than just benchmarking systems.