Personalized Daily ArXiv Papers 2026-04-30

Model	Metric	Usage			Papers
Model	Metric	Prompt	Completion	Total	Total arXiv	Scanned	Relevant
`gpt-5.4`	Tokens	70978	14622	85600	453	242	8
`gpt-5.4`	Cost	$0.18	$0.22	$0.40	453	242	8

Topic Coverage:

Topic	Papers
Architecture and Training Dynamics	2
Efficiency, Compression, and Large-Scale Training	2
Memory Structures and Agent Memory Systems	1
World Models, Exploration, and Open-Ended Reinforcement Learning	3

Table of contents by topic:

Architecture and Training Dynamics (2)

Momentum-Conserving Graph Neural Networks for Deformable Objects Authors: Jiahong Wang, Logan Numerow, Stelian Coros, Christian Theobalt, Vahid Babaei, Bernhard Thomaszewski
Hyper Input Convex Neural Networks for Shape Constrained Learning and Optimal Transport Authors: Shayan Hundrieser, Insung Kong, Johannes Schmidt-Hieber

Efficiency, Compression, and Large-Scale Training (2)

AMMA: A Multi-Chiplet Memory-Centric Architecture for Low-Latency 1M Context Attention Serving Authors: Zhongkai Yu, Haotian Ye, Chenyang Zhou, Ohm Rishabh Venkatachalam, Zaifeng Pan, Zhengding Hu, Junsung Kim, Won Woo Ro, Po-An Tsai, Shuyi Pei, Yangwook Kang, Yufei Ding
CoQuant: Joint Weight-Activation Subspace Projection for Mixed-Precision LLMs Authors: Zhe Ding, Su Pan, Duowei Pan

Memory Structures and Agent Memory Systems (1)

Associative-State Universal Transformers: Sparse Retrieval Meets Structured Recurrence Authors: Liu Xiao

World Models, Exploration, and Open-Ended Reinforcement Learning (3)

Lyapunov-Guided Self-Alignment: Test-Time Adaptation for Offline Safe Reinforcement Learning Authors: Seungyub Han, Hyungjin Kim, Jungwoo Lee
Uncertainty-Aware Predictive Safety Filters for Probabilistic Neural Network Dynamics Authors: Bernd Frauenknecht, Lukas Kesper, Daniel Mayfrank, Henrik Hose, Sebastian Trimpe
Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising Authors: Jun Guo, Qiwei Li, Peiyan Li, Zilong Chen, Nan Sun, Yifei Su, Heyun Wang, Yuan Zhang, Xinghang Li, Huaping Liu

Architecture and Training Dynamics (2)

1. Momentum-Conserving Graph Neural Networks for Deformable Objects

ArXiv ID: 2604.26097

Primary Topic: Architecture and Training Dynamics

Authors: Jiahong Wang, Logan Numerow, Stelian Coros, Christian Theobalt, Vahid Babaei, Bernhard Thomaszewski

Abstract: Graph neural networks (GNNs) have emerged as a versatile and efficient option for modeling the dynamic behavior of deformable materials. While GNNs generalize readily to arbitrary shapes, mesh topologies, and material parameters, existing architectures struggle to correctly predict the temporal evolution of key physical quantities such as linear and angular momentum. In this work, we propose MomentumGNN -- a novel architecture designed to accurately track momentum by construction. Unlike existing GNNs that output unconstrained nodal accelerations, our model predicts per-edge stretching and bending impulses which guarantee the preservation of linear and angular momentum. We train our network in an unsupervised fashion using a physics-based loss, and we show that our method outperforms baselines in a number of common scenarios where momentum plays a pivotal role.

Comment: Architecture enforces linear and angular momentum conservation by predicting edge impulses instead of unconstrained node accelerations.

Topic Match: The main contribution is a new architectural mechanism with built-in physical invariants, not just an application of standard GNNs.

Relevance: 8 Novelty: 8

2. Hyper Input Convex Neural Networks for Shape Constrained Learning and Optimal Transport

ArXiv ID: 2604.26942

Primary Topic: Architecture and Training Dynamics

Authors: Shayan Hundrieser, Insung Kong, Johannes Schmidt-Hieber

Abstract: We introduce Hyper Input Convex Neural Networks (HyCNNs), a novel neural network architecture designed for learning convex functions. HyCNNs combine the principles of Maxout networks with input convex neural networks (ICNNs) to create a neural network that is always convex in the input, theoretically capable of leveraging depth, and performs reliable when trained at scale compared to ICNNs. Concretely, we prove that HyCNNs require exponentially fewer parameters than ICNNs to approximate quadratic functions up to a given precision. Throughout a series of synthetic experiments, we demonstrate that HyCNNs outperform existing ICNNs and MLPs in terms of predictive performance for convex regression and interpolation tasks. We further apply HyCNNs to learn high-dimensional optimal transport maps for synthetic examples and for single-cell RNA sequencing data, where they oftentimes outperform ICNN-based neural optimal transport methods and other baselines across a wide range of settings.

Comment: Introduces a convex-by-construction architecture with stronger approximation efficiency than standard ICNNs.

Topic Match: This is fundamentally a new neural architecture class with theoretical expressivity and training-reliability motivation.

Relevance: 8 Novelty: 8

Efficiency, Compression, and Large-Scale Training (2)

1. AMMA: A Multi-Chiplet Memory-Centric Architecture for Low-Latency 1M Context Attention Serving

ArXiv ID: 2604.26103

Primary Topic: Efficiency, Compression, and Large-Scale Training

Also Matches: Memory Structures and Agent Memory Systems

Authors: Zhongkai Yu, Haotian Ye, Chenyang Zhou, Ohm Rishabh Venkatachalam, Zaifeng Pan, Zhengding Hu, Junsung Kim, Won Woo Ro, Po-An Tsai, Shuyi Pei, Yangwook Kang, Yufei Ding

Abstract: All current LLM serving systems place the GPU at the center, from production-level attention-FFN disaggregation to NVIDIA's Rubin GPU-LPU heterogeneous platform. Even academic PIM/PNM proposals still treat the GPU as the central hub for cross-device communication. Yet the GPU's compute-rich architecture is fundamentally mismatched with the memory-bound nature of decode-phase attention, inflating serving latency while wasting power and die area on idle compute units. The problem is compounded as reasoning and agentic workloads push context lengths toward one million tokens, making attention latency the primary user-facing bottleneck. To address these inefficiencies, we present AMMA, a multi-chiplet, memory-centric architecture for low-latency long-context attention. AMMA replaces GPU compute dies with HBM-PNM cubes, roughly doubling the available memory bandwidth to better serve memory-bound attention workloads. To translate this bandwidth into proportional performance gains, we introduce (i) a logic-die microarchitecture that fully exploits per-cube internal bandwidth for decode attention under a minimal power and area budget, (ii) a two-level hybrid parallelism scheme, and (iii) a reordered collective flow that reduces intra-chip die-to-die communication overhead. We further conduct a design-space exploration over per-cube compute power and intra-chip D2D link bandwidth, providing actionable guidance for hardware designers. Evaluations show that AMMA achieves 15.5X lower attention latency and 6.9X lower energy consumption compared with the NVIDIA H100.

Comment: Memory-centric multi-chiplet attention-serving architecture targeting the bandwidth bottleneck of million-token decode attention.

Topic Match: This is a clear large-scale inference systems contribution centered on memory bandwidth, architecture, and long-context serving cost.

Relevance: 9 Novelty: 8

2. CoQuant: Joint Weight-Activation Subspace Projection for Mixed-Precision LLMs

ArXiv ID: 2604.26378

Primary Topic: Efficiency, Compression, and Large-Scale Training

Authors: Zhe Ding, Su Pan, Duowei Pan

Abstract: Post-training quantization (PTQ) has become an important technique for reducing the inference cost of Large Language Models (LLMs). While recent mixed-precision methods improve ultra-low bit quantization by preserving critical subspaces in high precision, they typically construct these subspaces relying solely on activation statistics. This ignores the fundamental nature of linear operations, where the output perturbation is jointly driven by both activation and weight quantization noise. In this paper, we propose CoQuant, a joint weight-activation subspace projection method. By theoretically modeling the expected output error, CoQuant formulates a closed-form weighted PCA solution that balances activation and weight covariances to select the optimal high-precision subspace. Extensive experiments on Llama-3.2 and Qwen2.5 models show that CoQuant consistently outperforms strong PTQ baselines in both WikiText perplexity and zero-shot common-sense reasoning accuracy. These results demonstrate that joint weight-activation subspace modeling provides a principled and effective direction for low-bit LLM quantization. The source code is available at https://github.com/Zachary5895/CoQuant.

Comment: Derives a closed-form weighted PCA subspace using joint weight and activation covariances for mixed-precision LLM quantization.

Topic Match: This is a direct fit: a principled new post-training quantization method that materially improves low-bit LLM efficiency by modeling output error from both weights and activations.

Relevance: 9 Novelty: 8

Memory Structures and Agent Memory Systems (1)

1. Associative-State Universal Transformers: Sparse Retrieval Meets Structured Recurrence

ArXiv ID: 2604.25930

Primary Topic: Memory Structures and Agent Memory Systems

Also Matches: Architecture and Training Dynamics

Authors: Liu Xiao

Abstract: We study whether a structured recurrent state can serve as a compact associative backbone for language modeling while still supporting exact retrieval. We introduce UniMatrix, a Universal Transformer style family that reuses a shared recurrent block across depth and augments it with hybrid state updates, a ROSA-style residual path, and token-conditioned embedding modulation. We evaluate these models on byte-level WikiText-2, synthetic associative recall, throughput profiling on Apple MPS, and a corrected benchmark for triple-token interactions. At small scale, UniMatrix-Core and UniMatrix-ROSA slightly outperform a parameter-matched Transformer on WikiText-2 while using many fewer parameters, reaching 5.084 and 5.083 bits-per-byte versus 5.124. The main negative result is equally important: on associative recall, the original UniMatrix family remains near chance while the Transformer reaches 25.4 percent, showing that compressed recurrent state alone is not enough for exact lookup. A retrieval-oriented follow-up, UniMatrix-Assoc, helps only marginally. By contrast, UniMatrix-SparsePointer, which adds sparse slot routing and direct pointer-logit fusion, reaches 75.6 percent on the original pilot recipe and 99.2 percent on a no-dropout follow-up while using 53.8 percent fewer parameters than the Transformer baseline. Ablations show that the gain comes from sufficient slot capacity and exact pointer-level output routing. Overall, structured recurrent state is promising and parameter-efficient, but strong long-range behavior still requires explicit sparse retrieval and better kernels.

Comment: Studies structured recurrent state plus sparse slot retrieval, showing explicit retrieval is needed for exact associative lookup in sequence models.

Topic Match: The paper's core contribution is a memory mechanism result: compressed recurrent state alone fails at exact recall, while sparse retrieval and pointer routing recover associative memory behavior.

Relevance: 9 Novelty: 8

World Models, Exploration, and Open-Ended Reinforcement Learning (3)

1. Lyapunov-Guided Self-Alignment: Test-Time Adaptation for Offline Safe Reinforcement Learning

ArXiv ID: 2604.26516

Primary Topic: World Models, Exploration, and Open-Ended Reinforcement Learning

Also Matches: Memory Structures and Agent Memory Systems

Authors: Seungyub Han, Hyungjin Kim, Jungwoo Lee

Abstract: Offline reinforcement learning (RL) agents often fail when deployed, as the gap between training datasets and real environments leads to unsafe behavior. To address this, we present SAS (Self-Alignment for Safety), a transformer-based framework that enables test-time adaptation in offline safe RL without retraining. In SAS, the main mechanism is self-alignment: at test time, the pretrained agent generates several imagined trajectories and selects those satisfying the Lyapunov condition. These feasible segments are then recycled as in-context prompts, allowing the agent to realign its behavior toward safety while avoiding parameter updates. In effect, SAS turns Lyapunov-guided imagination into control-invariant prompts, and its transformer architecture admits a hierarchical RL interpretation where prompting functions as Bayesian inference over latent skills. Across Safety Gymnasium and MuJoCo benchmarks, SAS consistently reduces cost and failure while maintaining or improving return.

Comment: Lyapunov-guided imagined trajectories reused as in-context prompts for offline safe RL test-time adaptation.

Topic Match: The main contribution is a new test-time adaptation principle for RL agents built around imagined trajectories and safety-guided selection, which squarely fits foundational RL.

Relevance: 8 Novelty: 8

2. Uncertainty-Aware Predictive Safety Filters for Probabilistic Neural Network Dynamics

ArXiv ID: 2604.26836

Primary Topic: World Models, Exploration, and Open-Ended Reinforcement Learning

Authors: Bernd Frauenknecht, Lukas Kesper, Daniel Mayfrank, Henrik Hose, Sebastian Trimpe

Abstract: Predictive safety filters (PSFs) leverage model predictive control to enforce constraint satisfaction during deep reinforcement learning (RL) exploration, yet their reliance on first-principles models or Gaussian processes limits scalability and broader applicability. Meanwhile, model-based RL (MBRL) methods routinely employ probabilistic ensemble (PE) neural networks to capture complex, high-dimensional dynamics from data with minimal prior knowledge. However, existing attempts to integrate PEs into PSFs lack rigorous uncertainty quantification. We introduce the Uncertainty-Aware Predictive Safety Filter (UPSi), a PSF that provides rigorous safety predictions using PE dynamics models by formulating future outcomes as reachable sets. UPSi introduces an explicit certainty constraint that prevents model exploitation and integrates seamlessly into common MBRL frameworks. We evaluate UPSi within Dyna-style MBRL on standard safe RL benchmarks and report substantial improvements in exploration safety over prior neural network PSFs while maintaining performance on par with standard MBRL. UPSi bridges the gap between the scalability and generality of modern MBRL and the safety guarantees of predictive safety filters.

Comment: Predictive safety filter with reachable-set uncertainty handling for probabilistic ensemble neural dynamics.

Topic Match: It directly advances model-based RL by making learned probabilistic dynamics usable inside safety-critical planning with explicit uncertainty control.

Relevance: 8 Novelty: 8

3. Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising

ArXiv ID: 2604.26694

Primary Topic: World Models, Exploration, and Open-Ended Reinforcement Learning

Authors: Jun Guo, Qiwei Li, Peiyan Li, Zilong Chen, Nan Sun, Yifei Su, Heyun Wang, Yuan Zhang, Xinghang Li, Huaping Liu

Abstract: We propose X-WAM, a Unified 4D World Model that unifies real-time robotic action execution and high-fidelity 4D world synthesis (video + 3D reconstruction) in a single framework, addressing the critical limitations of prior unified world models (e.g., UWM) that only model 2D pixel-space and fail to balance action efficiency and world modeling quality. To leverage the strong visual priors of pretrained video diffusion models, X-WAM imagines the future world by predicting multi-view RGB-D videos, and obtains spatial information efficiently through a lightweight structural adaptation: replicating the final few blocks of the pretrained Diffusion Transformer into a dedicated depth prediction branch for the reconstruction of future spatial information. Moreover, we propose Asynchronous Noise Sampling (ANS) to jointly optimize generation quality and action decoding efficiency. ANS applies a specialized asynchronous denoising schedule during inference, which rapidly decodes actions with fewer steps to enable efficient real-time execution, while dedicating the full sequence of steps to generate high-fidelity video. Rather than entirely decoupling the timesteps during training, ANS samples from their joint distribution to align with the inference distribution. Pretrained on over 5,800 hours of robotic data, X-WAM achieves 79.2% and 90.7% average success rate on RoboCasa and RoboTwin 2.0 benchmarks, while producing high-fidelity 4D reconstruction and generation surpassing existing methods in both visual and geometric metrics.

Comment: Unified action-conditioned 4D world model with asynchronous denoising to jointly support fast action decoding and richer future world synthesis.

Topic Match: Despite the robotics benchmark focus, the central contribution is a foundation-style world model design for action and future-state generation.

Relevance: 8 Novelty: 8

Paper Selection Prompt

System Prompt

You are a helpful paper reading assistant whose job is to read daily posts from ArXiv and identify a few papers that your friend will enjoy reading. Your job is to carefully read the paper titles and abstracts below and find the ones that match the criteria below.

User Prompt

Relevant Topics

Focus on specialized foundational research that remains worth reading even when it is not a daily hotspot.

Do not keep papers only because they are broadly frontier-relevant, widely discussed, or part of a major launch cycle. Broad daily frontier movement belongs in the hotspot digest unless the core contribution strongly matches the specialized topics below.

Architecture and Training Dynamics - Keep: work that introduces or analyzes core architectural or computational mechanisms such as MoE routing, attention variants, normalization or residual design, recurrent or state-space sequence modeling, dynamic or modular computation, or training-stability mechanisms. - Filter: papers that mainly apply an existing architecture to a new task or benchmark without new mechanistic insight.

Efficiency, Compression, and Large-Scale Training - Keep: quantization, sparsity, pruning, low-rank adaptation, KV-cache or cache design, memory-efficient inference or training, distributed training algorithms, communication or optimizer improvements, and training-system designs that materially change large-model training cost or behavior. - Filter: routine infrastructure optimization, deployment work, or straightforward tuning of standard efficiency methods without a clear new algorithmic or systems idea.

Representation Learning Theory and Structure - Keep: work on feature formation, sparse or dictionary learning, contrastive or self-supervised representation structure, training dynamics, identifiability, or other mechanistic understanding of learned representations. - Filter: papers that use representation-learning methods as standard components in downstream applications without new theoretical or methodological content.

Memory Structures and Agent Memory Systems - Keep: internal or external memory mechanisms, differentiable memory, recurrent or latent memory, long-context memory organization, memory compression or eviction, retrieval as a learned memory mechanism, episodic or semantic memory for agents, memory consolidation, forgetting, and agent memory systems whose core contribution is a new principle for storing, updating, recalling, or reasoning over memory. - Filter: standard RAG pipelines, vector-database plumbing, context stuffing, chat-history management, or agent products that add memory without a new memory mechanism, learning principle, or analysis.

World Models, Exploration, and Open-Ended Reinforcement Learning - Keep: model-based RL, action-conditioned world models, imagination or planning-based agents, open-ended exploration, automatic curriculum or environment generation, continual RL, reward-free skill discovery, and RL methods aimed at learning new behaviors or transferable knowledge through interaction. Also keep foundational work on pre-training agents or world models, foundation world models, generative interactive environments, or theoretical arguments about why world models or exploration are necessary for general-purpose agents. - Filter: RLHF, DPO, GRPO, RFT, instruction-following or alignment fine-tuning for LLMs; papers where RL is mainly a post-training optimizer for language models, reasoning traces, or tool-use agents without a new world-model, exploration, or generalization contribution; routine benchmark gains on a fixed environment without a new learning principle.

Usually leave these to the hotspot digest unless the core contribution is clearly foundational: - major model or product releases - broadly trendy agent or tooling launches - benchmark, leaderboard, or evaluation-only papers - downstream applications in medical imaging, segmentation, 3D vision, video understanding, information retrieval, summarization, recommendation, machine translation, speech recognition, time series, knowledge graphs, and similar domains

Scoring Criteria

Relevance and Novelty are independent axes. Score both from 1 to 10.

Relevance Scoring

9-10: directly centered on the target foundational topics; highest when the core contribution is clearly within them.

7-8: substantially related, but partly peripheral or focused on a narrower aspect.

5-6: touches the target topics, but the main contribution is elsewhere.

3-4: largely outside the target topics, often application-focused or domain-specific.

1-2: unrelated.

Important: Broad frontier relevance, major launch status, or daily buzz is not enough for a high Relevance score here. Those cases belong in the hotspot digest unless the paper strongly matches the specialized paper topics.

Novelty Scoring

9-10: new paradigm, theory, or major methodological breakthrough.

7-8: substantial methodological advance or strong new insight.

5-6: meaningful but incremental extension or refinement.

3-4: minor, narrow, or mostly engineering or domain-specific improvement.

1-2: little originality; mainly standard application of existing methods.

Topic Registry

Use exactly one PRIMARY_TOPIC_ID chosen from the stable topic IDs below. - architecture_training: Architecture and Training Dynamics - Core architectural or computational mechanisms, dynamic computation, and training-stability dynamics. - efficiency_scaling: Efficiency, Compression, and Large-Scale Training - Compression, sparsity, memory or cache efficiency, and large-scale training systems that materially change cost or behavior. - representation_structure: Representation Learning Theory and Structure - How learned representations form, organize, and support generalization or mechanistic understanding. - memory_systems: Memory Structures and Agent Memory Systems - Internal or external memory mechanisms, learned retrieval memory, consolidation, forgetting, and agent memory systems. - world_models_open_ended_rl: World Models, Exploration, and Open-Ended Reinforcement Learning - World models, model-based RL, exploration, continual learning, and RL for transferable knowledge acquisition rather than LLM post-training.

Papers

[PAPER LIST HERE]

Instructions

Respond in JSONL. Output exactly one JSON object per paper, one per line:

{"ARXIVID":"...","COMMENT":"...","RELEVANCE":0,"NOVELTY":0,"PRIMARY_TOPIC_ID":"...","MATCHED_TOPIC_IDS":[],"TOPIC_MATCH_COMMENT":"...","HOTSPOT_PAPER_TAGS":[],"HOTSPOT_PAPER_COMMENT":"..."}

Rules: - ARXIVID: the arXiv ID. - COMMENT: identify the single strongest matching criterion. Be brief and specific. Do not rely on generic phrases like "language modeling" or "advancement". Do not mention non-matching criteria. - RELEVANCE: integer from 1 to 10. - NOVELTY: integer from 1 to 10. - PRIMARY_TOPIC_ID: exactly one stable topic ID from the allowed topic registry. - MATCHED_TOPIC_IDS: zero or more stable topic IDs from the same allowed set. Include PRIMARY_TOPIC_ID when there are multiple matches. - TOPIC_MATCH_COMMENT: briefly explain why the primary topic is the best fit. - HOTSPOT_PAPER_TAGS: zero or more tags from this exact set only: daily_hot, new_frontier. - HOTSPOT_PAPER_COMMENT: briefly explain why the paper belongs in the daily hotspot paper feed when HOTSPOT_PAPER_TAGS is non-empty; otherwise use an empty string. - Use HOTSPOT_PAPER_TAGS sparingly. Most papers should return []. - daily_hot means the paper feels broadly important to the day and belongs in the daily hotspot paper section even if it is not part of the personalized foundational reading list. - new_frontier means the paper appears to open a genuinely new direction, paradigm, or field, even if the work is still early. - Do not output markdown, code fences, or any extra text.