Personalized Daily ArXiv Papers 2025-08-20

[gpt-4o]	Prompt	Completion	Total
Token	29156	2856	32012
Cost	$0.07	$0.03	$0.1

Total arXiv papers: 456

Total scanned papers: 266

Total relevant papers: 14

Table of contents with paper titles:

X-MoE: Enabling Scalable Training for Emerging Mixture-of-Experts Architectures on HPC Platforms Authors: Yueming Yuan, Ahan Gupta, Jianping Li, Sajal Dash, Feiyi Wang, Minjia Zhang
Virtuous Machines: Towards Artificial General Science Authors: Gabrielle Wehr, Reuben Rideaux, Amaya J. Fox, David R. Lightfoot, Jason Tangen, Jason B. Mattingley, Shane E. Ehrhardt
The DeepLog Neurosymbolic Machine Authors: Vincent Derkinderen, Robin Manhaeve, Rik Adriaensen, Lucas Van Praet, Lennert De Smet, Giuseppe Marra, Luc De Raedt
BLIPs: Bayesian Learned Interatomic Potentials Authors: Dario Coscia, Pim de Haan, Max Welling
GRAFT: Gradient-Aware Fast MaxVol Technique for Dynamic Data Sampling Authors: Ashish Jha, Anh huy Phan, Razan Dibo, Valentin Leplat
The Rise of Generative AI for Metal-Organic Framework Design and Synthesis Authors: Chenru Duan, Aditya Nandy, Shyam Chand Pal, Xin Yang, Wenhao Gao, Yuanqi Du, Hendrik Kra{\ss}, Yeonghun Kang, Varinia Bernales, Zuyang Ye, Tristan Pyle, Ray Yang, Zeqi Gu, Philippe Schwaller, Shengqian Ma, Shijing Sun, Al\'an Aspuru-Guzik, Seyed Mohamad Moosavi, Robert Wexler, Zhiling Zheng
Accelerating LLM Inference via Dynamic KV Cache Placement in Heterogeneous Memory System Authors: Yunhua Fang, Rui Xie, Asad Ul Haq, Linsen Ma, Kaoutar El Maghraoui, Naigang Wang, Meng Wang, Liu Liu, Tong Zhang
A Generalized Learning Framework for Self-Supervised Contrastive Learning Authors: Lingyu Si, Jingyao Wang, Wenwen Qiang
One Shot vs. Iterative: Rethinking Pruning Strategies for Model Compression Authors: Miko{\l}aj Janusz, Tomasz Wojnar, Yawei Li, Luca Benini, Kamil Adamczewski
DyMixOp: Guiding Neural Operator Design for PDEs from a Complex Dynamics Perspective with Local-Global-Mixing Authors: Pengyu Lai, Yixiao Chen, Hui Xu
Approximate Bayesian Inference via Bitstring Representations Authors: Aleksanteri Sladek, Martin Trapp, Arno Solin
Uncovering Emergent Physics Representations Learned In-Context by Large Language Models Authors: Yeongwoo Song, Jaeyong Bae, Dong-Kyum Kim, Hawoong Jeong
QuickMerge++: Fast Token Merging with Autoregressive Prior Authors: Dong Liu, Yanxuan Yu
Chunks as Arms: Multi-Armed Bandit-Guided Sampling for Long-Context LLM Preference Optimization Authors: Shaohua Duan, Xinze Li, Zhenghao Liu, Xiaoyuan Yi, Yukun Yan, Shuo Wang, Yu Gu, Ge Yu, Maosong Sun

1. X-MoE: Enabling Scalable Training for Emerging Mixture-of-Experts Architectures on HPC Platforms

ArXiv ID: 2508.13337

Authors: Yueming Yuan, Ahan Gupta, Jianping Li, Sajal Dash, Feiyi Wang, Minjia Zhang

Abstract: Emerging expert-specialized Mixture-of-Experts (MoE) architectures, such as DeepSeek-MoE, deliver strong model quality through fine-grained expert segmentation and large top-k routing. However, their scalability is limited by substantial activation memory overhead and costly all-to-all communication. Furthermore, current MoE training systems - primarily optimized for NVIDIA GPUs - perform suboptimally on non-NVIDIA platforms, leaving significant computational potential untapped. In this work, we present X-MoE, a novel MoE training system designed to deliver scalable training performance for next-generation MoE architectures. X-MoE achieves this via several novel techniques, including efficient padding-free MoE training with cross-platform kernels, redundancy-bypassing dispatch, and hybrid parallelism with sequence-sharded MoE blocks. Our evaluation on the Frontier supercomputer, powered by AMD MI250X GPUs, shows that X-MoE scales DeepSeek-style MoEs up to 545 billion parameters across 1024 GPUs - 10x larger than the largest trainable model with existing methods under the same hardware budget, while maintaining high training throughput. The source code of X-MoE is available at https://github.com/Supercomputing-System-AI-Lab/X-MoE.

Comment: The paper focuses on scalable training for Mixture-of-Experts (MoE) architectures, which is relevant to model architecture and efficiency breakthroughs. It introduces novel techniques for improving MoE training on HPC platforms.

Relevance: 10 Novelty: 8

2. Virtuous Machines: Towards Artificial General Science

ArXiv ID: 2508.13421

Authors: Gabrielle Wehr, Reuben Rideaux, Amaya J. Fox, David R. Lightfoot, Jason Tangen, Jason B. Mattingley, Shane E. Ehrhardt

Abstract: Artificial intelligence systems are transforming scientific discovery by accelerating specific research tasks, from protein structure prediction to materials design, yet remain confined to narrow domains requiring substantial human oversight. The exponential growth of scientific literature and increasing domain specialisation constrain researchers' capacity to synthesise knowledge across disciplines and develop unifying theories, motivating exploration of more general-purpose AI systems for science. Here we show that a domain-agnostic, agentic AI system can independently navigate the scientific workflow - from hypothesis generation through data collection to manuscript preparation. The system autonomously designed and executed three psychological studies on visual working memory, mental rotation, and imagery vividness, executed one new online data collection with 288 participants, developed analysis pipelines through 8-hour+ continuous coding sessions, and produced completed manuscripts. The results demonstrate the capability of AI scientific discovery pipelines to conduct non-trivial research with theoretical reasoning and methodological rigour comparable to experienced researchers, though with limitations in conceptual nuance and theoretical interpretation. This is a step toward embodied AI that can test hypotheses through real-world experiments, accelerating discovery by autonomously exploring regions of scientific space that human cognitive and resource constraints might otherwise leave unexplored. It raises important questions about the nature of scientific understanding and the attribution of scientific credit.

Comment: The paper discusses a domain-agnostic AI system capable of autonomously conducting scientific research, which is a significant step towards AI for Science. It introduces a new paradigm for AI systems in scientific discovery.

Relevance: 9 Novelty: 9

3. The DeepLog Neurosymbolic Machine

ArXiv ID: 2508.13697

Authors: Vincent Derkinderen, Robin Manhaeve, Rik Adriaensen, Lucas Van Praet, Lennert De Smet, Giuseppe Marra, Luc De Raedt

Abstract: We contribute a theoretical and operational framework for neurosymbolic AI called DeepLog. DeepLog introduces building blocks and primitives for neurosymbolic AI that make abstraction of commonly used representations and computational mechanisms used in neurosymbolic AI. DeepLog can represent and emulate a wide range of neurosymbolic systems. It consists of two key components. The first is the DeepLog language for specifying neurosymbolic models and inference tasks. This language consists of an annotated neural extension of grounded first-order logic, and makes abstraction of the type of logic, e.g. boolean, fuzzy or probabilistic, and whether logic is used in the architecture or in the loss function. The second DeepLog component is situated at the computational level and uses extended algebraic circuits as computational graphs. Together these two components are to be considered as a neurosymbolic abstract machine, with the DeepLog language as the intermediate level of abstraction and the circuits level as the computational one. DeepLog is implemented in software, relies on the latest insights in implementing algebraic circuits on GPUs, and is declarative in that it is easy to obtain different neurosymbolic models by making different choices for the underlying algebraic structures and logics. The generality and efficiency of the DeepLog neurosymbolic machine is demonstrated through an experimental comparison between 1) different fuzzy and probabilistic logics, 2) between using logic in the architecture or in the loss function, and 3) between a standalone CPU-based implementation of a neurosymbolic AI system and a DeepLog GPU-based one.

Comment: The paper introduces a theoretical and operational framework for neurosymbolic AI, which aligns with foundational research in representation learning and model architecture. It provides insights into how deep networks encode information through a neurosymbolic approach.

Relevance: 9 Novelty: 8

4. BLIPs: Bayesian Learned Interatomic Potentials

ArXiv ID: 2508.14022

Authors: Dario Coscia, Pim de Haan, Max Welling

Abstract: Machine Learning Interatomic Potentials (MLIPs) are becoming a central tool in simulation-based chemistry. However, like most deep learning models, MLIPs struggle to make accurate predictions on out-of-distribution data or when trained in a data-scarce regime, both common scenarios in simulation-based chemistry. Moreover, MLIPs do not provide uncertainty estimates by construction, which are fundamental to guide active learning pipelines and to ensure the accuracy of simulation results compared to quantum calculations. To address this shortcoming, we propose BLIPs: Bayesian Learned Interatomic Potentials. BLIP is a scalable, architecture-agnostic variational Bayesian framework for training or fine-tuning MLIPs, built on an adaptive version of Variational Dropout. BLIP delivers well-calibrated uncertainty estimates and minimal computational overhead for energy and forces prediction at inference time, while integrating seamlessly with (equivariant) message-passing architectures. Empirical results on simulation-based computational chemistry tasks demonstrate improved predictive accuracy with respect to standard MLIPs, and trustworthy uncertainty estimates, especially in data-scarse or heavy out-of-distribution regimes. Moreover, fine-tuning pretrained MLIPs with BLIP yields consistent performance gains and calibrated uncertainties.

Comment: The paper introduces a Bayesian framework for training interatomic potentials, which is relevant to foundational research in AI for Science, particularly in molecular modeling. It provides a novel approach to uncertainty estimation in MLIPs, which is a significant theoretical contribution.

Relevance: 9 Novelty: 8

5. GRAFT: Gradient-Aware Fast MaxVol Technique for Dynamic Data Sampling

ArXiv ID: 2508.13653

Authors: Ashish Jha, Anh huy Phan, Razan Dibo, Valentin Leplat

Abstract: Training modern neural networks on large datasets is computationally and environmentally costly. We introduce GRAFT, a scalable in-training subset selection method that (i) extracts a low-rank feature representation for each batch, (ii) applies a Fast MaxVol sampler to select a small, diverse subset that spans the batch's dominant subspace, and (iii) dynamically adjusts the subset size using a gradient-approximation criterion. By operating in low-rank subspaces and training on carefully chosen examples instead of full batches, GRAFT preserves the training trajectory while reducing wall-clock time, energy consumption, and $\mathrm{CO}_2$ emissions. Across multiple benchmarks, GRAFT matches or exceeds recent selection baselines in both accuracy and efficiency, providing a favorable trade-off between accuracy, efficiency, and emissions.

Comment: The paper presents a method for dynamic data sampling using low-rank feature representations, which is relevant to model compression and efficiency. It introduces a novel technique for subset selection that reduces computational costs, aligning with foundational research in model efficiency.

Relevance: 9 Novelty: 8

6. The Rise of Generative AI for Metal-Organic Framework Design and Synthesis

ArXiv ID: 2508.13197

Authors: Chenru Duan, Aditya Nandy, Shyam Chand Pal, Xin Yang, Wenhao Gao, Yuanqi Du, Hendrik Kra{\ss}, Yeonghun Kang, Varinia Bernales, Zuyang Ye, Tristan Pyle, Ray Yang, Zeqi Gu, Philippe Schwaller, Shengqian Ma, Shijing Sun, Al\'an Aspuru-Guzik, Seyed Mohamad Moosavi, Robert Wexler, Zhiling Zheng

Abstract: Advances in generative artificial intelligence are transforming how metal-organic frameworks (MOFs) are designed and discovered. This Perspective introduces the shift from laborious enumeration of MOF candidates to generative approaches that can autonomously propose and synthesize in the laboratory new porous reticular structures on demand. We outline the progress of employing deep learning models, such as variational autoencoders, diffusion models, and large language model-based agents, that are fueled by the growing amount of available data from the MOF community and suggest novel crystalline materials designs. These generative tools can be combined with high-throughput computational screening and even automated experiments to form accelerated, closed-loop discovery pipelines. The result is a new paradigm for reticular chemistry in which AI algorithms more efficiently direct the search for high-performance MOF materials for clean air and energy applications. Finally, we highlight remaining challenges such as synthetic feasibility, dataset diversity, and the need for further integration of domain knowledge.

Comment: The paper discusses generative AI for designing metal-organic frameworks, focusing on foundational research in molecular modeling and new generative paradigms.

Relevance: 9 Novelty: 8

7. Accelerating LLM Inference via Dynamic KV Cache Placement in Heterogeneous Memory System

ArXiv ID: 2508.13231

Authors: Yunhua Fang, Rui Xie, Asad Ul Haq, Linsen Ma, Kaoutar El Maghraoui, Naigang Wang, Meng Wang, Liu Liu, Tong Zhang

Abstract: Large Language Model (LLM) inference is increasingly constrained by memory bandwidth, with frequent access to the key-value (KV) cache dominating data movement. While attention sparsity reduces some memory traffic, the relevance of past tokens varies over time, requiring the full KV cache to remain accessible and sustaining pressure on both bandwidth and capacity. With advances in interconnects such as NVLink and LPDDR5X, modern AI hardware now integrates high-bandwidth memory (HBM) with high-speed off-package DRAM, making heterogeneous memory systems a practical solution. This work investigates dynamic KV cache placement across such systems to maximize aggregated bandwidth utilization under capacity constraints. Rather than proposing a specific scheduling policy, we formulate the placement problem mathematically and derive a theoretical upper bound, revealing substantial headroom for runtime optimization. To our knowledge, this is the first formal treatment of dynamic KV cache scheduling in heterogeneous memory systems for LLM inference.

Comment: The paper addresses model compression through dynamic KV cache placement in heterogeneous memory systems, which is relevant to efficiency breakthroughs in LLM inference.

Relevance: 9 Novelty: 7

8. A Generalized Learning Framework for Self-Supervised Contrastive Learning

ArXiv ID: 2508.13596

Authors: Lingyu Si, Jingyao Wang, Wenwen Qiang

Abstract: Self-supervised contrastive learning (SSCL) has recently demonstrated superiority in multiple downstream tasks. In this paper, we generalize the standard SSCL methods to a Generalized Learning Framework (GLF) consisting of two parts: the aligning part and the constraining part. We analyze three existing SSCL methods: BYOL, Barlow Twins, and SwAV, and show that they can be unified under GLF with different choices of the constraining part. We further propose empirical and theoretical analyses providing two insights into designing the constraining part of GLF: intra-class compactness and inter-class separability, which measure how well the feature space preserves the class information of the inputs. However, since SSCL can not use labels, it is challenging to design a constraining part that satisfies these properties. To address this issue, we consider inducing intra-class compactness and inter-class separability by iteratively capturing the dynamic relationship between anchor and other samples and propose a plug-and-play method called Adaptive Distribution Calibration (ADC) to ensure that samples that are near or far from the anchor point in the original input space are closer or further away from the anchor point in the feature space. Both the theoretical analysis and the empirical evaluation demonstrate the superiority of ADC.

Comment: The paper proposes a generalized framework for self-supervised contrastive learning, providing theoretical insights into representation learning.

Relevance: 9 Novelty: 7

9. One Shot vs. Iterative: Rethinking Pruning Strategies for Model Compression

ArXiv ID: 2508.13836

Authors: Miko{\l}aj Janusz, Tomasz Wojnar, Yawei Li, Luca Benini, Kamil Adamczewski

Abstract: Pruning is a core technique for compressing neural networks to improve computational efficiency. This process is typically approached in two ways: one-shot pruning, which involves a single pass of training and pruning, and iterative pruning, where pruning is performed over multiple cycles for potentially finer network refinement. Although iterative pruning has historically seen broader adoption, this preference is often assumed rather than rigorously tested. Our study presents one of the first systematic and comprehensive comparisons of these methods, providing rigorous definitions, benchmarking both across structured and unstructured settings, and applying different pruning criteria and modalities. We find that each method has specific advantages: one-shot pruning proves more effective at lower pruning ratios, while iterative pruning performs better at higher ratios. Building on these findings, we advocate for patience-based pruning and introduce a hybrid approach that can outperform traditional methods in certain scenarios, providing valuable insights for practitioners selecting a pruning strategy tailored to their goals and constraints. Source code is available at https://github.com/janumiko/pruning-benchmark.

Comment: The paper provides a comprehensive comparison of pruning strategies, which is relevant to model compression. It introduces a hybrid pruning approach, contributing to the understanding of pruning strategies.

Relevance: 9 Novelty: 7

10. DyMixOp: Guiding Neural Operator Design for PDEs from a Complex Dynamics Perspective with Local-Global-Mixing

ArXiv ID: 2508.13490

Authors: Pengyu Lai, Yixiao Chen, Hui Xu

Abstract: A primary challenge in using neural networks to approximate nonlinear dynamical systems governed by partial differential equations (PDEs) is transforming these systems into a suitable format, especially when dealing with non-linearizable dynamics or the need for infinite-dimensional spaces for linearization. This paper introduces DyMixOp, a novel neural operator framework for PDEs that integrates insights from complex dynamical systems to address this challenge. Grounded in inertial manifold theory, DyMixOp transforms infinite-dimensional nonlinear PDE dynamics into a finite-dimensional latent space, establishing a structured foundation that maintains essential nonlinear interactions and enhances physical interpretability. A key innovation is the Local-Global-Mixing (LGM) transformation, inspired by convection dynamics in turbulence. This transformation effectively captures both fine-scale details and nonlinear interactions, while mitigating spectral bias commonly found in existing neural operators. The framework is further strengthened by a dynamics-informed architecture that connects multiple LGM layers to approximate linear and nonlinear dynamics, reflecting the temporal evolution of dynamical systems. Experimental results across diverse PDE benchmarks demonstrate that DyMixOp achieves state-of-the-art performance, significantly reducing prediction errors, particularly in convection-dominated scenarios reaching up to 86.7\%, while maintaining computational efficiency and scalability.

Comment: The paper introduces a novel neural operator framework for PDEs, focusing on transforming infinite-dimensional nonlinear PDE dynamics into a finite-dimensional latent space. This aligns with the criteria of model architecture innovations and representation learning.

Relevance: 8 Novelty: 8

11. Approximate Bayesian Inference via Bitstring Representations

ArXiv ID: 2508.13598

Authors: Aleksanteri Sladek, Martin Trapp, Arno Solin

Abstract: The machine learning community has recently put effort into quantized or low-precision arithmetics to scale large models. This paper proposes performing probabilistic inference in the quantized, discrete parameter space created by these representations, effectively enabling us to learn a continuous distribution using discrete parameters. We consider both 2D densities and quantized neural networks, where we introduce a tractable learning approach using probabilistic circuits. This method offers a scalable solution to manage complex distributions and provides clear insights into model behavior. We validate our approach with various models, demonstrating inference efficiency without sacrificing accuracy. This work advances scalable, interpretable machine learning by utilizing discrete approximations for probabilistic computations.

Comment: The paper introduces a novel approach to approximate Bayesian inference using quantized representations, relevant to model compression and efficiency.

Relevance: 8 Novelty: 8

12. Uncovering Emergent Physics Representations Learned In-Context by Large Language Models

ArXiv ID: 2508.12448

Authors: Yeongwoo Song, Jaeyong Bae, Dong-Kyum Kim, Hawoong Jeong

Abstract: Large language models (LLMs) exhibit impressive in-context learning (ICL) abilities, enabling them to solve wide range of tasks via textual prompts alone. As these capabilities advance, the range of applicable domains continues to expand significantly. However, identifying the precise mechanisms or internal structures within LLMs that allow successful ICL across diverse, distinct classes of tasks remains elusive. Physics-based tasks offer a promising testbed for probing this challenge. Unlike synthetic sequences such as basic arithmetic or symbolic equations, physical systems provide experimentally controllable, real-world data based on structured dynamics grounded in fundamental principles. This makes them particularly suitable for studying the emergent reasoning behaviors of LLMs in a realistic yet tractable setting. Here, we mechanistically investigate the ICL ability of LLMs, especially focusing on their ability to reason about physics. Using a dynamics forecasting task in physical systems as a proxy, we evaluate whether LLMs can learn physics in context. We first show that the performance of dynamics forecasting in context improves with longer input contexts. To uncover how such capability emerges in LLMs, we analyze the model's residual stream activations using sparse autoencoders (SAEs). Our experiments reveal that the features captured by SAEs correlate with key physical variables, such as energy. These findings demonstrate that meaningful physical concepts are encoded within LLMs during in-context learning. In sum, our work provides a novel case study that broadens our understanding of how LLMs learn in context.

Comment: The paper investigates the in-context learning abilities of LLMs using physics-based tasks, providing insights into how LLMs encode information, which is relevant to representation learning.

Relevance: 8 Novelty: 7

13. QuickMerge++: Fast Token Merging with Autoregressive Prior

ArXiv ID: 2508.13204

Authors: Dong Liu, Yanxuan Yu

Abstract: As generative models scale to larger inputs across language, vision, and video domains, the cost of token-level computation has become a key bottleneck. While prior work suggests that only a subset of tokens significantly influence downstream predictions, most token selection methods are static, modality-specific, or incompatible with autoregressive generation. In this paper, we propose QuickMerge, a lightweight token merging framework designed for efficient next-token prediction. QuickMerge dynamically selects a reduced number of tokens based on attention norm magnitude, guided by an entropy-based budget estimator. To preserve autoregressive compatibility, we introduce a lightweight transformer prior trained over the merged token sequence. By combining semantic salience estimation, flexible token budgets, and AR alignment, QuickMerge enables accurate generation with fewer tokens. We evaluate QuickMerge across multi-modality domains, demonstrating consistent improvements in compute-accuracy tradeoffs. Specifically, QuickMerge reduces token counts sustantially while matching as well as exceeding the performance of learned tokenizers and fixed-patch baselines.

Comment: The paper proposes a token merging framework for efficient next-token prediction, which relates to model compression and efficiency. It introduces a novel approach to reduce token counts while maintaining performance.

Relevance: 8 Novelty: 7

14. Chunks as Arms: Multi-Armed Bandit-Guided Sampling for Long-Context LLM Preference Optimization

ArXiv ID: 2508.13993

Authors: Shaohua Duan, Xinze Li, Zhenghao Liu, Xiaoyuan Yi, Yukun Yan, Shuo Wang, Yu Gu, Ge Yu, Maosong Sun

Abstract: Long-context modeling is critical for a wide range of real-world tasks, including long-context question answering, summarization, and complex reasoning tasks. Recent studies have explored fine-tuning Large Language Models (LLMs) with synthetic data to enhance their long-context capabilities. However, the effectiveness of such approaches is often limited by the low diversity and factual inconsistencies in the generated data. To address these challenges, we propose LongMab-PO, a novel framework that leverages a Multi-Armed Bandit (MAB) rollout strategy to identify the most informative chunks from the given long context for sampling high-quality and diverse responses and constructing preference data pairs for Direct Preference Optimization (DPO) training. Specifically, we treat context chunks as arms of MAB, select chunks based on their expected reward scores to input into LLMs to generate responses, and iteratively update these scores based on reward feedback. This exploration and exploitation process enables the model to focus on the most relevant context segments, thereby generating and collecting high-quality and diverse responses. Finally, we collect these generated responses from the rollout process and apply the DPO method to further optimize the LLM. Experimental results show that LongMab-PO significantly improves the diversity and quality of preference data pairs, achieving state-of-the-art performance on long-context reasoning benchmarks. All code and data will be released on https://github.com/NEUIR/LongMab-PO.

Comment: The paper proposes a novel framework for optimizing LLMs using a Multi-Armed Bandit strategy, which is relevant to foundational research in LLMs. It introduces a new method for improving long-context reasoning, contributing to theoretical advancements in LLM behavior.

Relevance: 8 Novelty: 7

Paper Selection Prompt

System Prompt

You are a helpful paper reading assistant whose job is to read daily posts from ArXiv and identify a few papers that your friend will enjoy reading. Your job is to carefully read the paper titles and abstracts below and find the ones that match the criteria below.

User Prompt

Instructions

Write the response in JSONL format with {ARXIVID, COMMENT, RELEVANCE, NOVELTY} on each line, one for each paper.

ARXIVID: should be the ArXiv ID.

COMMENT: should identify whether there is a criteria that match the paper very closely. These matches should not be based on general terms like "language modeling" or "advancements" and should specifically refer to a criterion. No need to mention the non-matching criteria.

RELEVANCE: should be a score from 1-10.

NOVELTY: should be a score from 1-10.

Scoring Criteria

The "Relevance" score measures how closely the paper aligns with the core topics of the prompt. The "Novelty" score assesses the originality and impact of the paper. They are two ORTHONORMAL axes and SHOULD NOT be confused with each other.

Relevance Scoring

Relevance 9-10 (Completely Relevant)

Focus: Fully aligned with core topics with no deviation, score the highest if contains relevant keywords in it.

Examples: Papers focused on foundational methods or theoretical research, whose titles contain topic keywords like "MoE".

Relevance 7-8 (Relevant)

Focus: Retain a solid link to the main research area, though may touch on peripheral elements.

Examples: Papers research on the fundamental part of MoE through a less critical aspect like its behavior in GNN.

Relevance 5-6 (Borderline)

Focus: Maintains a link to the core topic but also extends into at least one other domain/area beyond the primary focus.

Examples: Work referencing MoE centered on reinforcement learning.

Relevance 3-4 (Irrelevant)

Focus: Largely outside our interests with no association to our topics.

Examples: Application-focused papers like using MoE to solve a problem in the real world.

Relevance 1-2 (Ignore)

Focus: Purely unrelated to our topics. Completely a different domain.

Exception: If the paper hints at a cutting-edge, radically new direction that could eventually transform the primary domain, consider a score of 9–10 despite initial appearances. (Usually a very rare concept that belongs to the fundamental research)

Novelty Scoring

Novelty 9-10 (Breakthrough)

Definition: Groundbreaking methods/theory introducing new directions or solving major challenges.

Examples: Entirely new paradigm for foundational models; a novel theory transforming representation learning.

Novelty 7-8 (Improvements)

Definition: Substantial insights/enhancements, though not a full paradigm shift.

Examples: Modifications on existing methods yielding significantly better results.

Novelty 5-6 (Borderline)

Definition: Incremental contributions with possible long-term benefits, not immediately transformative.

Examples: Moderately novel extension to an existing architecture; refining current methods without fundamentally altering them.

Novelty 3-4 (Tangential)

Definition: Minor or domain-specific improvements with limited broader impact.

Examples: Slight modifications to known methods with strange motivation; purely engineering jobs like a new benchmark/dataset.

Novelty 1-2 (Low)

Definition: Minimal originality, applying standard approaches without real innovation.

Examples: Using an off-the-shelf model without adding new insights; purely application-driven studies like finetuning a pretrained model using existing methods.

Papers

[PAPER LIST HERE]

Relevant Topics

Use the following relevance criteria to focus on foundational research. Keep relevant papers and filter out irrelevant ones. Avoid purely application-driven work.

Representation Learning - Relevant: Insights into how deep networks encode information, feature/dictionary learning, sparse/contrastive methods, training dynamics in neural networks. - Irrelevant: Standard applications of known techniques lacking new theoretical or methodological contributions.

Model Architecture - Relevant: Mixture-of-Experts (MoE), Transformers, Conditional/Dynamic Networks, Autoencoders, analysis on existing architectures (like encoder-decoder), or other architectural innovations. - Irrelevant: Merely using existing architectures for a certain task without insights into the structure themselves.

Model Compression - Relevant: Sparsity, pruning, quantization, low-rank approaches, KV cache, or other algorithmic/theoretical efficiency breakthroughs. - Irrelevant: Straightforward applications of existing compression methods to new tasks.

Large Language Models (LLMs) - Relevant: Major breakthroughs in pretraining or architecture, theoretical insights into LLM behavior/interpretability. - Irrelevant: Domain-specific usage (e.g., translation, jail-breaking), finetuning or inference tricks (e.g., instruction tuning, chain-of-thoughts, data mixing), or empirical dataset/benchmark studies and text-level analysis (e.g. hallucination, reasoning, safety).

AI for Science - Relevant: Foundational research in molecular/protein modeling, new generative paradigms, or significant architecture-level innovations. - Irrelevant: Conventional, domain-specific applications without new theoretical perspectives.

Emerging Trends - Relevant: Cutting-edge theoretical work challenging established assumptions or introducing broad new paradigms. - Irrelevant: Incremental improvements or trend-following without novel insights.

Keywords:

Relevant: Mixture of Experts (MoE), Representation Learning, Compression/Efficiency, Sparse/Sparsity, Pruning, Quantization, Low-rank, Foundation Model, etc.

Irrelevant: Reinforcement Learning, Transfer Learning, Federated Learning, Online Learning, Diffusion Models, etc.

Application: Image Segmentation, Medical Imaging, 3D Vision, Video Understanding, Information Retrieval, Summarization, Recommendation Systems, Machine Translation, Speech Recognition, Signal Processing, Spatial/Temporal Modeling, Time Series, Knowledge Graph, etc.