Personalized Daily Arxiv Papers 4/03/2025
| [gpt-4o] | Prompt | Completion | Total |
|---|---|---|---|
| Token | 23735 | 2804 | 26539 |
| Cost | $0.06 | $0.03 | $0.09 |
Total arXiv papers: 379
Total scanned papers: 225
Total relevant papers: 13
Table of contents with paper titles:
-
AI-Newton: A Concept-Driven Physical Law Discovery System without Prior Physical Knowledge Authors: You-Le Fang, Dong-Shan Jian, Xiang Li, Yan-Qing Ma
-
Estimating Unbounded Density Ratios: Applications in Error Control under Covariate Shift Authors: Shuntuo Xu, Zhou Yu, Jian Huang
-
Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design Authors: Mohan Zhang, Pingzhi Li, Jie Peng, Mufan Qiu, Tianlong Chen
-
InfiniteICL: Breaking the Limit of Context Window Size via Long Short-term Memory Transformation Authors: Bowen Cao, Deng Cai, Wai Lam
-
Sparse Gaussian Neural Processes Authors: Tommy Rochussen, Vincent Fortuin
-
Is the Reversal Curse a Binding Problem? Uncovering Limitations of Transformers from a Basic Generalization Failure Authors: Boshi Wang, Huan Sun
-
Critical Thinking: Which Kinds of Complexity Govern Optimal Reasoning Length? Authors: Celine Lee, Alexander M. Rush, Keyon Vafa
-
A Unified Approach to Analysis and Design of Denoising Markov Models Authors: Yinuo Ren, Grant M. Rotskoff, Lexing Ying
-
Denoising guarantees for optimized sampling schemes in compressed sensing Authors: Yaniv Plan, Matthew S. Scott, Xia Sheng, Ozgur Yilmaz
-
FLAMES: A Hybrid Spiking-State Space Model for Adaptive Memory Retention in Event-Based Learning Authors: Biswadeep Chakraborty, Saibal Mukhopadhyay
-
R2DN: Scalable Parameterization of Contracting and Lipschitz Recurrent Deep Networks Authors: Nicholas H. Barbara, Ruigang Wang, Ian R. Manchester
-
Analysis of an Idealized Stochastic Polyak Method and its Application to Black-Box Model Distillation Authors: Robert M. Gower, Guillaume Garrigos, Nicolas Loizou, Dimitris Oikonomou, Konstantin Mishchenko, Fabian Schaipp
-
Hessian-aware Training for Enhancing DNNs Resilience to Parameter Corruptions Authors: Tahmid Hasan Prato, Seijoon Kim, Lizhong Chen, Sanghyun Hong
1. AI-Newton: A Concept-Driven Physical Law Discovery System without Prior Physical Knowledge
ArXiv ID: 2504.01538
Authors: You-Le Fang, Dong-Shan Jian, Xiang Li, Yan-Qing Ma
Abstract: Current limitations in human scientific discovery necessitate a new research paradigm. While advances in artificial intelligence (AI) offer a highly promising solution, enabling AI to emulate human-like scientific discovery remains an open challenge. To address this, we propose AI-Newton, a concept-driven discovery system capable of autonomously deriving physical laws from raw data -- without supervision or prior physical knowledge. The system integrates a knowledge base and knowledge representation centered on physical concepts, along with an autonomous discovery workflow. As a proof of concept, we apply AI-Newton to a large set of Newtonian mechanics problems. Given experimental data with noise, the system successfully rediscovers fundamental laws, including Newton's second law, energy conservation and law of gravitation, using autonomously defined concepts. This achievement marks a significant step toward AI-driven autonomous scientific discovery.
Comment: AI-Newton represents a novel paradigm for autonomous scientific discovery, which aligns with the 'AI for Science' criterion and introduces a concept-driven approach to deriving physical laws.
Relevance: 9 Novelty: 9
2. Estimating Unbounded Density Ratios: Applications in Error Control under Covariate Shift
ArXiv ID: 2504.01031
Authors: Shuntuo Xu, Zhou Yu, Jian Huang
Abstract: The density ratio is an important metric for evaluating the relative likelihood of two probability distributions, with extensive applications in statistics and machine learning. However, existing estimation theories for density ratios often depend on stringent regularity conditions, mainly focusing on density ratio functions with bounded domains and ranges. In this paper, we study density ratio estimators using loss functions based on least squares and logistic regression. We establish upper bounds on estimation errors with standard minimax optimal rates, up to logarithmic factors. Our results accommodate density ratio functions with unbounded domains and ranges. We apply our results to nonparametric regression and conditional flow models under covariate shift and identify the tail properties of the density ratio as crucial for error control across domains affected by covariate shift. We provide sufficient conditions under which loss correction is unnecessary and demonstrate effective generalization capabilities of a source estimator to any suitable target domain. Our simulation experiments support these theoretical findings, indicating that the source estimator can outperform those derived from loss correction methods, even when the true density ratio is known.
Comment: The paper addresses density ratio estimation under relaxed conditions, which is foundational for representation learning and generalization under covariate shift. The theoretical contributions are significant and align with the criteria for foundational research.
Relevance: 9 Novelty: 8
3. Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design
ArXiv ID: 2504.01337
Authors: Mohan Zhang, Pingzhi Li, Jie Peng, Mufan Qiu, Tianlong Chen
Abstract: Mixture-of-Experts (MoE) has successfully scaled up models while maintaining nearly constant computing costs. By employing a gating network to route input tokens, it selectively activates a subset of expert networks to process the corresponding token embeddings. However, in practice, the efficiency of MoE is challenging to achieve due to two key reasons: imbalanced expert activation, which leads to substantial idle time during model or expert parallelism, and insufficient capacity utilization; massive communication overhead, induced by numerous expert routing combinations in expert parallelism at the system level. Previous works typically formulate it as the load imbalance issue characterized by the gating network favoring certain experts over others or attribute it to static execution which fails to adapt to the dynamic expert workload at runtime. In this paper, we exploit it from a brand new perspective, a higher-order view and analysis of MoE routing policies: expert collaboration and specialization where some experts tend to activate broadly with others (collaborative), while others are more likely to activate only with a specific subset of experts (specialized). Our experiments reveal that most experts tend to be overly collaborative, leading to increased communication overhead from repeatedly sending tokens to different accelerators. To this end, we propose a novel collaboration-constrained routing (C2R) strategy to encourage more specialized expert groups, as well as to improve expert utilization, and present an efficient implementation of MoE that further leverages expert specialization. We achieve an average performance improvement of 0.51% and 0.33% on LLaMA-MoE and Qwen-MoE respectively across ten downstream NLP benchmarks, and reduce the all2all communication costs between GPUs, bringing an extra 20%-30% total running time savings on top of the existing SoTA, i.e. MegaBlocks.
Comment: This paper addresses efficiency challenges in Mixture-of-Experts (MoE) models by introducing a novel collaboration-constrained routing (C2R) strategy. It provides insights into MoE routing policies and improves expert utilization, aligning well with foundational research in model architecture and efficiency.
Relevance: 9 Novelty: 8
4. InfiniteICL: Breaking the Limit of Context Window Size via Long Short-term Memory Transformation
ArXiv ID: 2504.01707
Authors: Bowen Cao, Deng Cai, Wai Lam
Abstract: In-context learning (ICL) is critical for large language models (LLMs), but its effectiveness is constrained by finite context windows, particularly in ultra-long contexts. To overcome this, we introduce InfiniteICL, a framework that parallels context and parameters in LLMs with short- and long-term memory in human cognitive systems, focusing on transforming temporary context knowledge into permanent parameter updates. This approach significantly reduces memory usage, maintains robust performance across varying input lengths, and theoretically enables infinite context integration through the principles of context knowledge elicitation, selection, and consolidation. Evaluations demonstrate that our method reduces context length by 90% while achieving 103% average performance of full-context prompting across fact recall, grounded reasoning, and skill acquisition tasks. When conducting sequential multi-turn transformations on complex, real-world contexts (with length up to 2M tokens), our approach surpasses full-context prompting while using only 0.4% of the original contexts. These findings highlight InfiniteICL's potential to enhance the scalability and efficiency of LLMs by breaking the limitations of conventional context window sizes.
Comment: The paper introduces InfiniteICL, a framework for extending context window size in LLMs by transforming context knowledge into parameter updates. This is highly relevant to foundational research in large language models and addresses a critical limitation in context handling.
Relevance: 9 Novelty: 8
5. Sparse Gaussian Neural Processes
ArXiv ID: 2504.01650
Authors: Tommy Rochussen, Vincent Fortuin
Abstract: Despite significant recent advances in probabilistic meta-learning, it is common for practitioners to avoid using deep learning models due to a comparative lack of interpretability. Instead, many practitioners simply use non-meta-models such as Gaussian processes with interpretable priors, and conduct the tedious procedure of training their model from scratch for each task they encounter. While this is justifiable for tasks with a limited number of data points, the cubic computational cost of exact Gaussian process inference renders this prohibitive when each task has many observations. To remedy this, we introduce a family of models that meta-learn sparse Gaussian process inference. Not only does this enable rapid prediction on new tasks with sparse Gaussian processes, but since our models have clear interpretations as members of the neural process family, it also allows manual elicitation of priors in a neural process for the first time. In meta-learning regimes for which the number of observed tasks is small or for which expert domain knowledge is available, this offers a crucial advantage.
Comment: The paper introduces a novel approach to meta-learning sparse Gaussian process inference, which aligns with representation learning and sparsity. The focus on interpretability and manual elicitation of priors adds theoretical depth.
Relevance: 9 Novelty: 8
6. Is the Reversal Curse a Binding Problem? Uncovering Limitations of Transformers from a Basic Generalization Failure
ArXiv ID: 2504.01928
Authors: Boshi Wang, Huan Sun
Abstract: Despite their impressive capabilities, LLMs exhibit a basic generalization failure known as the Reversal Curse, where they struggle to learn reversible factual associations. Understanding why this occurs could help identify weaknesses in current models and advance their generalization and robustness. In this paper, we conjecture that the Reversal Curse in LLMs is a manifestation of the long-standing binding problem in cognitive science, neuroscience and AI. Specifically, we identify two primary causes of the Reversal Curse stemming from transformers' limitations in conceptual binding: the inconsistency and entanglements of concept representations. We perform a series of experiments that support these conjectures. Our exploration leads to a model design based on JEPA (Joint-Embedding Predictive Architecture) that for the first time breaks the Reversal Curse without side-stepping it with specialized data augmentation or non-causal masking, and moreover, generalization could be further improved by incorporating special memory layers that support disentangled concept representations. We demonstrate that the skill of reversal unlocks a new kind of memory integration that enables models to solve large-scale arithmetic reasoning problems via parametric forward-chaining, outperforming frontier LLMs based on non-parametric memory and prolonged explicit reasoning.
Comment: The paper investigates the Reversal Curse in transformers and links it to the binding problem, providing insights into transformer limitations and proposing architectural improvements. This aligns with foundational research in model architecture and representation learning.
Relevance: 9 Novelty: 8
7. Critical Thinking: Which Kinds of Complexity Govern Optimal Reasoning Length?
ArXiv ID: 2504.01935
Authors: Celine Lee, Alexander M. Rush, Keyon Vafa
Abstract: Large language models (LLMs) often benefit from verbalized reasoning at inference time, but it remains unclear which aspects of task difficulty these extra reasoning tokens address. To investigate this question, we formalize a framework using deterministic finite automata (DFAs). DFAs offer a formalism through which we can characterize task complexity through measurable properties such as run length (number of reasoning steps required) and state-space size (decision complexity). We first show that across different tasks and models of different sizes and training paradigms, there exists an optimal amount of reasoning tokens such that the probability of producing a correct solution is maximized. We then investigate which properties of complexity govern this critical length: we find that task instances with longer corresponding underlying DFA runs (i.e. demand greater latent state-tracking requirements) correlate with longer reasoning lengths, but, surprisingly, that DFA size (i.e. state-space complexity) does not. We then demonstrate an implication of these findings: being able to predict the optimal number of reasoning tokens for new problems and filtering out non-optimal length answers results in consistent accuracy improvements.
Comment: This paper investigates reasoning length in LLMs using a formal framework based on deterministic finite automata (DFAs). It provides theoretical insights into LLM behavior and interpretability, which aligns with the foundational research focus on large language models.
Relevance: 9 Novelty: 8
8. A Unified Approach to Analysis and Design of Denoising Markov Models
ArXiv ID: 2504.01938
Authors: Yinuo Ren, Grant M. Rotskoff, Lexing Ying
Abstract: Probabilistic generative models based on measure transport, such as diffusion and flow-based models, are often formulated in the language of Markovian stochastic dynamics, where the choice of the underlying process impacts both algorithmic design choices and theoretical analysis. In this paper, we aim to establish a rigorous mathematical foundation for denoising Markov models, a broad class of generative models that postulate a forward process transitioning from the target distribution to a simple, easy-to-sample distribution, alongside a backward process particularly constructed to enable efficient sampling in the reverse direction. Leveraging deep connections with nonequilibrium statistical mechanics and generalized Doob's $h$-transform, we propose a minimal set of assumptions that ensure: (1) explicit construction of the backward generator, (2) a unified variational objective directly minimizing the measure transport discrepancy, and (3) adaptations of the classical score-matching approach across diverse dynamics. Our framework unifies existing formulations of continuous and discrete diffusion models, identifies the most general form of denoising Markov models under certain regularity assumptions on forward generators, and provides a systematic recipe for designing denoising Markov models driven by arbitrary L\'evy-type processes. We illustrate the versatility and practical effectiveness of our approach through novel denoising Markov models employing geometric Brownian motion and jump processes as forward dynamics, highlighting the framework's potential flexibility and capability in modeling complex distributions.
Comment: The paper provides a rigorous mathematical foundation for denoising Markov models, unifying existing generative model formulations and introducing novel design principles. This aligns with foundational research in representation learning and emerging trends.
Relevance: 8 Novelty: 9
9. Denoising guarantees for optimized sampling schemes in compressed sensing
ArXiv ID: 2504.01046
Authors: Yaniv Plan, Matthew S. Scott, Xia Sheng, Ozgur Yilmaz
Abstract: Compressed sensing with subsampled unitary matrices benefits from \emph{optimized} sampling schemes, which feature improved theoretical guarantees and empirical performance relative to uniform subsampling. We provide, in a first of its kind in compressed sensing, theoretical guarantees showing that the error caused by the measurement noise vanishes with an increasing number of measurements for optimized sampling schemes, assuming that the noise is Gaussian. We moreover provide similar guarantees for measurements sampled with-replacement with arbitrary probability weights. All our results hold on prior sets contained in a union of low-dimensional subspaces. Finally, we demonstrate that this denoising behavior appears in empirical experiments with a rate that closely matches our theoretical guarantees when the prior set is the range of a generative ReLU neural network and when it is the set of sparse vectors.
Comment: This paper provides theoretical guarantees for optimized sampling schemes in compressed sensing, which aligns with foundational research in model compression and efficiency. The focus on denoising guarantees and generative priors adds a novel theoretical perspective.
Relevance: 8 Novelty: 8
10. FLAMES: A Hybrid Spiking-State Space Model for Adaptive Memory Retention in Event-Based Learning
ArXiv ID: 2504.01257
Authors: Biswadeep Chakraborty, Saibal Mukhopadhyay
Abstract: We propose \textbf{FLAMES (Fast Long-range Adaptive Memory for Event-based Systems)}, a novel hybrid framework integrating structured state-space dynamics with event-driven computation. At its core, the \textit{Spike-Aware HiPPO (SA-HiPPO) mechanism} dynamically adjusts memory retention based on inter-spike intervals, preserving both short- and long-range dependencies. To maintain computational efficiency, we introduce a normal-plus-low-rank (NPLR) decomposition, reducing complexity from $\mathcal{O}(N^2)$ to $\mathcal{O}(Nr)$. FLAMES achieves state-of-the-art results on the Long Range Arena benchmark and event datasets like HAR-DVS and Celex-HAR. By bridging neuromorphic computing and structured sequence modeling, FLAMES enables scalable long-range reasoning in event-driven systems.
Comment: The paper introduces FLAMES, a novel hybrid framework combining structured state-space dynamics with event-driven computation. The use of a normal-plus-low-rank (NPLR) decomposition for efficiency aligns with the model compression criterion, and the Spike-Aware HiPPO mechanism offers insights into representation learning through memory retention dynamics.
Relevance: 8 Novelty: 8
11. R2DN: Scalable Parameterization of Contracting and Lipschitz Recurrent Deep Networks
ArXiv ID: 2504.01250
Authors: Nicholas H. Barbara, Ruigang Wang, Ian R. Manchester
Abstract: This paper presents the Robust Recurrent Deep Network (R2DN), a scalable parameterization of robust recurrent neural networks for machine learning and data-driven control. We construct R2DNs as a feedback interconnection of a linear time-invariant system and a 1-Lipschitz deep feedforward network, and directly parameterize the weights so that our models are stable (contracting) and robust to small input perturbations (Lipschitz) by design. Our parameterization uses a structure similar to the previously-proposed recurrent equilibrium networks (RENs), but without the requirement to iteratively solve an equilibrium layer at each time-step. This speeds up model evaluation and backpropagation on GPUs, and makes it computationally feasible to scale up the network size, batch size, and input sequence length in comparison to RENs. We compare R2DNs to RENs on three representative problems in nonlinear system identification, observer design, and learning-based feedback control and find that training and inference are both up to an order of magnitude faster with similar test set performance, and that training/inference times scale more favorably with respect to model expressivity.
Comment: The paper introduces a novel parameterization for recurrent deep networks (R2DN) that ensures stability and robustness by design, aligning with foundational research in model architecture. The focus on computational efficiency and scalability also adds to its relevance.
Relevance: 8 Novelty: 7
12. Analysis of an Idealized Stochastic Polyak Method and its Application to Black-Box Model Distillation
ArXiv ID: 2504.01898
Authors: Robert M. Gower, Guillaume Garrigos, Nicolas Loizou, Dimitris Oikonomou, Konstantin Mishchenko, Fabian Schaipp
Abstract: We provide a general convergence theorem of an idealized stochastic Polyak step size called SPS$^$. Besides convexity, we only assume a local expected gradient bound, that includes locally smooth and locally Lipschitz losses as special cases. We refer to SPS$^$ as idealized because it requires access to the loss for every training batch evaluated at a solution. It is also ideal, in that it achieves the optimal lower bound for globally Lipschitz function, and is the first Polyak step size to have an $O(1/\sqrt{t})$ anytime convergence in the smooth setting. We show how to combine SPS$^*$ with momentum to achieve the same favorable rates for the last iterate. We conclude with several experiments to validate our theory, and a more practical setting showing how we can distill a teacher GPT-2 model into a smaller student model without any hyperparameter tuning.
Comment: The paper provides theoretical insights into an idealized stochastic Polyak step size and its convergence properties, which aligns with foundational research in optimization methods relevant to training dynamics.
Relevance: 8 Novelty: 7
13. Hessian-aware Training for Enhancing DNNs Resilience to Parameter Corruptions
ArXiv ID: 2504.01933
Authors: Tahmid Hasan Prato, Seijoon Kim, Lizhong Chen, Sanghyun Hong
Abstract: Deep neural networks are not resilient to parameter corruptions: even a single-bitwise error in their parameters in memory can cause an accuracy drop of over 10%, and in the worst cases, up to 99%. This susceptibility poses great challenges in deploying models on computing platforms, where adversaries can induce bit-flips through software or bitwise corruptions may occur naturally. Most prior work addresses this issue with hardware or system-level approaches, such as integrating additional hardware components to verify a model's integrity at inference. However, these methods have not been widely deployed as they require infrastructure or platform-wide modifications. In this paper, we propose a new approach to addressing this issue: training models to be more resilient to bitwise corruptions to their parameters. Our approach, Hessian-aware training, promotes models with $flatter$ loss surfaces. We show that, while there have been training methods, designed to improve generalization through Hessian-based approaches, they do not enhance resilience to parameter corruptions. In contrast, models trained with our method demonstrate increased resilience to parameter corruptions, particularly with a 20$-$50% reduction in the number of bits whose individual flipping leads to a 90$-$100% accuracy drop. Moreover, we show the synergy between ours and existing hardware and system-level defenses.
Comment: The paper proposes a Hessian-aware training method to enhance resilience to parameter corruptions, which aligns with foundational research in model robustness and training dynamics. The focus on loss surface properties and resilience is relevant to representation learning and model efficiency.
Relevance: 8 Novelty: 7
Paper Selection Prompt
You are a helpful paper reading assistant whose job is to read daily posts from ArXiv and identify a few papers that your friend will enjoy reading. Your job is to carefully read the paper titles and abstracts below and find the ones that match the criteria below.
Instructions
Write the response in JSONL format with {ARXIVID, COMMENT, RELEVANCE, NOVELTY} on each line, one for each paper.
- ARXIVID: should be the ArXiv ID.
- COMMENT: should identify whether there is a criteria that match the paper very closely. These matches should not be based on general terms like "language modeling" or "advancements" and should specifically refer to a criterion. No need to mention the non-matching criteria.
- RELEVANCE: should be a score from 1-10.
- NOVELTY: should be a score from 1-10.
Scoring Criteria
The "Relevance" score measures how closely the paper aligns with the core topics of the prompt. The "Novelty" score assesses the originality and impact of the paper. They are two ORTHONORMAL axes and SHOULD NOT be confused with each other.
Relevance Scoring
- Relevance 9-10 (Completely Relevant)
- Focus: Fully aligned with core topics with no deviation, score the highest if contains relevant keywords in it.
-
Examples: Papers focused on foundational methods or theoretical research, whose titles contain topic keywords like "MoE".
-
Relevance 7-8 (Relevant)
- Focus: Retain a solid link to the main research area, though may touch on peripheral elements.
-
Examples: Papers research on the fundamental part of MoE through a less critical aspect like its behavior in GNN.
-
Relevance 5-6 (Borderline)
- Focus: Maintains a link to the core topic but also extends into at least one other domain/area beyond the primary focus.
-
Examples: Work referencing MoE centered on reinforcement learning.
-
Relevance 3-4 (Irrelevant)
- Focus: Largely outside our interests with no association to our topics.
-
Examples: Application-focused papers like using MoE to solve a problem in the real world.
-
Relevance 1-2 (Ignore)
- Focus: Purely unrelated to our topics. Completely a different domain.
- Exception: If the paper hints at a cutting-edge, radically new direction that could eventually transform the primary domain, consider a score of 9–10 despite initial appearances. (Usually a very rare concept that belongs to the fundamental research)
Novelty Scoring
- Novelty 9-10 (Breakthrough)
- Definition: Groundbreaking methods/theory introducing new directions or solving major challenges.
-
Examples: Entirely new paradigm for foundational models; a novel theory transforming representation learning.
-
Novelty 7-8 (Improvements)
- Definition: Substantial insights/enhancements, though not a full paradigm shift.
-
Examples: Modifications on existing methods yielding significantly better results.
-
Novelty 5-6 (Borderline)
- Definition: Incremental contributions with possible long-term benefits, not immediately transformative.
-
Examples: Moderately novel extension to an existing architecture; refining current methods without fundamentally altering them.
-
Novelty 3-4 (Tangential)
- Definition: Minor or domain-specific improvements with limited broader impact.
-
Examples: Slight modifications to known methods with strange motivation; purely engineering jobs like a new benchmark/dataset.
-
Novelty 1-2 (Low)
- Definition: Minimal originality, applying standard approaches without real innovation.
- Examples: Using an off-the-shelf model without adding new insights; purely application-driven studies like finetuning a pretrained model using existing methods.
Papers
[PAPER LIST HERE]
Relevant Topics
Use the following relevance criteria to focus on foundational research. Keep relevant papers and filter out irrelevant ones. Avoid purely application-driven work.
-
Representation Learning - Relevant: Insights into how deep networks encode information, feature/dictionary learning, sparse/contrastive methods, training dynamics in neural networks. - Irrelevant: Standard applications of known techniques lacking new theoretical or methodological contributions.
-
Model Architecture - Relevant: Mixture-of-Experts (MoE), Transformers, Conditional/Dynamic Networks, Autoencoders, analysis on existing architectures (like encoder-decoder), or other architectural innovations. - Irrelevant: Merely using existing architectures for a certain task without insights into the structure themselves.
-
Model Compression - Relevant: Sparsity, pruning, quantization, low-rank approaches, KV cache, or other algorithmic/theoretical efficiency breakthroughs. - Irrelevant: Straightforward applications of existing compression methods to new tasks.
-
Large Language Models (LLMs) - Relevant: Major breakthroughs in pretraining or architecture, theoretical insights into LLM behavior/interpretability. - Irrelevant: Domain-specific usage (e.g., translation, jail-breaking), finetuning or inference tricks (e.g., instruction tuning, chain-of-thoughts, data mixing), or empirical dataset/benchmark studies and text-level analysis (e.g. hallucination, reasoning, safety).
-
AI for Science - Relevant: Foundational research in molecular/protein modeling, new generative paradigms, or significant architecture-level innovations. - Irrelevant: Conventional, domain-specific applications without new theoretical perspectives.
-
Emerging Trends - Relevant: Cutting-edge theoretical work challenging established assumptions or introducing broad new paradigms. - Irrelevant: Incremental improvements or trend-following without novel insights.
Keywords:
- Relevant: Mixture of Experts (MoE), Representation Learning, Compression/Efficiency, Sparse/Sparsity, Pruning, Quantization, Low-rank, Foundation Model, etc.
- Irrelevant: Reinforcement Learning, Transfer Learning, Federated Learning, Online Learning, Diffusion Models, etc.
- Application: Image Segmentation, Medical Imaging, 3D Vision, Video Understanding, Information Retrieval, Summarization, Recommendation Systems, Machine Translation, Speech Recognition, Signal Processing, Spatial/Temporal Modeling, Time Series, Knowledge Graph, etc.