Personalized Daily ArXiv Papers 2025-07-29
| [gpt-4o] | Prompt | Completion | Total |
|---|---|---|---|
| Token | 41545 | 5151 | 46696 |
| Cost | $0.1 | $0.05 | $0.16 |
Total arXiv papers: 801
Total scanned papers: 478
Total relevant papers: 30
Table of contents with paper titles:
-
Quantum-Informed Machine Learning for Chaotic Systems Authors: Maida Wang, Xiao Xue, Peter V. Coveney
-
Enhancing Large Multimodal Models with Adaptive Sparsity and KV Cache Compression Authors: Te Zhang, Yuheng Li, Junxiang Wang, Lujun Li
-
TransPrune: Token Transition Pruning for Efficient Large Vision-Language Model Authors: Ao Li, Yuxiang Duan, Jinghui Zhang, Congbo Ma, Yutong Xie, Gustavo Carneiro, Mohammad Yaqub, Hu Wang
-
Transformers as Unrolled Inference in Probabilistic Laplacian Eigenmaps: An Interpretation and Potential Improvements Authors: Aditya Ravuri, Neil D. Lawrence
-
Feature learning is decoupled from generalization in high capacity neural networks Authors: Niclas Alexander G\"oring, Charles London, Abdurrahman Hadi Erturk, Chris Mingard, Yoonsoo Nam, Ard A. Louis
-
State evolution beyond first-order methods I: Rigorous predictions and finite-sample guarantees Authors: Michael Celentano, Chen Cheng, Ashwin Pananjady, Kabir Aladin Verchand
-
HCAttention: Extreme KV Cache Compression via Heterogeneous Attention Computing for LLMs Authors: Dongquan Yang, Yifan Yang, Xiaotian Yu, Xianbiao Qi, Rong Xiao
-
MeLA: A Metacognitive LLM-Driven Architecture for Automatic Heuristic Design Authors: Zishang Qiu, Xinan Chen, Long Chen, Ruibin Bai
-
Enhancing Materials Discovery with Valence Constrained Design in Generative Modeling Authors: Mouyang Cheng, Weiliang Luo, Hao Tang, Bowen Yu, Yongqiang Cheng, Weiwei Xie, Ju Li, Heather J. Kulik, Mingda Li
-
Dimer-Enhanced Optimization: A First-Order Approach to Escaping Saddle Points in Neural Network Training Authors: Yue Hu, Zanxia Cao, Yingchao Liu
-
The wall confronting large language models Authors: Peter V. Coveney, Sauro Succi
-
WEEP: A Differentiable Nonconvex Sparse Regularizer via Weakly-Convex Envelope Authors: Takanobu Furuhashi, Hidekata Hontani, Tatsuya Yokota
-
EcoTransformer: Attention without Multiplication Authors: Xin Gao, Xingming Xu
-
Memorization in Fine-Tuned Large Language Models Authors: Danil Savine, Muni Sreenivas Pydi, Jamal Atif, Olivier Capp\'e
-
Mixture of Length and Pruning Experts for Knowledge Graphs Reasoning Authors: Enjun Du, Siyi Liu, Yongqi Zhang
-
DeepJIVE: Learning Joint and Individual Variation Explained from Multimodal Data Using Deep Learning Authors: Matthew Drexler, Benjamin Risk, James J Lah, Suprateek Kundu, Deqiang Qiu
-
Iterative Pretraining Framework for Interatomic Potentials Authors: Taoyong Cui, Zhongyao Wang, Dongzhan Zhou, Yuqiang Li, Lei Bai, Wanli Ouyang, Mao Su, Shufei Zhang
-
Bayesian symbolic regression: Automated equation discovery from a physicists' perspective Authors: Roger Guimera, Marta Sales-Pardo
-
CLoRA: Parameter-Efficient Continual Learning with Low-Rank Adaptation Authors: Shishir Muralidhara, Didier Stricker, Ren\'e Schuster
-
Sparse-mode Dynamic Mode Decomposition for Disambiguating Local and Global Structures Authors: Sara M. Ichinaga, Steven L. Brunton, Aleksandr Y. Aravkin, J. Nathan Kutz
-
Bag of Coins: A Statistical Probe into Neural Confidence Structures Authors: Agnideep Aich, Ashit Baran Aich, Md Monzur Murshed, Sameera Hewage, Bruce Wade
-
Communication-Efficient Distributed Training for Collaborative Flat Optima Recovery in Deep Learning Authors: Tolga Dimlioglu, Anna Choromanska
-
Frequency-Aware Autoregressive Modeling for Efficient High-Resolution Image Synthesis Authors: Zhuokun Chen, Jugang Fan, Zhuowei Yu, Bohan Zhuang, Mingkui Tan
-
Quantizing Text-attributed Graphs for Semantic-Structural Integration Authors: Jianyuan Bo, Hao Wu, Yuan Fang
-
Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning Authors: Zedong Wang, Siyuan Li, Dan Xu
-
Kolmogorov Arnold Network Autoencoder in Medicine Authors: Ugo Lomoio, Pierangelo Veltri, Pietro Hiram Guzzi
-
Diagonally-Weighted Generalized Method of Moments Estimation for Gaussian Mixture Modeling Authors: Liu Zhang, Oscar Mickelin, Sheng Xu, Amit Singer
-
Regularizing Subspace Redundancy of Low-Rank Adaptation Authors: Yue Zhu, Haiwen Diao, Shang Gao, Jiazuo Yu, Jiawen Zhu, Yunzhi Zhuge, Shuai Hao, Xu Jia, Lu Zhang, Ying Zhang, Huchuan Lu
-
DeltaLLM: A Training-Free Framework Exploiting Temporal Sparsity for Efficient Edge LLM Inference Authors: Jiawen Qi, Chang Gao, Zhaochun Ren, Qinyu Chen
-
Nonconvex Optimization Framework for Group-Sparse Feedback Linear-Quadratic Optimal Control. II: Non-Penalty Approach Authors: Lechen Feng, Xun Li, Yuan-Hua Ni
1. Quantum-Informed Machine Learning for Chaotic Systems
ArXiv ID: 2507.19861
Authors: Maida Wang, Xiao Xue, Peter V. Coveney
Abstract: Learning the behaviour of chaotic systems remains challenging due to instability in long-term predictions and difficulties in accurately capturing invariant statistical properties. While quantum machine learning offers a promising route to efficiently capture physical properties from high-dimensional data, its practical deployment is hindered by current hardware noise and limited scalability. We introduce a quantum-informed machine learning framework for learning partial differential equations, with an application focus on chaotic systems. A quantum circuit Born machine is employed to learn the invariant properties of chaotic dynamical systems, achieving substantial memory efficiency by representing these complex physical statistics with a compact set of trainable circuit parameters. This approach reduces the data storage requirement by over two orders of magnitude compared to the raw simulation data. The resulting statistical quantum-informed prior is then incorporated into a Koopman-based auto-regressive model to address issues such as gradient vanishing or explosion, while maintaining long-term statistical fidelity. The framework is evaluated on three representative systems: the Kuramoto-Sivashinsky equation, two-dimensional Kolmogorov flow and turbulent channel flow. In all cases, the quantum-informed model achieves superior performance compared to its classical counterparts without quantum priors. This hybrid architecture offers a practical route for learning dynamical systems using near-term quantum hardware.
Comment: The paper introduces a quantum-informed machine learning framework, which is an emerging trend in AI for science, offering a new paradigm for learning chaotic systems.
Relevance: 9 Novelty: 8
2. Enhancing Large Multimodal Models with Adaptive Sparsity and KV Cache Compression
ArXiv ID: 2507.20613
Authors: Te Zhang, Yuheng Li, Junxiang Wang, Lujun Li
Abstract: Large multimodal models (LMMs) have advanced significantly by integrating visual encoders with extensive language models, enabling robust reasoning capabilities. However, compressing LMMs for deployment on edge devices remains a critical challenge. In this work, we propose an adaptive search algorithm that optimizes sparsity and KV cache compression to enhance LMM efficiency. Utilizing the Tree-structured Parzen Estimator, our method dynamically adjusts pruning ratios and KV cache quantization bandwidth across different LMM layers, using model performance as the optimization objective. This approach uniquely combines pruning with key-value cache quantization and incorporates a fast pruning technique that eliminates the need for additional fine-tuning or weight adjustments, achieving efficient compression without compromising accuracy. Comprehensive evaluations on benchmark datasets, including LLaVA-1.5 7B and 13B, demonstrate our method superiority over state-of-the-art techniques such as SparseGPT and Wanda across various compression levels. Notably, our framework automatic allocation of KV cache compression resources sets a new standard in LMM optimization, delivering memory efficiency without sacrificing much performance.
Comment: The paper proposes an adaptive search algorithm for sparsity and KV cache compression in large multimodal models, relevant to model compression.
Relevance: 9 Novelty: 8
3. TransPrune: Token Transition Pruning for Efficient Large Vision-Language Model
ArXiv ID: 2507.20630
Authors: Ao Li, Yuxiang Duan, Jinghui Zhang, Congbo Ma, Yutong Xie, Gustavo Carneiro, Mohammad Yaqub, Hu Wang
Abstract: Large Vision-Language Models (LVLMs) have advanced multimodal learning but face high computational costs due to the large number of visual tokens, motivating token pruning to improve inference efficiency. The key challenge lies in identifying which tokens are truly important. Most existing approaches rely on attention-based criteria to estimate token importance. However, they inherently suffer from certain limitations, such as positional bias. In this work, we explore a new perspective on token importance based on token transitions in LVLMs. We observe that the transition of token representations provides a meaningful signal of semantic information. Based on this insight, we propose TransPrune, a training-free and efficient token pruning method. Specifically, TransPrune progressively prunes tokens by assessing their importance through a combination of Token Transition Variation (TTV)-which measures changes in both the magnitude and direction of token representations-and Instruction-Guided Attention (IGA), which measures how strongly the instruction attends to image tokens via attention. Extensive experiments demonstrate that TransPrune achieves comparable multimodal performance to original LVLMs, such as LLaVA-v1.5 and LLaVA-Next, across eight benchmarks, while reducing inference TFLOPs by more than half. Moreover, TTV alone can serve as an effective criterion without relying on attention, achieving performance comparable to attention-based methods. The code will be made publicly available upon acceptance of the paper at https://github.com/liaolea/TransPrune.
Comment: The paper introduces a novel token pruning method for large vision-language models, focusing on efficiency improvements through token transition analysis, which aligns with model compression and efficiency breakthroughs.
Relevance: 9 Novelty: 8
4. Transformers as Unrolled Inference in Probabilistic Laplacian Eigenmaps: An Interpretation and Potential Improvements
ArXiv ID: 2507.21040
Authors: Aditya Ravuri, Neil D. Lawrence
Abstract: We propose a probabilistic interpretation of transformers as unrolled inference steps assuming a probabilistic Laplacian Eigenmaps model from the ProbDR framework. Our derivation shows that at initialisation, transformers perform "linear" dimensionality reduction. We also show that within the transformer block, a graph Laplacian term arises from our arguments, rather than an attention matrix (which we interpret as an adjacency matrix). We demonstrate that simply subtracting the identity from the attention matrix (and thereby taking a graph diffusion step) improves validation performance on a language model and a simple vision transformer.
Comment: This paper provides a probabilistic interpretation of transformers, offering insights into their structure and potential improvements, which aligns with model architecture analysis.
Relevance: 9 Novelty: 8
5. Feature learning is decoupled from generalization in high capacity neural networks
ArXiv ID: 2507.19680
Authors: Niclas Alexander G\"oring, Charles London, Abdurrahman Hadi Erturk, Chris Mingard, Yoonsoo Nam, Ard A. Louis
Abstract: Neural networks outperform kernel methods, sometimes by orders of magnitude, e.g. on staircase functions. This advantage stems from the ability of neural networks to learn features, adapting their hidden representations to better capture the data. We introduce a concept we call feature quality to measure this performance improvement. We examine existing theories of feature learning and demonstrate empirically that they primarily assess the strength of feature learning, rather than the quality of the learned features themselves. Consequently, current theories of feature learning do not provide a sufficient foundation for developing theories of neural network generalization.
Comment: The paper examines feature learning in neural networks, providing insights into representation learning and generalization, which aligns with foundational research in representation learning.
Relevance: 9 Novelty: 8
6. State evolution beyond first-order methods I: Rigorous predictions and finite-sample guarantees
ArXiv ID: 2507.19611
Authors: Michael Celentano, Chen Cheng, Ashwin Pananjady, Kabir Aladin Verchand
Abstract: We develop a toolbox for exact analysis of iterative algorithms on a class of high-dimensional nonconvex optimization problems with random data. While prior work has shown that low-dimensional statistics of (generalized) first-order methods can be predicted by a deterministic recursion known as state evolution, our focus is on developing such a prediction for a more general class of algorithms. We provide a state evolution for any method whose iterations are given by (possibly interleaved) first-order and saddle point updates, showing two main results. First, we establish a rigorous state evolution prediction that holds even when the updates are not coordinate-wise separable. Second, we establish finite-sample guarantees bounding the deviation of the empirical updates from the established state evolution. In the process, we develop a technical toolkit that may prove useful in related problems. One component of this toolkit is a general Hilbert space lifting technique to prove existence and uniqueness of a convenient parameterization of the state evolution. Another component of the toolkit combines a generic application of Bolthausen's conditioning method with a sequential variant of Gordon's Gaussian comparison inequality, and provides additional ingredients that enable a general finite-sample analysis.
Comment: The paper provides a theoretical framework for state evolution in high-dimensional nonconvex optimization, which is relevant to representation learning and emerging trends.
Relevance: 9 Novelty: 8
7. HCAttention: Extreme KV Cache Compression via Heterogeneous Attention Computing for LLMs
ArXiv ID: 2507.19823
Authors: Dongquan Yang, Yifan Yang, Xiaotian Yu, Xianbiao Qi, Rong Xiao
Abstract: Processing long-context inputs with large language models presents a significant challenge due to the enormous memory requirements of the Key-Value (KV) cache during inference. Existing KV cache compression methods exhibit noticeable performance degradation when memory is reduced by more than 85%. Additionally, strategies that leverage GPU-CPU collaboration for approximate attention remain underexplored in this setting. We propose HCAttention, a heterogeneous attention computation framework that integrates key quantization, value offloading, and dynamic KV eviction to enable efficient inference under extreme memory constraints. The method is compatible with existing transformer architectures and does not require model fine-tuning. Experimental results on the LongBench benchmark demonstrate that our approach preserves the accuracy of full-attention model while shrinking the KV cache memory footprint to 25% of its original size. Remarkably, it stays competitive with only 12.5% of the cache, setting a new state-of-the-art in LLM KV cache compression. To the best of our knowledge, HCAttention is the first to extend the Llama-3-8B model to process 4 million tokens on a single A100 GPU with 80GB memory.
Comment: The paper presents a new method for KV cache compression in LLMs, which is relevant to model compression and efficiency.
Relevance: 9 Novelty: 8
8. MeLA: A Metacognitive LLM-Driven Architecture for Automatic Heuristic Design
ArXiv ID: 2507.20541
Authors: Zishang Qiu, Xinan Chen, Long Chen, Ruibin Bai
Abstract: This paper introduces MeLA, a Metacognitive LLM-Driven Architecture that presents a new paradigm for Automatic Heuristic Design (AHD). Traditional evolutionary methods operate directly on heuristic code; in contrast, MeLA evolves the instructional prompts used to guide a Large Language Model (LLM) in generating these heuristics. This process of "prompt evolution" is driven by a novel metacognitive framework where the system analyzes performance feedback to systematically refine its generative strategy. MeLA's architecture integrates a problem analyzer to construct an initial strategic prompt, an error diagnosis system to repair faulty code, and a metacognitive search engine that iteratively optimizes the prompt based on heuristic effectiveness. In comprehensive experiments across both benchmark and real-world problems, MeLA consistently generates more effective and robust heuristics, significantly outperforming state-of-the-art methods. Ultimately, this research demonstrates the profound potential of using cognitive science as a blueprint for AI architecture, revealing that by enabling an LLM to metacognitively regulate its problem-solving process, we unlock a more robust and interpretable path to AHD.
Comment: The paper introduces a novel architecture, MeLA, which uses a metacognitive framework to evolve prompts for LLMs, aligning with the Large Language Models criterion.
Relevance: 9 Novelty: 8
9. Enhancing Materials Discovery with Valence Constrained Design in Generative Modeling
ArXiv ID: 2507.19799
Authors: Mouyang Cheng, Weiliang Luo, Hao Tang, Bowen Yu, Yongqiang Cheng, Weiwei Xie, Ju Li, Heather J. Kulik, Mingda Li
Abstract: Diffusion-based deep generative models have emerged as powerful tools for inverse materials design. Yet, many existing approaches overlook essential chemical constraints such as oxidation state balance, which can lead to chemically invalid structures. Here we introduce CrysVCD (Crystal generator with Valence-Constrained Design), a modular framework that integrates chemical rules directly into the generative process. CrysVCD first employs a transformer-based elemental language model to generate valence-balanced compositions, followed by a diffusion model to generate crystal structures. The valence constraint enables orders-of-magnitude more efficient chemical valence checking, compared to pure data-driven approaches with post-screening. When fine-tuned on stability metrics, CrysVCD achieves 85% thermodynamic stability and 68% phonon stability. Moreover, CrysVCD supports conditional generation of functional materials, enabling discovery of candidates such as high thermal conductivity semiconductors and high-$\kappa$ dielectric compounds. Designed as a general-purpose plugin, CrysVCD can be integrated into diverse generative pipeline to promote chemical validity, offering a reliable, scientifically grounded path for materials discovery.
Comment: CrysVCD integrates chemical rules into generative modeling for materials discovery, aligning with AI for Science through foundational research in molecular modeling.
Relevance: 9 Novelty: 8
10. Dimer-Enhanced Optimization: A First-Order Approach to Escaping Saddle Points in Neural Network Training
ArXiv ID: 2507.19968
Authors: Yue Hu, Zanxia Cao, Yingchao Liu
Abstract: First-order optimization methods, such as SGD and Adam, are widely used for training large-scale deep neural networks due to their computational efficiency and robust performance. However, relying solely on gradient information, these methods often struggle to navigate complex loss landscapes with flat regions, plateaus, and saddle points. Second-order methods, which use curvature information from the Hessian matrix, can address these challenges but are computationally infeasible for large models. The Dimer method, a first-order technique that constructs two closely spaced points to probe the local geometry of a potential energy surface, efficiently estimates curvature using only gradient information. Inspired by its use in molecular dynamics simulations for locating saddle points, we propose Dimer-Enhanced Optimization (DEO), a novel framework to escape saddle points in neural network training. DEO adapts the Dimer method to explore a broader region of the loss landscape, approximating the Hessian's smallest eigenvector without computing the full matrix. By periodically projecting the gradient onto the subspace orthogonal to the minimum curvature direction, DEO guides the optimizer away from saddle points and flat regions, enhancing training efficiency with non-stepwise updates. Preliminary experiments on a Transformer toy model show DEO achieves competitive performance compared to standard first-order methods, improving navigation of complex loss landscapes. Our work repurposes physics-inspired, first-order curvature estimation to enhance neural network training in high-dimensional spaces.
Comment: The paper introduces Dimer-Enhanced Optimization, a novel first-order method inspired by physics to escape saddle points in neural network training, aligning with representation learning and training dynamics.
Relevance: 9 Novelty: 8
11. The wall confronting large language models
ArXiv ID: 2507.19703
Authors: Peter V. Coveney, Sauro Succi
Abstract: We show that the scaling laws which determine the performance of large language models (LLMs) severely limit their ability to improve the uncertainty of their predictions. As a result, raising their reliability to meet the standards of scientific inquiry is intractable by any reasonable measure. We argue that the very mechanism which fuels much of the learning power of LLMs, namely the ability to generate non-Gaussian output distributions from Gaussian input ones, might well be at the roots of their propensity to produce error pileup, ensuing information catastrophes and degenerative AI behaviour. This tension between learning and accuracy is a likely candidate mechanism underlying the observed low values of the scaling components. It is substantially compounded by the deluge of spurious correlations pointed out by Calude and Longo which rapidly increase in any data set merely as a function of its size, regardless of its nature. The fact that a degenerative AI pathway is a very probable feature of the LLM landscape does not mean that it must inevitably arise in all future AI research. Its avoidance, which we also discuss in this paper, necessitates putting a much higher premium on insight and understanding of the structural characteristics of the problems being investigated.
Comment: The paper discusses theoretical limitations of LLMs, which aligns with the interest in theoretical insights into LLM behavior.
Relevance: 9 Novelty: 8
12. WEEP: A Differentiable Nonconvex Sparse Regularizer via Weakly-Convex Envelope
ArXiv ID: 2507.20447
Authors: Takanobu Furuhashi, Hidekata Hontani, Tatsuya Yokota
Abstract: Sparse regularization is fundamental in signal processing for efficient signal recovery and feature extraction. However, it faces a fundamental dilemma: the most powerful sparsity-inducing penalties are often non-differentiable, conflicting with gradient-based optimizers that dominate the field. We introduce WEEP (Weakly-convex Envelope of Piecewise Penalty), a novel, fully differentiable sparse regularizer derived from the weakly-convex envelope framework. WEEP provides strong, unbiased sparsity while maintaining full differentiability and L-smoothness, making it natively compatible with any gradient-based optimizer. This resolves the conflict between statistical performance and computational tractability. We demonstrate superior performance compared to the L1-norm and other established non-convex sparse regularizers on challenging signal and image denoising tasks.
Comment: The paper introduces WEEP, a novel differentiable sparse regularizer, which aligns with the representation learning and model compression criteria by addressing sparsity and differentiability.
Relevance: 9 Novelty: 8
13. EcoTransformer: Attention without Multiplication
ArXiv ID: 2507.20096
Authors: Xin Gao, Xingming Xu
Abstract: The Transformer, with its scaled dot-product attention mechanism, has become a foundational architecture in modern AI. However, this mechanism is computationally intensive and incurs substantial energy costs. We propose a new Transformer architecture EcoTransformer, in which the output context vector is constructed as the convolution of the values using a Laplacian kernel, where the distances are measured by the L1 metric between the queries and keys. Compared to dot-product based attention, the new attention score calculation is free of matrix multiplication. It performs on par with, or even surpasses, scaled dot-product attention in NLP, bioinformatics, and vision tasks, while consuming significantly less energy.
Comment: EcoTransformer proposes a new Transformer architecture that reduces computational costs, aligning with model architecture innovations.
Relevance: 9 Novelty: 8
14. Memorization in Fine-Tuned Large Language Models
ArXiv ID: 2507.21009
Authors: Danil Savine, Muni Sreenivas Pydi, Jamal Atif, Olivier Capp\'e
Abstract: This study investigates the mechanisms and factors influencing memorization in fine-tuned large language models (LLMs), with a focus on the medical domain due to its privacy-sensitive nature. We examine how different aspects of the fine-tuning process affect a model's propensity to memorize training data, using the PHEE dataset of pharmacovigilance events. Our research employs two main approaches: a membership inference attack to detect memorized data, and a generation task with prompted prefixes to assess verbatim reproduction. We analyze the impact of adapting different weight matrices in the transformer architecture, the relationship between perplexity and memorization, and the effect of increasing the rank in low-rank adaptation (LoRA) fine-tuning. Key findings include: (1) Value and Output matrices contribute more significantly to memorization compared to Query and Key matrices; (2) Lower perplexity in the fine-tuned model correlates with increased memorization; (3) Higher LoRA ranks lead to increased memorization, but with diminishing returns at higher ranks. These results provide insights into the trade-offs between model performance and privacy risks in fine-tuned LLMs. Our findings have implications for developing more effective and responsible strategies for adapting large language models while managing data privacy concerns.
Comment: The paper investigates memorization in fine-tuned LLMs, providing insights into LLM behavior and interpretability.
Relevance: 9 Novelty: 7
15. Mixture of Length and Pruning Experts for Knowledge Graphs Reasoning
ArXiv ID: 2507.20498
Authors: Enjun Du, Siyi Liu, Yongqi Zhang
Abstract: Knowledge Graph (KG) reasoning, which aims to infer new facts from structured knowledge repositories, plays a vital role in Natural Language Processing (NLP) systems. Its effectiveness critically depends on constructing informative and contextually relevant reasoning paths. However, existing graph neural networks (GNNs) often adopt rigid, query-agnostic path-exploration strategies, limiting their ability to adapt to diverse linguistic contexts and semantic nuances. To address these limitations, we propose \textbf{MoKGR}, a mixture-of-experts framework that personalizes path exploration through two complementary components: (1) a mixture of length experts that adaptively selects and weights candidate path lengths according to query complexity, providing query-specific reasoning depth; and (2) a mixture of pruning experts that evaluates candidate paths from a complementary perspective, retaining the most informative paths for each query. Through comprehensive experiments on diverse benchmark, MoKGR demonstrates superior performance in both transductive and inductive settings, validating the effectiveness of personalized path exploration in KGs reasoning.
Comment: The paper introduces a mixture-of-experts framework for knowledge graph reasoning, relevant to model architecture with a focus on MoE.
Relevance: 9 Novelty: 7
16. DeepJIVE: Learning Joint and Individual Variation Explained from Multimodal Data Using Deep Learning
ArXiv ID: 2507.19682
Authors: Matthew Drexler, Benjamin Risk, James J Lah, Suprateek Kundu, Deqiang Qiu
Abstract: Conventional multimodal data integration methods provide a comprehensive assessment of the shared or unique structure within each individual data type but suffer from several limitations such as the inability to handle high-dimensional data and identify nonlinear structures. In this paper, we introduce DeepJIVE, a deep-learning approach to performing Joint and Individual Variance Explained (JIVE). We perform mathematical derivation and experimental validations using both synthetic and real-world 1D, 2D, and 3D datasets. Different strategies of achieving the identity and orthogonality constraints for DeepJIVE were explored, resulting in three viable loss functions. We found that DeepJIVE can successfully uncover joint and individual variations of multimodal datasets. Our application of DeepJIVE to the Alzheimer's Disease Neuroimaging Initiative (ADNI) also identified biologically plausible covariation patterns between the amyloid positron emission tomography (PET) and magnetic resonance (MR) images. In conclusion, the proposed DeepJIVE can be a useful tool for multimodal data analysis.
Comment: The paper presents DeepJIVE, a deep-learning approach for multimodal data integration, focusing on representation learning by uncovering joint and individual variations.
Relevance: 8 Novelty: 7
17. Iterative Pretraining Framework for Interatomic Potentials
ArXiv ID: 2507.20118
Authors: Taoyong Cui, Zhongyao Wang, Dongzhan Zhou, Yuqiang Li, Lei Bai, Wanli Ouyang, Mao Su, Shufei Zhang
Abstract: Machine learning interatomic potentials (MLIPs) enable efficient molecular dynamics (MD) simulations with ab initio accuracy and have been applied across various domains in physical science. However, their performance often relies on large-scale labeled training data. While existing pretraining strategies can improve model performance, they often suffer from a mismatch between the objectives of pretraining and downstream tasks or rely on extensive labeled datasets and increasingly complex architectures to achieve broad generalization. To address these challenges, we propose Iterative Pretraining for Interatomic Potentials (IPIP), a framework designed to iteratively improve the predictive performance of MLIP models. IPIP incorporates a forgetting mechanism to prevent iterative training from converging to suboptimal local minima. Unlike general-purpose foundation models, which frequently underperform on specialized tasks due to a trade-off between generality and system-specific accuracy, IPIP achieves higher accuracy and efficiency using lightweight architectures. Compared to general-purpose force fields, this approach achieves over 80% reduction in prediction error and up to 4x speedup in the challenging Mo-S-O system, enabling fast and accurate simulations.
Comment: The paper proposes an iterative pretraining framework for interatomic potentials, relevant to AI for science with a focus on foundational research in molecular modeling.
Relevance: 8 Novelty: 7
18. Bayesian symbolic regression: Automated equation discovery from a physicists' perspective
ArXiv ID: 2507.19540
Authors: Roger Guimera, Marta Sales-Pardo
Abstract: Symbolic regression automates the process of learning closed-form mathematical models from data. Standard approaches to symbolic regression, as well as newer deep learning approaches, rely on heuristic model selection criteria, heuristic regularization, and heuristic exploration of model space. Here, we discuss the probabilistic approach to symbolic regression, an alternative to such heuristic approaches with direct connections to information theory and statistical physics. We show how the probabilistic approach establishes model plausibility from basic considerations and explicit approximations, and how it provides guarantees of performance that heuristic approaches lack. We also discuss how the probabilistic approach compels us to consider model ensembles, as opposed to single models.
Comment: The paper discusses a probabilistic approach to symbolic regression, offering a theoretical perspective that aligns with emerging trends in foundational research.
Relevance: 8 Novelty: 7
19. CLoRA: Parameter-Efficient Continual Learning with Low-Rank Adaptation
ArXiv ID: 2507.19887
Authors: Shishir Muralidhara, Didier Stricker, Ren\'e Schuster
Abstract: In the past, continual learning (CL) was mostly concerned with the problem of catastrophic forgetting in neural networks, that arises when incrementally learning a sequence of tasks. Current CL methods function within the confines of limited data access, without any restrictions imposed on computational resources. However, in real-world scenarios, the latter takes precedence as deployed systems are often computationally constrained. A major drawback of most CL methods is the need to retrain the entire model for each new task. The computational demands of retraining large models can be prohibitive, limiting the applicability of CL in environments with limited resources. Through CLoRA, we explore the applicability of Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method for class-incremental semantic segmentation. CLoRA leverages a small set of parameters of the model and uses the same set for learning across all tasks. Results demonstrate the efficacy of CLoRA, achieving performance on par with and exceeding the baseline methods. We further evaluate CLoRA using NetScore, underscoring the need to factor in resource efficiency and evaluate CL methods beyond task performance. CLoRA significantly reduces the hardware requirements for training, making it well-suited for CL in resource-constrained environments after deployment.
Comment: The paper explores low-rank adaptation for continual learning, focusing on parameter efficiency, which aligns with model compression and efficiency breakthroughs.
Relevance: 8 Novelty: 7
20. Sparse-mode Dynamic Mode Decomposition for Disambiguating Local and Global Structures
ArXiv ID: 2507.19787
Authors: Sara M. Ichinaga, Steven L. Brunton, Aleksandr Y. Aravkin, J. Nathan Kutz
Abstract: The dynamic mode decomposition (DMD) is a data-driven approach that extracts the dominant features from spatiotemporal data. In this work, we introduce sparse-mode DMD, a new variant of the optimized DMD framework that specifically leverages sparsity-promoting regularization in order to approximate DMD modes which have localized spatial structure. The algorithm maintains the noise-robust properties of optimized DMD while disambiguating between modes which are spatially local versus global in nature. In many applications, such modes are associated with discrete and continuous spectra respectively, thus allowing the algorithm to explicitly construct, in an unsupervised manner, the distinct portions of the spectrum. We demonstrate this by analyzing synthetic and real-world systems, including examples from optical waveguides, quantum mechanics, and sea surface temperature data.
Comment: The paper introduces sparse-mode dynamic mode decomposition, focusing on sparsity and feature extraction, which aligns with representation learning and model compression.
Relevance: 8 Novelty: 7
21. Bag of Coins: A Statistical Probe into Neural Confidence Structures
ArXiv ID: 2507.19774
Authors: Agnideep Aich, Ashit Baran Aich, Md Monzur Murshed, Sameera Hewage, Bruce Wade
Abstract: Modern neural networks, despite their high accuracy, often produce poorly calibrated confidence scores, limiting their reliability in high-stakes applications. Existing calibration methods typically post-process model outputs without interrogating the internal consistency of the predictions themselves. In this work, we introduce a novel, non-parametric statistical probe, the Bag-of-Coins (BoC) test, that examines the internal consistency of a classifier's logits. The BoC test reframes confidence estimation as a frequentist hypothesis test: does the model's top-ranked class win 1-v-1 contests against random competitors at a rate consistent with its own stated softmax probability? When applied to modern deep learning architectures, this simple probe reveals a fundamental dichotomy. On Vision Transformers (ViTs), the BoC output serves as a state-of-the-art confidence score, achieving near-perfect calibration with an ECE of 0.0212, an 88% improvement over a temperature-scaled baseline. Conversely, on Convolutional Neural Networks (CNNs) like ResNet, the probe reveals a deep inconsistency between the model's predictions and its internal logit structure, a property missed by traditional metrics. We posit that BoC is not merely a calibration method, but a new diagnostic tool for understanding and exposing the differing ways that popular architectures represent uncertainty.
Comment: The paper introduces a novel statistical probe for understanding neural confidence structures, relevant to representation learning and model architecture analysis.
Relevance: 8 Novelty: 7
22. Communication-Efficient Distributed Training for Collaborative Flat Optima Recovery in Deep Learning
ArXiv ID: 2507.20424
Authors: Tolga Dimlioglu, Anna Choromanska
Abstract: We study centralized distributed data parallel training of deep neural networks (DNNs), aiming to improve the trade-off between communication efficiency and model performance of the local gradient methods. To this end, we revisit the flat-minima hypothesis, which suggests that models with better generalization tend to lie in flatter regions of the loss landscape. We introduce a simple, yet effective, sharpness measure, Inverse Mean Valley, and demonstrate its strong correlation with the generalization gap of DNNs. We incorporate an efficient relaxation of this measure into the distributed training objective as a lightweight regularizer that encourages workers to collaboratively seek wide minima. The regularizer exerts a pushing force that counteracts the consensus step pulling the workers together, giving rise to the Distributed Pull-Push Force (DPPF) algorithm. Empirically, we show that DPPF outperforms other communication-efficient approaches and achieves better generalization performance than local gradient methods and synchronous gradient averaging, while significantly reducing communication overhead. In addition, our loss landscape visualizations confirm the ability of DPPF to locate flatter minima. On the theoretical side, we show that DPPF guides workers to span flat valleys, with the final valley width governed by the interplay between push and pull strengths, and that its pull-push dynamics is self-stabilizing. We further provide generalization guarantees linked to the valley width and prove convergence in the non-convex setting.
Comment: The paper discusses a new algorithm for distributed training focusing on flat optima recovery, which is relevant to emerging trends in model training dynamics.
Relevance: 8 Novelty: 7
23. Frequency-Aware Autoregressive Modeling for Efficient High-Resolution Image Synthesis
ArXiv ID: 2507.20454
Authors: Zhuokun Chen, Jugang Fan, Zhuowei Yu, Bohan Zhuang, Mingkui Tan
Abstract: Visual autoregressive modeling, based on the next-scale prediction paradigm, exhibits notable advantages in image quality and model scalability over traditional autoregressive and diffusion models. It generates images by progressively refining resolution across multiple stages. However, the computational overhead in high-resolution stages remains a critical challenge due to the substantial number of tokens involved. In this paper, we introduce SparseVAR, a plug-and-play acceleration framework for next-scale prediction that dynamically excludes low-frequency tokens during inference without requiring additional training. Our approach is motivated by the observation that tokens in low-frequency regions have a negligible impact on image quality in high-resolution stages and exhibit strong similarity with neighboring tokens. Additionally, we observe that different blocks in the next-scale prediction model focus on distinct regions, with some concentrating on high-frequency areas. SparseVAR leverages these insights by employing lightweight MSE-based metrics to identify low-frequency tokens while preserving the fidelity of excluded regions through a small set of uniformly sampled anchor tokens. By significantly reducing the computational cost while maintaining high image generation quality, SparseVAR achieves notable acceleration in both HART and Infinity. Specifically, SparseVAR achieves up to a 2 times speedup with minimal quality degradation in Infinity-2B.
Comment: SparseVAR introduces a novel approach to reduce computational overhead in high-resolution image synthesis, aligning with model compression through sparsity.
Relevance: 8 Novelty: 7
24. Quantizing Text-attributed Graphs for Semantic-Structural Integration
ArXiv ID: 2507.19526
Authors: Jianyuan Bo, Hao Wu, Yuan Fang
Abstract: Text-attributed graphs (TAGs) have emerged as a powerful representation for modeling complex relationships across diverse domains. With the rise of large language models (LLMs), there is growing interest in leveraging their capabilities for graph learning. However, current approaches face significant challenges in embedding structural information into LLM-compatible formats, requiring either computationally expensive alignment mechanisms or manual graph verbalization techniques that often lose critical structural details. Moreover, these methods typically require labeled data from source domains for effective transfer learning, significantly constraining their adaptability. We propose STAG, a novel self-supervised framework that directly quantizes graph structural information into discrete tokens using a frozen codebook. Unlike traditional quantization approaches, our method employs soft assignment and KL divergence guided quantization to address the unique challenges of graph data, which lacks natural tokenization structures. Our framework enables both LLM-based and traditional learning approaches, supporting true zero-shot transfer learning without requiring labeled data even in the source domain. Extensive experiments demonstrate state-of-the-art performance across multiple node classification benchmarks while maintaining compatibility with different LLM architectures, offering an elegant solution to bridging graph learning with LLMs.
Comment: STAG introduces a novel framework for quantizing graph structural information, aligning with representation learning and model architecture.
Relevance: 8 Novelty: 7
25. Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning
ArXiv ID: 2507.21049
Authors: Zedong Wang, Siyuan Li, Dan Xu
Abstract: Despite the promise of Multi-Task Learning in leveraging complementary knowledge across tasks, existing multi-task optimization (MTO) techniques remain fixated on resolving conflicts via optimizer-centric loss scaling and gradient manipulation strategies, yet fail to deliver consistent gains. In this paper, we argue that the shared representation space, where task interactions naturally occur, offers rich information and potential for operations complementary to existing optimizers, especially for facilitating the inter-task complementarity, which is rarely explored in MTO. This intuition leads to Rep-MTL, which exploits the representation-level task saliency to quantify interactions between task-specific optimization and shared representation learning. By steering these saliencies through entropy-based penalization and sample-wise cross-task alignment, Rep-MTL aims to mitigate negative transfer by maintaining the effective training of individual tasks instead pure conflict-solving, while explicitly promoting complementary information sharing. Experiments are conducted on four challenging MTL benchmarks covering both task-shift and domain-shift scenarios. The results show that Rep-MTL, even paired with the basic equal weighting policy, achieves competitive performance gains with favorable efficiency. Beyond standard performance metrics, Power Law exponent analysis demonstrates Rep-MTL's efficacy in balancing task-specific learning and cross-task sharing. The project page is available at HERE.
Comment: The paper focuses on representation-level task saliency in multi-task learning, which is relevant to representation learning by exploring task interactions and shared representation learning.
Relevance: 8 Novelty: 7
26. Kolmogorov Arnold Network Autoencoder in Medicine
ArXiv ID: 2507.19524
Authors: Ugo Lomoio, Pierangelo Veltri, Pietro Hiram Guzzi
Abstract: Deep learning neural networks architectures such Multi Layer Perceptrons (MLP) and Convolutional blocks still play a crucial role in nowadays research advancements. From a topological point of view, these architecture may be represented as graphs in which we learn the functions related to the nodes while fixed edges convey the information from the input to the output. A recent work introduced a new architecture called Kolmogorov Arnold Networks (KAN) that reports how putting learnable activation functions on the edges of the neural network leads to better performances in multiple scenarios. Multiple studies are focusing on optimizing the KAN architecture by adding important features such as dropout regularization, Autoencoders (AE), model benchmarking and last, but not least, the KAN Convolutional Network (KCN) that introduced matrix convolution with KANs learning. This study aims to benchmark multiple versions of vanilla AEs (such as Linear, Convolutional and Variational) against their Kolmogorov-Arnold counterparts that have same or less number of parameters. Using cardiological signals as model input, a total of five different classic AE tasks were studied: reconstruction, generation, denoising, inpainting and anomaly detection. The proposed experiments uses a medical dataset \textit{AbnormalHeartbeat} that contains audio signals obtained from the stethoscope.
Comment: The paper discusses Kolmogorov Arnold Network Autoencoder, which involves architectural innovations by introducing learnable activation functions on the edges of neural networks.
Relevance: 8 Novelty: 7
27. Diagonally-Weighted Generalized Method of Moments Estimation for Gaussian Mixture Modeling
ArXiv ID: 2507.20459
Authors: Liu Zhang, Oscar Mickelin, Sheng Xu, Amit Singer
Abstract: Since Pearson [Philosophical Transactions of the Royal Society of London. A, 185 (1894), pp. 71-110] first applied the method of moments (MM) for modeling data as a mixture of one-dimensional Gaussians, moment-based estimation methods have proliferated. Among these methods, the generalized method of moments (GMM) improves the statistical efficiency of MM by weighting the moments appropriately. However, the computational complexity and storage complexity of MM and GMM grow exponentially with the dimension, making these methods impractical for high-dimensional data or when higher-order moments are required. Such computational bottlenecks are more severe in GMM since it additionally requires estimating a large weighting matrix. To overcome these bottlenecks, we propose the diagonally-weighted GMM (DGMM), which achieves a balance among statistical efficiency, computational complexity, and numerical stability. We apply DGMM to study the parameter estimation problem for weakly separated heteroscedastic low-rank Gaussian mixtures and design a computationally efficient and numerically stable algorithm that obtains the DGMM estimator without explicitly computing or storing the moment tensors. We implement the proposed algorithm and empirically validate the advantages of DGMM: in numerical studies, DGMM attains smaller estimation errors while requiring substantially shorter runtime than MM and GMM. The code and data will be available upon publication at https://github.com/liu-lzhang/dgmm.
Comment: The paper proposes a diagonally-weighted GMM for Gaussian mixture modeling, which involves low-rank approaches and efficiency improvements, aligning with model compression.
Relevance: 8 Novelty: 7
28. Regularizing Subspace Redundancy of Low-Rank Adaptation
ArXiv ID: 2507.20745
Authors: Yue Zhu, Haiwen Diao, Shang Gao, Jiazuo Yu, Jiawen Zhu, Yunzhi Zhuge, Shuai Hao, Xu Jia, Lu Zhang, Ying Zhang, Huchuan Lu
Abstract: Low-Rank Adaptation (LoRA) and its variants have delivered strong capability in Parameter-Efficient Transfer Learning (PETL) by minimizing trainable parameters and benefiting from reparameterization. However, their projection matrices remain unrestricted during training, causing high representation redundancy and diminishing the effectiveness of feature adaptation in the resulting subspaces. While existing methods mitigate this by manually adjusting the rank or implicitly applying channel-wise masks, they lack flexibility and generalize poorly across various datasets and architectures. Hence, we propose ReSoRA, a method that explicitly models redundancy between mapping subspaces and adaptively Regularizes Subspace redundancy of Low-Rank Adaptation. Specifically, it theoretically decomposes the low-rank submatrices into multiple equivalent subspaces and systematically applies de-redundancy constraints to the feature distributions across different projections. Extensive experiments validate that our proposed method consistently facilitates existing state-of-the-art PETL methods across various backbones and datasets in vision-language retrieval and standard visual classification benchmarks. Besides, as a training supervision, ReSoRA can be seamlessly integrated into existing approaches in a plug-and-play manner, with no additional inference costs. Code is publicly available at: https://github.com/Lucenova/ReSoRA.
Comment: The paper proposes a method to regularize subspace redundancy in low-rank adaptation, which is relevant to model compression and efficiency.
Relevance: 8 Novelty: 7
29. DeltaLLM: A Training-Free Framework Exploiting Temporal Sparsity for Efficient Edge LLM Inference
ArXiv ID: 2507.19608
Authors: Jiawen Qi, Chang Gao, Zhaochun Ren, Qinyu Chen
Abstract: Deploying Large Language Models (LLMs) on edge devices remains challenging due to their quadratically increasing computations with the sequence length. Existing studies for dynamic attention pruning are designed for hardware with massively parallel computation capabilities, such as GPUs or TPUs, and aim at long context lengths (e.g., 64K), making them unsuitable for edge scenarios. We present DeltaLLM, a training-free framework that exploits temporal sparsity in attention patterns to enable efficient LLM inference across both the prefilling and decoding stages, on resource-constrained edge devices. DeltaLLM introduces an accuracy- and memory-aware delta matrix construction strategy that introduces temporal sparsity, and a context-aware hybrid attention mechanism that combines full attention in a local context window with delta approximation outside it to increase accuracy. We evaluate our framework on the edge-device-friendly BitNet-b1.58-2B-4T model and Llama3.2-1B-Instruct model across diverse language tasks. The results show that on BitNet, our framework increases the attention sparsity from 0% to 60% during the prefilling stage with slight accuracy improvement on the WG task, and 0% to 57% across both the prefilling and decoding stages, with even higher F1 score from 29.63 to 30.97 on SQuAD-v2 task. On the Llama model, it can also achieve up to 60% sparsity during the prefilling stage and around 57% across both stages with negligible accuracy drop. These results demonstrate that DeltaLLM offers a promising solution for efficient edge deployment, requiring no fine-tuning and seamlessly integrating with existing inference pipelines.
Comment: The paper presents a framework for efficient LLM inference on edge devices by exploiting temporal sparsity, relevant to model compression and efficiency.
Relevance: 8 Novelty: 7
30. Nonconvex Optimization Framework for Group-Sparse Feedback Linear-Quadratic Optimal Control. II: Non-Penalty Approach
ArXiv ID: 2507.19895
Authors: Lechen Feng, Xun Li, Yuan-Hua Ni
Abstract: This work is a companion paper of [8], where the distributed linear-quadratic problem with fixed communication topology (DFT-LQ) and the sparse feedback LQ problem (SF-LQ) are formulated into a nonsmooth and nonconvex optimization problem with affine constraints. Moreover, a penalty approach is considered in \cite{feng-part1}, and the PALM (proximal alternating linearized minimization) algorithm is studied with convergence and complexity analysis. In this paper, we aim to address the inherent drawbacks of the penalty approach, such as the challenge of tuning the penalty parameter and the risk of introducing spurious stationary points. Specifically, we first reformulate the SF-LQ problem and the DFT-LQ problem from an epi-composition function perspective, aiming to solve the constrained problem directly. Then, from a theoretical viewpoint, we revisit the alternating direction method of multipliers (ADMM) and establish its convergence to the set of cluster points under certain assumptions. When these assumptions do not hold, we can effectively utilize alternative approaches combining subgradient descent with Difference-of-Convex relaxation methods. In summary, our results enable the direct design of group-sparse feedback gains with theoretical guarantees, without resorting to convex surrogates, restrictive structural assumptions, or penalty formulations that incorporate constraints into the cost function.
Comment: The paper focuses on nonconvex optimization for group-sparse feedback, which is relevant to model compression through sparsity and optimization techniques.
Relevance: 8 Novelty: 7
Paper Selection Prompt
System Prompt
You are a helpful paper reading assistant whose job is to read daily posts from ArXiv and identify a few papers that your friend will enjoy reading. Your job is to carefully read the paper titles and abstracts below and find the ones that match the criteria below.
User Prompt
Instructions
Write the response in JSONL format with {ARXIVID, COMMENT, RELEVANCE, NOVELTY} on each line, one for each paper.
- ARXIVID: should be the ArXiv ID.
- COMMENT: should identify whether there is a criteria that match the paper very closely. These matches should not be based on general terms like "language modeling" or "advancements" and should specifically refer to a criterion. No need to mention the non-matching criteria.
- RELEVANCE: should be a score from 1-10.
- NOVELTY: should be a score from 1-10.
Scoring Criteria
The "Relevance" score measures how closely the paper aligns with the core topics of the prompt. The "Novelty" score assesses the originality and impact of the paper. They are two ORTHONORMAL axes and SHOULD NOT be confused with each other.
Relevance Scoring
- Relevance 9-10 (Completely Relevant)
- Focus: Fully aligned with core topics with no deviation, score the highest if contains relevant keywords in it.
Examples: Papers focused on foundational methods or theoretical research, whose titles contain topic keywords like "MoE".
Relevance 7-8 (Relevant)
- Focus: Retain a solid link to the main research area, though may touch on peripheral elements.
Examples: Papers research on the fundamental part of MoE through a less critical aspect like its behavior in GNN.
Relevance 5-6 (Borderline)
- Focus: Maintains a link to the core topic but also extends into at least one other domain/area beyond the primary focus.
Examples: Work referencing MoE centered on reinforcement learning.
Relevance 3-4 (Irrelevant)
- Focus: Largely outside our interests with no association to our topics.
Examples: Application-focused papers like using MoE to solve a problem in the real world.
Relevance 1-2 (Ignore)
- Focus: Purely unrelated to our topics. Completely a different domain.
- Exception: If the paper hints at a cutting-edge, radically new direction that could eventually transform the primary domain, consider a score of 9–10 despite initial appearances. (Usually a very rare concept that belongs to the fundamental research)
Novelty Scoring
- Novelty 9-10 (Breakthrough)
- Definition: Groundbreaking methods/theory introducing new directions or solving major challenges.
Examples: Entirely new paradigm for foundational models; a novel theory transforming representation learning.
Novelty 7-8 (Improvements)
- Definition: Substantial insights/enhancements, though not a full paradigm shift.
Examples: Modifications on existing methods yielding significantly better results.
Novelty 5-6 (Borderline)
- Definition: Incremental contributions with possible long-term benefits, not immediately transformative.
Examples: Moderately novel extension to an existing architecture; refining current methods without fundamentally altering them.
Novelty 3-4 (Tangential)
- Definition: Minor or domain-specific improvements with limited broader impact.
Examples: Slight modifications to known methods with strange motivation; purely engineering jobs like a new benchmark/dataset.
Novelty 1-2 (Low)
- Definition: Minimal originality, applying standard approaches without real innovation.
- Examples: Using an off-the-shelf model without adding new insights; purely application-driven studies like finetuning a pretrained model using existing methods.
Papers
[PAPER LIST HERE]
Relevant Topics
Use the following relevance criteria to focus on foundational research. Keep relevant papers and filter out irrelevant ones. Avoid purely application-driven work.
Representation Learning - Relevant: Insights into how deep networks encode information, feature/dictionary learning, sparse/contrastive methods, training dynamics in neural networks. - Irrelevant: Standard applications of known techniques lacking new theoretical or methodological contributions.
Model Architecture - Relevant: Mixture-of-Experts (MoE), Transformers, Conditional/Dynamic Networks, Autoencoders, analysis on existing architectures (like encoder-decoder), or other architectural innovations. - Irrelevant: Merely using existing architectures for a certain task without insights into the structure themselves.
Model Compression - Relevant: Sparsity, pruning, quantization, low-rank approaches, KV cache, or other algorithmic/theoretical efficiency breakthroughs. - Irrelevant: Straightforward applications of existing compression methods to new tasks.
Large Language Models (LLMs) - Relevant: Major breakthroughs in pretraining or architecture, theoretical insights into LLM behavior/interpretability. - Irrelevant: Domain-specific usage (e.g., translation, jail-breaking), finetuning or inference tricks (e.g., instruction tuning, chain-of-thoughts, data mixing), or empirical dataset/benchmark studies and text-level analysis (e.g. hallucination, reasoning, safety).
AI for Science - Relevant: Foundational research in molecular/protein modeling, new generative paradigms, or significant architecture-level innovations. - Irrelevant: Conventional, domain-specific applications without new theoretical perspectives.
Emerging Trends - Relevant: Cutting-edge theoretical work challenging established assumptions or introducing broad new paradigms. - Irrelevant: Incremental improvements or trend-following without novel insights.
Keywords:
- Relevant: Mixture of Experts (MoE), Representation Learning, Compression/Efficiency, Sparse/Sparsity, Pruning, Quantization, Low-rank, Foundation Model, etc.
- Irrelevant: Reinforcement Learning, Transfer Learning, Federated Learning, Online Learning, Diffusion Models, etc.
- Application: Image Segmentation, Medical Imaging, 3D Vision, Video Understanding, Information Retrieval, Summarization, Recommendation Systems, Machine Translation, Speech Recognition, Signal Processing, Spatial/Temporal Modeling, Time Series, Knowledge Graph, etc.