Personalized Daily ArXiv Papers 2025-04-11

[gpt-4o]	Prompt	Completion	Total
Token	24337	3113	27450
Cost	$0.06	$0.03	$0.09

Total arXiv papers: 365

Total scanned papers: 219

Total relevant papers: 11

Table of contents with paper titles:

Minimum width for universal approximation using squashable activation functions Authors: Jonghyun Shin, Namjun Kim, Geonho Hwang, Sejun Park
Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression Authors: Hanqi Xiao, Yi-Lin Sung, Elias Stengel-Eskin, Mohit Bansal
Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models Authors: Ling Team, Caizhi Tang, Chilin Fu, Chunwei Wu, Jia Guo, Jianwen Wang, Jingyu Hu, Liang Jiang, Meng Li, Peng Jiao, Pingping Liu, Shaomian Zheng, Shiwei Liang, Shuaicheng Li, Yalin Zhang, Yingting Wu, Yongkang Liu, Zhenyu Huang
How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective Authors: Qi Liu, Jiaxin Mao, Ji-Rong Wen
Smoothed Distance Kernels for MMDs and Applications in Wasserstein Gradient Flows Authors: Nicolaj Rux, Michael Quellmalz, Gabriele Steidl
Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory Authors: Mirac Suzgun, Mert Yuksekgonul, Federico Bianchi, Dan Jurafsky, James Zou
Adversarial Subspace Generation for Outlier Detection in High-Dimensional Data Authors: Jose Cribeiro-Ramallo, Federico Matteucci, Paul Enciu, Alexander Jenke, Vadim Arzamasov, Thorsten Strufe, Klemens B\"ohm
LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation Authors: Juzheng Zhang, Jiacheng You, Ashwinee Panda, Tom Goldstein
Prototype-Based Continual Learning with Label-free Replay Buffer and Cluster Preservation Loss Authors: Agil Aghasanli, Yi Li, Plamen Angelov
ConceptFormer: Towards Efficient Use of Knowledge-Graph Embeddings in Large Language Models Authors: Joel Barmettler, Abraham Bernstein, Luca Rossetto
Revisiting Likelihood-Based Out-of-Distribution Detection by Modeling Representations Authors: Yifan Ding, Arturas Aleksandrauskas, Amirhossein Ahmadian, Jonas Unger, Fredrik Lindsten, Gabriel Eilertsen

1. Minimum width for universal approximation using squashable activation functions

ArXiv ID: 2504.07371

Authors: Jonghyun Shin, Namjun Kim, Geonho Hwang, Sejun Park

Abstract: The exact minimum width that allows for universal approximation of unbounded-depth networks is known only for ReLU and its variants. In this work, we study the minimum width of networks using general activation functions. Specifically, we focus on squashable functions that can approximate the identity function and binary step function by alternatively composing with affine transformations. We show that for networks using a squashable activation function to universally approximate $L^p$ functions from $[0,1]^{d_x}$ to $\mathbb R^{d_y}$, the minimum width is $\max{d_x,d_y,2}$ unless $d_x=d_y=1$; the same bound holds for $d_x=d_y=1$ if the activation function is monotone. We then provide sufficient conditions for squashability and show that all non-affine analytic functions and a class of piecewise functions are squashable, i.e., our minimum width result holds for those general classes of activation functions.

Comment: This paper provides theoretical insights into the minimum width for universal approximation using squashable activation functions, which is foundational research in representation learning.

Relevance: 9 Novelty: 8

2. Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression

ArXiv ID: 2504.07389

Authors: Hanqi Xiao, Yi-Lin Sung, Elias Stengel-Eskin, Mohit Bansal

Abstract: Post-training quantization (PTQ) reduces a model's memory footprint by mapping full precision weights into low bit weights without costly retraining, but can degrade its downstream performance especially in low 2- to 3-bit settings. We develop a new mixed-precision PTQ approach, Task-Circuit Quantization (TaCQ), that draws parallels to automated circuit discovery, directly conditioning the quantization process on specific weight circuits -- which we define as sets of weights associated with downstream task performance. These weights are kept as 16-bit weights, while others are quantized, maintaining performance while only adding a marginal memory cost. Specifically, TaCQ contrasts unquantized model weights with a uniformly-quantized model to estimate the expected change in weights due to quantization and uses gradient information to predict the resulting impact on task performance, allowing us to preserve task-specific weights. We compare TaCQ-based quantization to existing mixed-precision quantization methods when conditioning both on general-purpose and task-specific data. Across QA, math reasoning, and text-to-SQL tasks for both Llama-3 and Qwen2.5, we find that TaCQ outperforms baselines using the same calibration data and a lower weight budget, achieving major improvements in the 2 and 3-bit regime. With only 3.1 bits we are able to recover 96% of Llama-3-8B-Instruct's unquantized 16-bit MMLU performance, obtaining a 5.25% absolute improvement over SPQR. We also observe consistently large gains over existing methods in the 2-bit regime, with an average gain of 14.74% over the strongest baseline, SliM-LLM. Moreover, we observe a 7.20% gain without conditioning on specific tasks, showing TaCQ's ability to identify important weights is not limited to task-conditioned settings.

Comment: The paper introduces a novel mixed-precision quantization method (TaCQ) that directly addresses foundational challenges in model compression, particularly in low-bit regimes. The approach is highly relevant and demonstrates significant improvements over baselines.

Relevance: 9 Novelty: 8

3. Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models

ArXiv ID: 2504.07158

Authors: Ling Team, Caizhi Tang, Chilin Fu, Chunwei Wu, Jia Guo, Jianwen Wang, Jingyu Hu, Liang Jiang, Meng Li, Peng Jiao, Pingping Liu, Shaomian Zheng, Shiwei Liang, Shuaicheng Li, Yalin Zhang, Yingting Wu, Yongkang Liu, Zhenyu Huang

Abstract: This technical report presents Ring-Lite-Distill, a lightweight reasoning model derived from our open-source Mixture-of-Experts (MoE) Large Language Models (LLMs) Ling-Lite. This study demonstrates that through meticulous high-quality data curation and ingenious training paradigms, the compact MoE model Ling-Lite can be further trained to achieve exceptional reasoning capabilities, while maintaining its parameter-efficient architecture with only 2.75 billion activated parameters, establishing an efficient lightweight reasoning architecture. In particular, in constructing this model, we have not merely focused on enhancing advanced reasoning capabilities, exemplified by high-difficulty mathematical problem solving, but rather aimed to develop a reasoning model with more comprehensive competency coverage. Our approach ensures coverage across reasoning tasks of varying difficulty levels while preserving generic capabilities, such as instruction following, tool use, and knowledge retention. We show that, Ring-Lite-Distill's reasoning ability reaches a level comparable to DeepSeek-R1-Distill-Qwen-7B, while its general capabilities significantly surpass those of DeepSeek-R1-Distill-Qwen-7B. The models are accessible at https://huggingface.co/inclusionAI

Comment: The paper discusses a lightweight reasoning model derived from a Mixture-of-Experts (MoE) architecture, aligning with the 'Model Architecture' criterion. It emphasizes parameter efficiency and reasoning capabilities, which are relevant to foundational research.

Relevance: 9 Novelty: 7

4. How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective

ArXiv ID: 2504.07898

Authors: Qi Liu, Jiaxin Mao, Ji-Rong Wen

Abstract: Recent studies have shown that large language models (LLMs) can assess relevance and support information retrieval (IR) tasks such as document ranking and relevance judgment generation. However, the internal mechanisms by which off-the-shelf LLMs understand and operationalize relevance remain largely unexplored. In this paper, we systematically investigate how different LLM modules contribute to relevance judgment through the lens of mechanistic interpretability. Using activation patching techniques, we analyze the roles of various model components and identify a multi-stage, progressive process in generating either pointwise or pairwise relevance judgment. Specifically, LLMs first extract query and document information in the early layers, then process relevance information according to instructions in the middle layers, and finally utilize specific attention heads in the later layers to generate relevance judgments in the required format. Our findings provide insights into the mechanisms underlying relevance assessment in LLMs, offering valuable implications for future research on leveraging LLMs for IR tasks.

Comment: This paper provides mechanistic interpretability insights into how LLMs process relevance judgments, which aligns with the foundational study of LLM behavior and interpretability.

Relevance: 9 Novelty: 7

5. Smoothed Distance Kernels for MMDs and Applications in Wasserstein Gradient Flows

ArXiv ID: 2504.07820

Authors: Nicolaj Rux, Michael Quellmalz, Gabriele Steidl

Abstract: Negative distance kernels $K(x,y) := - |x-y|$ were used in the definition of maximum mean discrepancies (MMDs) in statistics and lead to favorable numerical results in various applications. In particular, so-called slicing techniques for handling high-dimensional kernel summations profit from the simple parameter-free structure of the distance kernel. However, due to its non-smoothness in $x=y$, most of the classical theoretical results, e.g. on Wasserstein gradient flows of the corresponding MMD functional do not longer hold true. In this paper, we propose a new kernel which keeps the favorable properties of the negative distance kernel as being conditionally positive definite of order one with a nearly linear increase towards infinity and a simple slicing structure, but is Lipschitz differentiable now. Our construction is based on a simple 1D smoothing procedure of the absolute value function followed by a Riemann-Liouville fractional integral transform. Numerical results demonstrate that the new kernel performs similarly well as the negative distance kernel in gradient descent methods, but now with theoretical guarantees.

Comment: The paper introduces a new smoothed distance kernel for MMDs with theoretical guarantees, which is relevant to representation learning and emerging trends in foundational research. The novel kernel design and its theoretical contributions are noteworthy.

Relevance: 8 Novelty: 8

6. Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory

ArXiv ID: 2504.07952

Authors: Mirac Suzgun, Mert Yuksekgonul, Federico Bianchi, Dan Jurafsky, James Zou

Abstract: Despite their impressive performance on complex tasks, current language models (LMs) typically operate in a vacuum: Each input query is processed separately, without retaining insights from previous attempts. Here, we present Dynamic Cheatsheet (DC), a lightweight framework that endows a black-box LM with a persistent, evolving memory. Rather than repeatedly re-discovering or re-committing the same solutions and mistakes, DC enables models to store and reuse accumulated strategies, code snippets, and general problem-solving insights at inference time. This test-time learning enhances performance substantially across a range of tasks without needing explicit ground-truth labels or human feedback. Leveraging DC, Claude 3.5 Sonnet's accuracy more than doubled on AIME math exams once it began retaining algebraic insights across questions. Similarly, GPT-4o's success rate on Game of 24 increased from 10% to 99% after the model discovered and reused a Python-based solution. In tasks prone to arithmetic mistakes, such as balancing equations, DC enabled GPT-4o and Claude to reach near-perfect accuracy by recalling previously validated code, whereas their baselines stagnated around 50%. Beyond arithmetic challenges, DC yields notable accuracy gains on knowledge-demanding tasks. Claude achieved a 9% improvement in GPQA-Diamond and an 8% boost on MMLU-Pro problems. Crucially, DC's memory is self-curated, focusing on concise, transferable snippets rather than entire transcript. Unlike finetuning or static retrieval methods, DC adapts LMs' problem-solving skills on the fly, without modifying their underlying parameters. Overall, our findings present DC as a promising approach for augmenting LMs with persistent memory, bridging the divide between isolated inference events and the cumulative, experience-driven learning characteristic of human cognition.

Comment: The paper introduces Dynamic Cheatsheet (DC), a framework for test-time learning with adaptive memory, which aligns with emerging trends in LLM behavior and interpretability. The approach is novel and demonstrates significant performance improvements.

Relevance: 8 Novelty: 8

7. Adversarial Subspace Generation for Outlier Detection in High-Dimensional Data

ArXiv ID: 2504.07522

Authors: Jose Cribeiro-Ramallo, Federico Matteucci, Paul Enciu, Alexander Jenke, Vadim Arzamasov, Thorsten Strufe, Klemens B\"ohm

Abstract: Outlier detection in high-dimensional tabular data is challenging since data is often distributed across multiple lower-dimensional subspaces -- a phenomenon known as the Multiple Views effect (MV). This effect led to a large body of research focused on mining such subspaces, known as subspace selection. However, as the precise nature of the MV effect was not well understood, traditional methods had to rely on heuristic-driven search schemes that struggle to accurately capture the true structure of the data. Properly identifying these subspaces is critical for unsupervised tasks such as outlier detection or clustering, where misrepresenting the underlying data structure can hinder the performance. We introduce Myopic Subspace Theory (MST), a new theoretical framework that mathematically formulates the Multiple Views effect and writes subspace selection as a stochastic optimization problem. Based on MST, we introduce V-GAN, a generative method trained to solve such an optimization problem. This approach avoids any exhaustive search over the feature space while ensuring that the intrinsic data structure is preserved. Experiments on 42 real-world datasets show that using V-GAN subspaces to build ensemble methods leads to a significant increase in one-class classification performance -- compared to existing subspace selection, feature selection, and embedding methods. Further experiments on synthetic data show that V-GAN identifies subspaces more accurately while scaling better than other relevant subspace selection methods. These results confirm the theoretical guarantees of our approach and also highlight its practical viability in high-dimensional settings.

Comment: The paper introduces a novel theoretical framework (Myopic Subspace Theory) for subspace selection and proposes a generative method (V-GAN) to address high-dimensional data challenges. While it focuses on outlier detection, the foundational insights into subspace selection and optimization align with representation learning.

Relevance: 8 Novelty: 8

8. LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation

ArXiv ID: 2504.07448

Authors: Juzheng Zhang, Jiacheng You, Ashwinee Panda, Tom Goldstein

Abstract: Low-Rank Adaptation (LoRA) has emerged as a popular parameter-efficient fine-tuning (PEFT) method for Large Language Models (LLMs), yet it still incurs notable overhead and suffers from parameter interference in multi-task scenarios. We propose LoRA with Reduced Interference (LoRI), a simple yet effective approach that freezes the projection matrices $A$ as random projections and sparsifies the matrices $B$ using task-specific masks. This design substantially reduces the number of trainable parameters while maintaining strong task performance. Moreover, LoRI minimizes cross-task interference in adapter merging by leveraging the orthogonality between adapter subspaces, and supports continual learning by using sparsity to mitigate catastrophic forgetting. Extensive experiments across natural language understanding, mathematical reasoning, code generation, and safety alignment tasks demonstrate that LoRI outperforms full fine-tuning and existing PEFT methods, while using up to 95% fewer trainable parameters than LoRA. In multi-task experiments, LoRI enables effective adapter merging and continual learning with reduced cross-task interference. Code is available at: https://github.com/juzhengz/LoRI

Comment: The paper proposes LoRI, a novel approach to reduce cross-task interference in multi-task low-rank adaptation for LLMs. It aligns with model compression and efficiency topics, particularly in parameter-efficient fine-tuning.

Relevance: 8 Novelty: 8

9. Prototype-Based Continual Learning with Label-free Replay Buffer and Cluster Preservation Loss

ArXiv ID: 2504.07240

Authors: Agil Aghasanli, Yi Li, Plamen Angelov

Abstract: Continual learning techniques employ simple replay sample selection processes and use them during subsequent tasks. Typically, they rely on labeled data. In this paper, we depart from this by automatically selecting prototypes stored without labels, preserving cluster structures in the latent space across tasks. By eliminating label dependence in the replay buffer and introducing cluster preservation loss, it is demonstrated that the proposed method can maintain essential information from previously encountered tasks while ensuring adaptation to new tasks. "Push-away" and "pull-toward" mechanisms over previously learned prototypes are also introduced for class-incremental and domain-incremental scenarios. These mechanisms ensure the retention of previously learned information as well as adaptation to new classes or domain shifts. The proposed method is evaluated on several benchmarks, including SplitCIFAR100, SplitImageNet32, SplitTinyImageNet, and SplitCaltech256 for class-incremental, as well as R-MNIST and CORe50 for domain-incremental setting using pre-extracted DINOv2 features. Experimental results indicate that the label-free replay-based technique outperforms state-of-the-art continual learning methods and, in some cases, even surpasses offline learning. An unsupervised variant of the proposed technique for the class-incremental setting, avoiding labels use even on incoming data, also demonstrated competitive performance, outperforming particular supervised baselines in some cases. These findings underscore the effectiveness of the proposed framework in retaining prior information and facilitating continual adaptation.

Comment: The paper introduces a novel prototype-based continual learning method with a label-free replay buffer and cluster preservation loss, which aligns with representation learning and sparsity-related methods.

Relevance: 8 Novelty: 7

10. ConceptFormer: Towards Efficient Use of Knowledge-Graph Embeddings in Large Language Models

ArXiv ID: 2504.07624

Authors: Joel Barmettler, Abraham Bernstein, Luca Rossetto

Abstract: Retrieval Augmented Generation (RAG) has enjoyed increased attention in the recent past and recent advancements in Large Language Models (LLMs) have highlighted the importance of integrating world knowledge into these systems. Current RAG methodologies often modify the internal architecture of pre-trained language models (PLMs) or rely on textifying knowledge graphs (KGs), which is inefficient in terms of token usage. This paper introduces ConceptFormer, a new approach to augment LLMs with structured knowledge from KGs, such as Wikidata, without altering their internal structure or relying on textual input of KGs. ConceptFormer operates in the LLM embedding vector space, creating and injecting \emph{concept vectors} that encapsulate the information of the KG nodes directly. Trained in conjunction with a frozen LLM, ConceptFormer generates a comprehensive lookup table that maps KG nodes to their respective concept vectors. The approach aims to enhance the factual recall capabilities of LLMs by enabling them to process these concept vectors natively, thus enriching them with structured world knowledge in an efficient and scalable manner. Our experiments demonstrate that the addition of concept vectors to GPT-2 0.1B substantially increases its factual recall ability (Hit@10) by up to 272\% when tested on sentences from Wikipedia and up to 348\% on synthetically generated sentences. Even injecting only a single concept vector into the prompt increases factual recall ability (Hit@10) by up to 213\% on Wikipedia sentences, significantly outperforming RAG with graph textification while consuming 130x fewer input tokens.

Comment: ConceptFormer introduces a method to integrate structured knowledge from knowledge graphs into LLMs without altering their internal architecture. This aligns with foundational research in model architecture and representation learning.

Relevance: 8 Novelty: 7

11. Revisiting Likelihood-Based Out-of-Distribution Detection by Modeling Representations

ArXiv ID: 2504.07793

Authors: Yifan Ding, Arturas Aleksandrauskas, Amirhossein Ahmadian, Jonas Unger, Fredrik Lindsten, Gabriel Eilertsen

Abstract: Out-of-distribution (OOD) detection is critical for ensuring the reliability of deep learning systems, particularly in safety-critical applications. Likelihood-based deep generative models have historically faced criticism for their unsatisfactory performance in OOD detection, often assigning higher likelihood to OOD data than in-distribution samples when applied to image data. In this work, we demonstrate that likelihood is not inherently flawed. Rather, several properties in the images space prohibit likelihood as a valid detection score. Given a sufficiently good likelihood estimator, specifically using the probability flow formulation of a diffusion model, we show that likelihood-based methods can still perform on par with state-of-the-art methods when applied in the representation space of pre-trained encoders. The code of our work can be found at $\href{https://github.com/limchaos/Likelihood-OOD.git}{\texttt{https://github.com/limchaos/Likelihood-OOD.git}}$.

Comment: This paper revisits likelihood-based OOD detection by modeling representations using diffusion models. It aligns with foundational research in representation learning and provides insights into improving OOD detection.

Relevance: 8 Novelty: 7

Paper Selection Prompt

You are a helpful paper reading assistant whose job is to read daily posts from ArXiv and identify a few papers that your friend will enjoy reading. Your job is to carefully read the paper titles and abstracts below and find the ones that match the criteria below.

Instructions

Write the response in JSONL format with {ARXIVID, COMMENT, RELEVANCE, NOVELTY} on each line, one for each paper.

ARXIVID: should be the ArXiv ID.
COMMENT: should identify whether there is a criteria that match the paper very closely. These matches should not be based on general terms like "language modeling" or "advancements" and should specifically refer to a criterion. No need to mention the non-matching criteria.
RELEVANCE: should be a score from 1-10.
NOVELTY: should be a score from 1-10.

Scoring Criteria

The "Relevance" score measures how closely the paper aligns with the core topics of the prompt. The "Novelty" score assesses the originality and impact of the paper. They are two ORTHONORMAL axes and SHOULD NOT be confused with each other.

Relevance Scoring

Relevance 9-10 (Completely Relevant)
Focus: Fully aligned with core topics with no deviation, score the highest if contains relevant keywords in it.
Examples: Papers focused on foundational methods or theoretical research, whose titles contain topic keywords like "MoE".
Relevance 7-8 (Relevant)
Focus: Retain a solid link to the main research area, though may touch on peripheral elements.
Examples: Papers research on the fundamental part of MoE through a less critical aspect like its behavior in GNN.
Relevance 5-6 (Borderline)
Focus: Maintains a link to the core topic but also extends into at least one other domain/area beyond the primary focus.
Examples: Work referencing MoE centered on reinforcement learning.
Relevance 3-4 (Irrelevant)
Focus: Largely outside our interests with no association to our topics.
Examples: Application-focused papers like using MoE to solve a problem in the real world.
Relevance 1-2 (Ignore)
Focus: Purely unrelated to our topics. Completely a different domain.
Exception: If the paper hints at a cutting-edge, radically new direction that could eventually transform the primary domain, consider a score of 9–10 despite initial appearances. (Usually a very rare concept that belongs to the fundamental research)

Novelty Scoring

Novelty 9-10 (Breakthrough)
Definition: Groundbreaking methods/theory introducing new directions or solving major challenges.
Examples: Entirely new paradigm for foundational models; a novel theory transforming representation learning.
Novelty 7-8 (Improvements)
Definition: Substantial insights/enhancements, though not a full paradigm shift.
Examples: Modifications on existing methods yielding significantly better results.
Novelty 5-6 (Borderline)
Definition: Incremental contributions with possible long-term benefits, not immediately transformative.
Examples: Moderately novel extension to an existing architecture; refining current methods without fundamentally altering them.
Novelty 3-4 (Tangential)
Definition: Minor or domain-specific improvements with limited broader impact.
Examples: Slight modifications to known methods with strange motivation; purely engineering jobs like a new benchmark/dataset.
Novelty 1-2 (Low)
Definition: Minimal originality, applying standard approaches without real innovation.
Examples: Using an off-the-shelf model without adding new insights; purely application-driven studies like finetuning a pretrained model using existing methods.

Papers

[PAPER LIST HERE]

Relevant Topics

Use the following relevance criteria to focus on foundational research. Keep relevant papers and filter out irrelevant ones. Avoid purely application-driven work.

Representation Learning - Relevant: Insights into how deep networks encode information, feature/dictionary learning, sparse/contrastive methods, training dynamics in neural networks. - Irrelevant: Standard applications of known techniques lacking new theoretical or methodological contributions.
Model Architecture - Relevant: Mixture-of-Experts (MoE), Transformers, Conditional/Dynamic Networks, Autoencoders, analysis on existing architectures (like encoder-decoder), or other architectural innovations. - Irrelevant: Merely using existing architectures for a certain task without insights into the structure themselves.
Model Compression - Relevant: Sparsity, pruning, quantization, low-rank approaches, KV cache, or other algorithmic/theoretical efficiency breakthroughs. - Irrelevant: Straightforward applications of existing compression methods to new tasks.
Large Language Models (LLMs) - Relevant: Major breakthroughs in pretraining or architecture, theoretical insights into LLM behavior/interpretability. - Irrelevant: Domain-specific usage (e.g., translation, jail-breaking), finetuning or inference tricks (e.g., instruction tuning, chain-of-thoughts, data mixing), or empirical dataset/benchmark studies and text-level analysis (e.g. hallucination, reasoning, safety).
AI for Science - Relevant: Foundational research in molecular/protein modeling, new generative paradigms, or significant architecture-level innovations. - Irrelevant: Conventional, domain-specific applications without new theoretical perspectives.
Emerging Trends - Relevant: Cutting-edge theoretical work challenging established assumptions or introducing broad new paradigms. - Irrelevant: Incremental improvements or trend-following without novel insights.

Keywords:

Relevant: Mixture of Experts (MoE), Representation Learning, Compression/Efficiency, Sparse/Sparsity, Pruning, Quantization, Low-rank, Foundation Model, etc.
Irrelevant: Reinforcement Learning, Transfer Learning, Federated Learning, Online Learning, Diffusion Models, etc.
Application: Image Segmentation, Medical Imaging, 3D Vision, Video Understanding, Information Retrieval, Summarization, Recommendation Systems, Machine Translation, Speech Recognition, Signal Processing, Spatial/Temporal Modeling, Time Series, Knowledge Graph, etc.