Personalized Daily ArXiv Papers 2025-05-08
| [gpt-4o] | Prompt | Completion | Total |
|---|---|---|---|
| Token | 28086 | 3939 | 32025 |
| Cost | $0.07 | $0.04 | $0.11 |
Total arXiv papers: 417
Total scanned papers: 292
Total relevant papers: 12
Table of contents with paper titles:
-
Large Language Model Compression with Global Rank and Sparsity Optimization Authors: Changhai Zhou, Qian Qiao, Weizhong Zhang, Cheng Jin
-
Quiet Feature Learning in Algorithmic Tasks Authors: Prudhviraj Naidu, Zixian Wang, Leon Bergen, Ramamohan Paturi
-
Position: Foundation Models Need Digital Twin Representations Authors: Yiqing Shen, Hao Ding, Lalithkumar Seenivasan, Tianmin Shu, Mathias Unberath
-
Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth Authors: Changhai Zhou, Yuhua Zhou, Qian Qiao, Weizhong Zhang, Cheng Jin
-
LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection Authors: Xinyue Zeng, Haohui Wang, Junhong Lin, Jun Wu, Tyler Cody, Dawei Zhou
-
ABKD: Pursuing a Proper Allocation of the Probability Mass in Knowledge Distillation via $\alpha$-$\beta$-Divergence Authors: Guanghui Wang, Zhiyong Yang, Zitai Wang, Shi Wang, Qianqian Xu, Qingming Huang
-
APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design Authors: Yonghao Tan, Pingcheng Dong, Yongkun Wu, Yu Liu, Xuejiao Liu, Peng Luo, Shih-Yang Liu, Xijie Huang, Dong Zhang, Luhong Liang, Kwang-Ting Cheng
-
AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design Authors: Yanbiao Liang, Huihong Shi, Haikuo Shao, Zhongfeng Wang
-
Grouped Sequency-arranged Rotation: Optimizing Rotation Transformation for Quantization for Free Authors: Euntae Choi, Sumin Song, Woosang Lim, Sungjoo Yoo
-
Is the end of Insight in Sight ? Authors: Jean-Michel Tucny, Mihir Durve, Sauro Succi
-
Sparsity is All You Need: Rethinking Biological Pathway-Informed Approaches in Deep Learning Authors: Isabella Caranzano, Corrado Pancotti, Cesare Rollo, Flavio Sartori, Pietro Li`o, Piero Fariselli, Tiziana Sanavia
-
Information Filtering Networks: Theoretical Foundations, Generative Methodologies, and Real-World Applications Authors: Tomaso Aste
1. Large Language Model Compression with Global Rank and Sparsity Optimization
ArXiv ID: 2505.03801
Authors: Changhai Zhou, Qian Qiao, Weizhong Zhang, Cheng Jin
Abstract: Low-rank and sparse composite approximation is a natural idea to compress Large Language Models (LLMs). However, such an idea faces two primary challenges that adversely affect the performance of existing methods. The first challenge relates to the interaction and cooperation between low-rank and sparse matrices, while the second involves determining weight allocation across different layers, as redundancy varies considerably among them. To address these challenges, we propose a novel two-stage LLM compression method with the capability of global rank and sparsity optimization. It is noteworthy that the overall optimization space is vast, making comprehensive optimization computationally prohibitive. Therefore, to reduce the optimization space, our first stage utilizes robust principal component analysis to decompose the weight matrices of LLMs into low-rank and sparse components, which span the low dimensional and sparse spaces containing the resultant low-rank and sparse matrices, respectively. In the second stage, we propose a probabilistic global optimization technique to jointly identify the low-rank and sparse structures within the above two spaces. The appealing feature of our approach is its ability to automatically detect the redundancy across different layers and to manage the interaction between the sparse and low-rank components. Extensive experimental results indicate that our method significantly surpasses state-of-the-art techniques for sparsification and composite approximation.
Comment: Proposes a two-stage LLM compression method combining low-rank and sparse approximations with global optimization. This directly addresses foundational challenges in model compression and sparsity.
Relevance: 10 Novelty: 8
2. Quiet Feature Learning in Algorithmic Tasks
ArXiv ID: 2505.03997
Authors: Prudhviraj Naidu, Zixian Wang, Leon Bergen, Ramamohan Paturi
Abstract: We train Transformer-based language models on ten foundational algorithmic tasks and observe pronounced phase transitions in their loss curves that deviate from established power-law scaling trends. Over large ranges of compute, the validation loss barely improves, then abruptly decreases. Probing the models' internal representations reveals the learning of quiet features during the stagnant phase, followed by sudden acquisition of loud features that coincide with the sharp drop in loss. Our ablation experiments show that disrupting a single learned feature can dramatically degrade performance, providing evidence of their causal role in task performance. These findings challenge the prevailing assumption that next-token predictive loss reliably tracks incremental progress; instead, key internal features may be developing below the surface until they coalesce, triggering a rapid performance gain.
Comment: The paper provides insights into representation learning by analyzing how features are encoded and emerge in Transformer-based models during training. This aligns closely with the 'Representation Learning' criterion, particularly in understanding training dynamics and feature learning.
Relevance: 10 Novelty: 8
3. Position: Foundation Models Need Digital Twin Representations
ArXiv ID: 2505.03798
Authors: Yiqing Shen, Hao Ding, Lalithkumar Seenivasan, Tianmin Shu, Mathias Unberath
Abstract: Current foundation models (FMs) rely on token representations that directly fragment continuous real-world multimodal data into discrete tokens. They limit FMs to learning real-world knowledge and relationships purely through statistical correlation rather than leveraging explicit domain knowledge. Consequently, current FMs struggle with maintaining semantic coherence across modalities, capturing fine-grained spatial-temporal dynamics, and performing causal reasoning. These limitations cannot be overcome by simply scaling up model size or expanding datasets. This position paper argues that the machine learning community should consider digital twin (DT) representations, which are outcome-driven digital representations that serve as building blocks for creating virtual replicas of physical processes, as an alternative to the token representation for building FMs. Finally, we discuss how DT representations can address these challenges by providing physically grounded representations that explicitly encode domain knowledge and preserve the continuous nature of real-world processes.
Comment: The position paper argues for digital twin representations as an alternative to token-based representations in foundation models. This aligns with emerging trends and challenges established assumptions in representation learning.
Relevance: 9 Novelty: 8
4. Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth
ArXiv ID: 2505.03802
Authors: Changhai Zhou, Yuhua Zhou, Qian Qiao, Weizhong Zhang, Cheng Jin
Abstract: QLoRA effectively combines low-bit quantization and LoRA to achieve memory-friendly fine-tuning for large language models (LLM). Recently, methods based on SVD for continuous update iterations to initialize LoRA matrices to accommodate quantization errors have generally failed to consistently improve performance. Dynamic mixed precision is a natural idea for continuously improving the fine-tuning performance of quantized models, but previous methods often optimize low-rank subspaces or quantization components separately, without considering their synergy. To address this, we propose \textbf{QR-Adaptor}, a unified, gradient-free strategy that uses partial calibration data to jointly search the quantization components and the rank of low-rank spaces for each layer, thereby continuously improving model performance. QR-Adaptor does not minimize quantization error but treats precision and rank allocation as a discrete optimization problem guided by actual downstream performance and memory usage. Compared to state-of-the-art (SOTA) quantized LoRA fine-tuning methods, our approach achieves a 4.89\% accuracy improvement on GSM8K, and in some cases even outperforms the 16-bit fine-tuned model while maintaining the memory footprint of the 4-bit setting.
Comment: The paper introduces QR-Adaptor, a method for fine-tuning quantized models by jointly optimizing quantization and low-rank components. This aligns well with the model compression criterion, particularly in low-rank approaches and quantization.
Relevance: 9 Novelty: 8
5. LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection
ArXiv ID: 2505.03793
Authors: Xinyue Zeng, Haohui Wang, Junhong Lin, Jun Wu, Tyler Cody, Dawei Zhou
Abstract: The proliferation of open-sourced Large Language Models (LLMs) and diverse downstream tasks necessitates efficient model selection, given the impracticality of fine-tuning all candidates due to computational constraints. Despite the recent advances in LLM selection, a fundamental research question largely remains nascent: how can we model the dynamic behaviors of LLMs during fine-tuning, thereby enhancing our understanding of their generalization performance across diverse downstream tasks? In this work, we propose a novel theoretical framework that provides a proper lens to assess the generalization capabilities of LLMs, thereby enabling accurate and efficient LLM selection for downstream applications. In particular, we first derive a Hessian-based PAC-Bayes generalization bound that unveils fine-tuning dynamics of LLMs and then introduce LENSLLM, a Neural Tangent Kernel(NTK)-based Rectified Scaling Model that enables accurate performance predictions across diverse tasks while maintaining computational efficiency. Extensive empirical results on 3 large-scale benchmarks demonstrate that our model achieves up to 91.1% accuracy and reduces up to 88.5% computational cost in LLM selection, outperforming 5 state-of-the-art methods. We open-source our proposed LENSLLM model and corresponding results at the Github link: https://github.com/Susan571/LENSLLM.git.
Comment: Proposes a theoretical framework for understanding fine-tuning dynamics in LLMs, leveraging PAC-Bayes bounds and NTK-based models. This aligns with foundational research in LLM behavior and generalization.
Relevance: 9 Novelty: 8
6. ABKD: Pursuing a Proper Allocation of the Probability Mass in Knowledge Distillation via $\alpha$-$\beta$-Divergence
ArXiv ID: 2505.04560
Authors: Guanghui Wang, Zhiyong Yang, Zitai Wang, Shi Wang, Qianqian Xu, Qingming Huang
Abstract: Knowledge Distillation (KD) transfers knowledge from a large teacher model to a smaller student model by minimizing the divergence between their output distributions, typically using forward Kullback-Leibler divergence (FKLD) or reverse KLD (RKLD). It has become an effective training paradigm due to the broader supervision information provided by the teacher distribution compared to one-hot labels. We identify that the core challenge in KD lies in balancing two mode-concentration effects: the \textbf{\textit{Hardness-Concentration}} effect, which refers to focusing on modes with large errors, and the \textbf{\textit{Confidence-Concentration}} effect, which refers to focusing on modes with high student confidence. Through an analysis of how probabilities are reassigned during gradient updates, we observe that these two effects are entangled in FKLD and RKLD, but in extreme forms. Specifically, both are too weak in FKLD, causing the student to fail to concentrate on the target class. In contrast, both are too strong in RKLD, causing the student to overly emphasize the target class while ignoring the broader distributional information from the teacher. To address this imbalance, we propose ABKD, a generic framework with $\alpha$-$\beta$-divergence. Our theoretical results show that ABKD offers a smooth interpolation between FKLD and RKLD, achieving an effective trade-off between these effects. Extensive experiments on 17 language/vision datasets with 12 teacher-student settings confirm its efficacy. The code is available at https://github.com/ghwang-s/abkd.
Comment: The paper proposes a novel framework for knowledge distillation using alpha-beta divergence, addressing challenges in balancing concentration effects. This aligns with the 'Model Compression' criterion, particularly in advancing theoretical understanding of distillation.
Relevance: 9 Novelty: 8
7. APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design
ArXiv ID: 2505.03748
Authors: Yonghao Tan, Pingcheng Dong, Yongkun Wu, Yu Liu, Xuejiao Liu, Peng Luo, Shih-Yang Liu, Xijie Huang, Dong Zhang, Luhong Liang, Kwang-Ting Cheng
Abstract: DNN accelerators, significantly advanced by model compression and specialized dataflow techniques, have marked considerable progress. However, the frequent access of high-precision partial sums (PSUMs) leads to excessive memory demands in architectures utilizing input/weight stationary dataflows. Traditional compression strategies have typically overlooked PSUM quantization, which may account for 69% of power consumption. This study introduces a novel Additive Partial Sum Quantization (APSQ) method, seamlessly integrating PSUM accumulation into the quantization framework. A grouping strategy that combines APSQ with PSUM quantization enhanced by a reconfigurable architecture is further proposed. The APSQ performs nearly lossless on NLP and CV tasks across BERT, Segformer, and EfficientViT models while compressing PSUMs to INT8. This leads to a notable reduction in energy costs by 28-87%. Extended experiments on LLaMA2-7B demonstrate the potential of APSQ for large language models. Code is available at https://github.com/Yonghao-Tan/APSQ.
Comment: The paper introduces a novel quantization method (APSQ) for partial sums in DNN accelerators, which aligns with the model compression criterion, particularly in energy efficiency and quantization innovations.
Relevance: 9 Novelty: 8
8. AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design
ArXiv ID: 2505.03745
Authors: Yanbiao Liang, Huihong Shi, Haikuo Shao, Zhongfeng Wang
Abstract: Recently, large language models (LLMs) have achieved huge success in the natural language processing (NLP) field, driving a growing demand to extend their deployment from the cloud to edge devices. However, deploying LLMs on resource-constrained edge devices poses significant challenges, including (1) intensive computations and huge model sizes, (2) great memory and bandwidth demands introduced by the autoregressive generation process, and (3) limited scalability for handling long sequences. To address these challenges, we propose AccLLM, a comprehensive acceleration framework that enables efficient and fast long-context LLM inference through algorithm and hardware co-design. At the algorithmic level, we integrate (1) pruning, (2) {\Lambda}-shaped attention, and (3) an innovative W2A8KV4 (2-bit weights, 8-bit activations, and 4-bit KV cache) quantization scheme, thus effectively reducing memory and bandwidth requirements while facilitating LLMs' long-sequence generation. At the hardware level, we design a dedicated FPGA-based accelerator with a reconfigurable computing engine to effectively and flexibly accommodate diverse operations arising from our compression algorithm, thereby fully translating the algorithmic innovations into tangible hardware efficiency. We validate AccLLM on the Xilinx Alveo U280 FPGA, demonstrating a 4.07x energy efficiency and a 2.98x throughput compared to the state-of-the-art work FlightLLM.
Comment: The paper proposes AccLLM, a framework for accelerating LLM inference with algorithm-hardware co-design, including pruning and quantization innovations, which aligns with model compression and efficiency breakthroughs.
Relevance: 9 Novelty: 8
9. Grouped Sequency-arranged Rotation: Optimizing Rotation Transformation for Quantization for Free
ArXiv ID: 2505.03810
Authors: Euntae Choi, Sumin Song, Woosang Lim, Sungjoo Yoo
Abstract: Large Language Models (LLMs) face deployment challenges due to high computational costs, and while Post-Training Quantization (PTQ) offers a solution, existing rotation-based methods struggle at very low bit-widths like 2-bit. We introduce a novel, training-free approach to construct an improved rotation matrix, addressing the limitations of current methods. The key contributions include leveraging the Walsh-Hadamard transform with sequency ordering, which clusters similar frequency components to reduce quantization error compared to standard Hadamard matrices, significantly improving performance. Furthermore, we propose a Grouped Sequency-arranged Rotation (GSR) using block-diagonal matrices with smaller Walsh blocks, effectively isolating outlier impacts and achieving performance comparable to optimization-based methods without requiring any training. Our method demonstrates robust performance on reasoning tasks and Perplexity (PPL) score on WikiText-2. Our method also enhances results even when applied over existing learned rotation techniques.
Comment: This paper proposes a novel quantization method leveraging the Walsh-Hadamard transform and introduces Grouped Sequency-arranged Rotation (GSR), which directly addresses model compression and efficiency. The method is innovative and relevant to foundational research in model compression.
Relevance: 9 Novelty: 8
10. Is the end of Insight in Sight ?
ArXiv ID: 2505.04627
Authors: Jean-Michel Tucny, Mihir Durve, Sauro Succi
Abstract: It is shown that the weight matrices of a Physics-informed neural network (PINN)-based deep learning application to a rarefied gas dynamics problem described by the Boltzmann equation bear no evident link to the mathematical structure of the physical problem. Instead, the weights appear close to Gaussian distributed random matrices. Although significantly more work is needed to support a robust assessment in this direction, these results suggest that deep-learning and the numerical solution of the Boltzmann equation represent two equivalent, but largely distinct paths to the same physical knowledge. If so, Explainable AI might be an unrealistic target and possibly even an ill-posed one.
Comment: The paper questions the explainability of AI by analyzing weight matrices in PINNs, suggesting a potential challenge to the goal of Explainable AI. This aligns with emerging trends challenging established assumptions.
Relevance: 8 Novelty: 7
11. Sparsity is All You Need: Rethinking Biological Pathway-Informed Approaches in Deep Learning
ArXiv ID: 2505.04300
Authors: Isabella Caranzano, Corrado Pancotti, Cesare Rollo, Flavio Sartori, Pietro Li`o, Piero Fariselli, Tiziana Sanavia
Abstract: Biologically-informed neural networks typically leverage pathway annotations to enhance performance in biomedical applications. We hypothesized that the benefits of pathway integration does not arise from its biological relevance, but rather from the sparsity it introduces. We conducted a comprehensive analysis of all relevant pathway-based neural network models for predictive tasks, critically evaluating each study's contributions. From this review, we curated a subset of methods for which the source code was publicly available. The comparison of the biologically informed state-of-the-art deep learning models and their randomized counterparts showed that models based on randomized information performed equally well as biologically informed ones across different metrics and datasets. Notably, in 3 out of the 15 analyzed models, the randomized versions even outperformed their biologically informed counterparts. Moreover, pathway-informed models did not show any clear advantage in interpretability, as randomized models were still able to identify relevant disease biomarkers despite lacking explicit pathway information. Our findings suggest that pathway annotations may be too noisy or inadequately explored by current methods. Therefore, we propose a methodology that can be applied to different domains and can serve as a robust benchmark for systematically comparing novel pathway-informed models against their randomized counterparts. This approach enables researchers to rigorously determine whether observed performance improvements can be attributed to biological insights.
Comment: The paper critiques biologically-informed neural networks and highlights the role of sparsity, which aligns with the 'Representation Learning' and 'Model Compression' criteria, particularly in understanding sparsity's role in neural networks.
Relevance: 8 Novelty: 7
12. Information Filtering Networks: Theoretical Foundations, Generative Methodologies, and Real-World Applications
ArXiv ID: 2505.03812
Authors: Tomaso Aste
Abstract: Information Filtering Networks (IFNs) provide a powerful framework for modeling complex systems through globally sparse yet locally dense and interpretable structures that capture multivariate dependencies. This review offers a comprehensive account of IFNs, covering their theoretical foundations, construction methodologies, and diverse applications. Tracing their origins from early network-based models to advanced formulations such as the Triangulated Maximally Filtered Graph (TMFG) and the Maximally Filtered Clique Forest (MFCF), the paper highlights how IFNs address key challenges in high-dimensional data-driven modeling. IFNs and their construction methodologies are intrinsically higher-order networks that generate simplicial complexes-structures that are only now becoming popular in the broader literature. Applications span fields including finance, biology, psychology, and artificial intelligence, where IFNs improve interpretability, computational efficiency, and predictive performance. Special attention is given to their role in graphical modeling, where IFNs enable the estimation of sparse inverse covariance matrices with greater accuracy and scalability than traditional approaches like Graphical LASSO. Finally, the review discusses recent developments that integrate IFNs with machine learning and deep learning, underscoring their potential not only to bridge classical network theory with contemporary data-driven paradigms, but also to shape the architectures of deep learning models themselves.
Comment: The paper provides a comprehensive review of Information Filtering Networks (IFNs), discussing their theoretical foundations and potential integration with deep learning architectures, which aligns with emerging trends and foundational research.
Relevance: 8 Novelty: 7
Paper Selection Prompt
You are a helpful paper reading assistant whose job is to read daily posts from ArXiv and identify a few papers that your friend will enjoy reading. Your job is to carefully read the paper titles and abstracts below and find the ones that match the criteria below.
Instructions
Write the response in JSONL format with {ARXIVID, COMMENT, RELEVANCE, NOVELTY} on each line, one for each paper.
- ARXIVID: should be the ArXiv ID.
- COMMENT: should identify whether there is a criteria that match the paper very closely. These matches should not be based on general terms like "language modeling" or "advancements" and should specifically refer to a criterion. No need to mention the non-matching criteria.
- RELEVANCE: should be a score from 1-10.
- NOVELTY: should be a score from 1-10.
Scoring Criteria
The "Relevance" score measures how closely the paper aligns with the core topics of the prompt. The "Novelty" score assesses the originality and impact of the paper. They are two ORTHONORMAL axes and SHOULD NOT be confused with each other.
Relevance Scoring
- Relevance 9-10 (Completely Relevant)
- Focus: Fully aligned with core topics with no deviation, score the highest if contains relevant keywords in it.
-
Examples: Papers focused on foundational methods or theoretical research, whose titles contain topic keywords like "MoE".
-
Relevance 7-8 (Relevant)
- Focus: Retain a solid link to the main research area, though may touch on peripheral elements.
-
Examples: Papers research on the fundamental part of MoE through a less critical aspect like its behavior in GNN.
-
Relevance 5-6 (Borderline)
- Focus: Maintains a link to the core topic but also extends into at least one other domain/area beyond the primary focus.
-
Examples: Work referencing MoE centered on reinforcement learning.
-
Relevance 3-4 (Irrelevant)
- Focus: Largely outside our interests with no association to our topics.
-
Examples: Application-focused papers like using MoE to solve a problem in the real world.
-
Relevance 1-2 (Ignore)
- Focus: Purely unrelated to our topics. Completely a different domain.
- Exception: If the paper hints at a cutting-edge, radically new direction that could eventually transform the primary domain, consider a score of 9–10 despite initial appearances. (Usually a very rare concept that belongs to the fundamental research)
Novelty Scoring
- Novelty 9-10 (Breakthrough)
- Definition: Groundbreaking methods/theory introducing new directions or solving major challenges.
-
Examples: Entirely new paradigm for foundational models; a novel theory transforming representation learning.
-
Novelty 7-8 (Improvements)
- Definition: Substantial insights/enhancements, though not a full paradigm shift.
-
Examples: Modifications on existing methods yielding significantly better results.
-
Novelty 5-6 (Borderline)
- Definition: Incremental contributions with possible long-term benefits, not immediately transformative.
-
Examples: Moderately novel extension to an existing architecture; refining current methods without fundamentally altering them.
-
Novelty 3-4 (Tangential)
- Definition: Minor or domain-specific improvements with limited broader impact.
-
Examples: Slight modifications to known methods with strange motivation; purely engineering jobs like a new benchmark/dataset.
-
Novelty 1-2 (Low)
- Definition: Minimal originality, applying standard approaches without real innovation.
- Examples: Using an off-the-shelf model without adding new insights; purely application-driven studies like finetuning a pretrained model using existing methods.
Papers
[PAPER LIST HERE]
Relevant Topics
Use the following relevance criteria to focus on foundational research. Keep relevant papers and filter out irrelevant ones. Avoid purely application-driven work.
-
Representation Learning - Relevant: Insights into how deep networks encode information, feature/dictionary learning, sparse/contrastive methods, training dynamics in neural networks. - Irrelevant: Standard applications of known techniques lacking new theoretical or methodological contributions.
-
Model Architecture - Relevant: Mixture-of-Experts (MoE), Transformers, Conditional/Dynamic Networks, Autoencoders, analysis on existing architectures (like encoder-decoder), or other architectural innovations. - Irrelevant: Merely using existing architectures for a certain task without insights into the structure themselves.
-
Model Compression - Relevant: Sparsity, pruning, quantization, low-rank approaches, KV cache, or other algorithmic/theoretical efficiency breakthroughs. - Irrelevant: Straightforward applications of existing compression methods to new tasks.
-
Large Language Models (LLMs) - Relevant: Major breakthroughs in pretraining or architecture, theoretical insights into LLM behavior/interpretability. - Irrelevant: Domain-specific usage (e.g., translation, jail-breaking), finetuning or inference tricks (e.g., instruction tuning, chain-of-thoughts, data mixing), or empirical dataset/benchmark studies and text-level analysis (e.g. hallucination, reasoning, safety).
-
AI for Science - Relevant: Foundational research in molecular/protein modeling, new generative paradigms, or significant architecture-level innovations. - Irrelevant: Conventional, domain-specific applications without new theoretical perspectives.
-
Emerging Trends - Relevant: Cutting-edge theoretical work challenging established assumptions or introducing broad new paradigms. - Irrelevant: Incremental improvements or trend-following without novel insights.
Keywords:
- Relevant: Mixture of Experts (MoE), Representation Learning, Compression/Efficiency, Sparse/Sparsity, Pruning, Quantization, Low-rank, Foundation Model, etc.
- Irrelevant: Reinforcement Learning, Transfer Learning, Federated Learning, Online Learning, Diffusion Models, etc.
- Application: Image Segmentation, Medical Imaging, 3D Vision, Video Understanding, Information Retrieval, Summarization, Recommendation Systems, Machine Translation, Speech Recognition, Signal Processing, Spatial/Temporal Modeling, Time Series, Knowledge Graph, etc.