Personalized Daily Arxiv Papers 3/27/2025
| [gpt-4o] | Prompt | Completion | Total |
|---|---|---|---|
| Token | 26335 | 3005 | 29340 |
| Cost | $0.07 | $0.03 | $0.1 |
Total arXiv papers: 337
Total scanned papers: 192
Total relevant papers: 13
Table of contents with paper titles:
-
Extendable Long-Horizon Planning via Hierarchical Multiscale Diffusion Authors: Chang Chen, Hany Hamed, Doojin Baek, Taegu Kang, Yoshua Bengio, Sungjin Ahn
-
Assessing SAM for Tree Crown Instance Segmentation from Drone Imagery Authors: M\'elisande Teng, Arthur Ouaknine, Etienne Lalibert\'e, Yoshua Bengio, David Rolnick, Hugo Larochelle
-
A scalable gene network model of regulatory dynamics in single cells Authors: Paul Bertin, Joseph D. Viviano, Alejandro Tejada-Lapuerta, Weixu Wang, Stefan Bauer, Fabian J. Theis, Yoshua Bengio
-
LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation Authors: Han Chen, Zicong Jiang, Zining Zhang, Bingsheng He, Pingyi Luo, Mian Lu, Yuqiang Chen
-
Fundamental Limits of Perfect Concept Erasure Authors: Somnath Basu Roy Chowdhury, Avinava Dubey, Ahmad Beirami, Rahul Kidambi, Nicholas Monath, Amr Ahmed, Snigdha Chaturvedi
-
A Theoretical Framework for Prompt Engineering: Approximating Smooth Functions with Transformer Prompts Authors: Ryumei Nakada, Wenlong Ji, Tianxi Cai, James Zou, Linjun Zhang
-
TeleLoRA: Teleporting Model-Specific Alignment Across LLMs Authors: Xiao Lin, Manoj Acharya, Anirban Roy, Susmit Jha
-
ASGO: Adaptive Structured Gradient Optimization Authors: Kang An, Yuxing Liu, Rui Pan, Shiqian Ma, Donald Goldfarb, Tong Zhang
-
Enhancing Multi-modal Models with Heterogeneous MoE Adapters for Fine-tuning Authors: Sashuai Zhou, Hai Huang, Yan Xia
-
Including local feature interactions in deep non-negative matrix factorization networks improves performance Authors: Mahbod Nouri, David Rotermund, Alberto Garcia-Ortiz, Klaus R. Pawelzik
-
Network Inversion for Generating Confidently Classified Counterfeits Authors: Pirzada Suhail, Amit Sethi
-
TraNCE: Transformative Non-linear Concept Explainer for CNNs Authors: Ugochukwu Ejike Akpudo, Yongsheng Gao, Jun Zhou, Andrew Lewis
-
Faster Parameter-Efficient Tuning with Token Redundancy Reduction Authors: Kwonyoung Kim, Jungin Park, Jin Kim, Hyeongjun Kwon, Kwanghoon Sohn
1. Extendable Long-Horizon Planning via Hierarchical Multiscale Diffusion
ArXiv ID: 2503.20102
Authors: Chang Chen, Hany Hamed, Doojin Baek, Taegu Kang, Yoshua Bengio, Sungjin Ahn
Abstract: This paper tackles a novel problem, extendable long-horizon planning-enabling agents to plan trajectories longer than those in training data without compounding errors. To tackle this, we propose the Hierarchical Multiscale Diffuser (HM-Diffuser) and Progressive Trajectory Extension (PTE), an augmentation method that iteratively generates longer trajectories by stitching shorter ones. HM-Diffuser trains on these extended trajectories using a hierarchical structure, efficiently handling tasks across multiple temporal scales. Additionally, we introduce Adaptive Plan Pondering and the Recursive HM-Diffuser, which consolidate hierarchical layers into a single model to process temporal scales recursively. Experimental results demonstrate the effectiveness of our approach, advancing diffusion-based planners for scalable long-horizon planning.
Comment: Author match
2. Assessing SAM for Tree Crown Instance Segmentation from Drone Imagery
ArXiv ID: 2503.20199
Authors: M\'elisande Teng, Arthur Ouaknine, Etienne Lalibert\'e, Yoshua Bengio, David Rolnick, Hugo Larochelle
Abstract: The potential of tree planting as a natural climate solution is often undermined by inadequate monitoring of tree planting projects. Current monitoring methods involve measuring trees by hand for each species, requiring extensive cost, time, and labour. Advances in drone remote sensing and computer vision offer great potential for mapping and characterizing trees from aerial imagery, and large pre-trained vision models, such as the Segment Anything Model (SAM), may be a particularly compelling choice given limited labeled data. In this work, we compare SAM methods for the task of automatic tree crown instance segmentation in high resolution drone imagery of young tree plantations. We explore the potential of SAM for this task, and find that methods using SAM out-of-the-box do not outperform a custom Mask R-CNN, even with well-designed prompts, but that there is potential for methods which tune SAM further. We also show that predictions can be improved by adding Digital Surface Model (DSM) information as an input.
Comment: Author match
3. A scalable gene network model of regulatory dynamics in single cells
ArXiv ID: 2503.20027
Authors: Paul Bertin, Joseph D. Viviano, Alejandro Tejada-Lapuerta, Weixu Wang, Stefan Bauer, Fabian J. Theis, Yoshua Bengio
Abstract: Single-cell data provide high-dimensional measurements of the transcriptional states of cells, but extracting insights into the regulatory functions of genes, particularly identifying transcriptional mechanisms affected by biological perturbations, remains a challenge. Many perturbations induce compensatory cellular responses, making it difficult to distinguish direct from indirect effects on gene regulation. Modeling how gene regulatory functions shape the temporal dynamics of these responses is key to improving our understanding of biological perturbations. Dynamical models based on differential equations offer a principled way to capture transcriptional dynamics, but their application to single-cell data has been hindered by computational constraints, stochasticity, sparsity, and noise. Existing methods either rely on low-dimensional representations or make strong simplifying assumptions, limiting their ability to model transcriptional dynamics at scale. We introduce a Functional and Learnable model of Cell dynamicS, FLeCS, that incorporates gene network structure into coupled differential equations to model gene regulatory functions. Given (pseudo)time-series single-cell data, FLeCS accurately infers cell dynamics at scale, provides improved functional insights into transcriptional mechanisms perturbed by gene knockouts, both in myeloid differentiation and K562 Perturb-seq experiments, and simulates single-cell trajectories of A549 cells following small-molecule perturbations.
Comment: Author match
4. LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation
ArXiv ID: 2503.19950
Authors: Han Chen, Zicong Jiang, Zining Zhang, Bingsheng He, Pingyi Luo, Mian Lu, Yuqiang Chen
Abstract: We introduce LogQuant, a groundbreaking 2-bit quantization technique for KV Cache in large language model (LLM) inference, delivering substantial memory savings while preserving superior performance. Previous methods either assume that later tokens are more important or attempt to predict important tokens based on earlier attention patterns. Both approaches, however, can result in performance bottlenecks or frequent mispredictions. LogQuant takes a different approach. By applying a log-based filtering mechanism, it selectively compresses the KV Cache across the entire context, achieving better performance with the same or even reduced memory footprint compared to existing methods. In benchmark tests, it enhances throughput by 25% and boosts batch size by 60% without increasing memory consumption. For challenging tasks such as Math and Code Completion, LogQuant improves accuracy by 40% to 200% at the same compression ratio, outperforming comparable techniques.LogQuant integrates effortlessly with popular inference frameworks like Python's transformers library. Implementation can be available in https://github.com/Concyclics/LogQuantKV.
Comment: LogQuant introduces a novel 2-bit quantization technique for KV Cache in LLM inference, addressing memory efficiency and accuracy preservation. This aligns closely with model compression and efficiency breakthroughs, particularly in LLMs.
Relevance: 10 Novelty: 8
5. Fundamental Limits of Perfect Concept Erasure
ArXiv ID: 2503.20098
Authors: Somnath Basu Roy Chowdhury, Avinava Dubey, Ahmad Beirami, Rahul Kidambi, Nicholas Monath, Amr Ahmed, Snigdha Chaturvedi
Abstract: Concept erasure is the task of erasing information about a concept (e.g., gender or race) from a representation set while retaining the maximum possible utility -- information from original representations. Concept erasure is useful in several applications, such as removing sensitive concepts to achieve fairness and interpreting the impact of specific concepts on a model's performance. Previous concept erasure techniques have prioritized robustly erasing concepts over retaining the utility of the resultant representations. However, there seems to be an inherent tradeoff between erasure and retaining utility, making it unclear how to achieve perfect concept erasure while maintaining high utility. In this paper, we offer a fresh perspective toward solving this problem by quantifying the fundamental limits of concept erasure through an information-theoretic lens. Using these results, we investigate constraints on the data distribution and the erasure functions required to achieve the limits of perfect concept erasure. Empirically, we show that the derived erasure functions achieve the optimal theoretical bounds. Additionally, we show that our approach outperforms existing methods on a range of synthetic and real-world datasets using GPT-4 representations.
Comment: This paper provides an information-theoretic perspective on concept erasure, which is highly relevant to representation learning. The focus on fundamental limits and theoretical bounds adds significant novelty.
Relevance: 9 Novelty: 8
6. A Theoretical Framework for Prompt Engineering: Approximating Smooth Functions with Transformer Prompts
ArXiv ID: 2503.20561
Authors: Ryumei Nakada, Wenlong Ji, Tianxi Cai, James Zou, Linjun Zhang
Abstract: Prompt engineering has emerged as a powerful technique for guiding large language models (LLMs) toward desired responses, significantly enhancing their performance across diverse tasks. Beyond their role as static predictors, LLMs increasingly function as intelligent agents, capable of reasoning, decision-making, and adapting dynamically to complex environments. However, the theoretical underpinnings of prompt engineering remain largely unexplored. In this paper, we introduce a formal framework demonstrating that transformer models, when provided with carefully designed prompts, can act as a configurable computational system by emulating a ``virtual'' neural network during inference. Specifically, input prompts effectively translate into the corresponding network configuration, enabling LLMs to adjust their internal computations dynamically. Building on this construction, we establish an approximation theory for $\beta$-times differentiable functions, proving that transformers can approximate such functions with arbitrary precision when guided by appropriately structured prompts. Moreover, our framework provides theoretical justification for several empirically successful prompt engineering techniques, including the use of longer, structured prompts, filtering irrelevant information, enhancing prompt token diversity, and leveraging multi-agent interactions. By framing LLMs as adaptable agents rather than static models, our findings underscore their potential for autonomous reasoning and problem-solving, paving the way for more robust and theoretically grounded advancements in prompt engineering and AI agent design.
Comment: This paper provides a theoretical framework for prompt engineering, demonstrating how transformer prompts can approximate smooth functions and act as configurable computational systems. It aligns closely with foundational research in LLMs and offers theoretical insights into their behavior and adaptability.
Relevance: 9 Novelty: 8
7. TeleLoRA: Teleporting Model-Specific Alignment Across LLMs
ArXiv ID: 2503.20228
Authors: Xiao Lin, Manoj Acharya, Anirban Roy, Susmit Jha
Abstract: Mitigating Trojans in Large Language Models (LLMs) is one of many tasks where alignment data is LLM specific, as different LLMs have different Trojan triggers and trigger behaviors to be removed. In this paper, we introduce TeleLoRA (Teleporting Low-Rank Adaptation), a novel framework that synergizes model-specific alignment data across multiple LLMs to enable zero-shot Trojan mitigation on unseen LLMs without alignment data. TeleLoRA learns a unified generator of LoRA adapter weights by leveraging local activation information across multiple LLMs. This generator is designed to be permutation symmetric to generalize across models with different architectures and sizes. We optimize the model design for memory efficiency, making it feasible to learn with large-scale LLMs with minimal computational resources. Experiments on LLM Trojan mitigation benchmarks demonstrate that TeleLoRA effectively reduces attack success rates while preserving the benign performance of the models.
Comment: TeleLoRA introduces a novel framework for low-rank adaptation across LLMs, which aligns with model compression and efficiency topics. The permutation-symmetric generator and memory-efficient design are innovative contributions.
Relevance: 9 Novelty: 8
8. ASGO: Adaptive Structured Gradient Optimization
ArXiv ID: 2503.20762
Authors: Kang An, Yuxing Liu, Rui Pan, Shiqian Ma, Donald Goldfarb, Tong Zhang
Abstract: Training deep neural networks (DNNs) is a structured optimization problem, because the parameters are naturally represented by matrices and tensors rather than simple vectors. Under this structural representation, it has been widely observed that gradients are low-rank and Hessians are approximately block-wise diagonal. These structured properties are crucial for designing efficient optimization algorithms but may not be utilized by current popular optimizers like Adam. In this paper, we present a novel optimization algorithm ASGO that capitalizes on these properties by employing a preconditioner that is adaptively updated using structured gradients. By fine-grained theoretical analysis, ASGO is proven to achieve superior convergence rates compared to existing structured gradient methods. Based on the convergence theory, we further demonstrate that ASGO can benefit from the low-rank and block-wise diagonal properties. We also discuss practical modifications of ASGO and empirically verify the effectiveness of the algorithm on language model tasks.
Comment: The paper introduces ASGO, a novel optimization algorithm leveraging structured gradients and low-rank properties, which aligns with model compression and efficiency breakthroughs. The theoretical analysis and practical modifications add to its novelty.
Relevance: 9 Novelty: 8
9. Enhancing Multi-modal Models with Heterogeneous MoE Adapters for Fine-tuning
ArXiv ID: 2503.20633
Authors: Sashuai Zhou, Hai Huang, Yan Xia
Abstract: Multi-modal models excel in cross-modal tasks but are computationally expensive due to their billions of parameters. Parameter-efficient fine-tuning (PEFT) offers a solution by adding small trainable components while freezing pre-trained parameters. However, existing methods primarily focus on uni-modal processing, overlooking the critical modal fusion needed for multi-modal tasks. To fill this gap, we propose heterogeneous mixture of experts adapters that extend the traditional PEFT framework to support multi-modal expert combinations and improve information interaction. Additionally, our approach modifies the affine linear expert design to enable efficient modal fusion in a low-rank space, achieving competitive performance with only 5-8\% of the parameters fine-tuned. Experiments across eight downstream tasks, including visual-audio and text-visual, demonstrate the superior performance of the approach.
Comment: The paper proposes heterogeneous MoE adapters for multi-modal fine-tuning, which aligns with the topic of Mixture-of-Experts and architectural innovations. The focus on low-rank space for efficient modal fusion adds to its relevance.
Relevance: 9 Novelty: 7
10. Including local feature interactions in deep non-negative matrix factorization networks improves performance
ArXiv ID: 2503.20398
Authors: Mahbod Nouri, David Rotermund, Alberto Garcia-Ortiz, Klaus R. Pawelzik
Abstract: The brain uses positive signals as a means of signaling. Forward interactions in the early visual cortex are also positive, realized by excitatory synapses. Only local interactions also include inhibition. Non-negative matrix factorization (NMF) captures the biological constraint of positive long-range interactions and can be implemented with stochastic spikes. While NMF can serve as an abstract formalization of early neural processing in the visual system, the performance of deep convolutional networks with NMF modules does not match that of CNNs of similar size. However, when the local NMF modules are each followed by a module that mixes the NMF's positive activities, the performances on the benchmark data exceed that of vanilla deep convolutional networks of similar size. This setting can be considered a biologically more plausible emulation of the processing in cortical (hyper-)columns with the potential to improve the performance of deep networks.
Comment: The paper explores the integration of local feature interactions in deep non-negative matrix factorization (NMF) networks, which aligns with representation learning and architectural insights. The focus on improving performance through biologically plausible mechanisms adds a novel perspective.
Relevance: 8 Novelty: 7
11. Network Inversion for Generating Confidently Classified Counterfeits
ArXiv ID: 2503.20187
Authors: Pirzada Suhail, Amit Sethi
Abstract: In machine learning, especially with vision classifiers, generating inputs that are confidently classified by the model is essential for understanding its decision boundaries and behavior. However, creating such samples that are confidently classified yet distinct from the training data distribution is a challenge. Traditional methods often modify existing inputs, but they don't always ensure confident classification. In this work, we extend network inversion techniques to generate Confidently Classified Counterfeits-synthetic samples that are confidently classified by the model despite being significantly different from the training data. We achieve this by modifying the generator's conditioning mechanism from soft vector conditioning to one-hot vector conditioning and applying Kullback-Leibler divergence (KLD) between the one-hot vectors and the classifier's output distribution. This encourages the generator to produce samples that are both plausible and confidently classified. Generating Confidently Classified Counterfeits is crucial for ensuring the safety and reliability of machine learning systems, particularly in safety-critical applications where models must exhibit confidence only on data within the training distribution. By generating such counterfeits, we challenge the assumption that high-confidence predictions are always indicative of in-distribution data, providing deeper insights into the model's limitations and decision-making process.
Comment: The paper explores network inversion techniques to generate confidently classified counterfeits, which provides insights into model behavior and decision boundaries. This aligns with representation learning, particularly in understanding how models encode information and their limitations.
Relevance: 8 Novelty: 7
12. TraNCE: Transformative Non-linear Concept Explainer for CNNs
ArXiv ID: 2503.20230
Authors: Ugochukwu Ejike Akpudo, Yongsheng Gao, Jun Zhou, Andrew Lewis
Abstract: Convolutional neural networks (CNNs) have succeeded remarkably in various computer vision tasks. However, they are not intrinsically explainable. While the feature-level understanding of CNNs reveals where the models looked, concept-based explainability methods provide insights into what the models saw. However, their assumption of linear reconstructability of image activations fails to capture the intricate relationships within these activations. Their Fidelity-only approach to evaluating global explanations also presents a new concern. For the first time, we address these limitations with the novel Transformative Nonlinear Concept Explainer (TraNCE) for CNNs. Unlike linear reconstruction assumptions made by existing methods, TraNCE captures the intricate relationships within the activations. This study presents three original contributions to the CNN explainability literature: (i) An automatic concept discovery mechanism based on variational autoencoders (VAEs). This transformative concept discovery process enhances the identification of meaningful concepts from image activations. (ii) A visualization module that leverages the Bessel function to create a smooth transition between prototypical image pixels, revealing not only what the CNN saw but also what the CNN avoided, thereby mitigating the challenges of concept duplication as documented in previous works. (iii) A new metric, the Faith score, integrates both Coherence and Fidelity for a comprehensive evaluation of explainer faithfulness and consistency.
Comment: This paper introduces a novel concept explainer for CNNs using variational autoencoders and a new evaluation metric. It aligns with representation learning and explainability, particularly in understanding how CNNs encode information, making it relevant to foundational research.
Relevance: 8 Novelty: 7
13. Faster Parameter-Efficient Tuning with Token Redundancy Reduction
ArXiv ID: 2503.20282
Authors: Kwonyoung Kim, Jungin Park, Jin Kim, Hyeongjun Kwon, Kwanghoon Sohn
Abstract: Parameter-efficient tuning (PET) aims to transfer pre-trained foundation models to downstream tasks by learning a small number of parameters. Compared to traditional fine-tuning, which updates the entire model, PET significantly reduces storage and transfer costs for each task regardless of exponentially increasing pre-trained model capacity. However, most PET methods inherit the inference latency of their large backbone models and often introduce additional computational overhead due to additional modules (e.g. adapters), limiting their practicality for compute-intensive applications. In this paper, we propose Faster Parameter-Efficient Tuning (FPET), a novel approach that enhances inference speed and training efficiency while maintaining high storage efficiency. Specifically, we introduce a plug-and-play token redundancy reduction module delicately designed for PET. This module refines tokens from the self-attention layer using an adapter to learn the accurate similarity between tokens and cuts off the tokens through a fully-differentiable token merging strategy, which uses a straight-through estimator for optimal token reduction. Experimental results prove that our FPET achieves faster inference and higher memory efficiency than the pre-trained backbone while keeping competitive performance on par with state-of-the-art PET methods.
Comment: This paper introduces a token redundancy reduction module for parameter-efficient tuning, which aligns with model compression and efficiency improvements. The focus on reducing inference latency and memory usage is a notable contribution.
Relevance: 8 Novelty: 7
Paper Selection Prompt
You are a helpful paper reading assistant whose job is to read daily posts from ArXiv and identify a few papers that your friend will enjoy reading. Your job is to carefully read the paper titles and abstracts below and find the ones that match the criteria below.
Instructions
Write the response in JSONL format with {ARXIVID, COMMENT, RELEVANCE, NOVELTY} on each line, one for each paper.
- ARXIVID: should be the ArXiv ID.
- COMMENT: should identify whether there is a criteria that match the paper very closely. These matches should not be based on general terms like "language modeling" or "advancements" and should specifically refer to a criterion. No need to mention the non-matching criteria.
- RELEVANCE: should be a score from 1-10.
- NOVELTY: should be a score from 1-10.
Scoring Criteria
The "Relevance" score measures how closely the paper aligns with the core topics of the prompt. The "Novelty" score assesses the originality and impact of the paper. They are two ORTHONORMAL axes and SHOULD NOT be confused with each other.
Relevance Scoring
- Relevance 9-10 (Completely Relevant)
- Focus: Fully aligned with core topics with no deviation, score the highest if contains relevant keywords in it.
-
Examples: Papers focused on foundational methods or theoretical research, whose titles contain topic keywords like "MoE".
-
Relevance 7-8 (Relevant)
- Focus: Retain a solid link to the main research area, though may touch on peripheral elements.
-
Examples: Papers research on the fundamental part of MoE through a less critical aspect like its behavior in GNN.
-
Relevance 5-6 (Borderline)
- Focus: Maintains a link to the core topic but also extends into at least one other domain/area beyond the primary focus.
-
Examples: Work referencing MoE centered on reinforcement learning.
-
Relevance 3-4 (Irrelevant)
- Focus: Largely outside our interests with no association to our topics.
-
Examples: Application-focused papers like using MoE to solve a problem in the real world.
-
Relevance 1-2 (Ignore)
- Focus: Purely unrelated to our topics. Completely a different domain.
- Exception: If the paper hints at a cutting-edge, radically new direction that could eventually transform the primary domain, consider a score of 9–10 despite initial appearances. (Usually a very rare concept that belongs to the fundamental research)
Novelty Scoring
- Novelty 9-10 (Breakthrough)
- Definition: Groundbreaking methods/theory introducing new directions or solving major challenges.
-
Examples: Entirely new paradigm for foundational models; a novel theory transforming representation learning.
-
Novelty 7-8 (Improvements)
- Definition: Substantial insights/enhancements, though not a full paradigm shift.
-
Examples: Modifications on existing methods yielding significantly better results.
-
Novelty 5-6 (Borderline)
- Definition: Incremental contributions with possible long-term benefits, not immediately transformative.
-
Examples: Moderately novel extension to an existing architecture; refining current methods without fundamentally altering them.
-
Novelty 3-4 (Tangential)
- Definition: Minor or domain-specific improvements with limited broader impact.
-
Examples: Slight modifications to known methods with strange motivation; purely engineering jobs like a new benchmark/dataset.
-
Novelty 1-2 (Low)
- Definition: Minimal originality, applying standard approaches without real innovation.
- Examples: Using an off-the-shelf model without adding new insights; purely application-driven studies like finetuning a pretrained model using existing methods.
Papers
[PAPER LIST HERE]
Relevant Topics
Use the following relevance criteria to focus on foundational research. Keep relevant papers and filter out irrelevant ones. Avoid purely application-driven work.
-
Representation Learning - Relevant: Insights into how deep networks encode information, feature/dictionary learning, sparse/contrastive methods, training dynamics in neural networks. - Irrelevant: Standard applications of known techniques lacking new theoretical or methodological contributions.
-
Model Architecture - Relevant: Mixture-of-Experts (MoE), Transformers, Conditional/Dynamic Networks, Autoencoders, analysis on existing architectures (like encoder-decoder), or other architectural innovations. - Irrelevant: Merely using existing architectures for a certain task without insights into the structure themselves.
-
Model Compression - Relevant: Sparsity, pruning, quantization, low-rank approaches, KV cache, or other algorithmic/theoretical efficiency breakthroughs. - Irrelevant: Straightforward applications of existing compression methods to new tasks.
-
Large Language Models (LLMs) - Relevant: Major breakthroughs in pretraining or architecture, theoretical insights into LLM behavior/interpretability. - Irrelevant: Domain-specific usage (e.g., translation, jail-breaking), finetuning or inference tricks (e.g., instruction tuning, chain-of-thoughts, data mixing), or empirical dataset/benchmark studies and text-level analysis (e.g. hallucination, reasoning, safety).
-
AI for Science - Relevant: Foundational research in molecular/protein modeling, new generative paradigms, or significant architecture-level innovations. - Irrelevant: Conventional, domain-specific applications without new theoretical perspectives.
-
Emerging Trends - Relevant: Cutting-edge theoretical work challenging established assumptions or introducing broad new paradigms. - Irrelevant: Incremental improvements or trend-following without novel insights.
Keywords:
- Relevant: Mixture of Experts (MoE), Representation Learning, Compression/Efficiency, Sparse/Sparsity, Pruning, Quantization, Low-rank, Foundation Model, etc.
- Irrelevant: Reinforcement Learning, Transfer Learning, Federated Learning, Online Learning, Diffusion Models, etc.
- Application: Image Segmentation, Medical Imaging, 3D Vision, Video Understanding, Information Retrieval, Summarization, Recommendation Systems, Machine Translation, Speech Recognition, Signal Processing, Spatial/Temporal Modeling, Time Series, Knowledge Graph, etc.