Personalized Daily ArXiv Papers 2025-05-09
| [gpt-4o] | Prompt | Completion | Total |
|---|---|---|---|
| Token | 31693 | 4433 | 36126 |
| Cost | $0.08 | $0.04 | $0.12 |
Total arXiv papers: 409
Total scanned papers: 235
Total relevant papers: 17
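The cost row in the table above follows from the token counts once a per-token price is fixed. The sketch below reproduces the rounded figures. The two prices are assumptions (this digest does not state them), chosen because they are consistent with the reported totals.

```python
# Token counts taken from the table above.
PROMPT_TOKENS = 31693
COMPLETION_TOKENS = 4433

# ASSUMPTION: USD prices per one million tokens. The digest does not state
# them; these values are merely consistent with the reported costs.
PROMPT_PRICE_PER_M = 2.50
COMPLETION_PRICE_PER_M = 10.00


def token_cost(tokens: int, price_per_million: float) -> float:
    """Return the cost in USD of `tokens` at the given per-million price."""
    return tokens * price_per_million / 1_000_000


prompt_cost = token_cost(PROMPT_TOKENS, PROMPT_PRICE_PER_M)
completion_cost = token_cost(COMPLETION_TOKENS, COMPLETION_PRICE_PER_M)
total_cost = prompt_cost + completion_cost
total_tokens = PROMPT_TOKENS + COMPLETION_TOKENS

print(f"tokens: {total_tokens}")
print(f"cost: ${prompt_cost:.2f} + ${completion_cost:.2f} = ${total_cost:.2f}")
```

With different prices the same arithmetic applies; only the two constants change.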
Table of contents with paper titles:
- Confabulation dynamics in a reservoir computer: Filling in the gaps with untrained attractors
  Authors: Jack O'Hagan, Andrew Keane, Andrew Flynn
- Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry
  Authors: Mohammed Adnan, Rohan Jain, Ekansh Sharma, Rahul Krishnan, Yani Ioannou
- Understanding In-context Learning of Addition via Activation Subspaces
  Authors: Xinyan Hu, Kayo Yin, Michael I. Jordan, Jacob Steinhardt, Lijie Chen
- DPQ-HD: Post-Training Compression for Ultra-Low Power Hyperdimensional Computing
  Authors: Nilesh Prasad Pandey, Shriniwas Kulkarni, David Wang, Onat Gungor, Flavio Ponzina, Tajana Rosing
- Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning
  Authors: Lianbo Ma, Jianlun Ma, Yuee Zhou, Guoyang Xie, Qiang He, Zhichao Lu
- Stochastic Variational Propagation: Local, Scalable and Efficient Alternative to Backpropagation
  Authors: Bojian Yin, Federico Corradi
- When Bad Data Leads to Good Models
  Authors: Kenneth Li, Yida Chen, Fernanda Viégas, Martin Wattenberg
- Rethinking Invariance in In-context Learning
  Authors: Lizhe Fang, Yifei Wang, Khashayar Gatmiry, Lei Fang, Yisen Wang
- Chain-of-Thought Tokens are Computer Program Variables
  Authors: Fangwei Zhu, Peiyi Wang, Zhifang Sui
- Advancing Neural Network Verification through Hierarchical Safety Abstract Interpretation
  Authors: Luca Marzari, Isabella Mastroeni, Alessandro Farinelli
- Clustering with Communication: A Variational Framework for Single Cell Representation Learning
  Authors: Cong Qi, Yeqing Chen, Jie Zhang, Wei Zhi
- Beyond Low-rank Decomposition: A Shortcut Approach for Efficient On-Device Learning
  Authors: Le-Trung Nguyen, Ael Quelennec, Van-Tam Nguyen, Enzo Tartaglione
- SetONet: A Deep Set-based Operator Network for Solving PDEs with permutation invariant variable input sampling
  Authors: Stepan Tretiakov, Xingjian Li, Krishna Kumar
- OpenworldAUC: Towards Unified Evaluation and Optimization for Open-world Prompt Tuning
  Authors: Cong Hua, Qianqian Xu, Zhiyong Yang, Zitai Wang, Shilong Bao, Qingming Huang
- ComPO: Preference Alignment via Comparison Oracles
  Authors: Peter Chen, Xi Chen, Wotao Yin, Tianyi Lin
- Learning dynamically inspired invariant subspaces for Koopman and transfer operator approximation
  Authors: Gary Froyland, Kevin Kühl
- Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models
  Authors: Aishwarya Venkataramanan, Paul Bodesheim, Joachim Denzler
1. Confabulation dynamics in a reservoir computer: Filling in the gaps with untrained attractors
ArXiv ID: 2505.04792
Authors: Jack O'Hagan, Andrew Keane, Andrew Flynn
Abstract: Artificial Intelligence has advanced significantly in recent years thanks to innovations in the design and training of artificial neural networks (ANNs). Despite these advancements, we still understand relatively little about how elementary forms of ANNs learn, fail to learn, and generate false information without the intent to deceive, a phenomenon known as 'confabulation'. To provide some foundational insight, in this paper we analyse how confabulation occurs in reservoir computers (RCs): a dynamical system in the form of an ANN. RCs are particularly useful to study as they are known to confabulate in a well-defined way: when RCs are trained to reconstruct the dynamics of a given attractor, they sometimes construct an attractor that they were not trained to construct, a so-called 'untrained attractor' (UA). This paper sheds light on the role played by UAs when reconstruction fails and their influence when modelling transitions between reconstructed attractors. Based on our results, we conclude that UAs are an intrinsic feature of learning systems whose state spaces are bounded, and that this means of confabulation may be present in systems beyond RCs.
Comment: The paper provides foundational insights into confabulation dynamics in reservoir computers, analyzing untrained attractors. This aligns with emerging trends and representation learning, offering theoretical contributions to understanding learning systems.
Relevance: 9 Novelty: 8
2. Sparse Training from Random Initialization: Aligning Lottery Ticket Masks using Weight Symmetry
ArXiv ID: 2505.05143
Authors: Mohammed Adnan, Rohan Jain, Ekansh Sharma, Rahul Krishnan, Yani Ioannou
Abstract: The Lottery Ticket Hypothesis (LTH) suggests there exists a sparse LTH mask and weights that achieve the same generalization performance as the dense model while using significantly fewer parameters. However, finding a LTH solution is computationally expensive, and a LTH sparsity mask does not generalize to other random weight initializations. Recent work has suggested that neural networks trained from random initialization find solutions within the same basin modulo permutation, and proposes a method to align trained models within the same loss basin. We hypothesize that misalignment of basins is the reason why LTH masks do not generalize to new random initializations and propose permuting the LTH mask to align with the new optimization basin when performing sparse training from a different random init. We empirically show a significant increase in generalization when sparse training from random initialization with the permuted mask as compared to using the non-permuted LTH mask, on multiple datasets (CIFAR-10, CIFAR-100 and ImageNet) and models (VGG11, ResNet20 and ResNet50).
Comment: This paper addresses sparse training and the Lottery Ticket Hypothesis, which is highly relevant to model compression and sparsity. The proposed method of aligning masks using weight symmetry is novel and provides significant insights.
Relevance: 9 Novelty: 8
3. Understanding In-context Learning of Addition via Activation Subspaces
ArXiv ID: 2505.05145
Authors: Xinyan Hu, Kayo Yin, Michael I. Jordan, Jacob Steinhardt, Lijie Chen
Abstract: To perform in-context learning, language models must extract signals from individual few-shot examples, aggregate these into a learned prediction rule, and then apply this rule to new examples. How is this implemented in the forward pass of modern transformer models? To study this, we consider a structured family of few-shot learning tasks for which the true prediction rule is to add an integer $k$ to the input. We find that Llama-3-8B attains high accuracy on this task for a range of $k$, and localize its few-shot ability to just three attention heads via a novel optimization approach. We further show the extracted signals lie in a six-dimensional subspace, where four of the dimensions track the unit digit and the other two dimensions track overall magnitude. We finally examine how these heads extract information from individual few-shot examples, identifying a self-correction mechanism in which mistakes from earlier examples are suppressed by later examples. Our results demonstrate how tracking low-dimensional subspaces across a forward pass can provide insight into fine-grained computational structures.
Comment: This paper provides insights into in-context learning in transformers, focusing on activation subspaces and computational structures. It aligns well with representation learning and theoretical analysis of LLMs, offering novel insights into model behavior.
Relevance: 9 Novelty: 8
4. DPQ-HD: Post-Training Compression for Ultra-Low Power Hyperdimensional Computing
ArXiv ID: 2505.05413
Authors: Nilesh Prasad Pandey, Shriniwas Kulkarni, David Wang, Onat Gungor, Flavio Ponzina, Tajana Rosing
Abstract: Hyperdimensional Computing (HDC) is emerging as a promising approach for edge AI, offering a balance between accuracy and efficiency. However, current HDC-based applications often rely on high-precision models and/or encoding matrices to achieve competitive performance, which imposes significant computational and memory demands, especially for ultra-low power devices. While recent efforts use techniques like precision reduction and pruning to increase the efficiency, most require retraining to maintain performance, making them expensive and impractical. To address this issue, we propose a novel Post Training Compression algorithm, Decomposition-Pruning-Quantization (DPQ-HD), which aims at compressing the end-to-end HDC system, achieving near floating point performance without the need for retraining. DPQ-HD reduces computational and memory overhead by uniquely combining the above three compression techniques and efficiently adapts to hardware constraints. Additionally, we introduce an energy-efficient inference approach that progressively evaluates similarity scores such as cosine similarity and performs early exit to reduce the computation, accelerating prediction inference while maintaining accuracy. We demonstrate that DPQ-HD achieves up to 20-100x reduction in memory for image and graph classification tasks with only a 1-2% drop in accuracy compared to uncompressed workloads. Lastly, we show that DPQ-HD outperforms the existing post-training compression methods and performs better or at par with retraining-based state-of-the-art techniques, requiring significantly less overall optimization time (up to 100x) and faster inference (up to 56x) on a microcontroller.
Comment: The paper introduces a post-training compression algorithm for hyperdimensional computing, combining decomposition, pruning, and quantization. This aligns well with foundational research in model compression and efficiency.
Relevance: 9 Novelty: 8
5. Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning
ArXiv ID: 2505.04877
Authors: Lianbo Ma, Jianlun Ma, Yuee Zhou, Guoyang Xie, Qiang He, Zhichao Lu
Abstract: Mixed Precision Quantization (MPQ) has become an essential technique for optimizing neural networks by determining the optimal bitwidth per layer. Existing MPQ methods, however, face a major hurdle: they require a computationally expensive search for quantization policies on large-scale datasets. To resolve this issue, we introduce a novel approach that first searches for quantization policies on small datasets and then generalizes them to large-scale datasets. This approach simplifies the process, eliminating the need for large-scale quantization fine-tuning and only necessitating model weight adjustment. Our method is characterized by three key techniques: sharpness-aware minimization for enhanced quantization generalization, implicit gradient direction alignment to handle gradient conflicts among different optimization objectives, and an adaptive perturbation radius to accelerate optimization. Both theoretical analysis and experimental results validate our approach. Using the CIFAR10 dataset (just 0.5% of the size of the ImageNet training data) for MPQ policy search, we achieved equivalent accuracy on ImageNet with a significantly lower computational cost, while improving efficiency by up to 150% over the baselines.
Comment: The paper addresses mixed-precision quantization with novel techniques like sharpness-aware minimization and adaptive gradient alignment, which are directly relevant to model compression and efficiency.
Relevance: 9 Novelty: 8
6. Stochastic Variational Propagation: Local, Scalable and Efficient Alternative to Backpropagation
ArXiv ID: 2505.05181
Authors: Bojian Yin, Federico Corradi
Abstract: Backpropagation (BP) is the cornerstone of deep learning, but its reliance on global gradient synchronization limits scalability and imposes significant memory overhead. We propose Stochastic Variational Propagation (SVP), a scalable alternative that reframes training as hierarchical variational inference. SVP treats layer activations as latent variables and optimizes local Evidence Lower Bounds (ELBOs), enabling independent, local updates while preserving global coherence. However, directly applying KL divergence in layer-wise ELBOs risks inter-layer representation collapse due to excessive compression. To prevent this, SVP projects activations into low-dimensional spaces via fixed random matrices, ensuring information preservation and representational diversity. Combined with a feature alignment loss for inter-layer consistency, SVP achieves competitive accuracy with BP across diverse architectures (MLPs, CNNs, Transformers) and datasets (MNIST to ImageNet), reduces memory usage by up to 4x, and significantly improves scalability. More broadly, SVP introduces a probabilistic perspective to deep representation learning, opening pathways toward more modular and interpretable neural network design.
Comment: The paper proposes Stochastic Variational Propagation (SVP) as an alternative to backpropagation, introducing a probabilistic perspective to representation learning and scalability. This aligns well with foundational research in training dynamics.
Relevance: 9 Novelty: 8
7. When Bad Data Leads to Good Models
ArXiv ID: 2505.04741
Authors: Kenneth Li, Yida Chen, Fernanda Viégas, Martin Wattenberg
Abstract: In large language model (LLM) pretraining, data quality is believed to determine model quality. In this paper, we re-examine the notion of "quality" from the perspective of pre- and post-training co-design. Specifically, we explore the possibility that pre-training on more toxic data can lead to better control in post-training, ultimately decreasing a model's output toxicity. First, we use a toy experiment to study how data composition affects the geometry of features in the representation space. Next, through controlled experiments with Olmo-1B models trained on varying ratios of clean and toxic data, we find that the concept of toxicity enjoys a less entangled linear representation as the proportion of toxic data increases. Furthermore, we show that although toxic data increases the generational toxicity of the base model, it also makes the toxicity easier to remove. Evaluations on Toxigen and Real Toxicity Prompts demonstrate that models trained on toxic data achieve a better trade-off between reducing generational toxicity and preserving general capabilities when detoxifying techniques such as inference-time intervention (ITI) are applied. Our findings suggest that, with post-training taken into account, bad data may lead to good models.
Comment: This paper explores the impact of toxic data on LLM pretraining and its implications for representation geometry, which aligns with foundational insights into LLM behavior and representation learning.
Relevance: 9 Novelty: 8
8. Rethinking Invariance in In-context Learning
ArXiv ID: 2505.04994
Authors: Lizhe Fang, Yifei Wang, Khashayar Gatmiry, Lei Fang, Yisen Wang
Abstract: In-Context Learning (ICL) has emerged as a pivotal capability of auto-regressive large language models, yet it is hindered by a notable sensitivity to the ordering of context examples regardless of their mutual independence. To address this issue, recent studies have introduced several variant algorithms of ICL that achieve permutation invariance. However, many of these do not exhibit comparable performance with the standard auto-regressive ICL algorithm. In this work, we identify two crucial elements in the design of an invariant ICL algorithm: information non-leakage and context interdependence, which are not simultaneously achieved by any of the existing methods. These investigations lead us to the proposed Invariant ICL (InvICL), a methodology designed to achieve invariance in ICL while ensuring the two properties. Empirically, our findings reveal that InvICL surpasses previous models, both invariant and non-invariant, in most benchmark datasets, showcasing superior generalization capabilities across varying input lengths. Code is available at https://github.com/PKU-ML/InvICL.
Comment: The paper addresses invariance in in-context learning, a key capability of LLMs, and proposes a novel methodology (InvICL) with theoretical and empirical contributions, making it relevant to foundational LLM research.
Relevance: 9 Novelty: 8
9. Chain-of-Thought Tokens are Computer Program Variables
ArXiv ID: 2505.04955
Authors: Fangwei Zhu, Peiyi Wang, Zhifang Sui
Abstract: Chain-of-thoughts (CoT) requires large language models (LLMs) to generate intermediate steps before reaching the final answer, and has been proven effective to help LLMs solve complex reasoning tasks. However, the inner mechanism of CoT still remains largely unclear. In this paper, we empirically study the role of CoT tokens in LLMs on two compositional tasks: multi-digit multiplication and dynamic programming. While CoT is essential for solving these problems, we find that preserving only tokens that store intermediate results would achieve comparable performance. Furthermore, we observe that storing intermediate results in an alternative latent form will not affect model performance. We also randomly intervene some values in CoT, and notice that subsequent CoT tokens and the final answer would change correspondingly. These findings suggest that CoT tokens may function like variables in computer programs but with potential drawbacks like unintended shortcuts and computational complexity limits between tokens. The code and data are available at https://github.com/solitaryzero/CoTs_are_Variables.
Comment: The paper investigates the role of chain-of-thought tokens in LLMs, providing insights into their function as variables, which is highly relevant to understanding LLM behavior and representation learning.
Relevance: 9 Novelty: 8
10. Advancing Neural Network Verification through Hierarchical Safety Abstract Interpretation
ArXiv ID: 2505.05235
Authors: Luca Marzari, Isabella Mastroeni, Alessandro Farinelli
Abstract: Traditional methods for formal verification (FV) of deep neural networks (DNNs) are constrained by a binary encoding of safety properties, where a model is classified as either safe or unsafe (robust or not robust). This binary encoding fails to capture the nuanced safety levels within a model, often resulting in either overly restrictive or too permissive requirements. In this paper, we introduce a novel problem formulation called Abstract DNN-Verification, which verifies a hierarchical structure of unsafe outputs, providing a more granular analysis of the safety aspect for a given DNN. Crucially, by leveraging abstract interpretation and reasoning about output reachable sets, our approach enables assessing multiple safety levels during the FV process, requiring the same (in the worst case) or even potentially less computational effort than the traditional binary verification approach. Specifically, we demonstrate how this formulation allows ranking adversarial inputs according to their abstract safety level violation, offering a more detailed evaluation of the model's safety and robustness. Our contributions include a theoretical exploration of the relationship between our novel abstract safety formulation and existing approaches that employ abstract interpretation for robustness verification, complexity analysis of the novel problem introduced, and an empirical evaluation considering both a complex deep reinforcement learning task (based on Habitat 3.0) and standard DNN-Verification benchmarks.
Comment: The paper introduces a novel hierarchical safety verification framework for DNNs, which aligns with foundational research in model architecture by providing theoretical insights into safety and robustness verification.
Relevance: 8 Novelty: 8
11. Clustering with Communication: A Variational Framework for Single Cell Representation Learning
ArXiv ID: 2505.04891
Authors: Cong Qi, Yeqing Chen, Jie Zhang, Wei Zhi
Abstract: Single-cell RNA sequencing (scRNA-seq) has revealed complex cellular heterogeneity, but recent studies emphasize that understanding biological function also requires modeling cell-cell communication (CCC), the signaling interactions mediated by ligand-receptor pairs that coordinate cellular behavior. Tools like CellChat have demonstrated that CCC plays a critical role in processes such as cell differentiation, tissue regeneration, and immune response, and that transcriptomic data inherently encodes rich information about intercellular signaling. We propose CCCVAE, a novel variational autoencoder framework that incorporates CCC signals into single-cell representation learning. By leveraging a communication-aware kernel derived from ligand-receptor interactions and a sparse Gaussian process, CCCVAE encodes biologically informed priors into the latent space. Unlike conventional VAEs that treat each cell independently, CCCVAE encourages latent embeddings to reflect both transcriptional similarity and intercellular signaling context. Empirical results across four scRNA-seq datasets show that CCCVAE improves clustering performance, achieving higher evaluation scores than standard VAE baselines. This work demonstrates the value of embedding biological priors into deep generative models for unsupervised single-cell analysis.
Comment: The paper introduces a variational autoencoder framework incorporating cell-cell communication signals, which aligns with representation learning. The use of biologically informed priors adds novelty to the generative modeling approach.
Relevance: 8 Novelty: 7
12. Beyond Low-rank Decomposition: A Shortcut Approach for Efficient On-Device Learning
ArXiv ID: 2505.05086
Authors: Le-Trung Nguyen, Ael Quelennec, Van-Tam Nguyen, Enzo Tartaglione
Abstract: On-device learning has emerged as a promising direction for AI development, particularly because of its potential to reduce latency issues and mitigate privacy risks associated with device-server communication, while improving energy efficiency. Despite these advantages, significant memory and computational constraints still represent major challenges for its deployment. Drawing on previous studies on low-rank decomposition methods that address activation memory bottlenecks in backpropagation, we propose a novel shortcut approach as an alternative. Our analysis and experiments demonstrate that our method can reduce activation memory usage, even up to $120.09\times$ compared to vanilla training, while also reducing overall training FLOPs up to $1.86\times$ when evaluated on traditional benchmarks.
Comment: The paper proposes a shortcut approach for efficient on-device learning, which is relevant to model compression and efficiency. The method offers significant memory and computational savings, making it a notable contribution.
Relevance: 8 Novelty: 7
13. SetONet: A Deep Set-based Operator Network for Solving PDEs with permutation invariant variable input sampling
ArXiv ID: 2505.04738
Authors: Stepan Tretiakov, Xingjian Li, Krishna Kumar
Abstract: Neural operators, particularly the Deep Operator Network (DeepONet), have shown promise in learning mappings between function spaces for solving differential equations. However, standard DeepONet requires input functions to be sampled at fixed locations, limiting its applicability in scenarios with variable sensor configurations, missing data, or irregular grids. We introduce the Set Operator Network (SetONet), a novel architecture that integrates Deep Sets principles into the DeepONet framework to address this limitation. The core innovation lies in the SetONet branch network, which processes the input function as an unordered set of location-value pairs. This design ensures permutation invariance with respect to the input points, making SetONet inherently robust to variations in the number and locations of sensors. SetONet learns richer, spatially-aware input representations by explicitly processing spatial coordinates and function values. We demonstrate SetONet's effectiveness on several benchmark problems, including derivative/anti-derivative operators, 1D Darcy flow, and 2D elasticity. Results show that SetONet successfully learns operators under variable input sampling conditions where standard DeepONet fails. Furthermore, SetONet is architecturally robust to sensor drop-off, unlike standard DeepONet, which requires methods like interpolation to function with missing data. Notably, SetONet can achieve comparable or improved accuracy over DeepONet on fixed grids, particularly for nonlinear problems, likely due to its enhanced input representation. SetONet provides a flexible and robust extension to the neural operator toolkit, significantly broadening the applicability of operator learning to problems with variable or incomplete input data.
Comment: The paper introduces SetONet, a novel architecture extending DeepONet with permutation invariance, which aligns with the 'Model Architecture' criterion for architectural innovations.
Relevance: 8 Novelty: 7
14. OpenworldAUC: Towards Unified Evaluation and Optimization for Open-world Prompt Tuning
ArXiv ID: 2505.05180
Authors: Cong Hua, Qianqian Xu, Zhiyong Yang, Zitai Wang, Shilong Bao, Qingming Huang
Abstract: Prompt tuning adapts Vision-Language Models like CLIP to open-world tasks with minimal training costs. In this direction, one typical paradigm evaluates model performance separately on known classes (i.e., base domain) and unseen classes (i.e., new domain). However, real-world scenarios require models to handle inputs without prior domain knowledge. This practical challenge has spurred the development of open-world prompt tuning, which demands a unified evaluation of two stages: 1) detecting whether an input belongs to the base or new domain (P1), and 2) classifying the sample into its correct class (P2). What's more, as domain distributions are generally unknown, a proper metric should be insensitive to varying base/new sample ratios (P3). However, we find that current metrics, including HM, overall accuracy, and AUROC, fail to satisfy these three properties simultaneously. To bridge this gap, we propose OpenworldAUC, a unified metric that jointly assesses detection and classification through pairwise instance comparisons. To optimize OpenworldAUC effectively, we introduce Gated Mixture-of-Prompts (GMoP), which employs domain-specific prompts and a gating mechanism to dynamically balance detection and classification. Theoretical guarantees ensure generalization of GMoP under practical conditions. Experiments on 15 benchmarks in open-world scenarios show GMoP achieves SOTA performance on OpenworldAUC and other metrics. We release the code at https://github.com/huacong/OpenworldAUC
Comment: The paper proposes Gated Mixture-of-Prompts (GMoP) for open-world prompt tuning, which aligns with 'Model Architecture' due to its innovative use of domain-specific prompts and gating mechanisms.
Relevance: 8 Novelty: 7
15. ComPO: Preference Alignment via Comparison Oracles
ArXiv ID: 2505.05465
Authors: Peter Chen, Xi Chen, Wotao Yin, Tianyi Lin
Abstract: Direct alignment methods are increasingly used for aligning large language models (LLMs) with human preferences. However, these methods suffer from the issues of verbosity and likelihood displacement, which can be driven by the noisy preference pairs that induce similar likelihood for preferred and dispreferred responses. The contributions of this paper are two-fold. First, we propose a new preference alignment method based on comparison oracles and provide the convergence guarantee for its basic scheme. Second, we improve our method using some heuristics and conduct the experiments to demonstrate the flexibility and compatibility of practical scheme in improving the performance of LLMs using noisy preference pairs. Evaluations are conducted across multiple base and instruction-tuned models (Mistral-7B, Llama-3-8B and Gemma-2-9B) with benchmarks (AlpacaEval 2, MT-Bench and Arena-Hard). Experimental results show the effectiveness of our method as an alternative to addressing the limitations of existing direct alignment methods. A highlight of our work is that we evidence the importance of designing specialized methods for preference pairs with distinct likelihood margin, which complements the recent findings in Razin et al. (2025).
Comment: The paper proposes a novel preference alignment method using comparison oracles, which aligns with foundational research in LLM behavior and interpretability. It provides theoretical insights into addressing limitations of existing alignment methods.
Relevance: 8 Novelty: 7
16. Learning dynamically inspired invariant subspaces for Koopman and transfer operator approximation
ArXiv ID: 2505.05085
Authors: Gary Froyland, Kevin Kühl
Abstract: Transfer and Koopman operator methods offer a framework for representing complex, nonlinear dynamical systems via linear transformations, enabling a deeper understanding of the underlying dynamics. The spectra of these operators provide important insights into system predictability and emergent behaviour, although efficiently estimating them from data can be challenging. We tackle this issue through the lens of general operator and representational learning, in which we approximate these linear operators using efficient finite-dimensional representations. Specifically, we machine-learn orthonormal, locally supported basis functions that are dynamically tailored to the system. This learned basis provides a particularly accurate approximation of the operator's action as well as a nearly invariant finite-dimensional subspace. We illustrate our approach with examples that showcase the retrieval of spectral properties from the estimated operator, and emphasise the dynamically adaptive quality of the machine-learned basis.
Comment: This paper explores operator learning and invariant subspaces, which aligns with representation learning and foundational research into how systems encode information. The use of machine-learned basis functions for spectral properties is novel.
Relevance: 8 Novelty: 7
17. Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models
ArXiv ID: 2505.05163
Authors: Aishwarya Venkataramanan, Paul Bodesheim, Joachim Denzler
Abstract: Vision-Language Models (VLMs) learn joint representations by mapping images and text into a shared latent space. However, recent research highlights that deterministic embeddings from standard VLMs often struggle to capture the uncertainties arising from the ambiguities in visual and textual descriptions and the multiple possible correspondences between images and texts. Existing approaches tackle this by learning probabilistic embeddings during VLM training, which demands large datasets and does not leverage the powerful representations already learned by large-scale VLMs like CLIP. In this paper, we propose GroVE, a post-hoc approach to obtaining probabilistic embeddings from frozen VLMs. GroVE builds on Gaussian Process Latent Variable Model (GPLVM) to learn a shared low-dimensional latent space where image and text inputs are mapped to a unified representation, optimized through single-modal embedding reconstruction and cross-modal alignment objectives. Once trained, the Gaussian Process model generates uncertainty-aware probabilistic embeddings. Evaluation shows that GroVE achieves state-of-the-art uncertainty calibration across multiple downstream tasks, including cross-modal retrieval, visual question answering, and active learning.
Comment: The paper introduces a probabilistic embedding approach for frozen vision-language models, which aligns with representation learning and uncertainty quantification, offering a novel post-hoc method.
Relevance: 8 Novelty: 7
Paper Selection Prompt
You are a helpful paper reading assistant whose job is to read daily posts from ArXiv and identify a few papers that your friend will enjoy reading. Your job is to carefully read the paper titles and abstracts below and find the ones that match the criteria below.
Instructions
Write the response in JSONL format with {ARXIVID, COMMENT, RELEVANCE, NOVELTY} on each line, one for each paper.
- ARXIVID: should be the ArXiv ID.
- COMMENT: should identify whether there is a criterion that matches the paper very closely. These matches should not be based on general terms like "language modeling" or "advancements" and should specifically refer to a criterion. No need to mention the non-matching criteria.
- RELEVANCE: should be a score from 1-10.
- NOVELTY: should be a score from 1-10.
Scoring Criteria
The "Relevance" score measures how closely the paper aligns with the core topics of the prompt. The "Novelty" score assesses the originality and impact of the paper. They are two ORTHOGONAL axes and SHOULD NOT be confused with each other.
Relevance Scoring
- Relevance 9-10 (Completely Relevant)
  - Focus: Fully aligned with core topics with no deviation; score highest if the paper contains relevant keywords.
  - Examples: Papers focused on foundational methods or theoretical research, whose titles contain topic keywords like "MoE".
- Relevance 7-8 (Relevant)
  - Focus: Retains a solid link to the main research area, though it may touch on peripheral elements.
  - Examples: Papers that study a fundamental part of MoE through a less critical aspect, such as its behavior in GNNs.
- Relevance 5-6 (Borderline)
  - Focus: Maintains a link to the core topic but also extends into at least one other domain/area beyond the primary focus.
  - Examples: Work that references MoE but is centered on reinforcement learning.
- Relevance 3-4 (Irrelevant)
  - Focus: Largely outside our interests with no association to our topics.
  - Examples: Application-focused papers, such as using MoE to solve a real-world problem.
- Relevance 1-2 (Ignore)
  - Focus: Purely unrelated to our topics; a completely different domain.
  - Exception: If the paper hints at a cutting-edge, radically new direction that could eventually transform the primary domain, consider a score of 9-10 despite initial appearances. (Usually a very rare concept that belongs to fundamental research.)
Novelty Scoring
- Novelty 9-10 (Breakthrough)
  - Definition: Groundbreaking methods/theory introducing new directions or solving major challenges.
  - Examples: Entirely new paradigm for foundational models; a novel theory transforming representation learning.
- Novelty 7-8 (Improvements)
  - Definition: Substantial insights/enhancements, though not a full paradigm shift.
  - Examples: Modifications to existing methods yielding significantly better results.
- Novelty 5-6 (Borderline)
  - Definition: Incremental contributions with possible long-term benefits, not immediately transformative.
  - Examples: Moderately novel extension to an existing architecture; refining current methods without fundamentally altering them.
- Novelty 3-4 (Tangential)
  - Definition: Minor or domain-specific improvements with limited broader impact.
  - Examples: Slight modifications to known methods with questionable motivation; purely engineering work such as a new benchmark/dataset.
- Novelty 1-2 (Low)
  - Definition: Minimal originality, applying standard approaches without real innovation.
  - Examples: Using an off-the-shelf model without adding new insights; purely application-driven studies like finetuning a pretrained model using existing methods.
Papers
[PAPER LIST HERE]
Relevant Topics
Use the following relevance criteria to focus on foundational research. Keep relevant papers and filter out irrelevant ones. Avoid purely application-driven work.
- Representation Learning
  - Relevant: Insights into how deep networks encode information, feature/dictionary learning, sparse/contrastive methods, training dynamics in neural networks.
  - Irrelevant: Standard applications of known techniques lacking new theoretical or methodological contributions.
- Model Architecture
  - Relevant: Mixture-of-Experts (MoE), Transformers, Conditional/Dynamic Networks, Autoencoders, analysis of existing architectures (like encoder-decoder), or other architectural innovations.
  - Irrelevant: Merely using existing architectures for a certain task without insights into the structures themselves.
- Model Compression
  - Relevant: Sparsity, pruning, quantization, low-rank approaches, KV cache, or other algorithmic/theoretical efficiency breakthroughs.
  - Irrelevant: Straightforward applications of existing compression methods to new tasks.
- Large Language Models (LLMs)
  - Relevant: Major breakthroughs in pretraining or architecture, theoretical insights into LLM behavior/interpretability.
  - Irrelevant: Domain-specific usage (e.g., translation, jail-breaking), finetuning or inference tricks (e.g., instruction tuning, chain-of-thought, data mixing), or empirical dataset/benchmark studies and text-level analysis (e.g., hallucination, reasoning, safety).
- AI for Science
  - Relevant: Foundational research in molecular/protein modeling, new generative paradigms, or significant architecture-level innovations.
  - Irrelevant: Conventional, domain-specific applications without new theoretical perspectives.
- Emerging Trends
  - Relevant: Cutting-edge theoretical work challenging established assumptions or introducing broad new paradigms.
  - Irrelevant: Incremental improvements or trend-following without novel insights.
Keywords:
- Relevant: Mixture of Experts (MoE), Representation Learning, Compression/Efficiency, Sparse/Sparsity, Pruning, Quantization, Low-rank, Foundation Model, etc.
- Irrelevant: Reinforcement Learning, Transfer Learning, Federated Learning, Online Learning, Diffusion Models, etc.
- Application: Image Segmentation, Medical Imaging, 3D Vision, Video Understanding, Information Retrieval, Summarization, Recommendation Systems, Machine Translation, Speech Recognition, Signal Processing, Spatial/Temporal Modeling, Time Series, Knowledge Graph, etc.
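Example: consuming the JSONL response

The Instructions section above asks for one JSON object per line with the keys ARXIVID, COMMENT, RELEVANCE, and NOVELTY. A minimal sketch of how such a response could be parsed and filtered is shown here. The function names and the thresholds (`min_relevance=7`, `min_novelty=5`) are assumptions made for illustration; the page does not state how the actual pipeline filters or sorts papers. The first sample record reuses the ID and scores of paper 17 above, and the second record is hypothetical.

```python
import json

# Keys the prompt asks the model to emit on every JSONL line.
REQUIRED_KEYS = {"ARXIVID", "COMMENT", "RELEVANCE", "NOVELTY"}


def parse_selection(jsonl_text):
    """Parse a JSONL response into a list of dicts, validating keys and score ranges."""
    records = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = REQUIRED_KEYS - set(record)
        if missing:
            raise ValueError(f"missing keys: {sorted(missing)}")
        for key in ("RELEVANCE", "NOVELTY"):
            score = int(record[key])
            if not 1 <= score <= 10:
                raise ValueError(f"{key} out of range 1-10: {score}")
            record[key] = score
        records.append(record)
    return records


def select_papers(records, min_relevance=7, min_novelty=5):
    """Keep papers meeting both (assumed) thresholds, highest scores first."""
    kept = [
        r for r in records
        if r["RELEVANCE"] >= min_relevance and r["NOVELTY"] >= min_novelty
    ]
    return sorted(kept, key=lambda r: (r["RELEVANCE"], r["NOVELTY"]), reverse=True)


sample = "\n".join([
    '{"ARXIVID": "2505.05163", "COMMENT": "Post-hoc probabilistic embeddings for frozen VLMs.", "RELEVANCE": 8, "NOVELTY": 7}',
    # Hypothetical entry, not a real paper; included only to show filtering.
    '{"ARXIVID": "0000.00000", "COMMENT": "Hypothetical application paper.", "RELEVANCE": 3, "NOVELTY": 4}',
])

for paper in select_papers(parse_selection(sample)):
    print(paper["ARXIVID"], paper["RELEVANCE"], paper["NOVELTY"])
# prints: 2505.05163 8 7
```

Validating the keys and the 1-10 range up front makes malformed model output fail loudly instead of silently skewing the selection.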