Personalized Daily ArXiv Papers 2025-07-17

[gpt-4o]	Prompt	Completion	Total
Token	29435	3474	32909
Cost	$0.07	$0.03	$0.11

Total arXiv papers: 437

Total scanned papers: 256

Total relevant papers: 24

Table of contents with paper titles:

Torsional-GFN: a conditional conformation generator for small molecules Authors: Alexandra Volokhova, Léna Néhale Ezzine, Piotr Gaiński, Luca Scimeca, Emmanuel Bengio, Prudencio Tossou, Yoshua Bengio, Alex Hernandez-Garcia
Cluster Contrast for Unsupervised Visual Representation Learning Authors: Nikolaos Giakoumoglou, Tania Stathaki
SToFM: a Multi-scale Foundation Model for Spatial Transcriptomics Authors: Suyuan Zhao, Yizhen Luo, Ganbo Yang, Yan Zhong, Hao Zhou, Zaiqing Nie
Mixture of Raytraced Experts Authors: Andrea Perin, Giacomo Lagomarsini, Claudio Gallicchio, Giuseppe Nuti
Tracing the Path to Grokking: Embeddings, Dropout, and Network Activation Authors: Ahmed Salah, David Yevick
PoTPTQ: A Two-step Power-of-Two Post-training for LLMs Authors: Xinyu Wang, Vahid Partovi Nia, Peng Lu, Jerry Huang, Xiao-Wen Chang, Boxing Chen, Yufei Cui
Composing Linear Layers from Irreducibles Authors: Travis Pence, Daisuke Yamada, Vikas Singh
Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential Authors: Mohammad Samragh, Arnav Kundu, David Harrison, Kumari Nishu, Devang Naik, Minsik Cho, Mehrdad Farajtabar
IAM: Efficient Inference through Attention Mapping between Different-scale LLMs Authors: Yi Zhao, Zuchao Li, Hai Zhao
Optimizers Qualitatively Alter Solutions And We Should Leverage This Authors: Razvan Pascanu, Clare Lyle, Ionut-Vlad Modoranu, Naima Elosegui Borras, Dan Alistarh, Petar Velickovic, Sarath Chandar, Soham De, James Martens
FactorHD: A Hyperdimensional Computing Model for Multi-Object Multi-Class Representation and Factorization Authors: Yifei Zhou, Xuchu Huang, Chenyu Ni, Min Zhou, Zheyu Yan, Xunzhao Yin, Cheng Zhuo
Einstein Fields: A Neural Perspective To Computational General Relativity Authors: Sandeep Suresh Cranganore, Andrei Bodnar, Arturs Berzins, Johannes Brandstetter
Newfluence: Boosting Model interpretability and Understanding in High Dimensions Authors: Haolin Zou, Arnab Auddy, Yongchan Kwon, Kamiar Rahnama Rad, Arian Maleki
Incorporating Fairness Constraints into Archetypal Analysis Authors: Aleix Alcacer, Irene Epifanio
An Memory-Efficient Framework for Deformable Transformer with Neural Architecture Search Authors: Wendong Mao, Mingfan Zhao, Jianfeng Guan, Qiwei Dong, Zhongfeng Wang
SynCoGen: Synthesizable 3D Molecule Generation via Joint Reaction and Coordinate Modeling Authors: Andrei Rekesh, Miruna Cretu, Dmytro Shevchuk, Vignesh Ram Somnath, Pietro Liò, Robert A. Batey, Mike Tyers, Michał Koziarski, Cheng-Hao Liu
RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization Authors: Vladimir Bogachev, Vladimir Aletov, Alexander Molozhavenko, Denis Bobkov, Vera Soboleva, Aibek Alanov, Maxim Rakhuba
CLID-MU: Cross-Layer Information Divergence Based Meta Update Strategy for Learning with Noisy Labels Authors: Ruofan Hu, Dongyu Zhang, Huayi Zhang, Elke Rundensteiner
Effective Fine-Tuning of Vision Transformers with Low-Rank Adaptation for Privacy-Preserving Image Classification Authors: Haiwei Lin, Shoko Imaizumi, Hitoshi Kiya
Sparse Autoencoders for Sequential Recommendation Models: Interpretation and Flexible Control Authors: Anton Klenitskiy, Konstantin Polev, Daria Denisova, Alexey Vasilev, Dmitry Simakov, Gleb Gusev
Protenix-Mini: Efficient Structure Predictor via Compact Architecture, Few-Step Diffusion and Switchable pLM Authors: Chengyue Gong, Xinshi Chen, Yuxuan Zhang, Yuxuan Song, Hao Zhou, Wenzhi Xiao
Enforcing Latent Euclidean Geometry in Single-Cell VAEs for Manifold Interpolation Authors: Alessandro Palma, Sergei Rybakov, Leon Hetzel, Stephan Günnemann, Fabian J. Theis
CytoSAE: Interpretable Cell Embeddings for Hematology Authors: Muhammed Furkan Dasdelen, Hyesu Lim, Michele Buck, Katharina S. Götze, Carsten Marr, Steffen Schneider
Probing for Arithmetic Errors in Language Models Authors: Yucheng Sun, Alessandro Stolfo, Mrinmaya Sachan

1. Torsional-GFN: a conditional conformation generator for small molecules

ArXiv ID: 2507.11759

Authors: Alexandra Volokhova, Léna Néhale Ezzine, Piotr Gaiński, Luca Scimeca, Emmanuel Bengio, Prudencio Tossou, Yoshua Bengio, Alex Hernandez-Garcia

Abstract: Generating stable molecular conformations is crucial in several drug discovery applications, such as estimating the binding affinity of a molecule to a target. Recently, generative machine learning methods have emerged as a promising and more efficient alternative to molecular dynamics for sampling conformations from the Boltzmann distribution. In this paper, we introduce Torsional-GFN, a conditional GFlowNet specifically designed to sample conformations of molecules proportionally to their Boltzmann distribution, using only a reward function as the training signal. Conditioned on a molecular graph and its local structure (bond lengths and angles), Torsional-GFN samples rotations of its torsion angles. Our results demonstrate that Torsional-GFN is able to sample conformations approximately proportional to the Boltzmann distribution for multiple molecules with a single model, and allows for zero-shot generalization to unseen bond lengths and angles coming from the MD simulations for such molecules. Our work presents a promising avenue for scaling the proposed approach to larger molecular systems, achieving zero-shot generalization to unseen molecules, and including the generation of the local structure into the GFlowNet model.

Comment: Author match
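The target of Torsional-GFN is the Boltzmann distribution over conformations. As a minimal sketch, assuming a single torsion angle, a hypothetical three-fold torsional potential `torsion_energy` (illustrative barrier in kcal/mol), and a discretized angle grid, the normalized Boltzmann probabilities that a sampler trained on the reward R(φ) = exp(−E(φ)/k_BT) should approximately match can be computed directly. This is background illustration only, not the paper's model or energy function.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def torsion_energy(phi, barrier=3.0, n=3):
    """Hypothetical n-fold torsional potential in kcal/mol (illustrative only)."""
    return 0.5 * barrier * (1.0 + math.cos(n * phi))


def boltzmann_probs(angles, temperature=300.0):
    """Normalized weights p(phi) proportional to exp(-E(phi) / (k_B * T))."""
    beta = 1.0 / (K_B * temperature)
    energies = [torsion_energy(a) for a in angles]
    e_min = min(energies)  # shifting avoids overflow; the shift cancels on normalization
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


grid = [2.0 * math.pi * i / 36 for i in range(36)]  # 10-degree grid
probs = boltzmann_probs(grid)
```

With this potential the three minima at 60°, 180°, and 300° receive the highest probability, and a well-trained sampler's histogram over the same grid should approach `probs`.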

2. Cluster Contrast for Unsupervised Visual Representation Learning

ArXiv ID: 2507.12359

Authors: Nikolaos Giakoumoglou, Tania Stathaki

Abstract: We introduce Cluster Contrast (CueCo), a novel approach to unsupervised visual representation learning that effectively combines the strengths of contrastive learning and clustering methods. Inspired by recent advancements, CueCo is designed to simultaneously scatter and align feature representations within the feature space. This method utilizes two neural networks, a query and a key, where the key network is updated through a slow-moving average of the query outputs. CueCo employs a contrastive loss to push dissimilar features apart, enhancing inter-class separation, and a clustering objective to pull together features of the same cluster, promoting intra-class compactness. Our method achieves 91.40% top-1 classification accuracy on CIFAR-10, 68.56% on CIFAR-100, and 78.65% on ImageNet-100 using linear evaluation with a ResNet-18 backbone. By integrating contrastive learning with clustering, CueCo sets a new direction for advancing unsupervised visual representation learning.

Comment: The paper introduces a novel approach to unsupervised visual representation learning by combining contrastive learning and clustering, aligning with the representation learning criterion.

Relevance: 9 Novelty: 8
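The three ingredients named in the CueCo abstract (a slow-moving average for the key network, a contrastive loss that pushes dissimilar features apart, and a clustering objective that pulls same-cluster features together) can be sketched in plain Python. This is a hedged illustration, assuming an InfoNCE-style contrastive loss, a squared-distance-to-centroid clustering term, and an EMA applied elementwise to whatever is averaged; the abstract does not give the exact losses, temperature, or how they are weighted.

```python
import math


def ema_update(key_params, query_params, momentum=0.99):
    """Slow-moving average: key <- momentum * key + (1 - momentum) * query."""
    return [momentum * k + (1.0 - momentum) * q for k, q in zip(key_params, query_params)]


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(query, positive, negatives, temperature=0.2):
    """Contrastive (InfoNCE) loss: -log softmax score of the positive key."""
    q = l2_normalize(query)
    logits = [sum(a * b for a, b in zip(q, l2_normalize(k))) / temperature
              for k in [positive] + list(negatives)]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[0]


def cluster_compactness(features, assignments, centroids):
    """Clustering term: mean squared distance of each normalized feature to its
    assigned centroid (lower means tighter, more compact clusters)."""
    total = 0.0
    for f, c in zip(features, assignments):
        f = l2_normalize(f)
        total += sum((a - b) ** 2 for a, b in zip(f, centroids[c]))
    return total / len(features)
```

A combined objective would add the two losses with some weight; that weight, like the cluster-assignment procedure, is left open here.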

3. SToFM: a Multi-scale Foundation Model for Spatial Transcriptomics

ArXiv ID: 2507.11588

Authors: Suyuan Zhao, Yizhen Luo, Ganbo Yang, Yan Zhong, Hao Zhou, Zaiqing Nie

Abstract: Spatial Transcriptomics (ST) technologies provide biologists with rich insights into single-cell biology by preserving the spatial context of cells. Building foundational models for ST can significantly enhance the analysis of vast and complex data sources, unlocking new perspectives on the intricacies of biological tissues. However, modeling ST data is inherently challenging due to the need to extract multi-scale information from tissue slices containing vast numbers of cells. This process requires integrating macro-scale tissue morphology, micro-scale cellular microenvironment, and gene-scale gene expression profile. To address this challenge, we propose SToFM, a multi-scale Spatial Transcriptomics Foundation Model. SToFM first performs multi-scale information extraction on each ST slice, to construct a set of ST sub-slices that aggregate macro-, micro- and gene-scale information. Then an SE(2) Transformer is used to obtain high-quality cell representations from the sub-slices. Additionally, we construct SToCorpus-88M, the largest high-resolution spatial transcriptomics corpus for pretraining. SToFM achieves outstanding performance on a variety of downstream tasks, such as tissue region semantic segmentation and cell type annotation, demonstrating its comprehensive understanding of ST data.

Comment: The paper introduces a multi-scale foundation model for spatial transcriptomics, which is relevant to AI for science and foundational model research.

Relevance: 9 Novelty: 8

4. Mixture of Raytraced Experts

ArXiv ID: 2507.12419

Authors: Andrea Perin, Giacomo Lagomarsini, Claudio Gallicchio, Giuseppe Nuti

Abstract: We introduce a Mixture of Raytraced Experts, a stacked Mixture of Experts (MoE) architecture which can dynamically select sequences of experts, producing computational graphs of variable width and depth. Existing MoE architectures generally require a fixed amount of computation for a given sample. Our approach, in contrast, yields predictions with increasing accuracy as the computation cycles through the experts' sequence. We train our model by iteratively sampling from a set of candidate experts, unfolding the sequence akin to how Recurrent Neural Networks are trained. Our method does not require load-balancing mechanisms, and preliminary experiments show a reduction in training epochs of 10% to 40% with comparable or higher accuracy. These results point to new research directions in the field of MoEs, allowing the design of potentially faster and more expressive models. The code is available at https://github.com/nutig/RayTracing

Comment: The paper introduces a Mixture of Experts architecture with dynamic expert selection, directly relevant to model architecture innovations.

Relevance: 9 Novelty: 8

5. Tracing the Path to Grokking: Embeddings, Dropout, and Network Activation

ArXiv ID: 2507.11645

Authors: Ahmed Salah, David Yevick

Abstract: Grokking refers to delayed generalization in which the increase in test accuracy of a neural network occurs appreciably after the improvement in training accuracy. This paper introduces several practical metrics, including variance under dropout, robustness, embedding similarity, and sparsity measures, that can forecast grokking behavior. Specifically, the resilience of neural networks to noise during inference is estimated from a Dropout Robustness Curve (DRC) obtained from the variation of the accuracy with the dropout rate as the model transitions from memorization to generalization. The variance of the test accuracy under stochastic dropout across training checkpoints further exhibits a local maximum during grokking. Additionally, the percentage of inactive neurons decreases during generalization, while the embeddings tend to a bimodal distribution independent of initialization that correlates with the observed cosine similarity patterns and dataset symmetries. These metrics additionally provide valuable insight into the origin and behaviour of grokking.

Comment: The paper provides insights into neural network training dynamics and introduces metrics related to sparsity and embedding similarity, which are relevant to representation learning.

Relevance: 9 Novelty: 8
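The Dropout Robustness Curve is accuracy measured as a function of dropout rate, and the paper also tracks the variance of that accuracy under stochastic dropout. A minimal sketch, assuming a hypothetical toy linear classifier with inverted dropout on its inputs and a tiny hand-made dataset (in the paper these quantities are computed for the trained network at each checkpoint):

```python
import random


def predict_with_dropout(weights, x, rate, rng):
    """Toy linear classifier with inverted dropout on its inputs (0 <= rate < 1)."""
    keep = 1.0 - rate
    score = 0.0
    for w, xi in zip(weights, x):
        if rng.random() < keep:  # unit survives with probability `keep`
            score += w * xi / keep
    return 1 if score > 0 else 0


def dropout_robustness_curve(weights, data, rates, trials=20, seed=0):
    """For each dropout rate, return (rate, mean accuracy, variance of accuracy)."""
    rng = random.Random(seed)
    curve = []
    for rate in rates:
        accs = []
        for _ in range(trials):
            correct = sum(predict_with_dropout(weights, x, rate, rng) == y for x, y in data)
            accs.append(correct / len(data))
        mean = sum(accs) / len(accs)
        var = sum((a - mean) ** 2 for a in accs) / len(accs)
        curve.append((rate, mean, var))
    return curve


weights = [1.0, -1.0]
data = [((2.0, 1.0), 1), ((1.0, 2.0), 0), ((3.0, 0.5), 1), ((0.5, 3.0), 0)]
curve = dropout_robustness_curve(weights, data, rates=[0.0, 0.25, 0.5])
```

At rate 0 the model is deterministic, so the variance is zero; tracking how mean and variance change across checkpoints is what the abstract links to grokking.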

6. PoTPTQ: A Two-step Power-of-Two Post-training for LLMs

ArXiv ID: 2507.11959

Authors: Xinyu Wang, Vahid Partovi Nia, Peng Lu, Jerry Huang, Xiao-Wen Chang, Boxing Chen, Yufei Cui

Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across various natural language processing (NLP) tasks. However, their deployment is challenging due to the substantial computational resources required. Power-of-two (PoT) quantization is a general tool to counteract this difficulty. Although previous PoT quantization schemes can be efficiently dequantized on CPUs using fixed-point addition, they are less effective on GPUs. The reason is the entanglement of the sign bit and the sequential bit manipulations needed for dequantization. We propose a novel PoT quantization framework for LLM weights that (i) outperforms state-of-the-art accuracy in extremely low-precision number formats, and (ii) enables faster inference through more efficient dequantization. To maintain the accuracy of the quantized model, we introduce a two-step post-training algorithm: (i) initialize the quantization scales with a robust starting point, and (ii) refine these scales using a minimal calibration set. The performance of our PoT post-training algorithm surpasses the current state-of-the-art in integer quantization, particularly at low precisions such as 2- and 3-bit formats. Our PoT quantization accelerates the dequantization step required for floating-point inference and leads to a 3.67× speedup on an NVIDIA V100 and 1.63× on an NVIDIA RTX 4090, compared to uniform integer dequantization.

Comment: The paper proposes a novel quantization framework for LLMs, focusing on efficiency and compression, which aligns with model compression criteria.

Relevance: 9 Novelty: 8
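The core idea of PoT quantization is to snap each weight to sign × scale × 2^−k, so that dequantization becomes an exponent shift rather than a general multiply. The sketch below follows the abstract's two steps (initialize the scale, then refine it) under stated assumptions: the level set and bit layout are hypothetical, the max-abs initialization is a simple stand-in for the paper's "robust starting point", and the scale refinement uses a weight-MSE grid search in place of the paper's calibration set.

```python
import math


def pot_levels(bits):
    """Magnitude levels of a hypothetical b-bit PoT format: 1 sign bit plus
    2**(bits - 1) magnitude codes for 0, 2**0, 2**-1, 2**-2, ..."""
    n_codes = 2 ** (bits - 1)
    return [0.0] + [2.0 ** (-k) for k in range(n_codes - 1)]


def quantize_pot(weights, bits=3, scale=None):
    """Round each weight to sign * scale * (nearest PoT level)."""
    if scale is None:
        scale = max(abs(w) for w in weights) or 1.0  # simple max-abs initialization
    levels = pot_levels(bits)
    out = []
    for w in weights:
        mag = abs(w) / scale
        nearest = min(levels, key=lambda lvl: abs(mag - lvl))
        out.append(math.copysign(nearest * scale, w))
    return out


def refine_scale(weights, bits=3, multipliers=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1)):
    """Stand-in for step (ii): grid-search the scale to minimize weight MSE.
    (The paper refines scales on a small calibration set instead.)"""
    base = max(abs(w) for w in weights) or 1.0

    def mse(scale):
        q = quantize_pot(weights, bits, scale)
        return sum((a - b) ** 2 for a, b in zip(weights, q)) / len(weights)

    return min((base * m for m in multipliers), key=mse)


def dequantize_code(sign_bit, code, scale):
    """Decode (sign bit, magnitude code): code 0 is zero, code c > 0 is 2**-(c - 1).
    math.ldexp shifts the exponent instead of multiplying by a fraction."""
    if code == 0:
        return 0.0
    value = math.ldexp(scale, -(code - 1))
    return -value if sign_bit else value
```

For example, `quantize_pot([1.0, -0.5, 0.3, 0.1], bits=3)` yields `[1.0, -0.5, 0.25, 0.0]`. The paper's contribution is a layout that avoids the sign-bit entanglement on GPUs; that layout is not reproduced here.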

7. Composing Linear Layers from Irreducibles

ArXiv ID: 2507.11688

Authors: Travis Pence, Daisuke Yamada, Vikas Singh

Abstract: Contemporary large models often exhibit behaviors suggesting the presence of low-level primitives that compose into modules with richer functionality, but these fundamental building blocks remain poorly understood. We investigate this compositional structure in linear layers by asking: can we identify/synthesize linear transformations from a minimal set of geometric primitives? Using Clifford algebra, we show that linear layers can be expressed as compositions of bivectors -- geometric objects encoding oriented planes -- and introduce a differentiable algorithm that decomposes them into products of rotors. This construction uses only O(log^2 d) parameters, versus O(d^2) required by dense matrices. Applied to the key, query, and value projections in LLM attention layers, our rotor-based layers match the performance of strong baselines such as block-Hadamard and low-rank approximations. Our findings provide an algebraic perspective on how these geometric primitives can compose into higher-level functions within deep models.

Comment: The paper explores the compositional structure of linear layers using geometric primitives, which aligns with foundational research in model architecture and representation learning.

Relevance: 9 Novelty: 8
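In Clifford algebra, a rotor built from a single oriented plane (a simple bivector e_i ∧ e_j) acts on vectors as a rotation within that plane, and multiplying rotors composes those rotations. A minimal sketch of this primitive only, assuming each rotor is described by a hypothetical `(i, j, theta)` triple and the convention R = exp(−θ/2 · e_i e_j); the paper's general bivectors, its differentiable decomposition algorithm, and its O(log^2 d) parameterization are not reproduced.

```python
import math


def apply_plane_rotation(x, i, j, theta):
    """Rotate x by angle theta in the (e_i, e_j) plane, taking e_i toward e_j.
    Under the convention R = exp(-theta/2 * e_i e_j), this is the sandwich R x R~."""
    y = list(x)
    c, s = math.cos(theta), math.sin(theta)
    y[i] = c * x[i] - s * x[j]
    y[j] = s * x[i] + c * x[j]
    return y


def rotor_layer(x, rotors):
    """Apply simple rotors in order, each given as an (i, j, theta) triple; the
    product of rotors acts as the composition of the plane rotations."""
    for i, j, theta in rotors:
        x = apply_plane_rotation(x, i, j, theta)
    return x
```

Each rotor here stores one angle instead of a d × d block, which is the source of the parameter savings the abstract describes. A composition of rotors alone is norm-preserving; the abstract does not say how the full layer handles scaling.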

8. Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential

ArXiv ID: 2507.11851

Authors: Mohammad Samragh, Arnav Kundu, David Harrison, Kumari Nishu, Devang Naik, Minsik Cho, Mehrdad Farajtabar

Abstract: Autoregressive language models are constrained by their inherently sequential nature, generating one token at a time. This paradigm limits inference speed and parallelism, especially during later stages of generation when the direction and semantics of text are relatively certain. In this work, we propose a novel framework that leverages the inherent knowledge of vanilla autoregressive language models about future tokens, combining techniques to realize this potential and enable simultaneous prediction of multiple subsequent tokens. Our approach introduces several key innovations: (1) a masked-input formulation where multiple future tokens are jointly predicted from a common prefix; (2) a gated LoRA formulation that preserves the original LLM's functionality, while equipping it for multi-token prediction; (3) a lightweight, learnable sampler module that generates coherent sequences from the predicted future tokens; (4) a set of auxiliary training losses, including a consistency loss, to enhance the coherence and accuracy of jointly generated tokens; and (5) a speculative generation strategy that expands tokens quadratically in the future while maintaining high fidelity. Our method achieves significant speedups through supervised fine-tuning on pretrained models. For example, it generates code and math nearly 5x faster, and speeds up general chat and knowledge tasks by almost 2.5x. These gains come without any loss in quality.

Comment: The paper introduces a novel framework for multi-token prediction in LLMs, which aligns with the large language model criterion by proposing a new method for improving inference speed.

Relevance: 9 Novelty: 8
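A gated LoRA layer adds a low-rank update B(Ax) to a frozen weight W only where a gate is on, so the base model's outputs are unchanged elsewhere. A minimal sketch, assuming the gate is a per-position 0/1 flag (hypothetically 0 for ordinary next-token positions and 1 for the masked future-token positions) and plain Python lists in place of tensors; the abstract does not specify the gate's exact form.

```python
def matvec(m, x):
    """Dense matrix-vector product with lists of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def gated_lora_forward(W, A, B, x, gate):
    """y = W x + gate * B (A x).

    W: frozen base weight (d_out x d_in); A: (r x d_in); B: (d_out x r).
    gate = 0 reproduces the base layer exactly; gate = 1 adds the low-rank update.
    """
    base = matvec(W, x)
    if gate == 0:
        return base
    delta = matvec(B, matvec(A, x))
    return [b + gate * d for b, d in zip(base, delta)]


W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]  # rank r = 1
B = [[1.0], [2.0]]
x = [1.0, 2.0]
```

Here `gated_lora_forward(W, A, B, x, gate=0)` returns the base output `[1.0, 2.0]`, while `gate=1` returns `[4.0, 8.0]`. Gating per position is what lets one model serve both ordinary and multi-token predictions.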

9. IAM: Efficient Inference through Attention Mapping between Different-scale LLMs

ArXiv ID: 2507.11953

Authors: Yi Zhao, Zuchao Li, Hai Zhao

Abstract: LLMs encounter significant challenges in resource consumption nowadays, especially with long contexts. Despite extensive efforts dedicated to enhancing inference efficiency, these methods primarily exploit internal sparsity within the models, without leveraging external information for optimization. We identify the high similarity of attention matrices across different-scale LLMs, which offers a novel perspective for optimization. We first conduct a comprehensive analysis of how to measure similarity, how to select mapping layers, and whether the mapping is consistent. Based on these insights, we introduce the IAM framework, which achieves dual benefits of accelerated attention computation and reduced KV cache usage by performing attention mapping between small and large LLMs. Our experimental results demonstrate that IAM can accelerate prefill by 15% and reduce KV cache usage by 22.1% without appreciably sacrificing performance. Experiments on different series of models show the generalizability of IAM. Importantly, it is also orthogonal to many existing KV cache optimization methods, making it a versatile addition to the current toolkit for enhancing LLM efficiency.

Comment: The paper introduces the IAM framework for efficient inference in LLMs, which aligns with model compression by proposing a method to reduce resource consumption and improve efficiency.

Relevance: 9 Novelty: 7
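IAM starts from the observation that attention matrices of different-scale LLMs are highly similar, which suggests mapping each large-model layer to a matching small-model layer. As a hypothetical sketch (cosine similarity over flattened, head-averaged attention maps computed on the same input is an assumption here; the abstract does not state which similarity measure or selection rule IAM uses), nearest-similarity layer mapping could look like:

```python
import math


def cosine(a, b):
    """Cosine similarity of two flat vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def flatten(matrix):
    return [v for row in matrix for v in row]


def best_mapping(large_attn, small_attn):
    """For each large-model layer's attention matrix, return the index of the
    most similar small-model layer (matrices must cover the same tokens)."""
    mapping = []
    for big in large_attn:
        sims = [cosine(flatten(big), flatten(s)) for s in small_attn]
        mapping.append(max(range(len(sims)), key=sims.__getitem__))
    return mapping
```

Once a mapping is fixed, the large model could reuse the small model's attention for mapped layers instead of computing and caching its own, which is the source of the prefill and KV cache savings the abstract reports.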

10. Optimizers Qualitatively Alter Solutions And We Should Leverage This

ArXiv ID: 2507.12224

Authors: Razvan Pascanu, Clare Lyle, Ionut-Vlad Modoranu, Naima Elosegui Borras, Dan Alistarh, Petar Velickovic, Sarath Chandar, Soham De, James Martens

Abstract: Due to the nonlinear nature of Deep Neural Networks (DNNs), one cannot guarantee convergence to a unique global minimum of the loss when using optimizers relying only on local information, such as SGD. Indeed, this was a primary source of skepticism regarding the feasibility of DNNs in the early days of the field. The past decades of progress in deep learning have revealed this skepticism to be misplaced, and a large body of empirical evidence shows that sufficiently large DNNs following standard training protocols exhibit well-behaved optimization dynamics that converge to performant solutions. This success has biased the community to use convex optimization as a mental model for learning, leading to a focus on training efficiency, either in terms of required iterations, FLOPs or wall-clock time, when improving optimizers. We argue that, while this perspective has proven extremely fruitful, another perspective specific to DNNs has received considerably less attention: the optimizer not only influences the rate of convergence, but also the qualitative properties of the learned solutions. Restated, the optimizer can and will encode inductive biases and change the effective expressivity of a given class of models. Furthermore, we believe the optimizer can be an effective way of encoding desiderata in the learning process. We contend that the community should aim at understanding the biases of already existing methods, as well as aim to build new optimizers with the explicit intent of inducing certain properties of the solution, rather than solely judging them based on their convergence rates. We hope our arguments will inspire research to improve our understanding of how the learning process can impact the type of solution we converge to, and lead to a greater recognition of optimizer design as a critical lever that complements the roles of architecture and data in shaping model outcomes.

Comment: The paper discusses the role of optimizers in influencing the qualitative properties of learned solutions, which is relevant to understanding training dynamics in neural networks.

Relevance: 8 Novelty: 8

11. FactorHD: A Hyperdimensional Computing Model for Multi-Object Multi-Class Representation and Factorization

ArXiv ID: 2507.12366

Authors: Yifei Zhou, Xuchu Huang, Chenyu Ni, Min Zhou, Zheyu Yan, Xunzhao Yin, Cheng Zhuo

Abstract: Neuro-symbolic artificial intelligence (neuro-symbolic AI) excels in logical analysis and reasoning. Hyperdimensional Computing (HDC), a promising brain-inspired computational model, is integral to neuro-symbolic AI. Various HDC models have been proposed to represent class-instance and class-class relations, but when representing the more complex class-subclass relation, where multiple objects associate with different levels of classes and subclasses, they face challenges for factorization, a crucial task for neuro-symbolic AI systems. In this article, we propose FactorHD, a novel HDC model capable of representing and factorizing the complex class-subclass relation efficiently. FactorHD features a symbolic encoding method that embeds an extra memorization clause, preserving more information for multiple objects. In addition, it employs an efficient factorization algorithm that selectively eliminates redundant classes by identifying the memorization clause of the target class. Such a model significantly enhances computing efficiency and accuracy in representing and factorizing multiple objects with class-subclass relation, overcoming limitations of existing HDC models such as "superposition catastrophe" and "the problem of 2". Evaluations show that FactorHD achieves approximately 5667x speedup at a representation size of 10^9 compared to existing HDC models. When integrated with the ResNet-18 neural network, FactorHD achieves 92.48% factorization accuracy on the CIFAR-10 dataset.

Comment: The paper introduces a novel HDC model for efficient representation and factorization, which is relevant to emerging trends in neuro-symbolic AI.

Relevance: 8 Novelty: 8
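FactorHD builds on standard HDC operations. For background, a minimal sketch of generic MAP-style (multiply-add-permute) bipolar HDC follows, showing binding, bundling (superposition), unbinding, and codebook cleanup. It is not FactorHD's encoding, memorization clause, or factorization algorithm; the dimension, seed, and codebook names are illustrative.

```python
import random


def random_hv(dim, rng):
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bind(a, b):
    """Elementwise product; for bipolar vectors, bind(bind(a, b), b) == a."""
    return [x * y for x, y in zip(a, b)]


def bundle(*vectors):
    """Superposition by elementwise majority (sign of the sum; ties -> +1)."""
    return [1 if sum(col) >= 0 else -1 for col in zip(*vectors)]


def similarity(a, b):
    """Normalized dot product in [-1, 1]."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def cleanup(query, codebook):
    """Return the codebook label whose vector is most similar to the query."""
    return max(codebook, key=lambda label: similarity(query, codebook[label]))


rng = random.Random(0)
dim = 2048
colors = {name: random_hv(dim, rng) for name in ("red", "blue", "green")}
shapes = {name: random_hv(dim, rng) for name in ("circle", "square", "triangle")}
scene = bundle(bind(colors["red"], shapes["circle"]),
               bind(colors["blue"], shapes["square"]),
               bind(colors["green"], shapes["triangle"]))
# Unbinding with "red" leaves a noisy copy of "circle" that cleanup can recover.
recovered = cleanup(bind(scene, colors["red"]), shapes)
```

As more bound pairs are bundled, the recovered vector gets noisier; that growing crosstalk is one face of the "superposition catastrophe" the abstract says FactorHD is designed to overcome.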

12. Einstein Fields: A Neural Perspective To Computational General Relativity

ArXiv ID: 2507.11589

Authors: Sandeep Suresh Cranganore, Andrei Bodnar, Arturs Berzins, Johannes Brandstetter

Abstract: We introduce Einstein Fields, a neural representation that is designed to compress computationally intensive four-dimensional numerical relativity simulations into compact implicit neural network weights. By modeling the metric, which is the core tensor field of general relativity, Einstein Fields enable the derivation of physical quantities via automatic differentiation. However, unlike conventional neural fields (e.g., signed distance, occupancy, or radiance fields), Einstein Fields are Neural Tensor Fields with the key difference that when encoding the spacetime geometry of general relativity into neural field representations, dynamics emerge naturally as a byproduct. Einstein Fields show remarkable potential, including continuum modeling of 4D spacetime, mesh-agnosticity, storage efficiency, derivative accuracy, and ease of use. We address these challenges across several canonical test beds of general relativity and release an open source JAX-based library, paving the way for more scalable and expressive approaches to numerical relativity. Code is made available at https://github.com/AndreiB137/EinFields

Comment: The paper introduces a neural representation for computational general relativity, which is relevant to AI for Science with a focus on foundational research.

Relevance: 8 Novelty: 8
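When the metric g_{μν} is the modeled quantity, derivative accuracy matters because standard geometric quantities are built from its derivatives: the Christoffel symbols need first derivatives of the metric, and the Riemann curvature tensor needs second derivatives (through derivatives of the Christoffel symbols). These are textbook general-relativity identities, shown for context; the abstract does not list which derived quantities the paper evaluates.

```latex
\begin{aligned}
\Gamma^{\lambda}_{\mu\nu} &= \tfrac{1}{2}\, g^{\lambda\sigma}\left(\partial_{\mu} g_{\sigma\nu} + \partial_{\nu} g_{\sigma\mu} - \partial_{\sigma} g_{\mu\nu}\right) \\
R^{\rho}{}_{\sigma\mu\nu} &= \partial_{\mu}\Gamma^{\rho}_{\nu\sigma} - \partial_{\nu}\Gamma^{\rho}_{\mu\sigma} + \Gamma^{\rho}_{\mu\lambda}\Gamma^{\lambda}_{\nu\sigma} - \Gamma^{\rho}_{\nu\lambda}\Gamma^{\lambda}_{\mu\sigma}
\end{aligned}
```

With a neural field for g_{μν}, the partial derivatives above can be obtained by automatic differentiation of the network rather than by finite differences on a mesh.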

13. Newfluence: Boosting Model interpretability and Understanding in High Dimensions

ArXiv ID: 2507.11895

Authors: Haolin Zou, Arnab Auddy, Yongchan Kwon, Kamiar Rahnama Rad, Arian Maleki

Abstract: The increasing complexity of machine learning (ML) and artificial intelligence (AI) models has created a pressing need for tools that help scientists, engineers, and policymakers interpret and refine model decisions and predictions. Influence functions, originating from robust statistics, have emerged as a popular approach for this purpose. However, the heuristic foundations of influence functions rely on low-dimensional assumptions where the number of parameters $p$ is much smaller than the number of observations $n$. In contrast, modern AI models often operate in high-dimensional regimes with large $p$, challenging these assumptions. In this paper, we examine the accuracy of influence functions in high-dimensional settings. Our theoretical and empirical analyses reveal that influence functions cannot reliably fulfill their intended purpose. We then introduce an alternative approximation, called Newfluence, that maintains similar computational efficiency while offering significantly improved accuracy. Newfluence is expected to provide more accurate insights than many existing methods for interpreting complex AI models and diagnosing their issues. Moreover, the high-dimensional framework we develop in this paper can also be applied to analyze other popular techniques, such as Shapley values.

Comment: The paper introduces Newfluence, an alternative to influence functions for model interpretability in high dimensions, which is relevant to representation learning.

Relevance: 8 Novelty: 8
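For reference, the classical influence-function approximation to leave-one-out retraining is the first-order expansion below (standard background, assuming a twice-differentiable loss ℓ averaged over n observations z_j with p parameters). The paper argues this approximation becomes unreliable when p is not small relative to n. The abstract does not give Newfluence's formula, so it is not reproduced here.

```latex
\begin{aligned}
\hat\theta &= \arg\min_{\theta}\ \frac{1}{n}\sum_{j=1}^{n} \ell(z_j, \theta),
\qquad H = \frac{1}{n}\sum_{j=1}^{n} \nabla_\theta^{2}\, \ell(z_j, \hat\theta) \\
\hat\theta_{-i} &\approx \hat\theta + \frac{1}{n}\, H^{-1} \nabla_\theta\, \ell(z_i, \hat\theta)
\end{aligned}
```

Here \hat\theta_{-i} denotes the estimate retrained without observation z_i; the expansion treats removal as a small perturbation of size 1/n around \hat\theta.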

14. Incorporating Fairness Constraints into Archetypal Analysis

ArXiv ID: 2507.12021

Authors: Aleix Alcacer, Irene Epifanio

Abstract: Archetypal Analysis (AA) is an unsupervised learning method that represents data as convex combinations of extreme patterns called archetypes. While AA provides interpretable and low-dimensional representations, it can inadvertently encode sensitive attributes, leading to fairness concerns. In this work, we propose Fair Archetypal Analysis (FairAA), a modified formulation that explicitly reduces the influence of sensitive group information in the learned projections. We also introduce FairKernelAA, a nonlinear extension that addresses fairness in more complex data distributions. Our approach incorporates a fairness regularization term while preserving the structure and interpretability of the archetypes. We evaluate FairAA and FairKernelAA on synthetic datasets, including linear, nonlinear, and multi-group scenarios, demonstrating their ability to reduce group separability -- as measured by maximum mean discrepancy and linear separability -- without substantially compromising explained variance. We further validate our methods on the real-world ANSUR I dataset, confirming their robustness and practical utility. The results show that FairAA achieves a favorable trade-off between utility and fairness, making it a promising tool for responsible representation learning in sensitive applications.

Comment: The paper focuses on representation learning by proposing Fair Archetypal Analysis, which modifies Archetypal Analysis to incorporate fairness constraints, aligning with the representation learning criterion.
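The classical archetypal analysis objective that FairAA modifies can be written as follows, for a data matrix X ∈ ℝ^{n×p} and k archetypes Z = BX, where each archetype is a convex combination of data points and each data point is approximated by a convex combination of archetypes. This is the standard formulation, shown for context; FairAA adds a fairness regularization term whose exact form the abstract does not give, so it is omitted.

```latex
\min_{A,\,B}\ \lVert X - A B X \rVert_F^{2}
\quad \text{s.t.}\quad
A \in \mathbb{R}^{n\times k}_{\ge 0},\;
B \in \mathbb{R}^{k\times n}_{\ge 0},\;
A\,\mathbf{1}_k = \mathbf{1}_n,\;
B\,\mathbf{1}_n = \mathbf{1}_k
```

The rows of A are the "learned projections" onto the archetypes; these are the representations in which a fairness penalty would reduce group separability.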