Personalized Daily ArXiv Papers 2025-07-29

[gpt-4o]	Prompt	Completion	Total
Token	41545	5151	46696
Cost	$0.1	$0.05	$0.16

Total arXiv papers: 801

Total scanned papers: 478

Total relevant papers: 30

Table of contents with paper titles:

Quantum-Informed Machine Learning for Chaotic Systems Authors: Maida Wang, Xiao Xue, Peter V. Coveney
Enhancing Large Multimodal Models with Adaptive Sparsity and KV Cache Compression Authors: Te Zhang, Yuheng Li, Junxiang Wang, Lujun Li
TransPrune: Token Transition Pruning for Efficient Large Vision-Language Model Authors: Ao Li, Yuxiang Duan, Jinghui Zhang, Congbo Ma, Yutong Xie, Gustavo Carneiro, Mohammad Yaqub, Hu Wang
Transformers as Unrolled Inference in Probabilistic Laplacian Eigenmaps: An Interpretation and Potential Improvements Authors: Aditya Ravuri, Neil D. Lawrence
Feature learning is decoupled from generalization in high capacity neural networks Authors: Niclas Alexander Göring, Charles London, Abdurrahman Hadi Erturk, Chris Mingard, Yoonsoo Nam, Ard A. Louis
State evolution beyond first-order methods I: Rigorous predictions and finite-sample guarantees Authors: Michael Celentano, Chen Cheng, Ashwin Pananjady, Kabir Aladin Verchand
HCAttention: Extreme KV Cache Compression via Heterogeneous Attention Computing for LLMs Authors: Dongquan Yang, Yifan Yang, Xiaotian Yu, Xianbiao Qi, Rong Xiao
MeLA: A Metacognitive LLM-Driven Architecture for Automatic Heuristic Design Authors: Zishang Qiu, Xinan Chen, Long Chen, Ruibin Bai
Enhancing Materials Discovery with Valence Constrained Design in Generative Modeling Authors: Mouyang Cheng, Weiliang Luo, Hao Tang, Bowen Yu, Yongqiang Cheng, Weiwei Xie, Ju Li, Heather J. Kulik, Mingda Li
Dimer-Enhanced Optimization: A First-Order Approach to Escaping Saddle Points in Neural Network Training Authors: Yue Hu, Zanxia Cao, Yingchao Liu
The wall confronting large language models Authors: Peter V. Coveney, Sauro Succi
WEEP: A Differentiable Nonconvex Sparse Regularizer via Weakly-Convex Envelope Authors: Takanobu Furuhashi, Hidekata Hontani, Tatsuya Yokota
EcoTransformer: Attention without Multiplication Authors: Xin Gao, Xingming Xu
Memorization in Fine-Tuned Large Language Models Authors: Danil Savine, Muni Sreenivas Pydi, Jamal Atif, Olivier Cappé
Mixture of Length and Pruning Experts for Knowledge Graphs Reasoning Authors: Enjun Du, Siyi Liu, Yongqi Zhang
DeepJIVE: Learning Joint and Individual Variation Explained from Multimodal Data Using Deep Learning Authors: Matthew Drexler, Benjamin Risk, James J Lah, Suprateek Kundu, Deqiang Qiu
Iterative Pretraining Framework for Interatomic Potentials Authors: Taoyong Cui, Zhongyao Wang, Dongzhan Zhou, Yuqiang Li, Lei Bai, Wanli Ouyang, Mao Su, Shufei Zhang
Bayesian symbolic regression: Automated equation discovery from a physicists' perspective Authors: Roger Guimera, Marta Sales-Pardo
CLoRA: Parameter-Efficient Continual Learning with Low-Rank Adaptation Authors: Shishir Muralidhara, Didier Stricker, René Schuster
Sparse-mode Dynamic Mode Decomposition for Disambiguating Local and Global Structures Authors: Sara M. Ichinaga, Steven L. Brunton, Aleksandr Y. Aravkin, J. Nathan Kutz
Bag of Coins: A Statistical Probe into Neural Confidence Structures Authors: Agnideep Aich, Ashit Baran Aich, Md Monzur Murshed, Sameera Hewage, Bruce Wade
Communication-Efficient Distributed Training for Collaborative Flat Optima Recovery in Deep Learning Authors: Tolga Dimlioglu, Anna Choromanska
Frequency-Aware Autoregressive Modeling for Efficient High-Resolution Image Synthesis Authors: Zhuokun Chen, Jugang Fan, Zhuowei Yu, Bohan Zhuang, Mingkui Tan
Quantizing Text-attributed Graphs for Semantic-Structural Integration Authors: Jianyuan Bo, Hao Wu, Yuan Fang
Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning Authors: Zedong Wang, Siyuan Li, Dan Xu
Kolmogorov Arnold Network Autoencoder in Medicine Authors: Ugo Lomoio, Pierangelo Veltri, Pietro Hiram Guzzi
Diagonally-Weighted Generalized Method of Moments Estimation for Gaussian Mixture Modeling Authors: Liu Zhang, Oscar Mickelin, Sheng Xu, Amit Singer
Regularizing Subspace Redundancy of Low-Rank Adaptation Authors: Yue Zhu, Haiwen Diao, Shang Gao, Jiazuo Yu, Jiawen Zhu, Yunzhi Zhuge, Shuai Hao, Xu Jia, Lu Zhang, Ying Zhang, Huchuan Lu
DeltaLLM: A Training-Free Framework Exploiting Temporal Sparsity for Efficient Edge LLM Inference Authors: Jiawen Qi, Chang Gao, Zhaochun Ren, Qinyu Chen
Nonconvex Optimization Framework for Group-Sparse Feedback Linear-Quadratic Optimal Control. II: Non-Penalty Approach Authors: Lechen Feng, Xun Li, Yuan-Hua Ni

1. Quantum-Informed Machine Learning for Chaotic Systems

ArXiv ID: 2507.19861

Authors: Maida Wang, Xiao Xue, Peter V. Coveney

Abstract: Learning the behaviour of chaotic systems remains challenging due to instability in long-term predictions and difficulties in accurately capturing invariant statistical properties. While quantum machine learning offers a promising route to efficiently capture physical properties from high-dimensional data, its practical deployment is hindered by current hardware noise and limited scalability. We introduce a quantum-informed machine learning framework for learning partial differential equations, with an application focus on chaotic systems. A quantum circuit Born machine is employed to learn the invariant properties of chaotic dynamical systems, achieving substantial memory efficiency by representing these complex physical statistics with a compact set of trainable circuit parameters. This approach reduces the data storage requirement by over two orders of magnitude compared to the raw simulation data. The resulting statistical quantum-informed prior is then incorporated into a Koopman-based auto-regressive model to address issues such as gradient vanishing or explosion, while maintaining long-term statistical fidelity. The framework is evaluated on three representative systems: the Kuramoto-Sivashinsky equation, two-dimensional Kolmogorov flow and turbulent channel flow. In all cases, the quantum-informed model achieves superior performance compared to its classical counterparts without quantum priors. This hybrid architecture offers a practical route for learning dynamical systems using near-term quantum hardware.

Comment: The paper introduces a quantum-informed machine learning framework, which is an emerging trend in AI for science, offering a new paradigm for learning chaotic systems.

Relevance: 9 Novelty: 8

2. Enhancing Large Multimodal Models with Adaptive Sparsity and KV Cache Compression

ArXiv ID: 2507.20613

Authors: Te Zhang, Yuheng Li, Junxiang Wang, Lujun Li

Abstract: Large multimodal models (LMMs) have advanced significantly by integrating visual encoders with extensive language models, enabling robust reasoning capabilities. However, compressing LMMs for deployment on edge devices remains a critical challenge. In this work, we propose an adaptive search algorithm that optimizes sparsity and KV cache compression to enhance LMM efficiency. Utilizing the Tree-structured Parzen Estimator, our method dynamically adjusts pruning ratios and KV cache quantization bandwidth across different LMM layers, using model performance as the optimization objective. This approach uniquely combines pruning with key-value cache quantization and incorporates a fast pruning technique that eliminates the need for additional fine-tuning or weight adjustments, achieving efficient compression without compromising accuracy. Comprehensive evaluations on benchmark datasets, including LLaVA-1.5 7B and 13B, demonstrate our method's superiority over state-of-the-art techniques such as SparseGPT and Wanda across various compression levels. Notably, our framework's automatic allocation of KV cache compression resources sets a new standard in LMM optimization, delivering memory efficiency without sacrificing much performance.

Comment: The paper proposes an adaptive search algorithm for sparsity and KV cache compression in large multimodal models, relevant to model compression.

Relevance: 9 Novelty: 8

3. TransPrune: Token Transition Pruning for Efficient Large Vision-Language Model

ArXiv ID: 2507.20630

Authors: Ao Li, Yuxiang Duan, Jinghui Zhang, Congbo Ma, Yutong Xie, Gustavo Carneiro, Mohammad Yaqub, Hu Wang

Abstract: Large Vision-Language Models (LVLMs) have advanced multimodal learning but face high computational costs due to the large number of visual tokens, motivating token pruning to improve inference efficiency. The key challenge lies in identifying which tokens are truly important. Most existing approaches rely on attention-based criteria to estimate token importance. However, they inherently suffer from certain limitations, such as positional bias. In this work, we explore a new perspective on token importance based on token transitions in LVLMs. We observe that the transition of token representations provides a meaningful signal of semantic information. Based on this insight, we propose TransPrune, a training-free and efficient token pruning method. Specifically, TransPrune progressively prunes tokens by assessing their importance through a combination of Token Transition Variation (TTV), which measures changes in both the magnitude and direction of token representations, and Instruction-Guided Attention (IGA), which measures how strongly the instruction attends to image tokens via attention. Extensive experiments demonstrate that TransPrune achieves comparable multimodal performance to original LVLMs, such as LLaVA-v1.5 and LLaVA-Next, across eight benchmarks, while reducing inference TFLOPs by more than half. Moreover, TTV alone can serve as an effective criterion without relying on attention, achieving performance comparable to attention-based methods. The code will be made publicly available upon acceptance of the paper at https://github.com/liaolea/TransPrune.

Comment: The paper introduces a novel token pruning method for large vision-language models, focusing on efficiency improvements through token transition analysis, which aligns with model compression and efficiency breakthroughs.

Relevance: 9 Novelty: 8
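
Sketch: the TransPrune abstract describes TTV only as measuring changes in the magnitude and direction of token representations. The snippet below is one hypothetical way to score tokens on that idea. The function names, the plain sum used to combine the two terms, and the top-k selection are all assumptions made for illustration, not the authors' formulation, and IGA is left out entirely.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def transition_score(h_in, h_out):
    """Hypothetical score: magnitude change plus direction change of one token."""
    n_in, n_out = norm(h_in), norm(h_out)
    magnitude_change = abs(n_out - n_in)
    if n_in == 0.0 or n_out == 0.0:
        direction_change = 0.0  # direction is undefined for a zero vector
    else:
        cos = sum(a * b for a, b in zip(h_in, h_out)) / (n_in * n_out)
        direction_change = 1.0 - cos
    # Assumption: the two terms are simply added.
    return magnitude_change + direction_change


def keep_top_k(tokens_in, tokens_out, k):
    """Return the sorted indices of the k tokens with the largest score."""
    scores = [transition_score(a, b) for a, b in zip(tokens_in, tokens_out)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

Here `tokens_in` and `tokens_out` stand for the representations of the same tokens before and after some layer; which layers are compared, and how often pruning happens, is not stated in the abstract.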

4. Transformers as Unrolled Inference in Probabilistic Laplacian Eigenmaps: An Interpretation and Potential Improvements

ArXiv ID: 2507.21040

Authors: Aditya Ravuri, Neil D. Lawrence

Abstract: We propose a probabilistic interpretation of transformers as unrolled inference steps assuming a probabilistic Laplacian Eigenmaps model from the ProbDR framework. Our derivation shows that at initialisation, transformers perform "linear" dimensionality reduction. We also show that within the transformer block, a graph Laplacian term arises from our arguments, rather than an attention matrix (which we interpret as an adjacency matrix). We demonstrate that simply subtracting the identity from the attention matrix (and thereby taking a graph diffusion step) improves validation performance on a language model and a simple vision transformer.

Comment: This paper provides a probabilistic interpretation of transformers, offering insights into their structure and potential improvements, which aligns with model architecture analysis.

Relevance: 9 Novelty: 8
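
Sketch: the abstract for this entry says that subtracting the identity from the attention matrix amounts to a graph diffusion step. The pure-Python snippet below illustrates that one operation for a single head. Where the change sits inside a full transformer block (residual connections, projections, normalisation) is not given in the abstract, so everything beyond `(A - I) V` is an assumption, and the function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(Q, K):
    """Row-stochastic scaled dot-product attention matrix A."""
    d = len(Q[0])
    scores = [
        [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        for q in Q
    ]
    return [softmax(row) for row in scores]


def diffusion_step(A, V):
    """Compute (A - I) V: aggregate neighbours, then subtract the token itself."""
    n, width = len(A), len(V[0])
    out = []
    for i in range(n):
        row = []
        for c in range(width):
            aggregated = sum(A[i][j] * V[j][c] for j in range(n))
            row.append(aggregated - V[i][c])
        out.append(row)
    return out
```

Because each row of `A` sums to one, each row of `A - I` sums to zero, which is the defining property of a (negated) random-walk graph Laplacian: a constant signal `V` is mapped to zero.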

5. Feature learning is decoupled from generalization in high capacity neural networks

ArXiv ID: 2507.19680

Authors: Niclas Alexander Göring, Charles London, Abdurrahman Hadi Erturk, Chris Mingard, Yoonsoo Nam, Ard A. Louis

Abstract: Neural networks outperform kernel methods, sometimes by orders of magnitude, e.g. on staircase functions. This advantage stems from the ability of neural networks to learn features, adapting their hidden representations to better capture the data. We introduce a concept we call feature quality to measure this performance improvement. We examine existing theories of feature learning and demonstrate empirically that they primarily assess the strength of feature learning, rather than the quality of the learned features themselves. Consequently, current theories of feature learning do not provide a sufficient foundation for developing theories of neural network generalization.

Comment: The paper examines feature learning in neural networks, providing insights into representation learning and generalization, which aligns with foundational research in representation learning.

Relevance: 9 Novelty: 8

6. State evolution beyond first-order methods I: Rigorous predictions and finite-sample guarantees

ArXiv ID: 2507.19611

Authors: Michael Celentano, Chen Cheng, Ashwin Pananjady, Kabir Aladin Verchand

Abstract: We develop a toolbox for exact analysis of iterative algorithms on a class of high-dimensional nonconvex optimization problems with random data. While prior work has shown that low-dimensional statistics of (generalized) first-order methods can be predicted by a deterministic recursion known as state evolution, our focus is on developing such a prediction for a more general class of algorithms. We provide a state evolution for any method whose iterations are given by (possibly interleaved) first-order and saddle point updates, showing two main results. First, we establish a rigorous state evolution prediction that holds even when the updates are not coordinate-wise separable. Second, we establish finite-sample guarantees bounding the deviation of the empirical updates from the established state evolution. In the process, we develop a technical toolkit that may prove useful in related problems. One component of this toolkit is a general Hilbert space lifting technique to prove existence and uniqueness of a convenient parameterization of the state evolution. Another component of the toolkit combines a generic application of Bolthausen's conditioning method with a sequential variant of Gordon's Gaussian comparison inequality, and provides additional ingredients that enable a general finite-sample analysis.

Comment: The paper provides a theoretical framework for state evolution in high-dimensional nonconvex optimization, which is relevant to representation learning and emerging trends.

Relevance: 9 Novelty: 8

7. HCAttention: Extreme KV Cache Compression via Heterogeneous Attention Computing for LLMs

ArXiv ID: 2507.19823

Authors: Dongquan Yang, Yifan Yang, Xiaotian Yu, Xianbiao Qi, Rong Xiao

Abstract: Processing long-context inputs with large language models presents a significant challenge due to the enormous memory requirements of the Key-Value (KV) cache during inference. Existing KV cache compression methods exhibit noticeable performance degradation when memory is reduced by more than 85%. Additionally, strategies that leverage GPU-CPU collaboration for approximate attention remain underexplored in this setting. We propose HCAttention, a heterogeneous attention computation framework that integrates key quantization, value offloading, and dynamic KV eviction to enable efficient inference under extreme memory constraints. The method is compatible with existing transformer architectures and does not require model fine-tuning. Experimental results on the LongBench benchmark demonstrate that our approach preserves the accuracy of the full-attention model while shrinking the KV cache memory footprint to 25% of its original size. Remarkably, it stays competitive with only 12.5% of the cache, setting a new state-of-the-art in LLM KV cache compression. To the best of our knowledge, HCAttention is the first to extend the Llama-3-8B model to process 4 million tokens on a single A100 GPU with 80GB memory.

Comment: The paper presents a new method for KV cache compression in LLMs, which is relevant to model compression and efficiency.

Relevance: 9 Novelty: 8
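
Sketch: to put the 25% and 12.5% figures from the HCAttention abstract in context, the snippet below does back-of-the-envelope KV cache arithmetic. The model configuration (32 layers, 8 KV heads, head dimension 128, 2-byte values) is an assumption about Llama-3-8B and does not come from the abstract. The calculation ignores model weights, activations, and the fact that HCAttention offloads values to CPU memory.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_value):
    """Bytes of KV cache stored for a single token."""
    # Factor 2: one key vector and one value vector per layer and KV head.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


def kv_cache_gib(num_tokens, bytes_per_token, kept_fraction=1.0):
    """Cache size in GiB when only `kept_fraction` of the cache is retained."""
    return num_tokens * bytes_per_token * kept_fraction / 2**30


# Assumed configuration: 32 layers, 8 KV heads (grouped-query attention),
# head dimension 128, fp16 storage (2 bytes per value).
PER_TOKEN = kv_cache_bytes_per_token(32, 8, 128, 2)

if __name__ == "__main__":
    tokens = 4_000_000
    for fraction in (1.0, 0.25, 0.125):
        size = kv_cache_gib(tokens, PER_TOKEN, fraction)
        print(f"{fraction:.1%} of cache: {size:.1f} GiB")
```

Under these assumptions, an uncompressed cache for 4 million tokens would be roughly 488 GiB, and 12.5% of it about 61 GiB, which shows why a plain full-precision cache cannot fit on a single 80GB device at that context length.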

8. MeLA: A Metacognitive LLM-Driven Architecture for Automatic Heuristic Design

ArXiv ID: 2507.20541

Authors: Zishang Qiu, Xinan Chen, Long Chen, Ruibin Bai

Abstract: This paper introduces MeLA, a Metacognitive LLM-Driven Architecture that presents a new paradigm for Automatic Heuristic Design (AHD). Traditional evolutionary methods operate directly on heuristic code; in contrast, MeLA evolves the instructional prompts used to guide a Large Language Model (LLM) in generating these heuristics. This process of "prompt evolution" is driven by a novel metacognitive framework where the system analyzes performance feedback to systematically refine its generative strategy. MeLA's architecture integrates a problem analyzer to construct an initial strategic prompt, an error diagnosis system to repair faulty code, and a metacognitive search engine that iteratively optimizes the prompt based on heuristic effectiveness. In comprehensive experiments across both benchmark and real-world problems, MeLA consistently generates more effective and robust heuristics, significantly outperforming state-of-the-art methods. Ultimately, this research demonstrates the profound potential of using cognitive science as a blueprint for AI architecture, revealing that by enabling an LLM to metacognitively regulate its problem-solving process, we unlock a more robust and interpretable path to AHD.

Comment: The paper introduces a novel architecture, MeLA, which uses a metacognitive framework to evolve prompts for LLMs, aligning with the Large Language Models criterion.

Relevance: 9 Novelty: 8

9. Enhancing Materials Discovery with Valence Constrained Design in Generative Modeling

ArXiv ID: 2507.19799

Authors: Mouyang Cheng, Weiliang Luo, Hao Tang, Bowen Yu, Yongqiang Cheng, Weiwei Xie, Ju Li, Heather J. Kulik, Mingda Li

Abstract: Diffusion-based deep generative models have emerged as powerful tools for inverse materials design. Yet, many existing approaches overlook essential chemical constraints such as oxidation state balance, which can lead to chemically invalid structures. Here we introduce CrysVCD (Crystal generator with Valence-Constrained Design), a modular framework that integrates chemical rules directly into the generative process. CrysVCD first employs a transformer-based elemental language model to generate valence-balanced compositions, followed by a diffusion model to generate crystal structures. The valence constraint enables orders-of-magnitude more efficient chemical valence checking, compared to pure data-driven approaches with post-screening. When fine-tuned on stability metrics, CrysVCD achieves 85% thermodynamic stability and 68% phonon stability. Moreover, CrysVCD supports conditional generation of functional materials, enabling discovery of candidates such as high thermal conductivity semiconductors and high-κ dielectric compounds. Designed as a general-purpose plugin, CrysVCD can be integrated into diverse generative pipelines to promote chemical validity, offering a reliable, scientifically grounded path for materials discovery.

Comment: CrysVCD integrates chemical rules into generative modeling for materials discovery, aligning with AI for Science through foundational research in molecular modeling.

Relevance: 9 Novelty: 8
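
Sketch: the CrysVCD abstract contrasts valence-constrained generation with post-screening for oxidation state balance. The snippet below shows what such a post-hoc balance check can look like in its simplest form. It is not CrysVCD's method: the oxidation-state table is a tiny illustrative subset, the function name is hypothetical, and the check assumes every atom of an element takes the same oxidation state, so mixed-valence compounds are rejected.

```python
from itertools import product

# Small illustrative table of common oxidation states (deliberately incomplete).
OXIDATION_STATES = {
    "Na": [1],
    "Cl": [-1],
    "Fe": [2, 3],
    "O": [-2],
}


def is_valence_balanced(composition, states=OXIDATION_STATES):
    """True if some assignment of one oxidation state per element sums to zero.

    `composition` maps element symbol -> number of atoms in the formula unit.
    """
    elements = list(composition)
    for combo in product(*(states[e] for e in elements)):
        total = sum(q * composition[e] for q, e in zip(combo, elements))
        if total == 0:
            return True
    return False
```

For example, `is_valence_balanced({"Fe": 2, "O": 3})` succeeds with Fe(+3), while `{"Fe": 3, "O": 4}` fails under the single-state assumption because magnetite mixes Fe(+2) and Fe(+3).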

10. Dimer-Enhanced Optimization: A First-Order Approach to Escaping Saddle Points in Neural Network Training

ArXiv ID: 2507.19968

Authors: Yue Hu, Zanxia Cao, Yingchao Liu

Abstract: First-order optimization methods, such as SGD and Adam, are widely used for training large-scale deep neural networks due to their computational efficiency and robust performance. However, relying solely on gradient information, these methods often struggle to navigate complex loss landscapes with flat regions, plateaus, and saddle points. Second-order methods, which use curvature information from the Hessian matrix, can address these challenges but are computationally infeasible for large models. The Dimer method, a first-order technique that constructs two closely spaced points to probe the local geometry of a potential energy surface, efficiently estimates curvature using only gradient information. Inspired by its use in molecular dynamics simulations for locating saddle points, we propose Dimer-Enhanced Optimization (DEO), a novel framework to escape saddle points in neural network training. DEO adapts the Dimer method to explore a broader region of the loss landscape, approximating the Hessian's smallest eigenvector without computing the full matrix. By periodically projecting the gradient onto the subspace orthogonal to the minimum curvature direction, DEO guides the optimizer away from saddle points and flat regions, enhancing training efficiency with non-stepwise updates. Preliminary experiments on a Transformer toy model show DEO achieves competitive performance compared to standard first-order methods, improving navigation of complex loss landscapes. Our work repurposes physics-inspired, first-order curvature estimation to enhance neural network training in high-dimensional spaces.

Comment: The paper introduces Dimer-Enhanced Optimization, a novel first-order method inspired by physics to escape saddle points in neural network training, aligning with representation learning and training dynamics.

Relevance: 9 Novelty: 8
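
Sketch: the DEO abstract rests on two ingredients, estimating curvature from gradients at two closely spaced points and projecting the gradient onto the subspace orthogonal to the minimum-curvature direction. The snippet below illustrates both on a toy saddle f(x, y) = x^2 - y^2. The rotation rule (a simple Rayleigh-quotient descent), the step sizes, and all function names are assumptions chosen for illustration. The paper's actual update rule and schedule are not given in the abstract.

```python
import math


def saddle_grad(p):
    """Gradient of the toy saddle f(x, y) = x^2 - y^2."""
    x, y = p
    return [2.0 * x, -2.0 * y]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def hessian_vector(grad_fn, p, n, eps=1e-3):
    """Approximate H n from gradients at the two dimer endpoints p +/- eps*n."""
    plus = [pi + eps * ni for pi, ni in zip(p, n)]
    minus = [pi - eps * ni for pi, ni in zip(p, n)]
    return [(a - b) / (2.0 * eps) for a, b in zip(grad_fn(plus), grad_fn(minus))]


def dimer_curvature(grad_fn, p, n, eps=1e-3):
    """Curvature n^T H n along unit direction n, using gradients only."""
    return dot(hessian_vector(grad_fn, p, n, eps), n)


def min_curvature_direction(grad_fn, p, n0, steps=200, lr=0.1):
    """Rotate n toward the lowest-curvature direction (Rayleigh-quotient descent)."""
    n = normalize(n0)
    for _ in range(steps):
        hn = hessian_vector(grad_fn, p, n)
        c = dot(hn, n)
        # Tangential part of H n; stepping against it lowers the curvature.
        n = normalize([ni - lr * (h - c * ni) for ni, h in zip(n, hn)])
    return n


def project_out(g, n):
    """Remove the component of g along the unit vector n."""
    coeff = dot(g, n)
    return [gi - coeff * ni for gi, ni in zip(g, n)]
```

On the toy saddle the recovered direction is the y-axis, where the curvature is -2, and the projected gradient keeps only the x-component.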

11. The wall confronting large language models

ArXiv ID: 2507.19703

Authors: Peter V. Coveney, Sauro Succi

Abstract: We show that the scaling laws which determine the performance of large language models (LLMs) severely limit their ability to improve the uncertainty of their predictions. As a result, raising their reliability to meet the standards of scientific inquiry is intractable by any reasonable measure. We argue that the very mechanism which fuels much of the learning power of LLMs, namely the ability to generate non-Gaussian output distributions from Gaussian input ones, might well be at the roots of their propensity to produce error pileup, ensuing information catastrophes and degenerative AI behaviour. This tension between learning and accuracy is a likely candidate mechanism underlying the observed low values of the scaling components. It is substantially compounded by the deluge of spurious correlations pointed out by Calude and Longo which rapidly increase in any data set merely as a function of its size, regardless of its nature. The fact that a degenerative AI pathway is a very probable feature of the LLM landscape does not mean that it must inevitably arise in all future AI research. Its avoidance, which we also discuss in this paper, necessitates putting a much higher premium on insight and understanding of the structural characteristics of the problems being investigated.

Comment: The paper discusses theoretical limitations of LLMs, which aligns with the interest in theoretical insights into LLM behavior.

Relevance: 9 Novelty: 8

12. WEEP: A Differentiable Nonconvex Sparse Regularizer via Weakly-Convex Envelope

ArXiv ID: 2507.20447

Authors: Takanobu Furuhashi, Hidekata Hontani, Tatsuya Yokota

Abstract: Sparse regularization is fundamental in signal processing for efficient signal recovery and feature extraction. However, it faces a fundamental dilemma: the most powerful sparsity-inducing penalties are often non-differentiable, conflicting with gradient-based optimizers that dominate the field. We introduce WEEP (Weakly-convex Envelope of Piecewise Penalty), a novel, fully differentiable sparse regularizer derived from the weakly-convex envelope framework. WEEP provides strong, unbiased sparsity while maintaining full differentiability and L-smoothness, making it natively compatible with any gradient-based optimizer. This resolves the conflict between statistical performance and computational tractability. We demonstrate superior performance compared to the L1-norm and other established non-convex sparse regularizers on challenging signal and image denoising tasks.

Comment: The paper introduces WEEP, a novel differentiable sparse regularizer, which aligns with the representation learning and model compression criteria by addressing sparsity and differentiability.

Relevance: 9 Novelty: 8

13. EcoTransformer: Attention without Multiplication

ArXiv ID: 2507.20096

Authors: Xin Gao, Xingming Xu

Abstract: The Transformer, with its scaled dot-product attention mechanism, has become a foundational architecture in modern AI. However, this mechanism is computationally intensive and incurs substantial energy costs. We propose a new Transformer architecture EcoTransformer, in which the output context vector is constructed as the convolution of the values using a Laplacian kernel, where the distances are measured by the L1 metric between the queries and keys. Compared to dot-product based attention, the new attention score calculation is free of matrix multiplication. It performs on par with, or even surpasses, scaled dot-product attention in NLP, bioinformatics, and vision tasks, while consuming significantly less energy.

Comment: EcoTransformer proposes a new Transformer architecture that reduces computational costs, aligning with model architecture innovations.

Relevance: 9 Novelty: 8
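
Sketch: the EcoTransformer abstract describes attention scores built from L1 distances between queries and keys, passed through a Laplacian kernel. The snippet below is a minimal single-head illustration of that idea. The `scale` parameter, the normalisation of the weights to sum to one, and the function name are assumptions, since the abstract does not specify them.

```python
import math


def l1_attention(Q, K, V, scale=1.0):
    """Attention with weights proportional to exp(-||q - k||_1 / scale)."""
    outputs = []
    for q in Q:
        # Scores need only subtractions, absolute values and additions.
        dists = [sum(abs(a - b) for a, b in zip(q, k)) for k in K]
        nearest = min(dists)  # shift for numerical stability
        weights = [math.exp(-(d - nearest) / scale) for d in dists]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted average of the value vectors.
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, V))
            for c in range(len(V[0]))
        ])
    return outputs
```

Only the score computation avoids multiplication here; the final weighted sum over values still multiplies, as in this sketch it is an ordinary average.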

14. Memorization in Fine-Tuned Large Language Models

ArXiv ID: 2507.21009

Authors: Danil Savine, Muni Sreenivas Pydi, Jamal Atif, Olivier Cappé

Abstract: This study investigates the mechanisms and factors influencing memorization in fine-tuned large language models (LLMs), with a focus on the medical domain due to its privacy-sensitive nature. We examine how different aspects of the fine-tuning process affect a model's propensity to memorize training data, using the PHEE dataset of pharmacovigilance events. Our research employs two main approaches: a membership inference attack to detect memorized data, and a generation task with prompted prefixes to assess verbatim reproduction. We analyze the impact of adapting different weight matrices in the transformer architecture, the relationship between perplexity and memorization, and the effect of increasing the rank in low-rank adaptation (LoRA) fine-tuning. Key findings include: (1) Value and Output matrices contribute more significantly to memorization compared to Query and Key matrices; (2) Lower perplexity in the fine-tuned model correlates with increased memorization; (3) Higher LoRA ranks lead to increased memorization, but with diminishing returns at higher ranks. These results provide insights into the trade-offs between model performance and privacy risks in fine-tuned LLMs. Our findings have implications for developing more effective and responsible strategies for adapting large language models while managing data privacy concerns.

Comment: The paper investigates memorization in fine-tuned LLMs, providing insights into LLM behavior and interpretability.

Relevance: 9 Novelty: 7
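
Sketch: the memorization abstract varies the LoRA rank and the weight matrices being adapted. As a reminder of what the rank controls, the snippet below shows the standard LoRA update W + (alpha / r) * B A and its trainable-parameter count. The function names are hypothetical and the code is a generic illustration, not the paper's experimental setup.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters in A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)


def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def lora_effective_weight(W, B, A, alpha, rank):
    """Merged weight W + (alpha / rank) * B @ A for a frozen W."""
    delta = matmul(B, A)
    scaling = alpha / rank
    return [
        [w + scaling * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]
```

The parameter count grows linearly with the rank: for a 4096 x 4096 matrix, rank 8 gives 65,536 trainable parameters and rank 64 gives 524,288, against about 16.8 million in the full matrix.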

15. Mixture of Length and Pruning Experts for Knowledge Graphs Reasoning

ArXiv ID: 2507.20498

Authors: Enjun Du, Siyi Liu, Yongqi Zhang

Abstract: Knowledge Graph (KG) reasoning, which aims to infer new facts from structured knowledge repositories, plays a vital role in Natural Language Processing (NLP) systems. Its effectiveness critically depends on constructing informative and contextually relevant reasoning paths. However, existing graph neural networks (GNNs) often adopt rigid, query-agnostic path-exploration strategies, limiting their ability to adapt to diverse linguistic contexts and semantic nuances. To address these limitations, we propose MoKGR, a mixture-of-experts framework that personalizes path exploration through two complementary components: (1) a mixture of length experts that adaptively selects and weights candidate path lengths according to query complexity, providing query-specific reasoning depth; and (2) a mixture of pruning experts that evaluates candidate paths from a complementary perspective, retaining the most informative paths for each query. Through comprehensive experiments on diverse benchmarks, MoKGR demonstrates superior performance in both transductive and inductive settings, validating the effectiveness of personalized path exploration in KG reasoning.

Comment: The paper introduces a mixture-of-experts framework for knowledge graph reasoning, relevant to model architecture with a focus on MoE.

Relevance: 9 Novelty: 7

16. DeepJIVE: Learning Joint and Individual Variation Explained from Multimodal Data Using Deep Learning

ArXiv ID: 2507.19682

Authors: Matthew Drexler, Benjamin Risk, James J Lah, Suprateek Kundu, Deqiang Qiu

Abstract: Conventional multimodal data integration methods provide a comprehensive assessment of the shared or unique structure within each individual data type but suffer from several limitations such as the inability to handle high-dimensional data and identify nonlinear structures. In this paper, we introduce DeepJIVE, a deep-learning approach to performing Joint and Individual Variation Explained (JIVE). We perform mathematical derivation and experimental validations using both synthetic and real-world 1D, 2D, and 3D datasets. Different strategies of achieving the identity and orthogonality constraints for DeepJIVE were explored, resulting in three viable loss functions. We found that DeepJIVE can successfully uncover joint and individual variations of multimodal datasets. Our application of DeepJIVE to the Alzheimer's Disease Neuroimaging Initiative (ADNI) also identified biologically plausible covariation patterns between the amyloid positron emission tomography (PET) and magnetic resonance (MR) images. In conclusion, the proposed DeepJIVE can be a useful tool for multimodal data analysis.

Comment: The paper presents DeepJIVE, a deep-learning approach for multimodal data integration, focusing on representation learning by uncovering joint and individual variations.