Personalized Daily Arxiv Papers 3/24/2025

[gpt-4o]	Prompt	Completion	Total
Token	44946	6046	50992
Cost	$0.11	$0.06	$0.17

Total arXiv papers: 502

Total scanned papers: 298

Total relevant papers: 28

Table of contents with paper titles:

Offline Model-Based Optimization: Comprehensive Review Authors: Minsu Kim, Jiayao Gu, Ye Yuan, Taeyoung Yun, Zixuan Liu, Yoshua Bengio, Can Chen
Large Language Model Compression via the Nested Activation-Aware Decomposition Authors: Jun Lu, Tianyi Xu, Bill Ding, David Li, Yu Kang
Malliavin-Bismut Score-based Diffusion Models Authors: Ehsan Mirafzali, Utkarsh Gupta, Patrick Wyrod, Frank Proske, Daniele Venturi, Razvan Marinescu
Exploring a Principled Framework for Deep Subspace Clustering Authors: Xianghan Meng, Zhiyuan Huang, Wei He, Xianbiao Qi, Rong Xiao, Chun-Guang Li
SuperARC: A Test for General and Super Intelligence Based on First Principles of Recursion Theory and Algorithmic Probability Authors: Alberto Hernández-Espinosa, Luan Ozelim, Felipe S. Abrahão, Hector Zenil
Glivenko-Cantelli for $f$-divergence Authors: Haoming Wang, Lek-Heng Lim
Accelerating Transformer Inference and Training with 2:4 Activation Sparsity Authors: Daniel Haziza, Timothy Chou, Dhruv Choudhary, Luca Wehrstedt, Francisco Massa, Jiecao Yu, Geonhwa Jeong, Supriya Rao, Patrick Labatut, Jesse Cai
Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs Authors: Anshumann, Mohd Abbas Zaidi, Akhil Kedia, Jinwoo Ahn, Taehwak Kwon, Kangwook Lee, Haejun Lee, Joohyung Lee
Nonparametric Factor Analysis and Beyond Authors: Yujia Zheng, Yang Liu, Jiaxiong Yao, Yingyao Hu, Kun Zhang
NdLinear Is All You Need for Representation Learning Authors: Alex Reneau, Jerry Yao-Chieh Hu, Zhongfang Zhuang, Ting-Chun Liu
Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation Authors: Sophia Tang, Yinuo Zhang, Alexander Tong, Pranam Chatterjee
Token Dynamics: Towards Efficient and Dynamic Video Token Representation for Video Large Language Models Authors: Haichao Zhang, Zhuowei Li, Dimitris Metaxas, Yun Fu
Structure Is Not Enough: Leveraging Behavior for Neural Network Weight Reconstruction Authors: Léo Meynent, Ivan Melev, Konstantin Schürholt, Göran Kauermann, Damian Borth
Physics-Informed Deep B-Spline Networks for Dynamical Systems Authors: Zhuoyuan Wang, Raffaele Romagnoli, Jasmine Ratchford, Yorie Nakahira
Ordered Topological Deep Learning: a Network Modeling Case Study Authors: Guillermo Bernárdez, Miquel Ferriol-Galmés, Carlos Güemes-Palau, Mathilde Papillon, Pere Barlet-Ros, Albert Cabellos-Aparicio, Nina Miolane
A Learnability Analysis on Neuro-Symbolic Learning Authors: Hao-Yuan He, Ming Li
Efficient ANN-Guided Distillation: Aligning Rate-based Features of Spiking Neural Networks through Hybrid Block-wise Replacement Authors: Shu Yang, Chengting Yu, Lei Liu, Hanzhi Ma, Aili Wang, Erping Li
KVShare: Semantic-Aware Key-Value Cache Sharing for Efficient Large Language Model Inference Authors: Huan Yang, Renji Zhang, Deyu Zhang
Neural-Guided Equation Discovery Authors: Jannis Brugger, Mattia Cerrato, David Richter, Cedric Derstroff, Daniel Maninger, Mira Mezini, Stefan Kramer
PRIOT: Pruning-Based Integer-Only Transfer Learning for Embedded Systems Authors: Honoka Anada, Sefutsu Ryu, Masayuki Usui, Tatsuya Kaneko, Shinya Takamaeda-Yamazaki
Token-Level Uncertainty-Aware Objective for Language Model Post-Training Authors: Tingkai Liu, Ari S. Benjamin, Anthony M. Zador
An Accelerated Bregman Algorithm for ReLU-based Symmetric Matrix Decomposition Authors: Qingsong Wang
Gene42: Long-Range Genomic Foundation Model With Dense Attention Authors: Kirill Vishniakov, Boulbaba Ben Amor, Engin Tekin, Nancy A. ElNaker, Karthik Viswanathan, Aleksandr Medvedev, Aahan Singh, Maryam Nadeem, Mohammad Amaan Sayeed, Praveenkumar Kanithi, Tiago Magalhaes, Natalia Vassilieva, Dwarikanath Mahapatra, Marco Pimentel, and Shadab Khan
SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging Authors: Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Syed Zawad, Holger Boche
Model-free front-to-end training of a large high performance laser neural network Authors: Anas Skalli, Satoshi Sunada, Mirko Goldmann, Marcin Gebski, Stephan Reitzenstein, James A. Lott, Tomasz Czyszanowski, Daniel Brunner
Efficient Training of Neural Fractional-Order Differential Equation via Adjoint Backpropagation Authors: Qiyu Kang, Xuhao Li, Kai Zhao, Wenjun Cui, Yanan Zhao, Weihua Deng, Wee Peng Tay
Do regularization methods for shortcut mitigation work as intended? Authors: Haoyang Hong, Ioanna Papanikolaou, Sonali Parbhoo
Rethinking the Role of Spatial Mixing Authors: George Cazenavette, Joel Julin, Simon Lucey

1. Offline Model-Based Optimization: Comprehensive Review

ArXiv ID: 2503.17286

Authors: Minsu Kim, Jiayao Gu, Ye Yuan, Taeyoung Yun, Zixuan Liu, Yoshua Bengio, Can Chen

Abstract: Offline optimization is a fundamental challenge in science and engineering, where the goal is to optimize black-box functions using only offline datasets. This setting is particularly relevant when querying the objective function is prohibitively expensive or infeasible, with applications spanning protein engineering, material discovery, neural architecture search, and beyond. The main difficulty lies in accurately estimating the objective landscape beyond the available data, where extrapolations are fraught with significant epistemic uncertainty. This uncertainty can lead to objective hacking (reward hacking), exploiting model inaccuracies in unseen regions, or other spurious optimizations that yield misleadingly high performance estimates outside the training distribution. Recent advances in model-based optimization (MBO) have harnessed the generalization capabilities of deep neural networks to develop offline-specific surrogate and generative models. Trained with carefully designed strategies, these models are more robust against out-of-distribution issues, facilitating the discovery of improved designs. Despite its growing impact in accelerating scientific discovery, the field lacks a comprehensive review. To bridge this gap, we present the first thorough review of offline MBO. We begin by formalizing the problem for both single-objective and multi-objective settings and by reviewing recent benchmarks and evaluation metrics. We then categorize existing approaches into two key areas: surrogate modeling, which emphasizes accurate function approximation in out-of-distribution regions, and generative modeling, which explores high-dimensional design spaces to identify high-performing designs. Finally, we examine the key challenges and propose promising directions for advancement in this rapidly evolving field, including safe control of superintelligent systems.

Comment: Author match

2. Large Language Model Compression via the Nested Activation-Aware Decomposition

ArXiv ID: 2503.17101

Authors: Jun Lu, Tianyi Xu, Bill Ding, David Li, Yu Kang

Abstract: In this paper, we tackle the critical challenge of compressing large language models (LLMs) to facilitate their practical deployment and broader adoption. We introduce a novel post-training compression paradigm that focuses on low-rank decomposition of LLM weights. Our analysis identifies two main challenges in this task: the variability in LLM activation distributions and handling unseen activations from different datasets and models. To address these challenges, we propose a nested activation-aware framework (NSVD) for LLMs, a training-free approach designed to enhance the accuracy of low-rank decompositions by managing activation outliers through transforming the weight matrix based on activation distribution and the original weight matrix. This method allows for the absorption of outliers into the transformed weight matrix, improving decomposition accuracy. Our comprehensive evaluation across eight datasets and six models from three distinct LLM families demonstrates the superiority of NSVD over current state-of-the-art methods, especially at medium to large compression ratios or in multilingual and multitask settings.

Comment: The paper focuses on a novel low-rank decomposition method for compressing large language models (LLMs), which aligns closely with the 'Model Compression' criterion. The proposed nested activation-aware framework (NSVD) introduces a new approach to handle activation variability and outliers, making it a significant contribution to compression techniques.

Relevance: 10 Novelty: 8

3. Malliavin-Bismut Score-based Diffusion Models

ArXiv ID: 2503.16917

Authors: Ehsan Mirafzali, Utkarsh Gupta, Patrick Wyrod, Frank Proske, Daniele Venturi, Razvan Marinescu

Abstract: We introduce a new framework that employs Malliavin calculus to derive explicit expressions for the score function -- i.e., the gradient of the log-density -- associated with solutions to stochastic differential equations (SDEs). Our approach integrates classical integration-by-parts techniques with modern tools, such as Bismut's formula and Malliavin calculus, to address linear and nonlinear SDEs. In doing so, we establish a rigorous connection between the Malliavin derivative, its adjoint (the Malliavin divergence or the Skorokhod integral), Bismut's formula, and diffusion generative models, thus providing a systematic method for computing $\nabla \log p_t(x)$. For the linear case, we present a detailed study proving that our formula is equivalent to the actual score function derived from the solution of the Fokker--Planck equation for linear SDEs. Additionally, we derive a closed-form expression for $\nabla \log p_t(x)$ for nonlinear SDEs with state-independent diffusion coefficients. These advancements provide fresh theoretical insights into the smoothness and structure of probability densities and practical implications for score-based generative modelling, including the design and analysis of new diffusion models. Moreover, our findings promote the adoption of the robust Malliavin calculus framework in machine learning research. These results directly apply to various pure and applied mathematics fields, such as generative modelling, the study of SDEs driven by fractional Brownian motion, and the Fokker--Planck equations associated with nonlinear SDEs.

Comment: The paper introduces a novel theoretical framework using Malliavin calculus for score-based diffusion models, which aligns with foundational research in generative modeling.

Relevance: 9 Novelty: 9
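
Illustration (not from the paper): the abstract states that, for linear SDEs, the derived formula agrees with the score obtained from the Fokker-Planck solution. As a rough sketch of what that reference quantity looks like, the snippet below uses the standard Gaussian transition density of an Ornstein-Uhlenbeck process, dX = -theta X dt + sigma dW with a fixed start X_0 = x0, and checks the closed-form score against a finite-difference derivative of the log-density. The choice of process, the parameter values, and all function names are assumptions made for this example; no Malliavin calculus is involved.

```python
import math


def ou_mean_var(x0, theta, sigma, t):
    """Mean and variance of X_t for dX = -theta*X dt + sigma dW, X_0 = x0."""
    mean = x0 * math.exp(-theta * t)
    var = sigma ** 2 * (1.0 - math.exp(-2.0 * theta * t)) / (2.0 * theta)
    return mean, var


def ou_log_density(x, x0, theta, sigma, t):
    """log p_t(x): X_t is Gaussian with the mean and variance above."""
    mean, var = ou_mean_var(x0, theta, sigma, t)
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def ou_score(x, x0, theta, sigma, t):
    """Closed-form score: d/dx log p_t(x) = -(x - mean) / var."""
    mean, var = ou_mean_var(x0, theta, sigma, t)
    return -(x - mean) / var


def finite_difference_score(x, x0, theta, sigma, t, h=1e-5):
    """Central-difference approximation of d/dx log p_t(x)."""
    upper = ou_log_density(x + h, x0, theta, sigma, t)
    lower = ou_log_density(x - h, x0, theta, sigma, t)
    return (upper - lower) / (2.0 * h)


# Hypothetical parameter values, chosen only for the demonstration.
exact = ou_score(0.3, x0=1.0, theta=0.7, sigma=0.5, t=2.0)
approx = finite_difference_score(0.3, x0=1.0, theta=0.7, sigma=0.5, t=2.0)
```

Because the log-density is quadratic in x, the two values should agree up to floating-point error.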

4. Exploring a Principled Framework for Deep Subspace Clustering

ArXiv ID: 2503.17288

Authors: Xianghan Meng, Zhiyuan Huang, Wei He, Xianbiao Qi, Rong Xiao, Chun-Guang Li

Abstract: Subspace clustering is a classical unsupervised learning task, built on a basic assumption that high-dimensional data can be approximated by a union of subspaces (UoS). Nevertheless, real-world data often deviate from the UoS assumption. To address this challenge, state-of-the-art deep subspace clustering algorithms attempt to jointly learn UoS representations and self-expressive coefficients. However, the general framework of the existing algorithms suffers from catastrophic feature collapse and lacks a theoretical guarantee of learning the desired UoS representation. In this paper, we present a Principled fRamewOrk for Deep Subspace Clustering (PRO-DSC), which is designed to learn structured representations and self-expressive coefficients in a unified manner. Specifically, in PRO-DSC, we incorporate an effective regularization on the learned representations into the self-expressive model, prove that the regularized self-expressive model is able to prevent feature space collapse, and demonstrate that the learned optimal representations under certain conditions lie on a union of orthogonal subspaces. Moreover, we provide a scalable and efficient approach to implement our PRO-DSC and conduct extensive experiments to verify our theoretical findings and demonstrate the superior performance of our proposed deep subspace clustering approach. The code is available at https://github.com/mengxianghan123/PRO-DSC.

Comment: The paper presents a principled framework for deep subspace clustering, addressing feature collapse and providing theoretical guarantees. This aligns with representation learning and foundational clustering methods.

Relevance: 9 Novelty: 9

5. SuperARC: A Test for General and Super Intelligence Based on First Principles of Recursion Theory and Algorithmic Probability

ArXiv ID: 2503.16743

Authors: Alberto Hernández-Espinosa, Luan Ozelim, Felipe S. Abrahão, Hector Zenil

Abstract: We introduce an open-ended test grounded in algorithmic probability that can avoid benchmark contamination in the quantitative evaluation of frontier models in the context of their Artificial General Intelligence (AGI) and Superintelligence (ASI) claims. Unlike other tests, this test does not rely on statistical compression methods (such as GZIP or LZW), which are more closely related to Shannon entropy than to Kolmogorov complexity. The test challenges aspects related to features of intelligence of fundamental nature such as synthesis and model creation in the context of inverse problems (generating new knowledge from observation). We argue that metrics based on model abstraction and optimal Bayesian inference for planning can provide a robust framework for testing intelligence, including natural intelligence (human and animal), narrow AI, AGI, and ASI. Our results show no clear evidence of LLM convergence towards a defined level of intelligence, particularly AGI or ASI. We found that LLM model versions tend to be fragile and incremental, as new versions may perform worse than older ones, with progress largely driven by the size of training data. The results were compared with a hybrid neurosymbolic approach that theoretically guarantees model convergence from optimal inference based on the principles of algorithmic probability and Kolmogorov complexity. The method outperforms LLMs in a proof-of-concept on short binary sequences. Our findings confirm suspicions regarding the fundamental limitations of LLMs, exposing them as systems optimised for the perception of mastery over human language. Progress among different LLM versions from the same developers was found to be inconsistent and limited, particularly in the absence of a solid symbolic counterpart.

Comment: The paper introduces a test for AGI and ASI based on algorithmic probability, which challenges established assumptions and aligns with emerging trends in foundational AI research.

Relevance: 9 Novelty: 9

6. Glivenko-Cantelli for $f$-divergence

ArXiv ID: 2503.17355

Authors: Haoming Wang, Lek-Heng Lim

Abstract: We extend the celebrated Glivenko-Cantelli theorem, sometimes called the fundamental theorem of statistics, from its standard setting of total variation distance to all $f$-divergences. A key obstacle in this endeavor is to define $f$-divergence on a subcollection of a $\sigma$-algebra that forms a $\pi$-system but not a $\sigma$-subalgebra. This is a side contribution of our work. We will show that this notion of $f$-divergence on the $\pi$-system of rays preserves nearly all known properties of standard $f$-divergence, yields a novel integral representation of the Kolmogorov-Smirnov distance, and has a Glivenko-Cantelli theorem.

Comment: The paper extends the Glivenko-Cantelli theorem to f-divergences, which is a cutting-edge theoretical contribution and aligns with emerging trends in foundational research.

Relevance: 9 Novelty: 9
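
Illustration (not from the paper): the classical Glivenko-Cantelli theorem says that the largest gap between the empirical CDF and the true CDF shrinks to zero almost surely as the sample grows. That gap is the Kolmogorov-Smirnov distance mentioned in the abstract. The sketch below computes it for Uniform(0, 1) samples of increasing size; the distribution, the seed, the sample sizes, and the function name are assumptions for this example, and the paper's extension to $f$-divergences is not implemented here.

```python
import random


def ks_distance_to_uniform(samples):
    """sup_x |F_n(x) - F(x)| where F is the Uniform(0, 1) CDF, F(x) = x."""
    xs = sorted(samples)
    n = len(xs)
    distance = 0.0
    for i, x in enumerate(xs):
        # The empirical CDF jumps from i/n to (i+1)/n at x, so the
        # supremum is attained just before or exactly at a sample point.
        distance = max(distance, abs((i + 1) / n - x), abs(x - i / n))
    return distance


rng = random.Random(0)  # fixed seed so the run is reproducible
distances = {
    n: ks_distance_to_uniform([rng.random() for _ in range(n)])
    for n in (10, 100, 1000, 10000)
}
```

Printing `distances` should show the gap getting small for the larger sample sizes, in line with the theorem.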

7. Accelerating Transformer Inference and Training with 2:4 Activation Sparsity

ArXiv ID: 2503.16672

Authors: Daniel Haziza, Timothy Chou, Dhruv Choudhary, Luca Wehrstedt, Francisco Massa, Jiecao Yu, Geonhwa Jeong, Supriya Rao, Patrick Labatut, Jesse Cai

Abstract: In this paper, we demonstrate how to apply 2:4 sparsity, a popular hardware-accelerated GPU sparsity pattern, to activations to accelerate large language model training and inference. Crucially, we exploit the intrinsic sparsity found in Squared-ReLU activations to provide this acceleration with no accuracy loss. Our approach achieves up to 1.3x faster Feed Forward Networks (FFNs) in both the forward and backward passes. This work highlights the potential for sparsity to play a key role in accelerating large language model training and inference.

Comment: The paper explores activation sparsity in Transformers for efficiency, which aligns with foundational research in model compression and efficiency.

Relevance: 9 Novelty: 8
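
Illustration (not from the paper): a 2:4 pattern keeps at most two non-zero values in every contiguous group of four. The sketch below applies a Squared-ReLU and then enforces the pattern by keeping the two largest-magnitude entries per group. The helper names and input values are hypothetical, and the real speedup comes from GPU sparse kernels, which this pure-Python version does not model. How the paper treats groups with more than two non-zeros is not stated in the abstract; magnitude-based selection is an assumption here.

```python
def squared_relu(x):
    """Squared-ReLU: max(x, 0)^2, which is exactly zero for non-positive inputs."""
    return max(x, 0.0) ** 2


def prune_2_of_4(values):
    """Keep the 2 largest-magnitude entries in each group of 4; zero the rest."""
    if len(values) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    out = []
    for start in range(0, len(values), 4):
        group = values[start:start + 4]
        # Indices of the two entries with the largest absolute value.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        out.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return out


# Hypothetical pre-activation values for one row of an FFN hidden layer.
pre_activations = [-1.0, 0.5, 2.0, -0.3, 0.1, -2.0, -0.7, 3.0]
activations = [squared_relu(x) for x in pre_activations]
sparse = prune_2_of_4(activations)
```

In this example every group of four already has at most two non-zeros after the Squared-ReLU, so `sparse` equals `activations` and nothing is lost. That is the kind of intrinsic sparsity the abstract refers to.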

8. Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs

ArXiv ID: 2503.16870

Authors: Anshumann, Mohd Abbas Zaidi, Akhil Kedia, Jinwoo Ahn, Taehwak Kwon, Kangwook Lee, Haejun Lee, Joohyung Lee

Abstract: Knowledge distillation can be a cost-effective technique to distill knowledge in Large Language Models, if the teacher output logits can be pre-computed and cached. However, successfully applying this to pre-training remains largely unexplored. In this work, we prove that naive approaches for sparse knowledge distillation such as caching Top-K probabilities, while intuitive, provide biased estimates of teacher probability distribution to the student, resulting in suboptimal performance and calibration. We propose an importance-sampling-based method 'Random Sampling Knowledge Distillation', which provides unbiased estimates, preserves the gradient in expectation, and requires storing significantly sparser logits. Our method enables faster training of student models with marginal overhead (<10%) compared to cross-entropy based training, while maintaining competitive performance compared to full distillation, across a range of model sizes from 300M to 3B.

Comment: The paper proposes a sparse logit sampling method for knowledge distillation in LLMs, which aligns with model compression and efficiency. The use of importance sampling for unbiased estimates is a novel contribution.

Relevance: 9 Novelty: 8
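
Illustration (not from the paper): the contrast between a biased Top-K target and an unbiased sampled target can be seen on a toy distribution. The sketch below assumes the simplest sampling scheme, in which tokens are drawn from the teacher distribution itself and each draw contributes weight 1/N; the abstract does not state the paper's actual proposal distribution or weighting. The teacher probabilities, the seed, and the function names are hypothetical.

```python
import random


def top_k_target(probs, k):
    """Cache only the top-k teacher probabilities and renormalize them.

    Tail tokens receive exactly zero mass, so the target is biased.
    """
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return [probs[i] / mass if i in top else 0.0 for i in range(len(probs))]


def sampled_target(probs, num_samples, rng):
    """Draw tokens from the teacher; each draw adds weight 1/num_samples.

    The expected value of this sparse target equals the teacher distribution.
    """
    draws = rng.choices(range(len(probs)), weights=probs, k=num_samples)
    target = [0.0] * len(probs)
    for token in draws:
        target[token] += 1.0 / num_samples
    return target


teacher = [0.5, 0.25, 0.15, 0.07, 0.03]  # toy 5-token vocabulary
rng = random.Random(0)

biased = top_k_target(teacher, k=2)

# Average many 2-sample targets to estimate the expectation empirically.
trials = 20000
mean_sampled = [0.0] * len(teacher)
for _ in range(trials):
    for i, w in enumerate(sampled_target(teacher, 2, rng)):
        mean_sampled[i] += w / trials
```

Both targets store at most two entries per position, but only the sampled one averages back to the teacher distribution; the Top-K target never assigns mass to the three tail tokens.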

9. Nonparametric Factor Analysis and Beyond

ArXiv ID: 2503.16865

Authors: Yujia Zheng, Yang Liu, Jiaxiong Yao, Yingyao Hu, Kun Zhang

Abstract: Nearly all identifiability results in unsupervised representation learning inspired by, e.g., independent component analysis, factor analysis, and causal representation learning, rely on assumptions of additive independent noise or noiseless regimes. In contrast, we study the more general case where noise can take arbitrary forms, depend on latent variables, and be non-invertibly entangled within a nonlinear function. We propose a general framework for identifying latent variables in the nonparametric noisy settings. We first show that, under suitable conditions, the generative model is identifiable up to certain submanifold indeterminacies even in the presence of non-negligible noise. Furthermore, under the structural or distributional variability conditions, we prove that latent variables of the general nonlinear models are identifiable up to trivial indeterminacies. Based on the proposed theoretical framework, we have also developed corresponding estimation methods and validated them in various synthetic and real-world settings. Interestingly, our estimate of the true GDP growth from alternative measurements suggests more insightful information on the economies than official reports. We expect our framework to provide new insight into how both researchers and practitioners deal with latent variables in real-world scenarios.

Comment: The paper provides a theoretical framework for identifying latent variables in noisy settings, which aligns with foundational research in representation learning.

Relevance: 9 Novelty: 8

10. NdLinear Is All You Need for Representation Learning

ArXiv ID: 2503.17353

Authors: Alex Reneau, Jerry Yao-Chieh Hu, Zhongfang Zhuang, Ting-Chun Liu

Abstract: Many high-impact machine learning tasks involve multi-dimensional data (e.g., images, volumetric medical scans, multivariate time-series). Yet, most neural architectures flatten inputs, discarding critical cross-dimension information. We introduce NdLinear, a novel linear transformation that preserves these structures without extra overhead. By operating separately along each dimension, NdLinear captures dependencies that standard fully connected layers overlook. Extensive experiments across convolutional, recurrent, and transformer-based networks show significant improvements in representational power and parameter efficiency. Crucially, NdLinear serves as a foundational building block for large-scale foundation models by operating on any unimodal or multimodal data in its native form. This removes the need for flattening or modality-specific preprocessing. NdLinear rethinks core architectural priorities beyond attention, enabling more expressive, context-aware models at scale. We propose NdLinear as a drop-in replacement for standard linear layers -- marking an important step toward next-generation neural architectures.

Comment: The paper introduces NdLinear, a novel linear transformation for preserving multi-dimensional data structures, which aligns with foundational research in representation learning and architectural innovations.

Relevance: 9 Novelty: 8
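
Illustration (not from the paper): one possible reading of "operating separately along each dimension" for a 2D input is to apply one weight matrix per axis, Y = W_rows X W_cols^T, instead of flattening X and using a single large matrix. The sketch below follows that reading and compares weight counts. The function names, the omission of biases, and the example shapes are assumptions; the actual NdLinear layer may differ in its details.

```python
def matmul(a, b):
    """Plain nested-list matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def per_axis_linear_2d(x, w_rows, w_cols):
    """Y = W_rows @ X @ W_cols^T: one weight matrix per axis, no flattening."""
    return matmul(matmul(w_rows, x), transpose(w_cols))


def parameter_counts(in_shape, out_shape):
    """Weights (no biases) for a flattened dense layer vs. the per-axis version."""
    d1, d2 = in_shape
    h1, h2 = out_shape
    flattened = (d1 * d2) * (h1 * h2)
    per_axis = d1 * h1 + d2 * h2
    return flattened, per_axis


x = [[1, 2, 3], [4, 5, 6]]           # 2 x 3 input
w_rows = [[1, 0], [1, 1]]            # maps 2 rows -> 2 rows
w_cols = [[1, 0, 0], [0, 1, 1]]      # maps 3 columns -> 2 columns
y = per_axis_linear_2d(x, w_rows, w_cols)

small = parameter_counts((2, 3), (2, 2))
large = parameter_counts((64, 64), (64, 64))
```

For a 64 x 64 input mapped to a 64 x 64 output, the flattened layer needs 16,777,216 weights while the per-axis version needs 8,192, which is the sort of gap behind the parameter-efficiency claim.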

11. Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation

ArXiv ID: 2503.17361

Authors: Sophia Tang, Yinuo Zhang, Alexander Tong, Pranam Chatterjee

Abstract: Flow matching in the continuous simplex has emerged as a promising strategy for DNA sequence design, but struggles to scale to higher simplex dimensions required for peptide and protein generation. We introduce Gumbel-Softmax Flow and Score Matching, a generative framework on the simplex based on a novel Gumbel-Softmax interpolant with a time-dependent temperature. Using this interpolant, we introduce Gumbel-Softmax Flow Matching by deriving a parameterized velocity field that transports from smooth categorical distributions to distributions concentrated at a single vertex of the simplex. We alternatively present Gumbel-Softmax Score Matching which learns to regress the gradient of the probability density. Our framework enables high-quality, diverse generation and scales efficiently to higher-dimensional simplices. To enable training-free guidance, we propose Straight-Through Guided Flows (STGFlow), a classifier-based guidance method that leverages straight-through estimators to steer the unconditional velocity field toward optimal vertices of the simplex. STGFlow enables efficient inference-time guidance using classifiers pre-trained on clean sequences, and can be used with any discrete flow method. Together, these components form a robust framework for controllable de novo sequence generation. We demonstrate state-of-the-art performance in conditional DNA promoter design, sequence-only protein generation, and target-binding peptide design for rare disease treatment.

Comment: The paper introduces a novel generative framework for biological sequence generation, which aligns with foundational research in AI for Science.

Relevance: 8 Novelty: 8
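
Illustration (not from the paper): the standard Gumbel-Softmax relaxation adds Gumbel noise to the logits and applies a temperature-scaled softmax, producing a point on the simplex. Lowering the temperature pushes that point toward a single vertex, which is the behavior a time-dependent temperature would exploit. The sketch below uses two fixed temperatures with identical noise; the paper's actual interpolant and temperature schedule are not given in the abstract, and the class probabilities and names here are hypothetical.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng):
    """Return a relaxed one-hot sample on the probability simplex."""
    eps = 1e-12
    gumbels = []
    for _ in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are defined.
        u = min(max(rng.random(), eps), 1.0 - eps)
        gumbels.append(-math.log(-math.log(u)))
    scores = [(l + g) / temperature for l, g in zip(logits, gumbels)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


logits = [math.log(p) for p in (0.6, 0.3, 0.1)]

# Re-seeding gives both calls the same Gumbel noise, isolating the
# effect of the temperature.
hot = gumbel_softmax_sample(logits, temperature=5.0, rng=random.Random(0))
cold = gumbel_softmax_sample(logits, temperature=0.1, rng=random.Random(0))
```

With the noise held fixed, `cold` is at least as concentrated on its largest coordinate as `hot`.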

12. Token Dynamics: Towards Efficient and Dynamic Video Token Representation for Video Large Language Models

ArXiv ID: 2503.16980

Authors: Haichao Zhang, Zhuowei Li, Dimitris Metaxas, Yun Fu

Abstract: Token-based video representation has emerged as a promising approach for enabling large language models to interpret video content. However, existing token reduction techniques, such as token pruning and token merging, often disrupt essential spatial-temporal positional embeddings, failing to adequately balance computational efficiency with fewer tokens. Consequently, these methods result in relatively lengthy token sequences, limiting their applicability in scenarios requiring extreme token compression, such as video large language models. In this paper, we introduce the novel task of extreme short token reduction, aiming to represent extensive video sequences with a minimal number of tokens. To address this challenge, we propose Token Dynamics, a new video representation framework that dynamically reduces token count while preserving spatial-temporal coherence. Specifically, we disentangle video representations by separating visual embeddings from grid-level motion information, structuring them into: 1. a concise token base, created by clustering tokens that describe object-level content; 2. a token dynamics map, capturing detailed spatial-temporal motion patterns across grids. Furthermore, we introduce a cross-dynamics attention mechanism that integrates motion features into the token base without increasing token length, thereby maintaining compactness and spatial-temporal integrity. The experiments demonstrate a reduction of token count to merely 0.07% of the original tokens, with only a minor performance drop of 1.13%. Additionally, we propose two novel subtasks within extreme token reduction (fixed-length and adaptive-length compression), both effectively representing long token sequences for video-language tasks. Our method offers significantly lower theoretical complexity, fewer tokens, and enhanced throughput, thus providing an efficient solution for video LLMs.

Comment: The paper introduces a novel token reduction framework for video representation in large language models, which aligns with architectural innovations and efficiency improvements. The focus on extreme token reduction is a promising direction.

Relevance: 8 Novelty: 8

13. Structure Is Not Enough: Leveraging Behavior for Neural Network Weight Reconstruction

ArXiv ID: 2503.17138

Authors: Léo Meynent, Ivan Melev, Konstantin Schürholt, Göran Kauermann, Damian Borth

Abstract: The weights of neural networks (NNs) have recently gained prominence as a new data modality in machine learning, with applications ranging from accuracy and hyperparameter prediction to representation learning or weight generation. One approach to leverage NN weights involves training autoencoders (AEs), using contrastive and reconstruction losses. This allows such models to be applied to a wide variety of downstream tasks, and they demonstrate strong predictive performance and low reconstruction error. However, despite the low reconstruction error, these AEs reconstruct NN models with deteriorated performance compared to the original ones, limiting their usability with regard to model weight generation. In this paper, we identify a limitation of weight-space AEs, specifically highlighting that a structural loss, that uses the Euclidean distance between original and reconstructed weights, fails to capture some features critical for reconstructing high-performing models. We analyze the addition of a behavioral loss for training AEs in weight space, where we compare the output of the reconstructed model with that of the original one, given some common input. We show a strong synergy between structural and behavioral signals, leading to increased performance in all downstream tasks evaluated, in particular NN weights reconstruction and generation.

Comment: The paper introduces a behavioral loss for neural network weight reconstruction, which aligns with representation learning and autoencoders. The focus on combining structural and behavioral signals is a novel approach.

Relevance: 8 Novelty: 8
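
Illustration (not from the paper): a toy linear model shows why a Euclidean distance between weights can miss differences that matter for model outputs. Two reconstructions below are equally far from the original weights, yet one changes the predictions far more than the other because it perturbs the weight attached to a large-scale input feature. The model, inputs, and function names are hypothetical; the paper works with autoencoders over neural network weights, which this sketch does not reproduce.

```python
def predict(weights, x):
    """Linear model: dot product of weights and input features."""
    return sum(w * xi for w, xi in zip(weights, x))


def structural_loss(original, reconstructed):
    """Squared Euclidean distance between the two weight vectors."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed))


def behavioral_loss(original, reconstructed, inputs):
    """Mean squared difference between the two models' outputs on shared inputs."""
    diffs = [(predict(original, x) - predict(reconstructed, x)) ** 2 for x in inputs]
    return sum(diffs) / len(diffs)


original = [1.0, 0.0]
recon_a = [1.1, 0.0]   # error on the weight of the large-scale feature
recon_b = [1.0, 0.1]   # same-size error on the weight of the small-scale feature

# Feature 0 spans roughly [-10, 10]; feature 1 spans roughly [-0.1, 0.1].
inputs = [[10.0, 0.1], [-10.0, 0.1], [5.0, -0.1]]

structural_a = structural_loss(original, recon_a)
structural_b = structural_loss(original, recon_b)
behavioral_a = behavioral_loss(original, recon_a, inputs)
behavioral_b = behavioral_loss(original, recon_b, inputs)
```

The structural losses are both about 0.01, while the behavioral losses are about 0.75 and 0.0001 respectively, so only the behavioral signal separates the two reconstructions.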

14. Physics-Informed Deep B-Spline Networks for Dynamical Systems

ArXiv ID: 2503.16777

Authors: Zhuoyuan Wang, Raffaele Romagnoli, Jasmine Ratchford, Yorie Nakahira

Abstract: Physics-informed machine learning provides an approach to combining data and governing physics laws for solving complex partial differential equations (PDEs). However, efficiently solving PDEs with varying parameters and changing initial conditions and boundary conditions (ICBCs) with theoretical guarantees remains an open challenge. We propose a hybrid framework that uses a neural network to learn B-spline control points to approximate solutions to PDEs with varying system and ICBC parameters. The proposed network can be trained efficiently as one can directly specify ICBCs without imposing losses, calculate physics-informed loss functions through analytical formulas, and requires only learning the weights of B-spline functions as opposed to both weights and basis as in traditional neural operator learning methods. We provide theoretical guarantees that the proposed B-spline networks serve as universal approximators for the set of solutions of PDEs with varying ICBCs under mild conditions and establish bounds on the generalization errors in physics-informed learning. We also demonstrate in experiments that the proposed B-spline network can solve problems with discontinuous ICBCs and outperforms existing methods, and is able to learn solutions of 3D dynamics with diverse initial conditions.

Comment: The paper proposes a hybrid framework using B-spline networks for solving PDEs, which is relevant to AI for science and introduces theoretical guarantees, making it foundational.

Relevance: 8 Novelty: 8

15. Ordered Topological Deep Learning: a Network Modeling Case Study

ArXiv ID: 2503.16746

Authors: Guillermo Bernárdez, Miquel Ferriol-Galmés, Carlos Güemes-Palau, Mathilde Papillon, Pere Barlet-Ros, Albert Cabellos-Aparicio, Nina Miolane

Abstract: Computer networks are the foundation of modern digital infrastructure, facilitating global communication and data exchange. As demand for reliable high-bandwidth connectivity grows, advanced network modeling techniques become increasingly essential to optimize performance and predict network behavior. Traditional modeling methods, such as packet-level simulators and queueing theory, have notable limitations, either being computationally expensive or relying on restrictive assumptions that reduce accuracy. In this context, the deep learning-based RouteNet family of models has recently redefined network modeling by showing an unprecedented cost-performance trade-off. In this work, we revisit RouteNet's sophisticated design and uncover its hidden connection to Topological Deep Learning (TDL), an emerging field that models higher-order interactions beyond standard graph-based methods. We demonstrate that, although originally formulated as a heterogeneous Graph Neural Network, RouteNet serves as the first instantiation of a new form of TDL. More specifically, this paper presents OrdGCCN, a novel TDL framework that introduces the notion of ordered neighbors in arbitrary discrete topological spaces, and shows that RouteNet's architecture can be naturally described as an ordered topological neural network. To the best of our knowledge, this marks the first successful real-world application of state-of-the-art TDL principles, which we confirm through extensive testbed experiments, laying the foundation for the next generation of ordered TDL-driven applications.

Comment: The paper introduces a novel topological deep learning framework, which aligns with architectural innovations and emerging trends.

Relevance: 8 Novelty: 8

16. A Learnability Analysis on Neuro-Symbolic Learning

ArXiv ID: 2503.16797

Authors: Hao-Yuan He, Ming Li

Abstract: This paper analyzes the learnability of neuro-symbolic (NeSy) tasks within hybrid systems. We show that the learnability of NeSy tasks can be characterized by their derived constraint satisfaction problems (DCSPs). Specifically, a task is learnable if the corresponding DCSP has a unique solution; otherwise, it is unlearnable. For learnable tasks, we establish error bounds by exploiting the clustering property of the hypothesis space. Additionally, we analyze the asymptotic error for general NeSy tasks, showing that the expected error scales with the disagreement among solutions. Our results offer a principled approach to determining learnability and provide insights into the design of new algorithms.

Comment: The paper provides a learnability analysis for neuro-symbolic tasks, which aligns with foundational research in representation learning and theoretical insights.