Implemented initial Galerkin Neural ODE #1010
Hi @ChrisRackauckas! My name is Adam, and I am a final-year Bachelor's student in Computer Science at the University of Amsterdam. I'm very interested in contributing to GSoC 2026 under NumFOCUS, specifically to the project idea "SciML Scientific Machine Learning Project - Improvements to Neural and Universal Differential Equations".

My background is in machine learning, scientific computing, probability, and parallel computing, and I've been strengthening it further through a mathematics minor, the CQF, and the MITx MicroMasters in Statistics and Data Science. My current bachelor thesis focuses on risk-sensitive, GPU-accelerated multi-agent reinforcement learning for market making in JaxMARL-HFT, which is a big reason I'm especially interested in mathematically grounded learning systems and modeling under uncertainty.

This PR is the first concrete step in the project I proposed, which extends the project idea: "Galerkin Neural ODEs and Basis-Parameterized Universal Approximators for DiffEqFlux.jl". The goal is to add a production-quality GalerkinNeuralODE to DiffEqFlux and extend it into a small framework for basis-parameterized continuous-depth models. The core idea is to let the vector-field parameters vary with depth/time through a basis expansion rather than staying fixed throughout the solve, as they do in a standard NeuralODE. I already have a working prototype with an abstract basis API, a Fourier basis, parameter lifting/reconstruction, and training experiments. I also had to debug sensitivity-method issues and Zygote mutation issues while getting the prototype running.

I have drafted a first version of my GSoC proposal based on this project direction. I would be very grateful for any feedback on the proposal, especially on the scope, the technical plan, and whether the proposed milestones make sense for DiffEqFlux/SciML. The current draft is here: SciML_GSoC_2026.pdf
Current Status / Feedback Requested
This PR is an initial draft implementation of a GalerkinNeuralODE #290 for DiffEqFlux.jl, inspired by the paper "Dissecting Neural ODEs", mainly opened for design feedback. The current implementation works on the example benchmark and the preliminary results look encouraging, but the code is not yet aligned with SciML style, still contains many learning-oriented comments, and does not yet include unit tests. Before polishing the implementation further, I would greatly appreciate feedback on the overall design, API, parameter lifting/reconstruction approach, and any obvious AD, sensitivity, or performance issues. After incorporating feedback, I plan to clean up the code, remove unnecessary comments, add unit tests, and improve the documentation.
More details are available on my blog: arkdong.com
Key Idea from the paper "Dissecting Neural ODEs"
Vanilla Neural ODEs cannot be fully considered the deep limit of ResNets. The first attempt to pursue the true deep limit of ResNets is the hypernetwork approach of Zhang et al. (2019b), where another neural network parametrizes the dynamics of $\theta(s)$.
However, this approach is not backed by any theoretical argument, and it is considerably parameter-inefficient, as it generally scales polynomially in $n_{\theta}$. The paper approaches the problem by uncovering an optimization problem in functional space, which is solved by a direct application of the adjoint sensitivity method in infinite dimensions.
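For reference, the deep-limit argument can be written out as follows. This is a sketch in assumed notation (the state $z$, step size $h$, and vector field $f$ are not taken from the PR): a ResNet with layer-dependent parameters $\theta_k$ is a forward-Euler step of an ODE whose parameters vary with depth $s$, whereas a vanilla Neural ODE keeps a single fixed $\theta$ for the whole solve.

$$
z_{k+1} = z_k + h\, f(z_k, \theta_k) \;\xrightarrow{\;h \to 0\;}\; \dot{z}(s) = f\big(z(s), \theta(s)\big), \qquad \text{vanilla Neural ODE: } \dot{z}(s) = f\big(z(s), \theta\big).
$$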
Galerkin Neural ODEs are the spectral-discretization version of this idea. The approach expands $\theta(s)$ on a complete orthogonal basis of a predetermined subspace of $\mathbb{L}_{2}(\mathcal{S} \to \mathbb{R}^{n_\theta})$ and truncates the series at the $m$-th term, where $\psi_j(s)$ are the basis functions and the trainable objects are the coefficients $\alpha_j$:

$$
\theta(s) = \sum_{j=1}^{m} \alpha_j \psi_j(s)
$$
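To make the truncated expansion concrete, here is a minimal, language-agnostic sketch in plain Python. It is not the Julia code in this PR. The function names (`fourier_basis`, `reconstruct_theta`, `lift_theta`), the ordering of the Fourier modes, and the choice of a constant first basis function are all assumptions made for illustration. The sketch shows how a depth-dependent parameter vector could be rebuilt from the coefficients $\alpha_j$, and how a fixed parameter vector could be lifted into coefficient form.

```python
import math


def fourier_basis(s, m, period=1.0):
    """Evaluate the first m Fourier modes at depth s.

    Assumed ordering: [1, cos(w s), sin(w s), cos(2 w s), sin(2 w s), ...].
    """
    omega = 2.0 * math.pi / period
    values = []
    for j in range(m):  # code uses 0-based j; the formula uses j = 1..m
        if j == 0:
            values.append(1.0)  # constant mode
        else:
            k = (j + 1) // 2  # frequency index: 1, 1, 2, 2, ...
            if j % 2 == 1:
                values.append(math.cos(k * omega * s))
            else:
                values.append(math.sin(k * omega * s))
    return values


def reconstruct_theta(alpha, s, period=1.0):
    """Rebuild theta(s) = sum_j alpha[j] * psi_j(s).

    alpha is a list of m coefficient vectors, each of length n_theta.
    """
    psi = fourier_basis(s, len(alpha), period)
    n_theta = len(alpha[0])
    return [
        sum(alpha[j][i] * psi[j] for j in range(len(alpha)))
        for i in range(n_theta)
    ]


def lift_theta(theta0, m):
    """Lift a fixed parameter vector into m coefficient vectors.

    The constant mode carries theta0; all higher modes start at zero.
    """
    return [list(theta0)] + [[0.0] * len(theta0) for _ in range(m - 1)]


# Lifting a fixed theta and reconstructing it gives theta back at any depth.
alpha = lift_theta([0.5, -1.0], m=5)
theta_at_s = reconstruct_theta(alpha, s=0.3)
```

Under these assumptions, a model with a single constant mode ($m = 1$) has depth-independent parameters, so it reduces to a standard NeuralODE parameterization.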
Preliminary Result
The tests and plots were generated with the example code from the documentation page "Neural Ordinary Differential Equations". These results are preliminary and mainly intended as a sanity check for the implementation.
In the untrained-state plots, both the standard NeuralODE and the Galerkin NeuralODE start far from the ground-truth trajectories, confirming that neither model matches the data before optimization.

During training, however, the loss curves show two important patterns:
Finally, the trained trajectory plots show that all three models recover the target dynamics well, with the $M=1$ Galerkin model nearly overlapping the NeuralODE baseline and the $M=5$ model achieving the best overall fit.
Taken together, these preliminary experiments suggest that the implementation is behaving in the expected direction:
Checklist
contributor guidelines, in particular the SciML Style Guide and COLPRAC.