Implemented initial Galerkin Neural ODE #1010

Open: arkdong wants to merge 1 commit into SciML:master from arkdong:master

arkdong commented Mar 18, 2026

Current Status / Feedback Requested

This PR is an initial draft implementation of a GalerkinNeuralODE #290 for DiffEqFlux.jl, inspired by the paper "Dissecting Neural ODEs", mainly opened for design feedback. The current implementation works on the example benchmark and the preliminary results look encouraging, but the code is not yet aligned with SciML style, still contains many learning-oriented comments, and does not yet include unit tests. Before polishing the implementation further, I would greatly appreciate feedback on the overall design, API, parameter lifting/reconstruction approach, and any obvious AD, sensitivity, or performance issues. After incorporating feedback, I plan to clean up the code, remove unnecessary comments, add unit tests, and improve the documentation.

For more detail, please see my blog: arkdong.com

Key Idea from the paper "Dissecting Neural ODEs"

  • Vanilla Neural ODEs cannot be fully considered the deep limit of ResNets. The first attempt to pursue the true deep limit of ResNets is the hypernetwork approach of Zhang et al. (2019b), where another neural network parametrizes the dynamics of $\theta(s)$.

  • However, that approach is not backed by any theoretical argument and is considerably parameter-inefficient, as it generally scales polynomially in $n_{\theta}$. The paper instead approaches the problem by uncovering an optimization problem in function space, solved by a direct application of the adjoint sensitivity method in infinite dimensions.

  • Galerkin Neural ODEs are the spectral-discretization version of this idea. The idea is to expand $\theta(s)$ on a complete orthogonal basis of a predetermined subspace of $\mathbb{L}_{2}(\mathcal{S}\to \mathbb{R}^{n_\theta})$ and truncate the series at the $m$-th term, where $\psi_j(s)$ are the basis functions and the trainable objects are the coefficients $\alpha_j$:

$$ \theta (s)=\sum_{j=1}^{m}\alpha_{j}\odot\psi_{j}(s) $$
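The expansion above can be sketched in a few lines. This is a language-agnostic illustration in plain Python, not the PR's Julia code. The names `fourier_basis` and `reconstruct_theta` are hypothetical, and it assumes the simplest case where each $\psi_j(s)$ is a single scalar shared by all $n_\theta$ parameter components, so that $\alpha_j \odot \psi_j(s)$ reduces to a scalar multiple of $\alpha_j$.

```python
import math

def fourier_basis(s, m, period=1.0):
    """Return [psi_1(s), ..., psi_m(s)] = [1, cos(ws), sin(ws), cos(2ws), ...]."""
    w = 2.0 * math.pi / period
    values = []
    for j in range(m):
        k = (j + 1) // 2  # harmonic index: 0, 1, 1, 2, 2, ...
        if j == 0:
            values.append(1.0)                   # constant mode
        elif j % 2 == 1:
            values.append(math.cos(k * w * s))   # cosine mode of harmonic k
        else:
            values.append(math.sin(k * w * s))   # sine mode of harmonic k
    return values

def reconstruct_theta(alpha, s, period=1.0):
    """theta(s) = sum_j alpha_j * psi_j(s); alpha is a list of m vectors of length n_theta."""
    psi = fourier_basis(s, len(alpha), period)
    n_theta = len(alpha[0])
    return [sum(alpha[j][i] * psi[j] for j in range(len(alpha)))
            for i in range(n_theta)]

# m = 3 modes, n_theta = 2 parameters
alpha = [[1.0, 2.0], [0.5, -1.0], [3.0, 3.0]]
print(reconstruct_theta(alpha, 0.0))  # psi(0) = [1, 1, 0], so this prints [1.5, 1.0]
```

With a single constant mode (`alpha = [[1.0, 2.0]]`), `reconstruct_theta` returns the same vector for every `s`, which is the fixed-parameter case.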

  • This turns an infinite-dimensional optimization over functions $\theta(s)$ into an ordinary finite-dimensional optimization over coefficient vectors $\alpha=(\alpha_{1}, \dots, \alpha_{m})\in \mathbb{R}^{mn_{\theta}}$, whose gradient can be computed as follows:
    • Corollary 1 (Spectral Gradients). Under the assumptions of Theorem 1 (Infinite-Dimensional Gradients), if $\theta (s)=\sum_{j=1} ^{m}\alpha_{j} \odot\psi_{j}(s)$, then:

$$ \frac{d \ell}{d\alpha}=\int_{\mathcal{S}}\vec{a}^{\top}(\tau) \frac{ \partial f_{\theta(\tau)} }{ \partial \theta(\tau) } \psi(\tau)\,d\tau, \quad \psi=(\psi_{1}, \dots, \psi_{m}) $$
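A brief sketch of why the corollary follows from the chain rule. It assumes that Theorem 1 supplies the pointwise sensitivity $\vec{a}^{\top}(\tau)\,\partial f_{\theta(\tau)}/\partial\theta(\tau)$ of the loss with respect to $\theta(\tau)$; that intermediate form is an assumption here rather than a quotation from the paper. Because $\theta(\tau)$ is linear in the coefficients, each block of the gradient is

$$ \frac{\partial \theta(\tau)}{\partial \alpha_{j}}=\operatorname{diag}\big(\psi_{j}(\tau)\big) \quad\Longrightarrow\quad \frac{d \ell}{d\alpha_{j}}=\int_{\mathcal{S}}\vec{a}^{\top}(\tau) \frac{ \partial f_{\theta(\tau)} }{ \partial \theta(\tau) } \operatorname{diag}\big(\psi_{j}(\tau)\big)\,d\tau, \quad j=1,\dots,m $$

Stacking the $m$ blocks gives the expression in Corollary 1, so the only extra cost over a standard adjoint pass is evaluating the basis functions along the backward integration.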

  • At solver time $s$, evaluate the basis functions $\psi_j(s)$, reconstruct the current parameter set $\theta(s)$, and then use that parameter set inside the vector field. So the system you solve is still an ODE, but the ODE’s neural-network parameters now evolve with depth according to the learned basis expansion.
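The evaluate, reconstruct, and apply loop described above can be sketched with a fixed-step Euler integrator. This is an illustrative Python toy, not the PR's implementation. The one-neuron `vector_field`, the helper names, and the choice of Euler stepping are all assumptions made to keep the example self-contained.

```python
import math

def theta_at(alpha, basis, s):
    """Reconstruct theta(s) = sum_j alpha_j * psi_j(s) from scalar basis functions."""
    psi = [b(s) for b in basis]
    return [sum(a[i] * p for a, p in zip(alpha, psi))
            for i in range(len(alpha[0]))]

def vector_field(z, theta):
    """Toy one-neuron 'network': weight theta[0], bias theta[1]."""
    return math.tanh(theta[0] * z + theta[1])

def solve_euler(z0, alpha, basis, s_span=(0.0, 1.0), n_steps=100):
    """Integrate dz/ds = f(z, theta(s)) with explicit Euler steps."""
    s0, s1 = s_span
    h = (s1 - s0) / n_steps
    z = z0
    for k in range(n_steps):
        s = s0 + k * h
        theta = theta_at(alpha, basis, s)    # parameters depend on depth s
        z = z + h * vector_field(z, theta)   # the state update is still an ODE step
    return z

basis = [lambda s: 1.0,
         lambda s: math.cos(2.0 * math.pi * s),
         lambda s: math.sin(2.0 * math.pi * s)]

alpha_const = [[0.5, 0.1], [0.0, 0.0], [0.0, 0.0]]   # only the constant mode is active
alpha_var = [[0.5, 0.1], [0.3, 0.0], [0.0, -0.2]]    # parameters vary with depth

print(solve_euler(1.0, alpha_const, basis))
print(solve_euler(1.0, alpha_var, basis))
```

When every non-constant coefficient is zero, `theta_at` returns the same vector at every `s`, so the solve coincides with a fixed-parameter ODE.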

Preliminary Result

  • The tests and plots were generated using the example code from the "Neural Ordinary Differential Equations" documentation page. These results are preliminary and mainly intended as a sanity check for the implementation.

  • In the untrained-state plots, both the standard NeuralODE and the Galerkin NeuralODE start far from the ground-truth trajectories, confirming that neither model matches the data before optimization.
    [Figure: neural_vs_galerkin_training]

  • During training, however, the loss curves show two important patterns:

    • First, the Galerkin model with $M=1$ closely follows the standard NeuralODE, which is a key sanity check because the constant-only Galerkin case should reduce to the vanilla NeuralODE;
    • Second, the Galerkin model with $M=5$ converges substantially faster and reaches a noticeably lower training loss, indicating that the additional basis modes provide extra expressive power.
      [Figure: training_loss]
  • Finally, the trained trajectory plots show that all three models recover the target dynamics well, with the $M=1$ Galerkin model nearly overlapping the NeuralODE baseline and the $M=5$ model achieving the best overall fit.
    [Figure: trajectories]

  • Taken together, these preliminary experiments suggest that the implementation is behaving in the expected direction:

    • the $M=1$ case behaves similarly to the vanilla NeuralODE;
    • a richer basis expansion can improve the fit, although part of the improvement for $M=5$ may also come from its larger effective parameterization.
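To make the parameterization caveat concrete: the coefficient vector lives in $\mathbb{R}^{mn_{\theta}}$, so the trainable parameter count grows linearly with the number of modes. The Python sketch below assumes a hypothetical 2 → 50 → 2 dense network purely for illustration; the layer sizes are not taken from the PR.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases of a dense network with the given layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

n_theta = mlp_param_count([2, 50, 2])  # hypothetical network: 252 parameters
for m in (1, 5):
    print(f"M={m}: {m * n_theta} trainable coefficients")
# M=1: 252 trainable coefficients
# M=5: 1260 trainable coefficients
```

A comparison against a vanilla NeuralODE with a matched parameter budget would help separate the effect of depth variation from the effect of simply having more parameters.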

Checklist

  • Appropriate tests were added
  • Any code changes were done in a way that does not break public API
  • All documentation related to code changes were updated
  • The new code follows the contributor guidelines, in particular the SciML Style Guide and COLPRAC.
  • Any new documentation only uses public API



arkdong commented Mar 18, 2026

Hi @ChrisRackauckas!

My name is Adam, and I am a final-year Bachelor’s student in Computer Science at the University of Amsterdam. I'm very interested in contributing to GSoC 2026 under NumFOCUS, specifically to the project idea "SciML Scientific Machine Learning Project - Improvements to Neural and Universal Differential Equations".

My background is in machine learning, scientific computing, probability, and parallel computing, and I’ve been strengthening that further through a mathematics minor, the CQF, and the MITx MicroMasters in Statistics and Data Science. My current bachelor thesis focuses on risk-sensitive GPU-accelerated multi-agent reinforcement learning for market making in JaxMARL-HFT, which is a big reason I’m especially interested in mathematically grounded learning systems and modeling under uncertainty.

This PR is the first concrete step in the project I proposed as an extension of that idea: "Galerkin Neural ODEs and Basis-Parameterized Universal Approximators for DiffEqFlux.jl". The goal is to add a production-quality GalerkinNeuralODE to DiffEqFlux and extend it into a small framework for basis-parameterized continuous-depth models. The core idea is to let the vector-field parameters vary with depth/time through a basis expansion rather than staying fixed throughout the solve, as they do in a standard NeuralODE.

I already have a working prototype with an abstract basis API, a Fourier basis, parameter lifting/reconstruction, and training experiments, and I also had to debug sensitivity-method issues and Zygote mutation issues while getting the prototype running.

I have drafted a first version of my GSoC proposal based on this project direction. I would be very grateful for any feedback on the proposal, especially on the scope, technical plan, and whether the proposed milestones make sense for DiffEqFlux/SciML. The current draft is here: SciML_GSoC_2026.pdf
