Commit 361af03
Update documentation for exponential Euler integration and GPU acceleration
- docs/ctrnn.rst: Replace forward Euler description with exponential Euler formula and stability properties. Add GPU-accelerated evaluation section with code example and constraints.
- docs/cookbook.rst: Add "How to: Use GPU-Accelerated Evaluation" section with CTRNN and Izhikevich examples, API differences from ParallelEvaluator.
- docs/academic_research.rst: Update integration method description, mention GPU acceleration option.
- docs/faq.rst: Note GPU acceleration for CTRNN and IZNN network types.
- examples/lorenz-ctrnn/docs/CTRNN-CHANGES.md: Rewrite Section 1 background to use exponential Euler formula. Rewrite Section 4 numerical stability to explain that unconditional stability eliminates the dt < 2*tau constraint; retain forward Euler discussion as historical context.
- CHANGELOG.md: Add [Unreleased] section with GPU evaluation and CTRNN integration method changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent c3b36df commit 361af03

6 files changed: +188 −24 lines

CHANGELOG.md

Lines changed: 24 additions & 0 deletions
@@ -6,6 +6,30 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 
+## [Unreleased]
+
+### Added
+- **GPU-accelerated evaluation** for CTRNN and Izhikevich spiking networks via optional CuPy dependency
+  - `GPUCTRNNEvaluator` and `GPUIZNNEvaluator` in `neat.gpu.evaluator`
+  - Batch-evaluates entire populations on GPU using padded tensor operations
+  - Install with `pip install 'neat-python[gpu]'`
+  - Custom CUDA kernel supporting 11 activation functions
+  - Requires sum aggregation; other aggregation functions raise `ValueError`
+  - `import neat` never loads CuPy; all GPU imports are lazy
+  - Benchmark script in `benchmarks/gpu_benchmark.py`
+  - See `GPU_DESIGN_NOTES.md` for design rationale
+
+### Changed
+- **CTRNN integration method** changed from forward Euler to exponential Euler (ETD1)
+  - Integrates the linear decay term `-y/tau` exactly
+  - Unconditionally stable regardless of `dt/tau` ratio (forward Euler required `dt < 2*tau`)
+  - Same per-step cost (one `math.exp` call per node)
+  - Numerical results differ from previous versions for the same `dt`; both methods converge
+    to the same continuous solution as `dt` decreases
+  - The `time_constant_min_value` constraint is relaxed: values well below the integration
+    timestep are now safe
+
+
 ## [1.1.0] - 2025-12-05
 
 ### Added
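The update rule described in the `### Changed` entry above is compact enough to sketch directly. The following is an illustrative reconstruction from the formulas documented in this commit; `exp_euler_step` is a hypothetical helper name, not the library's API:

```python
import math

def exp_euler_step(y, z, dt, tau):
    """One exponential Euler (ETD1) step for a single CTRNN node.

    Integrates tau * dy/dt = -y + z over one timestep dt, holding the
    activated input z constant. Costs one math.exp call per node, the
    same per-step cost as forward Euler.
    """
    decay = math.exp(-dt / tau)
    return decay * y + (1.0 - decay) * z

# For small dt/tau this agrees with forward Euler to first order,
# consistent with both methods converging to the same continuous
# solution as dt decreases:
y, z, dt, tau = 0.5, 1.0, 1e-4, 1.0
forward = y + (dt / tau) * (-y + z)
assert abs(exp_euler_step(y, z, dt, tau) - forward) < 1e-8
```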

docs/academic_research.rst

Lines changed: 3 additions & 2 deletions
@@ -352,9 +352,10 @@ CTRNN and Specialized Networks
 
 When using CTRNN or other specialized network types:
 
-* **Integration method**: neat-python uses explicit Euler integration with fixed time steps
-* **Numerical stability**: Document ``dt`` and ``time_constant`` values; consider sensitivity analysis
+* **Integration method**: neat-python uses exponential Euler (ETD1) integration, which exactly integrates the linear decay term and is unconditionally stable regardless of ``dt/tau``
+* **Numerical stability**: Document ``dt`` and ``time_constant`` values; the exponential Euler method eliminates the ``dt < 2*tau`` stability constraint of forward Euler
 * **Validation**: For dynamics-sensitive applications, validate numerical behavior on simple test cases
+* **GPU acceleration**: For large populations, optional GPU-accelerated evaluation is available via ``neat.gpu`` (requires CuPy). The GPU evaluator uses the same integration method as the CPU implementation.
 
 See :doc:`ctrnn` for implementation details.

docs/cookbook.rst

Lines changed: 82 additions & 0 deletions
@@ -121,6 +121,88 @@ How to: Use Parallel Evaluation
 
 **See also:** :doc:`module_summaries` for ``ParallelEvaluator`` API details.
 
+How to: Use GPU-Accelerated Evaluation
+---------------------------------------
+
+**Problem:** CTRNN or Izhikevich spiking network evaluation is too slow, even with multiple CPU cores.
+
+**Solution:** Use the GPU evaluators in ``neat.gpu`` to batch-evaluate the entire population on GPU.
+This requires CuPy: ``pip install 'neat-python[gpu]'``
+
+**CTRNN example:**
+
+.. code-block:: python
+
+    import math
+
+    import numpy as np
+
+    from neat.gpu.evaluator import GPUCTRNNEvaluator
+
+    target = 0.0  # example target value used by the fitness function
+
+    def input_fn(t, dt):
+        """Return input signal at time t. Shape: [num_inputs]."""
+        return [math.sin(2 * math.pi * t), math.cos(2 * math.pi * t)]
+
+    def fitness_fn(output_trajectory):
+        """Compute fitness from output trajectory.
+
+        Args:
+            output_trajectory: numpy array of shape [num_steps, num_outputs]
+        Returns:
+            Scalar fitness value.
+        """
+        # Example: reward output that tracks a target
+        return -float(np.mean((output_trajectory[:, 0] - target) ** 2))
+
+    evaluator = GPUCTRNNEvaluator(
+        dt=0.01,    # integration timestep (seconds)
+        t_max=1.0,  # total simulation time (seconds)
+        input_fn=input_fn,
+        fitness_fn=fitness_fn,
+    )
+    winner = population.run(evaluator.evaluate, n=300)
+
+**Izhikevich spiking network example:**
+
+.. code-block:: python
+
+    import numpy as np
+
+    from neat.gpu.evaluator import GPUIZNNEvaluator
+
+    def input_fn(t, dt):
+        """Return input values at time t. Shape: [num_inputs]."""
+        return [1.0, 0.5]
+
+    def fitness_fn(output_trajectory):
+        """Compute fitness from spike train.
+
+        Args:
+            output_trajectory: numpy array of shape [num_steps, num_outputs],
+                values are 0.0 (no spike) or 1.0 (spike).
+        """
+        return float(np.sum(output_trajectory))
+
+    evaluator = GPUIZNNEvaluator(
+        dt=0.05,     # integration timestep (milliseconds)
+        t_max=50.0,  # total simulation time (milliseconds)
+        input_fn=input_fn,
+        fitness_fn=fitness_fn,
+    )
+    winner = population.run(evaluator.evaluate, n=300)
+
+**Key differences from** ``ParallelEvaluator``:
+
+* The GPU evaluator handles network creation, simulation, and fitness assignment internally.
+  You provide ``input_fn`` (what to feed the network) and ``fitness_fn`` (how to score the output).
+* ``ParallelEvaluator`` takes an ``eval_genome`` function that returns a scalar fitness.
+  GPU evaluators take separate ``input_fn`` and ``fitness_fn`` callables.
+* ``input_fn`` runs on CPU; the simulation runs on GPU; ``fitness_fn`` runs on CPU per genome.
+
+**Constraints:**
+
+* Only ``sum`` aggregation is supported (required for batched matrix-vector multiply).
+* Supported activation functions: sigmoid, tanh, relu, identity, clamped, elu, softplus, sin,
+  gauss, abs, square. Unsupported functions raise ``ValueError`` at evaluation time.
+* ``import neat`` does not load CuPy. CuPy is imported lazily when a GPU evaluator is created.
+
+**See also:** :doc:`ctrnn` for CTRNN details including the integration method.
+
 How to: Save and Restore Checkpoints
 -------------------------------------

docs/ctrnn.rst

Lines changed: 52 additions & 2 deletions
@@ -22,6 +22,56 @@ Where:
 * :math:`A_i` is the set of indices of neurons that provide input to neuron :math:`i`.
 * :math:`w_{ij}` is the :term:`weight` of the :term:`connection` from neuron :math:`j` to neuron :math:`i`.
 
-The time evolution of the network is computed using the forward Euler method:
+The time evolution of the network is computed using the exponential Euler (ETD1) method, which
+integrates the linear decay term exactly:
 
-:math:`y_i(t+\Delta t) = y_i(t) + \Delta t \frac{d y_i}{dt}`
+:math:`y_i(t+\Delta t) = e^{-\Delta t / \tau_i} \cdot y_i(t) + \left(1 - e^{-\Delta t / \tau_i}\right) \cdot z_i`
+
+where :math:`z_i = f_i\left(\beta_i + \rho_i \sum\limits_{j \in A_i} w_{ij} y_j\right)` is the
+activated output and :math:`\rho_i` is the response multiplier of neuron :math:`i`.
+
+This method is **unconditionally stable** for the linear decay part regardless of the ratio
+:math:`\Delta t / \tau_i`. Forward Euler (used in versions prior to 2.0) required
+:math:`\Delta t < 2 \tau_i` for stability, which was problematic when evolving per-node time
+constants — nodes with small :math:`\tau_i` relative to the integration timestep would produce
+divergent trajectories.
+
+.. note::
+   The exponential Euler method holds the nonlinear term :math:`z_i` constant over each timestep
+   (the same assumption as forward Euler). The accuracy advantage comes from exactly integrating
+   the linear decay :math:`-y_i / \tau_i`, which is the dominant term for stiff systems where
+   :math:`\tau_i \ll \Delta t`.
+
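The stability contrast described above is easy to check numerically. A standalone sketch (plain Python, independent of the library) integrating a single node with a stiff ratio tau << dt:

```python
import math

dt, tau = 0.1, 0.01   # stiff case: time constant much smaller than timestep
z = 0.0               # constant activated input
y_fwd = y_exp = 1.0   # same initial state for both integrators

for _ in range(50):
    # Forward Euler multiplies the state by (1 - dt/tau) = -9 each step.
    y_fwd = y_fwd + (dt / tau) * (-y_fwd + z)
    # Exponential Euler multiplies it by exp(-dt/tau), about 4.5e-5.
    decay = math.exp(-dt / tau)
    y_exp = decay * y_exp + (1.0 - decay) * z

# y_fwd has exploded (|y_fwd| = 9**50); y_exp has decayed toward z.
assert abs(y_fwd) > 1e40
assert abs(y_exp) < 1e-100
```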
+GPU-Accelerated Evaluation
+--------------------------
+
+For large populations, CTRNN evaluation can be accelerated on GPU using the optional ``neat.gpu``
+module. This requires CuPy (install via ``pip install 'neat-python[gpu]'``).
+
+The GPU evaluator uses the same exponential Euler integration method as the CPU implementation.
+Variable-topology genomes are packed into fixed-size padded tensors and evaluated in a single
+batched operation across the entire population.
+
+.. code-block:: python
+
+    import math
+
+    from neat.gpu.evaluator import GPUCTRNNEvaluator
+
+    def input_fn(t, dt):
+        """Return input values at time t. Shape: [num_inputs]."""
+        return [math.sin(2 * math.pi * t), math.cos(2 * math.pi * t)]
+
+    def fitness_fn(output_trajectory):
+        """Compute fitness from output trajectory. Shape: [num_steps, num_outputs]."""
+        return float(output_trajectory[-1, 0])
+
+    evaluator = GPUCTRNNEvaluator(dt=0.01, t_max=1.0,
+                                  input_fn=input_fn, fitness_fn=fitness_fn)
+    winner = population.run(evaluator.evaluate, n=300)
+
+**Constraints:**
+
+* Only ``sum`` aggregation is supported (required for batched matrix-vector multiply).
+* Supported activation functions: sigmoid, tanh, relu, identity, clamped, elu, softplus, sin,
+  gauss, abs, square.
+* Genomes using unsupported aggregation or activation functions will raise ``ValueError`` at
+  evaluation time.

docs/faq.rst

Lines changed: 3 additions & 3 deletions
@@ -656,13 +656,13 @@ NEAT-Python **closely follows** the original NEAT paper (Stanley & Miikkulainen,
 3. **Configurable reproduction** (original had fixed parameters)
 4. **Additional network types:**
    - Recurrent networks
-   - CTRNN (continuous-time)
-   - IZNN (Izhikevich spiking)
+   - CTRNN (continuous-time) with optional GPU acceleration
+   - IZNN (Izhikevich spiking) with optional GPU acceleration
 5. **Enhanced features:**
    - Checkpointing
    - Statistics reporting
    - Network export
-   - Parallel evaluation
+   - Parallel evaluation (CPU multiprocessing and GPU batch evaluation)
 
 **Key improvement in v1.0.0:**

examples/lorenz-ctrnn/docs/CTRNN-CHANGES.md

Lines changed: 24 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -2,11 +2,18 @@
22

33
## 1. Background
44

5-
A Continuous-Time Recurrent Neural Network (CTRNN) models each node as a leaky integrator with state variable y_i and time constant tau_i. The state update under explicit Euler integration is:
5+
A Continuous-Time Recurrent Neural Network (CTRNN) models each node as a leaky integrator with state variable y_i and time constant tau_i. The underlying ODE is:
66

7-
y_i(t + dt) = y_i(t) + (dt / tau_i) * (-y_i(t) + f(bias_i + response_i * aggregation_j(w_ij * y_j)))
7+
tau_i * dy_i/dt = -y_i + f(bias_i + response_i * aggregation_j(w_ij * y_j))
88

9-
where f is the activation function, w_ij are connection weights, and dt is the integration timestep. The time constant tau_i controls how quickly node i responds to its inputs: small tau gives fast response (the node closely tracks its instantaneous input), while large tau gives slow response (the node integrates over a longer temporal window).
9+
where f is the activation function, w_ij are connection weights, and dt is the integration timestep. As of v2.0, this ODE is integrated using the exponential Euler (ETD1) method, which exactly integrates the linear decay term:
10+
11+
decay_i = exp(-dt / tau_i)
12+
y_i(t + dt) = decay_i * y_i(t) + (1 - decay_i) * z_i
13+
14+
where z_i = f(bias_i + response_i * aggregation_j(w_ij * y_j)). This replaces the forward Euler method used prior to v2.0. The exponential Euler method is unconditionally stable for the linear decay regardless of the dt/tau_i ratio, which is critical when evolving per-node time constants (see Section 4).
15+
16+
The time constant tau_i controls how quickly node i responds to its inputs: small tau gives fast response (the node closely tracks its instantaneous input), while large tau gives slow response (the node integrates over a longer temporal window).
1017

1118
In the NEAT (NeuroEvolution of Augmenting Topologies) framework, the structure and parameters of the neural network are evolved simultaneously. For CTRNNs, the natural approach is to evolve tau_i as a per-node gene attribute alongside bias, response, activation function, and connection weights.
1219

@@ -85,30 +92,30 @@ When time constants are not configured (all nodes have tau = 1.0 by default), th
8592

8693
## 4. Numerical Stability Consideration
8794

88-
Evolving per-node time constants introduces a stability constraint that does not arise with a fixed time constant. The explicit Euler update:
95+
### 4.1 Exponential Euler (v2.0)
8996

90-
y_i(t + dt) = y_i(t) + (dt / tau_i) * (-y_i(t) + z_i)
97+
The exponential Euler method used in v2.0 is unconditionally stable for the linear decay term, regardless of the dt/tau_i ratio. For example, with dt = 0.1 and tau = 0.01, the decay factor is exp(-0.1/0.01) = exp(-10) ≈ 0.000045, meaning the node responds almost instantaneously to its input — the state effectively jumps to z_i each step. This is physically correct behavior (a very fast node tracks its input closely), not a numerical instability.
9198

92-
is stable only when dt / tau_i is of order 1 or smaller. When tau_i is much smaller than dt, the factor dt / tau_i becomes large and the integration oscillates with exponentially growing amplitude. For example, with dt = 0.1 and tau = 0.01, the factor dt / tau = 10. The state at each step is multiplied by approximately (1 - dt/tau) = -9, producing rapid divergence.
99+
The stability guarantee applies only to the linear decay. The nonlinear forcing term z_i = f(bias + response * aggregation(w_ij * y_j)) can still produce large values if weights are large, but this is bounded by the activation function (e.g., tanh saturates at ±1, sigmoid at [0,1]). Unbounded activation functions (identity, relu) with large weights can produce growing trajectories, but this is a modeling issue rather than a numerical stability issue.
93100

94-
With a fixed time constant, the user chooses tau >= dt (or adjusts dt) and this issue does not arise. With evolved time constants, the evolutionary search explores the full range [tau_min, tau_max], and some genomes will inevitably have nodes where tau is small relative to the integration timestep.
101+
As a result, the `time_constant_min_value` constraint is relaxed compared to the forward Euler era. Users can safely set `time_constant_min_value` well below the integration timestep without risking numerical blowup.
95102

96-
The recommended approach is to handle this through the fitness function rather than by modifying the integration:
103+
### 4.2 Forward Euler (prior to v2.0)
97104

98-
```python
99-
if any(math.isnan(v) or math.isinf(v) or abs(v) > 1e10 for v in output):
100-
return PENALTY_FITNESS
101-
```
105+
Prior to v2.0, the forward Euler update
102106

103-
This assigns a poor fitness to genomes that produce numerically unstable outputs, allowing natural selection to eliminate unstable time constant configurations. The alternative -- clamping tau or switching to an implicit integration scheme -- would either restrict the evolvable range or change the network dynamics in ways that may not be desirable.
107+
y_i(t + dt) = y_i(t) + (dt / tau_i) * (-y_i(t) + z_i)
104108

105-
In practice, selection pressure eliminates unstable configurations within the first few generations. The penalty is rarely triggered thereafter.
109+
was stable only when dt / tau_i was of order 1 or smaller. When tau_i was much smaller than dt, the factor dt / tau_i became large and the integration oscillated with exponentially growing amplitude. For example, with dt = 0.1 and tau = 0.01, the factor dt / tau = 10. The state at each step was multiplied by approximately (1 - dt/tau) = -9, producing rapid divergence.
106110

107-
Users should be aware of this constraint when setting `time_constant_min_value`. A conservative rule of thumb is:
111+
The recommended workaround was to guard against this in the fitness function:
108112

109-
time_constant_min_value >= integration_timestep
113+
```python
114+
if any(math.isnan(v) or math.isinf(v) or abs(v) > 1e10 for v in output):
115+
return PENALTY_FITNESS
116+
```
110117

111-
though smaller values can work if the fitness function includes a stability guard as shown above.
118+
This workaround is no longer necessary with the exponential Euler method, though it remains harmless if present in existing code.
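The specific numbers quoted in Sections 4.1 and 4.2 can be verified in a few lines (standalone sketch, not library code):

```python
import math

dt, tau = 0.1, 0.01

# Section 4.2: forward Euler multiplies the state by (1 - dt/tau) each step.
fwd_factor = 1.0 - dt / tau
assert abs(fwd_factor + 9.0) < 1e-12       # |factor| = 9 > 1: divergence

# Section 4.1: exponential Euler multiplies the state by exp(-dt/tau).
exp_factor = math.exp(-dt / tau)
assert abs(exp_factor - 0.000045) < 1e-6   # exp(-10): near-instant tracking

# After only 10 steps with zero input, the difference is dramatic:
assert fwd_factor ** 10 > 3e9      # forward Euler state has blown up
assert exp_factor ** 10 < 1e-40    # exponential Euler state has decayed
```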

## 5. Quantitative Improvement
