Commit 361af03
Update documentation for exponential Euler integration and GPU acceleration
- docs/ctrnn.rst: Replace forward Euler description with exponential Euler formula and stability properties. Add GPU-accelerated evaluation section with code example and constraints.
- docs/cookbook.rst: Add "How to: Use GPU-Accelerated Evaluation" section with CTRNN and Izhikevich examples, API differences from ParallelEvaluator.
- docs/academic_research.rst: Update integration method description, mention GPU acceleration option.
- docs/faq.rst: Note GPU acceleration for CTRNN and IZNN network types.
- examples/lorenz-ctrnn/docs/CTRNN-CHANGES.md: Rewrite Section 1 background to use exponential Euler formula. Rewrite Section 4 numerical stability to explain that unconditional stability eliminates the dt < 2*tau constraint; retain forward Euler discussion as historical context.
- CHANGELOG.md: Add [Unreleased] section with GPU evaluation and CTRNN integration method changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent c3b36df commit 361af03

6 files changed: +188 −24 lines

CHANGELOG.md

Lines changed: 24 additions & 0 deletions
@@ -6,6 +6,30 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 
+## [Unreleased]
+
+### Added
+- **GPU-accelerated evaluation** for CTRNN and Izhikevich spiking networks via optional CuPy dependency
+  - `GPUCTRNNEvaluator` and `GPUIZNNEvaluator` in `neat.gpu.evaluator`
+  - Batch-evaluates entire populations on GPU using padded tensor operations
+  - Install with `pip install 'neat-python[gpu]'`
+  - Custom CUDA kernel supporting 11 activation functions
+  - Requires sum aggregation; other aggregation functions raise `ValueError`
+  - `import neat` never loads CuPy; all GPU imports are lazy
+  - Benchmark script in `benchmarks/gpu_benchmark.py`
+  - See `GPU_DESIGN_NOTES.md` for design rationale
+
+### Changed
+- **CTRNN integration method** changed from forward Euler to exponential Euler (ETD1)
+  - Integrates the linear decay term `-y/tau` exactly
+  - Unconditionally stable regardless of `dt/tau` ratio (forward Euler required `dt < 2*tau`)
+  - Same per-step cost (one `math.exp` call per node)
+  - Numerical results differ from previous versions for the same `dt`; both methods converge
+    to the same continuous solution as `dt` decreases
+  - The `time_constant_min_value` constraint is relaxed: values well below the integration
+    timestep are now safe
+
+
 ## [1.1.0] - 2025-12-05
 
 ### Added
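The update rule described in the `### Changed` entry above is compact enough to sketch directly. The following is an illustrative reconstruction from the formulas documented in this commit; `exp_euler_step` is a hypothetical helper name, not the library's API:

```python
import math

def exp_euler_step(y, z, dt, tau):
    """One exponential Euler (ETD1) step for a single CTRNN node.

    Integrates tau * dy/dt = -y + z over one timestep dt, holding the
    activated input z constant. Costs one math.exp call per node, the
    same per-step cost as forward Euler.
    """
    decay = math.exp(-dt / tau)
    return decay * y + (1.0 - decay) * z

# For small dt/tau this agrees with forward Euler to first order,
# consistent with both methods converging to the same continuous
# solution as dt decreases:
y, z, dt, tau = 0.5, 1.0, 1e-4, 1.0
forward = y + (dt / tau) * (-y + z)
assert abs(exp_euler_step(y, z, dt, tau) - forward) < 1e-8
```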

docs/academic_research.rst

Lines changed: 3 additions & 2 deletions
@@ -352,9 +352,10 @@ CTRNN and Specialized Networks
 
 When using CTRNN or other specialized network types:
 
-* **Integration method**: neat-python uses explicit Euler integration with fixed time steps
-* **Numerical stability**: Document ``dt`` and ``time_constant`` values; consider sensitivity analysis
+* **Integration method**: neat-python uses exponential Euler (ETD1) integration, which exactly integrates the linear decay term and is unconditionally stable regardless of ``dt/tau``
+* **Numerical stability**: Document ``dt`` and ``time_constant`` values; the exponential Euler method eliminates the ``dt < 2*tau`` stability constraint of forward Euler
 * **Validation**: For dynamics-sensitive applications, validate numerical behavior on simple test cases
+* **GPU acceleration**: For large populations, optional GPU-accelerated evaluation is available via ``neat.gpu`` (requires CuPy). The GPU evaluator uses the same integration method as the CPU implementation.
 
 See :doc:`ctrnn` for implementation details.

docs/cookbook.rst

Lines changed: 82 additions & 0 deletions
@@ -121,6 +121,88 @@ How to: Use Parallel Evaluation
 
 **See also:** :doc:`module_summaries` for ``ParallelEvaluator`` API details.
 
+How to: Use GPU-Accelerated Evaluation
+---------------------------------------
+
+**Problem:** CTRNN or Izhikevich spiking network evaluation is too slow, even with multiple CPU cores.
+
+**Solution:** Use the GPU evaluators in ``neat.gpu`` to batch-evaluate the entire population on GPU.
+This requires CuPy: ``pip install 'neat-python[gpu]'``
+
+**CTRNN example:**
+
+.. code-block:: python
+
+    import math
+
+    import numpy as np
+
+    from neat.gpu.evaluator import GPUCTRNNEvaluator
+
+    target = 0.0  # example target value used by the fitness function
+
+    def input_fn(t, dt):
+        """Return input signal at time t. Shape: [num_inputs]."""
+        return [math.sin(2 * math.pi * t), math.cos(2 * math.pi * t)]
+
+    def fitness_fn(output_trajectory):
+        """Compute fitness from output trajectory.
+
+        Args:
+            output_trajectory: numpy array of shape [num_steps, num_outputs]
+        Returns:
+            Scalar fitness value.
+        """
+        # Example: reward output that tracks a target
+        return -float(np.mean((output_trajectory[:, 0] - target) ** 2))
+
+    evaluator = GPUCTRNNEvaluator(
+        dt=0.01,    # integration timestep (seconds)
+        t_max=1.0,  # total simulation time (seconds)
+        input_fn=input_fn,
+        fitness_fn=fitness_fn,
+    )
+    winner = population.run(evaluator.evaluate, n=300)
+
+**Izhikevich spiking network example:**
+
+.. code-block:: python
+
+    import numpy as np
+
+    from neat.gpu.evaluator import GPUIZNNEvaluator
+
+    def input_fn(t, dt):
+        """Return input values at time t. Shape: [num_inputs]."""
+        return [1.0, 0.5]
+
+    def fitness_fn(output_trajectory):
+        """Compute fitness from spike train.
+
+        Args:
+            output_trajectory: numpy array of shape [num_steps, num_outputs],
+                values are 0.0 (no spike) or 1.0 (spike).
+        """
+        return float(np.sum(output_trajectory))
+
+    evaluator = GPUIZNNEvaluator(
+        dt=0.05,     # integration timestep (milliseconds)
+        t_max=50.0,  # total simulation time (milliseconds)
+        input_fn=input_fn,
+        fitness_fn=fitness_fn,
+    )
+    winner = population.run(evaluator.evaluate, n=300)
+
+**Key differences from** ``ParallelEvaluator``:
+
+* The GPU evaluator handles network creation, simulation, and fitness assignment internally.
+  You provide ``input_fn`` (what to feed the network) and ``fitness_fn`` (how to score the output).
+* ``ParallelEvaluator`` takes an ``eval_genome`` function that returns a scalar fitness.
+  GPU evaluators take separate ``input_fn`` and ``fitness_fn`` callables.
+* ``input_fn`` runs on CPU; the simulation runs on GPU; ``fitness_fn`` runs on CPU per genome.
+
+**Constraints:**
+
+* Only ``sum`` aggregation is supported (required for batched matrix-vector multiply).
+* Supported activation functions: sigmoid, tanh, relu, identity, clamped, elu, softplus, sin,
+  gauss, abs, square. Unsupported functions raise ``ValueError`` at evaluation time.
+* ``import neat`` does not load CuPy. CuPy is imported lazily when a GPU evaluator is created.
+
+**See also:** :doc:`ctrnn` for CTRNN details including the integration method.
+
 How to: Save and Restore Checkpoints
 -------------------------------------

docs/ctrnn.rst

Lines changed: 52 additions & 2 deletions
@@ -22,6 +22,56 @@ Where:
 * :math:`A_i` is the set of indices of neurons that provide input to neuron :math:`i`.
 * :math:`w_{ij}` is the :term:`weight` of the :term:`connection` from neuron :math:`j` to neuron :math:`i`.
 
-The time evolution of the network is computed using the forward Euler method:
+The time evolution of the network is computed using the exponential Euler (ETD1) method, which
+integrates the linear decay term exactly:
 
-:math:`y_i(t+\Delta t) = y_i(t) + \Delta t \frac{d y_i}{dt}`
+:math:`y_i(t+\Delta t) = e^{-\Delta t / \tau_i} \cdot y_i(t) + \left(1 - e^{-\Delta t / \tau_i}\right) \cdot z_i`
+
+where :math:`z_i = f_i\left(\beta_i + \rho_i \sum\limits_{j \in A_i} w_{ij} y_j\right)` is the
+activated output and :math:`\rho_i` is the response multiplier of neuron :math:`i`.
+
+This method is **unconditionally stable** for the linear decay part regardless of the ratio
+:math:`\Delta t / \tau_i`. Forward Euler (used in versions prior to 2.0) required
+:math:`\Delta t < 2 \tau_i` for stability, which was problematic when evolving per-node time
+constants — nodes with small :math:`\tau_i` relative to the integration timestep would produce
+divergent trajectories.
+
+.. note::
+   The exponential Euler method holds the nonlinear term :math:`z_i` constant over each timestep
+   (the same assumption as forward Euler). The accuracy advantage comes from exactly integrating
+   the linear decay :math:`-y_i / \tau_i`, which is the dominant term for stiff systems where
+   :math:`\tau_i \ll \Delta t`.
+
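The stability contrast described above is easy to check numerically. A standalone sketch (plain Python, independent of the library) integrating a single node with a stiff ratio tau << dt:

```python
import math

dt, tau = 0.1, 0.01   # stiff case: time constant much smaller than timestep
z = 0.0               # constant activated input
y_fwd = y_exp = 1.0   # same initial state for both integrators

for _ in range(50):
    # Forward Euler multiplies the state by (1 - dt/tau) = -9 each step.
    y_fwd = y_fwd + (dt / tau) * (-y_fwd + z)
    # Exponential Euler multiplies it by exp(-dt/tau), about 4.5e-5.
    decay = math.exp(-dt / tau)
    y_exp = decay * y_exp + (1.0 - decay) * z

# y_fwd has exploded (|y_fwd| = 9**50); y_exp has decayed toward z.
assert abs(y_fwd) > 1e40
assert abs(y_exp) < 1e-100
```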
+GPU-Accelerated Evaluation
+--------------------------
+
+For large populations, CTRNN evaluation can be accelerated on GPU using the optional ``neat.gpu``
+module. This requires CuPy (install via ``pip install 'neat-python[gpu]'``).
+
+The GPU evaluator uses the same exponential Euler integration method as the CPU implementation.
+Variable-topology genomes are packed into fixed-size padded tensors and evaluated in a single
+batched operation across the entire population.
+
+.. code-block:: python
+
+    import math
+
+    from neat.gpu.evaluator import GPUCTRNNEvaluator
+
+    def input_fn(t, dt):
+        """Return input values at time t. Shape: [num_inputs]."""
+        return [math.sin(2 * math.pi * t), math.cos(2 * math.pi * t)]
+
+    def fitness_fn(output_trajectory):
+        """Compute fitness from output trajectory. Shape: [num_steps, num_outputs]."""
+        return float(output_trajectory[-1, 0])
+
+    evaluator = GPUCTRNNEvaluator(dt=0.01, t_max=1.0,
+                                  input_fn=input_fn, fitness_fn=fitness_fn)
+    winner = population.run(evaluator.evaluate, n=300)
+
+**Constraints:**
+
+* Only ``sum`` aggregation is supported (required for batched matrix-vector multiply).
+* Supported activation functions: sigmoid, tanh, relu, identity, clamped, elu, softplus, sin,
+  gauss, abs, square.
+* Genomes using unsupported aggregation or activation functions will raise ``ValueError`` at
+  evaluation time.

docs/faq.rst

Lines changed: 3 additions & 3 deletions
@@ -656,13 +656,13 @@ NEAT-Python **closely follows** the original NEAT paper (Stanley & Miikkulainen,
 3. **Configurable reproduction** (original had fixed parameters)
 4. **Additional network types:**
    - Recurrent networks
-   - CTRNN (continuous-time)
-   - IZNN (Izhikevich spiking)
+   - CTRNN (continuous-time) with optional GPU acceleration
+   - IZNN (Izhikevich spiking) with optional GPU acceleration
 5. **Enhanced features:**
    - Checkpointing
    - Statistics reporting
    - Network export
-   - Parallel evaluation
+   - Parallel evaluation (CPU multiprocessing and GPU batch evaluation)
 
 **Key improvement in v1.0.0:**

examples/lorenz-ctrnn/docs/CTRNN-CHANGES.md

Lines changed: 24 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -2,11 +2,18 @@
22

33
## 1. Background
44

5-
A Continuous-Time Recurrent Neural Network (CTRNN) models each node as a leaky integrator with state variable y_i and time constant tau_i. The state update under explicit Euler integration is:
5+
A Continuous-Time Recurrent Neural Network (CTRNN) models each node as a leaky integrator with state variable y_i and time constant tau_i. The underlying ODE is:
66

7-
y_i(t + dt) = y_i(t) + (dt / tau_i) * (-y_i(t) + f(bias_i + response_i * aggregation_j(w_ij * y_j)))
7+
tau_i * dy_i/dt = -y_i + f(bias_i + response_i * aggregation_j(w_ij * y_j))
88

9-
where f is the activation function, w_ij are connection weights, and dt is the integration timestep. The time constant tau_i controls how quickly node i responds to its inputs: small tau gives fast response (the node closely tracks its instantaneous input), while large tau gives slow response (the node integrates over a longer temporal window).
9+
where f is the activation function, w_ij are connection weights, and dt is the integration timestep. As of v2.0, this ODE is integrated using the exponential Euler (ETD1) method, which exactly integrates the linear decay term:
10+
11+
decay_i = exp(-dt / tau_i)
12+
y_i(t + dt) = decay_i * y_i(t) + (1 - decay_i) * z_i
13+
14+
where z_i = f(bias_i + response_i * aggregation_j(w_ij * y_j)). This replaces the forward Euler method used prior to v2.0. The exponential Euler method is unconditionally stable for the linear decay regardless of the dt/tau_i ratio, which is critical when evolving per-node time constants (see Section 4).
15+
16+
The time constant tau_i controls how quickly node i responds to its inputs: small tau gives fast response (the node closely tracks its instantaneous input), while large tau gives slow response (the node integrates over a longer temporal window).
1017

1118
In the NEAT (NeuroEvolution of Augmenting Topologies) framework, the structure and parameters of the neural network are evolved simultaneously. For CTRNNs, the natural approach is to evolve tau_i as a per-node gene attribute alongside bias, response, activation function, and connection weights.
1219

@@ -85,30 +92,30 @@ When time constants are not configured (all nodes have tau = 1.0 by default), th
8592

8693
## 4. Numerical Stability Consideration
8794

88-
Evolving per-node time constants introduces a stability constraint that does not arise with a fixed time constant. The explicit Euler update:
95+
### 4.1 Exponential Euler (v2.0)
8996

90-
y_i(t + dt) = y_i(t) + (dt / tau_i) * (-y_i(t) + z_i)
97+
The exponential Euler method used in v2.0 is unconditionally stable for the linear decay term, regardless of the dt/tau_i ratio. For example, with dt = 0.1 and tau = 0.01, the decay factor is exp(-0.1/0.01) = exp(-10) ≈ 0.000045, meaning the node responds almost instantaneously to its input — the state effectively jumps to z_i each step. This is physically correct behavior (a very fast node tracks its input closely), not a numerical instability.
9198

92-
is stable only when dt / tau_i is of order 1 or smaller. When tau_i is much smaller than dt, the factor dt / tau_i becomes large and the integration oscillates with exponentially growing amplitude. For example, with dt = 0.1 and tau = 0.01, the factor dt / tau = 10. The state at each step is multiplied by approximately (1 - dt/tau) = -9, producing rapid divergence.
99+
The stability guarantee applies only to the linear decay. The nonlinear forcing term z_i = f(bias + response * aggregation(w_ij * y_j)) can still produce large values if weights are large, but this is bounded by the activation function (e.g., tanh saturates at ±1, sigmoid at [0,1]). Unbounded activation functions (identity, relu) with large weights can produce growing trajectories, but this is a modeling issue rather than a numerical stability issue.
93100

94-
With a fixed time constant, the user chooses tau >= dt (or adjusts dt) and this issue does not arise. With evolved time constants, the evolutionary search explores the full range [tau_min, tau_max], and some genomes will inevitably have nodes where tau is small relative to the integration timestep.
101+
As a result, the `time_constant_min_value` constraint is relaxed compared to the forward Euler era. Users can safely set `time_constant_min_value` well below the integration timestep without risking numerical blowup.
95102

96-
The recommended approach is to handle this through the fitness function rather than by modifying the integration:
103+
### 4.2 Forward Euler (prior to v2.0)
97104

98-
```python
99-
if any(math.isnan(v) or math.isinf(v) or abs(v) > 1e10 for v in output):
100-
return PENALTY_FITNESS
101-
```
105+
Prior to v2.0, the forward Euler update
102106

103-
This assigns a poor fitness to genomes that produce numerically unstable outputs, allowing natural selection to eliminate unstable time constant configurations. The alternative -- clamping tau or switching to an implicit integration scheme -- would either restrict the evolvable range or change the network dynamics in ways that may not be desirable.
107+
y_i(t + dt) = y_i(t) + (dt / tau_i) * (-y_i(t) + z_i)
104108

105-
In practice, selection pressure eliminates unstable configurations within the first few generations. The penalty is rarely triggered thereafter.
109+
was stable only when dt / tau_i was of order 1 or smaller. When tau_i was much smaller than dt, the factor dt / tau_i became large and the integration oscillated with exponentially growing amplitude. For example, with dt = 0.1 and tau = 0.01, the factor dt / tau = 10. The state at each step was multiplied by approximately (1 - dt/tau) = -9, producing rapid divergence.
106110

107-
Users should be aware of this constraint when setting `time_constant_min_value`. A conservative rule of thumb is:
111+
The recommended workaround was to guard against this in the fitness function:
108112

109-
time_constant_min_value >= integration_timestep
113+
```python
114+
if any(math.isnan(v) or math.isinf(v) or abs(v) > 1e10 for v in output):
115+
return PENALTY_FITNESS
116+
```
110117

111-
though smaller values can work if the fitness function includes a stability guard as shown above.
118+
This workaround is no longer necessary with the exponential Euler method, though it remains harmless if present in existing code.
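The specific numbers quoted in Sections 4.1 and 4.2 can be verified in a few lines (standalone sketch, not library code):

```python
import math

dt, tau = 0.1, 0.01

# Section 4.2: forward Euler multiplies the state by (1 - dt/tau) each step.
fwd_factor = 1.0 - dt / tau
assert abs(fwd_factor + 9.0) < 1e-12       # |factor| = 9 > 1: divergence

# Section 4.1: exponential Euler multiplies the state by exp(-dt/tau).
exp_factor = math.exp(-dt / tau)
assert abs(exp_factor - 0.000045) < 1e-6   # exp(-10): near-instant tracking

# After only 10 steps with zero input, the difference is dramatic:
assert fwd_factor ** 10 > 3e9      # forward Euler state has blown up
assert exp_factor ** 10 < 1e-40    # exponential Euler state has decayed
```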

## 5. Quantitative Improvement
