Commit 0121bd0: Update progress log with CTRNN exponential Euler and GPU evaluation entries
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent aaca8af

1 file changed: progress-20260315.md (+152, -0)
# Progress Log - 2026-03-15

## Merged PR 290: Aggregation callable validation improvement

**PR:** https://github.com/CodeReclaimers/neat-python/pull/290
**Author:** Mark Zhang (@shuofengzhang)

### Summary

Merged a community PR that tightens `validate_aggregation()` in `neat/aggregations.py`. The old validation checked `co_argcount >= 1`, which incorrectly accepted functions requiring two or more positional arguments — these would pass registration but fail at runtime when called as `aggregation(node_inputs)`.

### Changes

- **`neat/aggregations.py`**: Replaced the `isinstance` type check + `co_argcount` check with `callable()` + `inspect.signature().bind(object())`. This correctly rejects functions that cannot be called with exactly one positional argument (two-required-arg functions, keyword-only-arg functions) while still accepting functions with optional extra args.
- **`tests/test_aggregation.py`**: Added three new tests — `test_add_builtin_max` (builtins pass), `test_bad_add3` (keyword-only rejected), `test_bad_add4` (two-arg rejected).
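The new check can be sketched as follows. This is a simplified reconstruction of the merged logic, not the verbatim diff — the exact error strings and control flow in `neat/aggregations.py` may differ slightly:

```python
import inspect
from types import BuiltinFunctionType

def validate_aggregation(function):
    """Sketch: accept any callable invocable with one positional argument."""
    if not callable(function):
        raise TypeError("A callable is required.")
    # CPython builtins like max/sum often have no introspectable
    # signature, so they skip the bind() check (the escape hatch
    # discussed under "Alternatives considered").
    if isinstance(function, BuiltinFunctionType):
        return
    try:
        # Succeeds iff the callable can be invoked with exactly one
        # positional argument; optional extra parameters are fine.
        inspect.signature(function).bind(object())
    except TypeError:
        raise TypeError(
            "A callable with exactly one required positional argument is required."
        )
```

Under this check, `max` passes via the builtin branch, `lambda x, extra=0: x` passes via `bind()`, and both `lambda a, b: a + b` and keyword-only signatures are rejected.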
### Tweaks before merge

Added two small changes on top of the PR before merging:

1. **Error message clarity**: Changed "A function taking a single positional argument is required" to "A callable with exactly one required positional argument is required" — the old wording was misleading, since functions with extra *optional* args are valid.
2. **Comment on builtin fallback**: Added a comment explaining why `BuiltinFunctionType` instances skip signature validation (CPython builtins often lack introspectable signatures).

### Alternatives considered

- Could have asked the contributor to make the tweaks themselves, but the changes were minor enough to apply directly.
- The builtin escape hatch means something like `os.getpid` (a zero-arg builtin) could pass validation. Accepted this as a pragmatic trade-off: the builtins actually used as aggregations in practice (like `max` and `sum`) are correct, and CPython does not expose their signatures for inspection.

### Files modified

- `neat/aggregations.py` (+20, -8)
- `tests/test_aggregation.py` (+49)

---
## Switch CTRNN integration to exponential Euler (ETD1)

**Commit:** 7d58c15

### Summary

Replaced forward Euler integration in `neat/ctrnn/__init__.py` with exponential Euler (ETD1). The linear decay term `-u/tau` is now integrated exactly, making the method unconditionally stable regardless of the `dt/tau` ratio. Forward Euler required `dt < 2*tau` for stability.

### Change

Old (forward Euler):

```python
ovalues[node_key] += dt / ne.time_constant * (-ovalues[node_key] + z)
```

New (exponential Euler):

```python
decay = math.exp(-dt / ne.time_constant)
ovalues[node_key] = decay * ovalues[node_key] + (1.0 - decay) * z
```

The nonlinear forcing term `z = activation(bias + response * aggregation(inputs))` is still held constant per step — the same assumption forward Euler makes. With `z` constant, `u' = (-u + z)/tau` has the exact solution `u(t + dt) = z + (u(t) - z) * exp(-dt/tau)`, which is precisely the update above. The added cost is one `math.exp` call per node per step, which is negligible.
### Why exponential Euler instead of forward Euler

- Forward Euler is conditionally stable: if a user sets `dt > 2*tau` (easy to do with small time constants, e.g. `tau=0.01` and `dt=0.05`), the solution blows up. The old code had a `get_max_time_step()` TODO to address this, but it was never implemented.
- Exponential Euler is unconditionally stable for the linear decay part; no step-size restriction is needed.
- This also aligns the CPU and GPU integration methods (the new GPU evaluator uses exponential Euler), which simplifies numerical equivalence testing.
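The stability difference is easy to demonstrate on a single neuron. This is a self-contained sketch, with `z` standing in for the (held-constant) forcing term:

```python
import math

def forward_euler_step(u, z, dt, tau):
    # u' = (-u + z)/tau, explicit update: unstable when dt > 2*tau
    return u + dt / tau * (-u + z)

def exponential_euler_step(u, z, dt, tau):
    # linear decay integrated exactly; z held constant over the step
    decay = math.exp(-dt / tau)
    return decay * u + (1.0 - decay) * z

tau, dt, z = 0.01, 0.05, 0.5   # dt = 5*tau, well past the dt < 2*tau limit
u_fe = u_ee = 1.0
for _ in range(20):
    u_fe = forward_euler_step(u_fe, z, dt, tau)
    u_ee = exponential_euler_step(u_ee, z, dt, tau)
# u_fe oscillates with growing amplitude (each step multiplies the
# deviation from z by -4); u_ee settles at the fixed point z
```

After 20 steps the forward-Euler state has magnitude on the order of 1e11, while the exponential-Euler state sits at `z` to machine precision.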
### Why this is safe

A literature search shows no published work using neat-python's CTRNN or Izhikevich implementations. The reference values in `test_ctrnn.py` were the only hardcoded numerical expectations, and those have been regenerated.

### Files modified

- `neat/ctrnn/__init__.py` (+3, -1) — integration formula change, added `import math`
- `tests/test_ctrnn.py` (+7, -4) — updated reference values and added a comment explaining the integration method

---
## Add optional GPU-accelerated CTRNN and Izhikevich evaluation

**Commit:** aaca8af

### Summary

Added a new `neat/gpu/` package providing GPU-accelerated batch evaluation for CTRNN and Izhikevich spiking networks using CuPy. The module is entirely optional — `import neat` never loads CuPy. Users install it via `pip install 'neat-python[gpu]'`.
### Motivation

TensorNEAT (Wang et al., GECCO 2024) demonstrated up to 500x speedups by rewriting NEAT in JAX. Rather than rewriting neat-python, we add a GPU evaluation layer that integrates with the existing library. CTRNN and spiking networks are the most computationally expensive network types and the most likely to benefit from GPU acceleration.

### Architecture

```
neat/gpu/
    __init__.py       # Lazy imports, _import_cupy() helper, gpu_available()
    _padding.py       # Genome-to-tensor conversion (pure Python/NumPy)
    _cupy_backend.py  # CuPy kernels and batch evaluation functions
    evaluator.py      # Public API: GPUCTRNNEvaluator, GPUIZNNEvaluator
```
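The lazy-import pattern in `__init__.py` can be sketched roughly as below. This is an illustrative reconstruction around the helper names from the listing above, not the actual module contents:

```python
def _import_cupy():
    """Import CuPy on first use; never triggered by plain `import neat`."""
    try:
        import cupy
        return cupy
    except ImportError as e:
        raise ImportError(
            "GPU evaluation requires CuPy; "
            "install with: pip install 'neat-python[gpu]'"
        ) from e

def gpu_available():
    """True only when CuPy imports and at least one CUDA device exists."""
    try:
        cp = _import_cupy()
        return cp.cuda.runtime.getDeviceCount() > 0
    except Exception:
        return False
```

On a machine without CuPy, `gpu_available()` simply returns `False` and the rest of neat-python is unaffected.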
### Key design decisions

1. **Lazy CuPy imports:** `_import_cupy()` is called only when a GPU evaluator is instantiated. Verified that `import neat` has no CuPy dependency.
2. **Variable topology via padding:** Genomes are packed into fixed-size `[N, M, M]` weight matrices, where M is the maximum node count across the current population. Unused slots are zeroed. M is determined per generation, growing as NEAT complexifies.
3. **Node index mapping:** Input pins map to indices `0..num_inputs-1`, output nodes to the next block, and hidden nodes are packed after that. The mapping is computed per genome during CPU packing.
4. **Exponential Euler for CTRNN:** The same method as the updated CPU code. Precomputes `decay = exp(-dt/tau)` and `scale = 1 - decay` per node once per generation. Input nodes are clamped (not integrated) — decay/scale are set to 1/0 for input slots.
5. **Half-step method for Izhikevich:** Uses the same numerical method as the CPU code for spike-exact equivalence. Input nodes route external values through the weight matrix via a source vector.
6. **Response parameter applied after matmul:** `z = activation(bias + response * (W @ u))`, not absorbed into the weights. Per-node response variation is preserved exactly.
7. **Custom CUDA activation kernel:** Dispatches 11 activation functions (sigmoid, tanh, relu, identity, clamped, elu, softplus, sin, gauss, abs, square), matching neat-python's formulas including clipping constants.
8. **Sum aggregation only:** Required for the batched matrix-vector multiply. Non-sum aggregation raises `ValueError` at pack time with a clear error message.
9. **Fitness function on CPU:** The user provides `fitness_fn(trajectory)`, called per genome after GPU simulation. The output trajectory shape is `[num_steps, num_outputs]`.
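Decisions 2, 4, and 6 combine into one batched update per step. The NumPy sketch below illustrates the shape of that computation (CuPy's API is NumPy-compatible, so the real kernel looks similar); the function name and the `tanh` stand-in for the activation kernel are illustrative, not the actual backend API. For simplicity it clamps input slots after the update, which is equivalent to the decay/scale = 1/0 trick for input slots:

```python
import numpy as np

def ctrnn_batch_step(u, W, bias, response, decay, inputs, num_inputs):
    """One exponential-Euler step for a padded population.
    u: [N, M] states, W: [N, M, M] weights, bias/response/decay: [N, M]."""
    # response applied after the matmul, not absorbed into the weights
    pre = bias + response * np.einsum("nij,nj->ni", W, u)
    z = np.tanh(pre)                    # stand-in for the activation kernel
    u = decay * u + (1.0 - decay) * z   # exponential Euler update
    u[:, :num_inputs] = inputs          # input nodes clamped, not integrated
    return u

# Two tiny padded "genomes": genome 0 wires input 0 -> node 1;
# genome 1 has no connections, so its padded slots never move.
N, M, k = 2, 4, 1
u = np.zeros((N, M))
W = np.zeros((N, M, M))
W[0, 1, 0] = 1.0
bias, response = np.zeros((N, M)), np.ones((N, M))
decay = np.full((N, M), np.exp(-0.05))  # dt=0.05, tau=1.0 for every node
x = np.ones((N, k))
for _ in range(3):
    u = ctrnn_batch_step(u, W, bias, response, decay, x, k)
```

Note that inactive padding slots stay at zero with no masking: zero weights and zero bias give `pre = 0`, `tanh(0) = 0`, so the update maps zero to zero — exactly the "zero by construction" argument below.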
### Alternatives considered

- **PyTorch/JAX backend:** Rejected per plan scope — CuPy is the simplest path for array operations without framework overhead, and its NumPy-compatible API keeps the code readable.
- **Per-step mask multiply for inactive nodes:** Rejected — inactive padding nodes stay zero by construction (zero weights, zero bias, zero initial state), avoiding an extra elementwise operation in the inner loop.
- **Bucketing genomes by topology size:** Rejected as premature optimization for an unlikely scenario (extreme topology variance within a population).
### Test coverage

20 tests total:

- 11 CPU-only packing tests (run without CuPy): shapes, values, masks, disabled connections, unsupported activation/aggregation errors, no-connection genomes, hidden node padding
- 4 CTRNN GPU equivalence tests: numerical agreement with CPU, batch vs individual consistency, response parameter effect, steady-state convergence
- 3 Izhikevich GPU equivalence tests: spike count agreement, batch vs individual, zero-input behavior
- 2 evaluator integration tests: fitness assignment for both network types

All GPU tests skip gracefully when CuPy is not available. Full test suite: 524 passed, 12 skipped (including the 9 GPU tests, since this machine lacks CuPy).
### Files created

- `neat/gpu/__init__.py` — lazy imports, capability detection
- `neat/gpu/_padding.py` — `pack_ctrnn_population()`, `pack_iznn_population()`
- `neat/gpu/_cupy_backend.py` — CUDA activation kernel, `evaluate_ctrnn_batch()`, `evaluate_iznn_batch()`
- `neat/gpu/evaluator.py` — `GPUCTRNNEvaluator`, `GPUIZNNEvaluator`
- `tests/test_gpu.py` — 20 tests
- `tests/test_configuration_gpu_ctrnn` — config for CTRNN GPU tests
- `benchmarks/gpu_benchmark.py` — CPU vs GPU wall-time comparison at population sizes 100/500/1000
- `GPU_DESIGN_NOTES.md` — design document covering tensor shapes, index mapping, integration methods, edge cases
### Files modified

- `pyproject.toml` — added the `gpu = ["cupy-cuda12x>=12.0"]` optional dependency, registered the `neat.gpu` package, and included `gpu` in the `all` extras

### Next steps

- Run the benchmarks on a machine with CuPy and a GPU to get actual speedup numbers
- Run the GPU equivalence tests to verify numerical agreement
- Consider adding GPU support for more activation functions (lelu, selu, inv, log, exp, hat, cube) if users request them
