# Progress Log - 2026-03-15

## Merged PR 290: Aggregation callable validation improvement

**PR:** https://github.com/CodeReclaimers/neat-python/pull/290
**Author:** Mark Zhang (@shuofengzhang)

### Summary

Merged a community PR that tightens `validate_aggregation()` in `neat/aggregations.py`. The old validation checked `co_argcount >= 1`, which incorrectly accepted functions requiring two or more positional arguments; these passed registration but failed at runtime when called as `aggregation(node_inputs)`.

### Changes

- **`neat/aggregations.py`**: Replaced the `isinstance` type check and `co_argcount` check with `callable()` plus `inspect.signature().bind(object())`. This correctly rejects callables that cannot be invoked with exactly one positional argument (two-required-arg functions, keyword-only-arg functions) while still accepting functions with optional extra arguments.
- **`tests/test_aggregation.py`**: Added three new tests: `test_add_builtin_max` (builtins pass), `test_bad_add3` (keyword-only rejected), `test_bad_add4` (two-arg rejected).

### Tweaks before merge

Added two small changes on top of the PR before merging:
1. **Error message clarity**: Changed "A function taking a single positional argument is required" to "A callable with exactly one required positional argument is required". The old wording was misleading, since functions with extra *optional* arguments are also valid.
2. **Comment on builtin fallback**: Added a comment explaining why `BuiltinFunctionType` instances skip signature validation (CPython builtins often lack introspectable signatures).

### Alternatives considered

- Could have asked the contributor to make the tweaks, but the changes were minor enough to apply directly.
- The builtin escape hatch means something like `os.getpid` (a zero-argument builtin) could pass validation. Accepted this as a pragmatic trade-off: the builtins actually used as aggregations in practice (such as `max` and `sum`) are correct, and CPython does not expose their signatures for inspection.

### Files modified

- `neat/aggregations.py` (+20, -8)
- `tests/test_aggregation.py` (+49)

---

## Switch CTRNN integration to exponential Euler (ETD1)

**Commit:** 7d58c15

### Summary

Replaced forward Euler integration in `neat/ctrnn/__init__.py` with exponential Euler (ETD1). The linear decay term `-u/tau` is now integrated exactly, making the method unconditionally stable regardless of the `dt/tau` ratio. Forward Euler required `dt < 2*tau` for stability.

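For reference, the update rule follows from solving the node ODE exactly over one step with the forcing term `z` frozen:

```
tau * du/dt = -u + z
=>  u(t + dt) = exp(-dt/tau) * u(t) + (1 - exp(-dt/tau)) * z
```
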
### Change

Old (forward Euler):
```python
ovalues[node_key] += dt / ne.time_constant * (-ovalues[node_key] + z)
```

New (exponential Euler):
```python
decay = math.exp(-dt / ne.time_constant)
ovalues[node_key] = decay * ovalues[node_key] + (1.0 - decay) * z
```

The nonlinear forcing term `z = activation(bias + response * aggregation(inputs))` is still held constant over each step, the same assumption forward Euler makes. The per-step cost is essentially unchanged: one extra `math.exp` call per node per step, which is negligible.

### Why exponential Euler instead of forward Euler

- Forward Euler is only conditionally stable: if a user sets `dt > 2*tau` (easy to do with small time constants, e.g. `tau=0.01` and `dt=0.05`), the solution blows up. The old code had a `get_max_time_step()` TODO to address this, but it was never implemented.
- Exponential Euler is unconditionally stable for the linear decay part; no step-size restriction is needed.
- It also aligns the CPU and GPU integration methods (the new GPU evaluator uses exponential Euler), which simplifies numerical equivalence testing.
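
The stability claim is easy to check numerically. A standalone sketch (illustrative values, constant forcing `z`, matching the two update formulas above):

```python
import math

tau, dt, z = 0.01, 0.05, 1.0  # dt > 2*tau: forward Euler's stability bound is violated
u_fwd = u_exp = 0.0
decay = math.exp(-dt / tau)

for _ in range(50):
    u_fwd += dt / tau * (-u_fwd + z)            # forward Euler: oscillates and diverges
    u_exp = decay * u_exp + (1.0 - decay) * z   # exponential Euler: converges to z
```

With `dt/tau = 5`, each forward Euler step multiplies the error by `|1 - dt/tau| = 4`, so `u_fwd` diverges, while `u_exp` settles at the fixed point `z` regardless of step size.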

### Why this is safe

A literature search found no published work that relies on neat-python's CTRNN or Izhikevich implementations. The reference values in `test_ctrnn.py` were the only hardcoded numerical expectations, and those have been regenerated.

### Files modified

- `neat/ctrnn/__init__.py` (+3, -1) — integration formula change, added `import math`
- `tests/test_ctrnn.py` (+7, -4) — updated reference values and added a comment explaining the integration method

---

## Add optional GPU-accelerated CTRNN and Izhikevich evaluation

**Commit:** aaca8af

### Summary

Added a new `neat/gpu/` package providing GPU-accelerated batch evaluation for CTRNN and Izhikevich spiking networks using CuPy. The module is entirely optional; `import neat` never loads CuPy. Users install it via `pip install 'neat-python[gpu]'`.

### Motivation

TensorNEAT (Wang et al., GECCO 2024) demonstrated up to 500x speedups by rewriting NEAT in JAX. Rather than rewriting neat-python, we add a GPU evaluation layer that integrates with the existing library. CTRNN and spiking networks are the most computationally expensive network types and the most likely to benefit from GPU acceleration.

### Architecture

```
neat/gpu/
  __init__.py       # Lazy imports, _import_cupy() helper, gpu_available()
  _padding.py       # Genome-to-tensor conversion (pure Python/NumPy)
  _cupy_backend.py  # CuPy kernels and batch evaluation functions
  evaluator.py      # Public API: GPUCTRNNEvaluator, GPUIZNNEvaluator
```

### Key design decisions

1. **Lazy CuPy imports:** `_import_cupy()` is called only when a GPU evaluator is instantiated. Verified that `import neat` has no CuPy dependency.

2. **Variable topology via padding:** Genomes are packed into fixed-size `[N, M, M]` weight matrices, where M is the maximum node count across the current population. Unused slots are zeroed. M is determined per generation and grows as NEAT complexifies.

3. **Node index mapping:** Input pins -> indices 0..num_inputs-1, output nodes -> the next block, hidden nodes packed after that. The mapping is computed per genome during CPU packing.

4. **Exponential Euler for CTRNN:** Same method as the updated CPU code. Precomputes `decay = exp(-dt/tau)` and `scale = 1 - decay` per node once per generation. Input nodes are clamped (not integrated): decay/scale are set to 1/0 for input slots.

5. **Half-step method for Izhikevich:** Uses the same numerical method as the CPU code for spike-exact equivalence. Input nodes route external values through the weight matrix via a source vector.

6. **Response parameter applied after matmul:** `z = activation(bias + response * (W @ u))`, not absorbed into the weights, so per-node response variation is preserved exactly.

7. **Custom CUDA activation kernel:** Dispatches 11 activation functions (sigmoid, tanh, relu, identity, clamped, elu, softplus, sin, gauss, abs, square), matching neat-python's formulas including clipping constants.

8. **Sum aggregation only:** Required for batched matrix-vector multiply. Non-sum aggregation raises `ValueError` at pack time with a clear error message.

9. **Fitness function on CPU:** The user provides `fitness_fn(trajectory)`, called per genome after GPU simulation. The output trajectory shape is `[num_steps, num_outputs]`.

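Decisions 2, 4, and 6 combine into a single batched update per step. A NumPy sketch (illustrative names, not the actual `_cupy_backend.py` code; sigmoid stands in for the dispatched activation kernel, and input slots are clamped directly here rather than via the decay=1/scale=0 trick):

```python
import numpy as np

def ctrnn_batch_step(u, W, bias, response, decay, inputs, num_inputs):
    """One exponential-Euler step for a padded batch of CTRNNs.

    u:       [N, M]    node states (padding slots stay zero)
    W:       [N, M, M] weight matrices (unused slots zeroed)
    bias, response, decay: [N, M] per-node parameters
    inputs:  [N, num_inputs] external input values (clamped, not integrated)
    """
    # Batched matvec: for each genome n, compute W[n] @ u[n].
    wu = np.einsum('nij,nj->ni', W, u)
    # Response applied after the matmul, not absorbed into the weights.
    z = 1.0 / (1.0 + np.exp(-(bias + response * wu)))
    # Exponential Euler update with precomputed per-node decay.
    u = decay * u + (1.0 - decay) * z
    # Input slots are clamped to the external values every step.
    u[:, :num_inputs] = inputs
    return u
```

Padding nodes stay at zero only because their weights, biases, and initial states are all zero, which is exactly the invariant the "no per-step mask multiply" alternative below relies on.
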
### Alternatives considered

- **PyTorch/JAX backend:** Rejected per plan scope; CuPy is the simplest path for array operations without framework overhead, and its NumPy-compatible API keeps the code readable.
- **Per-step mask multiply for inactive nodes:** Rejected; inactive padding nodes stay zero by construction (zero weights, zero bias, zero initial state), avoiding an extra elementwise operation in the inner loop.
- **Bucketing genomes by topology size:** Rejected as premature optimization for an unlikely scenario (extreme topology variance within a population).

### Test coverage

20 tests total:
- 11 CPU-only packing tests (run without CuPy): shapes, values, masks, disabled connections, unsupported activation/aggregation errors, no-connection genomes, hidden node padding
- 4 CTRNN GPU equivalence tests: numerical agreement with CPU, batch vs individual consistency, response parameter effect, steady-state convergence
- 3 Izhikevich GPU equivalence tests: spike count agreement, batch vs individual, zero-input behavior
- 2 evaluator integration tests: fitness assignment for both network types

All GPU tests skip gracefully when CuPy is not available. Full test suite: 524 passed, 12 skipped (including the 9 GPU tests, since this machine has no CuPy).

### Files created

- `neat/gpu/__init__.py` — lazy imports, capability detection
- `neat/gpu/_padding.py` — `pack_ctrnn_population()`, `pack_iznn_population()`
- `neat/gpu/_cupy_backend.py` — CUDA activation kernel, `evaluate_ctrnn_batch()`, `evaluate_iznn_batch()`
- `neat/gpu/evaluator.py` — `GPUCTRNNEvaluator`, `GPUIZNNEvaluator`
- `tests/test_gpu.py` — 20 tests
- `tests/test_configuration_gpu_ctrnn` — config for CTRNN GPU tests
- `benchmarks/gpu_benchmark.py` — CPU vs GPU wall-time comparison at population sizes 100/500/1000
- `GPU_DESIGN_NOTES.md` — design document covering tensor shapes, index mapping, integration methods, edge cases

### Files modified

- `pyproject.toml` — added the `gpu = ["cupy-cuda12x>=12.0"]` optional dependency, registered the `neat.gpu` package, and included `gpu` in the `all` extras

### Next steps

- Run the benchmarks on a machine with CuPy and a GPU to get actual speedup numbers
- Run the GPU equivalence tests to verify numerical agreement
- Consider adding GPU support for more activation functions (lelu, selu, inv, log, exp, hat, cube) if users request them