# Progress Log - 2026-03-15

## Merged PR 290: Aggregation callable validation improvement

**PR:** https://github.com/CodeReclaimers/neat-python/pull/290
**Author:** Mark Zhang (@shuofengzhang)

### Summary

Merged a community PR that tightens `validate_aggregation()` in `neat/aggregations.py`. The old validation checked `co_argcount >= 1`, which incorrectly accepted functions requiring two or more positional arguments; these passed registration but failed at runtime when called as `aggregation(node_inputs)`.

### Changes

- **`neat/aggregations.py`**: Replaced the `isinstance` type check and `co_argcount` check with `callable()` plus `inspect.signature().bind(object())`. This correctly rejects callables that cannot be invoked with exactly one positional argument (two-required-arg functions, keyword-only-arg functions) while still accepting functions with optional extra arguments.
- **`tests/test_aggregation.py`**: Added three new tests: `test_add_builtin_max` (builtins pass), `test_bad_add3` (keyword-only rejected), `test_bad_add4` (two-arg rejected).

### Tweaks before merge

Added two small changes on top of the PR before merging:
1. **Error message clarity**: Changed "A function taking a single positional argument is required" to "A callable with exactly one required positional argument is required". The old wording was misleading, since functions with extra *optional* arguments are also valid.
2. **Comment on builtin fallback**: Added a comment explaining why `BuiltinFunctionType` instances skip signature validation (CPython builtins often lack introspectable signatures).

### Alternatives considered

- Could have asked the contributor to make the tweaks, but the changes were minor enough to apply directly.
- The builtin escape hatch means something like `os.getpid` (a zero-argument builtin) could pass validation. Accepted this as a pragmatic trade-off: the builtins actually used as aggregations in practice (such as `max` and `sum`) are correct, and CPython does not expose their signatures for inspection.

### Files modified

- `neat/aggregations.py` (+20, -8)
- `tests/test_aggregation.py` (+49)

---

## Switch CTRNN integration to exponential Euler (ETD1)

**Commit:** 7d58c15

### Summary

Replaced forward Euler integration in `neat/ctrnn/__init__.py` with exponential Euler (ETD1). The linear decay term `-u/tau` is now integrated exactly, making the method unconditionally stable regardless of the `dt/tau` ratio. Forward Euler required `dt < 2*tau` for stability.

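For reference, the update rule follows from solving the node ODE exactly over one step with the forcing term `z` frozen:

```
tau * du/dt = -u + z
=>  u(t + dt) = exp(-dt/tau) * u(t) + (1 - exp(-dt/tau)) * z
```
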
### Change

Old (forward Euler):
```python
ovalues[node_key] += dt / ne.time_constant * (-ovalues[node_key] + z)
```

New (exponential Euler):
```python
decay = math.exp(-dt / ne.time_constant)
ovalues[node_key] = decay * ovalues[node_key] + (1.0 - decay) * z
```

The nonlinear forcing term `z = activation(bias + response * aggregation(inputs))` is still held constant over each step, the same assumption forward Euler makes. The per-step cost is essentially unchanged: one extra `math.exp` call per node per step, which is negligible.

### Why exponential Euler instead of forward Euler

- Forward Euler is only conditionally stable: if a user sets `dt > 2*tau` (easy to do with small time constants, e.g. `tau=0.01` and `dt=0.05`), the solution blows up. The old code had a `get_max_time_step()` TODO to address this, but it was never implemented.
- Exponential Euler is unconditionally stable for the linear decay part; no step-size restriction is needed.
- It also aligns the CPU and GPU integration methods (the new GPU evaluator uses exponential Euler), which simplifies numerical equivalence testing.
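
The stability claim is easy to check numerically. A standalone sketch (illustrative values, constant forcing `z`, matching the two update formulas above):

```python
import math

tau, dt, z = 0.01, 0.05, 1.0  # dt > 2*tau: forward Euler's stability bound is violated
u_fwd = u_exp = 0.0
decay = math.exp(-dt / tau)

for _ in range(50):
    u_fwd += dt / tau * (-u_fwd + z)            # forward Euler: oscillates and diverges
    u_exp = decay * u_exp + (1.0 - decay) * z   # exponential Euler: converges to z
```

With `dt/tau = 5`, each forward Euler step multiplies the error by `|1 - dt/tau| = 4`, so `u_fwd` diverges, while `u_exp` settles at the fixed point `z` regardless of step size.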

### Why this is safe

A literature search found no published work that relies on neat-python's CTRNN or Izhikevich implementations. The reference values in `test_ctrnn.py` were the only hardcoded numerical expectations, and those have been regenerated.

### Files modified

- `neat/ctrnn/__init__.py` (+3, -1) — integration formula change, added `import math`
- `tests/test_ctrnn.py` (+7, -4) — updated reference values and added a comment explaining the integration method

---

## Add optional GPU-accelerated CTRNN and Izhikevich evaluation

**Commit:** aaca8af

### Summary

Added a new `neat/gpu/` package providing GPU-accelerated batch evaluation for CTRNN and Izhikevich spiking networks using CuPy. The module is entirely optional; `import neat` never loads CuPy. Users install it via `pip install 'neat-python[gpu]'`.

### Motivation

TensorNEAT (Wang et al., GECCO 2024) demonstrated up to 500x speedups by rewriting NEAT in JAX. Rather than rewriting neat-python, we add a GPU evaluation layer that integrates with the existing library. CTRNN and spiking networks are the most computationally expensive network types and the most likely to benefit from GPU acceleration.

### Architecture

```
neat/gpu/
  __init__.py       # Lazy imports, _import_cupy() helper, gpu_available()
  _padding.py       # Genome-to-tensor conversion (pure Python/NumPy)
  _cupy_backend.py  # CuPy kernels and batch evaluation functions
  evaluator.py      # Public API: GPUCTRNNEvaluator, GPUIZNNEvaluator
```

### Key design decisions

1. **Lazy CuPy imports:** `_import_cupy()` is called only when a GPU evaluator is instantiated. Verified that `import neat` has no CuPy dependency.

2. **Variable topology via padding:** Genomes are packed into fixed-size `[N, M, M]` weight matrices, where M is the maximum node count across the current population. Unused slots are zeroed. M is determined per generation and grows as NEAT complexifies.

3. **Node index mapping:** Input pins -> indices 0..num_inputs-1, output nodes -> the next block, hidden nodes packed after that. The mapping is computed per genome during CPU packing.

4. **Exponential Euler for CTRNN:** Same method as the updated CPU code. Precomputes `decay = exp(-dt/tau)` and `scale = 1 - decay` per node once per generation. Input nodes are clamped (not integrated): decay/scale are set to 1/0 for input slots.

5. **Half-step method for Izhikevich:** Uses the same numerical method as the CPU code for spike-exact equivalence. Input nodes route external values through the weight matrix via a source vector.

6. **Response parameter applied after matmul:** `z = activation(bias + response * (W @ u))`, not absorbed into the weights, so per-node response variation is preserved exactly.

7. **Custom CUDA activation kernel:** Dispatches 11 activation functions (sigmoid, tanh, relu, identity, clamped, elu, softplus, sin, gauss, abs, square), matching neat-python's formulas including clipping constants.

8. **Sum aggregation only:** Required for batched matrix-vector multiply. Non-sum aggregation raises `ValueError` at pack time with a clear error message.

9. **Fitness function on CPU:** The user provides `fitness_fn(trajectory)`, called per genome after GPU simulation. The output trajectory shape is `[num_steps, num_outputs]`.

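Decisions 2, 4, and 6 combine into a single batched update per step. A NumPy sketch (illustrative names, not the actual `_cupy_backend.py` code; sigmoid stands in for the dispatched activation kernel, and input slots are clamped directly here rather than via the decay=1/scale=0 trick):

```python
import numpy as np

def ctrnn_batch_step(u, W, bias, response, decay, inputs, num_inputs):
    """One exponential-Euler step for a padded batch of CTRNNs.

    u:       [N, M]    node states (padding slots stay zero)
    W:       [N, M, M] weight matrices (unused slots zeroed)
    bias, response, decay: [N, M] per-node parameters
    inputs:  [N, num_inputs] external input values (clamped, not integrated)
    """
    # Batched matvec: for each genome n, compute W[n] @ u[n].
    wu = np.einsum('nij,nj->ni', W, u)
    # Response applied after the matmul, not absorbed into the weights.
    z = 1.0 / (1.0 + np.exp(-(bias + response * wu)))
    # Exponential Euler update with precomputed per-node decay.
    u = decay * u + (1.0 - decay) * z
    # Input slots are clamped to the external values every step.
    u[:, :num_inputs] = inputs
    return u
```

Padding nodes stay at zero only because their weights, biases, and initial states are all zero, which is exactly the invariant the "no per-step mask multiply" alternative below relies on.
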
### Alternatives considered

- **PyTorch/JAX backend:** Rejected per plan scope; CuPy is the simplest path for array operations without framework overhead, and its NumPy-compatible API keeps the code readable.
- **Per-step mask multiply for inactive nodes:** Rejected; inactive padding nodes stay zero by construction (zero weights, zero bias, zero initial state), avoiding an extra elementwise operation in the inner loop.
- **Bucketing genomes by topology size:** Rejected as premature optimization for an unlikely scenario (extreme topology variance within a population).

### Test coverage

20 tests total:
- 11 CPU-only packing tests (run without CuPy): shapes, values, masks, disabled connections, unsupported activation/aggregation errors, no-connection genomes, hidden node padding
- 4 CTRNN GPU equivalence tests: numerical agreement with CPU, batch vs individual consistency, response parameter effect, steady-state convergence
- 3 Izhikevich GPU equivalence tests: spike count agreement, batch vs individual, zero-input behavior
- 2 evaluator integration tests: fitness assignment for both network types

All GPU tests skip gracefully when CuPy is not available. Full test suite: 524 passed, 12 skipped (including the 9 GPU tests, since this machine has no CuPy).

### Files created

- `neat/gpu/__init__.py` — lazy imports, capability detection
- `neat/gpu/_padding.py` — `pack_ctrnn_population()`, `pack_iznn_population()`
- `neat/gpu/_cupy_backend.py` — CUDA activation kernel, `evaluate_ctrnn_batch()`, `evaluate_iznn_batch()`
- `neat/gpu/evaluator.py` — `GPUCTRNNEvaluator`, `GPUIZNNEvaluator`
- `tests/test_gpu.py` — 20 tests
- `tests/test_configuration_gpu_ctrnn` — config for CTRNN GPU tests
- `benchmarks/gpu_benchmark.py` — CPU vs GPU wall-time comparison at population sizes 100/500/1000
- `GPU_DESIGN_NOTES.md` — design document covering tensor shapes, index mapping, integration methods, edge cases

### Files modified

- `pyproject.toml` — added the `gpu = ["cupy-cuda12x>=12.0"]` optional dependency, registered the `neat.gpu` package, and included `gpu` in the `all` extras

### Next steps

- Run the benchmarks on a machine with CuPy and a GPU to get actual speedup numbers
- Run the GPU equivalence tests to verify numerical agreement
- Consider adding GPU support for more activation functions (lelu, selu, inv, log, exp, hat, cube) if users request them