- **Grad EMA signal reference** — replaces the single-step `prev_grad` with a slow EMA (`α = 0.6`) for more stable hypergradient signals in recurrent regimes.
- **Group-aware frustration bursts** — Hebbian logits (`hebb_factor`, `hebb_decay`) are unconditionally excluded from burst noise. `chaos_core`/`memory`/`projections` receive full bursts; all other groups receive half-scale noise with no meta-reset.
- **9-group parameter classification** (`classify_params`) — `bias`, `norm`, and `scales` are promoted from `lightweight` into dedicated groups with appropriate beta equilibria (0.95 for `chaos_core`/`memory`, 0.85 for `gates`).
- `OdyssNetTrainer.trigger_plateau_escape()` re-introduced (a no-op when a non-ChaosGrad optimizer is active).
- `OdyssNetTrainer.get_diagnostics()` automatically includes an `'optimizer'` key with ChaosGrad diagnostics when ChaosGrad is detected.
- `ChaosGrad` exported from the `odyssnet` public API.
- Neurogenesis (`trainer.expand()`) handles ChaosGrad migration natively: classified param groups are rebuilt for the grown model and the global frustration state is preserved.
- **ChaosGrad** (pass as `optimizer=`) — research into self-tuning dynamics; ideal when `hebb_type` is enabled (Hebbian parameters are unconditionally protected from weight decay and burst noise).
**docs/LIBRARY.md**
## OdyssNet Trainer (`odyssnet.training.trainer`)
The `OdyssNetTrainer` handles the training loop, gradient accumulation, mixed precision (AMP), and experimental features like Ghost Gradients. **Prodigy** is the default optimizer (auto-calibrating, no LR tuning required). Pass an explicit `lr` to use AdamW instead, or supply any custom optimizer — including **ChaosGrad**.
### Initialization
A custom `optimizer` can also be passed at construction time, bypassing both Prodigy and AdamW; a sketch follows.
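This is a minimal illustration only: the import path is inferred from the section heading above, and any constructor arguments beyond `model`, `lr`, and `optimizer` are omitted.

```python
from odyssnet import ChaosGrad
from odyssnet.training.trainer import OdyssNetTrainer

# Default: Prodigy (auto-calibrating, no LR tuning required)
trainer = OdyssNetTrainer(model)

# Explicit lr: AdamW
trainer = OdyssNetTrainer(model, lr=3e-4)

# Custom optimizer (bypasses both Prodigy and AdamW)
custom_opt = ChaosGrad(ChaosGrad.classify_params(model))  # assumes the usual param-group constructor
trainer = OdyssNetTrainer(model, optimizer=custom_opt)
```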
## ChaosGrad

ChaosGrad is a **fully optional**, zero-hyperparameter optimizer designed specifically for OdyssNet. Pass it as a custom optimizer to bypass the default Prodigy / AdamW selection.
The trainer's default behavior is **unchanged** — Prodigy when `lr=None`, AdamW when `lr=float`.
### When to use ChaosGrad
| Situation | Recommendation |
|-----------|----------------|
| Quick prototyping, first run | Prodigy (default) |
| Reproducible benchmarks | AdamW with explicit `lr` |
| Research into self-tuning dynamics, OdyssNet-specific regularisation | **ChaosGrad** |
**Hebbian Bypass Rule:** `hebb_factor` and `hebb_decay` **never** receive weight decay, regardless of any hypergradient signal. Frustration bursts also skip these parameters entirely.
### Public API
| Method | Signature | Description |
|--------|-----------|-------------|
| `classify_params` | `@staticmethod classify_params(model)` | Returns list of classified param-group dicts |
| `step` | `step(closure=None)` | One autonomous optimization step |
| `report_loss` | `report_loss(loss_value)` | Feed loss to the Frustration Accumulator (trainer does this automatically) |
| `trigger_plateau_escape` | `trigger_plateau_escape()` | Force a frustration burst on the next step |
| `get_diagnostics` | `get_diagnostics(debug=False)` | Optimizer health metrics |
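For orientation, a hedged sketch of driving these methods by hand (normally `OdyssNetTrainer` calls `report_loss` for you; the constructor call, the loop scaffolding, and the plateau heuristic below are illustrative, not taken from the library):

```python
from odyssnet import ChaosGrad

def train_manually(model, loader, compute_loss):
    """Drive ChaosGrad without OdyssNetTrainer (illustrative scaffolding only)."""
    optimizer = ChaosGrad(ChaosGrad.classify_params(model))  # zero hyperparameters to set
    for step, batch in enumerate(loader):
        loss = compute_loss(model, batch)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        optimizer.report_loss(loss.item())      # feed the Frustration Accumulator
        if step % 500 == 499:                   # stand-in for your own plateau heuristic
            optimizer.trigger_plateau_escape()  # force a burst on the next step
    return optimizer.get_diagnostics()          # optimizer health metrics
```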
### Frustration Accumulator
ChaosGrad tracks loss stagnation internally. When `frustration > 0.75` (or `trigger_plateau_escape()` is called), it injects noise into the momentum buffers and resets meta-parameters toward their calibrated defaults — providing an automatic escape from plateaus without user intervention.
The `OdyssNetTrainer` automatically calls `report_loss()` after every optimizer step when ChaosGrad is detected.
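To make the mechanism concrete, here is a purely illustrative sketch of what a frustration accumulator of this kind can look like. The class name, the growth rate, and the reset rule are assumptions; only the 0.75 threshold and the improve-or-grow-frustrated idea come from the description above.

```python
class FrustrationAccumulator:
    """Illustrative only; not ChaosGrad's actual internals."""

    def __init__(self, threshold: float = 0.75, gain: float = 0.05):
        self.best_loss = float("inf")
        self.frustration = 0.0
        self.threshold = threshold
        self.gain = gain

    def report_loss(self, loss_value: float) -> bool:
        """Return True when a plateau-escape burst should fire."""
        if loss_value < self.best_loss:
            self.best_loss = loss_value
            self.frustration = 0.0                       # progress relieves frustration
        else:
            self.frustration = min(1.0, self.frustration + self.gain)
        return self.frustration > self.threshold         # burst past the 0.75 threshold
```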
### Neurogenesis Compatibility
`trainer.expand(amount=N)` works transparently with ChaosGrad. The optimizer state (momentum, meta-parameters, second moments) is migrated to the grown network — old neurons preserve their learned adaptation, new neurons start from cold-start calibration. The global frustration state (`_frustration`, `_best_loss`, `_global_step`) is also preserved across the expansion.
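A short usage sketch, assuming the trainer was built with ChaosGrad; the growth amount below is arbitrary:

```python
trainer.expand(amount=128)        # grow the network; ChaosGrad state migrates automatically

diag = trainer.get_diagnostics()  # includes an 'optimizer' key when ChaosGrad is detected
print(diag['optimizer'])          # frustration / best_loss / global_step survive the expansion
```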
### Checkpoint Save / Load
ChaosGrad's global state (`frustration`, `best_loss`, `global_step`) is included in `optimizer.state_dict()` under the key `'chaos_global'` and is restored by `optimizer.load_state_dict()`. This means `save_checkpoint` / `load_checkpoint` round-trips preserve the full optimizer state including frustration dynamics:
```python
from odyssnet import save_checkpoint, load_checkpoint

save_checkpoint(model, trainer, path="run.pt")
epoch, loss = load_checkpoint(model, trainer, path="run.pt")
# trainer.optimizer._frustration is restored
```
If you override the genesis learning rate at load time, ChaosGrad reads it from the param group (not from `defaults`), so the override takes effect on the next step:
```python
epoch, loss = load_checkpoint(model, trainer, path="run.pt", lr=5e-4)
# ChaosGrad now uses genesis_lr=5e-4 for weight decay and update scaling
```
### Interactions with Other Features
| Feature | Interaction | Notes |
|---------|-------------|-------|
| **Synaptic noise** (`synaptic_noise > 0`) | Noise is added to weights *before* the forward pass. ChaosGrad's `sig_wd = cos(g, W)` therefore measures alignment against the *noisy* weight. | Intentional — noisy W is what the gradient was computed against. |
| **Gradient clipping** (applied inside the trainer) | All three hypergradient signals are computed on clipped gradients. `grad_ema` also tracks clipped gradients. | Clipping reduces signal magnitude but doesn't break adaptation. |
| **Gradient persistence** | Persisted gradients from the previous step are injected *before* the ChaosGrad step. `sig_lr` therefore measures consistency of the *combined* (current + persisted) gradient vs `grad_ema`. | No issue; effectively a soft gradient accumulation. |
| **Gradient accumulation** | `report_loss` is called once per optimizer step (not per micro-batch), with the un-normalized loss value. `global_step` tracks optimizer steps. | Correct — frustration reflects true convergence, not accumulation count. |
| **Gradient checkpointing** | Recomputes activations during backward. Gradient values reaching ChaosGrad are identical whether or not checkpointing is active. | Fully compatible. |
| **AMP (mixed precision)** | ChaosGrad receives gradients after `scaler.unscale_()` — in float32 scale. ChaosGrad internally casts gradients to float32 (`g_f = grad.float()`). | Fully compatible. |
| **`regenerate_synapses()`** | When weak entries of `W` are re-initialised, the trainer automatically clears ChaosGrad's per-parameter state for `W`. Cold-start recalibration happens on the next step, re-computing `init_lr` from the new gradient scale. | If `revived == 0` (no weights regenerated), state is preserved. |
| **`transplant_weights()`** | Weight transplantation does *not* transfer optimizer state (by design — cold restart after transplant). ChaosGrad cold-starts on all parameters after loading transplanted weights. | Same behaviour as AdamW / Prodigy after transplant. |
| **Neurogenesis (`trainer.expand()`)** | Per-parameter tensors (`momentum`, `grad_ema`) are zero-padded to the new size. Scalar state (`init_lr`, `per_param_lr`, etc.) is copied unchanged. New neurons start from cold-start calibration. Global frustration is preserved. | Fully compatible. |
| **`classify_params` (skipped)** | If you pass `model.parameters()` directly instead of `classify_params(model)`, all parameters — including Hebbian logits — are treated as `lightweight`. The Hebbian bypass rule (no decay, no burst) does NOT apply. Always use `classify_params` on models with `hebb_type != None`. | Documented limitation; no crash. |
| **Anomaly hook** | ChaosGrad has its own internal plateau escape (frustration burst). The trainer's anomaly hook fires independently based on loss statistics. The two mechanisms don't interfere. | Use both together if needed. |
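To avoid the `classify_params` pitfall noted in the table, construct ChaosGrad from classified groups whenever Hebbian plasticity is enabled. A minimal sketch; the constructor is assumed to accept standard param groups as well as a raw parameter iterator:

```python
from odyssnet import ChaosGrad

# Hebbian bypass applies: `hebb_factor` / `hebb_decay` get no weight decay and no burst noise.
optimizer = ChaosGrad(ChaosGrad.classify_params(model))

# Runs, but every parameter is treated as `lightweight` and the Hebbian bypass
# rule is silently lost; avoid this form when `hebb_type != None`.
optimizer = ChaosGrad(model.parameters())
```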