Commit 0b916b9

### Breaking
- **Trainer default optimizer policy changed**: when `optimizer` is not explicitly provided, `OdyssNetTrainer` now always constructs **ChaosGrad**.
- `lr=None` now means "use ChaosGrad with default genesis lr (`1e-4`)".
- Passing an explicit `lr` no longer switches optimizer family; it overrides ChaosGrad genesis lr.
- **ChaosGrad default `meta_resolution` changed** from `'scalar'` to `'row'`.
- `'row'` remains compatible with checkpoints via `chaos_global.meta_resolution` persistence.

### Added
- **Row-mode scope expanded**: `per_param_decay` and `per_param_alpha` now support per-row adaptation in addition to `per_param_lr`/`per_param_beta` on ≥2D parameters.

### Changed
- **Dependencies**: removed `prodigyopt` from core dependency lists (`pyproject.toml`, `requirements.txt`) after defaulting fully to ChaosGrad.
- **Documentation/Test alignment**: trainer/optimizer docs and test expectations are now aligned to the ChaosGrad-default policy.

### Fixed
- **Neurogenesis + ChaosGrad migration**: preserved `meta_resolution` during optimizer migration and added robust handling for row-shaped ChaosGrad meta-state tensors.
- **State resilience**: ChaosGrad now safely re-seeds missing essential per-parameter state keys to avoid partial-state step failures after legacy migration paths.

1 parent 0e9dde0 commit 0b916b9

17 files changed

Lines changed: 261 additions & 145 deletions

CHANGELOG.md

Lines changed: 20 additions & 0 deletions
@@ -4,6 +4,26 @@ All notable changes to OdyssNet will be documented in this file.
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 
+## [2.6.0] — 2026-04-16
+
+### Breaking
+- **Trainer default optimizer policy changed**: when `optimizer` is not explicitly provided, `OdyssNetTrainer` now always constructs **ChaosGrad**.
+- `lr=None` now means "use ChaosGrad with default genesis lr (`1e-4`)".
+- Passing an explicit `lr` no longer switches optimizer family; it overrides ChaosGrad genesis lr.
+- **ChaosGrad default `meta_resolution` changed** from `'scalar'` to `'row'`.
+- `'row'` remains compatible with checkpoints via `chaos_global.meta_resolution` persistence.
+
+### Added
+- **Row-mode scope expanded**: `per_param_decay` and `per_param_alpha` now support per-row adaptation in addition to `per_param_lr`/`per_param_beta` on ≥2D parameters.
+
+### Changed
+- **Dependencies**: removed `prodigyopt` from core dependency lists (`pyproject.toml`, `requirements.txt`) after defaulting fully to ChaosGrad.
+- **Documentation/Test alignment**: trainer/optimizer docs and test expectations are now aligned to the ChaosGrad-default policy.
+
+### Fixed
+- **Neurogenesis + ChaosGrad migration**: preserved `meta_resolution` during optimizer migration and added robust handling for row-shaped ChaosGrad meta-state tensors.
+- **State resilience**: ChaosGrad now safely re-seeds missing essential per-parameter state keys to avoid partial-state step failures after legacy migration paths.
+
 ## [2.5.0] — 2026-04-14
 
 ### Added
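
To make the Breaking entries concrete, here is a minimal sketch of the new trainer-facing behaviour. It only uses names that appear in this commit's docs (`OdyssNetTrainer`, `ChaosGrad`, `classify_params`); the `_default_chaosgrad` helper at the end is purely illustrative and is not the trainer's internal code.

```python
import torch
from odyssnet import OdyssNet, OdyssNetTrainer, ChaosGrad

model = OdyssNet(...)  # constructor args elided, as in the repo's own examples

# lr=None (the default): the trainer builds ChaosGrad with its default genesis lr (1e-4)
trainer = OdyssNetTrainer(model)

# Explicit lr: still ChaosGrad — only the genesis lr is overridden
trainer = OdyssNetTrainer(model, lr=3e-4)

# Explicit optimizer: the ChaosGrad default path is bypassed entirely
trainer = OdyssNetTrainer(model, optimizer=torch.optim.AdamW(model.parameters(), lr=1e-4))

# Roughly what the default path amounts to (illustrative only):
def _default_chaosgrad(model, lr=None):
    return ChaosGrad(ChaosGrad.classify_params(model), lr=1e-4 if lr is None else lr)
```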

CITATION.cff

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ authors:
 given-names: "Cahit"
 email: "cksoftwaresystems@gmail.com"
 title: "OdyssNet: The Trainable Dynamic System & Zero-Hidden Architecture"
-version: 2.1.0
+version: 2.6.0
 date-released: 2025-12-12
 url: "https://github.com/theomgdev/OdyssNet"
 abstract: "OdyssNet is a chaotic, fully connected neural network architecture that proves temporal depth (thinking steps) can replace spatial depth (hidden layers). It solves non-linear problems like MNIST with Zero Hidden Layers by utilizing Trainable Chaos."

CONTRIBUTING.md

Lines changed: 14 additions & 14 deletions
@@ -195,27 +195,27 @@ model = OdyssNet(
 # Quick experiment — global plasticity
 model = OdyssNet(..., hebb_type='global')
 
-# Default: Prodigy optimizer — auto-calibrates LR, no tuning needed
+# Default: ChaosGrad optimizer (row-mode by default)
 trainer = OdyssNetTrainer(model)
 
-# AdamW: pass an explicit learning rate
+# Override ChaosGrad genesis learning rate
 trainer = OdyssNetTrainer(model, lr=3e-4)
 
-# ChaosGrad: optional zero-hyperparameter optimizer (pass as custom optimizer)
-from odyssnet import ChaosGrad
-opt = ChaosGrad(ChaosGrad.classify_params(model), lr=1e-4)
-trainer = OdyssNetTrainer(model, optimizer=opt)
+# Or pass your own optimizer instance explicitly
+import torch
+trainer = OdyssNetTrainer(model, optimizer=torch.optim.AdamW(model.parameters(), lr=1e-4))
 
-# ChaosGrad with per-neuron tempo (opt-in, O(N) VRAM per matrix)
+# ChaosGrad with explicit per-neuron tempo selection (row is default)
 opt = ChaosGrad(ChaosGrad.classify_params(model), lr=1e-4,
                 meta_resolution='row')
 ```
 
 > **Optimizer selection guide:**
-> - **Prodigy** (`lr=None`, default) — best for quick experiments; non-deterministic curves.
-> - **AdamW** (explicit `lr`) — reproducible runs, benchmarks, production.
-> - **ChaosGrad** (pass as `optimizer=`) — research into self-tuning dynamics; ideal when `hebb_type` is enabled (Hebbian parameters are unconditionally protected from weight decay and burst noise).
-> - `meta_resolution='row'` gives each output neuron its own `per_param_lr` / `per_param_beta` on ≥2D parameters — useful for deep/wide chaos cores where rows diverge in role. Scalar mode (default) stays back-compatible; state dicts round-trip across both.
+> - **ChaosGrad** (default when `optimizer` is not supplied) — autonomous adaptation with built-in Hebbian-safe handling.
+> - `lr=None` uses ChaosGrad's default genesis lr (`1e-4`).
+> - Passing explicit `lr` changes ChaosGrad genesis lr; it does not switch optimizer type.
+> - `meta_resolution='row'` is the default and gives each output neuron its own `per_param_lr` / `per_param_beta` / `per_param_decay` / `per_param_alpha` on ≥2D parameters.
+> - **Custom optimizer** — pass `optimizer=...` when you explicitly want AdamW/SGD/etc.
 
 ---
 
@@ -305,7 +305,7 @@ Use the `prepare_input` utility implicitly via the Trainer.
 * Data shuffling / batch sampling.
 * Dropout and stochastic regularization.
 * CUDA random state (for GPU consistency).
-* **Test:** If you run the script twice with the same seed, loss curves and final results should be **identical**, byte-for-byte. Note: this requires passing an explicit `lr` to `OdyssNetTrainer` — the default `lr=None` (Prodigy) adapts its learning rate online and will produce different curves across runs.
+* **Test:** If you run the script twice with the same seed, loss curves and final results should be **identical**, byte-for-byte. Prefer passing an explicit `lr` to pin ChaosGrad genesis lr for reproducible experiments.
 
 2. **Visuals:** Your example should print a cool visualization. Don't just print "Loss: 0.01". Print the timeline.
 * *Example:* `t=05 | Input: 1 | Output: 0.99 🟢`
@@ -458,7 +458,7 @@ If loss oscillates or training is unstable:
 trainer = OdyssNetTrainer(model, gradient_persistence=0.1)
 ```
 
-2. **Use AdamW with a lower explicit learning rate** (bypasses Prodigy):
+2. **Tune ChaosGrad genesis `lr` downward** for a calmer update scale:
 ```python
 trainer = OdyssNetTrainer(model, lr=1e-4)
 ```
@@ -552,7 +552,7 @@ When modifying the library itself (not examples), follow these additional rules:
 
 ### New/modified example scripts (`examples/`)
 1. [ ] **Does your script call `set_seed(42)` at the START of `main()`?** (MANDATORY for reproducibility)
-2. [ ] **Does `OdyssNetTrainer` receive an explicit `lr`?** (e.g. `lr=1e-4`). The default `lr=None` activates Prodigy, which adapts LR online and breaks byte-for-byte reproducibility. Examples must pin a float lr.
+2. [ ] **Does `OdyssNetTrainer` receive an explicit `lr` when reproducibility matters?** (e.g. `lr=1e-4`). This pins ChaosGrad genesis lr across runs.
 3. [ ] Did you place it in the correct folder (`examples/` for core validations, `examples/advanced/` for complex tasks)?
 4. [ ] Are you using `OdyssNetTrainer`?
 5. [ ] Did you select the correct `activation`, `weight_init`, and `gate` setup? (Default `resonant` + `gate=None` is fine for most tasks.)
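
A compact sketch of how the checklist above looks at the top of an example script. The import location of `set_seed` is an assumption (the guide only requires that it be called first), and the `OdyssNet(...)` constructor call is elided exactly as in the guide's own snippets.

```python
from odyssnet import OdyssNet, OdyssNetTrainer
from odyssnet.utils import set_seed  # assumed import path for the seeding helper

def main():
    set_seed(42)                               # mandatory: seed before anything else
    model = OdyssNet(..., hebb_type='global')  # constructor args elided
    trainer = OdyssNetTrainer(model, lr=1e-4)  # explicit lr pins ChaosGrad genesis lr across runs
    ...

if __name__ == "__main__":
    main()
```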

README.md

Lines changed: 1 addition & 1 deletion
@@ -158,7 +158,7 @@ By "thinking" for 15 steps, OdyssNet simulates a 15-layer deep network using **o
 Uncontrolled feedback loops lead to explosion. OdyssNet engineers the chaos to form stable **Attractors**.
 * **StepNorm** acts as gravity, keeping energy bounded.
 * **Tanh** filters meaningful signals while maintaining signal symmetry.
-* **Prodigy Optimizer (default)**: Auto-calibrates the learning rate continuously — no manual tuning required. Pass an explicit `lr` to use AdamW instead.
+* **ChaosGrad Optimizer (default)**: OdyssNetTrainer now uses ChaosGrad by default with autonomous per-parameter adaptation. Passing `lr` overrides ChaosGrad's genesis learning rate; omitting it uses the built-in default (`1e-4`).
 * **Heterogeneous Synaptic Plasticity**: When `hebb_type` is set, temporal correlations $h_t \otimes h_{t-1}$ are accumulated each step and injected as $W_\text{eff} = W + (f_h \odot C_t)$ — where `hebb_factor` can be a global scalar, a per-neuron vector, or a full per-synapse matrix. All variants are learnable, letting the network discover how plastic each pathway should be.
 * **The Latch Experiment** proved OdyssNet can create a stable attractor to hold a decision forever against noise.
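
The plasticity bullet compresses a lot into one formula; the toy loop below spells out the accumulation and injection it describes. Variable names, the decay constant, and the absence of StepNorm are simplifications local to this sketch — it is not OdyssNet's internal forward pass.

```python
import torch

N = 8
W = torch.randn(N, N) * 0.1   # recurrent weights
f_h = torch.zeros(N, N)       # hebb_factor: scalar, per-neuron vector, or per-synapse matrix
C = torch.zeros(N, N)         # accumulated temporal correlations C_t
h_prev = torch.randn(N)
decay = 0.9                   # assumed accumulation decay; the exact rule is not shown here

for t in range(15):                           # "thinking" steps
    W_eff = W + f_h * C                       # W_eff = W + (f_h ⊙ C_t)
    h = torch.tanh(W_eff @ h_prev)
    C = decay * C + torch.outer(h, h_prev)    # accumulate h_t ⊗ h_{t-1}
    h_prev = h
```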

README_TR.md

Lines changed: 1 addition & 1 deletion
@@ -158,7 +158,7 @@ The signal travels from every neuron to every other neuron ($N \times N$).
 Uncontrolled feedback loops lead to explosion. OdyssNet engineers the chaos to form stable **Attractors**.
 * **StepNorm** acts like gravity, keeping energy bounded.
 * **Tanh** filters meaningful signals and preserves signal symmetry.
-* **Prodigy Optimizer (default):** Continuously auto-calibrates the learning rate — no manual tuning needed. When an explicit `lr` value is passed, AdamW is used.
+* **ChaosGrad Optimizer (default):** OdyssNetTrainer now uses ChaosGrad by default and adapts parameters autonomously. If `lr` is given, ChaosGrad's genesis learning rate is overridden; otherwise the built-in default (`1e-4`) is used.
 * **Heterogeneous Synaptic Plasticity:** When `hebb_type` is set, temporal correlations $h_t \otimes h_{t-1}$ are accumulated each step and injected as $W_\text{eff} = W + (f_h \odot C_t)$ — `hebb_factor` can be a global scalar, a per-neuron vector, or a full per-synapse matrix. Since all variants are learnable, the network discovers how plastic each synaptic pathway should be.
 * **The Latch Experiment** proved that OdyssNet can form a stable attractor to hold a decision forever against noise.

docs/LIBRARY.md

Lines changed: 26 additions & 26 deletions
@@ -6,7 +6,7 @@ OdyssNet is a PyTorch-based library that implements **Zero-Hidden Layer** neural
 
 The library is organized into three primary modules:
 1. **`odyssnet.core.network`**: The recurrent core architecture and update dynamics.
-2. **`odyssnet.training.trainer`**: Optimization engine with AdamW and bio-inspired regularization.
+2. **`odyssnet.training.trainer`**: Optimization engine with ChaosGrad as the default optimizer and bio-inspired regularization.
 3. **`odyssnet.utils`**: Data utilities, model persistence (`odyssstore`), and dynamic expansion (`neurogenesis`).
 
 ---
@@ -195,17 +195,17 @@ Runs the dynamic system.
 
 ## OdyssNet Trainer (`odyssnet.training.trainer`)
 
-The `OdyssNetTrainer` handles the training loop, gradient accumulation, mixed precision (AMP), and experimental features like Ghost Gradients. **Prodigy** is the default optimizer (auto-calibrating, no LR tuning required). Pass an explicit `lr` to use AdamW instead, or supply any custom optimizer — including **ChaosGrad**.
+The `OdyssNetTrainer` handles the training loop, gradient accumulation, mixed precision (AMP), and experimental features like Ghost Gradients. **ChaosGrad** is the default optimizer whenever `optimizer` is not explicitly provided.
 
 ### Initialization
 
 ```python
 from odyssnet import OdyssNetTrainer
 
-# Quick prototyping: Prodigy — auto-calibrates LR, no tuning needed
+# Default path: ChaosGrad (row-mode by default)
 trainer = OdyssNetTrainer(model, device='cuda')
 
-# Reproducible experiments and production: pin an explicit lr to use AdamW
+# Override ChaosGrad genesis learning rate
 trainer = OdyssNetTrainer(model, lr=1e-4, device='cuda')
 
 # With optional features
@@ -217,20 +217,20 @@ trainer = OdyssNetTrainer(
     anomaly_hook=my_hook
 )
 
-# Custom optimizer (bypasses both Prodigy and AdamW)
+# Custom optimizer (bypasses trainer's default ChaosGrad path)
 import torch
 trainer = OdyssNetTrainer(model, optimizer=torch.optim.AdamW(model.parameters(), lr=1e-4))
 
-# ChaosGrad — zero-hyperparameter optimizer (optional, see ChaosGrad section below)
+# Explicit ChaosGrad construction (optional when defaults are enough)
 from odyssnet import ChaosGrad
 opt = ChaosGrad(ChaosGrad.classify_params(model), lr=1e-4)
 trainer = OdyssNetTrainer(model, optimizer=opt)
 ```
 
 **Parameters:**
-* `lr` (float or None): Learning rate. Default: `None`.
-  * `None`: **Prodigy** optimizer is used. Auto-calibrates the learning rate continuously — no manual tuning required. Requires `pip install prodigyopt`. Best for quick prototyping; produces non-deterministic loss curves across runs even with a fixed seed.
-  * float (e.g. `1e-4`): **AdamW** optimizer is used with `weight_decay=0.01`. Recommended for reproducible experiments, benchmarking, and production runs.
+* `lr` (float or None): ChaosGrad genesis learning rate override. Default: `None`.
+  * `None`: uses ChaosGrad default genesis lr (`1e-4`).
+  * float (e.g. `1e-4`): passed directly to ChaosGrad as genesis lr.
 * `gradient_persistence` (float): **Ghost Gradients / Persistence**.
   * `0.0`: Standard behavior (`zero_grad()` after every step).
   * `> 0.0` (e.g., `0.1`): Keeps a percentage of the previous step's gradient. This creates a "momentum" over time, effectively simulating a larger batch size or longer temporal context. Useful for difficult convergence landscapes.
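
For intuition, the `gradient_persistence` options above can be pictured as the following post-step treatment of gradients — a conceptual sketch under stated assumptions, not `OdyssNetTrainer`'s actual implementation.

```python
import torch

def apply_gradient_persistence(model: torch.nn.Module, persistence: float) -> None:
    """Run after optimizer.step(): keep `persistence` of each gradient for the next step."""
    for p in model.parameters():
        if p.grad is None:
            continue
        if persistence == 0.0:
            p.grad.zero_()            # standard behaviour, equivalent to zero_grad() each step
        else:
            p.grad.mul_(persistence)  # e.g. 0.1 -> 10% of last step's gradient carries over
```
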
@@ -491,16 +491,16 @@ model = OdyssNet(num_neurons=10, input_ids=range(10), output_ids=range(10), voca
 
 ## ChaosGrad Optimizer (`odyssnet.training.chaos_optimizer`)
 
-ChaosGrad is a **fully optional**, zero-hyperparameter optimizer designed specifically for OdyssNet. Pass it as a custom optimizer to bypass the default Prodigy / AdamW selection.
+ChaosGrad is a zero-hyperparameter optimizer designed specifically for OdyssNet and is the trainer's default optimizer path.
 
-The trainer's default behavior is **unchanged** — Prodigy when `lr=None`, AdamW when `lr=float`.
+The trainer constructs ChaosGrad whenever `optimizer` is not supplied (`lr` only overrides ChaosGrad's genesis lr).
 
 ### When to use ChaosGrad
 
 | Situation | Recommendation |
 |-----------|----------------|
-| Quick prototyping, first run | Prodigy (default) |
-| Reproducible benchmarks | AdamW with explicit `lr` |
+| Quick prototyping, first run | **ChaosGrad** (default) |
+| Reproducible benchmarks | ChaosGrad with fixed seed + explicit genesis `lr` |
 | Research into self-tuning dynamics, OdyssNet-specific regularisation | **ChaosGrad** |
 | Hebbian plasticity enabled (`hebb_type != None`) | ChaosGrad handles hebb params specially |
 
@@ -533,27 +533,27 @@ You can also pass plain `model.parameters()` without classification — every pa
 
 ### Meta Resolution — per-tensor vs. per-neuron adaptation
 
-ChaosGrad's autonomously-adapted meta-parameters (`per_param_lr`, `per_param_beta`) come in two resolutions, selected at construction time:
+ChaosGrad's autonomously-adapted meta-parameters (`per_param_lr`, `per_param_beta`, `per_param_decay`, `per_param_alpha`) come in two resolutions, selected at construction time:
 
 ```python
-# Default — one lr / beta value per tensor (lightest, fully back-compatible)
-opt = ChaosGrad(ChaosGrad.classify_params(model), lr=1e-4, meta_resolution='scalar')
-
-# Opt-in — (rows,) tensors for ≥2D parameters: one lr / beta per output neuron
+# Default — (rows,) tensors for ≥2D parameters: one value per output neuron
 opt = ChaosGrad(ChaosGrad.classify_params(model), lr=1e-4, meta_resolution='row')
+
+# Opt-in compatibility mode — one meta-param value per tensor
+opt = ChaosGrad(ChaosGrad.classify_params(model), lr=1e-4, meta_resolution='scalar')
 ```
 
-| Meta-param | `'scalar'` (default) | `'row'` |
+| Meta-param | `'scalar'` | `'row'` (default) |
 |---|---|---|
 | `per_param_lr` | single float | `(rows,)` tensor on ≥2D params, scalar on 1D params |
 | `per_param_beta` | single float | `(rows,)` tensor on ≥2D params, scalar on 1D params |
-| `per_param_decay` | single float | single float (whole-tensor shrinkage gate) |
-| `per_param_alpha` | single float | single float (whole-tensor centralization gate) |
+| `per_param_decay` | single float | `(rows,)` tensor on ≥2D params, scalar on 1D params |
+| `per_param_alpha` | single float | `(rows,)` tensor on ≥2D params, scalar on 1D params |
 | `v2` (second moment) | elementwise tensor | elementwise tensor |
-| Extra VRAM per matrix | 0 | `2N` floats (negligible next to the N² weights) |
-| Hypergradient signals | scalar `cos(·)` | per-row `cos(·)` for `sig_lr`/`sig_mom`; scalar for `sig_wd` |
+| Extra VRAM per matrix | 0 | `4N` floats (still negligible next to the N² weights) |
+| Hypergradient signals | scalar `cos(·)` | per-row `cos(·)` for `sig_lr`/`sig_mom`/`sig_wd` |
 
-**When to use `'row'`:** deep or wide recurrent cores where different output neurons are learning at different speeds — a single scalar lr averages over all of them. Each row of `W` belongs to one output neuron biologically; giving each neuron its own lr/beta matches that cell-level granularity.
+**When to use `'row'`:** deep or wide recurrent cores where different output neurons learn at different speeds and have different regularization needs. Each row of `W` belongs to one output neuron biologically; giving each neuron its own lr/beta/decay/alpha matches that cell-level granularity.
 
 **Checkpoint compatibility:** `meta_resolution` is stored in `state_dict`'s `chaos_global` block. A row-mode checkpoint replays as row-mode even if the reconstructing code omitted the keyword. Loading a scalar-mode checkpoint into a freshly-constructed row-mode optimizer works too — `load_state_dict` honours the saved flag.
 
@@ -625,7 +625,7 @@ epoch, loss = load_checkpoint(model, trainer, path="run.pt", lr=5e-4)
 | **Gradient checkpointing** | Recomputes activations during backward. Gradient values reaching ChaosGrad are identical whether or not checkpointing is active. | Fully compatible. |
 | **AMP (mixed precision)** | ChaosGrad receives gradients after `scaler.unscale_()` in float32 scale. ChaosGrad internally casts gradients to float32 (`g_f = grad.float()`). | Fully compatible. |
 | **`regenerate_synapses()`** | When weak entries of `W` are re-initialised, the trainer automatically clears ChaosGrad's per-parameter state for `W`. Cold-start recalibration happens on the next step, re-computing `init_lr` from the new gradient scale. | If `revived == 0` (no weights regenerated), state is preserved. |
-| **`transplant_weights()`** | Weight transplantation does *not* transfer optimizer state (by design — cold restart after transplant). ChaosGrad cold-starts on all parameters after loading transplanted weights. | Same behaviour as AdamW / Prodigy after transplant. |
-| **Neurogenesis (`trainer.expand()`)** | Per-parameter tensors (`momentum`, `grad_ema`) are zero-padded to the new size. Scalar state (`init_lr`, `per_param_lr`, etc.) is copied unchanged. New neurons start from cold-start calibration. Global frustration is preserved. | Fully compatible. |
+| **`transplant_weights()`** | Weight transplantation does *not* transfer optimizer state (by design — cold restart after transplant). ChaosGrad cold-starts on all parameters after loading transplanted weights. | Matches standard cold-restart behaviour in alternative optimizers. |
+| **Neurogenesis (`trainer.expand()`)** | Per-parameter tensors (`momentum`, `grad_ema`, and row-mode meta tensors) are expanded/padded to the new size. Scalars are copied unchanged. New neurons start from cold-start calibration. Global frustration is preserved. | Fully compatible. |
 | **`classify_params` (skipped)** | If you pass `model.parameters()` directly instead of `classify_params(model)`, all parameters — including Hebbian logits — are treated as `lightweight`. The Hebbian bypass rule (no decay, no burst) does NOT apply. Always use `classify_params` on models with `hebb_type != None`. | Documented limitation; no crash. |
 | **Anomaly hook** | ChaosGrad has its own internal plateau escape (frustration burst). The trainer's anomaly hook fires independently based on loss statistics. The two mechanisms don't interfere. | Use both together if needed. |
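
A short sketch of the checkpoint round-trip promised in the Meta Resolution section above: because `meta_resolution` travels inside the saved `chaos_global` block, reloading restores row-mode even when the keyword is omitted at reconstruction time. Standard `torch.optim` state-dict mechanics are assumed; constructor args for `OdyssNet` are elided.

```python
import torch
from odyssnet import OdyssNet, ChaosGrad

model = OdyssNet(...)  # constructor args elided

opt = ChaosGrad(ChaosGrad.classify_params(model), lr=1e-4)        # 'row' is now the default
torch.save(opt.state_dict(), "chaosgrad_state.pt")

# Later, possibly in code that never mentions meta_resolution:
opt_reloaded = ChaosGrad(ChaosGrad.classify_params(model), lr=1e-4)
opt_reloaded.load_state_dict(torch.load("chaosgrad_state.pt"))    # honours the saved flag
```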

odyssnet/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-__version__ = "2.5.0"
+__version__ = "2.6.0"
 
 from .core.network import OdyssNet
 from .training.trainer import OdyssNetTrainer
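
A quick post-upgrade check — nothing assumed beyond the `__version__` attribute shown in the diff above:

```python
import odyssnet

assert odyssnet.__version__ == "2.6.0", odyssnet.__version__
```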
