- **Trainer default optimizer policy changed**: when `optimizer` is not explicitly provided, `OdyssNetTrainer` now always constructs **ChaosGrad**.
- `lr=None` now means "use ChaosGrad with default genesis lr (`1e-4`)".
- Passing an explicit `lr` no longer switches optimizer family; it overrides ChaosGrad genesis lr.
- **ChaosGrad default `meta_resolution` changed** from `'scalar'` to `'row'`.
- `'row'` remains compatible with checkpoints via `chaos_global.meta_resolution` persistence.
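A minimal sketch of the new default policy, assuming the `lr` / `optimizer` keyword arguments documented in `docs/LIBRARY.md`, that `optimizer=` accepts a pre-built torch optimizer instance, and that `OdyssNet` is importable from the package root (constructor kwargs abridged from the library docs):

```python
# Illustrative sketch only, not library code; see assumptions above.
import torch
from odyssnet import OdyssNet, OdyssNetTrainer

model = OdyssNet(num_neurons=10, input_ids=range(10), output_ids=range(10))

# No optimizer and no lr: the trainer builds ChaosGrad with its default genesis lr (1e-4).
trainer = OdyssNetTrainer(model)

# Explicit lr: still ChaosGrad, only the genesis lr is overridden.
trainer = OdyssNetTrainer(model, lr=3e-4)

# Only an explicit optimizer switches family.
trainer = OdyssNetTrainer(model, optimizer=torch.optim.AdamW(model.parameters(), lr=1e-4))
```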
### Added
- **Row-mode scope expanded**: `per_param_decay` and `per_param_alpha` now support per-row adaptation in addition to `per_param_lr`/`per_param_beta` on ≥2D parameters.
### Changed
- **Dependencies**: removed `prodigyopt` from core dependency lists (`pyproject.toml`, `requirements.txt`) after defaulting fully to ChaosGrad.
- **Documentation/Test alignment**: trainer/optimizer docs and test expectations are now aligned to the ChaosGrad-default policy.
### Fixed
- **Neurogenesis + ChaosGrad migration**: preserved `meta_resolution` during optimizer migration and added robust handling for row-shaped ChaosGrad meta-state tensors.
- **State resilience**: ChaosGrad now safely re-seeds missing essential per-parameter state keys to avoid partial-state step failures after legacy migration paths.
**CHANGELOG.md** (20 additions, 0 deletions)
@@ -4,6 +4,26 @@ All notable changes to OdyssNet will be documented in this file.
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 
+
+## [2.6.0] — 2026-04-16
+
+### Breaking
+- **Trainer default optimizer policy changed**: when `optimizer` is not explicitly provided, `OdyssNetTrainer` now always constructs **ChaosGrad**.
+- `lr=None` now means "use ChaosGrad with default genesis lr (`1e-4`)".
+- Passing an explicit `lr` no longer switches optimizer family; it overrides ChaosGrad genesis lr.
+- **ChaosGrad default `meta_resolution` changed** from `'scalar'` to `'row'`.
+- `'row'` remains compatible with checkpoints via `chaos_global.meta_resolution` persistence.
+
+### Added
+- **Row-mode scope expanded**: `per_param_decay` and `per_param_alpha` now support per-row adaptation in addition to `per_param_lr`/`per_param_beta` on ≥2D parameters.
+
+### Changed
+- **Dependencies**: removed `prodigyopt` from core dependency lists (`pyproject.toml`, `requirements.txt`) after defaulting fully to ChaosGrad.
+- **Documentation/Test alignment**: trainer/optimizer docs and test expectations are now aligned to the ChaosGrad-default policy.
+
+### Fixed
+- **Neurogenesis + ChaosGrad migration**: preserved `meta_resolution` during optimizer migration and added robust handling for row-shaped ChaosGrad meta-state tensors.
+- **State resilience**: ChaosGrad now safely re-seeds missing essential per-parameter state keys to avoid partial-state step failures after legacy migration paths.
**CITATION.cff** (1 addition, 1 deletion)
@@ -5,7 +5,7 @@ authors:
 given-names: "Cahit"
 email: "cksoftwaresystems@gmail.com"
 title: "OdyssNet: The Trainable Dynamic System & Zero-Hidden Architecture"
-version: 2.1.0
+version: 2.6.0
 date-released: 2025-12-12
 url: "https://github.com/theomgdev/OdyssNet"
 abstract: "OdyssNet is a chaotic, fully connected neural network architecture that proves temporal depth (thinking steps) can replace spatial depth (hidden layers). It solves non-linear problems like MNIST with Zero Hidden Layers by utilizing Trainable Chaos."
-> - **ChaosGrad** (pass as `optimizer=`) — research into self-tuning dynamics; ideal when `hebb_type` is enabled (Hebbian parameters are unconditionally protected from weight decay and burst noise).
-> - `meta_resolution='row'` gives each output neuron its own `per_param_lr` / `per_param_beta` on ≥2D parameters — useful for deep/wide chaos cores where rows diverge in role. Scalar mode (default) stays back-compatible; state dicts round-trip across both.
+> - **ChaosGrad** (default when `optimizer` is not supplied) — autonomous adaptation with built-in Hebbian-safe handling.
+> - `lr=None` uses ChaosGrad's default genesis lr (`1e-4`).
+> - Passing an explicit `lr` changes ChaosGrad genesis lr; it does not switch optimizer type.
+> - `meta_resolution='row'` is the default and gives each output neuron its own `per_param_lr` / `per_param_beta` / `per_param_decay` / `per_param_alpha` on ≥2D parameters.
+> - **Custom optimizer** — pass `optimizer=...` when you explicitly want AdamW/SGD/etc.
 
 ---
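As a rough intuition for the row-mode bullet above (a conceptual sketch, not ChaosGrad's actual implementation): row resolution keeps one meta-value per output row of a ≥2D weight instead of a single value per tensor.

```python
# Conceptual illustration only: scalar vs. per-row learning rates on a 2-D weight.
import torch

W = torch.randn(4, 8)                  # (output neurons, inputs)
grad = torch.randn_like(W)

scalar_lr = torch.tensor(1e-4)         # 'scalar': one value shared by the whole tensor
row_lr = torch.full((4, 1), 1e-4)      # 'row': one value per output neuron, broadcast over columns

step_scalar = W - scalar_lr * grad     # every row moves at the same rate
step_row = W - row_lr * grad           # each row can adapt at its own rate
```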
@@ -305,7 +305,7 @@ Use the `prepare_input` utility implicitly via the Trainer.
 * Data shuffling / batch sampling.
 * Dropout and stochastic regularization.
 * CUDA random state (for GPU consistency).
-* **Test:** If you run the script twice with the same seed, loss curves and final results should be **identical**, byte-for-byte. Note: this requires passing an explicit `lr` to `OdyssNetTrainer` — the default `lr=None` (Prodigy) adapts its learning rate online and will produce different curves across runs.
+* **Test:** If you run the script twice with the same seed, loss curves and final results should be **identical**, byte-for-byte. Prefer passing an explicit `lr` to pin ChaosGrad genesis lr for reproducible experiments.
 
 2. **Visuals:** Your example should print a cool visualization. Don't just print "Loss: 0.01". Print the timeline.
    **Example:** `t=05 | Input: 1 | Output: 0.99 🟢`
@@ -458,7 +458,7 @@ If loss oscillates or training is unstable:
-2. **Use AdamW with a lower explicit learning rate** (bypasses Prodigy):
+2. **Tune ChaosGrad genesis `lr` downward** for a calmer update scale:
 ```python
 trainer = OdyssNetTrainer(model, lr=1e-4)
 ```
@@ -552,7 +552,7 @@ When modifying the library itself (not examples), follow these additional rules:
552
552
553
553
### New/modified example scripts (`examples/`)
554
554
1. [ ] **Does your script call `set_seed(42)` at the START of `main()`?** (MANDATORYfor reproducibility)
555
-
2.[ ]**Does `OdyssNetTrainer` receive an explicit `lr`?** (e.g. `lr=1e-4`). The default `lr=None` activates Prodigy, which adapts LR online and breaks byte-for-byte reproducibility. Examples must pin a float lr.
555
+
2. [ ] **Does `OdyssNetTrainer` receive an explicit `lr` when reproducibility matters?** (e.g. `lr=1e-4`). This pins ChaosGrad genesis lr across runs.
556
556
3. [ ] Did you place it in the correct folder (`examples/`for core validations, `examples/advanced/`forcomplex tasks)?
557
557
4. [ ] Are you using `OdyssNetTrainer`?
558
558
5. [ ] Did you select the correct `activation`, `weight_init`, and`gate` setup? (Default `resonant`+`gate=None`is fine for most tasks.)
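A sketch of an `examples/` script that satisfies the checklist above, assuming `set_seed` and `OdyssNet` are importable from the package root (their exact modules are not shown in this diff) and abridging the model constructor from `docs/LIBRARY.md`:

```python
# Hypothetical example skeleton; import locations for set_seed/OdyssNet are assumptions.
from odyssnet import OdyssNet, OdyssNetTrainer, set_seed

def main():
    set_seed(42)                                   # item 1: seed at the very start of main()
    model = OdyssNet(num_neurons=10, input_ids=range(10), output_ids=range(10))
    trainer = OdyssNetTrainer(model, lr=1e-4)      # item 2: pin ChaosGrad genesis lr
    # ... training loop ...
    print("t=05 | Input: 1 | Output: 0.99 🟢")     # timeline-style visualization

if __name__ == "__main__":
    main()
```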
**README.md** (1 addition, 1 deletion)
@@ -158,7 +158,7 @@ By "thinking" for 15 steps, OdyssNet simulates a 15-layer deep network using **o
 Uncontrolled feedback loops lead to explosion. OdyssNet engineers the chaos to form stable **Attractors**.
 * **StepNorm** acts as gravity, keeping energy bounded.
 * **Tanh** filters meaningful signals while maintaining signal symmetry.
-* **Prodigy Optimizer (default)**: Auto-calibrates the learning rate continuously — no manual tuning required. Pass an explicit `lr` to use AdamW instead.
+* **ChaosGrad Optimizer (default)**: OdyssNetTrainer now uses ChaosGrad by default with autonomous per-parameter adaptation. Passing `lr` overrides ChaosGrad's genesis learning rate; omitting it uses the built-in default (`1e-4`).
 * **Heterogeneous Synaptic Plasticity**: When `hebb_type` is set, temporal correlations $h_t \otimes h_{t-1}$ are accumulated each step and injected as $W_\text{eff} = W + (f_h \odot C_t)$ — where `hebb_factor` can be a global scalar, a per-neuron vector, or a full per-synapse matrix. All variants are learnable, letting the network discover how plastic each pathway should be.
 * **The Latch Experiment** proved OdyssNet can create a stable attractor to hold a decision forever against noise.
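As a worked illustration of the plasticity formula in the bullet above (shapes and values are illustrative; this is not the library's internal code):

```python
# Illustrative only: W_eff = W + (f_h ⊙ C_t), where C_t accumulates h_t ⊗ h_{t-1}.
import torch

n = 6
W = torch.randn(n, n)
h_prev, h_t = torch.randn(n), torch.randn(n)

C_t = torch.zeros(n, n)
C_t += torch.outer(h_t, h_prev)        # accumulate the temporal correlation each step

f_h = torch.tensor(0.1)                # hebb_factor as a global scalar ...
# f_h = torch.rand(n, 1)               # ... or a per-neuron vector (broadcast across rows)
# f_h = torch.rand(n, n)               # ... or a full per-synapse matrix

W_eff = W + f_h * C_t                  # effective weights injected for this step
```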
**README_TR.md** (1 addition, 1 deletion)
@@ -158,7 +158,7 @@ The signal travels from every neuron to every other neuron ($N \times N$).
 Uncontrolled feedback loops lead to explosion. OdyssNet engineers the chaos to form stable **Attractors**.
 * **StepNorm** acts like gravity, keeping energy bounded.
 * **Tanh** filters meaningful signals and preserves signal symmetry.
-* **Prodigy Optimizer (default):** Continuously auto-calibrates the learning rate — no manual tuning required. When an explicit `lr` value is passed, AdamW is used.
+* **ChaosGrad Optimizer (default):** OdyssNetTrainer now uses ChaosGrad by default and adapts parameters autonomously. If `lr` is given, ChaosGrad's genesis learning rate is overridden; if omitted, the built-in default (`1e-4`) is used.
 * **Heterogeneous Synaptic Plasticity:** When `hebb_type` is set, temporal correlations $h_t \otimes h_{t-1}$ are accumulated each step and injected as $W_\text{eff} = W + (f_h \odot C_t)$ — `hebb_factor` can be a global scalar, a per-neuron vector, or a full per-synapse matrix. Since all variants are learnable, the network discovers how plastic each pathway should be.
 * **The Latch Experiment** proved that OdyssNet can create a stable attractor to hold a decision forever against noise.
**docs/LIBRARY.md** (26 additions, 26 deletions)
@@ -6,7 +6,7 @@ OdyssNet is a PyTorch-based library that implements **Zero-Hidden Layer** neural
 
 The library is organized into three primary modules:
 1. **`odyssnet.core.network`**: The recurrent core architecture and update dynamics.
-2. **`odyssnet.training.trainer`**: Optimization engine with AdamW and bio-inspired regularization.
+2. **`odyssnet.training.trainer`**: Optimization engine with ChaosGrad defaulting and bio-inspired regularization.
 3. **`odyssnet.utils`**: Data utilities, model persistence (`odyssstore`), and dynamic expansion (`neurogenesis`).
 
 ---
@@ -195,17 +195,17 @@ Runs the dynamic system.
 
 ## OdyssNet Trainer (`odyssnet.training.trainer`)
 
-The `OdyssNetTrainer` handles the training loop, gradient accumulation, mixed precision (AMP), and experimental features like Ghost Gradients. **Prodigy** is the default optimizer (auto-calibrating, no LR tuning required). Pass an explicit `lr` to use AdamW instead, or supply any custom optimizer — including **ChaosGrad**.
+The `OdyssNetTrainer` handles the training loop, gradient accumulation, mixed precision (AMP), and experimental features like Ghost Gradients. **ChaosGrad** is the default optimizer whenever `optimizer` is not explicitly provided.
 
 ### Initialization
 
 ```python
 from odyssnet import OdyssNetTrainer
 
-# Quick prototyping: Prodigy — auto-calibrates LR, no tuning needed
+# Default path: ChaosGrad (row-mode by default)
 trainer = OdyssNetTrainer(model, device='cuda')
 
-# Reproducible experiments and production: pin an explicit lr to use AdamW
 ```
 * `lr` (float or None): Learning rate. Default: `None`.
-  * `None`: **Prodigy** optimizer is used. Auto-calibrates the learning rate continuously — no manual tuning required. Requires `pip install prodigyopt`. Best for quick prototyping; produces non-deterministic loss curves across runs even with a fixed seed.
-  * float (e.g. `1e-4`): **AdamW** optimizer is used with `weight_decay=0.01`. Recommended for reproducible experiments, benchmarking, and production runs.
 * `0.0`: Standard behavior (`zero_grad()` after every step).
 * `> 0.0` (e.g., `0.1`): Keeps a percentage of the previous step's gradient. This creates a "momentum" over time, effectively simulating a larger batch size or longer temporal context. Useful for difficult convergence landscapes.
@@ -491,16 +491,16 @@ model = OdyssNet(num_neurons=10, input_ids=range(10), output_ids=range(10), voca
-ChaosGrad is a **fully optional**, zero-hyperparameter optimizer designed specifically for OdyssNet. Pass it as a custom optimizer to bypass the default Prodigy / AdamW selection.
+ChaosGrad is a zero-hyperparameter optimizer designed specifically for OdyssNet and is the trainer's default optimizer path.
 
-The trainer's default behavior is **unchanged** — Prodigy when `lr=None`, AdamW when `lr=float`.
+The trainer constructs ChaosGrad whenever `optimizer` is not supplied (`lr` only overrides ChaosGrad's genesis lr).
 
 ### When to use ChaosGrad
 
 | Situation | Recommendation |
 |-----------|----------------|
-| Quick prototyping, first run | Prodigy (default) |
@@ -533,27 +533,27 @@ You can also pass plain `model.parameters()` without classification — every pa
 
 ### Meta Resolution — per-tensor vs. per-neuron adaptation
 
-ChaosGrad's autonomously-adapted meta-parameters (`per_param_lr`, `per_param_beta`) come in two resolutions, selected at construction time:
+ChaosGrad's autonomously-adapted meta-parameters (`per_param_lr`, `per_param_beta`, `per_param_decay`, `per_param_alpha`) come in two resolutions, selected at construction time:
 
 ```python
-# Default — one lr / beta value per tensor (lightest, fully back-compatible)
 ```
 
-**When to use `'row'`:** deep or wide recurrent cores where different output neurons are learning at different speeds — a single scalar lr averages over all of them. Each row of `W` belongs to one output neuron biologically; giving each neuron its own lr/beta matches that cell-level granularity.
+**When to use `'row'`:** deep or wide recurrent cores where different output neurons learn at different speeds and have different regularization needs. Each row of `W` belongs to one output neuron biologically; giving each neuron its own lr/beta/decay/alpha matches that cell-level granularity.
 
 **Checkpoint compatibility:** `meta_resolution` is stored in `state_dict`'s `chaos_global` block. A row-mode checkpoint replays as row-mode even if the reconstructing code omitted the keyword. Loading a scalar-mode checkpoint into a freshly-constructed row-mode optimizer works too — `load_state_dict` honours the saved flag.
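A hedged sketch of the construction-time choice and the checkpoint round-trip described above; ChaosGrad's import path and exact signature are assumptions, while the `meta_resolution` keyword and the `chaos_global` persistence come from the surrounding text:

```python
# Hypothetical usage; the ChaosGrad import location and OdyssNet kwargs are assumed.
from odyssnet import OdyssNet
from odyssnet.training import ChaosGrad

model = OdyssNet(num_neurons=10, input_ids=range(10), output_ids=range(10))

opt_row = ChaosGrad(model.parameters(), meta_resolution='row')  # per-neuron meta-params
ckpt = opt_row.state_dict()                   # chaos_global.meta_resolution == 'row' is saved

opt_restored = ChaosGrad(model.parameters())  # keyword omitted on reconstruction
opt_restored.load_state_dict(ckpt)            # saved flag wins: replays as row-mode
```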
 | **Gradient checkpointing** | Recomputes activations during backward. Gradient values reaching ChaosGrad are identical whether or not checkpointing is active. | Fully compatible. |
 | **AMP (mixed precision)** | ChaosGrad receives gradients after `scaler.unscale_()` — in float32 scale. ChaosGrad internally casts gradients to float32 (`g_f = grad.float()`). | Fully compatible. |
 | **`regenerate_synapses()`** | When weak entries of `W` are re-initialised, the trainer automatically clears ChaosGrad's per-parameter state for `W`. Cold-start recalibration happens on the next step, re-computing `init_lr` from the new gradient scale. | If `revived == 0` (no weights regenerated), state is preserved. |
-| **`transplant_weights()`** | Weight transplantation does *not* transfer optimizer state (by design — cold restart after transplant). ChaosGrad cold-starts on all parameters after loading transplanted weights. | Same behaviour as AdamW / Prodigy after transplant. |
-| **Neurogenesis (`trainer.expand()`)** | Per-parameter tensors (`momentum`, `grad_ema`) are zero-padded to the new size. Scalar state (`init_lr`, `per_param_lr`, etc.) is copied unchanged. New neurons start from cold-start calibration. Global frustration is preserved. | Fully compatible. |
+| **`transplant_weights()`** | Weight transplantation does *not* transfer optimizer state (by design — cold restart after transplant). ChaosGrad cold-starts on all parameters after loading transplanted weights. | Matches standard cold-restart behaviour in alternative optimizers. |
+| **Neurogenesis (`trainer.expand()`)** | Per-parameter tensors (`momentum`, `grad_ema`, and row-mode meta tensors) are expanded/padded to the new size. Scalars are copied unchanged. New neurons start from cold-start calibration. Global frustration is preserved. | Fully compatible. |
 | **`classify_params` (skipped)** | If you pass `model.parameters()` directly instead of `classify_params(model)`, all parameters — including Hebbian logits — are treated as `lightweight`. The Hebbian bypass rule (no decay, no burst) does NOT apply. Always use `classify_params` on models with `hebb_type != None`. | Documented limitation; no crash. |
 | **Anomaly hook** | ChaosGrad has its own internal plateau escape (frustration burst). The trainer's anomaly hook fires independently based on loss statistics. The two mechanisms don't interfere. | Use both together if needed. |
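Relating to the `classify_params` row above, a hypothetical sketch of the recommended wiring on a `hebb_type` model; the import locations are assumptions, and the point is the contrast: classified parameter groups keep the Hebbian decay/burst bypass, raw `model.parameters()` does not.

```python
# Hypothetical wiring; classify_params/ChaosGrad import locations are assumed,
# and the hebb_type kwarg of the model constructor is omitted here for brevity.
from odyssnet import OdyssNet, OdyssNetTrainer
from odyssnet.training import ChaosGrad, classify_params

model = OdyssNet(num_neurons=10, input_ids=range(10), output_ids=range(10))

opt = ChaosGrad(classify_params(model))    # Hebbian parameters keep their bypass rule
# opt = ChaosGrad(model.parameters())      # also runs, but treats everything as 'lightweight'
trainer = OdyssNetTrainer(model, optimizer=opt)
```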