Commit 0b916b9

### Breaking
- **Trainer default optimizer policy changed**: when `optimizer` is not explicitly provided, `OdyssNetTrainer` now always constructs **ChaosGrad**.
- `lr=None` now means "use ChaosGrad with default genesis lr (`1e-4`)".
- Passing an explicit `lr` no longer switches optimizer family; it overrides ChaosGrad genesis lr.
- **ChaosGrad default `meta_resolution` changed** from `'scalar'` to `'row'`.
- `'row'` remains compatible with checkpoints via `chaos_global.meta_resolution` persistence.

### Added
- **Row-mode scope expanded**: `per_param_decay` and `per_param_alpha` now support per-row adaptation in addition to `per_param_lr`/`per_param_beta` on ≥2D parameters.

### Changed
- **Dependencies**: removed `prodigyopt` from core dependency lists (`pyproject.toml`, `requirements.txt`) after defaulting fully to ChaosGrad.
- **Documentation/Test alignment**: trainer/optimizer docs and test expectations are now aligned to the ChaosGrad-default policy.

### Fixed
- **Neurogenesis + ChaosGrad migration**: preserved `meta_resolution` during optimizer migration and added robust handling for row-shaped ChaosGrad meta-state tensors.
- **State resilience**: ChaosGrad now safely re-seeds missing essential per-parameter state keys to avoid partial-state step failures after legacy migration paths.

1 parent 0e9dde0 commit 0b916b9

17 files changed

Lines changed: 261 additions & 145 deletions

CHANGELOG.md

Lines changed: 20 additions & 0 deletions
@@ -4,6 +4,26 @@ All notable changes to OdyssNet will be documented in this file.
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 
+## [2.6.0] — 2026-04-16
+
+### Breaking
+- **Trainer default optimizer policy changed**: when `optimizer` is not explicitly provided, `OdyssNetTrainer` now always constructs **ChaosGrad**.
+- `lr=None` now means "use ChaosGrad with default genesis lr (`1e-4`)".
+- Passing an explicit `lr` no longer switches optimizer family; it overrides ChaosGrad genesis lr.
+- **ChaosGrad default `meta_resolution` changed** from `'scalar'` to `'row'`.
+- `'row'` remains compatible with checkpoints via `chaos_global.meta_resolution` persistence.
+
+### Added
+- **Row-mode scope expanded**: `per_param_decay` and `per_param_alpha` now support per-row adaptation in addition to `per_param_lr`/`per_param_beta` on ≥2D parameters.
+
+### Changed
+- **Dependencies**: removed `prodigyopt` from core dependency lists (`pyproject.toml`, `requirements.txt`) after defaulting fully to ChaosGrad.
+- **Documentation/Test alignment**: trainer/optimizer docs and test expectations are now aligned to the ChaosGrad-default policy.
+
+### Fixed
+- **Neurogenesis + ChaosGrad migration**: preserved `meta_resolution` during optimizer migration and added robust handling for row-shaped ChaosGrad meta-state tensors.
+- **State resilience**: ChaosGrad now safely re-seeds missing essential per-parameter state keys to avoid partial-state step failures after legacy migration paths.
+
 ## [2.5.0] — 2026-04-14
 
 ### Added
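
To make the Breaking entries concrete, here is a minimal sketch of the new trainer-facing behaviour. It only uses names that appear in this commit's docs (`OdyssNetTrainer`, `ChaosGrad`, `classify_params`); the `_default_chaosgrad` helper at the end is purely illustrative and is not the trainer's internal code.

```python
import torch
from odyssnet import OdyssNet, OdyssNetTrainer, ChaosGrad

model = OdyssNet(...)  # constructor args elided, as in the repo's own examples

# lr=None (the default): the trainer builds ChaosGrad with its default genesis lr (1e-4)
trainer = OdyssNetTrainer(model)

# Explicit lr: still ChaosGrad — only the genesis lr is overridden
trainer = OdyssNetTrainer(model, lr=3e-4)

# Explicit optimizer: the ChaosGrad default path is bypassed entirely
trainer = OdyssNetTrainer(model, optimizer=torch.optim.AdamW(model.parameters(), lr=1e-4))

# Roughly what the default path amounts to (illustrative only):
def _default_chaosgrad(model, lr=None):
    return ChaosGrad(ChaosGrad.classify_params(model), lr=1e-4 if lr is None else lr)
```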

CITATION.cff

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ authors:
 given-names: "Cahit"
 email: "cksoftwaresystems@gmail.com"
 title: "OdyssNet: The Trainable Dynamic System & Zero-Hidden Architecture"
-version: 2.1.0
+version: 2.6.0
 date-released: 2025-12-12
 url: "https://github.com/theomgdev/OdyssNet"
 abstract: "OdyssNet is a chaotic, fully connected neural network architecture that proves temporal depth (thinking steps) can replace spatial depth (hidden layers). It solves non-linear problems like MNIST with Zero Hidden Layers by utilizing Trainable Chaos."

CONTRIBUTING.md

Lines changed: 14 additions & 14 deletions
@@ -195,27 +195,27 @@ model = OdyssNet(
 # Quick experiment — global plasticity
 model = OdyssNet(..., hebb_type='global')
 
-# Default: Prodigy optimizer — auto-calibrates LR, no tuning needed
+# Default: ChaosGrad optimizer (row-mode by default)
 trainer = OdyssNetTrainer(model)
 
-# AdamW: pass an explicit learning rate
+# Override ChaosGrad genesis learning rate
 trainer = OdyssNetTrainer(model, lr=3e-4)
 
-# ChaosGrad: optional zero-hyperparameter optimizer (pass as custom optimizer)
-from odyssnet import ChaosGrad
-opt = ChaosGrad(ChaosGrad.classify_params(model), lr=1e-4)
-trainer = OdyssNetTrainer(model, optimizer=opt)
+# Or pass your own optimizer instance explicitly
+import torch
+trainer = OdyssNetTrainer(model, optimizer=torch.optim.AdamW(model.parameters(), lr=1e-4))
 
-# ChaosGrad with per-neuron tempo (opt-in, O(N) VRAM per matrix)
+# ChaosGrad with explicit per-neuron tempo selection (row is default)
 opt = ChaosGrad(ChaosGrad.classify_params(model), lr=1e-4,
                 meta_resolution='row')
 ```
 
 > **Optimizer selection guide:**
-> - **Prodigy** (`lr=None`, default) — best for quick experiments; non-deterministic curves.
-> - **AdamW** (explicit `lr`) — reproducible runs, benchmarks, production.
-> - **ChaosGrad** (pass as `optimizer=`) — research into self-tuning dynamics; ideal when `hebb_type` is enabled (Hebbian parameters are unconditionally protected from weight decay and burst noise).
-> - `meta_resolution='row'` gives each output neuron its own `per_param_lr` / `per_param_beta` on ≥2D parameters — useful for deep/wide chaos cores where rows diverge in role. Scalar mode (default) stays back-compatible; state dicts round-trip across both.
+> - **ChaosGrad** (default when `optimizer` is not supplied) — autonomous adaptation with built-in Hebbian-safe handling.
+> - `lr=None` uses ChaosGrad's default genesis lr (`1e-4`).
+> - Passing explicit `lr` changes ChaosGrad genesis lr; it does not switch optimizer type.
+> - `meta_resolution='row'` is the default and gives each output neuron its own `per_param_lr` / `per_param_beta` / `per_param_decay` / `per_param_alpha` on ≥2D parameters.
+> - **Custom optimizer** — pass `optimizer=...` when you explicitly want AdamW/SGD/etc.
 
 ---
 
@@ -305,7 +305,7 @@ Use the `prepare_input` utility implicitly via the Trainer.
 * Data shuffling / batch sampling.
 * Dropout and stochastic regularization.
 * CUDA random state (for GPU consistency).
-* **Test:** If you run the script twice with the same seed, loss curves and final results should be **identical**, byte-for-byte. Note: this requires passing an explicit `lr` to `OdyssNetTrainer` — the default `lr=None` (Prodigy) adapts its learning rate online and will produce different curves across runs.
+* **Test:** If you run the script twice with the same seed, loss curves and final results should be **identical**, byte-for-byte. Prefer passing an explicit `lr` to pin ChaosGrad genesis lr for reproducible experiments.
 
 2. **Visuals:** Your example should print a cool visualization. Don't just print "Loss: 0.01". Print the timeline.
 * *Example:* `t=05 | Input: 1 | Output: 0.99 🟢`
@@ -458,7 +458,7 @@ If loss oscillates or training is unstable:
 trainer = OdyssNetTrainer(model, gradient_persistence=0.1)
 ```
 
-2. **Use AdamW with a lower explicit learning rate** (bypasses Prodigy):
+2. **Tune ChaosGrad genesis `lr` downward** for a calmer update scale:
 ```python
 trainer = OdyssNetTrainer(model, lr=1e-4)
 ```
@@ -552,7 +552,7 @@ When modifying the library itself (not examples), follow these additional rules:
 
 ### New/modified example scripts (`examples/`)
 1. [ ] **Does your script call `set_seed(42)` at the START of `main()`?** (MANDATORY for reproducibility)
-2. [ ] **Does `OdyssNetTrainer` receive an explicit `lr`?** (e.g. `lr=1e-4`). The default `lr=None` activates Prodigy, which adapts LR online and breaks byte-for-byte reproducibility. Examples must pin a float lr.
+2. [ ] **Does `OdyssNetTrainer` receive an explicit `lr` when reproducibility matters?** (e.g. `lr=1e-4`). This pins ChaosGrad genesis lr across runs.
 3. [ ] Did you place it in the correct folder (`examples/` for core validations, `examples/advanced/` for complex tasks)?
 4. [ ] Are you using `OdyssNetTrainer`?
 5. [ ] Did you select the correct `activation`, `weight_init`, and `gate` setup? (Default `resonant` + `gate=None` is fine for most tasks.)
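
A compact sketch of how the checklist above looks at the top of an example script. The import location of `set_seed` is an assumption (the guide only requires that it be called first), and the `OdyssNet(...)` constructor call is elided exactly as in the guide's own snippets.

```python
from odyssnet import OdyssNet, OdyssNetTrainer
from odyssnet.utils import set_seed  # assumed import path for the seeding helper

def main():
    set_seed(42)                               # mandatory: seed before anything else
    model = OdyssNet(..., hebb_type='global')  # constructor args elided
    trainer = OdyssNetTrainer(model, lr=1e-4)  # explicit lr pins ChaosGrad genesis lr across runs
    ...

if __name__ == "__main__":
    main()
```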

README.md

Lines changed: 1 addition & 1 deletion
@@ -158,7 +158,7 @@ By "thinking" for 15 steps, OdyssNet simulates a 15-layer deep network using **o
 Uncontrolled feedback loops lead to explosion. OdyssNet engineers the chaos to form stable **Attractors**.
 * **StepNorm** acts as gravity, keeping energy bounded.
 * **Tanh** filters meaningful signals while maintaining signal symmetry.
-* **Prodigy Optimizer (default)**: Auto-calibrates the learning rate continuously — no manual tuning required. Pass an explicit `lr` to use AdamW instead.
+* **ChaosGrad Optimizer (default)**: OdyssNetTrainer now uses ChaosGrad by default with autonomous per-parameter adaptation. Passing `lr` overrides ChaosGrad's genesis learning rate; omitting it uses the built-in default (`1e-4`).
 * **Heterogeneous Synaptic Plasticity**: When `hebb_type` is set, temporal correlations $h_t \otimes h_{t-1}$ are accumulated each step and injected as $W_\text{eff} = W + (f_h \odot C_t)$ — where `hebb_factor` can be a global scalar, a per-neuron vector, or a full per-synapse matrix. All variants are learnable, letting the network discover how plastic each pathway should be.
 * **The Latch Experiment** proved OdyssNet can create a stable attractor to hold a decision forever against noise.
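
The plasticity bullet compresses a lot into one formula; the toy loop below spells out the accumulation and injection it describes. Variable names, the decay constant, and the absence of StepNorm are simplifications local to this sketch — it is not OdyssNet's internal forward pass.

```python
import torch

N = 8
W = torch.randn(N, N) * 0.1   # recurrent weights
f_h = torch.zeros(N, N)       # hebb_factor: scalar, per-neuron vector, or per-synapse matrix
C = torch.zeros(N, N)         # accumulated temporal correlations C_t
h_prev = torch.randn(N)
decay = 0.9                   # assumed accumulation decay; the exact rule is not shown here

for t in range(15):                           # "thinking" steps
    W_eff = W + f_h * C                       # W_eff = W + (f_h ⊙ C_t)
    h = torch.tanh(W_eff @ h_prev)
    C = decay * C + torch.outer(h, h_prev)    # accumulate h_t ⊗ h_{t-1}
    h_prev = h
```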

README_TR.md

Lines changed: 1 addition & 1 deletion
@@ -158,7 +158,7 @@ The signal travels from every neuron to every other neuron ($N \times N$).
 Uncontrolled feedback loops lead to explosion. OdyssNet engineers the chaos to form stable **Attractors**.
 * **StepNorm** acts like gravity, keeping energy bounded.
 * **Tanh** filters meaningful signals and preserves signal symmetry.
-* **Prodigy Optimizer (default):** Continuously auto-calibrates the learning rate — no manual tuning needed. When an explicit `lr` value is passed, AdamW is used.
+* **ChaosGrad Optimizer (default):** OdyssNetTrainer now uses ChaosGrad by default and adapts parameters autonomously. If `lr` is given, ChaosGrad's genesis learning rate is overridden; otherwise the built-in default (`1e-4`) is used.
 * **Heterogeneous Synaptic Plasticity:** When `hebb_type` is set, temporal correlations $h_t \otimes h_{t-1}$ are accumulated each step and injected as $W_\text{eff} = W + (f_h \odot C_t)$ — `hebb_factor` can be a global scalar, a per-neuron vector, or a full per-synapse matrix. Since all variants are learnable, the network discovers how plastic each synaptic pathway should be.
 * **The Latch Experiment** proved that OdyssNet can form a stable attractor to hold a decision forever against noise.

docs/LIBRARY.md

Lines changed: 26 additions & 26 deletions
@@ -6,7 +6,7 @@ OdyssNet is a PyTorch-based library that implements **Zero-Hidden Layer** neural
 
 The library is organized into three primary modules:
 1. **`odyssnet.core.network`**: The recurrent core architecture and update dynamics.
-2. **`odyssnet.training.trainer`**: Optimization engine with AdamW and bio-inspired regularization.
+2. **`odyssnet.training.trainer`**: Optimization engine with ChaosGrad as the default optimizer and bio-inspired regularization.
 3. **`odyssnet.utils`**: Data utilities, model persistence (`odyssstore`), and dynamic expansion (`neurogenesis`).
 
 ---
@@ -195,17 +195,17 @@ Runs the dynamic system.
 
 ## OdyssNet Trainer (`odyssnet.training.trainer`)
 
-The `OdyssNetTrainer` handles the training loop, gradient accumulation, mixed precision (AMP), and experimental features like Ghost Gradients. **Prodigy** is the default optimizer (auto-calibrating, no LR tuning required). Pass an explicit `lr` to use AdamW instead, or supply any custom optimizer — including **ChaosGrad**.
+The `OdyssNetTrainer` handles the training loop, gradient accumulation, mixed precision (AMP), and experimental features like Ghost Gradients. **ChaosGrad** is the default optimizer whenever `optimizer` is not explicitly provided.
 
 ### Initialization
 
 ```python
 from odyssnet import OdyssNetTrainer
 
-# Quick prototyping: Prodigy — auto-calibrates LR, no tuning needed
+# Default path: ChaosGrad (row-mode by default)
 trainer = OdyssNetTrainer(model, device='cuda')
 
-# Reproducible experiments and production: pin an explicit lr to use AdamW
+# Override ChaosGrad genesis learning rate
 trainer = OdyssNetTrainer(model, lr=1e-4, device='cuda')
 
 # With optional features
@@ -217,20 +217,20 @@ trainer = OdyssNetTrainer(
     anomaly_hook=my_hook
 )
 
-# Custom optimizer (bypasses both Prodigy and AdamW)
+# Custom optimizer (bypasses trainer's default ChaosGrad path)
 import torch
 trainer = OdyssNetTrainer(model, optimizer=torch.optim.AdamW(model.parameters(), lr=1e-4))
 
-# ChaosGrad — zero-hyperparameter optimizer (optional, see ChaosGrad section below)
+# Explicit ChaosGrad construction (optional when defaults are enough)
 from odyssnet import ChaosGrad
 opt = ChaosGrad(ChaosGrad.classify_params(model), lr=1e-4)
 trainer = OdyssNetTrainer(model, optimizer=opt)
 ```
 
 **Parameters:**
-* `lr` (float or None): Learning rate. Default: `None`.
-  * `None`: **Prodigy** optimizer is used. Auto-calibrates the learning rate continuously — no manual tuning required. Requires `pip install prodigyopt`. Best for quick prototyping; produces non-deterministic loss curves across runs even with a fixed seed.
-  * float (e.g. `1e-4`): **AdamW** optimizer is used with `weight_decay=0.01`. Recommended for reproducible experiments, benchmarking, and production runs.
+* `lr` (float or None): ChaosGrad genesis learning rate override. Default: `None`.
+  * `None`: uses ChaosGrad default genesis lr (`1e-4`).
+  * float (e.g. `1e-4`): passed directly to ChaosGrad as genesis lr.
 * `gradient_persistence` (float): **Ghost Gradients / Persistence**.
   * `0.0`: Standard behavior (`zero_grad()` after every step).
   * `> 0.0` (e.g., `0.1`): Keeps a percentage of the previous step's gradient. This creates a "momentum" over time, effectively simulating a larger batch size or longer temporal context. Useful for difficult convergence landscapes.
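
For intuition, the `gradient_persistence` options above can be pictured as the following post-step treatment of gradients — a conceptual sketch under stated assumptions, not `OdyssNetTrainer`'s actual implementation.

```python
import torch

def apply_gradient_persistence(model: torch.nn.Module, persistence: float) -> None:
    """Run after optimizer.step(): keep `persistence` of each gradient for the next step."""
    for p in model.parameters():
        if p.grad is None:
            continue
        if persistence == 0.0:
            p.grad.zero_()            # standard behaviour, equivalent to zero_grad() each step
        else:
            p.grad.mul_(persistence)  # e.g. 0.1 -> 10% of last step's gradient carries over
```
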
@@ -491,16 +491,16 @@ model = OdyssNet(num_neurons=10, input_ids=range(10), output_ids=range(10), voca
 
 ## ChaosGrad Optimizer (`odyssnet.training.chaos_optimizer`)
 
-ChaosGrad is a **fully optional**, zero-hyperparameter optimizer designed specifically for OdyssNet. Pass it as a custom optimizer to bypass the default Prodigy / AdamW selection.
+ChaosGrad is a zero-hyperparameter optimizer designed specifically for OdyssNet and is the trainer's default optimizer path.
 
-The trainer's default behavior is **unchanged** — Prodigy when `lr=None`, AdamW when `lr=float`.
+The trainer constructs ChaosGrad whenever `optimizer` is not supplied (`lr` only overrides ChaosGrad's genesis lr).
 
 ### When to use ChaosGrad
 
 | Situation | Recommendation |
 |-----------|----------------|
-| Quick prototyping, first run | Prodigy (default) |
-| Reproducible benchmarks | AdamW with explicit `lr` |
+| Quick prototyping, first run | **ChaosGrad** (default) |
+| Reproducible benchmarks | ChaosGrad with fixed seed + explicit genesis `lr` |
 | Research into self-tuning dynamics, OdyssNet-specific regularisation | **ChaosGrad** |
 | Hebbian plasticity enabled (`hebb_type != None`) | ChaosGrad handles hebb params specially |
 
@@ -533,27 +533,27 @@ You can also pass plain `model.parameters()` without classification — every pa
 
 ### Meta Resolution — per-tensor vs. per-neuron adaptation
 
-ChaosGrad's autonomously-adapted meta-parameters (`per_param_lr`, `per_param_beta`) come in two resolutions, selected at construction time:
+ChaosGrad's autonomously-adapted meta-parameters (`per_param_lr`, `per_param_beta`, `per_param_decay`, `per_param_alpha`) come in two resolutions, selected at construction time:
 
 ```python
-# Default — one lr / beta value per tensor (lightest, fully back-compatible)
-opt = ChaosGrad(ChaosGrad.classify_params(model), lr=1e-4, meta_resolution='scalar')
-
-# Opt-in — (rows,) tensors for ≥2D parameters: one lr / beta per output neuron
+# Default — (rows,) tensors for ≥2D parameters: one value per output neuron
 opt = ChaosGrad(ChaosGrad.classify_params(model), lr=1e-4, meta_resolution='row')
+
+# Opt-in compatibility mode — one meta-param value per tensor
+opt = ChaosGrad(ChaosGrad.classify_params(model), lr=1e-4, meta_resolution='scalar')
 ```
 
-| Meta-param | `'scalar'` (default) | `'row'` |
+| Meta-param | `'scalar'` | `'row'` (default) |
 |---|---|---|
 | `per_param_lr` | single float | `(rows,)` tensor on ≥2D params, scalar on 1D params |
 | `per_param_beta` | single float | `(rows,)` tensor on ≥2D params, scalar on 1D params |
-| `per_param_decay` | single float | single float (whole-tensor shrinkage gate) |
-| `per_param_alpha` | single float | single float (whole-tensor centralization gate) |
+| `per_param_decay` | single float | `(rows,)` tensor on ≥2D params, scalar on 1D params |
+| `per_param_alpha` | single float | `(rows,)` tensor on ≥2D params, scalar on 1D params |
 | `v2` (second moment) | elementwise tensor | elementwise tensor |
-| Extra VRAM per matrix | 0 | `2N` floats (negligible next to the N² weights) |
-| Hypergradient signals | scalar `cos(·)` | per-row `cos(·)` for `sig_lr`/`sig_mom`; scalar for `sig_wd` |
+| Extra VRAM per matrix | 0 | `4N` floats (still negligible next to the N² weights) |
+| Hypergradient signals | scalar `cos(·)` | per-row `cos(·)` for `sig_lr`/`sig_mom`/`sig_wd` |
 
-**When to use `'row'`:** deep or wide recurrent cores where different output neurons are learning at different speeds — a single scalar lr averages over all of them. Each row of `W` belongs to one output neuron biologically; giving each neuron its own lr/beta matches that cell-level granularity.
+**When to use `'row'`:** deep or wide recurrent cores where different output neurons learn at different speeds and have different regularization needs. Each row of `W` belongs to one output neuron biologically; giving each neuron its own lr/beta/decay/alpha matches that cell-level granularity.
 
 **Checkpoint compatibility:** `meta_resolution` is stored in `state_dict`'s `chaos_global` block. A row-mode checkpoint replays as row-mode even if the reconstructing code omitted the keyword. Loading a scalar-mode checkpoint into a freshly-constructed row-mode optimizer works too — `load_state_dict` honours the saved flag.
 
@@ -625,7 +625,7 @@ epoch, loss = load_checkpoint(model, trainer, path="run.pt", lr=5e-4)
 | **Gradient checkpointing** | Recomputes activations during backward. Gradient values reaching ChaosGrad are identical whether or not checkpointing is active. | Fully compatible. |
 | **AMP (mixed precision)** | ChaosGrad receives gradients after `scaler.unscale_()` in float32 scale. ChaosGrad internally casts gradients to float32 (`g_f = grad.float()`). | Fully compatible. |
 | **`regenerate_synapses()`** | When weak entries of `W` are re-initialised, the trainer automatically clears ChaosGrad's per-parameter state for `W`. Cold-start recalibration happens on the next step, re-computing `init_lr` from the new gradient scale. | If `revived == 0` (no weights regenerated), state is preserved. |
-| **`transplant_weights()`** | Weight transplantation does *not* transfer optimizer state (by design — cold restart after transplant). ChaosGrad cold-starts on all parameters after loading transplanted weights. | Same behaviour as AdamW / Prodigy after transplant. |
-| **Neurogenesis (`trainer.expand()`)** | Per-parameter tensors (`momentum`, `grad_ema`) are zero-padded to the new size. Scalar state (`init_lr`, `per_param_lr`, etc.) is copied unchanged. New neurons start from cold-start calibration. Global frustration is preserved. | Fully compatible. |
+| **`transplant_weights()`** | Weight transplantation does *not* transfer optimizer state (by design — cold restart after transplant). ChaosGrad cold-starts on all parameters after loading transplanted weights. | Matches standard cold-restart behaviour in alternative optimizers. |
+| **Neurogenesis (`trainer.expand()`)** | Per-parameter tensors (`momentum`, `grad_ema`, and row-mode meta tensors) are expanded/padded to the new size. Scalars are copied unchanged. New neurons start from cold-start calibration. Global frustration is preserved. | Fully compatible. |
 | **`classify_params` (skipped)** | If you pass `model.parameters()` directly instead of `classify_params(model)`, all parameters — including Hebbian logits — are treated as `lightweight`. The Hebbian bypass rule (no decay, no burst) does NOT apply. Always use `classify_params` on models with `hebb_type != None`. | Documented limitation; no crash. |
 | **Anomaly hook** | ChaosGrad has its own internal plateau escape (frustration burst). The trainer's anomaly hook fires independently based on loss statistics. The two mechanisms don't interfere. | Use both together if needed. |
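
A short sketch of the checkpoint round-trip promised in the Meta Resolution section above: because `meta_resolution` travels inside the saved `chaos_global` block, reloading restores row-mode even when the keyword is omitted at reconstruction time. Standard `torch.optim` state-dict mechanics are assumed; constructor args for `OdyssNet` are elided.

```python
import torch
from odyssnet import OdyssNet, ChaosGrad

model = OdyssNet(...)  # constructor args elided

opt = ChaosGrad(ChaosGrad.classify_params(model), lr=1e-4)        # 'row' is now the default
torch.save(opt.state_dict(), "chaosgrad_state.pt")

# Later, possibly in code that never mentions meta_resolution:
opt_reloaded = ChaosGrad(ChaosGrad.classify_params(model), lr=1e-4)
opt_reloaded.load_state_dict(torch.load("chaosgrad_state.pt"))    # honours the saved flag
```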

odyssnet/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-__version__ = "2.5.0"
+__version__ = "2.6.0"
 
 from .core.network import OdyssNet
 from .training.trainer import OdyssNetTrainer
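
A quick post-upgrade check — nothing assumed beyond the `__version__` attribute shown in the diff above:

```python
import odyssnet

assert odyssnet.__version__ == "2.6.0", odyssnet.__version__
```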
