ItCodinTime
diff --git a/README.md b/README.md
Lines changed: 32 additions & 0 deletions
diff --git a/benchmarks/cifar10_adaptive_neat_vs_baselines.py b/benchmarks/cifar10_adaptive_neat_vs_baselines.py
Lines changed: 37 additions & 0 deletions
diff --git a/benchmarks/glue_sst2_adaptive_neat_vs_baselines.py b/benchmarks/glue_sst2_adaptive_neat_vs_baselines.py
Lines changed: 39 additions & 0 deletions
@@ -252,6 +252,8 @@ cleanly with:
 - Keras integration tests
 - reference/native parity tests
 - a real Keras MLP benchmark against SGD, Adam, and AdamW
+- a real Keras CNN benchmark on MNIST and Fashion-MNIST
+- runnable benchmark harnesses for CIFAR-10 and GLUE SST-2
 - benchmark diagnostics and sweep tooling for NEAT-specific ablations
 
 In a small real supervised-learning experiment on the `sklearn` digits dataset,
@@ -265,6 +267,13 @@ that is plausible: the mean correction ratio is only `0.00385` and the mean
 gradient/update alignment is `0.99991`, so NEAT is behaving very close to its
 base update on this task.
 
+On a stronger but still short benchmark, a small CNN trained for 2 epochs on
+`MNIST` and `Fashion-MNIST` over 3 seeds, adaptive NEAT reaches the best mean
+test accuracy on both datasets: `0.9861` vs `0.9856` for Adam on MNIST, and
+`0.8786` vs `0.8725` for Adam on Fashion-MNIST. The margins are small, so this
+is better evidence than the digits-only benchmark but still not a substitute
+for broader GPU-side benchmarks such as ImageNet or GLUE.
+
 To reproduce the benchmark:
 
 ```bash
@@ -282,6 +291,29 @@ NEAT reached `94.72%` mean test accuracy across three seeds, versus `97.04%`
 for SGD with momentum and `96.85%` for Adam and AdamW. The detailed report is
 in [`docs/research/benchmarks.md`](docs/research/benchmarks.md).
 
+To run the short standard vision benchmark:
+
+```bash
+python benchmarks/vision_adaptive_neat_vs_baselines.py
+```
+
+To run the CIFAR-10 benchmark harness:
+
+```bash
+python benchmarks/cifar10_adaptive_neat_vs_baselines.py
+```
+
+To run the GLUE SST-2 benchmark harness:
+
+```bash
+python benchmarks/glue_sst2_adaptive_neat_vs_baselines.py
+```
+
+The CIFAR-10 and GLUE SST-2 harnesses are included so the repo can scale to
+stronger benchmark environments. They run on a CPU-only machine, but credible
+ImageNet-scale or full-GLUE evidence requires stronger hardware than the local
+CPU-only setup used here.
+
 ## Development
 
 Useful commands:
 
@@ -0,0 +1,37 @@
+"""Run a short CIFAR-10 benchmark for adaptive NEAT."""
+
+from __future__ import annotations
+
+import json
+import sys
+from pathlib import Path
+
+if __package__ in {None, ""}:
+    sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+
+from benchmarks.tasks.keras_cifar10 import Cifar10BenchmarkConfig, run_benchmark
+
+
+def main() -> None:
+    result = run_benchmark(
+        Cifar10BenchmarkConfig(
+            seeds=(7, 11, 19),
+            epochs=3,
+            batch_size=128,
+            validation_size=5000,
+        )
+    )
+    output = Path(f"benchmarks/results/cifar10_adaptive_neat_{result['date']}.json")
+    output.write_text(json.dumps(result, indent=2))
+    print(output)
+    for row in result["summary"]:
+        print(
+            row["optimizer"],
+            row["mean_test_accuracy"],
+            row["mean_test_loss"],
+            row["mean_seconds"],
+        )
+
+
+if __name__ == "__main__":
+    main()
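
Review note: `main()` above writes to `benchmarks/results/` relative to the current working directory and assumes that directory already exists. A minimal sketch of a more defensive variant follows. `save_result`, `prefix`, and `repo_root` are hypothetical names that are not part of this PR, and the sketch relies only on the `date` key the script already reads from the result.

```python
"""Sketch: write a benchmark result without depending on cwd or an existing directory."""

from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def save_result(result: dict[str, Any], prefix: str, repo_root: Path) -> Path:
    """Write `result` as JSON under <repo_root>/benchmarks/results and return the path.

    Hypothetical helper: only the `date` key is taken from the scripts in this PR.
    """
    output = repo_root / "benchmarks" / "results" / f"{prefix}_{result['date']}.json"
    # Create the results directory if it is missing instead of failing in write_text.
    output.parent.mkdir(parents=True, exist_ok=True)
    output.write_text(json.dumps(result, indent=2))
    return output
```

In the scripts, `repo_root` could be `Path(__file__).resolve().parents[1]`, the same expression already used for the `sys.path` insert, so the output location no longer depends on where the script is launched from.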
@@ -0,0 +1,39 @@
+"""Run a short GLUE SST-2 benchmark for adaptive NEAT."""
+
+from __future__ import annotations
+
+import json
+import sys
+from pathlib import Path
+
+if __package__ in {None, ""}:
+    sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+
+from benchmarks.tasks.glue_sst2 import GlueSst2BenchmarkConfig, run_benchmark
+
+
+def main() -> None:
+    result = run_benchmark(
+        GlueSst2BenchmarkConfig(
+            seeds=(7, 11, 19),
+            epochs=2,
+            batch_size=128,
+            max_tokens=20000,
+            sequence_length=64,
+            embedding_dim=128,
+        )
+    )
+    output = Path(f"benchmarks/results/glue_sst2_adaptive_neat_{result['date']}.json")
+    output.write_text(json.dumps(result, indent=2))
+    print(output)
+    for row in result["summary"]:
+        print(
+            row["optimizer"],
+            row["mean_test_accuracy"],
+            row["mean_test_loss"],
+            row["mean_seconds"],
+        )
+
+
+if __name__ == "__main__":
+    main()
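
Review note: both scripts print only per-optimizer means, and the README comparison rests on 3 seeds. Reporting the spread next to the mean would make close results easier to judge. The sketch below is one way to do that under stated assumptions: `summarize_accuracies` is a hypothetical name, the real `run_benchmark` output may expose per-seed values differently, and the numbers in the usage example are placeholders, not results from this PR.

```python
from __future__ import annotations

from statistics import mean, stdev


def summarize_accuracies(per_seed: dict[str, list[float]]) -> list[dict[str, float | str]]:
    """Return mean and sample standard deviation of test accuracy per optimizer."""
    rows = []
    for optimizer, accuracies in per_seed.items():
        rows.append(
            {
                "optimizer": optimizer,
                "mean_test_accuracy": mean(accuracies),
                # stdev needs at least two samples; report 0.0 for a single seed.
                "std_test_accuracy": stdev(accuracies) if len(accuracies) > 1 else 0.0,
            }
        )
    # Sort so the optimizer with the best mean comes first.
    return sorted(rows, key=lambda row: row["mean_test_accuracy"], reverse=True)


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not measured results.
    example = {"optimizer_a": [0.90, 0.92, 0.91], "optimizer_b": [0.89, 0.93, 0.90]}
    for row in summarize_accuracies(example):
        print(
            row["optimizer"],
            round(row["mean_test_accuracy"], 4),
            round(row["std_test_accuracy"], 4),
        )
```

If the gap between two optimizers' means is smaller than their per-seed standard deviation, more seeds are needed before calling one of them the winner.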