
Commit a84f763

Website / Keras Packaging
1 parent ca31aaa commit a84f763

16 files changed

Lines changed: 2308 additions & 44 deletions

README.md

Lines changed: 32 additions & 0 deletions
````diff
@@ -252,6 +252,8 @@ cleanly with:
 - Keras integration tests
 - reference/native parity tests
 - a real Keras MLP benchmark against SGD, Adam, and AdamW
+- a real Keras CNN benchmark on MNIST and Fashion-MNIST
+- runnable benchmark harnesses for CIFAR-10 and GLUE SST-2
 - benchmark diagnostics and sweep tooling for NEAT-specific ablations
 
 In a small real supervised-learning experiment on the `sklearn` digits dataset,
@@ -265,6 +267,13 @@ that is plausible: the mean correction ratio is only `0.00385` and the mean
 gradient/update alignment is `0.99991`, so NEAT is behaving very close to its
 base update on this task.
 
+On a stronger short-transfer benchmark with a small CNN on `MNIST` and
+`Fashion-MNIST` over 3 seeds and 2 epochs, adaptive NEAT now reaches the best
+mean test accuracy on both datasets: `0.9861` vs `0.9856` for Adam on MNIST,
+and `0.8786` vs `0.8725` for Adam on Fashion-MNIST. That is better evidence
+than the digits-only benchmark, but it is still not a substitute for broader
+GPU-side benchmarks such as ImageNet or GLUE.
+
 To reproduce the benchmark:
 
 ```bash
@@ -282,6 +291,29 @@ NEAT reached `94.72%` mean test accuracy across three seeds, versus `97.04%`
 for SGD with momentum and `96.85%` for Adam and AdamW. The detailed report is
 in [`docs/research/benchmarks.md`](docs/research/benchmarks.md).
 
+To run the short standard vision benchmark:
+
+```bash
+python benchmarks/vision_adaptive_neat_vs_baselines.py
+```
+
+To run the CIFAR-10 benchmark harness:
+
+```bash
+python benchmarks/cifar10_adaptive_neat_vs_baselines.py
+```
+
+To run the GLUE SST-2 benchmark harness:
+
+```bash
+python benchmarks/glue_sst2_adaptive_neat_vs_baselines.py
+```
+
+The CIFAR-10 and GLUE SST-2 harnesses are included so the repo can scale to
+stronger benchmark environments. They are runnable here, but fully credible
+ImageNet- or broad-GLUE-style evidence still requires a stronger machine than
+this local CPU-only setup.
+
 ## Development
 
 Useful commands:
````
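The README figures are per-optimizer means over several seeds, and both harness scripts in this commit print `mean_test_accuracy`, `mean_test_loss`, and `mean_seconds` from each row of `result["summary"]`. The following is a minimal sketch of how such summary rows could be built from per-seed runs. The per-run keys (`seed`, `test_accuracy`, `test_loss`, `seconds`) and the `summarize` helper are hypothetical and are not taken from the `benchmarks.tasks` modules.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean


def summarize(runs: list[dict]) -> list[dict]:
    """Average per-seed runs into one summary row per optimizer.

    Hypothetical per-run keys: optimizer, seed, test_accuracy, test_loss, seconds.
    Output rows use the mean_* keys that the harness scripts read.
    """
    # Group runs by optimizer name; defaultdict keeps first-seen order.
    grouped: dict[str, list[dict]] = defaultdict(list)
    for run in runs:
        grouped[run["optimizer"]].append(run)

    summary = []
    for optimizer, group in grouped.items():
        summary.append(
            {
                "optimizer": optimizer,
                "seeds": sorted(r["seed"] for r in group),
                "mean_test_accuracy": mean(r["test_accuracy"] for r in group),
                "mean_test_loss": mean(r["test_loss"] for r in group),
                "mean_seconds": mean(r["seconds"] for r in group),
            }
        )
    return summary
```

Reporting per-seed spread as well (for example `statistics.stdev` over the same groups) would help judge whether small gaps such as `0.9861` vs `0.9856` on MNIST exceed seed-to-seed noise.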
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
```python
"""Run a short CIFAR-10 benchmark for adaptive NEAT."""

from __future__ import annotations

import json
import sys
from pathlib import Path

if __package__ in {None, ""}:
    sys.path.insert(0, str(Path(__file__).resolve().parents[1]))

from benchmarks.tasks.keras_cifar10 import Cifar10BenchmarkConfig, run_benchmark


def main() -> None:
    result = run_benchmark(
        Cifar10BenchmarkConfig(
            seeds=(7, 11, 19),
            epochs=3,
            batch_size=128,
            validation_size=5000,
        )
    )
    output = Path(f"benchmarks/results/cifar10_adaptive_neat_{result['date']}.json")
    output.write_text(json.dumps(result, indent=2))
    print(output)
    for row in result["summary"]:
        print(
            row["optimizer"],
            row["mean_test_accuracy"],
            row["mean_test_loss"],
            row["mean_seconds"],
        )


if __name__ == "__main__":
    main()
```
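Each harness writes the full `result` dictionary as JSON under `benchmarks/results/`, prints that path, and then prints one line per optimizer. A small sketch for reading such a file back and ordering optimizers by mean test accuracy follows. It relies only on the `summary` keys the scripts themselves read; the helper names `load_result` and `rank_optimizers` are hypothetical, not part of the repository, and breaking ties on lower mean test loss is an illustrative choice.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_result(path: str | Path) -> dict:
    """Load a result file written by one of the harness scripts."""
    return json.loads(Path(path).read_text())


def rank_optimizers(result: dict) -> list[tuple[str, float, float, float]]:
    """Return (optimizer, accuracy, loss, seconds) tuples, best accuracy first."""
    rows = [
        (
            row["optimizer"],
            row["mean_test_accuracy"],
            row["mean_test_loss"],
            row["mean_seconds"],
        )
        for row in result["summary"]
    ]
    # Highest mean test accuracy first; lower mean test loss breaks ties.
    return sorted(rows, key=lambda r: (-r[1], r[2]))
```

Given the path a harness prints, `rank_optimizers(load_result(path))[0][0]` yields the name of the best-scoring optimizer in that run.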
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
```python
"""Run a short GLUE SST-2 benchmark for adaptive NEAT."""

from __future__ import annotations

import json
import sys
from pathlib import Path

if __package__ in {None, ""}:
    sys.path.insert(0, str(Path(__file__).resolve().parents[1]))

from benchmarks.tasks.glue_sst2 import GlueSst2BenchmarkConfig, run_benchmark


def main() -> None:
    result = run_benchmark(
        GlueSst2BenchmarkConfig(
            seeds=(7, 11, 19),
            epochs=2,
            batch_size=128,
            max_tokens=20000,
            sequence_length=64,
            embedding_dim=128,
        )
    )
    output = Path(f"benchmarks/results/glue_sst2_adaptive_neat_{result['date']}.json")
    output.write_text(json.dumps(result, indent=2))
    print(output)
    for row in result["summary"]:
        print(
            row["optimizer"],
            row["mean_test_accuracy"],
            row["mean_test_loss"],
            row["mean_seconds"],
        )


if __name__ == "__main__":
    main()
```
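`GlueSst2BenchmarkConfig` exposes `max_tokens=20000`, `sequence_length=64`, and `embedding_dim=128`. This commit does not show how `benchmarks.tasks.glue_sst2` uses them. In Keras text pipelines, a vocabulary cap and a fixed output length usually configure a `TextVectorization` layer (`max_tokens`, `output_sequence_length`), and an embedding width sets the `Embedding` layer that consumes the resulting ids. Assuming that convention, the pure-Python sketch that follows mimics the vectorization step: id 0 is padding, id 1 is out-of-vocabulary, and the vocabulary is capped at `max_tokens` ids in total. It only lowercases and splits on whitespace, whereas `TextVectorization` also strips punctuation by default; `build_vocab` and `vectorize` are hypothetical helpers.

```python
from __future__ import annotations

from collections import Counter

# Reserved ids, assumed to mirror Keras TextVectorization in "int" mode.
PAD, OOV = 0, 1


def build_vocab(texts: list[str], max_tokens: int) -> dict[str, int]:
    """Map the most frequent lowercase words to ids, using at most max_tokens ids."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Two ids are reserved for padding and out-of-vocabulary tokens.
    most_common = counts.most_common(max(max_tokens - 2, 0))
    return {word: i + 2 for i, (word, _) in enumerate(most_common)}


def vectorize(text: str, vocab: dict[str, int], sequence_length: int) -> list[int]:
    """Convert text to exactly sequence_length ids: truncate long, zero-pad short."""
    ids = [vocab.get(word, OOV) for word in text.lower().split()]
    ids = ids[:sequence_length]
    return ids + [PAD] * (sequence_length - len(ids))
```

Under the same assumption, each length-`sequence_length` id list would then index rows of an embedding matrix of shape `(max_tokens, embedding_dim)`, that is `(20000, 128)` for this config.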
