Commit 6af5cda

Hardest-case benchmark profile is now far tougher
1 parent b4ea123 commit 6af5cda

7 files changed

Lines changed: 49 additions & 46 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -61,6 +61,7 @@ for versioning even while in research-stage development.
 - Enhanced benchmark visuals with a compact combined overview figure (static + interactive) and linked it in README/dashboard for faster comparison.
 - Added hardest-case dynamics GIF (training progression + inference decision-map evolution) and surfaced it near the top of README and docs dashboard.
 - Added an interactive Plotly hardest-case dynamics page with playback controls and circadian internals visualization (node/edge weights, chemical/plasticity state) on the docs dashboard.
+- Increased hardest-case difficulty substantially (higher drift/noise, lower phase-B train fraction, longer training horizon) and raised hidden-layer width in hardest-case runs for all three models.
 - Refreshed README benchmark section with a latest master verification run on 2026-02-28 and added raw output artifact under `docs/benchmarks/`.
 - Repositioned repository messaging to Circadian Predictive Coding as the primary focus.
 - Updated `README.md` with:

README.md

Lines changed: 8 additions & 6 deletions
@@ -123,16 +123,18 @@ Raw benchmark output: [`docs/benchmarks/benchmark_master_cifar100_subset_2026-02
 
 Strengths:
 
-- Strong retention/adaptation balance under hard continual shift.
-- Hardest-case balanced score: circadian `0.922` vs predictive coding `0.916` vs backprop `0.889`.
-- Hardest-case retention ratio: circadian `0.994`.
-- Source: [`docs/benchmarks/benchmark_continual_shift_hardest_case_2026-02-28.txt`](docs/benchmarks/benchmark_continual_shift_hardest_case_2026-02-28.txt).
-- Dynamic capacity adaptation is observable and measurable (for hardest-case: mean splits `6.29`, hidden size `8 -> 14.29`).
+- Competitive retention/adaptation behavior under hard continual shift.
+- Strong balance in the moderate strength-case stress test (circadian balanced score `0.949` vs predictive coding `0.947` vs backprop `0.946`).
+- Sources:
+- [`docs/benchmarks/benchmark_continual_shift_strength_case_2026-02-28.txt`](docs/benchmarks/benchmark_continual_shift_strength_case_2026-02-28.txt)
+- [`docs/benchmarks/benchmark_continual_shift_hardest_case_2026-02-28.txt`](docs/benchmarks/benchmark_continual_shift_hardest_case_2026-02-28.txt)
+- Dynamic capacity adaptation is observable and measurable (updated hardest-case: mean splits `27.57`, hidden size `24 -> 51.57`).
 - Competitive behavior in moderate continual-shift stress tests with stable multi-seed performance.
 
 Weaknesses:
 
 - Not best on every benchmark; on the latest CIFAR-100 subset master check, predictive coding accuracy (`0.692`) was higher than circadian (`0.685`).
+- In the updated ultra-hard hardest-case setting, predictive coding currently leads circadian on balanced score (`0.844` vs `0.831`), even though circadian still outperforms backprop (`0.785`).
 - Extra algorithmic machinery (sleep scheduling, replay, split/prune controls) adds tuning burden and implementation complexity compared with fixed-width baselines.
 - Speed overhead can appear depending on configuration; in the latest CIFAR-100 subset master check, circadian train speed (`874.2` SPS) was lower than predictive coding (`965.2` SPS).
 - Results are regime-dependent; claims should be tied to specific benchmark settings and seeds instead of treated as universal.
@@ -195,7 +197,7 @@ Continual shift stress test (retention vs adaptation):
 python scripts/run_continual_shift_benchmark.py --profile strength-case --seeds 3,7,11,19,23,31,37
 ```
 
-Hardest continual-shift stress test (small starting capacity + heavy drift):
+Hardest continual-shift stress test (expanded hidden capacity + very heavy drift):
 
 ```powershell
 python scripts/run_continual_shift_benchmark.py --profile hardest-case --seeds 3,7,11,19,23,31,37

docs/benchmarks/benchmark_continual_shift_hardest_case_2026-02-28.txt

Lines changed: 6 additions & 6 deletions
@@ -2,10 +2,10 @@ Continual Shift Benchmark
 -------------------------
 Phase A trains on base distribution; phase B trains on shifted/rotated distribution.
 Seeds: [3, 7, 11, 19, 23, 31, 37]
-Setup: hidden_dim=8, phaseA_epochs=90, phaseB_epochs=120, phaseA_noise=0.80, phaseB_noise=1.20
-Phase B transform: rotation=44.0 deg, translation=(0.90, -0.70)
-Phase B train fraction: 0.08
+Setup: hidden_dim=24, phaseA_epochs=120, phaseB_epochs=180, phaseA_noise=0.80, phaseB_noise=1.45
+Phase B transform: rotation=68.0 deg, translation=(1.60, -1.30)
+Phase B train fraction: 0.05
 
-Backprop: A_pre=0.975+/-0.014, A_post=0.888+/-0.081, B_post=0.889+/-0.031, retention=0.911+/-0.083, balanced=0.889+/-0.052
-Predictive coding: A_pre=0.979+/-0.013, A_post=0.958+/-0.018, B_post=0.874+/-0.028, retention=0.978+/-0.021, balanced=0.916+/-0.015
-Circadian predictive coding: A_pre=0.973+/-0.015, A_post=0.967+/-0.017, B_post=0.878+/-0.031, retention=0.994+/-0.021, balanced=0.922+/-0.014, sleep_events=6.29, splits=6.29, prunes=0.00, hidden_end=14.29
+Backprop: A_pre=0.975+/-0.009, A_post=0.714+/-0.145, B_post=0.856+/-0.025, retention=0.732+/-0.145, balanced=0.785+/-0.064
+Predictive coding: A_pre=0.971+/-0.010, A_post=0.852+/-0.130, B_post=0.836+/-0.037, retention=0.878+/-0.131, balanced=0.844+/-0.058
+Circadian predictive coding: A_pre=0.972+/-0.009, A_post=0.829+/-0.143, B_post=0.833+/-0.029, retention=0.852+/-0.148, balanced=0.831+/-0.064, sleep_events=13.86, splits=27.57, prunes=0.00, hidden_end=51.57
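
The reported columns look related by simple arithmetic: `retention` is close to `A_post / A_pre`, and `balanced` matches the mean of `A_post` and `B_post`. Both formulas are inferred from the numbers in this file, not taken from the benchmark script, and the retention check can only be approximate because the file reports means over seeds. A minimal sketch that recomputes both from the updated hardest-case means (function names are hypothetical):

```python
# Updated hardest-case means copied from the benchmark output in this commit.
REPORTED = {
    "backprop": {"A_pre": 0.975, "A_post": 0.714, "B_post": 0.856,
                 "retention": 0.732, "balanced": 0.785},
    "predictive_coding": {"A_pre": 0.971, "A_post": 0.852, "B_post": 0.836,
                          "retention": 0.878, "balanced": 0.844},
    "circadian": {"A_pre": 0.972, "A_post": 0.829, "B_post": 0.833,
                  "retention": 0.852, "balanced": 0.831},
}


def retention_ratio(a_pre: float, a_post: float) -> float:
    # Assumed definition: share of phase-A accuracy that survives phase-B training.
    return a_post / a_pre


def balanced_score(a_post: float, b_post: float) -> float:
    # Assumed definition: unweighted mean of post-shift accuracy on A and on B.
    return (a_post + b_post) / 2.0


def recompute(reported: dict) -> dict:
    """Return {model: (retention, balanced)} recomputed from the reported means."""
    return {
        name: (retention_ratio(row["A_pre"], row["A_post"]),
               balanced_score(row["A_post"], row["B_post"]))
        for name, row in reported.items()
    }


if __name__ == "__main__":
    for name, (retention, balanced) in recompute(REPORTED).items():
        print(f"{name}: retention~{retention:.3f}, balanced~{balanced:.3f}")
```

Under these assumed definitions, the drop in balanced score for all three models comes mostly from lower `A_post`, that is, from forgetting phase A, since `B_post` moves much less.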
Binary file changed (323 KB); preview not rendered.

docs/figures/interactive_hardest_mode_dynamics.html

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

scripts/generate_hardest_mode_dynamics.py

Lines changed: 17 additions & 17 deletions
@@ -40,21 +40,21 @@
 @dataclass(frozen=True)
 class HardestModeConfig:
     seed: int = 7
-    sample_count_phase_a: int = 500
-    sample_count_phase_b: int = 500
+    sample_count_phase_a: int = 700
+    sample_count_phase_b: int = 700
     test_ratio: float = 0.25
-    phase_b_train_fraction: float = 0.08
-    hidden_dim: int = 8
-    phase_a_epochs: int = 90
-    phase_b_epochs: int = 120
+    phase_b_train_fraction: float = 0.05
+    hidden_dim: int = 24
+    phase_a_epochs: int = 120
+    phase_b_epochs: int = 180
     phase_a_noise: float = 0.8
-    phase_b_noise: float = 1.2
-    phase_b_rotation_degrees: float = 44.0
-    phase_b_translation_x: float = 0.9
-    phase_b_translation_y: float = -0.7
+    phase_b_noise: float = 1.45
+    phase_b_rotation_degrees: float = 68.0
+    phase_b_translation_x: float = 1.6
+    phase_b_translation_y: float = -1.3
     sleep_interval_phase_a: int = 40
-    sleep_interval_phase_b: int = 8
-    snapshot_interval: int = 4
+    sleep_interval_phase_b: int = 6
+    snapshot_interval: int = 6
     decision_grid_size: int = 110
     latency_repeats: int = 10
     gif_duration_ms: int = 120
@@ -110,14 +110,14 @@ def parse_args() -> argparse.Namespace:
 def build_hardest_circadian_config() -> CircadianConfig:
     return CircadianConfig(
         use_reward_modulated_learning=False,
-        split_threshold=0.25,
+        split_threshold=0.22,
         prune_threshold=0.04,
-        max_split_per_sleep=1,
+        max_split_per_sleep=2,
         max_prune_per_sleep=0,
-        replay_steps=2,
-        replay_memory_size=10,
+        replay_steps=3,
+        replay_memory_size=14,
         replay_learning_rate=0.04,
-        replay_inference_steps=12,
+        replay_inference_steps=14,
         replay_inference_learning_rate=0.14,
     )
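
Raising `max_split_per_sleep` from 1 to 2 doubles how many hidden units a single sleep event may add. Assuming that parameter is a hard per-sleep cap, and that the final width equals the starting `hidden_dim` plus splits minus prunes (both are readings of the parameter names, not behavior confirmed by this diff), the means reported in the benchmark file can be sanity-checked with a short sketch. The helper names are hypothetical:

```python
def max_possible_splits(sleep_events: float, max_split_per_sleep: int) -> float:
    # Assumption: each sleep event can add at most `max_split_per_sleep` units.
    return sleep_events * max_split_per_sleep


def expected_hidden_end(hidden_dim: int, splits: float, prunes: float) -> float:
    # Assumption: every split adds one unit and every prune removes one.
    return hidden_dim + splits - prunes


# Mean values copied from the old and updated hardest-case benchmark output.
OLD_RUN = {"hidden_dim": 8, "max_split_per_sleep": 1, "sleep_events": 6.29,
           "splits": 6.29, "prunes": 0.0, "hidden_end": 14.29}
NEW_RUN = {"hidden_dim": 24, "max_split_per_sleep": 2, "sleep_events": 13.86,
           "splits": 27.57, "prunes": 0.0, "hidden_end": 51.57}


def is_consistent(run: dict) -> bool:
    """Check the split budget and the final width against the reported means."""
    budget = max_possible_splits(run["sleep_events"], run["max_split_per_sleep"])
    width = expected_hidden_end(run["hidden_dim"], run["splits"], run["prunes"])
    within_budget = run["splits"] <= budget + 1e-9
    width_matches = abs(width - run["hidden_end"]) < 0.005
    return within_budget and width_matches


if __name__ == "__main__":
    print("old profile consistent:", is_consistent(OLD_RUN))
    print("new profile consistent:", is_consistent(NEW_RUN))
```

In both profiles the reported mean splits sit at or just under the assumed budget, which suggests the circadian model splits at nearly every sleep event in this regime.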

scripts/run_continual_shift_benchmark.py

Lines changed: 16 additions & 16 deletions
@@ -154,14 +154,14 @@ def _build_hardest_case_circadian_config() -> CircadianConfig:
     """Build circadian profile tuned for the hardest continual-shift setup."""
     return CircadianConfig(
         use_reward_modulated_learning=False,
-        split_threshold=0.25,
+        split_threshold=0.22,
         prune_threshold=0.04,
-        max_split_per_sleep=1,
+        max_split_per_sleep=2,
         max_prune_per_sleep=0,
-        replay_steps=2,
-        replay_memory_size=10,
+        replay_steps=3,
+        replay_memory_size=14,
         replay_learning_rate=0.04,
-        replay_inference_steps=12,
+        replay_inference_steps=14,
         replay_inference_learning_rate=0.14,
     )
 
@@ -173,19 +173,19 @@ def _build_baseline_circadian_config() -> CircadianConfig:
 def _build_profile_defaults(profile: str) -> ProfileDefaults:
     if profile == "hardest-case":
         return ProfileDefaults(
-            sample_count_phase_a=500,
-            sample_count_phase_b=500,
-            phase_b_train_fraction=0.08,
-            phase_a_epochs=90,
-            phase_b_epochs=120,
-            hidden_dim=8,
+            sample_count_phase_a=700,
+            sample_count_phase_b=700,
+            phase_b_train_fraction=0.05,
+            phase_a_epochs=120,
+            phase_b_epochs=180,
+            hidden_dim=24,
             phase_a_noise_scale=0.8,
-            phase_b_noise_scale=1.2,
-            phase_b_rotation_degrees=44.0,
-            phase_b_translation_x=0.9,
-            phase_b_translation_y=-0.7,
+            phase_b_noise_scale=1.45,
+            phase_b_rotation_degrees=68.0,
+            phase_b_translation_x=1.6,
+            phase_b_translation_y=-1.3,
             sleep_interval_phase_a=40,
-            sleep_interval_phase_b=8,
+            sleep_interval_phase_b=6,
         )
     return ProfileDefaults(
         sample_count_phase_a=500,