Three integrations surfaced by auditing 29 campaigns in `inference-sim/.nous` (May 2026). Each is opt-in / schema-additive, adds zero LLM tokens at the design seam, and ships behind a constructor injection point so tests run without live calls (per the `CLAUDE.md` test policy).
Why now
- Most arms hard-code 10 seeds with no statistical rationale. Campaigns `mech-design-kvtime` (acc 25%/0%) and `reviewer-gauntlet` (acc 0%/0%/66%) may be underpowered rather than scientifically refuted.
- The `principles.json` lifecycle (INSERT/UPDATE/PRUNE) is a heuristic over a `"low"`/`"medium"`/`"high"` confidence string. The data model already carries `evidence` (the list of citing arms), which is enough to ground a calibrated posterior.
- `composite-sensitivity-boundary` (5 iters, ~65 sim runs each) and `capacity-probe` (rate × seed grids) hand-roll search grids; adaptive sampling would cut their run counts by 30–60%.
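A minimal sketch of what `required_seeds` could compute, assuming each seed yields a binary accept/reject outcome and two arms are compared with a two-sided two-proportion z-test. Only the function name comes from the predicate; the signature, argument names, and the closed-form normal approximation (rather than, say, simulation-based power) are assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_seeds(p_baseline: float, p_target: float,
                   alpha: float = 0.05, power: float = 0.8) -> int:
    """Seeds per arm needed to tell p_baseline from p_target.

    Hypothetical signature. Uses the normal approximation for a
    two-sided two-proportion z-test.
    """
    if not (0.0 < p_baseline < 1.0 and 0.0 < p_target < 1.0):
        raise ValueError("rates must lie strictly between 0 and 1")
    if p_baseline == p_target:
        raise ValueError("effect size must be non-zero")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    p_bar = (p_baseline + p_target) / 2
    spread = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * sqrt(p_baseline * (1 - p_baseline)
                              + p_target * (1 - p_target)))
    return ceil(spread ** 2 / (p_baseline - p_target) ** 2)


# Hypothetical rates: detect a 30% -> 60% shift at alpha=0.05, power=0.8.
print(required_seeds(0.3, 0.6))  # 42
```

Under the same assumptions, 10 seeds per arm only reliably detects a shift as large as roughly 10% to 70%, which is the sense in which a hard-coded seed count can leave a campaign underpowered.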
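A minimal sketch of how the `evidence` list could ground a posterior, assuming each citing arm either supports or refutes a principle (the `supports` field is hypothetical) and using a conjugate Beta-Binomial update in place of a PyMC model (the predicate's conftest clause suggests one). `BetaPosterior`, `beta_binomial`, `confidence_label`, and the cut-offs are hypothetical; the `posterior_fn=` seam mirrors the predicate, so a test can inject a stub and a sampler-backed function can be swapped in the same way.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class BetaPosterior:
    """Beta(alpha, beta) belief that a principle holds."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def sd(self) -> float:
        n = self.alpha + self.beta
        return sqrt(self.alpha * self.beta / (n * n * (n + 1)))


PosteriorFn = Callable[[int, int], BetaPosterior]


def beta_binomial(supports: int, refutes: int) -> BetaPosterior:
    """Conjugate update from a uniform Beta(1, 1) prior; needs no sampler."""
    return BetaPosterior(1.0 + supports, 1.0 + refutes)


def posterior(evidence: List[Dict],
              posterior_fn: Optional[PosteriorFn] = None) -> BetaPosterior:
    """Turn a principle's evidence list into a posterior.

    Hypothetical entry shape: {"arm": str, "supports": bool}. An entry
    without "supports" counts as supporting, since the arm cites it.
    """
    fn = posterior_fn or beta_binomial
    supports = sum(1 for e in evidence if e.get("supports", True))
    return fn(supports, len(evidence) - supports)


def confidence_label(p: BetaPosterior) -> str:
    """Map back onto the existing "low"/"medium"/"high" strings."""
    lower = p.mean - p.sd  # crude one-sd lower bound, not a credible interval
    if lower >= 0.6:
        return "high"
    if lower >= 0.3:
        return "medium"
    return "low"


cited = [{"arm": f"arm-{i}"} for i in range(4)]
print(confidence_label(posterior(cited)))  # high
print(confidence_label(posterior([])))     # low
```

Emitting the existing label strings from the posterior keeps the change schema-additive: `principles.json` readers see the same field, while the posterior parameters can be stored alongside it.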
Children
Single-PR landing
One branch off `upstream/reflective`, three feat commits (one per child, each with `Closes #<child>`), and a tracking PR titled:
`Tracking #<this>: Statistical & Bayesian rigor (3 children)`. Mirrors #153/#161.
/goal predicate (for the tracking issue)
A PR exists with base `upstream/reflective`, head `sriumcp:<branch>`, AND the working tree of that PR satisfies ALL of:
- `orchestrator/power.py` exists and exports `required_seeds`
- `orchestrator/principles_posterior.py` exists and exports `posterior`, which accepts a `posterior_fn=` injection argument
- `orchestrator/arm_sweep.py` exists and exports `run_sweep`, which accepts a `sampler=` injection argument
- `tests/test_power.py`, `tests/test_principles_posterior.py`, and `tests/test_arm_sweep.py` all exist and `pytest -q` passes
- `tests/conftest.py` blocks live PyMC NUTS sampling and live Optuna trial execution alongside its existing live-LLM block
- `pytest -q` exit code is 0
- the PR body references `Closes #163`, `Closes #164`, `Closes #165`, and `Refs #<this>`
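A minimal sketch of the adaptive-sampling idea behind `run_sweep`, assuming a one-dimensional capacity search where a simulation at a given rate either meets or misses its target and the outcome flips once at the boundary. Bisection is swapped in as the simplest adaptive sampler; the predicate's mention of Optuna suggests the real module uses an Optuna sampler instead. Only `run_sweep` and `sampler=` come from the predicate; `bisection_sampler`, the `History` shape, and the signature are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

History = List[Tuple[float, bool]]    # (rate, did the objective pass?)
Sampler = Callable[[History], float]  # proposes the next rate to simulate


def bisection_sampler(lo: float, hi: float) -> Sampler:
    """Adaptive sampler: bisect the tightest pass/fail bracket seen so far.

    Assumes the objective is monotone: passes below the boundary, fails above.
    """
    def propose(history: History) -> float:
        left = max((x for x, ok in history if ok), default=lo)
        right = min((x for x, ok in history if not ok), default=hi)
        return (left + right) / 2
    return propose


def run_sweep(objective: Callable[[float], bool], n_trials: int,
              sampler: Optional[Sampler] = None) -> History:
    """Run n_trials simulations, each at a point chosen from earlier results.

    `sampler` is the injection seam: tests pass a deterministic stub, so no
    live sampler is needed.
    """
    propose = sampler or bisection_sampler(0.0, 100.0)
    history: History = []
    for _ in range(n_trials):
        x = propose(history)
        history.append((x, objective(x)))
    return history


# Hypothetical capacity boundary at rate 37.0: 10 simulations bracket it to
# within 100 / 2**10 (about 0.1), where a 0.1-spaced grid needs ~1000 runs.
trials = run_sweep(lambda rate: rate <= 37.0, n_trials=10)
print(max(x for x, ok in trials if ok))  # 36.9140625
```

Bisection needs about log2(range / tolerance) runs where a uniform grid needs range / tolerance. The gap narrows for noisy outcomes and multi-dimensional rate × seed grids, which call for a noise-tolerant sampler behind the same `sampler=` seam.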