Tracking: Statistical & Bayesian rigor in Nous (3 children — power analysis, principle posteriors, sub-arm sweeps) #162

@sriumcp

Three integrations surfaced by auditing 29 campaigns in inference-sim/.nous (May 2026). Each is opt-in / schema-additive, adds zero LLM tokens at the design seam, and ships behind a constructor injection point so tests run without live calls (per CLAUDE.md test policy).

Why now

  • Most arms hard-code 10 seeds with no statistical rationale. Campaigns mech-design-kvtime (acc 25%/0%) and reviewer-gauntlet (acc 0%/0%/66%) may be underpowered rather than scientifically refuted.
  • The principles.json lifecycle (INSERT/UPDATE/PRUNE) is a heuristic over a "low"/"medium"/"high" confidence string. The data model already carries evidence (the citing-arm list), which is enough to ground a calibrated posterior.
  • composite-sensitivity-boundary (5 iters, ~65 sim runs each) and capacity-probe (rate × seed grids) hand-roll search grids that adaptive sampling would cut by 30–60%.
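
To make the "underpowered" concern concrete, here is a minimal sketch of what a seed-count calculation could look like. It is not the planned implementation: the issue only names `required_seeds`, so the signature, the defaults, and the choice of a two-sided, two-arm normal approximation over a standardized effect size are all assumptions for illustration.

```python
import math
from statistics import NormalDist


def required_seeds(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Seeds per arm for a two-sided, two-arm comparison (normal approximation).

    Hypothetical signature: `effect_size` is a standardized difference
    (Cohen's d); `alpha` and `power` defaults are conventional, not from the issue.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = std_normal.inv_cdf(power)          # quantile for the target power
    # Classic per-group sample size: n = 2 * ((z_alpha + z_power) / d) ** 2
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)
```

Under these assumptions, `required_seeds(0.8)` returns 25 and `required_seeds(1.3)` returns 10, so a fixed 10 seeds per arm would only be adequate for standardized effects of roughly 1.3 or larger.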

Children

Single-PR landing

One branch off upstream/reflective, three feat commits (one per child, each Closes #<child>), tracking PR title:
Tracking #<this>: Statistical & Bayesian rigor (3 children). Mirrors #153/#161.

/goal predicate (for the tracking issue)

A PR exists with base upstream/reflective, head sriumcp:<branch>, AND the working tree of that PR satisfies ALL of:

  • orchestrator/power.py exists and exports required_seeds
  • orchestrator/principles_posterior.py exists and exports posterior accepting a posterior_fn= injection arg
  • orchestrator/arm_sweep.py exists and exports run_sweep accepting a sampler= injection arg
  • tests/test_power.py, tests/test_principles_posterior.py, tests/test_arm_sweep.py all exist and pytest -q passes
  • tests/conftest.py blocks live PyMC NUTS sampling and live Optuna trial execution alongside its existing live-LLM block
  • pytest -q exit code is 0
  • the PR body references Closes #163, Closes #164, Closes #165 and Refs #<this>
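
The injection requirement in the predicate can be sketched as follows. This is an illustration under stated assumptions, not the design in the child issues: the issue fixes only the name `posterior` and the `posterior_fn=` argument. The supporting/refuting counts, the `beta_update` helper, and the conjugate Beta-Binomial update are hypothetical stand-ins for whatever model the real module uses.

```python
from typing import Callable, Tuple

BetaParams = Tuple[float, float]


def beta_update(supporting: int, refuting: int,
                prior: BetaParams = (1.0, 1.0)) -> BetaParams:
    """Hypothetical default: conjugate Beta-Binomial update from a uniform prior."""
    a, b = prior
    return (a + supporting, b + refuting)


def posterior(supporting: int, refuting: int,
              posterior_fn: Callable[[int, int], BetaParams] = beta_update) -> float:
    """Return the posterior mean that a principle holds.

    `posterior_fn` is the injection point: production code could pass a
    sampler-backed function, while tests pass a cheap deterministic stub.
    """
    a, b = posterior_fn(supporting, refuting)
    return a / (a + b)


def fake_posterior_fn(supporting: int, refuting: int) -> BetaParams:
    """Test stub: fixed parameters, no sampling and no live calls."""
    return (9.0, 1.0)
```

A test can then call `posterior(0, 0, posterior_fn=fake_posterior_fn)` and assert on the result without running any sampler. The same shape would apply to `run_sweep(..., sampler=...)`.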
