Version 0.79.3. The export inventory and per-symbol signatures live in the generated `docs/api/` reference. `docs/api/primitive-catalog.md` is the never-stale, grouped list of every primitive to reuse (own surface plus the agent-eval judge / authenticity / verification / statistics / campaign / token-usage surfaces). It gives each one's import path and one-line summary, read live from source; the per-module pages hold the full signatures. The pinned substrate is `agent-eval >=0.97.0 <1.0.0`. The sandbox substrate that materializes profiles into harness shapes is `@tangle-network/sandbox` (peer `>=0.8.0 <1.0.0`). The neutral contract types (`AgentProfile`, `AgentProfileMcpServer`, `HarnessType`, `ReasoningEffort`, `Part`/`ToolPart`/`ToolState`, plus the environment-provider types) are owned by `@tangle-network/agent-interface` (peer `>=0.14.0 <1.0.0`), which is the single source of truth. Substrate primitives are re-exported through `@tangle-network/agent-eval/contract` (or `/campaign`) and are not local to this package; the catalog's §2 shows exactly which subpath each one lives under.
`./loops` is the runtime barrel: `package.json` maps it to `src/runtime/index.ts`. Everything below labelled `/loops` is the recursive-atom + loop-kernel surface.

Read this before writing any orchestration, optimization, or measurement code in this repo. Stop if you are about to write any of the following: a persona⟷agent conversation runner, a "skill optimizer", a "profile-seam", a depth-vs-breadth A/B harness, a bootstrap loop, or a `new Sandbox(...)` + stream + read dance. It already exists. A parallel implementation will silently break a load-bearing invariant (equal-k, selector≠judge, capture-integrity, or eval/prod parity).
A genome is an `AgentProfile`: systemPrompt + skills + tools + mcp + knowledge + memory + rag, as one combined surface. It moves through one spine:
- Run: the genome runs as a driver⟷worker conversation. `runPersonified` composes a combinator such as `loopUntil`/`fanout` over the keystone Supervisor. It spends K rounds against one persistent, journaled, resumable artifact on a conserved budget pool, so equal-compute holds by construction.
- Benchmark: the run is scored either through the `ADAPTERS` registry driven by `runGate` over the Supervisor, or through an `AgenticSurface` driven by `runBenchmark`/`runAgentic`.
- Optimize: a gated loop evolves the genome and certifies wins on a frozen holdout, never on the training composite. Use `selfImprove`/`runImprovementLoop` + `gepaProposer`, certified by `defaultProductionGate`/`heldOutGate`/`promotionGate`, or the full multi-generation `runStrategyEvolution`.

The selector is never the judge. Observation attaches to the loop via `RuntimeHooks`, never to the portable genome.
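The equal-compute guarantee above fits in a few lines of toy code. This is a hypothetical model, not this package's implementation. `BudgetPool`, `Strategy`, `depth`, `breadth`, and `runAtEqualK` are illustrative names. The sketch shows one property: when every attempt must first draw from one shared pool, a depth strategy and a breadth strategy both spend exactly K. A bare `Promise.all` over N calls has no pool to enforce that.

```typescript
// Toy model of a conserved budget pool (hypothetical names, not the /loops API).
class BudgetPool {
  private remaining: number;
  readonly total: number;
  constructor(total: number) {
    this.total = total;
    this.remaining = total;
  }
  // Reserve units before doing any work; refuse once the pool is exhausted.
  tryDraw(units: number): boolean {
    if (units > this.remaining) return false;
    this.remaining -= units;
    return true;
  }
  get spent(): number {
    return this.total - this.remaining;
  }
}

// A strategy may only do work it has first paid for out of the pool.
type Strategy = (pool: BudgetPool, attempt: () => number) => number;

// Depth: keep refining while the pool lasts.
const depth: Strategy = (pool, attempt) => {
  let best = -Infinity;
  while (pool.tryDraw(1)) best = Math.max(best, attempt());
  return best;
};

// Breadth: asks for 10 parallel attempts, but only pool-funded ones run.
const breadth: Strategy = (pool, attempt) => {
  let best = -Infinity;
  for (let i = 0; i < 10; i++) {
    if (pool.tryDraw(1)) best = Math.max(best, attempt());
  }
  return best;
};

function runAtEqualK(strategy: Strategy, k: number): { spent: number; calls: number } {
  const pool = new BudgetPool(k);
  let calls = 0;
  strategy(pool, () => ++calls); // stand-in worker whose "score" is its call index
  return { spent: pool.spent, calls };
}

const d = runAtEqualK(depth, 4);
const b = runAtEqualK(breadth, 4);
console.log(d, b); // both spend exactly 4 units in 4 calls
```

The design choice is that the pool, not each strategy, is the source of truth for spend. A strategy that "wants" more attempts simply gets refused.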
Two substrates implement the same recursive atom over the one `Executor` port and share `defaultSelectWinner`. They are a deliberate pair; do not invent a third:
- the reactive Supervisor/Scope + personify combinators: the agent-driver, equal-k by construction via the conserved budget pool; prefer it for NEW recursive/keystone work;
- the round-synchronous `runLoop` kernel: the leaf, and what most sandbox benches drive today.

`inlineSandboxClient` adapts any non-box `Executor` into a `SandboxClient` for `runLoop`. `settledToIteration` bridges reactive `Settled`s into the kernel's `Iteration`. Together they let the two substrates interoperate without forking selection or metering.
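Here is a toy model of how the two substrates avoid forking selection: both feed one selection function, and a bridge converts reactive results into the kernel's per-round shape. The names are hypothetical stand-ins. `SettledLike`, `IterationLike`, `toIteration`, and `selectWinner` model `Settled`, `Iteration`, `settledToIteration`, and `defaultSelectWinner`. The score-then-cost-then-id tie-break is an assumption; the real types and rule live in `/loops`.

```typescript
// Hypothetical stand-ins; the real Settled / Iteration / defaultSelectWinner live in /loops.
interface Candidate { id: string; score: number; cost: number }
interface SettledLike { workerId: string; output: { score: number }; spent: number }
interface IterationLike { round: number; candidates: Candidate[] }

// ONE selection rule shared by both substrates: highest score, then cheapest, then id.
function selectWinner(cands: readonly Candidate[]): Candidate | undefined {
  return [...cands].sort(
    (a, b) => b.score - a.score || a.cost - b.cost || a.id.localeCompare(b.id),
  )[0];
}

// Bridge from the reactive substrate's results into the kernel's round shape,
// so selection is reused rather than re-implemented.
function toIteration(round: number, settled: readonly SettledLike[]): IterationLike {
  return {
    round,
    candidates: settled.map((s) => ({ id: s.workerId, score: s.output.score, cost: s.spent })),
  };
}

const settled: SettledLike[] = [
  { workerId: "a", output: { score: 0.7 }, spent: 3 },
  { workerId: "b", output: { score: 0.9 }, spent: 5 },
  { workerId: "c", output: { score: 0.9 }, spent: 2 },
];
const winner = selectWinner(toIteration(0, settled).candidates);
console.log(winner?.id); // "c": ties "b" on score but spent less
```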
## 1.5 The AgentProfile law: author the profile, the substrate materializes it (WE KEEP FORGETTING THIS)
An agent IS its `AgentProfile`, and the profile is the WHOLE agent, not just a prompt. The surface is systemPrompt + skills + tools + mcp + subagents + hooks + permissions + memory/rag + model: the `AgentProfile*` family in `@tangle-network/sandbox`, constructed via `defineAgentProfile`. System prompt ≠ skills. Skills are separate, invokable how-tos that the agent reads when prompted to invoke them. Never concatenate a skill body into the system prompt.
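Here is a simplified illustration of the "skills are separate" rule. `ProfileSketch` and `SkillSketch` are hypothetical, trimmed-down shapes; the real `AgentProfile` type is owned by `@tangle-network/agent-interface` and has more fields. `skillsInlinedIntoPrompt` is a hypothetical lint that flags a skill body pasted into the system prompt.

```typescript
// Hypothetical, trimmed-down profile shape for illustration only.
interface SkillSketch { name: string; description: string; body: string }
interface ProfileSketch {
  model: string;
  systemPrompt: string;
  skills: SkillSketch[]; // separate, invokable how-tos, NOT prompt text
  tools: string[];
  mcp: Record<string, { command: string; args?: string[] }>;
  subagents: string[];
  hooks: Record<string, string>;
}

// Hypothetical lint: names of skills whose body was pasted into the system prompt.
function skillsInlinedIntoPrompt(p: ProfileSketch): string[] {
  return p.skills
    .filter((s) => s.body.trim().length > 0 && p.systemPrompt.includes(s.body.trim()))
    .map((s) => s.name);
}

const fixTests: SkillSketch = {
  name: "fix-failing-tests",
  description: "How to reproduce, isolate, and fix a failing test",
  body: "1. Run the suite. 2. Read the first failure. 3. Fix and re-run.",
};

const good: ProfileSketch = {
  model: "example-model", // placeholder model id
  systemPrompt: "You maintain the billing service.",
  skills: [fixTests],
  tools: ["bash", "edit"],
  mcp: {},
  subagents: [],
  hooks: {},
};
// The anti-pattern: the skill body concatenated into the system prompt.
const bad: ProfileSketch = { ...good, systemPrompt: good.systemPrompt + "\n" + fixTests.body };

console.log(skillsInlinedIntoPrompt(good), skillsInlinedIntoPrompt(bad)); // [] [ 'fix-failing-tests' ]
```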
You change an agent's behavior by changing its PROFILE — never by writing orchestration code around it. The behaviors we keep hand-rolling are profile properties:
- Self-verification is a profile lever, available three ways, all configuration and zero glue code:
  1. steered: the prompt says "run the tests, read failures, fix, repeat";
  2. process-defined: its instructions make verify-after-every-change its standing process;
  3. hooked: a post-finish hook auto-runs the check and feeds failures back.

  The harness runs that loop. You do not write a per-round judge, a `while(!done)`, or a bash hill-climb.
- Iteration, delegation, and audit-against-spec are likewise hooks / subagents / skills / process in the profile.
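To make "the harness runs that loop" concrete, here is a toy simulation of the third option, the post-finish hook. Everything here is hypothetical: `VerifyingProfile`, `postFinish`, `CheckResult`, and `simulateHarness`. Real hook shapes are whatever the sandbox substrate materializes. The profile author writes only the data in `profile`. The retry loop stands in for harness behavior and is not code to write in this repo.

```typescript
// Hypothetical profile fragment: self-verification declared as data.
interface PostFinishHook { command: string; maxRetries: number }
interface VerifyingProfile { systemPrompt: string; postFinish: PostFinishHook }
interface CheckResult { ok: boolean; log: string }

// Stand-in for HARNESS behavior given that hook; you do not write this loop.
// One initial turn, then up to maxRetries retries, each fed the last failure log.
function simulateHarness(
  profile: VerifyingProfile,
  agentTurn: (feedback: string | null) => string,
  runCheck: (command: string, output: string) => CheckResult,
): { output: string; turns: number; passed: boolean } {
  let feedback: string | null = null;
  for (let turns = 1; ; turns++) {
    const output = agentTurn(feedback);
    const res = runCheck(profile.postFinish.command, output);
    if (res.ok || turns > profile.postFinish.maxRetries) {
      return { output, turns, passed: res.ok };
    }
    feedback = res.log; // failures are fed back as the next turn's input
  }
}

// The profile is all the author writes.
const profile: VerifyingProfile = {
  systemPrompt: "Implement the change. Run the tests, read failures, fix, repeat.",
  postFinish: { command: "npm test", maxRetries: 3 },
};

const check = (_cmd: string, out: string): CheckResult =>
  out === "fixed" ? { ok: true, log: "" } : { ok: false, log: "1 test failing" };

// Toy agent: fixes the code after seeing one failure.
const result = simulateHarness(profile, (fb) => (fb === null ? "draft" : "fixed"), check);
console.log(result); // { output: 'fixed', turns: 2, passed: true }
```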
The sandbox substrate materializes a profile into the harness's real shapes — so author the GENERAL profile and NEVER code to a harness. @tangle-network/sandbox renders an AgentProfile into whatever the running harness needs (instructions file, tool/MCP config, mounted skills, hooks, subagents). opencode / Claude Code / Codex are interchangeable targets; opencode is only the local test substrate behind the cli-bridge. Do NOT write harness-specific config or a profile → opencode.json realizer. A lever that isn't materialized yet is a substrate gap to fill in @tangle-network/sandbox, not a bespoke realizer here.
Therefore the supervisor's only intelligence is AUTHORING full profiles — the optimizable self-improvement surface: read the task, decompose it, and for each sub-task author the complete profile (which prompt, skills, tools/MCP, hooks, subagents, model). The quality of a worker IS the quality of the profile authored for it. The harness executes; you compose.
This table is judgment-only: it maps an intent to the ONE primitive to reach for and the thing NOT to build. It is not an inventory. For the full list of what exists (every export, its import path, its one-line summary), see the generated `docs/api/primitive-catalog.md`; for full signatures, see the per-module `docs/api/` pages. Each row tags its import subpath. A row is a LOCAL export of this package unless it is tagged with a substrate package (`agent-eval/contract`, `agent-eval/campaign`, `@tangle-network/sandbox`) or `bench`.
| I want to… | Use (import) | Do NOT build |
|---|---|---|
| Just run a supervisor to a goal (one call, scaffolding defaulted). START HERE | `supervise(profile, task, { budget, backend? })` (`/loops`) | hand-wiring `createSupervisor().run` + blobs/perWorker/journal/executors; reaching for the lower-level run-verbs below before you need a specific counterparty |
| Run a genome through a topology shape over the keystone Supervisor, end-to-end | `runPersonified({ persona, shape, task, budget })` (`/loops`) | a hand-rolled `createSupervisor().run` + seam-wiring helper |
| Loop a worker over one evolving artifact, K rounds, stop-when-good | `loopUntil(seed, spec)` as the shape (`/loops`) | a `while(!done){runWorker();decide()}` hand-loop or "multi-attempt refine driver" |
| Run a worker agent under test conversing with a simulated-user persona, K rounds, worker-only metered | `runPersonaConversation({ worker, persona, backendFor, systemPromptOf })` (root `.`, also `/loops`) | a hand-rolled per-agent `dispatchWithSurface` bridge / eval-dispatch loop |
| Run two AgentProfiles head-to-head over a persistent transcript | `runConversation(...)` (root `.`) | a hand-rolled two-agent turn loop |
| Drop a persona⟷agent conversation into an eval matrix as its dispatch | `runPersonaDispatch` → `runProfileMatrix({ dispatch })` (root `.` / `agent-eval/campaign`) | a per-agent custom dispatch bridge |
| Best-of-N / parallel-research / map-reduce at equal compute | `fanout(items, opts)` (`/loops`) | `Promise.all` over N calls + manual argmax/merge (bypasses the budget pool, which breaks equal-k) |
| Produce-then-gate with a real checker | `verify(spec)` (`/loops`) | "generate, then self-check with the same model, ship if ok" (collapses selector and judge) |
| Multi-judge review / rubric quorum over one artifact | `panel(spec)` (`/loops`) | a judge ensemble that feeds one judge's score into another |
| Fixed sequential chain (plan→implement→…) | `pipeline(stages)` (`/loops`) | hand-chained awaits passing outputs along |
| Adaptive tree search / progressive widening | `widen(spec)` + `flatWidenGate()` (`/loops`) | a best-first/MCTS that reads child scores to expand (selector = judge); keep `flatWidenGate()` until your gate is proven |
| Define the genome record for a personified run | `definePersona(input)` (`/loops`) | a "profile-seam" / agent-config wrapper carrying model+prompt+tools+role |
| Make a worker self-verify / iterate / audit | a hook / process / skill on its authored AgentProfile (§1.5) | a per-round judge, a `while(!done)` loop, or a bash hill-climb (it's a profile lever) |
| Run an authored profile on a real harness | author the AgentProfile and hand it to the sandbox substrate (`@tangle-network/sandbox`, `defineAgentProfile`) | a profile → `opencode.json` realizer or any harness-specific config writer |
| Have the supervisor design its workers | author a full AgentProfile per sub-task (prompt+skills+tools+mcp+hooks+subagents) (`/loops`) | a bare `systemPrompt` string (a worker can't pull levers it was never given) |
| Write a custom driver Agent and run it directly | `createSupervisor().run(root, task, opts)` (`/loops`) | a bespoke orchestrator that spawns sub-agents and tallies cost (the equal-compute claim breaks there) |
| Run depth-vs-breadth (or a custom strategy) over a stateful tool domain | `runAgentic({ surface, task, mode\|strategy, budget })` (`/loops`) | a hand-rolled `Supervisor.run` + journal/registry, or a depth/breadth loop |
| Author a new topology/strategy compactly | `defineStrategy(name, body)` using `ctx.shot()` + `ctx.critique()` (`/loops`) | a 70-line driver with `scope.spawn`/`scope.next` ceremony, or trusting a body-returned score |
| Compare strategies + get a significance report on a domain | `runBenchmark({ environment, tasks, worker, strategies })` (`/loops`) | your own strategy-comparison loop / paired-bootstrap / Pareto math |
| Add a stateful tool-using domain | implement `AgenticSurface` (5 hooks: `open`/`tools`/`call`/`score`/`close`) (`/loops`) | a bespoke per-benchmark agent runner / tool-loop harness |
| Run a sandbox coding rollout, round-synchronous (fresh box per round) | `runLoop(options)` (`/loops`) | a `new Sandbox()` + acquire + stream + parse + delete loop, or a 2nd winner-selector |
| Run + resume ONE persistent box across turns | `openSandboxRun(client, opts, deliverable)` (`/loops`) | a per-domain `new Sandbox` + `box.fs.read` + delete copy |
| Pick / register a leaf backend, or bring your own agent | `createExecutor({ backend })` / `createExecutorRegistry()` / implement `Executor` (`/loops`) | a per-vendor adapter or a closed `inline\|sandbox\|cli` switch (won't report through the `UsageEvent` channel) |
| Evolve a prompt/string surface | `gepaProposer({ llm, model, target })` (default inside `selfImprove`; the skill-surface twin is `skillOptProposer`, same source) (`agent-eval/campaign`) | a hand-rolled prompt-mutation reflection loop with its own Pareto bookkeeping |
| Self-improve a profile (one pluggable verb). START HERE | `improve(profile, findings, { surface, gate })` (root `.`; the RSI verb: defaults the generator from `surface`, wraps `selfImprove`) | a bespoke optimize loop, or calling `selfImprove`/a skill-optimizer directly for the common case |
| Measure one profile artifact's marginal lift (with-vs-without, score+cost) / catalog artifacts | `measureMarginalLift(...)` / `ArtifactRegistry` (`applyArtifact` is the one `ArtifactKind`→AgentProfile-field bridge) (`/lifecycle`) | a hand-rolled with/without ablation loop, or a per-kind `if kind==='skill'…` profile-field switch |
| Run the whole artifact lifecycle (generate→measure→promote→store→compose, then drift-watch/dedupe the live set) over ANY profile surface (skill/prompt/tool/MCP) | `runLifecycle({ baseline, generators, evalRunner, gate })` then `composeProfile(registry, base, query)`; maintain with `driftWatch(...)` / `dedupeArtifacts(...)` (`/lifecycle`) | a per-surface improve loop, a hand-rolled promote→compose step, or re-running `measureMarginalLift` without the registry/gate spine. The ONLY per-surface code is a thin `CandidateGenerator` (`skillGenerator` distills; `promptGenerator`/`buildableGenerator` cover the rest) |
| Run the self-improvement loop with full substrate control | `selfImprove({ agent, scenarios, judge, baselineSurface })` (`agent-eval/contract`) | a bespoke optimize loop or a parallel skill-optimizer |
| Run the gated loop with full control | `runImprovementLoop({ baselineSurface, dispatchWithSurface, driver, holdoutScenarios, gate })` (`agent-eval/contract`) | your own propose→campaign→rank→re-score-on-holdout→gate→PR loop |
| Decide ship/hold on a candidate (campaign context) | `defaultProductionGate({ holdoutScenarios, deltaThreshold })`; compose with `heldOutGate` / `composeGate` (`agent-eval/contract`) | a raw `h1>h0` point comparison on the training set |
| Decide ship/hold from a `BenchmarkReport` (per-task cells) | `promotionGate({ report, incumbent, candidate })` (`/loops`) | comparing two strategies' mean scores directly; re-deriving the bootstrap |
| Run the full multi-generation strategy flywheel + certify | `runStrategyEvolution(config)` (`/loops`) | a bespoke gen0→author→gen1→holdout loop with hand-rolled champion selection |
| Add or run a benchmark from the CLI/harness | `ADAPTERS` / `resolveAdapter(key)`, run via `bench/src/gate-cli.mts` | a per-script `switch(bench)` or a local benchmark-factory map |
| Wire a new benchmark | implement `BenchmarkAdapter` (5 methods) + feed it to `runGate` (`bench`) | a bespoke per-benchmark run script with its own (self-authored) scoring |
| Measure a topology on a benchmark at equal compute | `runGate(cfg)` (or `runAgentic`/`runBenchmark`); equal-k holds via the conserved budget pool (`bench` / `/loops`) | a batch-blind/batch-oracle/compare zoo, your own usage capture, or equal-k bookkeeping |
| Observe a run's full cost/time | `createWaterfallCollector()` → `anytimeReport()` (`/loops`) | a per-step cost/token tally built by inspecting events yourself (drifts from billed totals) |
| Attach N observers to a running loop | `composeRuntimeHooks(...)` (root export) | a second event-bus or callback-prop zoo (there is ONE stream) |
| Ship traces to an OTLP collector | `createOtelExporter()` + `buildLoopOtelSpans()` (root export) | your own OTLP serializer, or pulling in the OTEL SDK |
| Know what got mounted into a run / why a candidate won | `result.provenance.mounts` / `result.provenance.selectionReceipts` (`MountManifestEntry`/`SelectionReceipt`/`RunProvenance`); declare mounts via the `recordMount` recorder in `prepareBox` (root export) | re-reading box contents to reconstruct what was mounted, or re-deriving which candidate the selector picked |
| State any benchmark/A-B claim | `pairedLift(...)` (`bench`) over `pairedBootstrap`/`heldoutSignificance` (substrate) | your own bootstrap loop/PRNG per gate; a point lift without low/high/pairs |
| Let an agent delegate ONE generic INTENT (no fixed coder/researcher type) and get the result + real spend SYNCHRONOUSLY | the `delegate` tool: `createDelegateHandler` via `createMcpServer({ delegateSupervisor })`; mount it over the agent-runtime `mcp` bin with `MCP_ENABLE_DELEGATE=1` (the bin authors a supervisor over a sandbox backend) (`/mcp`) | a hardcoded coder/researcher profile, or task-specific `delegate_code`/`delegate_research` verbs (RETIRED); `delegate` is the ONE delegation path and the only one with a cost channel |
| Run a coding task INSIDE the agent's OWN sandbox session (a sibling box, fresh branch, validated patch) | `detachedSessionDelegate({ sandboxClient \| executor, workerProfile? })` (`/mcp`; pass the worker AgentProfile, or omit it for a minimal model-only default) | a hardcoded coder profile baked into the delegate; `delegate()` (which spawns workers in a chosen backend, not in the agent's own session) |
| Have a supervisor spawn + live-drive workers in a backend you choose and observe/steer/resume them | the coordination MCP: `createCoordinationTools` / `serveCoordinationMcp` over a live `Scope`; each worker's leaf is `createExecutor({ backend })` (`/mcp`, `/loops`) | `detachedSessionDelegate` (own-sandbox-session only, one-shot, no live steer/recursion/conserved budget) |
| Stand up a vertical agent in the eval loop | `defineAgent(manifest)` + `createSurfaceImprovementAdapter` (`/agent`) | a per-vertical manifest parser, surface-validator, or bespoke `ImprovementAdapter` |
| Turn intelligence/observation OFF (prove inference-only billing) | `withTangleIntelligence(agent, { effort: 'off' })` (`/intelligence`) | a custom trace-wrapper or hand-rolled effort/tier config |
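A paired claim with low/high/pairs can be illustrated with a minimal percentile paired bootstrap over per-task score pairs. It uses a seeded PRNG (mulberry32), so re-running a gate reproduces the same interval. This is a generic textbook sketch, not the substrate's `pairedBootstrap`/`heldoutSignificance` implementation. `pairedBootstrapSketch`, `shouldShip`, and its `low > 0` rule are assumptions for illustration. Per the table, real claims go through `pairedLift(...)`.

```typescript
// Seeded PRNG (mulberry32) so the same seed always yields the same interval.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

interface PairedLift { lift: number; low: number; high: number; pairs: number }

// Percentile paired bootstrap over per-task (baseline, candidate) score pairs.
function pairedBootstrapSketch(
  baseline: readonly number[],
  candidate: readonly number[],
  iters = 2000,
  seed = 1,
): PairedLift {
  const n = baseline.length;
  if (n === 0 || candidate.length !== n) throw new Error("need equal-length, non-empty paired scores");
  // Pair by task, then resample whole tasks so per-task difficulty cancels out.
  const diffs = candidate.map((c, i) => c - baseline[i]);
  const rand = mulberry32(seed);
  const means: number[] = [];
  for (let b = 0; b < iters; b++) {
    let sum = 0;
    for (let j = 0; j < n; j++) sum += diffs[Math.floor(rand() * n)];
    means.push(sum / n);
  }
  means.sort((x, y) => x - y);
  const quantile = (p: number) => means[Math.min(iters - 1, Math.floor(p * iters))];
  const lift = diffs.reduce((s, d) => s + d, 0) / n;
  return { lift, low: quantile(0.025), high: quantile(0.975), pairs: n };
}

// Hypothetical ship rule: hold unless the whole 95% interval sits above zero.
const shouldShip = (r: PairedLift): boolean => r.low > 0;

// Scores must come from the frozen holdout, never from the training composite.
const holdoutBaseline = [0.5, 0.6, 0.4, 0.7, 0.55];
const holdoutCandidate = [0.6, 0.65, 0.5, 0.75, 0.6];
const report = pairedBootstrapSketch(holdoutBaseline, holdoutCandidate);
console.log(report, shouldShip(report));
```

Reporting only `lift` would hide how much of the gain survives resampling. The interval plus the pair count is what lets a gate say "hold".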
For the full export inventory (every primitive, its import path, its summary — generated, never stale), see docs/api/primitive-catalog.md; for per-symbol signatures, the per-module docs/api/ pages. For the recursive atom (recursion · isolated-or-collaborative artifact · conserved budget · analysts) and the two-timescale architecture, see docs/architecture.md. For the genome→run→optimize→ship spine in depth, docs/concepts.md + docs/learning-flywheel.md. For the Intelligence SDK (Observe + the provable-OFF billing boundary), docs/intelligence-sdk.md.
§2 maps a fine-grained intent to a primitive; this is the coarse router one level up. Pick a front door by what you hand in. Each bottoms out at ONE function (anchored to source); the §2 rows above carry each one's "do NOT build" twin. The file:line here is accurate at this commit but is not the never-stale reference — the generated docs/api/ pages are; the freshness gate only asserts these files exist.
| You hand in… | Front door | Bottoms out at | What it is |
|---|---|---|---|
| a string intent ("fix the failing auth test"); you don't care HOW | the `delegate` tool | `delegate(intent, opts)` at `src/runtime/supervise/delegate.ts:88` (MCP handler `createDelegateHandler`, `src/mcp/tools/delegate.ts:139`) | a default authoring supervisor decomposes the intent and writes the worker profile per sub-task; synchronous, returns the delivered output + `spentTotal`. The ONE delegation path. |
| an authored supervisor AgentProfile + a task | `supervise(profile, task, opts)` | `src/runtime/supervise/supervise.ts:102` | the one-call LLM-brain driver over the keystone Supervisor, scaffolding defaulted. START HERE when you wrote the driver. |
| a deterministic shot grammar over a stateful tool domain | `runAgentic(opts)` | `src/runtime/strategy.ts:1030` | runs a Strategy (depth/breadth/custom) through the Supervisor; programmatic, no LLM picking the shape. |
| a deterministic topology combinator (`loopUntil`/`fanout`/`verify`/`panel`/`pipeline`) over a persona | `runPersonified(options)` | `src/runtime/personify/persona.ts:131` | composes a persona + a `CombinatorShape` over the Supervisor; programmatic. |
Rule of thumb: delegate = "I don't care how"; supervise = "I authored the driver"; runAgentic/runPersonified = "I want a deterministic topology, no LLM choosing the shape." All four run over the one Executor port on the conserved budget pool, so equal-compute holds by construction.
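The rule of thumb reduces to a lookup on what you hand in. The `Handoff` union and `frontDoor` function below are hypothetical: they return only the name of the front door and do not call the real entry points. The `never` check makes the compiler flag any new input kind that has no front door.

```typescript
// Hypothetical encoding of "what you hand in"; returns only the front-door name.
type Handoff =
  | { kind: "intent"; intent: string } // "I don't care how"
  | { kind: "authoredDriver"; profile: string; task: string } // "I authored the driver"
  | { kind: "shotGrammar"; strategy: "depth" | "breadth" | "custom" }
  | { kind: "combinator"; shape: "loopUntil" | "fanout" | "verify" | "panel" | "pipeline" };

type FrontDoor = "delegate" | "supervise" | "runAgentic" | "runPersonified";

function frontDoor(h: Handoff): FrontDoor {
  switch (h.kind) {
    case "intent":
      return "delegate";
    case "authoredDriver":
      return "supervise";
    case "shotGrammar":
      return "runAgentic";
    case "combinator":
      return "runPersonified";
    default: {
      // Compile-time exhaustiveness: a new Handoff kind without a case fails to type-check here.
      const unreachable: never = h;
      throw new Error(`unknown handoff: ${JSON.stringify(unreachable)}`);
    }
  }
}

console.log(frontDoor({ kind: "intent", intent: "fix the failing auth test" })); // delegate
```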
Two-agent patterns — compose a shape, don't hand-roll a turn loop:
| Pattern | Use | Bottoms out at |
|---|---|---|
| researcher → engineer (gather, then build) | `defineStrategy(name, body)`: both agents in one body via `ctx.shot()` + `ctx.critique()` | `src/runtime/strategy.ts:789` |
| implement → verify (build, then a SEPARATE checker gates it; selector ≠ judge) | `verify(spec)` as the shape | `src/runtime/personify/combinators.ts:333` |
| N-judge panel (fan judges out, merge verdicts) | `panel(spec)` as the shape | `src/runtime/personify/combinators.ts:273` |
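Here is the contract behind implement → verify, in toy form. The artifact ships only when a separate checker accepts it, never on the worker's own say-so. `implementThenVerify`, `Worker`, and `Checker` are hypothetical names. In this repo, use `verify(spec)` as the shape rather than hand-rolling this loop: the combinator runs over the Supervisor on the conserved budget pool, and this sketch does not.

```typescript
// Hypothetical types: the worker builds; an independent checker judges.
type Worker = (task: string, attempt: number) => { artifact: string; selfReportedOk: boolean };
type Checker = (artifact: string) => boolean; // e.g. runs the real tests

function implementThenVerify(
  task: string,
  worker: Worker,
  checker: Checker,
  maxAttempts: number,
): { artifact: string | null; attempts: number } {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const { artifact } = worker(task, attempt);
    // selfReportedOk is deliberately ignored: the selector is never the judge.
    if (checker(artifact)) return { artifact, attempts: attempt };
  }
  return { artifact: null, attempts: maxAttempts };
}

// Toy worker that claims success every time but only gets it right on attempt 2.
const optimist: Worker = (_task, attempt) => ({
  artifact: attempt >= 2 ? "add(a, b) = a + b" : "add(a, b) = a - b",
  selfReportedOk: true,
});
const runsTests: Checker = (artifact) => artifact.endsWith("a + b");

console.log(implementThenVerify("implement add", optimist, runsTests, 3));
// { artifact: 'add(a, b) = a + b', attempts: 2 }
```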