`@tangle-network/agent-runtime` — Canonical API Reference

Version 0.79.3. The export inventory + per-symbol signatures live in the generated docs/api/ reference: docs/api/primitive-catalog.md is the never-stale, grouped list of every primitive to reuse (own surface + the agent-eval judge / authenticity / verification / statistics / campaign / token-usage surfaces), with each one's import path and one-line summary read live from source; the per-module pages hold the full signatures. The pinned substrate is agent-eval >=0.97.0 <1.0.0; the sandbox substrate that materializes profiles into harness shapes is @tangle-network/sandbox (peer >=0.8.0 <1.0.0). The neutral contract types (AgentProfile, AgentProfileMcpServer, HarnessType, ReasoningEffort, Part/ToolPart/ToolState, plus environment-provider types) are owned by @tangle-network/agent-interface (peer >=0.14.0 <1.0.0) — the single source of truth. Substrate primitives are re-exported through @tangle-network/agent-eval/contract (or /campaign), not local to this package — the catalog's §2 shows exactly which subpath each lives under.

./loops is the runtime barrel — package.json maps it to src/runtime/index.ts. Everything below labelled /loops is the recursive-atom + loop-kernel surface.

Read this before writing any orchestration, optimization, or measurement code in this repo. If you are about to write a persona⟷agent conversation runner, a "skill optimizer", a "profile-seam", a depth-vs-breadth A/B harness, a bootstrap loop, or a new Sandbox(...) + stream + read dance: stop. It already exists, and a parallel implementation will silently break a load-bearing invariant (equal-k, selector≠judge, capture-integrity, or eval/prod parity).

1. Mental model — the spine

A genome is an AgentProfile: systemPrompt + skills + tools + mcp + knowledge + memory + rag, one combined surface. The spine takes it through three stages:

Run: the genome is run as a driver⟷worker conversation. runPersonified composes a combinator such as loopUntil/fanout over the keystone Supervisor, spending K rounds against one persistent, journaled, resumable artifact on a conserved budget pool, so equal-compute holds by construction.
Benchmark: the run happens over a benchmark, either the ADAPTERS registry driven by runGate over the Supervisor, or an AgenticSurface driven by runBenchmark/runAgentic.
Optimize: a gated loop (selfImprove/runImprovementLoop + gepaProposer, certified by defaultProductionGate/heldOutGate/promotionGate, or the full multi-generation runStrategyEvolution) evolves the genome and certifies wins on a frozen holdout, never on the training composite.

The selector is never the judge; observation attaches to the loop via RuntimeHooks, never to the portable genome.
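The split between selecting a candidate and certifying it can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the package's gate: `Candidate`, `selectByTraining`, and `holdoutGate` are hypothetical names, and a real gate such as defaultProductionGate works from holdout scenarios rather than two precomputed numbers.

```typescript
// Hypothetical shapes for illustration; these are NOT the package's real types.
interface Candidate {
  id: string;
  trainScore: number;   // visible to the selector
  holdoutScore: number; // visible only to the gate
}

interface GateDecision {
  ship: boolean;
  delta: number;
}

// Selector: picks the challenger using training-side scores only.
function selectByTraining(candidates: Candidate[]): Candidate {
  if (candidates.length === 0) throw new Error("no candidates to select from");
  return candidates.reduce((best, c) => (c.trainScore > best.trainScore ? c : best));
}

// Judge: certifies on the frozen holdout and never reads trainScore.
function holdoutGate(incumbent: Candidate, challenger: Candidate, deltaThreshold: number): GateDecision {
  const delta = challenger.holdoutScore - incumbent.holdoutScore;
  return { ship: delta >= deltaThreshold, delta };
}

const incumbent: Candidate = { id: "gen0", trainScore: 0.5, holdoutScore: 0.5 };
const overfit: Candidate = { id: "gen1-a", trainScore: 1.0, holdoutScore: 0.5 };
const solid: Candidate = { id: "gen1-b", trainScore: 0.75, holdoutScore: 0.75 };

// The selector prefers the overfit candidate; the gate holds it anyway.
const picked = selectByTraining([overfit, solid]);
const decision = holdoutGate(incumbent, picked, 0.125);
console.log(picked.id, decision.ship);
```

Because the gate reads only holdout scores, a candidate that wins on the training composite cannot ship on that win alone.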

Two substrates implement the same recursive-atom over the one Executor port and share defaultSelectWinner. They are a deliberate pair; do not invent a third:

The reactive Supervisor/Scope + personify combinators (the agent-driver). Equal-k holds by construction via the conserved budget pool; prefer this substrate for NEW recursive/keystone work.
The round-synchronous runLoop kernel (the leaf), which is what most sandbox benches drive today.

inlineSandboxClient adapts any non-box Executor into a SandboxClient for runLoop, and settledToIteration bridges reactive Settled results into the kernel's Iteration, so the two interoperate without forking selection or metering.
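To make "equal-k by construction" concrete, here is a small sketch of a conserved budget pool. It is an illustration only: `BudgetPool` and `runShots` are hypothetical stand-ins, not the package's implementation, and the cost units are arbitrary. The point is that the shape of the run (deep or wide) cannot change the total spend, because every shot is charged against the same fixed pool.

```typescript
// Hypothetical stand-in for a conserved budget pool; not the package's implementation.
class BudgetPool {
  private readonly total: number;
  private spent = 0;

  constructor(total: number) {
    if (!Number.isFinite(total) || total < 0) {
      throw new Error("total must be a finite, non-negative number");
    }
    this.total = total;
  }

  // Charge one shot; refuse (and charge nothing) if it would overdraw the pool.
  tryCharge(cost: number): boolean {
    if (cost < 0 || this.spent + cost > this.total) return false;
    this.spent += cost;
    return true;
  }

  get spentTotal(): number {
    return this.spent;
  }
}

// Run up to `requested` shots, stopping as soon as the pool refuses a charge.
function runShots(pool: BudgetPool, requested: number, costPerShot: number): number {
  let completed = 0;
  for (let i = 0; i < requested; i++) {
    if (!pool.tryCharge(costPerShot)) break;
    completed++;
  }
  return completed;
}

// Depth asks for 5 rounds, breadth asks for 8 branches; each draws on a pool of 10 units.
const depthPool = new BudgetPool(10);
const breadthPool = new BudgetPool(10);
const depthShots = runShots(depthPool, 5, 2);
const breadthShots = runShots(breadthPool, 8, 2);
console.log(depthShots, breadthShots, depthPool.spentTotal, breadthPool.spentTotal);
```

A `Promise.all` over N direct calls has no such pool to refuse the sixth shot, which is why the decision table warns that it breaks equal-k.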

1.5 The AgentProfile law — author the profile, the substrate materializes it (WE KEEP FORGETTING THIS)

An agent IS its AgentProfile, and the profile is the WHOLE agent — not just a prompt. The surface is systemPrompt + skills + tools + mcp + subagents + hooks + permissions + memory/rag + model (the AgentProfile* family in @tangle-network/sandbox, constructed via defineAgentProfile). System prompt ≠ skills — skills are separate, invokable how-tos the agent reads when prompted to invoke them; never concatenate a skill body into the system prompt.

You change an agent's behavior by changing its PROFILE — never by writing orchestration code around it. The behaviors we keep hand-rolling are profile properties:

Self-verification is a profile lever, three ways, all configuration and zero glue code: (1) steered — the prompt says "run the tests, read failures, fix, repeat"; (2) process-defined — its instructions make verify-after-every-change its standing process; or (3) a post-finish hook that auto-runs the check and feeds failures back. The harness runs that loop. You do not write a per-round judge, a while(!done), or a bash hill-climb.
Iteration, delegation, audit-against-spec are likewise hooks / subagents / skills / process in the profile.
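The levers above are configuration, which the following sketch tries to show. Everything in it is an assumption made for illustration: `SketchProfile`, `SketchSkill`, `SketchHook`, the `postFinish` event name, and the `npm test` command are hypothetical, and the real AgentProfile shape in the substrate packages may differ.

```typescript
// Hypothetical, simplified profile shape for illustration only.
// The real AgentProfile type lives in the substrate packages and may differ.
interface SketchSkill {
  name: string;
  body: string;
}

interface SketchHook {
  event: "postFinish";
  command: string;
  feedFailuresBack: boolean;
}

interface SketchProfile {
  systemPrompt: string;
  skills: SketchSkill[];
  hooks: SketchHook[];
}

const runTestsSkill: SketchSkill = {
  name: "run-tests",
  body: "Run the test suite, read each failure, fix it, and repeat until green.",
};

// (1) Steered: the prompt itself tells the agent to verify.
const steered: SketchProfile = {
  systemPrompt: "You are a coding agent. After every change, run the tests and fix failures.",
  skills: [runTestsSkill],
  hooks: [],
};

// (3) Post-finish hook: the harness runs the check and feeds failures back.
const hooked: SketchProfile = {
  systemPrompt: "You are a coding agent.",
  skills: [runTestsSkill],
  hooks: [{ event: "postFinish", command: "npm test", feedFailuresBack: true }],
};

// Guard for the "system prompt is not skills" rule:
// returns the names of skills whose body was pasted into the prompt.
function skillsLeakedIntoPrompt(profile: SketchProfile): string[] {
  return profile.skills
    .filter((skill) => profile.systemPrompt.includes(skill.body))
    .map((skill) => skill.name);
}

console.log(skillsLeakedIntoPrompt(steered), hooked.hooks.length);
```

Neither variant contains a loop or a judge call; the difference between them is data in the profile, which is what the harness acts on.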

The sandbox substrate materializes a profile into the harness's real shapes — so author the GENERAL profile and NEVER code to a harness. @tangle-network/sandbox renders an AgentProfile into whatever the running harness needs (instructions file, tool/MCP config, mounted skills, hooks, subagents). opencode / Claude Code / Codex are interchangeable targets; opencode is only the local test substrate behind the cli-bridge. Do NOT write harness-specific config or a profile → opencode.json realizer. A lever that isn't materialized yet is a substrate gap to fill in @tangle-network/sandbox, not a bespoke realizer here.

Therefore the supervisor's only intelligence is AUTHORING full profiles — the optimizable self-improvement surface: read the task, decompose it, and for each sub-task author the complete profile (which prompt, skills, tools/MCP, hooks, subagents, model). The quality of a worker IS the quality of the profile authored for it. The harness executes; you compose.
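One way to picture "author the complete profile" is a check that compares what a sub-task needs against what the authored profile actually carries. This is a sketch under assumptions: `WorkerProfileSketch`, `SubTask`, `missingLevers`, and the `example-model` name are all hypothetical and are not part of the package.

```typescript
// Hypothetical, simplified shapes; not the package's real AgentProfile.
interface WorkerProfileSketch {
  systemPrompt: string;
  skills: string[];
  tools: string[];
  hooks: string[];
  model: string;
}

type Lever = "skills" | "tools" | "hooks";

interface SubTask {
  id: string;
  needs: Lever[];
}

// Report which levers a sub-task needs but the authored profile leaves empty.
function missingLevers(task: SubTask, profile: WorkerProfileSketch): Lever[] {
  return task.needs.filter((lever) => profile[lever].length === 0);
}

const fixAuthTest: SubTask = { id: "fix-auth-test", needs: ["skills", "tools", "hooks"] };

// A bare prompt: the worker is told what to do but given nothing to do it with.
const bare: WorkerProfileSketch = {
  systemPrompt: "Fix the failing auth test.",
  skills: [],
  tools: [],
  hooks: [],
  model: "example-model",
};

// A fully authored profile for the same sub-task.
const authored: WorkerProfileSketch = {
  systemPrompt: "Fix the failing auth test.",
  skills: ["run-tests"],
  tools: ["shell", "file-edit"],
  hooks: ["postFinish:run-tests"],
  model: "example-model",
};

console.log(missingLevers(fixAuthTest, bare), missingLevers(fixAuthTest, authored));
```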

2. Decision table — "I want to ___ → use ___ → NOT ___"

This table is judgment-only: it maps an intent to the ONE primitive to reach for and the thing NOT to build. It is not an inventory — for the full list of what exists (every export, its import path, its one-line summary) see the generated docs/api/primitive-catalog.md; for full signatures, the per-module docs/api/ pages. Each row tags its import subpath; a row is a LOCAL export of this package unless tagged with a substrate package (agent-eval/contract, agent-eval/campaign, @tangle-network/sandbox) or bench.

I want to…	Use (import)	Do NOT build
Just run a supervisor to a goal (one call, scaffolding defaulted) — START HERE	`supervise(profile, task, { budget, backend? })` — `/loops`	hand-wiring `createSupervisor().run` + `blobs`/`perWorker`/`journal`/`executors`; reaching for the lower-level run-verbs below before you need a specific counterparty
Run a genome through a topology shape over the keystone Supervisor, end-to-end	`runPersonified({ persona, shape, task, budget })` — `/loops`	a hand-rolled `createSupervisor().run` + seam-wiring helper
Loop a worker over one evolving artifact, K rounds, stop-when-good	`loopUntil(seed, spec)` as the `shape` — `/loops`	a `while(!done){runWorker();decide()}` hand-loop or "multi-attempt refine driver"
Run a worker agent under test conversing with a simulated-user persona, K rounds, worker-only metered	`runPersonaConversation({ worker, persona, backendFor, systemPromptOf })` — root `.` (also `/loops`)	a hand-rolled per-agent `dispatchWithSurface` bridge / eval-dispatch loop
Run two `AgentProfile`s head-to-head over a persistent transcript	`runConversation(...)` — root `.`	a hand-rolled two-agent turn loop
Drop a persona⟷agent conversation into an eval matrix as its dispatch	`runPersonaDispatch` → `runProfileMatrix({ dispatch })` — root `.` / `agent-eval/campaign`	a per-agent custom dispatch bridge
Best-of-N / parallel-research / map-reduce at equal compute	`fanout(items, opts)` — `/loops`	`Promise.all` over N calls + manual argmax/merge (bypasses the budget pool → breaks equal-k)
Produce-then-gate with a real checker	`verify(spec)` — `/loops`	"generate, then self-check with the same model, ship if ok" (collapses selector+judge)
Multi-judge review / rubric quorum over one artifact	`panel(spec)` — `/loops`	a judge ensemble that feeds one judge's score into another
Fixed sequential chain (plan→implement→…)	`pipeline(stages)` — `/loops`	hand-chained `await`s passing outputs along
Adaptive tree search / progressive widening	`widen(spec)` + `flatWidenGate()` — `/loops`	a best-first/MCTS that reads child scores to expand (selector=judge); keep `flatWidenGate()` until your gate is proven
Define the genome record for a personified run	`definePersona(input)` — `/loops`	a "profile-seam" / agent-config wrapper carrying model+prompt+tools+role
Make a worker self-verify / iterate / audit	a hook / process / skill on its authored `AgentProfile` — §1.5	a per-round judge, a `while(!done)` loop, or a bash hill-climb (it's a profile lever)
Run an authored profile on a real harness	author the `AgentProfile`, hand it to the sandbox substrate — `@tangle-network/sandbox` (`defineAgentProfile`)	a `profile → opencode.json` realizer or any harness-specific config writer
Have the supervisor design its workers	author a full `AgentProfile` per sub-task (prompt+skills+tools+mcp+hooks+subagents) — `/loops`	a bare `systemPrompt` string (a worker cannot use levers its profile does not carry)
Write a custom driver Agent and run it directly	`createSupervisor().run(root, task, opts)` — `/loops`	a bespoke orchestrator that spawns sub-agents and tallies cost (equal-compute claim breaks there)
Run depth-vs-breadth (or a custom strategy) over a stateful tool domain	`runAgentic({ surface, task, mode\|strategy, budget })` — `/loops`	a hand-rolled `Supervisor.run` + journal/registry, or a depth/breadth loop
Author a new topology/strategy compactly	`defineStrategy(name, body)` using `ctx.shot()`+`ctx.critique()` — `/loops`	a 70-line driver with `scope.spawn`/`scope.next` ceremony, or trusting a body-returned score
Compare strategies + get a significance report on a domain	`runBenchmark({ environment, tasks, worker, strategies })` — `/loops`	your own strategy-comparison loop / paired-bootstrap / Pareto math
Add a stateful tool-using domain	implement `AgenticSurface` (5 hooks: open/tools/call/score/close) — `/loops`	a bespoke per-benchmark agent runner / tool-loop harness
Run a sandbox coding rollout, round-synchronous (fresh box per round)	`runLoop(options)` — `/loops`	a `new Sandbox()`+acquire+stream+parse+delete loop, or a 2nd winner-selector
Run + resume ONE persistent box across turns	`openSandboxRun(client, opts, deliverable)` — `/loops`	a per-domain `new Sandbox`+`box.fs.read`+delete copy
Pick / register a leaf backend, or bring your own agent	`createExecutor({ backend })` / `createExecutorRegistry()` / implement `Executor` — `/loops`	a per-vendor adapter or closed `inline\|sandbox\|cli` switch (won't report through the `UsageEvent` channel)
Evolve a prompt/string surface	`gepaProposer({ llm, model, target })` (default inside `selfImprove`; the skill-surface twin is `skillOptProposer`, same source) — `agent-eval/campaign`	a hand-rolled prompt-mutation reflection loop with its own Pareto bookkeeping
Self-improve a profile (one pluggable verb) — START HERE	`improve(profile, findings, { surface, gate })` — root `.` (the RSI verb; defaults the generator from `surface`, wraps `selfImprove`)	a bespoke optimize loop, or calling `selfImprove`/a skill-optimizer directly for the common case
Measure one profile artifact's marginal lift (with-vs-without, score+cost) / catalog artifacts	`measureMarginalLift(...)` / `ArtifactRegistry` (`applyArtifact` is the one `ArtifactKind`→`AgentProfile`-field bridge) — `/lifecycle`	a hand-rolled with/without ablation loop, or a per-kind `if kind==='skill'…` profile-field switch
Run the whole artifact lifecycle — generate→measure→promote→store→compose, then drift-watch/dedupe the live set — over ANY profile surface (skill/prompt/tool/MCP)	`runLifecycle({ baseline, generators, evalRunner, gate })` then `composeProfile(registry, base, query)`; maintain with `driftWatch(...)` / `dedupeArtifacts(...)` — `/lifecycle`	a per-surface improve loop, a hand-rolled promote→compose step, or re-running `measureMarginalLift` without the registry/gate spine. The ONLY per-surface code is a thin `CandidateGenerator` (`skillGenerator` distills, `promptGenerator`/`buildableGenerator` for the rest)
Run the self-improvement loop with full substrate control	`selfImprove({ agent, scenarios, judge, baselineSurface })` — `agent-eval/contract`	a bespoke optimize loop or a parallel skill-optimizer
Run the gated loop with full control	`runImprovementLoop({ baselineSurface, dispatchWithSurface, driver, holdoutScenarios, gate })` — `agent-eval/contract`	your own propose→campaign→rank→re-score-on-holdout→gate→PR loop
Decide ship/hold on a candidate (campaign context)	`defaultProductionGate({ holdoutScenarios, deltaThreshold })`; compose with `heldOutGate` / `composeGate` — `agent-eval/contract`	a raw `h1>h0` point comparison on the training set
Decide ship/hold from a `BenchmarkReport` (per-task cells)	`promotionGate({ report, incumbent, candidate })` — `/loops`	comparing two strategies' mean scores directly; re-deriving the bootstrap
Run the full multi-generation strategy flywheel + certify	`runStrategyEvolution(config)` — `/loops`	a bespoke gen0→author→gen1→holdout loop with hand-rolled champion selection
Add or run a benchmark from the CLI/harness	`ADAPTERS` / `resolveAdapter(key)`, run via `bench/src/gate-cli.mts`	a per-script `switch(bench)` or a local benchmark-factory map
Wire a new benchmark	implement `BenchmarkAdapter` (5 methods) + feed to `runGate` — `bench`	a bespoke per-benchmark run script with its own (self-authored) scoring
Measure a topology on a benchmark at equal compute	`runGate(cfg)` (or `runAgentic`/`runBenchmark`) — equal-k holds via the conserved budget pool — `bench`/`/loops`	a batch-blind/batch-oracle/compare zoo, your own usage capture, or equal-k bookkeeping
Observe a run's full cost/time	`createWaterfallCollector()` → `anytimeReport()` — `/loops`	a per-step cost/token tally by inspecting events yourself (drifts from billed totals)
Attach N observers to a running loop	`composeRuntimeHooks(...)` — root export	a second event-bus or callback-prop zoo (there is ONE stream)
Ship traces to an OTLP collector	`createOtelExporter()` + `buildLoopOtelSpans()` — root export	your own OTLP serializer or pulling the OTEL SDK
Know what got mounted into a run / why a candidate won	`result.provenance.mounts` / `result.provenance.selectionReceipts` (`MountManifestEntry`/`SelectionReceipt`/`RunProvenance`); declare mounts via the `recordMount` recorder in `prepareBox` — root export	re-reading box contents to reconstruct what was mounted, or re-deriving which candidate the selector picked
State any benchmark/A-B claim	`pairedLift(...)` (bench) over `pairedBootstrap`/`heldoutSignificance` (substrate)	your own bootstrap loop/PRNG per gate; a point lift without `low/high/pairs`
Let an agent delegate ONE generic INTENT (no fixed coder/researcher type) and get the result + real spend SYNCHRONOUSLY	the `delegate` tool — `createDelegateHandler` via `createMcpServer({ delegateSupervisor })`; mount it over the `agent-runtime mcp` bin with `MCP_ENABLE_DELEGATE=1` (the bin authors a supervisor over a `sandbox` backend) — `/mcp`	a hardcoded coder/researcher profile, or task-specific `delegate_code`/`delegate_research` verbs (RETIRED) — `delegate` is the ONE delegation path and the only one with a cost channel
Run a coding task INSIDE the agent's OWN sandbox session (a sibling box, fresh branch, validated patch)	`detachedSessionDelegate({ sandboxClient \| executor, workerProfile? })` — `/mcp` (pass the worker `AgentProfile`; omit for a minimal model-only default)	a hardcoded coder profile baked into the delegate; `delegate()` (that spawns workers in a chosen backend, not the agent's own session)
Have a supervisor spawn + live-drive workers in a backend you choose and observe/steer/resume them	the coordination MCP — `createCoordinationTools` / `serveCoordinationMcp` over a live `Scope`; each worker's leaf is `createExecutor({ backend })` — `/mcp`,`/loops`	`detachedSessionDelegate` — own-sandbox-session only, one-shot, no live steer/recursion/conserved-budget
Stand up a vertical agent in the eval loop	`defineAgent(manifest)` + `createSurfaceImprovementAdapter` — `/agent`	a per-vertical manifest parser, surface-validator, or bespoke `ImprovementAdapter`
Turn intelligence/observation OFF (prove inference-only billing)	`withTangleIntelligence(agent, { effort: 'off' })` — `/intelligence`	a custom trace-wrapper or hand-rolled effort/tier config

For the full export inventory (every primitive, its import path, its summary — generated, never stale), see docs/api/primitive-catalog.md; for per-symbol signatures, the per-module docs/api/ pages. For the recursive atom (recursion · isolated-or-collaborative artifact · conserved budget · analysts) and the two-timescale architecture, see docs/architecture.md. For the genome→run→optimize→ship spine in depth, docs/concepts.md + docs/learning-flywheel.md. For the Intelligence SDK (Observe + the provable-OFF billing boundary), docs/intelligence-sdk.md.
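The "State any benchmark/A-B claim" row asks for an interval and a pair count, not a point lift. The sketch below shows the general paired-bootstrap technique that produces those fields. It is not the package's pairedLift or pairedBootstrap: `sketchPairedLift`, `LiftReport`, the seeded `mulberry32` generator, and the 95% percentile interval are assumptions chosen for illustration.

```typescript
interface LiftReport {
  lift: number;  // mean per-task delta (candidate minus baseline)
  low: number;   // 2.5th percentile of resampled means
  high: number;  // 97.5th percentile of resampled means
  pairs: number; // number of paired tasks
}

// Small seeded PRNG so the sketch is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: percentile bootstrap over per-task score deltas.
function sketchPairedLift(
  baseline: number[],
  candidate: number[],
  resamples = 2000,
  seed = 1,
): LiftReport {
  if (baseline.length === 0 || baseline.length !== candidate.length) {
    throw new Error("baseline and candidate must be non-empty and the same length");
  }
  // Pairing: each delta compares the two arms on the SAME task.
  const deltas = candidate.map((score, i) => score - baseline[i]);
  const n = deltas.length;
  const rand = mulberry32(seed);
  const means: number[] = [];
  for (let r = 0; r < resamples; r++) {
    let sum = 0;
    for (let i = 0; i < n; i++) sum += deltas[Math.floor(rand() * n)];
    means.push(sum / n);
  }
  means.sort((x, y) => x - y);
  const at = (q: number) => means[Math.min(means.length - 1, Math.floor(q * means.length))];
  const lift = deltas.reduce((s, d) => s + d, 0) / n;
  return { lift, low: at(0.025), high: at(0.975), pairs: n };
}

const report = sketchPairedLift([0.5, 0.25, 0.75, 0.5], [0.75, 0.5, 1.0, 0.75]);
console.log(report);
```

Resampling the deltas, rather than each arm separately, is what keeps the per-task pairing intact; a claim would then cite `low`, `high`, and `pairs` alongside `lift`.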

2.1 Which front door do I use? — the four public verbs, file:line-anchored

§2 maps a fine-grained intent to a primitive; this is the coarse router one level up. Pick a front door by what you hand in. Each bottoms out at ONE function (anchored to source); the §2 rows above carry each one's "do NOT build" twin. The file:line here is accurate at this commit but is not the never-stale reference — the generated docs/api/ pages are; the freshness gate only asserts these files exist.

You hand in…	Front door	Bottoms out at	What it is
a string intent ("fix the failing auth test") — you don't care HOW	the `delegate` tool	`delegate(intent, opts)` — `src/runtime/supervise/delegate.ts:88` (MCP handler `createDelegateHandler`, `src/mcp/tools/delegate.ts:139`)	a default authoring supervisor decomposes the intent and writes the worker profile per sub-task; synchronous, returns the delivered output + `spentTotal`. The ONE delegation path.
an authored supervisor `AgentProfile` + a task	`supervise(profile, task, opts)`	`src/runtime/supervise/supervise.ts:102`	the one-call LLM-brain driver over the keystone `Supervisor`, scaffolding defaulted. START HERE when you wrote the driver.
a deterministic shot grammar over a stateful tool domain	`runAgentic(opts)`	`src/runtime/strategy.ts:1030`	runs a `Strategy` (depth/breadth/custom) through the `Supervisor` — programmatic, no LLM picking the shape.
a deterministic topology combinator (`loopUntil`/`fanout`/`verify`/`panel`/`pipeline`) over a persona	`runPersonified(options)`	`src/runtime/personify/persona.ts:131`	composes a persona + a `CombinatorShape` over the `Supervisor` — programmatic.

Rule of thumb: delegate = "I don't care how"; supervise = "I authored the driver"; runAgentic/runPersonified = "I want a deterministic topology, no LLM choosing the shape." All four run over the one Executor port on the conserved budget pool, so equal-compute holds by construction.
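The rule of thumb reads naturally as a switch over what the caller hands in. The sketch below only encodes the routing from the table; `HandIn`, `FrontDoor`, and `pickFrontDoor` are hypothetical names, and the function returns the name of the front door rather than calling it.

```typescript
// Hypothetical discriminated union describing what the caller hands in.
type HandIn =
  | { kind: "intent"; intent: string }
  | { kind: "authoredSupervisor"; task: string }
  | { kind: "shotGrammar"; domain: string }
  | { kind: "combinatorShape"; shape: "loopUntil" | "fanout" | "verify" | "panel" | "pipeline" };

type FrontDoor = "delegate" | "supervise" | "runAgentic" | "runPersonified";

// Map each kind of hand-in to the one front door the table names for it.
function pickFrontDoor(input: HandIn): FrontDoor {
  switch (input.kind) {
    case "intent":
      return "delegate"; // "I don't care how"
    case "authoredSupervisor":
      return "supervise"; // "I authored the driver"
    case "shotGrammar":
      return "runAgentic"; // deterministic strategy over a tool domain
    case "combinatorShape":
      return "runPersonified"; // deterministic topology over a persona
  }
}

console.log(pickFrontDoor({ kind: "intent", intent: "fix the failing auth test" }));
```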

Two-agent patterns — compose a shape, don't hand-roll a turn loop:

Pattern	Use	Bottoms out at
researcher → engineer (gather, then build)	`defineStrategy(name, body)` — both agents in one body via `ctx.shot()` + `ctx.critique()`	`src/runtime/strategy.ts:789`
implement → verify (build, then a SEPARATE checker gates it — selector ≠ judge)	`verify(spec)` as the `shape`	`src/runtime/personify/combinators.ts:333`
N-judge panel (fan judges out, merge verdicts)	`panel(spec)` as the `shape`	`src/runtime/personify/combinators.ts:273`
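The N-judge panel pattern can be illustrated with a small quorum merge. This is a sketch of the idea only, not the package's panel combinator: `JudgeSketch`, `PanelResult`, `panelSketch`, and the three toy judges are hypothetical. The property it tries to show is that each judge sees only the artifact, never another judge's verdict.

```typescript
interface JudgeSketch {
  name: string;
  review: (artifact: string) => boolean;
}

interface PanelResult {
  pass: boolean;
  verdicts: Array<{ judge: string; pass: boolean }>;
}

// Hypothetical quorum merge: fan the artifact out, then count independent passes.
function panelSketch(artifact: string, judges: JudgeSketch[], quorum: number): PanelResult {
  if (quorum < 1 || quorum > judges.length) {
    throw new Error("quorum must be between 1 and the number of judges");
  }
  // Each judge receives the artifact alone; no verdict is fed into another judge.
  const verdicts = judges.map((judge) => ({ judge: judge.name, pass: judge.review(artifact) }));
  const passes = verdicts.filter((v) => v.pass).length;
  return { pass: passes >= quorum, verdicts };
}

// Toy judges standing in for real rubric checks.
const judges: JudgeSketch[] = [
  { name: "has-tests", review: (a) => a.includes("test(") },
  { name: "is-short", review: (a) => a.length < 200 },
  { name: "no-todo", review: (a) => !a.includes("TODO") },
];

const draft = "function add(a, b) { return a + b } // TODO: cover edge cases";
const finished = 'test("adds", () => add(1, 2) === 3);';

const draftResult = panelSketch(draft, judges, 2);
const finishedResult = panelSketch(finished, judges, 2);
console.log(draftResult.pass, finishedResult.pass);
```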
