@tangle-network/agent-runtime — Canonical API Reference

Version 0.79.3. The export inventory + per-symbol signatures live in the generated docs/api/ reference: docs/api/primitive-catalog.md is the never-stale, grouped list of every primitive to reuse (own surface + the agent-eval judge / authenticity / verification / statistics / campaign / token-usage surfaces), with each one's import path and one-line summary read live from source; the per-module pages hold the full signatures. The pinned substrate is agent-eval >=0.97.0 <1.0.0; the sandbox substrate that materializes profiles into harness shapes is @tangle-network/sandbox (peer >=0.8.0 <1.0.0). The neutral contract types (AgentProfile, AgentProfileMcpServer, HarnessType, ReasoningEffort, Part/ToolPart/ToolState, plus environment-provider types) are owned by @tangle-network/agent-interface (peer >=0.14.0 <1.0.0) — the single source of truth. Substrate primitives are re-exported through @tangle-network/agent-eval/contract (or /campaign), not local to this package — the catalog's §2 shows exactly which subpath each lives under.

./loops is the runtime barrel: package.json maps it to src/runtime/index.ts. Everything below labelled /loops is the recursive-atom + loop-kernel surface.

Read this before writing any orchestration, optimization, or measurement code in this repo. If you are about to write a persona⟷agent conversation runner, a "skill optimizer", a "profile-seam", a depth-vs-breadth A/B harness, a bootstrap loop, or a new Sandbox(...) + stream + read dance — stop, it already exists, and a parallel will silently break a load-bearing invariant (equal-k, selector≠judge, capture-integrity, or eval/prod parity).

1. Mental model — the spine

A genome (an AgentProfile: systemPrompt + skills + tools + mcp + knowledge + memory + rag — one combined surface) is run as a driver⟷worker conversation (runPersonified composing a combinator like loopUntil/fanout over the keystone Supervisor — K rounds spent against one persistent, journaled, resumable artifact on a conserved budget pool so equal-compute holds by construction) over a benchmark (the ADAPTERS registry driven by runGate over the Supervisor, or an AgenticSurface driven by runBenchmark/runAgentic), then optimized by a gated loop (selfImprove/runImprovementLoop + gepaProposer, certified by defaultProductionGate/heldOutGate/promotionGate, or the full multi-generation runStrategyEvolution) that evolves the genome and certifies wins on a frozen holdout — never on the training composite. The selector is never the judge; observation attaches to the loop via RuntimeHooks, never to the portable genome.
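The "selector is never the judge" rule can be shown without importing anything from the runtime. The sketch below is a stand-alone illustration, not this package's verify or defaultSelectWinner: Candidate, Checker, selectWinner, and gate are hypothetical names introduced here. The selector ranks on a signal it is allowed to read; an independent checker then decides ship/hold on the winner alone, and its verdict never feeds back into selection.

```typescript
// Hypothetical types for illustration only; not the package's real API.
interface Candidate {
  id: string;
  output: string;
  selectorScore: number; // the only signal the selector may read
}

// Independent judge, e.g. a test run. Returns true when the output passes.
type Checker = (output: string) => boolean;

// Selector: argmax over selectorScore. It never calls the checker.
function selectWinner(candidates: Candidate[]): Candidate {
  if (candidates.length === 0) throw new Error("no candidates");
  return candidates.reduce((best, c) => (c.selectorScore > best.selectorScore ? c : best));
}

// Gate: judges only the already-selected winner; it does not re-rank.
function gate(winner: Candidate, check: Checker): { id: string; ship: boolean } {
  return { id: winner.id, ship: check(winner.output) };
}

const candidates: Candidate[] = [
  { id: "a", output: "return 41", selectorScore: 0.9 },
  { id: "b", output: "return 42", selectorScore: 0.7 },
];
const passes: Checker = (out) => out.includes("42");

// The selector's favourite ("a") is held because the separate judge rejects it.
const verdict = gate(selectWinner(candidates), passes);
console.log(verdict);
```

If the selector were allowed to read the checker's result, it would pick "b" and the gate would stop being an independent check, which is the collapse this reference warns against.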

Two substrates implement the same recursive-atom over the one Executor port and share defaultSelectWinner — a deliberate pair, do not invent a third: the reactive Supervisor/Scope + personify combinators (the agent-driver; equal-k by construction via the conserved budget pool — prefer for NEW recursive/keystone work) and the round-synchronous runLoop kernel (the leaf; what most sandbox benches drive today). inlineSandboxClient adapts any non-box Executor into a SandboxClient for runLoop, and settledToIteration bridges reactive Settleds into the kernel's Iteration, so the two interoperate without forking selection or metering.
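To make "equal-k by construction via the conserved budget pool" concrete, here is a minimal stand-alone sketch. It assumes nothing about the real Supervisor or runLoop internals: BudgetPool, tryReserve, and runShots are hypothetical names, and the unit of budget is arbitrary. The point it illustrates is that every shot draws from one shared pool, so a deep topology and a broad one cannot spend different totals.

```typescript
// Hypothetical illustration; BudgetPool is not an export of this package.
class BudgetPool {
  private remaining: number;

  constructor(total: number) {
    if (total < 0) throw new Error("budget must be non-negative");
    this.remaining = total;
  }

  // Reserve `units` for one shot; returns false when the pool cannot cover it.
  tryReserve(units: number): boolean {
    if (units > this.remaining) return false;
    this.remaining -= units;
    return true;
  }

  get left(): number {
    return this.remaining;
  }
}

// Keep taking shots until the pool refuses or maxShots is reached.
function runShots(pool: BudgetPool, costPerShot: number, maxShots: number): number {
  let shots = 0;
  while (shots < maxShots && pool.tryReserve(costPerShot)) shots++;
  return shots;
}

// Depth (few expensive shots) and breadth (many cheap shots) get the same total of 12.
const depthShots = runShots(new BudgetPool(12), 4, 100);   // 3 shots of cost 4
const breadthShots = runShots(new BudgetPool(12), 1, 100); // 12 shots of cost 1

// Children of one parent share the pool: the third request is refused, not overdrawn.
const shared = new BudgetPool(12);
const granted = [5, 5, 5].map((ask) => shared.tryReserve(ask));
console.log(depthShots, breadthShots, granted, shared.left);
```

A Promise.all over N independent calls has no such shared counter, which is why the decision table treats it as breaking equal-k.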

1.5 The AgentProfile law — author the profile, the substrate materializes it (WE KEEP FORGETTING THIS)

An agent IS its AgentProfile, and the profile is the WHOLE agent — not just a prompt. The surface is systemPrompt + skills + tools + mcp + subagents + hooks + permissions + memory/rag + model (the AgentProfile* family in @tangle-network/sandbox, constructed via defineAgentProfile). System prompt ≠ skills — skills are separate, invokable how-tos the agent reads when prompted to invoke them; never concatenate a skill body into the system prompt.

You change an agent's behavior by changing its PROFILE — never by writing orchestration code around it. The behaviors we keep hand-rolling are profile properties:

  • Self-verification is a profile lever, three ways, all configuration and zero glue code: (1) steered — the prompt says "run the tests, read failures, fix, repeat"; (2) process-defined — its instructions make verify-after-every-change its standing process; or (3) a post-finish hook that auto-runs the check and feeds failures back. The harness runs that loop. You do not write a per-round judge, a while(!done), or a bash hill-climb.
  • Iteration, delegation, audit-against-spec are likewise hooks / subagents / skills / process in the profile.
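The post-finish hook variant of self-verification can be sketched as plain data plus a stand-in harness. Everything here is an assumption made for illustration: ProfileSketch is not the real AgentProfile type, the hooks.postFinish field and its command/maxRetries keys are invented, and harnessRun only stands in for the loop the real harness owns. What the sketch shows is the division of labour: the caller authors configuration, and the retry loop lives in the harness rather than in orchestration code.

```typescript
// Hypothetical shape for illustration; the real AgentProfile type may differ.
interface ProfileSketch {
  systemPrompt: string;
  skills: string[]; // separate how-tos, never concatenated into the prompt
  hooks?: { postFinish?: { command: string; maxRetries: number } };
}

interface RunResult {
  artifact: string;
  attempts: number;
  verified: boolean;
}

// Stand-in for the harness: it, not the caller, owns the verify-and-retry loop.
function harnessRun(
  profile: ProfileSketch,
  attempt: (feedback: string | null) => string,
  runCommand: (command: string, artifact: string) => string | null, // null means pass
): RunResult {
  const hook = profile.hooks?.postFinish;
  const limit = hook ? hook.maxRetries + 1 : 1;
  let feedback: string | null = null;
  let artifact = "";
  for (let n = 1; n <= limit; n++) {
    artifact = attempt(feedback);
    if (!hook) return { artifact, attempts: n, verified: false }; // nothing to check with
    feedback = runCommand(hook.command, artifact);
    if (feedback === null) return { artifact, attempts: n, verified: true };
  }
  return { artifact, attempts: limit, verified: false };
}

// The caller only authors the profile; the check is configuration, not glue code.
const profile: ProfileSketch = {
  systemPrompt: "Fix the failing test.",
  skills: ["run-tests"],
  hooks: { postFinish: { command: "npm test", maxRetries: 2 } },
};

// Toy worker and toy check: the first attempt fails, the second passes.
const result = harnessRun(
  profile,
  (feedback) => (feedback === null ? "v1" : "v2"),
  (_command, artifact) => (artifact === "v2" ? null : "1 test failed"),
);
console.log(result);
```

Removing the hooks entry from the profile changes the behaviour to a single unverified attempt without touching any calling code, which is the sense in which verification is a profile lever.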

The sandbox substrate materializes a profile into the harness's real shapes — so author the GENERAL profile and NEVER code to a harness. @tangle-network/sandbox renders an AgentProfile into whatever the running harness needs (instructions file, tool/MCP config, mounted skills, hooks, subagents). opencode / Claude Code / Codex are interchangeable targets; opencode is only the local test substrate behind the cli-bridge. Do NOT write harness-specific config or a profile → opencode.json realizer. A lever that isn't materialized yet is a substrate gap to fill in @tangle-network/sandbox, not a bespoke realizer here.

Therefore the supervisor's only intelligence is AUTHORING full profiles — the optimizable self-improvement surface: read the task, decompose it, and for each sub-task author the complete profile (which prompt, skills, tools/MCP, hooks, subagents, model). The quality of a worker IS the quality of the profile authored for it. The harness executes; you compose.

2. Decision table — "I want to ___ → use ___ → NOT ___"

This table is judgment-only: it maps an intent to the ONE primitive to reach for and the thing NOT to build. It is not an inventory — for the full list of what exists (every export, its import path, its one-line summary) see the generated docs/api/primitive-catalog.md; for full signatures, the per-module docs/api/ pages. Each row tags its import subpath; a row is a LOCAL export of this package unless tagged with a substrate package (agent-eval/contract, agent-eval/campaign, @tangle-network/sandbox) or bench.

| I want to… | Use (import) | Do NOT build |
| --- | --- | --- |
| Just run a supervisor to a goal (one call, scaffolding defaulted). START HERE | supervise(profile, task, { budget, backend? }) · /loops | hand-wiring createSupervisor().run + blobs/perWorker/journal/executors; reaching for the lower-level run-verbs below before you need a specific counterparty |
| Run a genome through a topology shape over the keystone Supervisor, end-to-end | runPersonified({ persona, shape, task, budget }) · /loops | a hand-rolled createSupervisor().run + seam-wiring helper |
| Loop a worker over one evolving artifact, K rounds, stop-when-good | loopUntil(seed, spec) as the shape · /loops | a while(!done){runWorker();decide()} hand-loop or "multi-attempt refine driver" |
| Run a worker agent under test conversing with a simulated-user persona, K rounds, worker-only metered | runPersonaConversation({ worker, persona, backendFor, systemPromptOf }) · root . (also /loops) | a hand-rolled per-agent dispatchWithSurface bridge / eval-dispatch loop |
| Run two AgentProfiles head-to-head over a persistent transcript | runConversation(...) · root . | a hand-rolled two-agent turn loop |
| Drop a persona⟷agent conversation into an eval matrix as its dispatch | runPersonaDispatch → runProfileMatrix({ dispatch }) · root . / agent-eval/campaign | a per-agent custom dispatch bridge |
| Best-of-N / parallel-research / map-reduce at equal compute | fanout(items, opts) · /loops | Promise.all over N calls + manual argmax/merge (bypasses the budget pool → breaks equal-k) |
| Produce-then-gate with a real checker | verify(spec) · /loops | "generate, then self-check with the same model, ship if ok" (collapses selector+judge) |
| Multi-judge review / rubric quorum over one artifact | panel(spec) · /loops | a judge ensemble that feeds one judge's score into another |
| Fixed sequential chain (plan→implement→…) | pipeline(stages) · /loops | hand-chained awaits passing outputs along |
| Adaptive tree search / progressive widening | widen(spec) + flatWidenGate() · /loops | a best-first/MCTS that reads child scores to expand (selector=judge); keep flatWidenGate() until your gate is proven |
| Define the genome record for a personified run | definePersona(input) · /loops | a "profile-seam" / agent-config wrapper carrying model+prompt+tools+role |
| Make a worker self-verify / iterate / audit | a hook / process / skill on its authored AgentProfile · §1.5 | a per-round judge, a while(!done) loop, or a bash hill-climb (it's a profile lever) |
| Run an authored profile on a real harness | author the AgentProfile, hand it to the sandbox substrate · @tangle-network/sandbox (defineAgentProfile) | a profile → opencode.json realizer or any harness-specific config writer |
| Have the supervisor design its workers | author a full AgentProfile per sub-task (prompt+skills+tools+mcp+hooks+subagents) · /loops | a bare systemPrompt string (a worker can't act on levers it was never given) |
| Write a custom driver Agent and run it directly | createSupervisor().run(root, task, opts) · /loops | a bespoke orchestrator that spawns sub-agents and tallies cost (equal-compute claim breaks there) |
| Run depth-vs-breadth (or a custom strategy) over a stateful tool domain | runAgentic({ surface, task, mode\|strategy, budget }) · /loops | a hand-rolled Supervisor.run + journal/registry, or a depth/breadth loop |
| Author a new topology/strategy compactly | defineStrategy(name, body) using ctx.shot()+ctx.critique() · /loops | a 70-line driver with scope.spawn/scope.next ceremony, or trusting a body-returned score |
| Compare strategies + get a significance report on a domain | runBenchmark({ environment, tasks, worker, strategies }) · /loops | your own strategy-comparison loop / paired-bootstrap / Pareto math |
| Add a stateful tool-using domain | implement AgenticSurface (5 hooks: open/tools/call/score/close) · /loops | a bespoke per-benchmark agent runner / tool-loop harness |
| Run a sandbox coding rollout, round-synchronous (fresh box per round) | runLoop(options) · /loops | a new Sandbox()+acquire+stream+parse+delete loop, or a 2nd winner-selector |
| Run + resume ONE persistent box across turns | openSandboxRun(client, opts, deliverable) · /loops | a per-domain new Sandbox+box.fs.read+delete copy |
| Pick / register a leaf backend, or bring your own agent | createExecutor({ backend }) / createExecutorRegistry() / implement Executor · /loops | a per-vendor adapter or closed inline\|sandbox\|cli switch (won't report through the UsageEvent channel) |
| Evolve a prompt/string surface | gepaProposer({ llm, model, target }) (default inside selfImprove; the skill-surface twin is skillOptProposer, same source) · agent-eval/campaign | a hand-rolled prompt-mutation reflection loop with its own Pareto bookkeeping |
| Self-improve a profile (one pluggable verb). START HERE | improve(profile, findings, { surface, gate }) · root . (the RSI verb; defaults the generator from surface, wraps selfImprove) | a bespoke optimize loop, or calling selfImprove/a skill-optimizer directly for the common case |
| Measure one profile artifact's marginal lift (with-vs-without, score+cost) / catalog artifacts | measureMarginalLift(...) / ArtifactRegistry (applyArtifact is the one ArtifactKind→AgentProfile-field bridge) · /lifecycle | a hand-rolled with/without ablation loop, or a per-kind if kind==='skill'… profile-field switch |
| Run the whole artifact lifecycle (generate→measure→promote→store→compose, then drift-watch/dedupe the live set) over ANY profile surface (skill/prompt/tool/MCP) | runLifecycle({ baseline, generators, evalRunner, gate }) then composeProfile(registry, base, query); maintain with driftWatch(...) / dedupeArtifacts(...) · /lifecycle | a per-surface improve loop, a hand-rolled promote→compose step, or re-running measureMarginalLift without the registry/gate spine. The ONLY per-surface code is a thin CandidateGenerator (skillGenerator distills, promptGenerator/buildableGenerator for the rest) |
| Run the self-improvement loop with full substrate control | selfImprove({ agent, scenarios, judge, baselineSurface }) · agent-eval/contract | a bespoke optimize loop or a parallel skill-optimizer |
| Run the gated loop with full control | runImprovementLoop({ baselineSurface, dispatchWithSurface, driver, holdoutScenarios, gate }) · agent-eval/contract | your own propose→campaign→rank→re-score-on-holdout→gate→PR loop |
| Decide ship/hold on a candidate (campaign context) | defaultProductionGate({ holdoutScenarios, deltaThreshold }); compose with heldOutGate / composeGate · agent-eval/contract | a raw h1>h0 point comparison on the training set |
| Decide ship/hold from a BenchmarkReport (per-task cells) | promotionGate({ report, incumbent, candidate }) · /loops | comparing two strategies' mean scores directly; re-deriving the bootstrap |
| Run the full multi-generation strategy flywheel + certify | runStrategyEvolution(config) · /loops | a bespoke gen0→author→gen1→holdout loop with hand-rolled champion selection |
| Add or run a benchmark from the CLI/harness | ADAPTERS / resolveAdapter(key), run via bench/src/gate-cli.mts | a per-script switch(bench) or a local benchmark-factory map |
| Wire a new benchmark | implement BenchmarkAdapter (5 methods) + feed to runGate · bench | a bespoke per-benchmark run script with its own (self-authored) scoring |
| Measure a topology on a benchmark at equal compute | runGate(cfg) (or runAgentic/runBenchmark); equal-k holds via the conserved budget pool · bench / /loops | a batch-blind/batch-oracle/compare zoo, your own usage capture, or equal-k bookkeeping |
| Observe a run's full cost/time | createWaterfallCollector() → anytimeReport() · /loops | a per-step cost/token tally by inspecting events yourself (drifts from billed totals) |
| Attach N observers to a running loop | composeRuntimeHooks(...) · root export | a second event-bus or callback-prop zoo (there is ONE stream) |
| Ship traces to an OTLP collector | createOtelExporter() + buildLoopOtelSpans() · root export | your own OTLP serializer or pulling the OTEL SDK |
| Know what got mounted into a run / why a candidate won | result.provenance.mounts / result.provenance.selectionReceipts (MountManifestEntry/SelectionReceipt/RunProvenance); declare mounts via the recordMount recorder in prepareBox · root export | re-reading box contents to reconstruct what was mounted, or re-deriving which candidate the selector picked |
| State any benchmark/A-B claim | pairedLift(...) (bench) over pairedBootstrap/heldoutSignificance (substrate) | your own bootstrap loop/PRNG per gate; a point lift without low/high/pairs |
| Let an agent delegate ONE generic INTENT (no fixed coder/researcher type) and get the result + real spend SYNCHRONOUSLY | the delegate tool: createDelegateHandler via createMcpServer({ delegateSupervisor }); mount it over the agent-runtime mcp bin with MCP_ENABLE_DELEGATE=1 (the bin authors a supervisor over a sandbox backend) · /mcp | a hardcoded coder/researcher profile, or task-specific delegate_code/delegate_research verbs (RETIRED); delegate is the ONE delegation path and the only one with a cost channel |
| Run a coding task INSIDE the agent's OWN sandbox session (a sibling box, fresh branch, validated patch) | detachedSessionDelegate({ sandboxClient \| executor, workerProfile? }) · /mcp (pass the worker AgentProfile; omit for a minimal model-only default) | a hardcoded coder profile baked into the delegate; delegate() (that spawns workers in a chosen backend, not the agent's own session) |
| Have a supervisor spawn + live-drive workers in a backend you choose and observe/steer/resume them | the coordination MCP: createCoordinationTools / serveCoordinationMcp over a live Scope; each worker's leaf is createExecutor({ backend }) · /mcp, /loops | detachedSessionDelegate (own-sandbox-session only, one-shot, no live steer/recursion/conserved-budget) |
| Stand up a vertical agent in the eval loop | defineAgent(manifest) + createSurfaceImprovementAdapter · /agent | a per-vertical manifest parser, surface-validator, or bespoke ImprovementAdapter |
| Turn intelligence/observation OFF (prove inference-only billing) | withTangleIntelligence(agent, { effort: 'off' }) · /intelligence | a custom trace-wrapper or hand-rolled effort/tier config |

For the full export inventory (every primitive, its import path, its summary — generated, never stale), see docs/api/primitive-catalog.md; for per-symbol signatures, the per-module docs/api/ pages. For the recursive atom (recursion · isolated-or-collaborative artifact · conserved budget · analysts) and the two-timescale architecture, see docs/architecture.md. For the genome→run→optimize→ship spine in depth, docs/concepts.md + docs/learning-flywheel.md. For the Intelligence SDK (Observe + the provable-OFF billing boundary), docs/intelligence-sdk.md.
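The table's warning against "a point lift without low/high/pairs" is easier to see with numbers. The sketch below is a generic percentile paired bootstrap written from scratch for illustration; it is not the package's pairedLift, pairedBootstrap, or heldoutSignificance, whose signatures are documented in docs/api/. The names mulberry32, pairedBootstrapLift, LiftInterval, and shipOnHoldout are hypothetical, and the scores are made-up sample data.

```typescript
// Small seeded PRNG (mulberry32) so the resampling is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface LiftInterval {
  mean: number;  // mean per-task difference (candidate minus baseline)
  low: number;   // 2.5th percentile of resampled means
  high: number;  // 97.5th percentile of resampled means
  pairs: number; // number of paired tasks
}

// Percentile bootstrap over per-task differences; tasks are resampled as pairs.
function pairedBootstrapLift(
  baseline: number[],
  candidate: number[],
  resamples = 2000,
  seed = 1,
): LiftInterval {
  if (baseline.length !== candidate.length || baseline.length === 0) {
    throw new Error("need equal-length, non-empty paired scores");
  }
  const diffs = baseline.map((b, i) => candidate[i] - b);
  const n = diffs.length;
  const rand = mulberry32(seed);
  const means: number[] = [];
  for (let r = 0; r < resamples; r++) {
    let sum = 0;
    for (let i = 0; i < n; i++) sum += diffs[Math.floor(rand() * n)];
    means.push(sum / n);
  }
  means.sort((x, y) => x - y);
  // Index into the sorted means, capped at the last element.
  const at = (q: number) => means[Math.min(resamples - 1, Math.floor(q * resamples))];
  const mean = diffs.reduce((s, x) => s + x, 0) / n;
  return { mean, low: at(0.025), high: at(0.975), pairs: n };
}

// Ship only when the whole interval clears the threshold on held-out tasks.
function shipOnHoldout(lift: LiftInterval, deltaThreshold: number): boolean {
  return lift.low > deltaThreshold;
}

// Made-up held-out scores for six paired tasks.
const holdoutBaseline = [0.5, 0.6, 0.4, 0.7, 0.55, 0.65];
const holdoutCandidate = [0.6, 0.72, 0.5, 0.78, 0.66, 0.7];
const lift = pairedBootstrapLift(holdoutBaseline, holdoutCandidate);
console.log(lift, shipOnHoldout(lift, 0));
```

Reporting low, high, and pairs alongside the mean is what lets a reader tell a consistent per-task gain from a lucky average; gating on the lower bound rather than the mean is one conservative choice among several.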

2.1 Which front door do I use? — the four public verbs, file:line-anchored

§2 maps a fine-grained intent to a primitive; this is the coarse router one level up. Pick a front door by what you hand in. Each bottoms out at ONE function (anchored to source); the §2 rows above carry each one's "do NOT build" twin. The file:line here is accurate at this commit but is not the never-stale reference — the generated docs/api/ pages are; the freshness gate only asserts these files exist.

| You hand in… | Front door | Bottoms out at | What it is |
| --- | --- | --- | --- |
| a string intent ("fix the failing auth test"); you don't care HOW | the delegate tool | delegate(intent, opts) · src/runtime/supervise/delegate.ts:88 (MCP handler createDelegateHandler, src/mcp/tools/delegate.ts:139) | a default authoring supervisor decomposes the intent and writes the worker profile per sub-task; synchronous, returns the delivered output + spentTotal. The ONE delegation path. |
| an authored supervisor AgentProfile + a task | supervise(profile, task, opts) | src/runtime/supervise/supervise.ts:102 | the one-call LLM-brain driver over the keystone Supervisor, scaffolding defaulted. START HERE when you wrote the driver. |
| a deterministic shot grammar over a stateful tool domain | runAgentic(opts) | src/runtime/strategy.ts:1030 | runs a Strategy (depth/breadth/custom) through the Supervisor; programmatic, no LLM picking the shape. |
| a deterministic topology combinator (loopUntil/fanout/verify/panel/pipeline) over a persona | runPersonified(options) | src/runtime/personify/persona.ts:131 | composes a persona + a CombinatorShape over the Supervisor; programmatic. |

Rule of thumb: delegate = "I don't care how"; supervise = "I authored the driver"; runAgentic/runPersonified = "I want a deterministic topology, no LLM choosing the shape." All four run over the one Executor port on the conserved budget pool, so equal-compute holds by construction.

Two-agent patterns — compose a shape, don't hand-roll a turn loop:

| Pattern | Use | Bottoms out at |
| --- | --- | --- |
| researcher → engineer (gather, then build) | defineStrategy(name, body), with both agents in one body via ctx.shot() + ctx.critique() | src/runtime/strategy.ts:789 |
| implement → verify (build, then a SEPARATE checker gates it; selector ≠ judge) | verify(spec) as the shape | src/runtime/personify/combinators.ts:333 |
| N-judge panel (fan judges out, merge verdicts) | panel(spec) as the shape | src/runtime/personify/combinators.ts:273 |