feat(0.7.0): clean substrate (error taxonomy, canonical RuntimeRun, cost ledger, biome, strict indices) #10
Merged
tangletools merged 1 commit on May 14, 2026
Take ownership of the production-run lifecycle so consumer agents (legal, tax,
gtm, creative, agent-builder) stop reinventing `agentRuns`-row plumbing. Split
the single-file runtime into eight focused modules plus two new ones, adopt agent-eval 0.24.0
error taxonomy, add a canonical `RuntimeRunHandle` (cost ledger + persistence
adapter), and bring DX in line with agent-eval (biome, CI, strict indices,
JSDoc stability tags).
Module split:
- src/types.ts — task/session/adapter/stream-event types
- src/errors.ts — re-export agent-eval typed errors + 3 runtime-only ones
- src/readiness.ts — `decideKnowledgeReadiness` + minimum-score validation
- src/sessions.ts — InMemoryRuntimeSessionStore + helpers
- src/sanitize.ts — runtime/stream event sanitization + telemetry collectors
- src/sse.ts — server-sent event encoding
- src/backends.ts — backend factories + stream normalization
- src/run.ts — runAgentTask + runAgentTaskStream
- src/runtime-run.ts — `startRuntimeRun` lifecycle + cost ledger (NEW)
- src/trace-bridge.ts — `createTraceBridge` mapping RuntimeStreamEvent -> TraceEvent (NEW)
Error taxonomy (replaces bespoke `throw new Error(...)` at user-facing sites):
- ValidationError, ConfigError, NotFoundError, CaptureIntegrityError,
JudgeError, ReplayError, VerificationError — re-exported from agent-eval
- SessionMismatchError — runtime-specific (wrong-backend resume)
- BackendTransportError — runtime-specific (HTTP/IPC non-success)
- RuntimeRunStateError — runtime-specific (lifecycle methods called out of order)
`RuntimeRunHandle` (canonical production-run lifecycle):
- startRuntimeRun({ workspaceId, sessionId, taskSpec, scenarioId, adapter })
- handle.observe(event) — llm_call events accumulate the cost ledger
- handle.cost() — { tokensIn, tokensOut, costUsd, wallMs, llmCalls }
- handle.complete({ status, resultSummary, cost?, error?, metadata? })
- handle.persist(metadata?) — writes `RuntimeRunRow` via adapter.upsert(row)
- handle.toRow(metadata?) — dry-run for tests + adapter rehearsal
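A minimal sketch of the lifecycle contract listed above, under stated assumptions: `startRuntimeRunSketch`, the `RuntimeRunRow` fields, and the counter-based run IDs are hypothetical stand-ins, not the package's real types, and `taskSpec`/`scenarioId` are omitted for brevity. It shows the `running -> completed | failed | cancelled` state machine, a repeated `complete()` with the same status acting as a no-op, a conflicting status raising `RuntimeRunStateError`, `toRow()` as a dry run, and `persist()` delegating to a single `adapter.upsert(row)`.

```typescript
// Sketch: simplified stand-ins for the RuntimeRunHandle lifecycle.
// Type names, row fields, and run-id generation are assumptions, not the package's API.

type RunStatus = 'running' | 'completed' | 'failed' | 'cancelled'
type TerminalStatus = Exclude<RunStatus, 'running'>

interface RuntimeRunRow {
  runId: string
  workspaceId: string
  sessionId: string
  status: RunStatus
  resultSummary?: string
  error?: string
  metadata?: Record<string, unknown>
}

interface RuntimeRunAdapter {
  // One write method; the same shape could back D1, Postgres, or KV.
  upsert(row: RuntimeRunRow): void | Promise<void>
}

class RuntimeRunStateError extends Error {
  constructor(message: string) {
    super(message)
    this.name = 'RuntimeRunStateError'
  }
}

let runCounter = 0 // hypothetical id source; a real implementation would likely use UUIDs

function startRuntimeRunSketch(opts: {
  workspaceId: string
  sessionId: string
  adapter: RuntimeRunAdapter
}) {
  runCounter += 1
  const runId = `run-${runCounter}`
  let status: RunStatus = 'running'
  let outcome: { resultSummary?: string; error?: string } = {}

  // Dry run: builds the row persist() would write, without writing it.
  function toRow(metadata?: Record<string, unknown>): RuntimeRunRow {
    return {
      runId,
      workspaceId: opts.workspaceId,
      sessionId: opts.sessionId,
      status,
      ...outcome,
      metadata,
    }
  }

  return {
    runId,
    complete(input: { status: TerminalStatus; resultSummary?: string; error?: string }): void {
      // Same terminal status again is a no-op, so retry/cleanup paths don't double-fire.
      if (status === input.status) return
      if (status !== 'running') {
        const message = `run ${runId} is already ${status}; cannot mark it ${input.status}`
        throw new RuntimeRunStateError(message)
      }
      status = input.status
      outcome = { resultSummary: input.resultSummary, error: input.error }
    },
    toRow,
    persist(metadata?: Record<string, unknown>): Promise<void> {
      // upsert() is called synchronously here; the Promise also covers async adapters.
      return Promise.resolve(opts.adapter.upsert(toRow(metadata)))
    },
  }
}
```

Treating a same-status `complete()` as a no-op is what lets a `finally` cleanup path call it again after the happy path without tripping the state check, while a genuinely conflicting transition still fails loudly.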
`RuntimeStreamEvent` gains an `llm_call` variant so the cost ledger has a
canonical input. Existing backends ignore unrecognized event types; new
backends emit `llm_call` per model call.
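A sketch of how an `observe()` method can fold `llm_call` events into the ledger while ignoring every other variant. Only the `llm_call` fields (`model`, `tokensIn`, `tokensOut`, `costUsd`, `latencyMs`, `finishReason`) come from this PR; `createCostLedger`, the other event variants, and computing `wallMs` as elapsed time since the ledger was created are assumptions for illustration. The clock is injectable so the totals are deterministic in tests.

```typescript
// Sketch: a cost ledger driven by RuntimeStreamEvent-like values.
// Only the llm_call fields come from the PR; other variants are illustrative.

type StreamEventSketch =
  | { type: 'text_delta'; text: string }
  | { type: 'backend_error'; message: string }
  | {
      type: 'llm_call'
      model: string
      tokensIn: number
      tokensOut: number
      costUsd: number
      latencyMs: number
      finishReason: string
    }

interface CostSnapshot {
  tokensIn: number
  tokensOut: number
  costUsd: number
  wallMs: number
  llmCalls: number
}

function createCostLedger(now: () => number = Date.now) {
  const startedAt = now()
  const totals = { tokensIn: 0, tokensOut: 0, costUsd: 0, llmCalls: 0 }
  return {
    // Accepts any event, so callers can pipe the whole stream through unfiltered.
    observe(event: StreamEventSketch): void {
      if (event.type !== 'llm_call') return // only llm_call mutates the ledger
      totals.tokensIn += event.tokensIn
      totals.tokensOut += event.tokensOut
      totals.costUsd += event.costUsd
      totals.llmCalls += 1
    },
    // Running total, readable at any point during or after the run.
    cost(): CostSnapshot {
      return { ...totals, wallMs: now() - startedAt }
    },
  }
}
```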
`createTraceBridge({ runId, spanId? })` returns `toTraceEvent(event)` and
`drain(events)`: a drop-in replacement for the hand-rolled adapter that
consumers currently write.
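The routing rules this PR describes for the bridge (`readiness_end` when blocked becomes `policy_violation`; `backend_error` and a failed `task_end`/`final` become `error`; `text_delta`/`reasoning_delta` are dropped; everything else becomes `log`) can be sketched as below. The event shapes (a `blocked` flag, a `failed` flag, a `tool_call` variant), the `TraceEventSketch` type, and `drain` returning an array are assumptions; agent-eval's real `TraceEvent` type is not reproduced here.

```typescript
// Sketch of the RuntimeStreamEvent -> TraceEvent routing rules.
// Event and trace shapes are simplified stand-ins, not the real agent-eval types.

type RuntimeEventSketch =
  | { type: 'text_delta'; text: string }
  | { type: 'reasoning_delta'; text: string }
  | { type: 'readiness_end'; blocked: boolean }
  | { type: 'backend_error'; message: string }
  | { type: 'task_end'; failed: boolean }
  | { type: 'final'; failed: boolean }
  | { type: 'tool_call'; name: string }

type TraceKind = 'policy_violation' | 'error' | 'log'

interface TraceEventSketch {
  runId: string
  spanId?: string
  kind: TraceKind
  source: RuntimeEventSketch['type']
}

function createTraceBridgeSketch(opts: { runId: string; spanId?: string }) {
  function kindFor(event: RuntimeEventSketch): TraceKind | null {
    switch (event.type) {
      case 'text_delta':
      case 'reasoning_delta':
        return null // token deltas belong inside an LLM span's output, not standalone events
      case 'readiness_end':
        return event.blocked ? 'policy_violation' : 'log'
      case 'backend_error':
        return 'error'
      case 'task_end':
      case 'final':
        return event.failed ? 'error' : 'log'
      default:
        return 'log' // everything else is informational
    }
  }

  function toTraceEvent(event: RuntimeEventSketch): TraceEventSketch | null {
    const kind = kindFor(event)
    if (kind === null) return null
    return { runId: opts.runId, spanId: opts.spanId, kind, source: event.type }
  }

  // Maps a batch of events, skipping the dropped ones.
  function drain(events: readonly RuntimeEventSketch[]): TraceEventSketch[] {
    const out: TraceEventSketch[] = []
    for (const event of events) {
      const mapped = toTraceEvent(event)
      if (mapped) out.push(mapped)
    }
    return out
  }

  return { toTraceEvent, drain }
}
```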
DX bar matches agent-eval 0.24.0:
- biome.json mirrors agent-eval (lineWidth 100, single quotes, no semis)
- .github/workflows/ci.yml: lint + typecheck + test + build on every PR
- .github/workflows/publish.yml: tag-driven npm publish (idempotent)
- tsconfig adds `noUncheckedIndexedAccess: true` (clean, 0 new failures)
- All public exports carry `@stable` JSDoc tags
- README rewritten — what runtime IS in 2 sentences, then a feature table
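With `noUncheckedIndexedAccess: true`, an index read such as `items[0]` or `counts[key]` is typed `T | undefined` and must be narrowed before use. The helpers below are hypothetical (not code from this repo); they illustrate the defensive optional-chaining style that let the flag land with no new failures. They compile the same way with the flag off, because optional chaining on a non-optional value is still allowed.

```typescript
// Hypothetical helpers that typecheck under noUncheckedIndexedAccess.

interface Message {
  role: 'user' | 'assistant'
  text: string
}

function firstAndLastText(messages: readonly Message[]): { first: string; last: string } {
  // With the flag, messages[i] is `Message | undefined`: optional chain + fallback.
  const first = messages[0]?.text ?? ''
  const last = messages[messages.length - 1]?.text ?? ''
  return { first, last }
}

// Record lookups are checked too: counts[key] is `number | undefined`.
function increment(counts: Record<string, number>, key: string): number {
  const next = (counts[key] ?? 0) + 1
  counts[key] = next
  return next
}
```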
Telemetry sanitization gains an `llm_call` case so tokens/cost flow through
the safe envelope. The test count rises from 18 to 39 (21 new tests for the new modules):
- 12 tests in tests/runtime-run.test.ts (state machine, ledger, persistence)
- 9 tests in tests/trace-bridge.test.ts (mapping, drain, error-kind routing)
Examples gain `examples/runtime-run/` — the canonical pattern for consumer
repos to copy into their chat / task routes.
Breaking changes:
- `src/index.ts` is now a re-export hub. Deep imports (`@tangle-network/agent-runtime/src/...`) were never part of the public API and are not preserved.
- `Cannot resume X session with Y backend` now throws `SessionMismatchError` (was plain Error). Consumers pattern-matching by message string keep working; consumers using `instanceof` upgrade with a type narrow.
- `chat backend returned <status>` now throws `BackendTransportError` carrying `status`. Same migration path.
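A sketch of the migration path for these two breaking changes. `AgentEvalErrorStandIn` stands in for agent-eval's `AgentEvalError` base class; the `SessionMismatchError` constructor arguments, the field names, and the `describeFailure` consumer function are hypothetical. Only the class names and the `BackendTransportError(kind, message, { status })` call shape come from this PR. Because the subclass keeps the old message text, string matching still works, while `instanceof` now narrows to typed fields such as `status`.

```typescript
// Stand-in for agent-eval's AgentEvalError base class (shape assumed).
class AgentEvalErrorStandIn extends Error {
  constructor(message: string) {
    super(message)
    this.name = 'AgentEvalError'
  }
}

// Runtime-specific subclass; constructor arguments are illustrative.
class SessionMismatchError extends AgentEvalErrorStandIn {
  readonly sessionBackend: string
  readonly requestedBackend: string

  constructor(sessionBackend: string, requestedBackend: string) {
    // Keeps the old plain-Error message text, so string matching still works.
    super(`Cannot resume ${sessionBackend} session with ${requestedBackend} backend`)
    this.name = 'SessionMismatchError'
    this.sessionBackend = sessionBackend
    this.requestedBackend = requestedBackend
  }
}

// Mirrors the BackendTransportError(kind, message, { status }) call shape.
class BackendTransportError extends AgentEvalErrorStandIn {
  readonly kind: string
  readonly status: number

  constructor(kind: string, message: string, opts: { status: number }) {
    super(message)
    this.name = 'BackendTransportError'
    this.kind = kind
    this.status = opts.status
  }
}

// Hypothetical consumer code after the upgrade: instanceof narrows to typed fields.
function describeFailure(err: unknown): string {
  if (err instanceof BackendTransportError) return `transport:${err.kind}:${err.status}`
  if (err instanceof SessionMismatchError) return `mismatch:${err.requestedBackend}`
  // Plain Errors carrying the old message text are still recognized by string match.
  if (err instanceof Error && err.message.startsWith('Cannot resume')) return 'mismatch:legacy'
  return 'unknown'
}
```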
Consumer follow-up PRs:
- legal-agent: delete `completeProductionAgentRun` + `persistRuntimeRun`; adopt `startRuntimeRun`. Delete the hand-rolled `mapRuntimeEventToStreamEvent` if the chat surface adopts the trace bridge.
- tax-agent, gtm-agent, creative-agent, agent-builder: same pattern.
Why
Two parallel error-tracking paths surfaced in legal-agent's chat handler today:
`completeProductionAgentRun` from legal's bespoke `eval-evidence.ts` AND `persistRuntimeRun` from `api.chat.ts`. Both write to the same `agentRuns` table, both wrap the same try/catch, and neither owns the cost ledger. That's the
rot we're killing. The runtime must own the production-run lifecycle so the five
consumer agents (legal, tax, gtm, creative, agent-builder) stop reinventing
it per repo.
This PR is the clean substrate that makes the consumer-side deletions possible.
What changed
Canonical `RuntimeRunHandle` (NEW)
State machine (`running -> completed | failed | cancelled`) enforced by `RuntimeRunStateError`. `complete()` is idempotent for the same status (so retry/cleanup paths don't double-fire). Persistence is pluggable via a single `upsert(row)` method; the same shape works for D1, Postgres, and KV.
Error taxonomy (replaces bespoke `throw new Error`)
Re-exported from `@tangle-network/agent-eval` 0.24.0: `ValidationError`, `ConfigError`, `NotFoundError`, `CaptureIntegrityError`, `JudgeError`, `ReplayError`, `VerificationError`. All extend `AgentEvalError`.
Runtime-specific subclasses (also extending `AgentEvalError`):
- `SessionMismatchError`: resume requested against a different backend's kind
- `BackendTransportError`: backend HTTP / IPC returned non-success (carries `status`)
- `RuntimeRunStateError`: `RuntimeRunHandle` lifecycle out of order
User-facing throws migrated: 2 in src (`backends.ts`, `run.ts`). Internal invariants (none in this package) stay plain `Error`.
`RuntimeStreamEvent` -> agent-eval `TraceEvent` bridge (NEW)
Drop-in replacement for the hand-rolled adapter consumers currently write.
`readiness_end` (blocked) -> `policy_violation`, `backend_error` and failed `task_end`/`final` -> `error`, everything else -> `log`. `text_delta`/`reasoning_delta` are dropped (they belong inside an `LlmSpan.output`, not as separate trace events).
Cost ledger
`RuntimeStreamEvent` gains an `llm_call` variant: `{ model, tokensIn, tokensOut, costUsd, latencyMs, finishReason }`. `handle.observe(event)` only mutates state on `llm_call`, so consumers can pipe the whole stream through without filtering.
`handle.cost()` returns the running total at any time. Required for "cost per customer task" dashboards.
DX bar = agent-eval 0.24.0
- `biome.json` mirrors agent-eval (lineWidth 100, single quotes, no semis, organize imports)
- `.github/workflows/ci.yml`: lint + typecheck + test + build on every PR
- `.github/workflows/publish.yml`: tag-driven npm publish (idempotent on rerun)
- `tsconfig.json`: `noUncheckedIndexedAccess: true` enabled; zero new typecheck failures (the codebase already used `?.at(-1)` and `?.[0]` defensively, so the strict flag was a free win)
- All public exports carry a `@stable` JSDoc tag
Module split
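`src/index.ts` shrank from 1,388 lines to a re-export hub. New layout:

| Module | Contents |
| --- | --- |
| `src/types.ts` | task/session/adapter/stream-event types |
| `src/errors.ts` | agent-eval typed errors + runtime-only errors |
| `src/readiness.ts` | `decideKnowledgeReadiness` |
| `src/sessions.ts` | `InMemoryRuntimeSessionStore` + helpers |
| `src/sanitize.ts` | runtime/stream event sanitization + telemetry collectors |
| `src/sse.ts` | server-sent event encoding |
| `src/backends.ts` | backend factories + stream normalization |
| `src/run.ts` | `runAgentTask` + `runAgentTaskStream` |
| `src/runtime-run.ts` | `startRuntimeRun` + cost ledger (NEW) |
| `src/trace-bridge.ts` | `createTraceBridge` (NEW) |

Before / after API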
| Before | After |
| --- | --- |
| `throw new Error('Cannot resume ...')` | `throw new SessionMismatchError(...)` |
| `throw new Error('chat backend returned <status>')` | `throw new BackendTransportError(kind, message, { status })` |
| `completeProductionAgentRun(handle, outcome)` per repo | `handle.complete({ status, resultSummary, error? })` |
| `persistRuntimeRun({ db, ... })` per repo | `handle.persist({ adapter })` |
| `RuntimeStreamEvent -> TraceEvent` per repo | `createTraceBridge({ runId }).toTraceEvent(event)` |
| (none) | `handle.cost()` returns `{ tokensIn, tokensOut, costUsd, wallMs, llmCalls }` |

Why `completeProductionAgentRun` is now `@deprecated`
The consumer-side pattern at `legal-agent/src/lib/.server/eval-evidence.ts:59` duplicated three concerns:
1. Lifecycle: `running -> completed/failed` with no idempotency guarantee. Retry paths double-fire `logAudit`.
2. Persistence: `persistRuntimeRun` in `api.chat.ts:515`. Two writers, one table, no transaction.
3. Audit: `agent_eval.trace.*` actions emitted on `logAudit` directly, coupling the run lifecycle to the audit shape.
`startRuntimeRun` collapses (1) and (2) behind a single state-machine handle with a pluggable
`upsert(row)` adapter. Consumers route audit from `handle.complete()` -> their existing `logAudit`, either in the adapter or in a thin wrapper. The
`@deprecated` marker isn't on this package; it's the pattern itself that's deprecated, and the in-repo callers ditch it on their own version
bumps.
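One way a consumer might satisfy the single-method `upsert(row)` contract while routing audit through its existing `logAudit`, as described above. Everything here is hypothetical: `createAuditedAdapter`, the `RunRowSketch` fields, the `AuditEntry` shape, the `run.<status>` action names, and the in-memory `Map` standing in for a D1/Postgres/KV table. As a design choice, the wrapper audits only on a transition into a terminal status, so re-upserting the same completed row from a retry path does not fire `logAudit` twice.

```typescript
// Hypothetical adapter: persistence + audit routing behind a single upsert(row).

type RunStatusSketch = 'running' | 'completed' | 'failed' | 'cancelled'

interface RunRowSketch {
  runId: string
  status: RunStatusSketch
  costUsd: number
}

interface AuditEntry {
  action: string
  runId: string
  detail: Record<string, unknown>
}

function createAuditedAdapter(
  table: Map<string, RunRowSketch>, // stand-in for a real table keyed by runId
  logAudit: (entry: AuditEntry) => void,
) {
  return {
    upsert(row: RunRowSketch): void {
      const previous = table.get(row.runId)
      table.set(row.runId, row) // insert or overwrite by primary key
      // Audit once, on the transition into a terminal status.
      const becameTerminal = row.status !== 'running' && previous?.status !== row.status
      if (becameTerminal) {
        logAudit({ action: `run.${row.status}`, runId: row.runId, detail: { costUsd: row.costUsd } })
      }
    },
  }
}
```

With a real database, the read of the previous row and the write would ideally share a transaction so two concurrent writers cannot both see a non-terminal previous status.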
Test plan
- `pnpm run lint` clean (biome)
- `pnpm run typecheck` clean (strict + `noUncheckedIndexedAccess`)
- `pnpm run test`: 39 tests pass (18 existing + 12 `runtime-run` + 9 `trace-bridge`)
- `pnpm run build`: ESM + d.ts emit clean (`dist/index.js` 45.7 KB, `dist/index.d.ts` 34.4 KB)
- `examples/runtime-run/runtime-run.ts` runs end-to-end (toy backend, in-memory adapter)
Consumer follow-up
Each consumer agent gets its own PR on its own version bump. Tracking:
- legal-agent: delete `eval-evidence.ts::completeProductionAgentRun` + `api.chat.ts::persistRuntimeRun`; adopt `startRuntimeRun`. Optional: replace `mapRuntimeEventToStreamEvent` with the trace bridge if the chat client moves to consume agent-eval `TraceEvent` directly.
- tax-agent, gtm-agent, creative-agent, agent-builder: same pattern; … `completeProductionAgentRun`/`persistRuntimeRun` equivalent first.