feat(0.7.0): clean substrate — error taxonomy, canonical RuntimeRun, cost ledger, biome, strict indices by tangletools · Pull Request #10 · tangle-network/agent-runtime

tangletools · 2026-05-14T17:25:45Z

Why

Two parallel error-tracking paths surfaced in legal-agent's chat handler today:
completeProductionAgentRun from legal's bespoke eval-evidence.ts AND
persistRuntimeRun from api.chat.ts. Both write to the same agentRuns
table, both wrap the same try/catch, neither owns the cost ledger. That's the
rot we're killing. The runtime must own production-run lifecycle so the five
consumer agents (legal, tax, gtm, creative, agent-builder) stop reinventing
it per repo.

This PR is the clean substrate that makes the consumer-side deletions possible.

What changed

Canonical `RuntimeRunHandle` (NEW)

const run = startRuntimeRun({
  workspaceId, sessionId, agentId, taskSpec, scenarioId,
  adapter: { upsert: (row) => db.insert(agentRuns).values(row) },
})

for await (const event of runAgentTaskStream({ task, backend, input })) {
  run.observe(event) // llm_call events update the cost ledger
  if (event.type === 'final') run.complete({ status: ..., resultSummary: ... })
}
await run.persist({ runtimeEvents: telemetry.events })
console.log(run.cost()) // { tokensIn, tokensOut, costUsd, wallMs, llmCalls }

State machine (running -> completed | failed | cancelled) enforced by
RuntimeRunStateError. complete() is idempotent for the same status (so
retry/cleanup paths don't double-fire). Persistence is pluggable via a single
upsert(row) method — same shape works for D1, postgres, KV.
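The lifecycle rules above can be illustrated with a minimal sketch. This is not the package's implementation: `SketchRunHandle`, `SketchRunRow`, and `SketchAdapter` are hypothetical stand-ins, and the row shape is assumed. Only the state names, the idempotent `complete()`, and the single-method `upsert(row)` adapter come from the description.

```typescript
// Hypothetical sketch of the lifecycle rules; not the real RuntimeRunHandle.
type RunStatus = 'running' | 'completed' | 'failed' | 'cancelled'
type TerminalStatus = Exclude<RunStatus, 'running'>

class RuntimeRunStateError extends Error {
  constructor(message: string) {
    super(message)
    this.name = 'RuntimeRunStateError'
  }
}

// Assumed row shape; the real RuntimeRunRow carries more fields.
interface SketchRunRow {
  runId: string
  status: RunStatus
  resultSummary: string | null
}

// One method, so D1, postgres, or KV can all sit behind the same shape.
interface SketchAdapter {
  upsert(row: SketchRunRow): void | Promise<void>
}

class SketchRunHandle {
  private readonly runId: string
  private readonly adapter: SketchAdapter
  private status: RunStatus = 'running'
  private resultSummary: string | null = null

  constructor(runId: string, adapter: SketchAdapter) {
    this.runId = runId
    this.adapter = adapter
  }

  complete(outcome: { status: TerminalStatus; resultSummary?: string }): void {
    // Repeating the same terminal status is a no-op, so retry/cleanup paths are safe.
    if (this.status === outcome.status) return
    if (this.status !== 'running') {
      throw new RuntimeRunStateError(`cannot move from ${this.status} to ${outcome.status}`)
    }
    this.status = outcome.status
    this.resultSummary = outcome.resultSummary ?? null
  }

  toRow(): SketchRunRow {
    return { runId: this.runId, status: this.status, resultSummary: this.resultSummary }
  }

  async persist(): Promise<void> {
    await this.adapter.upsert(this.toRow())
  }
}

// In-memory adapter standing in for a real database.
const rows = new Map<string, SketchRunRow>()
const memoryAdapter: SketchAdapter = {
  upsert: (row) => {
    rows.set(row.runId, row)
  },
}

const handle = new SketchRunHandle('run-1', memoryAdapter)
handle.complete({ status: 'completed', resultSummary: 'answered' })
handle.complete({ status: 'completed' }) // same status again: no-op, no throw
```

In this sketch a second `complete()` with a different terminal status throws `RuntimeRunStateError`, which is one plausible reading of "lifecycle out of order".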

Error taxonomy (replaces bespoke `throw new Error`)

Re-exported from @tangle-network/agent-eval 0.24.0:
ValidationError, ConfigError, NotFoundError, CaptureIntegrityError,
JudgeError, ReplayError, VerificationError. All extend AgentEvalError.

Runtime-specific subclasses (also extending AgentEvalError):

SessionMismatchError — resume requested against a different backend's kind
BackendTransportError — backend HTTP / IPC returned non-success (carries status)
RuntimeRunStateError — RuntimeRunHandle lifecycle out of order

User-facing throws migrated: 2 in src (backends.ts, run.ts). Internal
invariants (none in this package) stay plain Error.
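A sketch of how a consumer might branch on the typed errors instead of matching message strings. The classes below are local stand-ins written for illustration; the real ones are exported by the packages named above, and only the `BackendTransportError(kind, message, { status })` argument order is taken from this PR. `describeFailure` is a hypothetical helper.

```typescript
// Local stand-ins for illustration; the real classes live in the packages above.
class AgentEvalError extends Error {
  constructor(message: string) {
    super(message)
    this.name = new.target.name
  }
}

class SessionMismatchError extends AgentEvalError {}

class BackendTransportError extends AgentEvalError {
  readonly kind: string
  readonly status: number | undefined

  constructor(kind: string, message: string, options: { status?: number } = {}) {
    super(message)
    this.kind = kind
    this.status = options.status
  }
}

// Hypothetical consumer-side helper: narrow with instanceof, most specific first.
function describeFailure(err: unknown): string {
  if (err instanceof BackendTransportError) {
    return `backend ${err.kind} failed (status ${err.status ?? 'unknown'})`
  }
  if (err instanceof SessionMismatchError) {
    return 'resume targeted a different backend'
  }
  if (err instanceof AgentEvalError) {
    return `typed failure: ${err.name}`
  }
  return 'unexpected failure'
}
```

Checking the subclasses before the shared base keeps the specific branches reachable.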

`RuntimeStreamEvent` -> agent-eval `TraceEvent` bridge (NEW)

const bridge = createTraceBridge({ runId, spanId })
for await (const event of runAgentTaskStream(...)) {
  const trace = bridge.toTraceEvent(event)
  if (trace) await traceStore.appendEvent(trace)
}

Drop-in replacement for the hand-rolled adapter consumers currently write.
Mapping: readiness_end (blocked) -> policy_violation; backend_error and a failed
task_end / final -> error; everything else -> log. text_delta and
reasoning_delta are dropped (they belong inside an LlmSpan.output, not as
separate trace events).
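The routing rules can be written as a small pure function. This is a sketch under assumptions: `SketchEvent` is a simplified event shape, and the `blocked` and `failed` flags are hypothetical field names chosen for illustration. The real bridge returns full `TraceEvent` objects rather than a kind string.

```typescript
// Assumed, simplified event shape; the real RuntimeStreamEvent has more fields.
interface SketchEvent {
  type: string
  blocked?: boolean // hypothetical flag on readiness_end
  failed?: boolean // hypothetical flag on task_end / final
}

type TraceKind = 'policy_violation' | 'error' | 'log'

// Returns null for events that should not become trace events at all.
function toTraceKind(event: SketchEvent): TraceKind | null {
  switch (event.type) {
    case 'text_delta':
    case 'reasoning_delta':
      return null // dropped: these belong inside an LLM span's output
    case 'readiness_end':
      return event.blocked ? 'policy_violation' : 'log'
    case 'backend_error':
      return 'error'
    case 'task_end':
    case 'final':
      return event.failed ? 'error' : 'log'
    default:
      return 'log'
  }
}
```

The `null` return mirrors the `if (trace)` guard in the loop above: dropped events never reach the trace store.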

Cost ledger

RuntimeStreamEvent gains an llm_call variant:
{ model, tokensIn, tokensOut, costUsd, latencyMs, finishReason }.
handle.observe(event) only mutates state on llm_call, so consumers can pipe
the whole stream through without filtering. handle.cost() returns the
running total any time. Required for "cost per customer task" dashboards.
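A minimal sketch of the accumulation described here. `SketchCostLedger` is a hypothetical class, the non-`llm_call` event variants are assumed, and treating `wallMs` as elapsed time since the ledger was created is a guess at its meaning.

```typescript
// llm_call fields are taken from the description; other variants are assumed.
interface LlmCallEvent {
  type: 'llm_call'
  model: string
  tokensIn: number
  tokensOut: number
  costUsd: number
  latencyMs: number
  finishReason: string
}

type SketchStreamEvent = LlmCallEvent | { type: 'text_delta'; text: string } | { type: 'final' }

interface CostSnapshot {
  tokensIn: number
  tokensOut: number
  costUsd: number
  wallMs: number
  llmCalls: number
}

class SketchCostLedger {
  private tokensIn = 0
  private tokensOut = 0
  private costUsd = 0
  private llmCalls = 0
  private readonly startedAt = Date.now()

  // Safe to call with every stream event: only llm_call changes state.
  observe(event: SketchStreamEvent): void {
    if (event.type !== 'llm_call') return
    this.tokensIn += event.tokensIn
    this.tokensOut += event.tokensOut
    this.costUsd += event.costUsd
    this.llmCalls += 1
  }

  // Assumption: wallMs is wall-clock time elapsed since the ledger was created.
  cost(): CostSnapshot {
    return {
      tokensIn: this.tokensIn,
      tokensOut: this.tokensOut,
      costUsd: this.costUsd,
      wallMs: Date.now() - this.startedAt,
      llmCalls: this.llmCalls,
    }
  }
}
```

Because `observe` ignores every other variant, the caller does not need a filter in front of it.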

DX bar = agent-eval 0.24.0

biome.json mirrors agent-eval (lineWidth 100, single quotes, no semis, organize imports)
.github/workflows/ci.yml — lint + typecheck + test + build on every PR
.github/workflows/publish.yml — tag-driven npm publish (idempotent on rerun)
tsconfig.json — noUncheckedIndexedAccess: true enabled; zero new typecheck failures (the codebase already used ?.at(-1) and ?.[0] defensively, so the strict flag was a free win)
Every public export carries a @stable JSDoc tag
README rewritten — what runtime IS in 2 sentences + a feature table, instead of 20 sentences of taxonomy
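For readers unfamiliar with the flag: under `noUncheckedIndexedAccess`, an indexed read such as `events[0]` is typed as possibly `undefined`, so the compiler insists on a fallback. The two helpers below are hypothetical and only illustrate the defensive style the codebase is said to use already (the source mentions `?.at(-1)` and `?.[0]`; plain indexing is used here so the sketch does not depend on newer lib typings).

```typescript
// With noUncheckedIndexedAccess, events[i] has type `string | undefined`,
// so each read below needs an explicit fallback to satisfy `string`.
function lastEventType(events: string[]): string {
  return events[events.length - 1] ?? 'none'
}

// Optional chaining covers a missing array; `??` covers a missing element.
function firstModel(models: string[] | undefined): string {
  return models?.[0] ?? 'unknown'
}
```

Code written this way compiles with or without the flag, which is consistent with the flag producing zero new failures.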

Module split

src/index.ts shrank from 1,388 lines to a re-export hub. New layout:

| File | Owns |
| --- | --- |
| `src/types.ts` | task/session/adapter/stream-event types |
| `src/errors.ts` | re-export agent-eval typed errors + 3 runtime-only |
| `src/readiness.ts` | `decideKnowledgeReadiness` |
| `src/sessions.ts` | `InMemoryRuntimeSessionStore` + helpers |
| `src/sanitize.ts` | event sanitization + telemetry collectors |
| `src/sse.ts` | server-sent event encoding |
| `src/backends.ts` | backend factories + stream normalization |
| `src/run.ts` | `runAgentTask` + `runAgentTaskStream` |
| `src/runtime-run.ts` | `startRuntimeRun` + cost ledger (NEW) |
| `src/trace-bridge.ts` | `createTraceBridge` (NEW) |

Before / after API

| Before | After |
| --- | --- |
| `throw new Error('Cannot resume ...')` | `throw new SessionMismatchError(...)` |
| `throw new Error('chat backend returned <status>')` | `throw new BackendTransportError(kind, message, { status })` |
| Consumers wrote `completeProductionAgentRun(handle, outcome)` per repo | `handle.complete({ status, resultSummary, error? })` |
| Consumers wrote `persistRuntimeRun({ db, ... })` per repo | `handle.persist({ adapter })` |
| Consumers mapped `RuntimeStreamEvent -> TraceEvent` per repo | `createTraceBridge({ runId }).toTraceEvent(event)` |
| Cost accounting was per-event ad-hoc | `handle.cost()` returns `{ tokensIn, tokensOut, costUsd, wallMs, llmCalls }` |

Why `completeProductionAgentRun` is now `@deprecated`

The consumer-side pattern at legal-agent/src/lib/.server/eval-evidence.ts:59
duplicated three concerns:

1. State machine: open-coded running -> completed/failed with no
   idempotency guarantee. Retry paths double-fire logAudit.
2. Persistence: interleaves with persistRuntimeRun in
   api.chat.ts:515. Two writers, one table, no transaction.
3. Audit: fires agent_eval.trace.* actions on logAudit directly,
   coupling the run lifecycle to the audit shape.

startRuntimeRun collapses the first two concerns (state machine and
persistence) behind a single state-machine handle with a pluggable
upsert(row) adapter. Consumers route audit from handle.complete() to their
existing logAudit, either in the adapter or in a thin wrapper. The
@deprecated marker isn't on this package; it's the pattern itself that's
deprecated, and the in-repo callers drop it on their own version bumps.
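One way a consumer could keep audit out of the run lifecycle is to wrap the adapter. This is a sketch only: `withAudit`, `SketchAuditRow`, `SketchUpsertAdapter`, and the `agent_run.<status>` action name are hypothetical, and the row shape is assumed.

```typescript
// Assumed minimal row shape; the real RuntimeRunRow carries more fields.
interface SketchAuditRow {
  runId: string
  status: string
}

interface SketchUpsertAdapter {
  upsert(row: SketchAuditRow): Promise<void>
}

// Stand-in for a consumer's existing logAudit function.
type AuditFn = (action: string, row: SketchAuditRow) => void

// Wraps any adapter so audit fires from the persistence path, not the handle.
function withAudit(inner: SketchUpsertAdapter, logAudit: AuditFn): SketchUpsertAdapter {
  return {
    async upsert(row) {
      await inner.upsert(row)
      // Audit only after the write succeeds, and only for terminal rows.
      if (row.status !== 'running') {
        logAudit(`agent_run.${row.status}`, row)
      }
    },
  }
}
```

With this shape there is a single writer per row, and the audit call cannot fire for a write that failed.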

Test plan

pnpm run lint clean (biome)
pnpm run typecheck clean (strict + noUncheckedIndexedAccess)
pnpm run test — 39 tests pass (18 existing + 12 runtime-run + 9 trace-bridge)
pnpm run build — ESM + d.ts emit clean (dist/index.js 45.7 KB, dist/index.d.ts 34.4 KB)
examples/runtime-run/runtime-run.ts runs end-to-end (toy backend, in-memory adapter)
Pending CI run on this PR

Consumer follow-up

Each consumer agent gets its own PR on its own version bump. Tracking:

legal-agent — delete eval-evidence.ts::completeProductionAgentRun + api.chat.ts::persistRuntimeRun; adopt startRuntimeRun. Optional: replace mapRuntimeEventToStreamEvent with the trace bridge if the chat client moves to consume agent-eval TraceEvent directly.
tax-agent / gtm-agent / creative-agent / agent-builder — same pattern. Audit each repo for a completeProductionAgentRun / persistRuntimeRun equivalent first.

…cost ledger, biome, strict indices

Take ownership of the production-run lifecycle so consumer agents (legal, tax, gtm, creative, agent-builder) stop reinventing `agentRuns`-row plumbing. Split the single-file runtime into eight focused modules, adopt agent-eval 0.24.0 error taxonomy, add a canonical `RuntimeRunHandle` (cost ledger + persistence adapter), and bring DX in line with agent-eval (biome, CI, strict indices, JSDoc stability tags).

Module split:
- src/types.ts: task/session/adapter/stream-event types
- src/errors.ts: re-export agent-eval typed errors + 3 runtime-only ones
- src/readiness.ts: `decideKnowledgeReadiness` + minimum-score validation
- src/sessions.ts: InMemoryRuntimeSessionStore + helpers
- src/sanitize.ts: runtime/stream event sanitization + telemetry collectors
- src/sse.ts: server-sent event encoding
- src/backends.ts: backend factories + stream normalization
- src/run.ts: runAgentTask + runAgentTaskStream
- src/runtime-run.ts: `startRuntimeRun` lifecycle + cost ledger (NEW)
- src/trace-bridge.ts: `createTraceBridge` mapping RuntimeStreamEvent -> TraceEvent (NEW)

Error taxonomy (replaces bespoke `throw new Error(...)` at user-facing sites):
- ValidationError, ConfigError, NotFoundError, CaptureIntegrityError, JudgeError, ReplayError, VerificationError: re-exported from agent-eval
- SessionMismatchError: runtime-specific (wrong-backend resume)
- BackendTransportError: runtime-specific (HTTP/IPC non-success)
- RuntimeRunStateError: runtime-specific (lifecycle methods called out of order)

`RuntimeRunHandle` (canonical production-run lifecycle):
- startRuntimeRun({ workspaceId, sessionId, taskSpec, scenarioId, adapter })
- handle.observe(event): llm_call events accumulate the cost ledger
- handle.cost(): { tokensIn, tokensOut, costUsd, wallMs, llmCalls }
- handle.complete({ status, resultSummary, cost?, error?, metadata? })
- handle.persist(metadata?): writes `RuntimeRunRow` via adapter.upsert(row)
- handle.toRow(metadata?): dry-run for tests + adapter rehearsal

`RuntimeStreamEvent` gains an `llm_call` variant so the cost ledger has a canonical input. Existing backends ignore unrecognized event types; new backends emit `llm_call` per model call.

`createTraceBridge({ runId, spanId? })` exports `toTraceEvent(event)` and `drain(events)`, a drop-in replacement for the hand-rolled adapter consumers currently write.

DX bar matches agent-eval 0.24.0:
- biome.json mirrors agent-eval (lineWidth 100, single quotes, no semis)
- .github/workflows/ci.yml: lint + typecheck + test + build on every PR
- .github/workflows/publish.yml: tag-driven npm publish (idempotent)
- tsconfig adds `noUncheckedIndexedAccess: true` (clean, 0 new failures)
- All public exports carry `@stable` JSDoc tags
- README rewritten: what runtime IS in 2 sentences, then a feature table

Telemetry sanitization sprouts an `llm_call` case so tokens/cost flow through the safe envelope.

Test coverage rises from 18 -> 39 tests (new modules):
- 12 tests in tests/runtime-run.test.ts (state machine, ledger, persistence)
- 9 tests in tests/trace-bridge.test.ts (mapping, drain, error-kind routing)

Examples gain `examples/runtime-run/`, the canonical pattern for consumer repos to copy into their chat / task routes.

Breaking changes:
- `src/index.ts` is now a re-export hub. Deep imports (`@tangle-network/agent-runtime/src/...`) were never part of the public API and are not preserved.
- `Cannot resume X session with Y backend` now throws `SessionMismatchError` (was plain Error). Consumers pattern-matching by message string keep working; consumers using `instanceof` upgrade with a type narrow.
- `chat backend returned <status>` now throws `BackendTransportError` carrying `status`. Same migration path.

Consumer follow-up PRs:
- legal-agent: delete `completeProductionAgentRun` + `persistRuntimeRun`; adopt `startRuntimeRun`. Delete the hand-rolled `mapRuntimeEventToStreamEvent` if the chat surface adopts the trace bridge.
- tax-agent, gtm-agent, creative-agent, agent-builder: same pattern.

tangletools merged commit 6e53aac into main May 14, 2026
1 check passed

tangletools merged 1 commit into main from feat/0.7.0-error-taxonomy-canonical-runtime-run
