
feat(opencode): task signals — structured returns, terse completion, sparse context, wake-on-message#35400

Draft
iceteaSA wants to merge 53 commits into anomalyco:dev from iceteaSA:task-signals



@iceteaSA iceteaSA commented Jul 5, 2026


Issue for this PR

Related to #19215 (agent-team coordination) — the wake-on-message piece. The other three features are new task-tool capabilities without a tracking issue.

Stacked PR. This sits on top of #32693 (session-to-session), which sits on #32517, #32425, and #32192. Against dev, the diff shows 52 commits / 122 files, but only the top 13 commits (36 files, +1798/-49) are new here; everything below is the #32693 stack. Review against #32693, or wait for the base stack to land. Of the four features, only wake-on-message depends on the base stack (it uses the Messaging service from #32192/#32517); task_return, terse completion, and sparse context are self-contained and would apply on dev directly.

Type of change

  • New feature

What does this PR do?

Four controls over what flows in and out of a Task dispatch. All default to today's behavior.

  • task_return tool — structured child results. A subagent calls task_return({ result: {...} }) to set a free-form JSON object (≤4KB) on its own session row; it lands in the completion frame and on a new optional result field of the task.completed event. This is the output symmetry to the metadata input param — reviewers/coordinators can read a machine result instead of regex-parsing a "VERDICT:" line out of prose. Oversize is a model-visible tool error, not a silent truncation.
  • Terse completion frames — completion: "full" | "terse". A background subagent's full <task_result> body is injected verbatim into the parent, which rewrites the parent's cache tail every completion — on orchestration-heavy runs that's dozens of 2–6K-token cache busts. Terse mode replaces the frame body with a digest: the structured task_return result, the last 500 chars of the final message (verdict lines live at the tail), and a pointer to the child session for the full output. The parent can always resume the child to read the rest.
  • Sparse dispatch context — context: "full" | "sparse". A general-style dispatch pays ~35K first-turn input tokens; ~12–14K of it is instructional payload a scoped child never uses. Sparse keeps the project AGENTS.md chain, the agent prompt, tool schemas, and env; it drops global instruction files, the skills block, and the MCP-instructions block — and genuinely skips assembling them, not just discards the output. Plugin system-prompt transforms still run (they're post-assembly), so plugins keep their say.
  • Wake-on-message — wake_on_message: true. Today a sibling/coordinator message to an idle child sits undelivered until something else wakes it. With this flag, an idle child is woken to drain its inbox when a message lands. Opt-in per dispatch (an always-on wake reintroduces interrupt storms), budgeted at 5 wakes per run (refreshed when the parent resumes the child), and never wakes a child whose ancestor session chain is gone.
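
To make the first two bullets concrete, here is a minimal sketch of the size cap and the terse digest. It is not the code in this PR: every identifier (`validateResult`, `renderFrame`, `ChildOutcome`, the constants) is hypothetical, "4KB" is assumed to mean 4096 bytes of serialized JSON, and the frame layout is only illustrative.

```typescript
// Hypothetical sketch; none of these names come from the PR's source.
const MAX_RESULT_BYTES = 4 * 1024; // assumption: "≤4KB" means 4096 bytes of JSON
const TAIL_CHARS = 500; // terse mode keeps the last 500 chars of the final message

type TaskResult = Record<string, unknown>;

// Reject an oversize result with a visible error instead of truncating it.
function validateResult(result: TaskResult): TaskResult {
  const bytes = new TextEncoder().encode(JSON.stringify(result)).length;
  if (bytes > MAX_RESULT_BYTES) {
    throw new Error(`task_return result is ${bytes} bytes; limit is ${MAX_RESULT_BYTES}`);
  }
  return result;
}

interface ChildOutcome {
  sessionID: string;
  finalMessage: string;
  result?: TaskResult;
}

// "full" injects the whole final message; "terse" injects a short digest:
// the structured result, the tail of the final message, and a session pointer.
function renderFrame(child: ChildOutcome, completion: "full" | "terse"): string {
  if (completion === "full") {
    return `<task_result>\n${child.finalMessage}\n</task_result>`;
  }
  const lines = [
    child.result ? `result: ${JSON.stringify(child.result)}` : "result: (none)",
    `tail: ${child.finalMessage.slice(-TAIL_CHARS)}`,
    `full output: resume session ${child.sessionID}`,
  ];
  return `<task_result>\n${lines.join("\n")}\n</task_result>`;
}
```

Because the digest has a bounded size regardless of how long the child's output was, the parent's cache tail changes by a small, predictable amount per completion.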

Terse and sparse both read from a new task config family (task.completion / task.context) with precedence global config → project config → agent frontmatter → dispatch param, so an orchestration-heavy setup sets the default once and overrides per dispatch.
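
The precedence chain can be sketched as a fold over layers, lowest priority first. This is an illustration only: `resolveTaskSettings` and `TaskSettings` are hypothetical names, and it assumes each later source in the chain overrides the earlier ones, with "full" as the built-in default since that matches today's behavior.

```typescript
// Hypothetical sketch; the real config plumbing in the PR may look different.
type CompletionMode = "full" | "terse";
type ContextMode = "full" | "sparse";

interface TaskSettings {
  completion?: CompletionMode;
  context?: ContextMode;
}

// Layers are passed lowest-precedence first; a later layer wins per field.
// Missing layers and unset fields leave the current value untouched.
function resolveTaskSettings(
  ...layers: Array<TaskSettings | undefined>
): Required<TaskSettings> {
  const resolved: Required<TaskSettings> = { completion: "full", context: "full" };
  for (const layer of layers) {
    if (layer?.completion !== undefined) resolved.completion = layer.completion;
    if (layer?.context !== undefined) resolved.context = layer.context;
  }
  return resolved;
}

const settings = resolveTaskSettings(
  { completion: "terse" }, // global config
  { context: "sparse" }, // project config
  undefined, // agent frontmatter (nothing set)
  { completion: "full" }, // dispatch param
);
// settings is { completion: "full", context: "sparse" }
```

Resolving per field rather than per object means a dispatch can override `completion` without losing a `context` default set further down the chain.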

How did you verify your code works?

Executed as six reviewed tasks (implement → independent cross-family review → fix), red-first throughout.

  • Full suite green: bun test across test/tool/ test/session/ test/messaging/ test/s2s/ — 787 pass / 0 fail; typecheck clean on opencode, core, schema, tui. Two migrations (result, context_mode), each column mapped through all six parallel session-row sites (V1 fromRow/toRow, projector, V1 SessionInfo, V2 fromRow, V2 Info) — a missed site is silent data loss, which is exactly what the T1 review caught on the V2 read path before it shipped.
  • Wake-on-message was the risk, and review earned its keep. The first implementation forked the wake via a bare Effect.runFork, which runs against the default runtime and does not carry the per-request InstanceRef the Messaging/SessionStatus services resolve through — so prompt.loop would die on a missing-InstanceRef defect in production while every spy-handler unit test passed green. A reviewer proved it with a live probe (child-ref= MISSING) and rejected it. The fix captures the instance context with attach(...).pipe(Effect.forkIn(scope)) (the same pattern the s2s poller uses) and adds an integration test that stands up the real SessionPrompt layer and drives a real prompt.loop drain — it goes red if either the registration or the context-carrying fork is reverted.
  • Mutation-checked seams: the terse-frame branch, the sparse skills/mcp-drop branch, and the wake idle-predicate each fail their test when reverted.

Honest limit, same as #34947's fallback caveat: the token-economy wins (terse cache-bust reduction, sparse first-turn token drop) and the end-to-end wake round-trip are covered by unit/integration tests and the instance-context probe, but the before/after token measurements are pending a live build — the tool-test harness can't measure real cache-tail rewrites.

Screenshots / recordings

Not a UI change.

Checklist

iceteaSA added 30 commits July 2, 2026 08:32
Experimental capability for a parent agent or human operator to steer,
gracefully cancel, or hard-abort a specific running Task subagent mid-run,
without affecting the parent or sibling subagents.

Core:
- Interrupt service (session/interrupt.ts): process-local registry holding
  one pending interrupt per child plus a terminal record; steer/cancel frame
  renderers and a visible-marker renderer, both with origin attribution
  (user vs parent); reason length-capped and XML-escaped at every sink
  (frames AND the visible marker).
- The child consumes pending interrupts at the runLoop turn boundary: steer
  injects a <steer> frame and a visible "Steered by ..." marker and continues;
  cancel injects <cancel> + a visible marker, records a terminal, and
  force-breaks within a grace window. abortChild writes a visible "Aborted
  by ..." marker (model/agent derived from the child's latest user message),
  records a terminal, and cancels the BackgroundJob.

Agent tools (gated by permission.interrupt):
- task_steer / task_cancel / task_abort (origin=parent).

Human paths:
- POST /session/:id/interrupt (intent steer|cancel|abort, origin=user),
  restricted to subagent sessions, gated by the experimental flag, and
  rejecting non-running children.
- TUI: esc on a subagent opens a Steer/Cancel/Abort menu, then a reason
  prompt; markers render as "... by user". Bound at the session route via a
  uniquely-named gather bucket (the keymap gather() caches by name). Visible
  interrupt markers render as a distinct "Interrupt" line (tagged via
  part.metadata.interrupt), not as user prose.

Whole feature gated by OPENCODE_EXPERIMENTAL_SUBAGENT_INTERRUPT (off by
default): agent tools, HTTP endpoint, and TUI affordance.

Limitations: agent-driven steer/cancel applies to background children only
(a foreground child blocks the parent turn); cancel is boundary-soft (use
task_abort / Abort for a child stuck in a long tool call).

The sender-echo markers duplicated information already shown by the
message tool call itself (✉ Sent to parent / ✉ Replied to subagent sat
right under the visible tool call), and the subagent's "Reply from
parent" marker was written twice — once by the parent's reply branch and
again by the subagent's own send path.

Keep only the incoming markers: the parent sees "✉ Message from
subagent", the recipient subagent sees "✉ Reply from parent", each once.
Drop the now-unused marker direction field.

experimentalS2S runtime flag; s2s_inbox/s2s_token/s2s_allow tables (hand-written migration + Drizzle s2s.sql.ts mirror so fresh-DB CREATE and upgrade paths agree); session_slug_unique migration neutralized to DROP INDEX (slugs are not unique). Store is one statement per method: atomic single-winner claims via UPDATE…RETURNING with drained_at IS NULL / accepted_by IS NULL guards, TTL enforced in the claimToken WHERE clause, and deleteInbox so a delivered row is hard-deleted (distinct from a merely-claimed crashed row). S2SCapsule v1 envelope with forward/back-compat serde and optional sender_name. UUIDv7 generator.
…up wiring

Per-instance wake poller (C′) lazily forked from SessionPrompt.loop via attach so it captures the live fiber's InstanceRef; runLoop turn-boundary drain (D) of s2s_inbox in-context; 60s reaper that reopens ONLY crashed claims (delivered rows are deleted). LayerNode.group exposes only DIRECT children, so S2SStore/Messaging/SessionStatus are spliced as direct members of every prompt-serving group (app httpapi + control-plane workspace) — this is what made cross-process delivery actually work. marker.ts: shared Marker.render + escapeAttr (escapes " ' for untrusted attribute values so a peer cannot break out of the <external-context> name=/session= attributes); escape() for element content/visible markers. Slug-decoupled: Messaging.enqueue lazily inits the inbox queue and registerSlug is dropped from the loop, so s2s rides session_id only and the slug registry stays coordinator-messaging-owned. s2s frames carry the sender session name + addressable session_id.
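
The attribute-escaping point above can be illustrated with a small sketch. It is not the contents of marker.ts: `escapeText` and `renderExternalContext` are hypothetical names, `escapeAttr` is named after the helper in the commit message but its body here is an assumption, and the entity choices are one common convention.

```typescript
// Hypothetical sketch of content vs. attribute escaping; not the PR's marker.ts.

// Enough for element content: neutralize markup characters.
function escapeText(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Attribute values also need quotes escaped, otherwise an untrusted peer name
// could close the attribute early and inject attributes of its own.
function escapeAttr(value: string): string {
  return escapeText(value).replace(/"/g, "&quot;").replace(/'/g, "&#39;");
}

function renderExternalContext(peerName: string, sessionID: string, body: string): string {
  return (
    `<external-context name="${escapeAttr(peerName)}" session="${escapeAttr(sessionID)}">` +
    `${escapeText(body)}</external-context>`
  );
}
```

With content-only escaping, a peer named `x" session="spoofed` would produce a second `session=` attribute; escaping the quote keeps the whole string inside `name=`.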
s2s tool (invite/accept/msg/leave/relay) gated behind experimentalS2S: single-use 10-min invite tokens, durable bidirectional s2s_allow consent, peers addressed by globally-unique session_id (accept reports the inviter's id). Same-process sends hit the in-process inbox; cross-process persist to s2s_inbox for the recipient's poller. Outbound 50/hr is a SOFT per-process throttle (documented as such in code + s2s.txt); the durable cross-process bound is the recipient INBOX_CAP (exact now that delivered rows are deleted). Registry wires S2SStore into ToolRegistry; message tool gains peer-slug send (message_allow). TUI renders the ✉ inbox marker (session name + id) and the session-list surface.
…eted events for comms dashboard

(cherry picked from commit 939ffdd61748fed5ae41429be9d0b80e9ea3992a)
- interrupt.ts: remove defaultLayer (deleted at dev), fix node to object form
- interrupt.test.ts: migrate from EventV2Bridge.defaultLayer to LayerNode.compile
- task-interrupt.test.ts: migrate from Layer.mergeAll(defaultLayer) to LayerNode group/compile
- task.test.ts: add Interrupt.node to test group (registry gained the dep)

iceteaSA added 23 commits July 2, 2026 08:59

- Rewrite LayerNode.make positional→object form in poller/store
- Add defaultLayer re-exports (raw layer) to 37 modules
- Export layer variable for modules used via .layer access
- Rewrite coordinator-messaging + s2s tests to LayerNode.group pattern
- Switch tests to testEffectShared for Database memoMap sharing
- Add NodePath to runLoopInfra for CrossSpawnSpawner deps
- Create task-event.ts schema + manifest registration
- Regenerate SDK types (task.completed, messaging.peer_sent, s2s.delivered)
- Fix topology-repro.test.ts positional LayerNode.make + buildLayer→compile
…ator suites

SessionProjector.node added to the s2s/coordinator runLoop harnesses (session
readback needs projection at current dev), the reaper harness gets an explicit
EventV2 layer, the poller runLoop shares the file-level :memory: database via
node replacement, and the fork-fiber-sensitive suites (poller, coordinator
runLoop) build per-test layers (testEffect) instead of a shared memoMap build
so forked poller/drain fibers see the same instances as the test assertions.
Drops the defaultLayer = layer re-exports from 28 modules nothing consumes;
the 8 that remain (Agent, Config, Session, Truncate, Messaging, S2SStore,
EventV2Bridge, CrossSpawnSpawner) back the three s2s test harnesses that still
compose with Layer.mergeAll. Follow-up: convert those harnesses to LayerNode
and delete the bridge entirely.

Add optional slug, agent, model, variant, elapsedMs, tokens, and cost
fields to the task.completed event. A shared completedPayload helper
reads session metadata and sums tokens/cost from child assistant
messages, used at all 9 publish sites instead of duplicated logic.

Wrap completedPayload assembly in Effect.exit so any defect
during session/message reads falls back to the base payload
instead of killing the publish path. Use optional chaining and
nullish defaults for token/cost field access to tolerate missing
or malformed assistant rows. Rewrite test to observe the actual
published event via Deferred + listen instead of relying on
message persistence alone.
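
A rough sketch of the fallback shape described in the two commits above. The commits use Effect.exit; this version substitutes a plain try/catch so it runs without the Effect library, and the types (`BasePayload`, `AssistantRow`) and the `loadRows` parameter are hypothetical.

```typescript
// Hypothetical sketch; the real helper is Effect-based and reads from the session store.
interface BasePayload {
  sessionID: string;
}

interface CompletedPayload extends BasePayload {
  tokens?: number;
  cost?: number;
}

// Loosely typed on purpose: rows read back from storage may be missing or malformed.
type AssistantRow =
  | { tokens?: { input?: number; output?: number } | null; cost?: number | null }
  | null
  | undefined;

function completedPayload(base: BasePayload, loadRows: () => AssistantRow[]): CompletedPayload {
  try {
    let tokens = 0;
    let cost = 0;
    for (const row of loadRows()) {
      // Optional chaining plus nullish defaults: a bad row contributes zero.
      tokens += (row?.tokens?.input ?? 0) + (row?.tokens?.output ?? 0);
      cost += row?.cost ?? 0;
    }
    return { ...base, tokens, cost };
  } catch {
    // Any failure while enriching falls back to the base payload,
    // so the completion event is still published.
    return base;
  }
}
```

The enrichment is best-effort by design: a missing token count is a cosmetic loss, while a dropped task.completed event would leave the parent waiting.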
Three blocking bugs in wake-on-message (97ae2bd), all in the
instance-context/fork wiring:

1. registerWakeHandler was called at layer-build time without yield*,
   so the Effect was constructed and discarded. Moved the (now yield*ed)
   registration into loop()'s body, which always runs inside a fiber
   that has InstanceRef from the caller (HTTP request / CLI run) —
   mirroring how the C-prime wake-poller is wired in the same function.
   Re-registering on every loop() call is idempotent.

2. Messaging.wakeIfIdle forked the handler via bare Effect.runFork,
   which starts a fresh top-level runtime with an empty context and
   drops InstanceRef, so the forked prompt.loop died the instant it
   touched any InstanceState-scoped service. Replaced both call sites
   with attach(handler(target)).pipe(Effect.forkIn(scope)) — attach
   re-provides the caller fiber's InstanceRef/WorkspaceRef before the
   fork runs, and scope is a new layer-lifetime Scope so the fiber
   doesn't leak past Messaging's lifetime.

3. wake.test.ts only exercised spy handlers against a minimal
   Messaging-only layer — it never ran the real registration line or a
   real prompt.loop, so 786 tests passed over an entirely dead
   feature. Added wake-real-path.test.ts, which builds the full
   run-loop layer, drives a real prompt.loop, sets a real wake policy,
   and enqueues through the real Messaging.enqueue; it asserts the
   drain's observable side effect (the ✉ inbox marker) appears without
   a second explicit loop() call. Verified red/green against both bugs
   individually before landing the fix.

Pre-existing base drift, not introduced by task-signals: the hardcoded
Latest.size assertion was already failing at base 11e2b8a (Received 97,
Expected 88) — the fork's accumulated messaging/interrupt/s2s/task events
grew the manifest without updating this constant. Empirically verified
identical size (97) at base and HEAD. Fixed here so the integration
branches go green.