test(#1738): step-2 closeout — cancellation test + honest docs update

Skobeltsyn · claude · Skobeltsyn · commit e8b74371bd3f · 2026-05-16T00:03:44.000+03:00
AgentSessionCancellationTest.kt — proves that cancelling a session's
events collector terminates promptly (under 500ms) even while an
implementedBy skill is still in a 2-second Thread.sleep. The test
docstring records the step-2 limitation: the invocation itself can't
be cancelled mid-body because implementedBy lambdas have no suspension
points, so it may run to completion in the background. Step 3's
executeAgentic rewire closes that gap by routing the HTTP call through
the suspending chatStream path; cancellation will then propagate
through that suspension point.
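
For reference, a minimal standalone sketch of the suspension-point rule
behind that limitation. It does not use the agents_engine API:
measureCancelMs is a hypothetical helper, and the 1-second durations are
arbitrary. Unlike the test in this commit, the cancelled job here runs
the body directly, so cancelAndJoin() has to wait for a blocking body
but returns quickly for a suspending one.

```kotlin
import kotlinx.coroutines.CompletableDeferred
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.cancelAndJoin
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical helper: runs `body` on a background dispatcher, waits until
// the coroutine has started, cancels it, and returns how long
// cancelAndJoin() took in milliseconds.
fun measureCancelMs(body: suspend () -> Unit): Long = runBlocking {
    val entered = CompletableDeferred<Unit>()
    val job = launch(Dispatchers.Default) {
        entered.complete(Unit)
        body()
    }
    entered.await()
    val startNs = System.nanoTime()
    job.cancelAndJoin()
    (System.nanoTime() - startNs) / 1_000_000
}

fun main() {
    // Blocking body: Thread.sleep is not a suspension point, so the
    // coroutine cannot observe cancellation until the sleep returns.
    val blockingMs = measureCancelMs { Thread.sleep(1000) }

    // Suspending body: delay() is a cancellable suspension point, so the
    // coroutine is resumed with a CancellationException right away.
    val suspendingMs = measureCancelMs { delay(1000) }

    println("blocking: ${blockingMs}ms, suspending: ${suspendingMs}ms")
}
```

On a typical run the blocking case should take roughly the full sleep
and the suspending case a few milliseconds; exact numbers depend on the
machine.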

README Limitations — gains a session-surface entry that documents
what ships (bracket events) vs what's deferred to step 3 (Token /
ToolCall* emission, agentic-loop rewire) vs the partial cancellation
contract. The chatStream entry stays; this is a sibling.

docs/roadmap.md Phase 2 Secondary — adds [x] for the streaming
session surface (#1736) with #1737/#1738 integration coverage
called out, plus a new [ ] line for the agentic-loop rewire so the
v0.5.0 step-3 work has a roadmap home.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/README.md b/README.md
@@ -160,6 +160,7 @@ What the framework does **not** enforce — your responsibility:
 - **No incoming auth on `McpServer`** — outgoing client supports Bearer; the server does not validate credentials. Suitable for trusted-network deployments only.
 - **No Origin header validation on MCP HTTP** — deferred until the MCP-server hardening pass.
 - **No per-adapter native streaming yet** — `LlmChunk` sealed type + `ModelClient.chatStream(messages): Flow<LlmChunk>` default impl landed in v0.4.6 (#1722), so `chatStream` is callable on every `ModelClient`. The default wraps `chat()` and emits one `TextDelta` + `End` (or `ToolCallStarted` / `ArgumentsDelta` / `Finished` / `End` for tool turns), so non-streaming consumers see ordered chunks but no real-time partial output. Native streaming overrides (Anthropic SSE, OpenAI SSE, Ollama `stream: true`) are next on the Phase 2 list — see [docs/premortem-0.5.0-streaming.md](docs/premortem-0.5.0-streaming.md).
+- **Streaming session surface ships now; token-level streaming inside the agentic loop is in progress.** `agent.session(input): AgentSession<OUT>` exposes `events: Flow<AgentEvent<OUT>>` with `SkillStarted` / `SkillCompleted` / `Completed<OUT>` / `Failed` bracket events (#1736). The `Token` / `ToolCallStarted` / `ToolCallArgumentsDelta` / `ToolCallFinished` event types are defined but not yet emitted — `executeAgentic` still calls `chat()` synchronously. Step 3 rewires the agentic loop onto a `FlowCollector<AgentEvent>` so mid-loop events fire as the LLM streams. Cancellation is partial today (Flow collection cancels promptly; `implementedBy` skill bodies have no suspend points and may run to completion in the background — step 3 closes that gap via the native streaming HTTP path).
 - **No native binary** — JVM-only (≥ JDK 21). GraalVM and `jlink` bundles are Phase 2 priorities.
 - **No A2A protocol yet** — agent-to-agent over network (Phase 2 / 3).
 - **Inline-tool-call fallback model variance** — small Ollama models (e.g. `gemma3:4b`) reliably emit single tool calls via the inline format but may produce thin final-turn text after multi-step tool sequences. For multi-step reasoning, a tool-native model (`gpt-oss:20b-cloud` and similar) is the better fit.
diff --git a/docs/roadmap.md b/docs/roadmap.md
@@ -49,6 +49,8 @@
 - [x] Agent memory — `MemoryBank`, `memory_read`/`memory_write`/`memory_search` auto-injected tools
 - [ ] `.spawn {}` — independent sub-agent lifecycle, `AgentHandle<OUT>`, parent-managed join
 - [x] Streaming foundation — `LlmChunk` sealed type (`TextDelta` / `ToolCallStarted` / `ToolCallArgumentsDelta` / `ToolCallFinished` / `End`) + `ModelClient.chatStream(messages): Flow<LlmChunk>` with a default impl that wraps `chat()` so non-streaming providers keep working unchanged. Provider-native streaming (Anthropic SSE, OpenAI SSE, Ollama `stream: true`) overrides land per-adapter. `LlmChunk` stays narrow — no agentic concepts like `skillName` / `agentId` (#1722)
+- [x] Streaming session surface — `AgentEvent` sealed hierarchy (`Token` / `ToolCallStarted` / `ToolCallArgumentsDelta` / `ToolCallFinished` / `SkillStarted` / `SkillCompleted` / `Completed<OUT>` / `Failed`, every event carrying `agentId`), `AgentSession<OUT>` (cold `events: Flow<AgentEvent<OUT>>` + `suspend fun await(): OUT`), and free function `Agent<IN, OUT>.session(input): AgentSession<OUT>` (#1736). Existing `Agent.invokeSuspend` delegates to a new internal `invokeSuspendForSession` with a no-op skill listener — backward-compat byte-for-byte. Today emits only bracket events (`SkillStarted` / `SkillCompleted` / `Completed` / `Failed`) — the `Token` / `ToolCall*` subtypes are defined and ready for consumers but not yet emitted (next entry). Integration coverage: failure-path identity-preserved `cause`, concurrent sessions, agentic-stub bracketing, live-LLM π-to-20-decimals against Ollama (#1737), and prompt-cancellation of the events collector (#1738).
+- [ ] Agentic-loop rewire onto `FlowCollector<AgentEvent>` — `Token` and `ToolCall*` events fire mid-loop; cancellation propagates into `chatStream` HTTP suspension; `tokensUsed` gets threaded through `SkillCompleted` / `Completed`. Step 3 of the v0.5.0 plan.
 - [ ] Per-adapter native streaming overrides — Anthropic SSE, OpenAI SSE, Ollama `stream: true` — emit real partial chunks instead of the default `chat()`-wrap. See [v0.5.0 streaming premortem](premortem-0.5.0-streaming.md)
 - [ ] `Flow<PipelineEvent>` for reactive UIs + Pipeline-level events (`StageStarted`, `PipelineCompleted`, etc) — built on top of `LlmChunk`; depends on sub-agents and sessions
 - [ ] **Multimodal input** — vision and audio content blocks on LLM messages.
diff --git a/src/test/kotlin/agents_engine/runtime/events/AgentSessionCancellationTest.kt b/src/test/kotlin/agents_engine/runtime/events/AgentSessionCancellationTest.kt
@@ -0,0 +1,77 @@
+package agents_engine.runtime.events
+
+import agents_engine.core.agent
+import kotlinx.coroutines.CompletableDeferred
+import kotlinx.coroutines.cancelAndJoin
+import kotlinx.coroutines.flow.collect
+import kotlinx.coroutines.launch
+import kotlinx.coroutines.runBlocking
+import kotlinx.coroutines.withTimeout
+import kotlin.test.Test
+import kotlin.test.assertTrue
+
+// #1738 — step-2 closeout: prove that cancelling a session's events
+// collector terminates the collect promptly, even when the underlying
+// `implementedBy` skill is mid-execution.
+//
+// **Step-2 honesty note.** Coroutine cancellation can only fire at
+// suspension points. An `implementedBy` lambda is `(IN) -> OUT` — pure
+// synchronous code with no suspension points — so the *invocation*
+// itself may run to completion in the background after a cancel. What
+// we CAN verify in step 2:
+//   - The collector job's `cancelAndJoin()` returns promptly (well under
+//     the skill's synthetic sleep duration).
+//   - The cancelled job receives no further events after the cancel
+//     (not asserted below; the test checks only the first point).
+//
+// What step 3 will add (and what this test will be extended to cover):
+// once `executeAgentic` is rewired onto a `FlowCollector<AgentEvent>`,
+// the HTTP call inside the loop becomes a `chatStream(...)` suspend
+// function — cancellation propagates through that suspend boundary and
+// the actual invocation stops. Today, only the *Flow surface* respects
+// cancellation; the agentic loop's body doesn't yet.
+
+class AgentSessionCancellationTest {
+
+    @Test
+    fun `cancelling the events collect terminates promptly even while a slow skill is mid-execution`() = runBlocking {
+        val skillEntered = CompletableDeferred<Unit>()
+
+        // 2-second synthetic delay — large enough that "cancel returns
+        // before the skill finishes" is unambiguously measurable.
+        val slowAgent = agent<String, String>("slow") {
+            skills {
+                skill<String, String>("work", "Synthetic slow work to exercise cancellation") {
+                    implementedBy {
+                        skillEntered.complete(Unit)
+                        Thread.sleep(2000)
+                        "done"
+                    }
+                }
+            }
+        }
+
+        val session = slowAgent.session("input")
+        val collectJob = launch {
+            session.events.collect { /* receive but don't act */ }
+        }
+
+        // Wait for the skill to enter — this proves the invocation
+        // actually started before we cancel.
+        withTimeout(1000) { skillEntered.await() }
+
+        val cancelStartNs = System.nanoTime()
+        collectJob.cancelAndJoin()
+        val cancelMs = (System.nanoTime() - cancelStartNs) / 1_000_000
+
+        // The skill's Thread.sleep continues to run in the background
+        // (step-2 gap, documented above). What we assert: the cancel
+        // returned promptly — under half a second is generous slack
+        // for CI noise; the skill's sleep is 2 seconds. If cancel were
+        // waiting on the skill, this would never hold.
+        assertTrue(
+            cancelMs < 500,
+            "cancel should return well under the skill's 2-second sleep; took ${cancelMs}ms",
+        )
+    }
+}