docs(#1731): README — drop runtime-reflection limitation, reframe streaming as foundation-shipped

Skobeltsyn · claude · Skobeltsyn · commit 748ccc6298dd · 2026-05-15T20:26:38.000+03:00
The Known Limitations list and Phase 2 highlight blurb still said
"KSP is Phase 2" and "No streaming" — both contradict v0.4.6.

- Drop the `Runtime reflection for @Generable` bullet. KSP codegen is
  the default in v0.4.6 (#1700–#1705) and `kotlin-reflect` is genuinely
  `compileOnly` — pinned by `agents-kt-no-reflect-test`. Not a known
  limitation anymore.
- Replace `No streaming` with `No per-adapter native streaming yet`.
  The `LlmChunk` foundation + `ModelClient.chatStream` default impl
  landed (#1722). The default wraps `chat()`, so consumers see ordered
  chunks but no real-time partials until the Anthropic / OpenAI / Ollama
  native streaming overrides land. The bullet links to
  docs/premortem-0.5.0-streaming.md.
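- Phase 2 highlight: drop `KSP compile-time @Generable` (done); replace
  generic `Flow<...> streaming on every adapter` with the more accurate
  `per-adapter native streaming overrides on top of the v0.4.6
  LlmChunk foundation` framing. Also add provider-level constrained
  decoding / guided JSON mode wired to `@Generable` schemas, and note
  in the trailing parenthetical that KSP codegen + the streaming
  foundation shipped in v0.4.6.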
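
For reviewers unfamiliar with #1722, the default `chatStream` behavior
described above can be sketched roughly as follows. This is an
illustration, not the shipped code: the `LlmResponse` shape, the
`List<String>` message type, the non-suspending `chat()` signature, and
`EchoClient` are all assumptions made for the example, and only the
text path (`TextDelta` + `End`) is shown. It assumes
`kotlinx-coroutines-core` is on the classpath.

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking

// Hypothetical stand-ins; the real agents-kt types may differ.
data class LlmResponse(val text: String)

sealed interface LlmChunk {
    data class TextDelta(val text: String) : LlmChunk
    object End : LlmChunk
}

interface ModelClient {
    // Assumed signature: blocks until the full response is available.
    fun chat(messages: List<String>): LlmResponse

    // Default impl: call chat(), then replay the complete response as
    // one TextDelta followed by End. Chunks arrive in order, but only
    // after the whole response exists, so there are no real-time partials.
    fun chatStream(messages: List<String>): Flow<LlmChunk> = flow {
        val response = chat(messages)
        emit(LlmChunk.TextDelta(response.text))
        emit(LlmChunk.End)
    }
}

// Hypothetical adapter that does not override chatStream.
class EchoClient : ModelClient {
    override fun chat(messages: List<String>): LlmResponse =
        LlmResponse(messages.joinToString(" "))
}

// Collects every chunk the default impl emits for the given messages.
fun collectChunks(messages: List<String>): List<LlmChunk> = runBlocking {
    EchoClient().chatStream(messages).toList()
}

fun main() {
    collectChunks(listOf("hello", "world")).forEach { println(it) }
}
```

A native override would instead emit each `TextDelta` as the provider
sends it, which is the per-adapter work still listed under Phase 2.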

UUID: FF947697-8BD5-4F6B-AD5F-AC61E2FE1747

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/README.md b/README.md
@@ -158,8 +158,7 @@ What the framework does **not** enforce — your responsibility:
 - **Synchronous agentic loop** — `runBlocking` inside the loop until the suspend refactor lands (#638). Calling agents from existing coroutine scopes works but doesn't propagate cancellation cleanly.
 - **No incoming auth on `McpServer`** — outgoing client supports Bearer; the server does not validate credentials. Suitable for trusted-network deployments only.
 - **No Origin header validation on MCP HTTP** — deferred until the MCP-server hardening pass.
-- **Runtime reflection for `@Generable`** — KSP compile-time generation is Phase 2. Today's path uses reflection at first-use; cost is amortized but not zero.
-- **No streaming** — `chat()` returns a complete `LlmResponse`; `Flow<...>` streaming is on the Phase 2 roadmap.
+- **No per-adapter native streaming yet** — `LlmChunk` sealed type + `ModelClient.chatStream(messages): Flow<LlmChunk>` default impl landed in v0.4.6 (#1722), so `chatStream` is callable on every `ModelClient`. The default wraps `chat()` and emits one `TextDelta` + `End` (or `ToolCallStarted` / `ArgumentsDelta` / `Finished` / `End` for tool turns), so non-streaming consumers see ordered chunks but no real-time partial output. Native streaming overrides (Anthropic SSE, OpenAI SSE, Ollama `stream: true`) are next on the Phase 2 list — see [docs/premortem-0.5.0-streaming.md](docs/premortem-0.5.0-streaming.md).
 - **No native binary** — JVM-only (≥ JDK 21). GraalVM and `jlink` bundles are Phase 2 priorities.
 - **No A2A protocol yet** — agent-to-agent over network (Phase 2 / 3).
 - **Inline-tool-call fallback model variance** — small Ollama models (e.g. `gemma3:4b`) reliably emit single tool calls via the inline format but may produce thin final-turn text after multi-step tool sequences. For multi-step reasoning, a tool-native model (`gpt-oss:20b-cloud` and similar) is the better fit.
@@ -222,7 +221,7 @@ Testing details — task names, integration test setup, mutation testing, how to
 
 **Phase 1 — Core DSL** *(in progress)*: typed agents, skills, knowledge, composition operators (`then`, `/`, `*`, `forum`, `.loop`, `.branch`), MCP client + server, agent memory, `loadResource(path)` for prompts from classpath, agentic loop with full budget controls (`maxTurns` / `maxToolCalls` / `maxDuration` / `perToolTimeout` / `maxTokens` / `maxConsecutiveSameTool`), observability hooks (`onSkillChosen`, `onToolUse`, `onKnowledgeUsed`, `onError`, `onBudgetThreshold`, `Agent.observe { }`).
 
-**Phase 2 — Runtime + Distribution** *(Q2 2026)*: remaining provider (Google), `Flow<...>` streaming on every adapter, KSP compile-time `@Generable`, native CLI / jlink, `Tool<IN, OUT>` hierarchy, `grants {}` permissions, session model, Flow-based observability, **multimodal input** (image + audio content blocks; vision-capable adapters for Anthropic/OpenAI/Ollama/Gemini), `agent.json` serialization, Gradle plugin. *(Anthropic and OpenAI adapters already landed in #1644 and #1656.)*
+**Phase 2 — Runtime + Distribution** *(Q2 2026)*: remaining provider (Google), per-adapter native streaming overrides (Anthropic / OpenAI / Ollama SSE → real partial chunks on top of the v0.4.6 `LlmChunk` foundation), provider-level constrained decoding / guided JSON mode wired to `@Generable` schemas, native CLI / jlink, `Tool<IN, OUT>` hierarchy, `grants {}` permissions, session model, Flow-based observability, **multimodal input** (image + audio content blocks; vision-capable adapters for Anthropic/OpenAI/Ollama/Gemini), `agent.json` serialization, Gradle plugin. *(Anthropic and OpenAI adapters already landed in #1644 and #1656; KSP `@Generable` codegen + streaming foundation shipped in v0.4.6.)*
 
 **Phase 3 — Production** *(Q3 2026)*: Layer 2 Structure DSL, all 37 compile-time validations, AgentUnit, A2A protocol, file-based knowledge with RAG, OpenTelemetry, **sandboxed tool execution** (`SandboxedExecutor` with `ProcessSandbox` (Seatbelt / bwrap), `WasmSandbox` (Chicory), `DockerSandbox` backends — opt-in per tool, subprocess-shaped tools only, default executor stays in-process), **generative outputs** (`ImageModelClient` for DALL-E / Imagen / Stability, `TTSModelClient` for OpenAI / ElevenLabs / Google).