- 0.5.0 Agents with boundaries — shipped
- 0.6.0 Boundaries you can audit — current focus (epic [#1911](../../issues/1911))
- 0.7.0 Boundaries you can enforce externally

0.6.0 hero feature: the permission manifest / capability graph (#1912) — a deterministic YAML/JSON artifact showing every agent / skill / tool / memory access / MCP endpoint / provider / budget / policy boundary in a system. Build-time evidence for security review; the manifest hash (#1913) propagates into every runtime audit event so dynamic behaviour ties back to the signed-off capability graph.
The 0.6.0 epic (#1911) tracks the full acceptance criteria. The phase layout below stays time-based; release tags on individual items (for example "0.7.0 deliverable") show which release each item targets.
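
As an illustration of what "deterministic" and "manifest hash" can mean in practice, here is a minimal sketch: capability entries are canonicalized (sorted by kind and name, attributes sorted by key) and hashed with SHA-256, so declaration order never changes the hash. `CapabilityEntry`, `canonicalForm` and `manifestHash` are hypothetical names, not the `:agents-kt-manifest` API, and the one-line-per-entry format stands in for the real YAML/JSON output.

```kotlin
import java.security.MessageDigest

// Hypothetical, simplified stand-in for one node of the capability graph
// (an agent, skill, tool, MCP endpoint, provider, ...). Not the real manifest model.
data class CapabilityEntry(
    val kind: String,                 // e.g. "tool", "provider", "budget"
    val name: String,
    val attributes: Map<String, String> = emptyMap(),
)

// Canonical form: entries sorted by (kind, name), attributes sorted by key,
// one entry per line. The same graph in any declaration order yields the same text.
fun canonicalForm(entries: List<CapabilityEntry>): String =
    entries
        .sortedWith(compareBy<CapabilityEntry>({ it.kind }, { it.name }))
        .joinToString(separator = "\n") { entry ->
            val attrs = entry.attributes.toSortedMap()
                .entries.joinToString(",") { (k, v) -> "$k=$v" }
            "${entry.kind}:${entry.name}{$attrs}"
        }

// SHA-256 over the UTF-8 bytes of the canonical form, rendered as lowercase hex.
fun manifestHash(entries: List<CapabilityEntry>): String =
    MessageDigest.getInstance("SHA-256")
        .digest(canonicalForm(entries).toByteArray(Charsets.UTF_8))
        .joinToString("") { "%02x".format(it) }
```

A production canonical form would also need an escaping rule for names and values that contain `:`, `,`, `=` or braces; the sketch skips that for brevity.
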
Phase 1 — Core DSL (in progress)
- `Agent<IN, OUT>` with SRP enforcement
- `Agent.prompt` — base context string for the LLM
- Skills-only execution — all agents run through `skills { implementedBy { } }`
- `Skill.description` — sells the skill to the LLM alongside its type signature
- `Skill.knowledge("key", "description") { }` — named lazy context providers; `loadFile()` inside lambdas
- `Skill.toLlmDescription()` — auto-generated markdown (name, types, description, knowledge index); `llmDescription("...")` override
- `Skill.toLlmContext()` — full context: description markdown + all knowledge content
- `Skill.knowledgeTools()` / `KnowledgeTool(name, description, call)` — tools model with lazy per-entry loading
- `then` — sequential pipeline with composed execution (no runtime casts)
- `/` — parallel fan-out with coroutine concurrency
- `*` — forum shorthand with concurrent participants, a last-agent captain, and `onMentionEmitted`
- `forum { participant(...); captain(...); allowForumReturn(...) }` — explicit forum roles and finalization permissions
- Single-placement enforcement across all structure types
- `.loop {}` — iterative execution with an `(OUT) -> IN?` feedback block
- `.branch {}` — conditional routing on sealed types, composable with `then`
- `@Generable("desc")` / `@Guide` / `@LlmDescription` — runtime reflection: `toLlmDescription()`, `jsonSchema()`, `promptFragment()`, `fromLlmOutput<T>()`, `PartiallyGenerated<T>`
- `model { }` — Ollama backend; `host`, `port`, `temperature`; injectable `ModelClient` for tests; auto-fallback to inline JSON tool-call format for models without native tool support (#706)
- `model { claude(name); apiKey = ... }` — Anthropic Messages API adapter mapping `LlmMessage` / `LlmResponse` to/from Anthropic's structured `tool_use` / `tool_result` content blocks; live integration tests against `claude-haiku-4-5-20251001` (#1644)
- `model { openai(name); apiKey = ... }` — OpenAI Chat Completions adapter; `tool_calls` ↔ `tool_call_id` paired by synthesized id, `parameters` schema field (vs Anthropic's `input_schema`); live integration tests against `gpt-4o-mini` (#1656)
- Agentic execution loop — multi-turn tool calling with budget controls (`maxTurns`, `maxToolCalls`, `maxDuration`, `perToolTimeout`, `maxTokens`, `maxConsecutiveSameTool`) + `onToolUse` observability hook (#637, #963, #969)
- Skill selection — manual `skillSelection {}` + automatic LLM routing when multiple skills match
- `onError { Throwable -> }` — infrastructure-error observability hook (LLM transport, response parse, budget); pure observability: the original exception is always rethrown (#962)
- `Agent.observe { event -> }` — sealed `PipelineEvent` bridges the four hooks (skill / tool / knowledge / error) into one typed stream; composes additively (#965)
- `Agent.toString()` + `Agent.describe()` — readable single-line and multi-line debug output, replacing the JVM identity-hash default (#970)
- `onBudgetThreshold(threshold) { reason, usedPercent -> }` — pre-cap warning hook; fires once per `BudgetReason` when cumulative usage crosses the fraction, before the cap throws (#966)
- `loadResource(path)` / `loadResourceOrNull(path)` — read agent system prompts from classpath resources; fail fast at agent construction when the path is missing; UTF-8 decoded; leading slash normalized (#980)
- `wrap` — teacher→student prompt-override operator: `teacher wrap student` returns a `Pipeline<IN, OUT>` where the teacher's `String` output becomes the student's system prompt for that one call, after which the original prompt is restored. Two framings: education (one generalist student specialized by many teachers) and security (the student's task surface is locked to what the teacher emits). The PRD calls this the `>>` operator; Kotlin can't overload `>>`, so the function is named `wrap` (#1698)

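
The `onBudgetThreshold` semantics (fire once per reason when cumulative usage crosses the fraction, before the cap throws) can be modelled as a small tracker. This is a minimal sketch under assumptions: `BudgetReason`, `BudgetTracker` and `BudgetExceeded` are hypothetical stand-ins rather than the Agents.KT types, usage is counted in unit steps, and only two reasons are modelled.

```kotlin
// Hypothetical stand-ins for the framework's budget types.
enum class BudgetReason { TURNS, TOOL_CALLS }

class BudgetExceeded(val reason: BudgetReason) :
    RuntimeException("Budget exceeded: $reason")

class BudgetTracker(
    private val caps: Map<BudgetReason, Int>,
    private val threshold: Double,  // e.g. 0.8 = warn at 80% of the cap
    private val onThreshold: (reason: BudgetReason, usedPercent: Double) -> Unit,
) {
    private val used = mutableMapOf<BudgetReason, Int>()
    private val warned = mutableSetOf<BudgetReason>()

    // Records one unit of usage; warns at most once per reason and throws past the cap.
    fun record(reason: BudgetReason) {
        val cap = caps[reason] ?: return          // no cap configured: nothing to enforce
        val now = (used[reason] ?: 0) + 1
        used[reason] = now
        if (now > cap) throw BudgetExceeded(reason)
        val fraction = now.toDouble() / cap
        // warned.add(...) returns false if the reason was already warned about,
        // so the hook fires only on the first crossing.
        if (fraction >= threshold && warned.add(reason)) {
            onThreshold(reason, fraction * 100)
        }
    }
}
```

Because usage grows one unit at a time and the threshold is at most 1.0, the warning always fires on or before the step that reaches the cap, which is what "before the cap throws" requires.
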
Phase 2 — Runtime + Distribution (Q2 2026)
Priority — 0.6.0 hero:
- Permission manifest / capability graph — `pipeline.permissionManifest { }` DSL on agents and compositions; `writeYaml(file)` / `writeJson(file)` emit deterministic output; Gradle tasks `agentManifest` and `verifyAgentManifest`, the latter failing CI when high-risk changes appear (new high-risk tool, tool gains network/write access, MCP exposure widens, human oversight removed, budgets relaxed, provider switches local→remote). Captures agents, skills, tools, memory R/W, budgets, MCP client/server caps, providers (secrets masked), guardrail hooks, and composition structure. Lives in `:agents-kt-manifest` (zero vendor deps). This is the hero feature that turns the boundary-first runtime into something an auditor can sign off on. (#1912)
- Manifest hash + request/session IDs in runtime audit events — `AgentRuntimeContext` carries `requestId` (UUIDv4 per `invoke`), `sessionId` (per `agent.session()`), and `manifestHash` (SHA-256 of the deterministic manifest). Every `PipelineEvent` / `AgentEvent` includes these three, and they are consumed by the OTel bridge (#1908) and the JSONL exporter (#1914). Closes the loop from build-time evidence to runtime behaviour. (#1913)
- JSONL audit log exporter — append-only, one event per line, `grep`/`jq`-friendly. Schema covers `requestId`, `sessionId`, `manifestHash`, `agentId`, `skillId`, `toolId`, `eventType`, `timestamp`, `inputType`, `outputType`, `budgetState`, `guardrailDecision`, `mcpClientId`, `provider`, `model`. Lives in `:agents-kt-observability`. Sibling to the OTel bridge (#1908) for teams that need a deterministic on-disk record. (#1914)
- Declarative tool sandbox policy DSL (0.6.0 — declarative only, enforcement in 0.7.0) — `tool(..., policy { risk = ToolRisk.Medium; filesystem { read("/uploads/**"); writeNone() }; network { denyAll() } })`. Captured verbatim in the permission manifest; audit events note `toolPolicy.risk`. The enforcement layer is the sibling issue #1916. (#1915)

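
The `verifyAgentManifest` check boils down to diffing a baseline manifest against the current one and failing on relaxations. Below is a minimal sketch covering three of the listed rules (new high-risk tool, tool gains network/write access, provider switches local→remote). `Risk`, `ToolEntry`, `Manifest` and `findRelaxations` are hypothetical names; the real task's rule set and data model may differ.

```kotlin
// Hypothetical, simplified manifest model for illustration only.
enum class Risk { LOW, MEDIUM, HIGH }

data class ToolEntry(
    val name: String,
    val risk: Risk,
    val network: Boolean,
    val write: Boolean,
)

data class Manifest(
    val tools: List<ToolEntry>,
    val providerIsLocal: Boolean,
)

// Returns one human-readable finding per relaxation; an empty list means "pass".
fun findRelaxations(baseline: Manifest, current: Manifest): List<String> {
    val findings = mutableListOf<String>()
    val before = baseline.tools.associateBy { it.name }
    for (tool in current.tools) {
        val old = before[tool.name]
        if (old == null) {
            // Tool did not exist in the baseline: only flag it if it is high-risk.
            if (tool.risk == Risk.HIGH) findings += "new high-risk tool: ${tool.name}"
            continue
        }
        if (tool.network && !old.network) findings += "${tool.name} gained network access"
        if (tool.write && !old.write) findings += "${tool.name} gained write access"
    }
    if (baseline.providerIsLocal && !current.providerIsLocal) {
        findings += "provider switched local -> remote"
    }
    return findings
}
```

Wired into a CI step, a non-empty result would be the signal to fail the build; tightening changes (a tool losing network access) produce no findings.
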
Priority — 0.6.0 platform:
- `Tool<IN, OUT>` hierarchy + `McpTool<IN, OUT>` — typed tool inheritance refining the current skills-shape (#1948). Today MCP capabilities ship as `Skill<Map<String, Any?>, String>` via `McpClient.toolSkills()`; the typed-tool layer is additive, makes `grants { tools(...) }` references compile against `Tool<*, *>`, and lets local and MCP tools share authorization / audit / policy machinery
- MCP client integration — `McpClient.toolSkills()` / `promptSkills()` / `resourceSkills()` expose every MCP capability as a `Skill` consumable in `skills { +... }`. The `McpTool` type-hierarchy refinement (above) is a future ergonomic upgrade; the user-facing feature shipped in 0.5.0 as the skills-shape (#1795 / #1796 / #1810). `McpServer` ships DSLs to register prompts and resources alongside agents-as-tools, plus `McpServerInfo` for the full capability snapshot
- `McpServer` hardening — first-class incoming auth (`McpServerAuth`), origin/host allowlist on the HTTP transport, `ClientPrincipal` plumbed to tool execution, capability negotiation filtered per client, a `clientPolicy { client("ui") { allowSkill(...); denyTool(...); maxRequestsPerMinute = 60 } }` DSL, and an audit event per accepted/rejected MCP request carrying `mcpClientId` and the decision reason. Default-deny outside localhost. Removes the README limitations "no incoming auth on McpServer" and "no origin validation". (#1902)
- Google Gemini provider adapter — a fourth `ModelClient` alongside Anthropic / OpenAI / Ollama, with a native SSE streaming override. Closes the "three providers only" objection without pulling Agents.KT into a provider-breadth race against Koog. (#1917)
- `grants { tools(...) }` — Layer 2 static permission DSL referencing `Tool<*, *>` instances. Folded into the permission-manifest issue (#1912): the manifest is the serialised view of every agent's grants; the DSL block is the input, the YAML/JSON is the output. Depends on the typed `Tool<IN, OUT>` hierarchy (#1948)
- Permission model: 3 states — Granted / Confirmed / Absent. Folded into the guardrails issue (#1907): Granted = `Allow` or no interceptor registered; Confirmed = `Escalate(reason, reviewerRole)`, resumed by the host app; Absent = the existing pre-guardrail `allowedToolMap` rejection, now surfaced via `onUnauthorizedToolCall`
- KSP annotation processor — compile-time `@Generable` codegen: shape validation (#1700), schema emitter + field-type validation (#1701), sealed-root schema (#1702), `toLlmDescription()` + multi-constant cache (#1703), `constructFromMap` codegen (#1704), dropping the runtime `kotlin-reflect` dependency + empty-variants gate (#1705). Ships as the `agents-kt-ksp` module
- Provider-level constrained decoding (Ollama `format: schema`) + guided JSON mode (Anthropic / OpenAI `response_format: json_schema`, Gemini `responseSchema`) — wire the KSP-emitted `@Generable` JSON schemas through to provider request payloads so the model is forced to emit a valid shape; this eliminates the argument-repair retry loop (up to 8 retries today) for providers that support it. The schemas are already emitted by `agents-kt-ksp` / `SchemaEmitter`; they just aren't threaded into provider payloads yet. (#1949)
- Native CLI binary (GraalVM, no JRE required), distributed via `brew`, `npm`, `pip`, `curl`, `apt`. Subcommands: `manifest` (emit), `inspect` (show the manifest for a JAR), `verify` (diff against a baseline, fail on policy relaxation). 0.7.0 deliverable. (#1923)
- jlink minimal JRE bundle for the runtime (~35 MB)

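
The three-state permission model maps onto interceptor decisions as described in the Granted / Confirmed / Absent entry. A minimal sketch, assuming hypothetical `InterceptorDecision` and `Permission` types (the real names in #1907 may differ), a plain `Set` in place of the existing `allowedToolMap`, and a single optional interceptor per call:

```kotlin
// Hypothetical decision an onBefore* interceptor might return for a tool call.
sealed interface InterceptorDecision {
    object Allow : InterceptorDecision
    data class Escalate(val reason: String, val reviewerRole: String) : InterceptorDecision
}

// The three permission states from the roadmap.
sealed interface Permission {
    object Granted : Permission
    data class Confirmed(val reason: String, val reviewerRole: String) : Permission
    object Absent : Permission
}

fun permissionFor(
    tool: String,
    allowedTools: Set<String>,                                // stand-in for allowedToolMap
    interceptor: ((String) -> InterceptorDecision)? = null,   // null = no interceptor registered
): Permission {
    // Absent: the tool was never granted, so it is rejected before any interceptor runs.
    if (tool !in allowedTools) return Permission.Absent
    return when (val decision = interceptor?.invoke(tool)) {
        // Granted: an explicit Allow, or no interceptor at all.
        null, InterceptorDecision.Allow -> Permission.Granted
        // Confirmed: the host app must resume the call after review.
        is InterceptorDecision.Escalate -> Permission.Confirmed(decision.reason, decision.reviewerRole)
    }
}
```
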
Secondary:
- Session model — multi-turn `AgentSession`, automatic compaction (`SUMMARIZE`, `SLIDING_WINDOW`, `CUSTOM`)
- `onBefore*` interceptor family — Rails-style `onBeforeSkill` / `onBeforeToolCall` / `onBeforeTurn` returning a sealed `Decision { Proceed | ProceedWith(args) | Deny(reason) | Substitute(result) }`. Sibling to today's post-hoc observer hooks (`onToolUse` / `onSkillChosen` / `onError`). Unifies per-client tool policy (`McpServer`), action confirmation, prompt-injection filtering (one-liner: `onBeforeTurn { msgs -> if (filter.flag(msgs)) Decision.Deny(...) else Decision.Proceed }`), and uniform `perToolTimeout` wrapping. Chain semantics: interceptors run in registration order, all of them run, and the first non-`Proceed` decision wins. (#1907, blocks #1902 and feeds #1908)
- Agent memory — `MemoryBank`, with auto-injected `memory_read` / `memory_write` / `memory_search` tools
- `.spawn {}` — independent sub-agent lifecycle, `AgentHandle<OUT>`, parent-managed join
- Streaming foundation — `LlmChunk` sealed type (`TextDelta` / `ToolCallStarted` / `ToolCallArgumentsDelta` / `ToolCallFinished` / `End`) + `ModelClient.chatStream(messages): Flow<LlmChunk>` with a default implementation that wraps `chat()`, so non-streaming providers keep working unchanged. Provider-native streaming overrides (Anthropic SSE, OpenAI SSE, Ollama `stream: true`) land per adapter. `LlmChunk` stays narrow: no agentic concepts like `skillName` / `agentId` (#1722)
- Streaming session surface — `AgentEvent` sealed hierarchy (`Token` / `ToolCallStarted` / `ToolCallArgumentsDelta` / `ToolCallFinished` / `SkillStarted` / `SkillCompleted` / `Completed<OUT>` / `Failed`, every event carrying `agentId`), `AgentSession<OUT>` (cold `events: Flow<AgentEvent<OUT>>` + `suspend fun await(): OUT`), and the free function `Agent<IN, OUT>.session(input): AgentSession<OUT>` (#1736). The existing `Agent.invokeSuspend` delegates to a new internal `invokeSuspendForSession` with a no-op skill listener, so behaviour stays byte-for-byte backward compatible. This step emits only bracket events (`SkillStarted` / `SkillCompleted` / `Completed` / `Failed`); the `Token` / `ToolCall*` subtypes are defined for consumers and start firing with the agentic-loop rewire (next entry). Integration coverage: identity-preserved `cause` on the failure path, concurrent sessions, agentic-stub bracketing, a live-LLM π-to-20-decimals test against Ollama (#1737), and prompt cancellation of the events collector (#1738).
- Agentic-loop rewire onto `FlowCollector<AgentEvent>` — `Token` and `ToolCall*` events fire mid-loop; `tokensUsed` is threaded through `SkillCompleted` / `Completed`. Shipped in 0.5.0 (#1739 / #1740). Partial: synchronous skill bodies and blocking HTTP reads are not yet coroutine-cancellable mid-call; the `sendAsync` adapter migration (step 4) is still pending and pairs with #1903 for the session-aware `perToolTimeout` fix.
- Enforce `perToolTimeout` on the session-aware tool path — close the documented gap at `AgenticLoop.kt:392-405`, where session-aware tool execution (`sessionExecutor`) bypasses `budget.perToolTimeout`. Migrate to coroutine-cancellable async execution so the timeout cancels the underlying HTTP I/O, not just a worker thread. Best landed after the `onBefore*` interceptors so the timeout wraps uniformly. (#1903)
- Streaming docs reconcile — README.md:162 ("no per-adapter native streaming yet") contradicts :163 / :193 ("all three adapters stream natively"). Sweep the Limitations / Roadmap bullets and tag each as `shipped` / `experimental` / `planned`. (#1901)
- Per-adapter native streaming overrides — Anthropic SSE (`ClaudeClient.chatStream`), OpenAI SSE (`OpenAiClient.chatStream`), and Ollama NDJSON `stream: true` (`OllamaClient.chatStream`) all emit real partial chunks at the wire. Live integration tests measure 19 / 2 / 19 chunks per response respectively. See the v0.5.0 streaming premortem
- `Flow<PipelineEvent>` for reactive UIs + pipeline-level events (`StageStarted`, `PipelineCompleted`, etc.) — built on top of `LlmChunk`; depends on sub-agents and sessions
- Multimodal input — vision and audio content blocks on LLM messages.
  - Image input: vision-capable adapters accept image bytes + media type as a content block alongside text. Targets: Anthropic (`image` content blocks), OpenAI (`image_url` / base64 in content), Ollama (`llava` / `bakllava` via the `images` field), Google Gemini.
  - Audio input: true audio input (Gemini, GPT-4o-audio) via an `LlmContent.Audio` block. Optional STT-only helper `audio.transcribe(file)` for the Whisper-style use case.
  - Architectural change: `LlmMessage.content: String` needs to evolve into a `List<LlmContent>`, where `LlmContent` is a sealed type (Text / Image / Audio blocks). To limit binary-compat risk, add a sibling `contentBlocks: List<LlmContent>?` field first, with the existing String form auto-coerced into a single Text block; deprecate the String form once the API surface settles. Typed boundaries are unaffected: `Agent<Image, String>` (image classifier) and `Agent<AudioClip, String>` (transcriber) become coherent agent shapes.
- Serialization — `agent.json`, A2A AgentCard
- JAR bundles and folder-based assembly
- Gradle plugin

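
The chain semantics of the planned `onBefore*` family (registration order, all run, first non-`Proceed` wins) fit in a few lines. A minimal sketch: `Decision` mirrors the sealed shape in the interceptor entry, while `Interceptor`, `runChain` and the `Map<String, Any?>` argument type are assumptions for illustration, not the shipped API.

```kotlin
// Mirrors the Decision shape from the roadmap entry; argument/result types are assumptions.
sealed interface Decision {
    object Proceed : Decision
    data class ProceedWith(val args: Map<String, Any?>) : Decision
    data class Deny(val reason: String) : Decision
    data class Substitute(val result: Any?) : Decision
}

typealias Interceptor = (args: Map<String, Any?>) -> Decision

// Every interceptor runs, in registration order (so each one can observe or log the call);
// the first decision that is not Proceed determines the outcome.
fun runChain(interceptors: List<Interceptor>, args: Map<String, Any?>): Decision {
    val decisions = interceptors.map { it(args) }
    return decisions.firstOrNull { it != Decision.Proceed } ?: Decision.Proceed
}
```

Evaluating every interceptor before picking a winner costs a few extra calls, but it keeps logging-style interceptors from being silently skipped when an earlier one denies.
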
Phase 3 — Production (Q3 2026)
- Layer 2: Full Structure DSL with delegates, grants, authority, routing, escalation
- All 37 compile-time validations enforced by the Gradle plugin
- AgentUnit testing framework — unit, semantic (LLM-as-judge), Skill Coverage metrics
- A2A protocol support (server + client)
- File-based knowledge: `skill.md`, `reference`, `examples`, `checklist` + RAG pipeline
- Production observability — vendor-neutral `ObservabilityBridge` + adapter modules. Core ships a zero-dep `:agents-kt-observability` module exposing `ObservabilityBridge { onPipelineEvent / onAgentEvent / onInterceptorDecision }` and an `agent.observe(bridge)` extension that wires both event surfaces plus the `onBefore*` decisions (#1907) into the bridge. Adapters live in separate Gradle modules so local-first users never pull in vendor SDKs.
  - `:agents-kt-otel` — OpenTelemetry adapter using the GenAI semantic conventions: skill = root span, model turn = child (`gen_ai.operation.name=chat`, `gen_ai.system`, token-usage attrs), tool call = grandchild (`tool.name`, `tool.duration_ms`), errors as span status, budget threshold / interceptor decisions as span events. Parent-context propagation via `Context.current()`. (#1908, blocked-by #1907)
  - `:agents-kt-langsmith` — LangSmith run-tree adapter (chain → llm → tool runs), async dispatch with backpressure. (#1909, blocked-by #1908)
  - `:agents-kt-langfuse` — Langfuse traces / spans / generations adapter. (#1910, blocked-by #1908)
- Threat-model + deployment-pattern guide — `docs/threat-model.md` with four worked scenarios (safe local assistant; internal business tool; MCP server behind a gateway; anti-patterns), each calling out which Agents.KT guardrails apply and which gaps deployers must close themselves. Linked from the README security section and `SECURITY.md`. (#1904)
- Release-signing hardening — replace the no-passphrase GPG example in the publishing guide with a passphrase-protected default; add a CI-signing section (secrets-manager-injected passphrase, short-lived subkey, or OIDC-to-signing-service); demote the no-protection variant to a clearly labelled "local-only sandbox keys" subsection. (#1905)
- Three killer 0.6.0 demos — (1) safe MCP filesystem agent (read-only allowlist, rejection visible in the audit log), (2) typed approval workflow with `Escalate` decisions for high-risk paths, (3) multi-agent audit pipeline binding every model + tool call to the manifest hash. Each lives in `examples/<name>/`, runs against Ollama by default, and emits the manifest + JSONL audit log in one invocation. Validates the 0.6.0 story end-to-end. (#1918)
- Production hardening checklist + regulated deployment guide — `docs/production-hardening.md` checkbox list (tool allowlists, MCP auth, conservative budgets, output wrapping, audit logs, manifest review in CI, etc.) and `docs/regulated-deployment.md` for finance / healthcare / public-sector buyers (audit retention, evidence pack, manifest-hash chain of custody). Companion to the threat model (#1904). (#1919)
- AI Act-aligned whitepaper — 8–12 page engineering-guidance document (explicitly not legal advice) on bounded agent systems, the manifest as static evidence, audit events as dynamic evidence, human-oversight hooks, and the shared-responsibility model. Timed for 2026 AI-governance attention. (#1921)
- README + landing repositioning — boundary-first / auditable register; "what Agents.KT owns" + "what it doesn't try to own" sections; marketing-register and compliance-language audit (avoid "fastest" / "fully compliant"; keep "auditable" / "least privilege" / "compliance-supporting"). Feeds off the comparison page (#1906). (#1922)
- Scarf integration + Maven adoption verification — set up Scarf on `ai.deep-code:agents-kt:*`, gather a 30-day baseline before making public adoption claims, and keep public wording soft ("Maven pull-through stronger than GitHub stars suggest") until verified. Outreach template prepped but not sent. (#1920)
- Team DSL — swarm coordination (if isolated execution is available)
- Generative outputs (image + audio) — sibling client interfaces to `ModelClient` for non-chat model families.
  - `ImageModelClient.generate(prompt, options): ImageBytes` — text → image. Adapters: OpenAI DALL-E 3, Google Imagen, Stability. Optional streaming via `generateStream(...): Flow<LlmChunk.ImageDelta>` for partial-preview UX.
  - `TTSModelClient.synthesize(text, voice, options): AudioBytes` — text → speech. Adapters: OpenAI TTS, ElevenLabs, Google Cloud TTS. Streaming via `LlmChunk.AudioDelta(pcmChunk)` for low-latency playback (relevant for IDE voice agents and chat UIs).
  - These keep the typed-boundary identity: `Agent<String, ImageBytes>` and `Agent<TextRequest, AudioBytes>` are first-class. Composition operators (`then`, `wrap`) work unchanged across modalities.
- Sandboxed tool execution (0.7.0 enforcement layer; depends on the declarative policy DSL in 0.6.0, #1915) (#1916) — `SandboxedExecutor` interface with three backends, opt-in per tool (`tool(..., sandbox = ...)`) or per skill (`sandbox { }` block). The default executor stays in-process for backward compatibility. Scope (lesson from Claude Code's implementation): the sandbox applies only to subprocess-shaped tools, i.e. tools whose executor shells out via `ProcessBuilder` or invokes external binaries. In-process Kotlin lambdas don't get OS-level isolation because `grants { }` + frozen agents already bound them; bolting a sandbox onto them would be overkill that just makes the framework feel heavier.
  - `ProcessSandbox` — subprocess executor with env / cwd / timeout / network constraints. Backends: Seatbelt on macOS (the framework behind `sandbox-exec`, built into the OS); `bwrap` (bubblewrap) as the primary on Linux, with `firejail` as the fallback. WSL2 behaves like Linux; WSL1 is unsupported (no namespace support). On platforms with no native sandboxing tool, it falls back to plain `ProcessBuilder` with a loud warning. Most pragmatic: every dev box has at least one path. Cribs the profile shape + socat-proxy plumbing from `anthropic-experimental/sandbox-runtime` (Anthropic's open-source Linux bwrap reference).
    - Network sub-policy: outbound traffic is blocked by default; allowlist via `sandbox.network.allowedDomains`. A proxy server (running outside the sandbox) intercepts DNS + connections and gates them by hostname. TLS caveat: the default proxy doesn't terminate TLS; it allows/denies by hostname only. Allowing broad domains (`github.com`, `googleapis.com`) leaves room for domain fronting; consumers needing real traffic inspection plug in their own MITM proxy. Document this explicitly so it's not a surprise.
    - Permission/sandbox interaction: sandbox path config and `grants { }` path config merge, and both layers apply (matches Claude Code semantics). The sandbox cannot accidentally widen what `grants` denies; a tool with both must satisfy both.
  - `WasmSandbox` — JAR-embedded WASM runtime via Chicory (pure Java; no host setup). Tools are compiled to WASM; filesystem and network capabilities are granted explicitly at registration. The most truly embedded option: works anywhere a JVM runs.
  - `DockerSandbox` — opt-in extras module (`agents-kt-docker-sandbox`) via `docker-java`. Talks to whatever Docker daemon the host already runs. Not embeddable: the library ships in the JAR, the daemon does not. For teams that already operate Docker.
  - Why this axis matters: today `grants { tools(writeFile, compile) }` controls which tools an agent can call; sandboxing controls what those tools can do once invoked. Paired with frozen agents + typed args, this gives a security model that's strictly stronger than "trust the executor lambda."

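
The "both layers apply" rule for path config amounts to an intersection: a path is usable only if some `grants` pattern and some sandbox pattern both match it. A minimal sketch using the JDK's glob `PathMatcher`; `LayeredPathPolicy` and `globMatcher` are hypothetical names, the patterns are examples, and the sketch assumes POSIX-style paths (Windows path handling would differ).

```kotlin
import java.nio.file.FileSystems
import java.nio.file.PathMatcher
import java.nio.file.Paths

// Builds a JDK glob matcher; "**" crosses directory boundaries, "*" does not.
private fun globMatcher(glob: String): PathMatcher =
    FileSystems.getDefault().getPathMatcher("glob:$glob")

// Hypothetical: two independent allowlists, one from grants { }, one from the sandbox config.
class LayeredPathPolicy(grantGlobs: List<String>, sandboxGlobs: List<String>) {
    private val grants = grantGlobs.map { globMatcher(it) }
    private val sandbox = sandboxGlobs.map { globMatcher(it) }

    // Both layers must allow the (normalized) path, so neither can widen what the other denies.
    fun allows(path: String): Boolean {
        val p = Paths.get(path).normalize()
        return grants.any { it.matches(p) } && sandbox.any { it.matches(p) }
    }
}
```

Normalizing first matters: without it, a path like `/uploads/../etc/passwd` could match `/uploads/**` textually while pointing outside the allowed tree.
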
Phase 4 — Ecosystem (Q4 2026)
- Knowledge packs — battle-tested prompt libraries for common domains
- Agent generation from natural language (NL → Kotlin DSL)
- Skillify — extract reusable skills from session transcripts
- Visual structure editor, UML bidirectional conversion
- Knowledge marketplace
- Comparison page — `docs/comparison.md` with a feature matrix vs LangChain (Py + LangChain4j), Microsoft Semantic Kernel, AutoGen, and a raw MCP client; covers typed `Agent<IN, OUT>`, runtime tool allowlist, MCP client/server, native streaming, budgets, sandboxing, KSP/compile-time validation, language, and local-first model support; includes an honest "where Agents.KT is weaker" subsection. (#1906)