|
48 | 48 | - [x] Agent memory — `MemoryBank`, `memory_read`/`memory_write`/`memory_search` auto-injected tools |
49 | 49 | - [ ] `.spawn {}` — independent sub-agent lifecycle, `AgentHandle<OUT>`, parent-managed join |
50 | 50 | - [ ] `Flow<PipelineEvent>` for reactive UIs + Pipeline-level events (`StageStarted`, `PipelineCompleted`, etc.) — depends on streaming, sub-agents, sessions |
| 51 | +- [ ] **Multimodal input** — vision and audio content blocks on LLM messages. |
| 52 | + - **Image input:** vision-capable adapters accept image bytes + media type as a content block alongside text. Targets: Anthropic (`image` content blocks), OpenAI (`image_url` / base64 in content), Ollama (`llava` / `bakllava` via `images` field), Google Gemini. |
| 53 | + - **Audio input:** true audio input (Gemini, GPT-4o-audio) — `LlmContent.Audio` block. Optional STT-only helper `audio.transcribe(file)` for the Whisper-style use case. |
| 54 | + - **Architectural change:** `LlmMessage.content: String` needs to evolve into a `List<LlmContent>` sealed type (Text / Image / Audio blocks). Binary-compat risk: changing the field's type in place would break existing callers, so add a sibling `contentBlocks: List<LlmContent>?` field first, with the existing String form auto-coerced into a single Text block, and deprecate the String form once the API surface settles. Typed boundaries are unaffected — `Agent<Image, String>` (image classifier) and `Agent<AudioClip, String>` (transcriber) become coherent agent shapes. |
51 | 55 | - [ ] Serialization — `agent.json`, A2A AgentCard |
52 | 56 | - [ ] JAR bundles and folder-based assembly |
53 | 57 | - [ ] Gradle plugin |
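The `contentBlocks` migration described in the Multimodal input item above could look roughly like the following. This is a minimal sketch, not the project's actual API: only `LlmMessage`, `LlmContent`, the Text / Image / Audio block names, and `contentBlocks` come from the roadmap text. The `role` field, the `mediaType` and `bytes` property names, and the `effectiveBlocks` helper are assumptions made for illustration.

```kotlin
// Sketch only. Field names beyond `content` and `contentBlocks` are assumed.
sealed interface LlmContent {
    data class Text(val text: String) : LlmContent

    // Plain classes (not data classes) because ByteArray uses reference equality.
    class Image(val bytes: ByteArray, val mediaType: String) : LlmContent
    class Audio(val bytes: ByteArray, val mediaType: String) : LlmContent
}

data class LlmMessage(
    val role: String,                              // assumed field
    val content: String,                           // existing String form, kept for compatibility
    val contentBlocks: List<LlmContent>? = null,   // new sibling field, null for legacy callers
) {
    // Hypothetical helper: explicit blocks win; otherwise the String form
    // is coerced into a single Text block.
    val effectiveBlocks: List<LlmContent>
        get() = contentBlocks ?: listOf(LlmContent.Text(content))
}

fun main() {
    // Legacy caller: nothing changes at the call site.
    val legacy = LlmMessage(role = "user", content = "Describe this picture")
    println(legacy.effectiveBlocks.size) // 1

    // Multimodal caller: text and image travel together as blocks.
    val multimodal = LlmMessage(
        role = "user",
        content = "",
        contentBlocks = listOf(
            LlmContent.Text("What is in this image?"),
            LlmContent.Image(byteArrayOf(1, 2, 3), "image/png"),
        ),
    )
    println(multimodal.effectiveBlocks.size) // 2
}
```

Adapters would then read only `effectiveBlocks`, so the String form can be deprecated later without touching adapter code.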
|
60 | 64 | - [ ] File-based knowledge: `skill.md`, `reference`, `examples`, `checklist` + RAG pipeline |
61 | 65 | - [ ] Production observability: OpenTelemetry traces |
62 | 66 | - [ ] Team DSL — swarm coordination (if isolated execution available) |
63 | | -- [ ] **Sandboxed tool execution** — `SandboxedExecutor` interface with three backends, opt-in per tool (`tool(..., sandbox = ...)`) or per skill (`sandbox { }` block). Default executor stays in-process for backward compatibility. |
64 | | - - `ProcessSandbox` — subprocess executor with env / cwd / timeout / network constraints. Backends: `sandbox-exec` on macOS (built into the OS), `bwrap` or `firejail` on Linux. Falls back to plain `ProcessBuilder` with a loud warning on platforms with no native sandboxing tool. **Most pragmatic** — every dev box has at least one path. |
| 67 | +- [ ] **Generative outputs (image + audio)** — sibling client interfaces to `ModelClient` for non-chat model families. |
| 68 | + - `ImageModelClient.generate(prompt, options): ImageBytes` — text → image. Adapters: OpenAI DALL-E 3, Google Imagen, Stability. Optional streaming via `generateStream(...): Flow<LlmChunk.ImageDelta>` for partial-preview UX. |
| 69 | + - `TTSModelClient.synthesize(text, voice, options): AudioBytes` — text → speech. Adapters: OpenAI TTS, ElevenLabs, Google Cloud TTS. Streaming via `LlmChunk.AudioDelta(pcmChunk)` for low-latency playback (relevant for IDE voice agents, chat UIs). |
| 70 | + - These keep the typed-boundary identity: `Agent<String, ImageBytes>` and `Agent<TextRequest, AudioBytes>` are first-class. Composition operators (`then`, `wrap`) work unchanged across modalities. |
| 71 | +- [ ] **Sandboxed tool execution** — `SandboxedExecutor` interface with three backends, opt-in per tool (`tool(..., sandbox = ...)`) or per skill (`sandbox { }` block). Default executor stays in-process for backward compatibility. **Scope (lesson from Claude Code's implementation):** the sandbox applies only to subprocess-shaped tools, i.e. tools whose executor shells out via `ProcessBuilder` or invokes external binaries. In-process Kotlin lambdas don't get OS-level isolation because `grants { }` + frozen agents already bound them; bolting on a sandbox there would be overkill that just makes the framework feel heavier. |
| 72 | + - `ProcessSandbox` — subprocess executor with env / cwd / timeout / network constraints. Backends: **Seatbelt** on macOS (the framework behind `sandbox-exec`; built into the OS), `bwrap` (bubblewrap) on Linux as the primary, `firejail` as the fallback. WSL2 behaves the same as Linux; WSL1 is unsupported (no namespace support). Falls back to plain `ProcessBuilder` with a loud warning on platforms with no native sandboxing tool. **Most pragmatic** — every dev box has at least one path. Cribs profile shape + socat-proxy plumbing from [`anthropic-experimental/sandbox-runtime`](https://github.com/anthropic-experimental/sandbox-runtime) (Anthropic's open-source Linux bwrap reference). |
| 73 | + - **Network sub-policy:** outbound blocked by default; allowlist via `sandbox.network.allowedDomains`. A proxy server (running outside the sandbox) intercepts DNS + connections and gates by hostname. **TLS caveat:** the default proxy doesn't terminate TLS — it allows/denies by hostname only. Allowing broad domains (`github.com`, `googleapis.com`) leaves room for domain-fronting; consumers needing real traffic inspection plug in their own MITM proxy. Document this explicitly so it's not a surprise. |
| 74 | + - **Permission/sandbox interaction:** sandbox path config and `grants { }` path config *merge* — both layers apply (matches Claude Code semantics). Sandbox cannot accidentally widen what `grants` denies. A tool with both must satisfy both. |
65 | 75 | - `WasmSandbox` — JAR-embedded WASM runtime via Chicory (pure-Java; no host setup). Tools compiled to WASM; filesystem and network capabilities granted explicitly at registration. **Most truly embedded** — works anywhere a JVM runs. |
66 | 76 | - `DockerSandbox` — opt-in extras module (`agents-kt-docker-sandbox`) via `docker-java`. Talks to whatever Docker daemon the host already runs. **Not embeddable** — library ships in the JAR, daemon does not. For teams that already operate Docker. |
67 | 77 | - Why this axis matters: today `grants { tools(writeFile, compile) }` controls *which* tools an agent can call; sandboxing controls *what those tools can do* once invoked. Pairs with frozen agents + typed args to give a security model that's strictly stronger than "trust the executor lambda." |
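The two policy rules in the sandbox item above (hostname-only network gating, and sandbox paths merging with `grants { }` paths so that both layers must agree) can be sketched as follows. Everything here is hypothetical: `NetworkPolicy`, `PathPolicy`, and `mayAccess` are illustrative names, not part of the framework, and treating an allowlisted domain as also covering its subdomains is an assumption the roadmap does not state.

```kotlin
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical hostname gate: decides by name only, no TLS termination.
class NetworkPolicy(allowedDomains: Set<String>) {
    private val allowed = allowedDomains.map { it.lowercase().trimEnd('.') }.toSet()

    // Assumption: an allowlisted domain also admits its subdomains.
    fun isAllowed(host: String): Boolean {
        val h = host.lowercase().trimEnd('.')
        return allowed.any { h == it || h.endsWith(".$it") }
    }
}

// Hypothetical path policy: a path is permitted if it sits under any allowed root.
class PathPolicy(allowedRoots: List<Path>) {
    private val roots = allowedRoots.map { it.toAbsolutePath().normalize() }

    fun permits(path: Path): Boolean {
        // Normalize first so ".." segments cannot escape a root.
        val p = path.toAbsolutePath().normalize()
        return roots.any { p.startsWith(it) }
    }
}

// Merge semantics: both layers apply, so the sandbox can never widen
// what grants denies (and vice versa).
fun mayAccess(path: Path, grants: PathPolicy, sandbox: PathPolicy): Boolean =
    grants.permits(path) && sandbox.permits(path)

fun main() {
    val net = NetworkPolicy(setOf("github.com"))
    println(net.isAllowed("api.github.com"))          // true
    println(net.isAllowed("github.com.evil.example")) // false

    val grants = PathPolicy(listOf(Paths.get("/work")))
    val sandbox = PathPolicy(listOf(Paths.get("/work/build"), Paths.get("/tmp")))
    println(mayAccess(Paths.get("/work/build/out.jar"), grants, sandbox))   // true
    println(mayAccess(Paths.get("/tmp/x"), grants, sandbox))                // false: grants denies
    println(mayAccess(Paths.get("/work/build/../secret"), grants, sandbox)) // false: sandbox denies
}
```

A name-only check like `isAllowed` cannot see what travels inside the TLS session, which is why the TLS caveat above suggests a custom MITM proxy for consumers who need real traffic inspection.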
|