Commit bff9780

docs(tutorial): add identity fields, structured output, multi-modal content, cost/latency routing, bidirectional MCP tools, and AI testing framework
1 parent fcf0a89 commit bff9780

3 files changed

Lines changed: 316 additions & 3 deletions


docs/src/content/docs/tutorial/09-ai-endpoint.md

Lines changed: 132 additions & 3 deletions
The default implementation is `InMemoryConversationMemory`, which uses a sliding window capped at `maxHistoryMessages`.

## Identity Fields

`AiRequest` carries first-class identity fields so that adapters like Google ADK (which needs `userId`/`sessionId`) and Embabel (which needs `agentId`) can access them directly. The framework populates these from `AtmosphereResource` request attributes automatically.

| Field | Purpose | Used by |
|-------|---------|---------|
| `userId` | End-user identifier (e.g., login name) | ADK, Spring AI, rate limiting |
| `sessionId` | Session identifier for stateful backends | ADK (runner sessions) |
| `agentId` | Target agent identifier | Embabel (`AgentPlatform`) |
| `conversationId` | Conversation thread ID | Multi-turn memory, durable sessions |

### Setting Identity in an Interceptor

The cleanest pattern is an `AiInterceptor` that extracts identity from the HTTP request:

```java
public class IdentityInterceptor implements AiInterceptor {

    @Override
    public AiRequest preProcess(AiRequest request, AtmosphereResource resource) {
        var httpReq = resource.getRequest();
        return request
                .withUserId(httpReq.getHeader("X-User-Id"))
                .withSessionId(resource.uuid())
                .withConversationId(httpReq.getParameter("conversationId"));
    }
}
```

```java
@AiEndpoint(path = "/chat",
            interceptors = {IdentityInterceptor.class},
            conversationMemory = true)
```

Identity fields flow through the entire pipeline: interceptors, guardrails, RAG context providers, and the AI adapter. `AiRequest` is a record, so each `with*` method returns a new immutable copy.

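The wither pattern is easy to see with a plain record. This sketch uses a hypothetical three-field `Request` record, not the real `AiRequest`:

```java
// Hypothetical record illustrating the wither pattern AiRequest uses:
// each with* method returns a new immutable copy instead of mutating.
record Request(String message, String userId, String sessionId) {
    Request withUserId(String userId) {
        return new Request(message, userId, sessionId);
    }
    Request withSessionId(String sessionId) {
        return new Request(message, userId, sessionId);
    }
}

public class WitherDemo {
    public static void main(String[] args) {
        var original = new Request("hello", null, null);
        var updated = original.withUserId("alice").withSessionId("s-1");
        // The original is untouched; only the copy carries the identity fields
        System.out.println(original.userId() + " -> " + updated.userId()); // null -> alice
    }
}
```

Because every mutation produces a fresh copy, an interceptor can safely enrich the request without affecting other components that still hold the original reference.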
## Multi-modal Content

The `Content` sealed interface supports text, images, and files via `sendContent()`:

```java
// Text (delegates to send())
session.sendContent(Content.text("Here are the results:"));

// Image
byte[] chartPng = renderChart(data);
session.sendContent(Content.image(chartPng, "image/png"));

// File
byte[] csvBytes = exportCsv(rows);
session.sendContent(Content.file(csvBytes, "text/csv", "results.csv"));
```

The wire protocol uses structured JSON with a `contentType` discriminator:

```json
{"type":"content","contentType":"text","data":"Here are the results:","sessionId":"abc","seq":1}
{"type":"content","contentType":"image","mimeType":"image/png","data":"<base64>","sessionId":"abc","seq":2}
{"type":"content","contentType":"file","mimeType":"text/csv","fileName":"results.csv","data":"<base64>","sessionId":"abc","seq":3}
```

Binary data is base64-encoded automatically via `Image.dataBase64()` / `File.dataBase64()`.

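The encoding step can be sketched with plain JDK classes. The frame layout below follows the wire protocol shown above, but the `imageFrame` helper is a hypothetical stand-in, not framework API:

```java
import java.util.Base64;

public class WireFrameDemo {
    // Hypothetical helper: builds an image content frame shaped like the
    // wire protocol above, base64-encoding the binary payload.
    static String imageFrame(byte[] data, String mimeType, String sessionId, int seq) {
        String b64 = Base64.getEncoder().encodeToString(data);
        return "{\"type\":\"content\",\"contentType\":\"image\",\"mimeType\":\"" + mimeType
                + "\",\"data\":\"" + b64 + "\",\"sessionId\":\"" + sessionId
                + "\",\"seq\":" + seq + "}";
    }

    public static void main(String[] args) {
        byte[] png = {(byte) 0x89, 'P', 'N', 'G'}; // start of the PNG magic bytes
        System.out.println(imageFrame(png, "image/png", "abc", 2));
    }
}
```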
## Structured Output

The `StructuredOutputParser` SPI parses LLM responses into typed Java objects. The built-in `JacksonStructuredOutputParser` generates JSON Schema instructions and parses JSON output via Jackson.

```java
// Parse a complete response into a typed record
record WeatherReport(String city, double temp, String conditions) {}

StructuredOutputParser parser = ... // auto-discovered via ServiceLoader
String instructions = parser.schemaInstructions(WeatherReport.class);
// "Respond with valid JSON matching this schema: {\"type\":\"object\",\"properties\":{...}}"

WeatherReport report = parser.parse(llmOutput, WeatherReport.class);
```

For streaming, the parser can emit progressive field events:

```java
// In an adapter, as chunks arrive:
parser.parseField(chunk, WeatherReport.class)
    .ifPresent(entry -> session.emit(
        new AiEvent.StructuredField(entry.getKey(), entry.getValue(), "string")));
```

These events enable real-time UI updates -- the client can render fields as they arrive rather than waiting for the full response. When all fields are parsed, emit `EntityComplete`:

```java
session.emit(new AiEvent.EntityStart("WeatherReport", schema));
// ... StructuredField events for each field ...
session.emit(new AiEvent.EntityComplete("WeatherReport", report));
```

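The idea behind progressive field parsing can be illustrated without Jackson. This is a deliberately simplified sketch (a regex over the partial JSON buffer), not the `JacksonStructuredOutputParser` implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProgressiveParseDemo {
    // Simplified stand-in for progressive parsing: scan the partial JSON
    // buffer for completed "key":"value" pairs; incomplete pairs are ignored
    // until a later chunk closes them.
    private static final Pattern FIELD = Pattern.compile("\"(\\w+)\"\\s*:\\s*\"([^\"]*)\"");

    static Map<String, String> completedFields(String partialJson) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(partialJson);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Simulate chunks arriving from the model
        String buffer = "{\"city\":\"Tokyo\",\"condi";
        System.out.println(completedFields(buffer));   // {city=Tokyo}
        buffer += "tions\":\"sunny\"}";
        System.out.println(completedFields(buffer));   // {city=Tokyo, conditions=sunny}
    }
}
```

An adapter would diff the result against the fields already emitted and fire one `StructuredField` event per newly completed field.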
## Guardrails and Context Providers

### Guardrails

Execution order: guardrails -> interceptors -> [LLM] -> interceptors -> guardrails

### Context Providers (RAG)

`ContextProvider` classes augment the prompt with RAG context:

```java
@AiEndpoint(path = "/chat",
            contextProviders = {DocumentSearchProvider.class})
```

Enable auto-discovery to pick up all `ContextProvider` implementations on the classpath via `ServiceLoader`:

```java
@AiEndpoint(path = "/chat",
            autoDiscoverContextProviders = true)
```

Three built-in providers are available:

| Provider | Module | Description |
|----------|--------|-------------|
| `InMemoryContextProvider` | `atmosphere-rag` | Zero-dependency, word-overlap scoring |
| `SpringAiVectorStoreContextProvider` | `atmosphere-rag` | Bridges Spring AI vector stores |
| `LangChain4jEmbeddingStoreContextProvider` | `atmosphere-rag` | Bridges LangChain4j retrievers |

The `ContextProvider` SPI supports query transformation and reranking:

```java
public interface ContextProvider {
    List<Document> retrieve(String query, int maxResults);
    default String transformQuery(String originalQuery) { return originalQuery; }
    default List<Document> rerank(String query, List<Document> documents) { return documents; }
}
```

Execution order: `transformQuery()` -> `retrieve()` -> `rerank()` -> inject into `AiRequest.message`.

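Word-overlap scoring of the kind `InMemoryContextProvider` advertises can be approximated in a few lines. This sketch is an assumption about the approach (count shared words, keep the top documents), not the provider's actual source:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class WordOverlapDemo {
    static Set<String> words(String s) {
        return Arrays.stream(s.toLowerCase().split("\\W+"))
                .filter(w -> !w.isBlank())
                .collect(Collectors.toSet());
    }

    // Score each document by how many query words it shares, keep the top results.
    static List<String> retrieve(String query, List<String> docs, int maxResults) {
        Set<String> q = words(query);
        return docs.stream()
                .sorted(Comparator.comparingLong(
                        (String d) -> words(d).stream().filter(q::contains).count()).reversed())
                .limit(maxResults)
                .toList();
    }

    public static void main(String[] args) {
        var docs = List.of(
                "Atmosphere supports WebSocket and SSE transports",
                "The cafeteria menu changes weekly",
                "WebSocket connections are upgraded from HTTP");
        System.out.println(retrieve("WebSocket and SSE transports", docs, 2));
    }
}
```

A real provider would score against embedded chunks rather than raw strings, but the filter-score-truncate shape is the same.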
## Client Integration

### Vanilla TypeScript

| Concept | Purpose |
|---------|---------|
| `@AiEndpoint` | Annotation that wires up an AI chat endpoint with streaming, lifecycle, and configuration |
| `@Prompt` | Marks the method that handles user messages (invoked on a virtual thread) |
| `StreamingSession` | SPI for pushing streaming texts to clients: `send()`, `stream()`, `emit()`, `sendContent()` |
| `AiEvent` | Sealed interface with 13 structured event types (tool calls, agent steps, entities, errors) |
| `AiRequest` | Immutable record carrying message, identity fields, history, and metadata |
| `Content` | Sealed interface for multi-modal content (text, images, files) |
| `AiConfig` | Global LLM configuration (model, API key, base URL) |
| `AiInterceptor` | Pre/post processing around the prompt (cost metering, RAG, logging) |
| `AiConversationMemory` | Multi-turn conversation history per client |
| `MemoryStrategy` | Pluggable strategy for selecting which history messages to include |
| `AiGuardrail` | Safety checks before and after LLM calls |
| `ContextProvider` | RAG context augmentation with auto-discovery support |
| `StructuredOutputParser` | SPI for parsing LLM output into typed Java objects with progressive field events |
| `AiCapability` | Declares required backend capabilities; validated at startup |

In the [next chapter](/docs/tutorial/10-ai-tools/), you will learn about `@AiTool` -- Atmosphere's framework-agnostic annotation for declaring tools that any LLM can call.

docs/src/content/docs/tutorial/12-ai-filters.md

Lines changed: 126 additions & 0 deletions
Rules are evaluated in order. The first matching rule determines the target client and model. If no rule matches, the default client is used.

### Cost-based and Latency-based Routing

Beyond content-based rules, `RoutingLlmClient` supports cost and latency constraints via `ModelOption` -- a record that attaches cost, latency, and capability metadata to each model:

```java
var models = List.of(
    new RoutingRule.ModelOption(geminiClient, "gemini-2.5-flash", 0.001, 200, 80),
    new RoutingRule.ModelOption(openaiClient, "gpt-4o", 0.01, 500, 95),
    new RoutingRule.ModelOption(claudeClient, "claude-3-haiku", 0.002, 150, 70)
);

var router = RoutingLlmClient.builder(geminiClient, "gemini-2.5-flash")
    // Under budget: pick the most capable model that fits
    .route(RoutingRule.costBased(5.0, models))
    // Low latency: pick the most capable model under 300 ms
    .route(RoutingRule.latencyBased(300, models))
    // Content fallback
    .route(RoutingRule.contentBased(
        prompt -> prompt.contains("code"), openaiClient, "gpt-4o"))
    .build();
```

**Cost-based** (`CostBased`): filters models where `costPerStreamingText * maxStreamingTexts <= maxCost`, then selects the highest-capability model. This lets you use GPT-4o for short prompts and fall back to cheaper models for long ones.

**Latency-based** (`LatencyBased`): filters models where `averageLatencyMs <= maxLatencyMs`, then selects the highest-capability model. Useful for real-time UIs that need sub-second time-to-first-token.

The `ModelOption` fields:

| Field | Description |
|-------|-------------|
| `costPerStreamingText` | Cost per streaming text in arbitrary units |
| `averageLatencyMs` | Average response latency in milliseconds |
| `capability` | Capability score (higher = more capable); used for tie-breaking |

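The filter-then-pick-highest-capability behavior described above can be sketched in plain Java. The `ModelOption` record here is a local stand-in for `RoutingRule.ModelOption`, not the framework type:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class CostRoutingDemo {
    // Local stand-in for RoutingRule.ModelOption
    record ModelOption(String model, double costPerStreamingText,
                       int averageLatencyMs, int capability) {}

    // Filter models whose projected cost fits the budget,
    // then pick the highest-capability survivor.
    static Optional<ModelOption> costBased(double maxCost, int maxStreamingTexts,
                                           List<ModelOption> models) {
        return models.stream()
                .filter(m -> m.costPerStreamingText() * maxStreamingTexts <= maxCost)
                .max(Comparator.comparingInt(ModelOption::capability));
    }

    public static void main(String[] args) {
        var models = List.of(
                new ModelOption("gemini-2.5-flash", 0.001, 200, 80),
                new ModelOption("gpt-4o", 0.01, 500, 95),
                new ModelOption("claude-3-haiku", 0.002, 150, 70));

        // gpt-4o (0.01 * 1000 = 10.0 > 5.0) is filtered out; the most
        // capable remaining model wins.
        System.out.println(costBased(5.0, 1000, models).map(ModelOption::model));
    }
}
```

Latency-based routing follows the same shape with `averageLatencyMs() <= maxLatencyMs` as the filter.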
### Budget-aware Degradation

Combine routing with `StreamingTextBudgetManager` for automatic model degradation when a user or organization approaches their budget:

```java
var router = RoutingLlmClient.builder(defaultClient, "gpt-4o")
    .budgetManager(budgetManager, request -> extractOrgId(request))
    .route(RoutingRule.costBased(10.0, models))
    .build();
```

When an owner's usage exceeds the degradation threshold, the router switches to the budget manager's recommended model *before* evaluating rules. If the budget is fully exhausted, a `BudgetExceededException` is sent as an error to the client.

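The degradation decision itself is simple to picture. This sketch assumes a fractional threshold over spent-versus-budget; the real `StreamingTextBudgetManager` API and policy may differ, and `pickModel` is a hypothetical helper:

```java
public class DegradationDemo {
    // Illustrative decision only: degrade to the cheaper model when spend
    // crosses the threshold, fail hard when the budget is exhausted
    // (stands in for BudgetExceededException).
    static String pickModel(double spent, double budget, double degradeAt,
                            String preferred, String cheaper) {
        if (spent >= budget) {
            throw new IllegalStateException("budget exceeded");
        }
        return (spent / budget >= degradeAt) ? cheaper : preferred;
    }

    public static void main(String[] args) {
        System.out.println(pickModel(2.0, 10.0, 0.8, "gpt-4o", "gemini-2.5-flash")); // gpt-4o
        System.out.println(pickModel(8.5, 10.0, 0.8, "gpt-4o", "gemini-2.5-flash")); // gemini-2.5-flash
    }
}
```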
## Fan-out streaming

Fan-out sends the same prompt to multiple models simultaneously, with each model streaming texts through its own child session. The `FanOutStreamingSession` (in `org.atmosphere.ai.fanout`) orchestrates this.

The filter chain processes every streaming text in order: PII redaction first, then content safety, then cost metering. If PII redaction buffers a streaming text (waiting for a sentence boundary), it is not visible to downstream filters until the sentence is complete.

## Testing AI Endpoints

The `atmosphere-ai-test` module provides a lightweight testing framework for AI endpoints without spinning up a full server.

### Dependency

```xml
<dependency>
    <groupId>org.atmosphere</groupId>
    <artifactId>atmosphere-ai-test</artifactId>
    <version>LATEST</version>
    <scope>test</scope>
</dependency>
```

### AiTestClient

`AiTestClient` wraps an `AiSupport` implementation and captures the full streaming response for assertion:

```java
@Test
void toolsAreCalled() {
    var client = new AiTestClient(myAiSupport);
    var response = client.prompt("What's the weather in Tokyo?");

    AiAssertions.assertThat(response)
        .hasToolCall("get_weather")
            .withArgument("city", "Tokyo")
            .hasResult()
            .and()
        .containsText("Tokyo")
        .completedWithin(Duration.ofSeconds(10))
        .hasNoErrors();
}
```

### AiResponse

The captured `AiResponse` record exposes:

| Field | Type | Description |
|-------|------|-------------|
| `text()` | `String` | Full accumulated text response |
| `events()` | `List<AiEvent>` | All structured events emitted during streaming |
| `metadata()` | `Map<String, Object>` | Metadata key-value pairs |
| `errors()` | `List<String>` | Error messages, if any |
| `elapsed()` | `Duration` | Wall-clock response time |
| `completed()` | `boolean` | Whether the stream completed normally |

Filter events by type:

```java
List<AiEvent.ToolStart> toolCalls = response.eventsOfType(AiEvent.ToolStart.class);
```
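The shape of such a type filter is a short generic method over the event list. The event hierarchy below is a minimal stand-in for `AiEvent`, and this is only an assumption about how `eventsOfType` behaves:

```java
import java.util.List;

public class EventFilterDemo {
    // Minimal sealed hierarchy standing in for AiEvent
    sealed interface Event permits ToolStart, Text {}
    record ToolStart(String name) implements Event {}
    record Text(String chunk) implements Event {}

    // Filter by class, then cast: the presumed shape of eventsOfType.
    static <T extends Event> List<T> eventsOfType(List<Event> events, Class<T> type) {
        return events.stream().filter(type::isInstance).map(type::cast).toList();
    }

    public static void main(String[] args) {
        List<Event> events = List.of(new ToolStart("get_weather"),
                new Text("Sunny"), new ToolStart("search_docs"));
        System.out.println(eventsOfType(events, ToolStart.class));
    }
}
```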
### AiAssertions

Fluent assertion API that chains naturally:

```java
AiAssertions.assertThat(response)
    .containsText("weather")
    .containsEventType(AiEvent.ToolStart.class)
    .hasMetadata("routing.model")
    .isComplete()
    .hasNoErrors();
```

Tool call assertions support argument inspection:

```java
AiAssertions.assertThat(response)
    .hasToolCall("search_docs")
        .withArgument("query", "atmosphere framework")
        .hasResult()
        .and()
    .completedWithin(Duration.ofSeconds(5));
```

## Samples

- **`samples/spring-boot-ai-tools/`** -- demonstrates the `CostMeteringInterceptor` that tracks streaming text usage and sends routing metadata to the client.

docs/src/content/docs/tutorial/13-mcp.md

Lines changed: 58 additions & 0 deletions
This means an AI agent can act as a **real-time chat moderator**, **notification system**, or **admin console** -- all through the standard MCP protocol that works with Claude Desktop, Cursor, and VS Code out of the box.

## Bidirectional Tool Bridge

Most MCP implementations are one-directional: the client calls tools on the server. Atmosphere's `BiDirectionalToolBridge` enables the **server to call tools on the client** -- for example, invoking a JavaScript function in the user's browser.

### How It Works

The bridge sends a JSON-RPC tool call request over the Atmosphere transport (WebSocket/SSE) and waits for the client to respond asynchronously:

```java
var bridge = new BiDirectionalToolBridge(); // 30-second default timeout

// Call a tool on the connected client
CompletableFuture<String> result = bridge.callClientTool(
    resource,
    "getLocation",
    Map.of()
);

// Non-blocking: process the result when it arrives
result.thenAccept(location ->
    logger.info("Client location: {}", location));

// Or block (on a virtual thread):
String location = result.join();
```

### Client-Side Handler

The client must handle incoming tool call requests and respond:

```javascript
atmosphere.onMessage = function(response) {
    var msg = JSON.parse(response.responseBody);
    if (msg.type === 'tool-call') {
        var result = executeClientTool(msg.toolName, msg.arguments);
        atmosphere.push(JSON.stringify({
            type: 'tool-response',
            id: msg.id,
            result: result
        }));
    }
};
```

### Use Cases

- **Browser-side data collection**: ask the client for geolocation, local storage data, or DOM state
- **User confirmation**: request approval before executing a sensitive server-side action
- **Client-side computation**: offload work to the browser (e.g., image processing in a Web Worker)

The bridge is thread-safe, uses `ConcurrentHashMap` for pending calls, and supports custom timeouts:

```java
var bridge = new BiDirectionalToolBridge(Duration.ofSeconds(60));
```

Monitor pending calls via `bridge.pendingCount()` or `bridge.pendingCalls()` for observability.

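The correlation pattern behind such a bridge -- a pending-call map keyed by request id, completed when the matching `tool-response` arrives -- can be sketched with a `ConcurrentHashMap` of futures. This is an assumption about the internals, not `BiDirectionalToolBridge` source; `register` and `onToolResponse` are hypothetical names:

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

public class PendingCallsDemo {
    // One future per outstanding request id; removed on completion or timeout.
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Register a call before pushing {"type":"tool-call","id":requestId,...}
    // to the client over the transport.
    CompletableFuture<String> register(String requestId, Duration timeout) {
        var future = new CompletableFuture<String>()
                .orTimeout(timeout.toMillis(), TimeUnit.MILLISECONDS);
        pending.put(requestId, future);
        future.whenComplete((result, error) -> pending.remove(requestId));
        return future;
    }

    // Invoked when the client pushes {"type":"tool-response","id":...,"result":...}
    void onToolResponse(String requestId, String result) {
        var future = pending.get(requestId);
        if (future != null) future.complete(result);
    }

    int pendingCount() { return pending.size(); }

    public static void main(String[] args) {
        var bridge = new PendingCallsDemo();
        var call = bridge.register("req-1", Duration.ofSeconds(30));
        bridge.onToolResponse("req-1", "Tokyo");
        System.out.println(call.join() + ", pending=" + bridge.pendingCount());
    }
}
```

`orTimeout` ensures an unanswered call fails instead of leaking a map entry, which is why a `pendingCount()`-style metric stays bounded.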
## Sample

The `samples/spring-boot-mcp-server/` sample contains the complete `DemoMcpServer` shown above, including a chat application that the MCP tools can interact with. Run it with:
