Commit bff9780

docs(tutorial): add identity fields, structured output, multi-modal content, cost/latency routing, bidirectional MCP tools, and AI testing framework
1 parent fcf0a89 commit bff9780

3 files changed

Lines changed: 316 additions & 3 deletions


docs/src/content/docs/tutorial/09-ai-endpoint.md

Lines changed: 132 additions & 3 deletions
The default implementation is `InMemoryConversationMemory`, which uses a sliding window capped at `maxHistoryMessages`.

## Identity Fields

`AiRequest` carries first-class identity fields so that adapters like Google ADK (which needs `userId`/`sessionId`) and Embabel (which needs `agentId`) can access them directly. The framework populates these from `AtmosphereResource` request attributes automatically.

| Field | Purpose | Used by |
|-------|---------|---------|
| `userId` | End-user identifier (e.g., login name) | ADK, Spring AI, rate limiting |
| `sessionId` | Session identifier for stateful backends | ADK (runner sessions) |
| `agentId` | Target agent identifier | Embabel (`AgentPlatform`) |
| `conversationId` | Conversation thread ID | Multi-turn memory, durable sessions |

### Setting Identity in an Interceptor

The cleanest pattern is an `AiInterceptor` that extracts identity from the HTTP request:

```java
public class IdentityInterceptor implements AiInterceptor {

    @Override
    public AiRequest preProcess(AiRequest request, AtmosphereResource resource) {
        var httpReq = resource.getRequest();
        return request
                .withUserId(httpReq.getHeader("X-User-Id"))
                .withSessionId(resource.uuid())
                .withConversationId(httpReq.getParameter("conversationId"));
    }
}
```

```java
@AiEndpoint(path = "/chat",
            interceptors = {IdentityInterceptor.class},
            conversationMemory = true)
```

Identity fields flow through the entire pipeline: interceptors, guardrails, RAG context providers, and the AI adapter. `AiRequest` is a record, so each `with*` method returns a new immutable copy.

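The wither pattern is easy to see with a plain record. This sketch uses a hypothetical three-field `Request` record, not the real `AiRequest`:

```java
// Hypothetical record illustrating the wither pattern AiRequest uses:
// each with* method returns a new immutable copy instead of mutating.
record Request(String message, String userId, String sessionId) {
    Request withUserId(String userId) {
        return new Request(message, userId, sessionId);
    }
    Request withSessionId(String sessionId) {
        return new Request(message, userId, sessionId);
    }
}

public class WitherDemo {
    public static void main(String[] args) {
        var original = new Request("hello", null, null);
        var updated = original.withUserId("alice").withSessionId("s-1");
        // The original is untouched; only the copy carries the identity fields
        System.out.println(original.userId() + " -> " + updated.userId()); // null -> alice
    }
}
```

Because every mutation produces a fresh copy, an interceptor can safely enrich the request without affecting other components that still hold the original reference.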
## Multi-modal Content

The `Content` sealed interface supports text, images, and files via `sendContent()`:

```java
// Text (delegates to send())
session.sendContent(Content.text("Here are the results:"));

// Image
byte[] chartPng = renderChart(data);
session.sendContent(Content.image(chartPng, "image/png"));

// File
byte[] csvBytes = exportCsv(rows);
session.sendContent(Content.file(csvBytes, "text/csv", "results.csv"));
```

The wire protocol uses structured JSON with a `contentType` discriminator:

```json
{"type":"content","contentType":"text","data":"Here are the results:","sessionId":"abc","seq":1}
{"type":"content","contentType":"image","mimeType":"image/png","data":"<base64>","sessionId":"abc","seq":2}
{"type":"content","contentType":"file","mimeType":"text/csv","fileName":"results.csv","data":"<base64>","sessionId":"abc","seq":3}
```

Binary data is base64-encoded automatically via `Image.dataBase64()` / `File.dataBase64()`.

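The encoding step can be sketched with plain JDK classes. The frame layout below follows the wire protocol shown above, but the `imageFrame` helper is a hypothetical stand-in, not framework API:

```java
import java.util.Base64;

public class WireFrameDemo {
    // Hypothetical helper: builds an image content frame shaped like the
    // wire protocol above, base64-encoding the binary payload.
    static String imageFrame(byte[] data, String mimeType, String sessionId, int seq) {
        String b64 = Base64.getEncoder().encodeToString(data);
        return "{\"type\":\"content\",\"contentType\":\"image\",\"mimeType\":\"" + mimeType
                + "\",\"data\":\"" + b64 + "\",\"sessionId\":\"" + sessionId
                + "\",\"seq\":" + seq + "}";
    }

    public static void main(String[] args) {
        byte[] png = {(byte) 0x89, 'P', 'N', 'G'}; // start of the PNG magic bytes
        System.out.println(imageFrame(png, "image/png", "abc", 2));
    }
}
```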
## Structured Output

The `StructuredOutputParser` SPI parses LLM responses into typed Java objects. The built-in `JacksonStructuredOutputParser` generates JSON Schema instructions and parses JSON output via Jackson.

```java
// Parse a complete response into a typed record
record WeatherReport(String city, double temp, String conditions) {}

StructuredOutputParser parser = ... // auto-discovered via ServiceLoader
String instructions = parser.schemaInstructions(WeatherReport.class);
// "Respond with valid JSON matching this schema: {\"type\":\"object\",\"properties\":{...}}"

WeatherReport report = parser.parse(llmOutput, WeatherReport.class);
```

For streaming, the parser can emit progressive field events:

```java
// In an adapter, as chunks arrive:
parser.parseField(chunk, WeatherReport.class)
    .ifPresent(entry -> session.emit(
        new AiEvent.StructuredField(entry.getKey(), entry.getValue(), "string")));
```

These events enable real-time UI updates -- the client can render fields as they arrive rather than waiting for the full response. When all fields are parsed, emit `EntityComplete`:

```java
session.emit(new AiEvent.EntityStart("WeatherReport", schema));
// ... StructuredField events for each field ...
session.emit(new AiEvent.EntityComplete("WeatherReport", report));
```

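The idea behind progressive field parsing can be illustrated without Jackson. This is a deliberately simplified sketch (a regex over the partial JSON buffer), not the `JacksonStructuredOutputParser` implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProgressiveParseDemo {
    // Simplified stand-in for progressive parsing: scan the partial JSON
    // buffer for completed "key":"value" pairs; incomplete pairs are ignored
    // until a later chunk closes them.
    private static final Pattern FIELD = Pattern.compile("\"(\\w+)\"\\s*:\\s*\"([^\"]*)\"");

    static Map<String, String> completedFields(String partialJson) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(partialJson);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Simulate chunks arriving from the model
        String buffer = "{\"city\":\"Tokyo\",\"condi";
        System.out.println(completedFields(buffer));   // {city=Tokyo}
        buffer += "tions\":\"sunny\"}";
        System.out.println(completedFields(buffer));   // {city=Tokyo, conditions=sunny}
    }
}
```

An adapter would diff the result against the fields already emitted and fire one `StructuredField` event per newly completed field.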
## Guardrails and Context Providers

### Guardrails

Execution order: guardrails -> interceptors -> [LLM] -> interceptors -> guardrails

### Context Providers (RAG)

`ContextProvider` classes augment the prompt with RAG context:

```java
@AiEndpoint(path = "/chat",
            contextProviders = {DocumentSearchProvider.class})
```

Enable auto-discovery to pick up all `ContextProvider` implementations on the classpath via `ServiceLoader`:

```java
@AiEndpoint(path = "/chat",
            autoDiscoverContextProviders = true)
```

Three built-in providers are available:

| Provider | Module | Description |
|----------|--------|-------------|
| `InMemoryContextProvider` | `atmosphere-rag` | Zero-dependency, word-overlap scoring |
| `SpringAiVectorStoreContextProvider` | `atmosphere-rag` | Bridges Spring AI vector stores |
| `LangChain4jEmbeddingStoreContextProvider` | `atmosphere-rag` | Bridges LangChain4j retrievers |

The `ContextProvider` SPI supports query transformation and reranking:

```java
public interface ContextProvider {
    List<Document> retrieve(String query, int maxResults);
    default String transformQuery(String originalQuery) { return originalQuery; }
    default List<Document> rerank(String query, List<Document> documents) { return documents; }
}
```

Execution order: `transformQuery()` -> `retrieve()` -> `rerank()` -> inject into `AiRequest.message`.

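Word-overlap scoring of the kind `InMemoryContextProvider` advertises can be approximated in a few lines. This sketch is an assumption about the approach (count shared words, keep the top documents), not the provider's actual source:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class WordOverlapDemo {
    static Set<String> words(String s) {
        return Arrays.stream(s.toLowerCase().split("\\W+"))
                .filter(w -> !w.isBlank())
                .collect(Collectors.toSet());
    }

    // Score each document by how many query words it shares, keep the top results.
    static List<String> retrieve(String query, List<String> docs, int maxResults) {
        Set<String> q = words(query);
        return docs.stream()
                .sorted(Comparator.comparingLong(
                        (String d) -> words(d).stream().filter(q::contains).count()).reversed())
                .limit(maxResults)
                .toList();
    }

    public static void main(String[] args) {
        var docs = List.of(
                "Atmosphere supports WebSocket and SSE transports",
                "The cafeteria menu changes weekly",
                "WebSocket connections are upgraded from HTTP");
        System.out.println(retrieve("WebSocket and SSE transports", docs, 2));
    }
}
```

A real provider would score against embedded chunks rather than raw strings, but the filter-score-truncate shape is the same.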
## Client Integration

### Vanilla TypeScript

| Concept | Purpose |
|---------|---------|
| `@AiEndpoint` | Annotation that wires up an AI chat endpoint with streaming, lifecycle, and configuration |
| `@Prompt` | Marks the method that handles user messages (invoked on a virtual thread) |
| `StreamingSession` | SPI for pushing streaming texts to clients: `send()`, `stream()`, `emit()`, `sendContent()` |
| `AiEvent` | Sealed interface with 13 structured event types (tool calls, agent steps, entities, errors) |
| `AiRequest` | Immutable record carrying message, identity fields, history, and metadata |
| `Content` | Sealed interface for multi-modal content (text, images, files) |
| `AiConfig` | Global LLM configuration (model, API key, base URL) |
| `AiInterceptor` | Pre/post processing around the prompt (cost metering, RAG, logging) |
| `AiConversationMemory` | Multi-turn conversation history per client |
| `MemoryStrategy` | Pluggable strategy for selecting which history messages to include |
| `AiGuardrail` | Safety checks before and after LLM calls |
| `ContextProvider` | RAG context augmentation with auto-discovery support |
| `StructuredOutputParser` | SPI for parsing LLM output into typed Java objects with progressive field events |
| `AiCapability` | Declares required backend capabilities; validated at startup |

In the [next chapter](/docs/tutorial/10-ai-tools/), you will learn about `@AiTool` -- Atmosphere's framework-agnostic annotation for declaring tools that any LLM can call.

docs/src/content/docs/tutorial/12-ai-filters.md

Lines changed: 126 additions & 0 deletions
Rules are evaluated in order. The first matching rule determines the target client and model. If no rule matches, the default client is used.

### Cost-based and Latency-based Routing

Beyond content-based rules, `RoutingLlmClient` supports cost and latency constraints via `ModelOption` -- a record that attaches cost, latency, and capability metadata to each model:

```java
var models = List.of(
    new RoutingRule.ModelOption(geminiClient, "gemini-2.5-flash", 0.001, 200, 80),
    new RoutingRule.ModelOption(openaiClient, "gpt-4o", 0.01, 500, 95),
    new RoutingRule.ModelOption(claudeClient, "claude-3-haiku", 0.002, 150, 70)
);

var router = RoutingLlmClient.builder(geminiClient, "gemini-2.5-flash")
    // Under budget: pick the most capable model that fits
    .route(RoutingRule.costBased(5.0, models))
    // Low latency: pick the most capable model under 300 ms
    .route(RoutingRule.latencyBased(300, models))
    // Content fallback
    .route(RoutingRule.contentBased(
        prompt -> prompt.contains("code"), openaiClient, "gpt-4o"))
    .build();
```

**Cost-based** (`CostBased`): filters models where `costPerStreamingText * maxStreamingTexts <= maxCost`, then selects the highest-capability model. This lets you use GPT-4o for short prompts and fall back to cheaper models for long ones.

**Latency-based** (`LatencyBased`): filters models where `averageLatencyMs <= maxLatencyMs`, then selects the highest-capability model. Useful for real-time UIs that need sub-second time-to-first-token.

The `ModelOption` fields:

| Field | Description |
|-------|-------------|
| `costPerStreamingText` | Cost per streaming text in arbitrary units |
| `averageLatencyMs` | Average response latency in milliseconds |
| `capability` | Capability score (higher = more capable); used for tie-breaking |

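The filter-then-pick-highest-capability behavior described above can be sketched in plain Java. The `ModelOption` record here is a local stand-in for `RoutingRule.ModelOption`, not the framework type:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class CostRoutingDemo {
    // Local stand-in for RoutingRule.ModelOption
    record ModelOption(String model, double costPerStreamingText,
                       int averageLatencyMs, int capability) {}

    // Filter models whose projected cost fits the budget,
    // then pick the highest-capability survivor.
    static Optional<ModelOption> costBased(double maxCost, int maxStreamingTexts,
                                           List<ModelOption> models) {
        return models.stream()
                .filter(m -> m.costPerStreamingText() * maxStreamingTexts <= maxCost)
                .max(Comparator.comparingInt(ModelOption::capability));
    }

    public static void main(String[] args) {
        var models = List.of(
                new ModelOption("gemini-2.5-flash", 0.001, 200, 80),
                new ModelOption("gpt-4o", 0.01, 500, 95),
                new ModelOption("claude-3-haiku", 0.002, 150, 70));

        // gpt-4o (0.01 * 1000 = 10.0 > 5.0) is filtered out; the most
        // capable remaining model wins.
        System.out.println(costBased(5.0, 1000, models).map(ModelOption::model));
    }
}
```

Latency-based routing follows the same shape with `averageLatencyMs() <= maxLatencyMs` as the filter.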
### Budget-aware Degradation

Combine routing with `StreamingTextBudgetManager` for automatic model degradation when a user or organization approaches their budget:

```java
var router = RoutingLlmClient.builder(defaultClient, "gpt-4o")
    .budgetManager(budgetManager, request -> extractOrgId(request))
    .route(RoutingRule.costBased(10.0, models))
    .build();
```

When an owner's usage exceeds the degradation threshold, the router switches to the budget manager's recommended model *before* evaluating rules. If the budget is fully exhausted, a `BudgetExceededException` is sent as an error to the client.

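The degradation decision itself is simple to picture. This sketch assumes a fractional threshold over spent-versus-budget; the real `StreamingTextBudgetManager` API and policy may differ, and `pickModel` is a hypothetical helper:

```java
public class DegradationDemo {
    // Illustrative decision only: degrade to the cheaper model when spend
    // crosses the threshold, fail hard when the budget is exhausted
    // (stands in for BudgetExceededException).
    static String pickModel(double spent, double budget, double degradeAt,
                            String preferred, String cheaper) {
        if (spent >= budget) {
            throw new IllegalStateException("budget exceeded");
        }
        return (spent / budget >= degradeAt) ? cheaper : preferred;
    }

    public static void main(String[] args) {
        System.out.println(pickModel(2.0, 10.0, 0.8, "gpt-4o", "gemini-2.5-flash")); // gpt-4o
        System.out.println(pickModel(8.5, 10.0, 0.8, "gpt-4o", "gemini-2.5-flash")); // gemini-2.5-flash
    }
}
```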
## Fan-out streaming

Fan-out sends the same prompt to multiple models simultaneously, with each model streaming texts through its own child session. The `FanOutStreamingSession` (in `org.atmosphere.ai.fanout`) orchestrates this.

The filter chain processes every streaming text in order: PII redaction first, then content safety, then cost metering. If PII redaction buffers a streaming text (waiting for a sentence boundary), it is not visible to downstream filters until the sentence is complete.

## Testing AI Endpoints

The `atmosphere-ai-test` module provides a lightweight testing framework for AI endpoints without spinning up a full server.

### Dependency

```xml
<dependency>
    <groupId>org.atmosphere</groupId>
    <artifactId>atmosphere-ai-test</artifactId>
    <version>LATEST</version>
    <scope>test</scope>
</dependency>
```

### AiTestClient

`AiTestClient` wraps an `AiSupport` implementation and captures the full streaming response for assertion:

```java
@Test
void toolsAreCalled() {
    var client = new AiTestClient(myAiSupport);
    var response = client.prompt("What's the weather in Tokyo?");

    AiAssertions.assertThat(response)
        .hasToolCall("get_weather")
            .withArgument("city", "Tokyo")
            .hasResult()
            .and()
        .containsText("Tokyo")
        .completedWithin(Duration.ofSeconds(10))
        .hasNoErrors();
}
```

### AiResponse

The captured `AiResponse` record exposes:

| Field | Type | Description |
|-------|------|-------------|
| `text()` | `String` | Full accumulated text response |
| `events()` | `List<AiEvent>` | All structured events emitted during streaming |
| `metadata()` | `Map<String, Object>` | Metadata key-value pairs |
| `errors()` | `List<String>` | Error messages, if any |
| `elapsed()` | `Duration` | Wall-clock response time |
| `completed()` | `boolean` | Whether the stream completed normally |

Filter events by type:

```java
List<AiEvent.ToolStart> toolCalls = response.eventsOfType(AiEvent.ToolStart.class);
```
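The shape of such a type filter is a short generic method over the event list. The event hierarchy below is a minimal stand-in for `AiEvent`, and this is only an assumption about how `eventsOfType` behaves:

```java
import java.util.List;

public class EventFilterDemo {
    // Minimal sealed hierarchy standing in for AiEvent
    sealed interface Event permits ToolStart, Text {}
    record ToolStart(String name) implements Event {}
    record Text(String chunk) implements Event {}

    // Filter by class, then cast: the presumed shape of eventsOfType.
    static <T extends Event> List<T> eventsOfType(List<Event> events, Class<T> type) {
        return events.stream().filter(type::isInstance).map(type::cast).toList();
    }

    public static void main(String[] args) {
        List<Event> events = List.of(new ToolStart("get_weather"),
                new Text("Sunny"), new ToolStart("search_docs"));
        System.out.println(eventsOfType(events, ToolStart.class));
    }
}
```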
### AiAssertions

Fluent assertion API that chains naturally:

```java
AiAssertions.assertThat(response)
    .containsText("weather")
    .containsEventType(AiEvent.ToolStart.class)
    .hasMetadata("routing.model")
    .isComplete()
    .hasNoErrors();
```

Tool call assertions support argument inspection:

```java
AiAssertions.assertThat(response)
    .hasToolCall("search_docs")
        .withArgument("query", "atmosphere framework")
        .hasResult()
        .and()
    .completedWithin(Duration.ofSeconds(5));
```

## Samples

- **`samples/spring-boot-ai-tools/`** -- demonstrates the `CostMeteringInterceptor` that tracks streaming text usage and sends routing metadata to the client.

docs/src/content/docs/tutorial/13-mcp.md

Lines changed: 58 additions & 0 deletions
This means an AI agent can act as a **real-time chat moderator**, **notification system**, or **admin console** -- all through the standard MCP protocol that works with Claude Desktop, Cursor, and VS Code out of the box.

## Bidirectional Tool Bridge

Most MCP implementations are one-directional: the client calls tools on the server. Atmosphere's `BiDirectionalToolBridge` enables the **server to call tools on the client** -- for example, invoking a JavaScript function in the user's browser.

### How It Works

The bridge sends a JSON-RPC tool call request over the Atmosphere transport (WebSocket/SSE) and waits for the client to respond asynchronously:

```java
var bridge = new BiDirectionalToolBridge(); // 30-second default timeout

// Call a tool on the connected client
CompletableFuture<String> result = bridge.callClientTool(
    resource,
    "getLocation",
    Map.of()
);

// Non-blocking: process the result when it arrives
result.thenAccept(location ->
    logger.info("Client location: {}", location));

// Or block (on a virtual thread):
String location = result.join();
```

### Client-Side Handler

The client must handle incoming tool call requests and respond:

```javascript
atmosphere.onMessage = function(response) {
    var msg = JSON.parse(response.responseBody);
    if (msg.type === 'tool-call') {
        var result = executeClientTool(msg.toolName, msg.arguments);
        atmosphere.push(JSON.stringify({
            type: 'tool-response',
            id: msg.id,
            result: result
        }));
    }
};
```

### Use Cases

- **Browser-side data collection**: ask the client for geolocation, local storage data, or DOM state
- **User confirmation**: request approval before executing a sensitive server-side action
- **Client-side computation**: offload work to the browser (e.g., image processing in a Web Worker)

The bridge is thread-safe, uses `ConcurrentHashMap` for pending calls, and supports custom timeouts:

```java
var bridge = new BiDirectionalToolBridge(Duration.ofSeconds(60));
```

Monitor pending calls via `bridge.pendingCount()` or `bridge.pendingCalls()` for observability.

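The correlation pattern behind such a bridge -- a pending-call map keyed by request id, completed when the matching `tool-response` arrives -- can be sketched with a `ConcurrentHashMap` of futures. This is an assumption about the internals, not `BiDirectionalToolBridge` source; `register` and `onToolResponse` are hypothetical names:

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

public class PendingCallsDemo {
    // One future per outstanding request id; removed on completion or timeout.
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Register a call before pushing {"type":"tool-call","id":requestId,...}
    // to the client over the transport.
    CompletableFuture<String> register(String requestId, Duration timeout) {
        var future = new CompletableFuture<String>()
                .orTimeout(timeout.toMillis(), TimeUnit.MILLISECONDS);
        pending.put(requestId, future);
        future.whenComplete((result, error) -> pending.remove(requestId));
        return future;
    }

    // Invoked when the client pushes {"type":"tool-response","id":...,"result":...}
    void onToolResponse(String requestId, String result) {
        var future = pending.get(requestId);
        if (future != null) future.complete(result);
    }

    int pendingCount() { return pending.size(); }

    public static void main(String[] args) {
        var bridge = new PendingCallsDemo();
        var call = bridge.register("req-1", Duration.ofSeconds(30));
        bridge.onToolResponse("req-1", "Tokyo");
        System.out.println(call.join() + ", pending=" + bridge.pendingCount());
    }
}
```

`orTimeout` ensures an unanswered call fails instead of leaking a map entry, which is why a `pendingCount()`-style metric stays bounded.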
## Sample

The `samples/spring-boot-mcp-server/` sample contains the complete `DemoMcpServer` shown above, including a chat application that the MCP tools can interact with. Run it with:
