`docs/src/content/docs/tutorial/09-ai-endpoint.md`
The default implementation is `InMemoryConversationMemory`, which uses a sliding window capped at `maxHistoryMessages`.
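As an illustration only (the class below is hypothetical, not Atmosphere's implementation), the sliding-window idea behind `InMemoryConversationMemory` can be sketched with a capped deque:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of a sliding-window history, illustrating the idea
// behind InMemoryConversationMemory's maxHistoryMessages cap.
final class SlidingWindowHistory {
    private final int maxHistoryMessages;
    private final Deque<String> messages = new ArrayDeque<>();

    SlidingWindowHistory(int maxHistoryMessages) {
        this.maxHistoryMessages = maxHistoryMessages;
    }

    void add(String message) {
        messages.addLast(message);
        // Evict the oldest message once the cap is exceeded
        if (messages.size() > maxHistoryMessages) {
            messages.removeFirst();
        }
    }

    List<String> snapshot() {
        return List.copyOf(messages);
    }
}
```

Older messages silently fall out of the window, so the prompt sent to the LLM never grows beyond the configured cap.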
## Identity Fields
`AiRequest` carries first-class identity fields so that adapters like Google ADK (which needs `userId`/`sessionId`) and Embabel (which needs `agentId`) can access them directly. The framework populates these from `AtmosphereResource` request attributes automatically.
Identity fields flow through the entire pipeline: interceptors, guardrails, RAG context providers, and the AI adapter. `AiRequest` is a record, so `withUserId()` etc. return a new immutable copy.
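The copy-on-write "wither" pattern can be pictured with a simplified record (field set reduced for illustration; the real `AiRequest` carries more state):

```java
// Simplified sketch of the immutable wither pattern used by AiRequest.
// Only the identity fields are shown here.
record AiRequestSketch(String message, String userId, String sessionId, String agentId) {
    AiRequestSketch withUserId(String newUserId) {
        // Records are immutable, so a modified copy is returned
        return new AiRequestSketch(message, newUserId, sessionId, agentId);
    }
}
```

Because the original instance is untouched, interceptors and guardrails can safely derive new requests without affecting other pipeline stages.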
## Multi-modal Content
The `Content` sealed interface supports text, images, and files via `sendContent()`:

```java
// Text (delegates to send())
session.sendContent(Content.text("Here are the results:"));
```

Binary data is base64-encoded automatically via `Image.dataBase64()` / `File.dataBase64()`.
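For intuition, the base64 step can be reproduced with the JDK alone (a standalone illustration, not Atmosphere's internals):

```java
import java.util.Base64;

public class Base64Sketch {
    public static void main(String[] args) {
        // Standalone illustration of the encoding that Image.dataBase64() /
        // File.dataBase64() perform on raw binary payloads.
        byte[] raw = "hello".getBytes();
        String encoded = Base64.getEncoder().encodeToString(raw);
        System.out.println(encoded); // prints aGVsbG8=
    }
}
```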
## Structured Output
The `StructuredOutputParser` SPI enables LLM responses to be parsed into typed Java objects. The built-in `JacksonStructuredOutputParser` generates JSON Schema instructions and parses JSON output via Jackson.

```java
// Parse a complete response into a typed record
record WeatherReport(String city, double temp, String conditions) {}

StructuredOutputParser parser = ...; // auto-discovered via ServiceLoader
```

These events enable real-time UI updates — the client can render fields as they arrive rather than waiting for the full response. When all fields have been parsed, an `EntityComplete` event is emitted.

| Type | Description |
|------|-------------|
| `AiRequest` | Immutable record carrying message, identity fields, history, and metadata |
| `Content` | Sealed interface for multi-modal content (text, images, files) |
| `AiConfig` | Global LLM configuration (model, API key, base URL) |
| `AiInterceptor` | Pre/post processing around the prompt (cost metering, RAG, logging) |
| `AiConversationMemory` | Multi-turn conversation history per client |
| `MemoryStrategy` | Pluggable strategy for selecting which history messages to include |
| `AiGuardrail` | Safety checks before and after LLM calls |
| `ContextProvider` | RAG context augmentation with auto-discovery support |
| `StructuredOutputParser` | SPI for parsing LLM output into typed Java objects with progressive field events |
| `AiCapability` | Declares required backend capabilities; validated at startup |

In the [next chapter](/docs/tutorial/10-ai-tools/), you will learn about `@AiTool` -- Atmosphere's framework-agnostic annotation for declaring tools that any LLM can call.
Rules are evaluated in order. The first matching rule determines the target client and model. If no rule matches, the default client is used.
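The first-match semantics can be sketched independently of the real builder API (the types and method names below are illustrative, not Atmosphere's):

```java
import java.util.List;
import java.util.function.Predicate;

// Illustrative first-match rule evaluation: each rule pairs a predicate
// over the prompt with a target model; the first matching rule wins,
// otherwise the default model is used.
record Rule(Predicate<String> matches, String targetModel) {}

class FirstMatchRouter {
    private final List<Rule> rules;
    private final String defaultModel;

    FirstMatchRouter(List<Rule> rules, String defaultModel) {
        this.rules = rules;
        this.defaultModel = defaultModel;
    }

    String route(String prompt) {
        for (Rule rule : rules) {
            if (rule.matches().test(prompt)) {
                return rule.targetModel(); // first match wins
            }
        }
        return defaultModel; // no rule matched
    }
}
```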
### Cost-based and Latency-based Routing
Beyond content-based rules, `RoutingLlmClient` supports cost and latency constraints via `ModelOption` — a record that attaches cost, latency, and capability metadata to each model:
**Cost-based** (`CostBased`): filters models where `costPerStreamingText * maxStreamingTexts <= maxCost`, then selects the highest-capability model. This lets you use GPT-4o for short prompts and fall back to cheaper models for long ones.
**Latency-based** (`LatencyBased`): filters models where `averageLatencyMs <= maxLatencyMs`, then selects the highest-capability model. Useful for real-time UIs that need sub-second time-to-first-token.
The `ModelOption` fields:

| Field | Description |
|-------|-------------|
| `costPerStreamingText` | Cost per streaming text in arbitrary units |
| `averageLatencyMs` | Average response latency in milliseconds |
| `capability` | Capability score (higher = more capable); used for tie-breaking |

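The cost-based rule described above can be expressed in plain Java. The record and helper below are a sketch using the documented field names, not the framework's own classes:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Sketch of the documented CostBased selection: keep models whose projected
// cost fits the budget, then pick the highest-capability survivor.
record ModelOptionSketch(String model, double costPerStreamingText,
                         long averageLatencyMs, int capability) {}

class CostBasedSelection {
    static Optional<ModelOptionSketch> select(List<ModelOptionSketch> options,
                                              int maxStreamingTexts, double maxCost) {
        return options.stream()
            // filter: costPerStreamingText * maxStreamingTexts <= maxCost
            .filter(o -> o.costPerStreamingText() * maxStreamingTexts <= maxCost)
            // tie-break on capability (higher = more capable)
            .max(Comparator.comparingInt(ModelOptionSketch::capability));
    }
}
```

Latency-based selection follows the same shape, filtering on `averageLatencyMs <= maxLatencyMs` instead of projected cost.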
### Budget-aware Degradation
Combine routing with `StreamingTextBudgetManager` for automatic model degradation when a user or organization approaches their budget:

```java
var router = RoutingLlmClient.builder(defaultClient, "gpt-4o")
    // ... (rest of the builder chain elided in this excerpt)
```

When an owner's usage exceeds the degradation threshold, the router switches to the budget manager's recommended model *before* evaluating rules. If the budget is fully exhausted, a `BudgetExceededException` is sent as an error to the client.
## Fan-out streaming
Fan-out sends the same prompt to multiple models simultaneously, with each model streaming texts through its own child session. The `FanOutStreamingSession` (in `org.atmosphere.ai.fanout`) orchestrates this.
The filter chain processes every streaming text in order: PII redaction first, then content safety, then cost metering. If PII redaction buffers a streaming text (waiting for a sentence boundary), it is not visible to downstream filters until the sentence is complete.
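The ordering and buffering guarantee can be pictured with a minimal chain sketch (interfaces and class names here are illustrative, not Atmosphere's API):

```java
import java.util.List;
import java.util.Optional;

// Illustrative filter chain: each filter may transform a streaming text or
// hold it back (empty result), in which case downstream filters never see it.
interface StreamingTextFilter {
    Optional<String> apply(String text);
}

class FilterChain {
    private final List<StreamingTextFilter> filters;

    FilterChain(List<StreamingTextFilter> filters) {
        this.filters = filters;
    }

    Optional<String> process(String text) {
        Optional<String> current = Optional.of(text);
        for (StreamingTextFilter filter : filters) {
            if (current.isEmpty()) {
                break; // buffered upstream: invisible to downstream filters
            }
            current = filter.apply(current.get());
        }
        return current;
    }
}
```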
## Testing AI Endpoints
The `atmosphere-ai-test` module provides a lightweight testing framework for AI endpoints without spinning up a full server.
### Dependency

```xml
<dependency>
  <groupId>org.atmosphere</groupId>
  <artifactId>atmosphere-ai-test</artifactId>
  <version>LATEST</version>
  <scope>test</scope>
</dependency>
```

### AiTestClient
`AiTestClient` wraps an `AiSupport` implementation and captures the full streaming response for assertion:

```java
@Test
void toolsAreCalled() {
    var client = new AiTestClient(myAiSupport);
    var response = client.prompt("What's the weather in Tokyo?");

    AiAssertions.assertThat(response)
        .hasToolCall("get_weather")
        .withArgument("city", "Tokyo")
        .hasResult()
        .and()
        .containsText("Tokyo")
        .completedWithin(Duration.ofSeconds(10))
        .hasNoErrors();
}
```

### AiResponse
The captured `AiResponse` record exposes:

| Field | Type | Description |
|-------|------|-------------|
| `text()` | `String` | Full accumulated text response |
| `events()` | `List<AiEvent>` | All structured events emitted during streaming |

- **`samples/spring-boot-ai-tools/`** -- demonstrates the `CostMeteringInterceptor` that tracks streaming text usage and sends routing metadata to the client.
`docs/src/content/docs/tutorial/13-mcp.md`
The real power of Atmosphere's MCP module is that your MCP tools have full access to the running Atmosphere application.
This means an AI agent can act as a **real-time chat moderator**, **notification system**, or **admin console** -- all through the standard MCP protocol that works with Claude Desktop, Cursor, and VS Code out of the box.
## Bidirectional Tool Bridge
Most MCP implementations are one-directional: the client calls tools on the server. Atmosphere's `BiDirectionalToolBridge` enables the **server to call tools on the client** — for example, invoking a JavaScript function in the user's browser.
### How It Works
The bridge sends a JSON-RPC tool call request over the Atmosphere transport (WebSocket/SSE) and waits for the client to respond asynchronously:

```java
var bridge = new BiDirectionalToolBridge(); // 30-second default timeout

// Call a tool on the connected client
CompletableFuture<String> result = bridge.callClientTool(
    resource,
    "getLocation",
    Map.of()
);

// Non-blocking: process the result when it arrives
result.thenAccept(location ->
    logger.info("Client location: {}", location));

// Or block (on a virtual thread):
String location = result.join();
```

### Client-Side Handler
The client must handle incoming tool call requests and respond:

```javascript
atmosphere.onMessage = function (response) {
    var msg = JSON.parse(response.responseBody);
    if (msg.type === 'tool-call') {
        var result = executeClientTool(msg.toolName, msg.arguments);
        atmosphere.push(JSON.stringify({
            type: 'tool-response',
            id: msg.id,
            result: result
        }));
    }
};
```

### Use Cases
- **Browser-side data collection**: ask the client for geolocation, local storage data, or DOM state
- **User confirmation**: request approval before executing a sensitive server-side action
- **Client-side computation**: offload work to the browser (e.g., image processing in a Web Worker)

The bridge is thread-safe, uses `ConcurrentHashMap` for pending calls, and supports custom timeouts:

```java
var bridge = new BiDirectionalToolBridge(Duration.ofSeconds(60));
```

Monitor pending calls via `bridge.pendingCount()` or `bridge.pendingCalls()` for observability.
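The pending-call bookkeeping described above can be sketched as follows (a simplified illustration of the correlation mechanism, not the bridge's actual source):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch of request/response correlation: each outgoing call is
// registered under an id; the client's 'tool-response' message completes
// the matching future and removes it from the pending map.
class PendingCallsSketch {
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    CompletableFuture<String> register(String callId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(callId, future);
        return future;
    }

    // Invoked when the client pushes a 'tool-response' message back
    void onResponse(String callId, String result) {
        CompletableFuture<String> future = pending.remove(callId);
        if (future != null) {
            future.complete(result);
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

A timeout, as in the real bridge, would simply complete the future exceptionally and evict the entry if no response arrives in time.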
## Sample
The `samples/spring-boot-mcp-server/` sample contains the complete `DemoMcpServer` shown above, including a chat application that the MCP tools can interact with. Run it with: