Commit 8ae4b6b

Author: Xiting Zhang
Telemetry parity with Python SDK, sample and docs alignment

1 parent f01fcc7 commit 8ae4b6b

18 files changed: 1592 additions & 240 deletions

sdk/voicelive/azure-ai-voicelive/CHANGELOG.md

Lines changed: 6 additions & 4 deletions

@@ -7,10 +7,12 @@
 - Added built-in OpenTelemetry tracing support for voice sessions following GenAI Semantic Conventions:
   - `VoiceLiveClientBuilder.openTelemetry(OpenTelemetry)` method for providing a custom OpenTelemetry instance
   - Defaults to `GlobalOpenTelemetry.getOrNoop()` for automatic Java agent detection with zero-cost no-op fallback
-  - Emits spans for `connect`, `send`, `recv`, and `close` operations with voice-specific attributes
-  - Session-level counters: turn count, interruption count, audio bytes sent/received, first token latency
-  - Per-message attributes: token usage, event types, error details
-  - Content recording controlled via `enableContentRecording(boolean)` or `AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED` environment variable
+  - Emits spans for `connect`, `send`, `recv`, and `close` operations with Python-aligned VoiceLive telemetry semantics
+  - Session-level counters: turn count, interruption count, audio bytes sent/received, first token latency, MCP call/list-tools counts
+  - Tracks response and item hierarchy IDs (`response_id`, `conversation_id`, `item_id`, `call_id`, `previous_item_id`, `output_index`) on send/recv spans
+  - Captures agent/session config attributes on connect spans (`gen_ai.agent.*`, `gen_ai.system_instructions`, `gen_ai.request.*`)
+  - Adds OpenTelemetry metrics (`gen_ai.client.operation.duration`, `gen_ai.client.token.usage`) with provider/server/model dimensions
+  - Content recording controlled via `enableContentRecording(boolean)` or `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` (with legacy `AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED` fallback)
 - Added `TelemetrySample.java` demonstrating OpenTelemetry integration patterns

 ### Breaking Changes
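The session-level counters this changelog entry describes (turn count, interruption count, audio bytes, first-token latency) amount to simple monotonic state that a session can aggregate with atomics. A minimal sketch of that bookkeeping, assuming nothing about the SDK's internals (the class and method names below are hypothetical, not the SDK's actual implementation):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical aggregator mirroring the session-level counters the
// changelog describes; not the SDK's actual implementation.
final class SessionCounters {
    private final AtomicLong turnCount = new AtomicLong();
    private final AtomicLong interruptionCount = new AtomicLong();
    private final AtomicLong audioBytesSent = new AtomicLong();
    private final AtomicLong audioBytesReceived = new AtomicLong();
    // -1 sentinel: first-token latency not yet observed
    private final AtomicLong firstTokenLatencyMs = new AtomicLong(-1);

    void onTurnCompleted() { turnCount.incrementAndGet(); }
    void onInterruption() { interruptionCount.incrementAndGet(); }
    void onAudioSent(int bytes) { audioBytesSent.addAndGet(bytes); }
    void onAudioReceived(int bytes) { audioBytesReceived.addAndGet(bytes); }

    // Record only the FIRST observed latency; later calls are ignored.
    void onFirstToken(long latencyMs) {
        firstTokenLatencyMs.compareAndSet(-1, latencyMs);
    }

    long turns() { return turnCount.get(); }
    long interruptions() { return interruptionCount.get(); }
    long bytesSent() { return audioBytesSent.get(); }
    long bytesReceived() { return audioBytesReceived.get(); }
    long firstTokenLatencyMs() { return firstTokenLatencyMs.get(); }
}
```

Values like these would typically be flushed onto the `connect` span as attributes when the session closes.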

sdk/voicelive/azure-ai-voicelive/README.md

Lines changed: 111 additions & 7 deletions

@@ -126,6 +126,8 @@ The following sections provide code snippets for common scenarios:
 * [Handle event types](#handle-event-types)
 * [Voice configuration](#voice-configuration)
 * [Function calling](#function-calling)
+* [MCP tool integration](#mcp-tool-integration)
+* [Azure AI Foundry agent session](#azure-ai-foundry-agent-session)
 * [Telemetry and tracing](#telemetry-and-tracing)
 * [Complete voice assistant with microphone](#complete-voice-assistant-with-microphone)

@@ -173,6 +175,19 @@ For easier learning, explore these focused samples in order:
    - Span structure and session-level attributes
    - Azure Monitor integration example

+8. **MCPSample.java** - Model Context Protocol (MCP) tool integration
+   - Configure MCP servers for external tool access
+   - Handle MCP call events and tool execution
+   - Handle MCP approval requests for tool calls
+   - Process MCP call results and continue conversations
+
+9. **AgentV2Sample.java** - Azure AI Foundry agent session
+   - Connect directly to an Azure AI Foundry agent via AgentSessionConfig
+   - Real-time audio capture and playback
+   - Sequence number based audio for interrupt handling
+   - Azure noise suppression and echo cancellation
+   - Conversation logging to file
+
 > **Note:** To run audio samples (AudioPlaybackSample, MicrophoneInputSample, VoiceAssistantSample, FunctionCallingSample):
 > ```bash
 > mvn exec:java -Dexec.mainClass=com.azure.ai.voicelive.FunctionCallingSample -Dexec.classpathScope=test
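One of the AgentV2Sample bullets above is "sequence number based audio for interrupt handling". A common way to implement that is to tag each audio chunk with the sequence number of the response it belongs to, and have the playback queue reject chunks whose sequence is stale after a barge-in. A minimal sketch assuming that approach; the class below is hypothetical and not the sample's actual code:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical playback queue illustrating sequence-number-based
// interrupt handling; AgentV2Sample's real implementation may differ.
final class SequencedPlayback {
    private int currentSequence = 0;
    private final Queue<byte[]> pending = new ArrayDeque<>();

    // Called when the user barges in: drop queued audio from the
    // superseded response and bump the sequence so stale chunks are rejected.
    synchronized int interrupt() {
        pending.clear();
        return ++currentSequence;
    }

    // Enqueue only chunks belonging to the live response.
    synchronized boolean enqueue(int sequence, byte[] chunk) {
        if (sequence != currentSequence) {
            return false; // stale audio from a superseded response
        }
        return pending.add(chunk);
    }

    synchronized int queuedChunks() {
        return pending.size();
    }
}
```

The point of the sequence number is that late-arriving chunks from an interrupted response cannot sneak back into the speaker queue.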
@@ -404,6 +419,66 @@ client.startSession("gpt-4o-realtime-preview")
 * Results are sent back to continue the conversation
 * See `FunctionCallingSample.java` for a complete working example

+### MCP tool integration
+
+Use [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) servers to give the AI access to external tools during a voice session. The service calls the MCP server directly — your code only needs to handle approval requests when required:
+
+```java com.azure.ai.voicelive.mcp
+// Configure MCP servers as tools
+MCPServer mcpServer = new MCPServer("deepwiki", "https://mcp.deepwiki.com/mcp")
+    .setRequireApproval(BinaryData.fromObject(MCPApprovalType.ALWAYS));
+
+VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
+    .setTools(Arrays.asList(mcpServer))
+    .setInstructions("You have access to external tools via MCP. Use them when asked.");
+
+// Handle MCP approval requests in your event loop
+session.receiveEvents().subscribe(event -> {
+    if (event instanceof SessionUpdateResponseOutputItemDone) {
+        SessionUpdateResponseOutputItemDone itemDone = (SessionUpdateResponseOutputItemDone) event;
+        SessionResponseItem item = itemDone.getItem();
+
+        if (item instanceof ResponseMCPApprovalRequestItem) {
+            // Approve the tool call
+            ResponseMCPApprovalRequestItem approvalRequest = (ResponseMCPApprovalRequestItem) item;
+            MCPApprovalResponseRequestItem approval = new MCPApprovalResponseRequestItem(
+                approvalRequest.getId(), true);
+            ClientEventConversationItemCreate createItem = new ClientEventConversationItemCreate()
+                .setItem(approval);
+            session.sendEvent(createItem).subscribe();
+            session.sendEvent(new ClientEventResponseCreate()).subscribe();
+        }
+    }
+});
+```
+
+> See `MCPSample.java` for a complete working example with MCP server configuration.
+
+### Azure AI Foundry agent session
+
+Connect directly to an Azure AI Foundry agent using `AgentSessionConfig`. The agent becomes the primary responder for the voice session:
+
+```java com.azure.ai.voicelive.agentsession
+// Configure agent connection
+AgentSessionConfig agentConfig = new AgentSessionConfig("my-agent", "my-project")
+    .setAgentVersion("1.0");
+
+// Start session with agent config (uses DefaultAzureCredential)
+VoiceLiveAsyncClient client = new VoiceLiveClientBuilder()
+    .endpoint(endpoint)
+    .credential(new DefaultAzureCredentialBuilder().build())
+    .buildAsyncClient();
+
+client.startSession(agentConfig)
+    .flatMap(session -> {
+        session.receiveEvents().subscribe(event -> handleEvent(event));
+        return Mono.just(session);
+    })
+    .block();
+```
+
+> See `AgentV2Sample.java` for a full implementation with audio capture, playback, and conversation logging.
+
 ### Telemetry and tracing

 The SDK has built-in [OpenTelemetry](https://opentelemetry.io/) tracing that emits spans for every WebSocket operation. When no OpenTelemetry SDK is present, all tracing calls are automatically no-op with zero performance impact.
@@ -448,29 +523,58 @@ connect gpt-4o-realtime-preview ← session lifetime span

 **Session-level attributes** (on the connect span):
 - `gen_ai.system` — `az.ai.voicelive`
+- `gen_ai.provider.name` — `microsoft.foundry`
 - `gen_ai.request.model` — Model name (e.g., `gpt-4o-realtime-preview`)
 - `server.address` — Service endpoint
 - `gen_ai.voice.session_id` — Voice session ID
+- `gen_ai.conversation.id` — Conversation ID
+- `gen_ai.response.id` — Latest response ID
+- `gen_ai.response.finish_reasons` — Response status list (e.g. `["completed"]`)
+- `gen_ai.agent.name` / `gen_ai.agent.id` / `gen_ai.agent.thread_id` — Agent metadata when using agent sessions
+- `gen_ai.system_instructions` / `gen_ai.request.temperature` / `gen_ai.request.max_output_tokens` / `gen_ai.request.tools` — Session config tracked from `session.update`
 - `gen_ai.voice.turn_count` — Completed response turns
 - `gen_ai.voice.interruption_count` — User interruptions
 - `gen_ai.voice.audio_bytes_sent` / `gen_ai.voice.audio_bytes_received` — Audio payload bytes
 - `gen_ai.voice.first_token_latency_ms` — Time to first audio response
+- `gen_ai.voice.mcp.call_count` / `gen_ai.voice.mcp.list_tools_count` — MCP operation counters

 #### Content recording

 By default, message payloads are not recorded in spans for privacy. Enable content recording via the builder or environment variable:

-```java
-// Via builder
-new VoiceLiveClientBuilder()
+```java com.azure.ai.voicelive.tracing.contentrecording
+// Enable content recording to capture full JSON payloads in span events
+VoiceLiveAsyncClient client = new VoiceLiveClientBuilder()
+    .endpoint(endpoint)
+    .credential(new AzureKeyCredential(apiKey))
+    .openTelemetry(otel)
     .enableContentRecording(true)
-    // ...
+    .buildAsyncClient();

-// Or via environment variable
-// AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED=true
+// Or via environment variables (no code changes needed):
+// OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true
+// (legacy fallback) AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED=true
 ```

 > See `TelemetrySample.java` for complete tracing examples including Azure Monitor integration.
+>
+> **Run the telemetry sample** to see tracing in action:
+> ```bash
+> # Tracing only (prints span names and attributes):
+> mvn exec:java -Dexec.mainClass="com.azure.ai.voicelive.TelemetrySample" -Dexec.classpathScope=test -Dexec.args="--enable-tracing"
+>
+> # Tracing + content recording (also prints full JSON payloads):
+> mvn exec:java -Dexec.mainClass="com.azure.ai.voicelive.TelemetrySample" -Dexec.classpathScope=test -Dexec.args="--enable-tracing --enable-recording"
+> ```
+>
+> Sample output with `--enable-tracing`:
+> ```
+> 'send session.update' : {gen_ai.operation.name=send, gen_ai.voice.event_type=session.update, ...}
+> 'recv session.created' : {gen_ai.operation.name=recv, gen_ai.voice.event_type=session.created, ...}
+> 'recv response.done' : {gen_ai.usage.input_tokens=100, gen_ai.usage.output_tokens=50, ...}
+> 'close' : {gen_ai.operation.name=close, ...}
+> 'connect gpt-4o-realtime-preview' : {gen_ai.voice.session_id=..., gen_ai.voice.turn_count=1, ...}
+> ```

 ### Complete voice assistant with microphone

@@ -517,7 +621,7 @@ client.startSession("gpt-4o-realtime-preview")
         // Subscribe to receive server events
         session.receiveEvents()
             .subscribe(
-                event -> handleEvent(event, session),
+                event -> handleEvent(event),
                 error -> System.err.println("Error: " + error.getMessage())
             );

sdk/voicelive/azure-ai-voicelive/checkstyle-suppressions.xml

Lines changed: 1 addition & 0 deletions

@@ -14,6 +14,7 @@
   <suppress files="com.azure.ai.voicelive.ReadmeSamples.java" checks="IllegalImportCheck" />
   <suppress files="com.azure.ai.voicelive.TelemetrySample.java" checks="IllegalImportCheck" />
   <suppress files="com.azure.ai.voicelive.VoiceAssistantSample.java" checks="IllegalImportCheck" />
+  <suppress files="com.azure.ai.voicelive.telemetry.VoiceLiveTelemetryAttributeKeys.java" checks="IllegalImportCheck" />

   <suppress files="com.azure.ai.voicelive.models.AssistantMessageItem.java" checks="io.clientcore.linting.extensions.checkstyle.checks.EnforceFinalFieldsCheck" />
   <suppress files="com.azure.ai.voicelive.models.MessageItem.java" checks="io.clientcore.linting.extensions.checkstyle.checks.EnforceFinalFieldsCheck" />
sdk/voicelive/azure-ai-voicelive/src/main/java/com/azure/ai/voicelive/VoiceLiveAsyncClient.java

Lines changed: 26 additions & 9 deletions

@@ -18,6 +18,7 @@
 import com.azure.core.http.HttpHeaderName;
 import com.azure.core.http.HttpHeaders;
 import com.azure.core.util.logging.ClientLogger;
+import io.opentelemetry.api.metrics.Meter;
 import io.opentelemetry.api.trace.Tracer;

 import reactor.core.publisher.Mono;

@@ -36,6 +37,7 @@ public final class VoiceLiveAsyncClient {
     private final String apiVersion;
     private final HttpHeaders additionalHeaders;
     private final Tracer tracer;
+    private final Meter meter;
     private final Boolean enableContentRecording;

     /**
@@ -62,12 +64,18 @@
      */
     VoiceLiveAsyncClient(URI endpoint, KeyCredential keyCredential, String apiVersion, HttpHeaders additionalHeaders,
         Tracer tracer, Boolean enableContentRecording) {
+        this(endpoint, keyCredential, apiVersion, additionalHeaders, tracer, null, enableContentRecording);
+    }
+
+    VoiceLiveAsyncClient(URI endpoint, KeyCredential keyCredential, String apiVersion, HttpHeaders additionalHeaders,
+        Tracer tracer, Meter meter, Boolean enableContentRecording) {
         this.endpoint = Objects.requireNonNull(endpoint, "'endpoint' cannot be null");
         this.keyCredential = Objects.requireNonNull(keyCredential, "'keyCredential' cannot be null");
         this.tokenCredential = null;
         this.apiVersion = Objects.requireNonNull(apiVersion, "'apiVersion' cannot be null");
         this.additionalHeaders = additionalHeaders != null ? additionalHeaders : new HttpHeaders();
         this.tracer = tracer;
+        this.meter = meter;
         this.enableContentRecording = enableContentRecording;
     }

@@ -96,12 +104,18 @@
      */
     VoiceLiveAsyncClient(URI endpoint, TokenCredential tokenCredential, String apiVersion,
         HttpHeaders additionalHeaders, Tracer tracer, Boolean enableContentRecording) {
+        this(endpoint, tokenCredential, apiVersion, additionalHeaders, tracer, null, enableContentRecording);
+    }
+
+    VoiceLiveAsyncClient(URI endpoint, TokenCredential tokenCredential, String apiVersion,
+        HttpHeaders additionalHeaders, Tracer tracer, Meter meter, Boolean enableContentRecording) {
         this.endpoint = Objects.requireNonNull(endpoint, "'endpoint' cannot be null");
         this.keyCredential = null;
         this.tokenCredential = Objects.requireNonNull(tokenCredential, "'tokenCredential' cannot be null");
         this.apiVersion = Objects.requireNonNull(apiVersion, "'apiVersion' cannot be null");
         this.additionalHeaders = additionalHeaders != null ? additionalHeaders : new HttpHeaders();
         this.tracer = tracer;
+        this.meter = meter;
         this.enableContentRecording = enableContentRecording;
     }

@@ -116,7 +130,7 @@ public Mono<VoiceLiveSessionAsyncClient> startSession(String model) {
         Objects.requireNonNull(model, "'model' cannot be null");

         return Mono.fromCallable(() -> convertToWebSocketEndpoint(endpoint, model)).flatMap(wsEndpoint -> {
-            VoiceLiveSessionAsyncClient session = createSessionClient(wsEndpoint, model);
+            VoiceLiveSessionAsyncClient session = createSessionClient(wsEndpoint, model, null);
             return session.connect(additionalHeaders).thenReturn(session);
         });
     }

@@ -129,7 +143,7 @@ public Mono<VoiceLiveSessionAsyncClient> startSession(String model) {
      */
     public Mono<VoiceLiveSessionAsyncClient> startSession() {
         return Mono.fromCallable(() -> convertToWebSocketEndpoint(endpoint, null)).flatMap(wsEndpoint -> {
-            VoiceLiveSessionAsyncClient session = createSessionClient(wsEndpoint, null);
+            VoiceLiveSessionAsyncClient session = createSessionClient(wsEndpoint, null, null);
             return session.connect(additionalHeaders).thenReturn(session);
         });
     }

@@ -149,7 +163,7 @@ public Mono<VoiceLiveSessionAsyncClient> startSession(String model, VoiceLiveReq
         return Mono
             .fromCallable(() -> convertToWebSocketEndpoint(endpoint, model, requestOptions.getCustomQueryParameters()))
             .flatMap(wsEndpoint -> {
-                VoiceLiveSessionAsyncClient session = createSessionClient(wsEndpoint, model);
+                VoiceLiveSessionAsyncClient session = createSessionClient(wsEndpoint, model, null);
                 // Merge additional headers with custom headers from requestOptions
                 HttpHeaders mergedHeaders = mergeHeaders(additionalHeaders, requestOptions.getCustomHeaders());
                 return session.connect(mergedHeaders).thenReturn(session);

@@ -170,7 +184,7 @@ public Mono<VoiceLiveSessionAsyncClient> startSession(VoiceLiveRequestOptions re
         return Mono
             .fromCallable(() -> convertToWebSocketEndpoint(endpoint, null, requestOptions.getCustomQueryParameters()))
             .flatMap(wsEndpoint -> {
-                VoiceLiveSessionAsyncClient session = createSessionClient(wsEndpoint, null);
+                VoiceLiveSessionAsyncClient session = createSessionClient(wsEndpoint, null, null);
                 // Merge additional headers with custom headers from requestOptions
                 HttpHeaders mergedHeaders = mergeHeaders(additionalHeaders, requestOptions.getCustomHeaders());
                 return session.connect(mergedHeaders).thenReturn(session);

@@ -193,7 +207,7 @@ public Mono<VoiceLiveSessionAsyncClient> startSession(AgentSessionConfig agentCo

         return Mono.fromCallable(() -> convertToWebSocketEndpoint(endpoint, null, agentConfig.toQueryParameters()))
             .flatMap(wsEndpoint -> {
-                VoiceLiveSessionAsyncClient session = createSessionClient(wsEndpoint, null);
+                VoiceLiveSessionAsyncClient session = createSessionClient(wsEndpoint, null, agentConfig);
                 return session.connect(additionalHeaders).thenReturn(session);
             });
     }

@@ -223,7 +237,7 @@ public Mono<VoiceLiveSessionAsyncClient> startSession(AgentSessionConfig agentCo
         }

         return Mono.fromCallable(() -> convertToWebSocketEndpoint(endpoint, null, mergedParams)).flatMap(wsEndpoint -> {
-            VoiceLiveSessionAsyncClient session = createSessionClient(wsEndpoint, null);
+            VoiceLiveSessionAsyncClient session = createSessionClient(wsEndpoint, null, agentConfig);
            // Merge additional headers with custom headers from requestOptions
             HttpHeaders mergedHeaders = mergeHeaders(additionalHeaders, requestOptions.getCustomHeaders());
             return session.connect(mergedHeaders).thenReturn(session);
@@ -237,11 +251,14 @@ public Mono<VoiceLiveSessionAsyncClient> startSession(AgentSessionConfig agentCo
      * @param model The model name, used for tracing span names.
      * @return A new VoiceLiveSessionAsyncClient instance.
      */
-    private VoiceLiveSessionAsyncClient createSessionClient(URI wsEndpoint, String model) {
+    private VoiceLiveSessionAsyncClient createSessionClient(URI wsEndpoint, String model,
+        AgentSessionConfig agentSessionConfig) {
         if (keyCredential != null) {
-            return new VoiceLiveSessionAsyncClient(wsEndpoint, keyCredential, tracer, model, enableContentRecording);
+            return new VoiceLiveSessionAsyncClient(wsEndpoint, keyCredential, tracer, meter, model,
+                enableContentRecording, agentSessionConfig);
         } else {
-            return new VoiceLiveSessionAsyncClient(wsEndpoint, tokenCredential, tracer, model, enableContentRecording);
+            return new VoiceLiveSessionAsyncClient(wsEndpoint, tokenCredential, tracer, meter, model,
+                enableContentRecording, agentSessionConfig);
         }
     }
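The constructor changes in this file keep the existing signatures compiling by delegating to a new overload that defaults the added `Meter` parameter to `null`. The same backward-compatible overload pattern in isolation, with toy names (nothing below is SDK code):

```java
// Toy illustration of the overload-delegation pattern used in the diff:
// the old signature forwards to the new one with a null default.
final class SessionFactory {
    private final String tracerName;
    private final String meterName; // nullable: metrics are optional

    // Old signature, kept so existing callers keep compiling.
    SessionFactory(String tracerName) {
        this(tracerName, null);
    }

    // New signature adding the metrics dependency.
    SessionFactory(String tracerName, String meterName) {
        this.tracerName = tracerName;
        this.meterName = meterName;
    }

    String describe() {
        return tracerName + "/" + (meterName == null ? "no-metrics" : meterName);
    }
}
```

Existing call sites continue to work unchanged, while new call sites can pass the extra dependency explicitly.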
