[VoiceLive] Add built-in OpenTelemetry tracing support #48584
Pull request overview
Adds built-in OpenTelemetry-based tracing instrumentation to the azure-ai-voicelive library, emitting spans for WebSocket session lifecycle and message operations, plus samples/tests/docs to demonstrate and validate the behavior.
Changes:
- Introduces `VoiceLiveTracer` and wires it into `VoiceLiveSessionAsyncClient` for `connect`/`send`/`recv`/`close` spans and session counters.
- Extends `VoiceLiveClientBuilder` with `openTelemetry(OpenTelemetry)` and `enableContentRecording(boolean)` and plumbs tracing config through `VoiceLiveAsyncClient`.
- Adds tracing-focused tests, samples, README section, and changelog entry; updates module metadata and dependencies.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| sdk/voicelive/azure-ai-voicelive/src/main/java/com/azure/ai/voicelive/VoiceLiveTracer.java | New tracer implementation (span creation, attributes, counters). |
| sdk/voicelive/azure-ai-voicelive/src/main/java/com/azure/ai/voicelive/VoiceLiveSessionAsyncClient.java | Starts/ends session span and traces send/recv/close; adds recv payload handling changes. |
| sdk/voicelive/azure-ai-voicelive/src/main/java/com/azure/ai/voicelive/VoiceLiveClientBuilder.java | Adds OpenTelemetry + content recording knobs and creates an OTel Tracer. |
| sdk/voicelive/azure-ai-voicelive/src/main/java/com/azure/ai/voicelive/VoiceLiveAsyncClient.java | Plumbs tracer/config to session creation. |
| sdk/voicelive/azure-ai-voicelive/src/main/java/module-info.java | Declares module requirements for OTel API/context. |
| sdk/voicelive/azure-ai-voicelive/src/test/java/com/azure/ai/voicelive/VoiceLiveTracerTest.java | New unit tests validating spans/attrs/counters. |
| sdk/voicelive/azure-ai-voicelive/src/test/java/com/azure/ai/voicelive/VoiceLiveClientBuilderTest.java | Adds builder tests for explicit/default OpenTelemetry behavior. |
| sdk/voicelive/azure-ai-voicelive/src/samples/java/com/azure/ai/voicelive/TelemetrySample.java | New runnable tracing sample. |
| sdk/voicelive/azure-ai-voicelive/src/samples/java/com/azure/ai/voicelive/VoiceAssistantSample.java | Adds optional CLI flag to enable tracing in the sample. |
| sdk/voicelive/azure-ai-voicelive/src/samples/java/com/azure/ai/voicelive/ReadmeSamples.java | Adds README snippet methods for tracing usage. |
| sdk/voicelive/azure-ai-voicelive/pom.xml | Adds OpenTelemetry dependencies and module-level enforcer stanza. |
| sdk/voicelive/azure-ai-voicelive/checkstyle-suppressions.xml | Suppresses illegal import + external-dependency-exposed checks for tracing changes. |
| sdk/voicelive/azure-ai-voicelive/README.md | Adds “Telemetry and tracing” section and code snippets. |
| sdk/voicelive/azure-ai-voicelive/CHANGELOG.md | Documents new tracing feature + sample. |
Comments suppressed due to low confidence (2)
sdk/voicelive/azure-ai-voicelive/src/main/java/com/azure/ai/voicelive/VoiceLiveSessionAsyncClient.java:314
- The session "connect" span is never ended on a normal WebSocket completion (remote close). In `connect(...)`, the lifecycle subscriber `onComplete` path logs and cleans up but doesn’t call `voiceLiveTracer.endConnectSpan(null)` (or emit a `close` span). This will leave spans open and missing final session-level attributes when the server closes the socket without `closeAsync()` being called.
Consider ending the connect span (and optionally emitting traceClose()) in the onComplete handler, or in the closeSignal.asMono().doFinally(...) block so it runs for both error and normal completion.
}, () -> {
LOGGER.info("WebSocket handler completed");
connectionCloseSignalRef.compareAndSet(closeSignal, null);
disposeLifecycleSubscription();
});
sdk/voicelive/azure-ai-voicelive/src/main/java/com/azure/ai/voicelive/VoiceLiveSessionAsyncClient.java:415
- The send span is created and ended before the actual WebSocket send happens. `traceSend(event, json)` runs inside `fromCallable`, but the real send (and possible failure) occurs later in `flatMap(this::send)`. If `sendSink.tryEmitNext(...)` fails (or any downstream send error occurs), the send span will still appear successful and won’t capture the exception/status.
To align with the PR intent (“span per send operation”), consider moving span lifecycle to wrap the actual send Mono, and set span status/recordException when the send fails.
return Mono.fromCallable(() -> {
try {
String json = serializer.serialize(event, SerializerEncoding.JSON);
// Trace the send operation
if (voiceLiveTracer != null) {
voiceLiveTracer.traceSend(event, json);
}
return BinaryData.fromString(json);
} catch (IOException e) {
throw LOGGER.logExceptionAsError(new RuntimeException("Failed to serialize event", e));
}
}).flatMap(this::send);
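One way to picture the suggestion is a span whose lifecycle wraps the asynchronous send instead of preceding it. The sketch below is JDK-only and makes two assumptions: `CompletableFuture` stands in for Reactor's `Mono`, and `SketchSpan` is a hypothetical placeholder for an OpenTelemetry span. Neither `SketchSpan` nor `SendTracingSketch` is part of this PR.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical stand-in for an OpenTelemetry span; not part of the SDK. */
final class SketchSpan {
    final String name;
    volatile boolean ended;
    volatile Throwable error;

    SketchSpan(String name) {
        this.name = name;
    }
}

final class SendTracingSketch {
    private SendTracingSketch() {
    }

    /**
     * Runs the asynchronous send and ends the span only after the send has
     * completed, recording the failure if there was one.
     */
    static CompletableFuture<Void> tracedSend(SketchSpan span, Supplier<CompletableFuture<Void>> send) {
        CompletableFuture<Void> result;
        try {
            result = send.get();
        } catch (RuntimeException e) {
            // A synchronous failure (for example, a rejected emit) is still a failed send.
            result = CompletableFuture.failedFuture(e);
        }
        return result.whenComplete((ignored, failure) -> {
            if (failure != null) {
                // Real code would set an error status and record the exception on the span.
                span.error = failure;
            }
            span.ended = true;
        });
    }
}
```

With this shape a failed send surfaces on the span itself, and the returned future still carries the original failure to the caller.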
API Change Check: APIView identified API-level changes in this PR and created the following API reviews.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…e-sdk-for-java into xitzhang/telemetrylog
VoiceLive SDK Telemetry Design
Overview
The Azure VoiceLive Java SDK has built-in OpenTelemetry tracing that instruments every WebSocket operation in a real-time voice session. The design follows OpenTelemetry GenAI Semantic Conventions and maintains parity with the Python VoiceLive SDK.
When no OpenTelemetry SDK is configured, all tracing is automatically no-op with zero performance overhead (OTel's `Noop` implementations are short-circuited at the API level).
Architecture
Class Responsibilities
| Class | Responsibility |
|---|---|
| `VoiceLiveClientBuilder` | Accepts an `OpenTelemetry` instance (defaults to `GlobalOpenTelemetry.getOrNoop()`). Creates `Tracer` and `Meter` via `otel.getTracer(SDK_NAME)` / `otel.getMeter(SDK_NAME)`. Passes them to the async client. |
| `VoiceLiveAsyncClient` | Holds `Tracer`, `Meter`, and `enableContentRecording`. Passes all three to each `VoiceLiveSessionAsyncClient` created by `startSession()`. |
| `VoiceLiveSessionAsyncClient` | Creates a `VoiceLiveTracer` per session. Calls tracer methods at each lifecycle point (connect, send, receive, close). |
| `VoiceLiveTracer` | Tracer implementation (`final class`). Manages the span hierarchy, session-level counters, content recording, and error tracking. One instance per session. |
| `VoiceLiveTelemetryAttributeKeys` | `AttributeKey` instances for all traced attributes. Allows external consumers to query span data. |
| `VoiceLiveEventTypes` | |

OTel Initialization Flow
Content Recording Resolution
Content recording controls whether full JSON payloads (including base64-encoded audio) appear in span events. Resolution order:
1. `builder.enableContentRecording(true/false)`, passed as `Boolean captureContentOverride` to `VoiceLiveTracer`
2. `AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED=true` (read via Azure Core `Configuration`)
3. `false` (off for privacy)

The `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` variable is mentioned in docs as the standard GenAI convention name, but the actual env var parsed by `VoiceLiveTracer` is the Azure-specific one above.
Span Hierarchy
Each voice session produces exactly one parent span with N child spans:
Span Naming Convention
All spans follow `{operation} {event_type}`:
- `send session.update`, `send input_audio_buffer.append`, etc.
- `recv session.created`, `recv response.done`, etc.
- `connect {model}` (e.g., `connect gpt-4o-realtime-preview`)
- `close`

All spans are `SpanKind.CLIENT`.
Span Lifecycle
- `startConnectSpan()` (`Context` for parenting)
- `traceSend(event, json)`
- `traceRecv(update, rawJson)`
- `traceRecvRaw(rawJson)`
- `traceClose()`
- `endConnectSpan(error)`

Key design: Send and recv spans are created and immediately ended (they represent instantaneous events, not durations). Only the connect span has a meaningful duration (session lifetime).
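As a small illustration of the `{operation} {event_type}` naming convention, the helper below builds span names for all four operations. `SpanNames` is a hypothetical name introduced for this sketch and is not the code in `VoiceLiveTracer`. It treats the model name as the second component for `connect` and omits the second component for `close`.

```java
final class SpanNames {
    private SpanNames() {
    }

    /**
     * Builds "{operation} {detail}", where detail is the event type for send/recv
     * or the model name for connect. Returns only the operation when detail is absent.
     */
    static String of(String operation, String detail) {
        if (detail == null || detail.isEmpty()) {
            return operation;
        }
        return operation + " " + detail;
    }
}
```

For example, `SpanNames.of("send", "session.update")` yields `send session.update`, and `SpanNames.of("close", null)` yields `close`.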
Attributes
Common Attributes (All Spans)
Set by `childSpanBuilder()`; every send, recv, and close span carries these:
| Attribute | Value / source |
|---|---|
| `gen_ai.system` | `"az.ai.voicelive"` |
| `gen_ai.operation.name` | `"connect"`, `"send"`, `"recv"`, `"close"` |
| `gen_ai.provider.name` | `"microsoft.foundry"` |
| `gen_ai.request.model` | `startSession()` |
| `az.namespace` | `"Microsoft.CognitiveServices"` |
| `server.address` | |
| `server.port` | (`wss://`) |
| `gen_ai.voice.session_id` | `session.created` is received |
| `gen_ai.conversation.id` | `response.created` / `response.done` |

Send/Recv Span Attributes
| Attribute | Notes |
|---|---|
| `gen_ai.voice.event_type` | (e.g., `session.update`) |
| `gen_ai.voice.message_size` | |

Recv-Only Span Attributes
| Attribute | Event(s) | Source / notes |
|---|---|---|
| `gen_ai.usage.input_tokens` | `response.done` | `response.usage.input_tokens` |
| `gen_ai.usage.output_tokens` | `response.done` | `response.usage.output_tokens` |
| `gen_ai.response.id` | `response.created`, `response.done`, output items | |
| `gen_ai.response.finish_reasons` | `response.done` | `["completed"]` format |
| `gen_ai.conversation.id` | `response.created`, `response.done` | |
| `gen_ai.voice.item_id` | | |
| `gen_ai.voice.call_id` | | |
| `gen_ai.voice.output_index` | | |

Send-Only Span Attributes
| Attribute | Event |
|---|---|
| `gen_ai.voice.previous_item_id` | `conversation.item.create` |
| `gen_ai.voice.call_id` | `conversation.item.create` (function output) |

Connect Span Attributes (Session-Level)
Accumulated during the session and flushed when `endConnectSpan()` is called:
| Attribute | Source |
|---|---|
| `gen_ai.voice.session_id` | `session.created` response |
| `gen_ai.voice.input_audio_format` | `session.update` or `session.created` |
| `gen_ai.voice.output_audio_format` | `session.update` or `session.created` |
| `gen_ai.voice.input_sample_rate` | `session.update` |
| `gen_ai.voice.turn_count` | `response.done` |
| `gen_ai.voice.interruption_count` | `response.cancel` |
| `gen_ai.voice.audio_bytes_sent` | `input_audio_buffer.append` |
| `gen_ai.voice.audio_bytes_received` | `response.audio.delta` |
| `gen_ai.voice.first_token_latency_ms` | `System.nanoTime()` delta from `response.create` to first `response.audio.delta` |
| `gen_ai.conversation.id` | `response.done` or client-provided |
| `gen_ai.response.id` | |
| `gen_ai.response.finish_reasons` | |
| `gen_ai.system_instructions` | `session.update` JSON |
| `gen_ai.request.temperature` | `session.update` JSON |
| `gen_ai.request.max_output_tokens` | `session.update` JSON |
| `gen_ai.request.tools` | `session.update` |
| `gen_ai.agent.name` | `AgentSessionConfig` |
| `gen_ai.agent.id` | `session.created` agent response |
| `gen_ai.agent.version` | `AgentSessionConfig` |
| `gen_ai.agent.project_name` | `AgentSessionConfig` |
| `gen_ai.agent.thread_id` | `session.created` agent response |
| `error.type` | |

Span Events
Each send/recv span contains exactly one span event:
| Event name | Attributes |
|---|---|
| `gen_ai.input.messages` | `gen_ai.system`, `gen_ai.voice.event_type`, `gen_ai.event.content`\* |
| `gen_ai.output.messages` | `gen_ai.system`, `gen_ai.voice.event_type`, `gen_ai.event.content`\* |
| `gen_ai.voice.error` | `gen_ai.system`, `error.code`, `error.message` |
| `gen_ai.voice.rate_limits.updated` | `gen_ai.voice.rate_limits` (JSON array) |

\* `gen_ai.event.content` is only present when content recording is enabled.
Content Recording
When enabled, the `gen_ai.event.content` attribute on span events contains the full JSON payload. For audio events like `input_audio_buffer.append`, this includes the base64-encoded audio data, so payloads can be very large.
Privacy consideration: Content recording is off by default. When enabled, it captures system instructions, user messages, assistant responses, function call arguments/results, and raw audio data.
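The resolution order described under Content Recording Resolution (builder override first, then the Azure-specific environment variable, then `false`) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `VoiceLiveTracer`: `ContentRecordingResolver` is a hypothetical name, and the sketch reads the variable with `System.getenv` and parses it with `Boolean.parseBoolean`, whereas the design reads it through Azure Core `Configuration`.

```java
final class ContentRecordingResolver {
    /** Environment variable named in the design; read here via System.getenv as a simplification. */
    static final String ENV_VAR = "AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED";

    private ContentRecordingResolver() {
    }

    /**
     * Resolves the effective setting.
     *
     * @param builderOverride value passed to the builder, or null when the caller did not set it
     * @param envValue raw environment variable value, or null when unset
     */
    static boolean resolve(Boolean builderOverride, String envValue) {
        if (builderOverride != null) {
            return builderOverride; // 1. explicit builder setting wins
        }
        if (envValue != null) {
            return Boolean.parseBoolean(envValue.trim()); // 2. environment variable
        }
        return false; // 3. default: off for privacy
    }

    /** Convenience overload that reads the process environment. */
    static boolean resolveFromEnvironment(Boolean builderOverride) {
        return resolve(builderOverride, System.getenv(ENV_VAR));
    }
}
```

Keeping the override as a nullable `Boolean` is what lets an explicit `false` from the builder beat an environment variable set to `true`.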
Session-Level Counter Tracking
The tracer accumulates counters throughout the session using thread-safe `AtomicLong`/`AtomicReference` fields. These are flushed as attributes on the connect span at session close.
Counter Sources
| Counter | Source | Notes |
|---|---|---|
| `audioBytesSent` | `input_audio_buffer.append` send events | |
| `audioBytesReceived` | `response.audio.delta` recv events | `getDelta()` byte array length |
| `turnCount` | `response.done` recv events | |
| `interruptionCount` | `response.cancel` send events | |
| `firstTokenLatencyMs` | `response.audio.delta` after `response.create` | `(System.nanoTime() - createTimestamp) / 1_000_000` |

Session Config Tracking
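The session-level counters listed under Counter Sources can be pictured with the JDK-only sketch below. `SessionCountersSketch` and its method names are hypothetical. Timestamps are passed in as parameters so the latency arithmetic is easy to follow (real code would pass `System.nanoTime()`). The sketch also assumes that only the first audio delta after each `response.create` updates the latency, which the design does not spell out.

```java
import java.util.concurrent.atomic.AtomicLong;

final class SessionCountersSketch {
    private static final long UNSET = Long.MIN_VALUE;

    private final AtomicLong audioBytesSent = new AtomicLong();
    private final AtomicLong turnCount = new AtomicLong();
    private final AtomicLong responseCreateNanos = new AtomicLong(UNSET);
    private final AtomicLong firstTokenLatencyMs = new AtomicLong(UNSET);

    /** Called for each input_audio_buffer.append send event. */
    void onAudioAppendSent(int byteCount) {
        audioBytesSent.addAndGet(byteCount);
    }

    /** Called for each response.done recv event. */
    void onResponseDone() {
        turnCount.incrementAndGet();
    }

    /** Called when response.create is sent. */
    void onResponseCreateSent(long nowNanos) {
        responseCreateNanos.set(nowNanos);
    }

    /** Called for each response.audio.delta recv event. */
    void onAudioDeltaReceived(long nowNanos) {
        // getAndSet clears the timestamp atomically, so only the first delta is measured.
        long created = responseCreateNanos.getAndSet(UNSET);
        if (created != UNSET) {
            firstTokenLatencyMs.set((nowNanos - created) / 1_000_000);
        }
    }

    long audioBytesSent() {
        return audioBytesSent.get();
    }

    long turnCount() {
        return turnCount.get();
    }

    /** Returns null until a latency has been measured. */
    Long firstTokenLatencyMs() {
        long value = firstTokenLatencyMs.get();
        return value == UNSET ? null : value;
    }
}
```

At session close, each getter would supply the value for the corresponding connect span attribute.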
When a `session.update` event is sent, the tracer parses the JSON payload (without a full JSON parser, using lightweight string extraction) to capture:
- `instructions` → `gen_ai.system_instructions`
- `temperature` → `gen_ai.request.temperature`
- `max_response_output_tokens` → `gen_ai.request.max_output_tokens`
- `input_audio_sampling_rate` → `gen_ai.voice.input_sample_rate`
- `input_audio_format` / `output_audio_format` → audio format attributes
- `tools` array → `gen_ai.request.tools`

These are extracted using simple `indexOf`-based parsing (no Jackson dependency in the tracer), which is sufficient for the flat/shallow JSON structures of `session.update`.
Error Handling
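To make the `indexOf`-based extraction described under Session Config Tracking concrete, here is a minimal sketch of the technique. `LightweightJsonSketch` and its methods are hypothetical and are not the tracer's actual helpers. The sketch matches the first occurrence of a key anywhere in the payload and decodes only simple escapes, so it suits shallow payloads and is not a general JSON parser.

```java
final class LightweightJsonSketch {
    private LightweightJsonSketch() {
    }

    /** Returns the index just past the colon that follows "key", or -1 if the key is absent. */
    private static int valueStart(String json, String key) {
        String marker = "\"" + key + "\"";
        int keyIdx = json.indexOf(marker);
        if (keyIdx < 0) {
            return -1;
        }
        int colon = json.indexOf(':', keyIdx + marker.length());
        return colon < 0 ? -1 : colon + 1;
    }

    /** Extracts a string value; assumes the value for this key is a JSON string. */
    static String extractString(String json, String key) {
        int start = valueStart(json, key);
        int open = start < 0 ? -1 : json.indexOf('"', start);
        if (open < 0) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = open + 1; i < json.length(); i++) {
            char c = json.charAt(i);
            if (c == '\\' && i + 1 < json.length()) {
                // Drop the backslash and keep the next character, so \" becomes a quote.
                // Sequences such as \n or \uXXXX are not decoded.
                sb.append(json.charAt(++i));
            } else if (c == '"') {
                return sb.toString();
            } else {
                sb.append(c);
            }
        }
        return null; // unterminated string
    }

    /** Extracts a numeric value, or null when the key is absent or not a number. */
    static Double extractNumber(String json, String key) {
        int start = valueStart(json, key);
        if (start < 0) {
            return null;
        }
        while (start < json.length() && Character.isWhitespace(json.charAt(start))) {
            start++;
        }
        int end = start;
        while (end < json.length() && "+-.eE0123456789".indexOf(json.charAt(end)) >= 0) {
            end++;
        }
        if (end == start) {
            return null;
        }
        try {
            return Double.parseDouble(json.substring(start, end));
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

The trade-off is the usual one for this approach: no parser dependency and little overhead, at the cost of correctness on nested objects that repeat a key or on string values that happen to contain the quoted key.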
Server Errors (SessionUpdateError)
When a `recv error` span is created for a `SessionUpdateError` event:
- A `gen_ai.voice.error` span event is added with `error.code` and `error.message`

Connection/Close Errors
When `endConnectSpan(error)` is called with a non-null `Throwable`:
- The span status is set to `StatusCode.ERROR`
- The exception is recorded via `span.recordException(error)`
- `error.type` is set to the exception's canonical class name

All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines