
[VoiceLive] Add built-in OpenTelemetry tracing support#48584

Open
xitzhang wants to merge 9 commits into main from xitzhang/telemetrylog

Conversation


@xitzhang xitzhang commented Mar 25, 2026

VoiceLive SDK Telemetry Design

Overview

The Azure VoiceLive Java SDK has built-in OpenTelemetry tracing that instruments every WebSocket operation in a real-time voice session. The design follows OpenTelemetry GenAI Semantic Conventions and maintains parity with the Python VoiceLive SDK.

When no OpenTelemetry SDK is configured, all tracing is automatically no-op with zero performance overhead (OTel's Noop implementations are short-circuited at the API level).

Architecture

┌──────────────────────────────┐
│  VoiceLiveClientBuilder      │
│  - openTelemetry(otel)       │  User provides OTel instance
│  - enableContentRecording()  │  (or defaults to GlobalOpenTelemetry)
│  - buildAsyncClient()        │
│    ↓ creates Tracer + Meter  │
└──────────────┬───────────────┘
               │
               ▼
┌──────────────────────────────┐
│  VoiceLiveAsyncClient        │
│  - Tracer tracer             │  Stores OTel primitives
│  - Meter meter               │
│  - Boolean contentRecording  │
│  - startSession() ──────────────► creates VoiceLiveSessionAsyncClient
└──────────────────────────────┘         with tracer/meter/contentRecording
                                         │
                                         ▼
                               ┌──────────────────────────────────┐
                               │  VoiceLiveSessionAsyncClient     │
                               │  VoiceLiveTracer voiceLiveTracer │
                               │                                  │
                               │  connect()  → startConnectSpan() │
                               │  sendEvent()→ traceSend()        │
                               │  getEvents()→ traceRecv/RecvRaw()│
                               │  closeAsync()→traceClose()       │
                               │              endConnectSpan()    │
                               └──────────────┬───────────────────┘
                                              │
                                              ▼
                               ┌──────────────────────────────────┐
                               │  VoiceLiveTracer                 │
                               │  (core tracing engine)           │
                               │                                  │
                               │ - Connect span (session lifetime)│
                               │  - Child spans (send/recv/close) │
                               │  - Session-level counters        │
                               │  - Content recording             │
                               │  - Error tracking                │
                               └──────────────────────────────────┘

Class Responsibilities

| Class | Role |
| --- | --- |
| VoiceLiveClientBuilder | Accepts OpenTelemetry instance (defaults to GlobalOpenTelemetry.getOrNoop()). Creates Tracer and Meter via otel.getTracer(SDK_NAME) / otel.getMeter(SDK_NAME). Passes them to the async client. |
| VoiceLiveAsyncClient | Stores Tracer, Meter, and enableContentRecording. Passes all three to each VoiceLiveSessionAsyncClient created by startSession(). |
| VoiceLiveSessionAsyncClient | Creates one VoiceLiveTracer per session. Calls tracer methods at each lifecycle point (connect, send, receive, close). |
| VoiceLiveTracer | Core tracing engine. Package-private (final class). Manages the span hierarchy, session-level counters, content recording, and error tracking. One instance per session. |
| VoiceLiveTelemetryAttributeKeys | Public constants — AttributeKey instances for all traced attributes. Allows external consumers to query span data. |
| VoiceLiveEventTypes | Public constants — event type strings and classification sets (item-bearing events, delta-skip events, MCP events). |

OTel Initialization Flow

User code                              SDK internals
─────────                              ──────────────
builder.openTelemetry(otel)     →  stores OpenTelemetry instance
  (or omit — uses GlobalOTel)

builder.enableContentRecording(true)  →  stores Boolean override

builder.buildAsyncClient()      →  otel = openTelemetry ?? GlobalOpenTelemetry.getOrNoop()
                                   tracer = otel.getTracer("azure-ai-voicelive")
                                   meter  = otel.getMeter("azure-ai-voicelive")
                                   return new VoiceLiveAsyncClient(tracer, meter, contentRecording)

client.startSession(model)      →  new VoiceLiveSessionAsyncClient(tracer, meter, model, contentRecording)
                                     └→ new VoiceLiveTracer(tracer, meter, endpoint, model, contentRecording)

Content Recording Resolution

Content recording controls whether full JSON payloads (including base64-encoded audio) appear in span events. Resolution order:

  1. Builder override: builder.enableContentRecording(true/false) — passed as Boolean captureContentOverride to VoiceLiveTracer
  2. Environment variable: AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED=true (read via Azure Core Configuration)
  3. Default: false (off for privacy)

The OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT variable is mentioned in docs as the standard GenAI convention name, but the actual env var parsed by VoiceLiveTracer is the Azure-specific one above.
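The three-step precedence can be sketched as a small, dependency-free helper. This is an illustration of the resolution order only; the class and method names are hypothetical, not the SDK's actual internals:

```java
import java.util.Map;

public class ContentRecordingResolution {
    // Env var name taken from the design above (Azure-specific, not the OTel GenAI one).
    static final String ENV_VAR = "AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED";

    /**
     * Resolves the effective content-recording flag.
     * Precedence: builder override, then environment variable, then false.
     */
    static boolean resolve(Boolean builderOverride, Map<String, String> env) {
        if (builderOverride != null) {
            return builderOverride;               // 1. explicit builder setting wins
        }
        String fromEnv = env.get(ENV_VAR);
        if (fromEnv != null) {
            return Boolean.parseBoolean(fromEnv); // 2. env var fallback
        }
        return false;                             // 3. off by default (privacy)
    }
}
```

Passing the environment in as a `Map` (rather than reading `System.getenv()` directly) keeps the precedence logic trivially testable.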

Span Hierarchy

Each voice session produces exactly one parent span with N child spans:

connect {model}                          ← CLIENT span, session lifetime
├── send session.update                  ← CLIENT span, immediate
├── send input_audio_buffer.append       ← one per audio chunk
├── send response.create                 ← explicit response trigger
├── send conversation.item.create        ← function call output
├── recv session.created                 ← CLIENT span, immediate
├── recv session.updated
├── recv response.created
├── recv response.audio.delta            ← one per audio chunk (high volume)
├── recv response.function_call_arguments.delta
├── recv response.function_call_arguments.done
├── recv response.output_item.added
├── recv response.output_item.done
├── recv conversation.item.created
├── recv response.done                   ← includes token usage
├── recv error                           ← server error event
├── recv rate_limits.updated             ← raw event (no typed model)
└── close                                ← explicit close

Span Naming Convention

All spans follow {operation} {event_type}:

  • Send spans: send session.update, send input_audio_buffer.append, etc.
  • Recv spans: recv session.created, recv response.done, etc.
  • Connect span: connect {model} (e.g., connect gpt-4o-realtime-preview)
  • Close span: close

All spans are SpanKind.CLIENT.
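The naming rule above amounts to a one-line formatter. A minimal sketch (names are illustrative, not the SDK's internals):

```java
public class SpanNames {
    /**
     * Builds a span name following the {operation} {event_type} convention.
     * The close span has no detail suffix, so it is just the operation name.
     */
    static String spanName(String operation, String detail) {
        return (detail == null || detail.isEmpty()) ? operation : operation + " " + detail;
    }
}
```

For the connect span, the detail is the model name rather than an event type, e.g. `spanName("connect", "gpt-4o-realtime-preview")`.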

Span Lifecycle

| Event | Tracer Method | Span Behavior |
| --- | --- | --- |
| WebSocket connected | startConnectSpan() | Starts parent span, stores Context for parenting |
| Client sends event | traceSend(event, json) | Creates child span, records attributes + event, ends immediately |
| Server event parsed | traceRecv(update, rawJson) | Creates child span, records attributes + event, ends immediately |
| Server event unparsed | traceRecvRaw(rawJson) | Creates child span for raw/unrecognized events |
| Session close | traceClose() | Creates close child span |
| Session close (cont.) | endConnectSpan(error) | Flushes counters → connect span attributes, ends parent span |

Key design: Send and recv spans are created and immediately ended (they represent instantaneous events, not durations). Only the connect span has a meaningful duration (session lifetime).

Attributes

Common Attributes (All Spans)

Set by childSpanBuilder() — every send, recv, and close span carries these:

| Attribute | Type | Value/Source |
| --- | --- | --- |
| gen_ai.system | string | "az.ai.voicelive" |
| gen_ai.operation.name | string | "connect", "send", "recv", "close" |
| gen_ai.provider.name | string | "microsoft.foundry" |
| gen_ai.request.model | string | Model name from startSession() |
| az.namespace | string | "Microsoft.CognitiveServices" |
| server.address | string | WebSocket endpoint hostname |
| server.port | long | WebSocket endpoint port (443 for wss://) |
| gen_ai.voice.session_id | string | Set after session.created is received |
| gen_ai.conversation.id | string | Set after response.created/response.done |
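The server.address/server.port derivation can be sketched with plain `java.net.URI` handling; the default of 443 for `wss://` applies when the endpoint omits an explicit port. Class and method names here are illustrative:

```java
import java.net.URI;

public class EndpointAttrs {
    /** server.address: the WebSocket endpoint hostname. */
    static String serverAddress(URI endpoint) {
        return endpoint.getHost();
    }

    /** server.port: explicit port if present, otherwise 443 for wss:// (80 for ws://). */
    static long serverPort(URI endpoint) {
        int port = endpoint.getPort();          // -1 when the URI has no explicit port
        if (port != -1) {
            return port;
        }
        return "wss".equalsIgnoreCase(endpoint.getScheme()) ? 443L : 80L;
    }
}
```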

Send/Recv Span Attributes

| Attribute | Type | Applies To | Source |
| --- | --- | --- | --- |
| gen_ai.voice.event_type | string | send, recv | Event type string (e.g., session.update) |
| gen_ai.voice.message_size | long | send, recv | JSON payload length in characters |

Recv-Only Span Attributes

| Attribute | Type | Applies To | Source |
| --- | --- | --- | --- |
| gen_ai.usage.input_tokens | long | response.done | response.usage.input_tokens |
| gen_ai.usage.output_tokens | long | response.done | response.usage.output_tokens |
| gen_ai.response.id | string | response.created, response.done, output items | Response ID |
| gen_ai.response.finish_reasons | string | response.done | ["completed"] format |
| gen_ai.conversation.id | string | response.created, response.done | Conversation ID |
| gen_ai.voice.item_id | string | item events | Conversation item ID |
| gen_ai.voice.call_id | string | function call events | Function call ID |
| gen_ai.voice.output_index | long | output item events | Output item index |

Send-Only Span Attributes

| Attribute | Type | Applies To | Source |
| --- | --- | --- | --- |
| gen_ai.voice.previous_item_id | string | conversation.item.create | Previous item ID |
| gen_ai.voice.call_id | string | conversation.item.create (function output) | Function call ID |

Connect Span Attributes (Session-Level)

Accumulated during the session and flushed when endConnectSpan() is called:

| Attribute | Type | Source |
| --- | --- | --- |
| gen_ai.voice.session_id | string | From session.created response |
| gen_ai.voice.input_audio_format | string | From session.update or session.created |
| gen_ai.voice.output_audio_format | string | From session.update or session.created |
| gen_ai.voice.input_sample_rate | long | From session.update |
| gen_ai.voice.turn_count | long | Incremented on each response.done |
| gen_ai.voice.interruption_count | long | Incremented on each response.cancel |
| gen_ai.voice.audio_bytes_sent | long | Sum of decoded audio bytes from input_audio_buffer.append |
| gen_ai.voice.audio_bytes_received | long | Sum of decoded audio bytes from response.audio.delta |
| gen_ai.voice.first_token_latency_ms | double | System.nanoTime() delta from response.create to first response.audio.delta |
| gen_ai.conversation.id | string | From response.done or client-provided |
| gen_ai.response.id | string | Last response ID |
| gen_ai.response.finish_reasons | string | Last response finish reasons |
| gen_ai.system_instructions | string | From session.update JSON |
| gen_ai.request.temperature | string | From session.update JSON |
| gen_ai.request.max_output_tokens | string | From session.update JSON |
| gen_ai.request.tools | string | Tools array JSON from session.update |
| gen_ai.agent.name | string | From AgentSessionConfig |
| gen_ai.agent.id | string | From session.created agent response |
| gen_ai.agent.version | string | From AgentSessionConfig |
| gen_ai.agent.project_name | string | From AgentSessionConfig |
| gen_ai.agent.thread_id | string | From session.created agent response |
| error.type | string | Exception class name (on error) |

Span Events

Each send/recv span contains exactly one span event:

| Span Type | Event Name | Attributes |
| --- | --- | --- |
| Send | gen_ai.input.messages | gen_ai.system, gen_ai.voice.event_type, gen_ai.event.content* |
| Recv | gen_ai.output.messages | gen_ai.system, gen_ai.voice.event_type, gen_ai.event.content* |
| Recv (error) | gen_ai.voice.error | gen_ai.system, error.code, error.message |
| Recv (rate limits) | gen_ai.voice.rate_limits.updated | gen_ai.voice.rate_limits (JSON array) |

*gen_ai.event.content is only present when content recording is enabled.
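The content-recording gate on span-event attributes can be sketched as a plain map builder. This is a dependency-free illustration (the real implementation would use OTel `Attributes`); names are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SpanEventAttrs {
    /**
     * Builds the attribute map for a gen_ai.input.messages / gen_ai.output.messages
     * span event. gen_ai.event.content is added only when content recording is on.
     */
    static Map<String, String> eventAttributes(String eventType, String json, boolean contentRecording) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("gen_ai.system", "az.ai.voicelive");
        attrs.put("gen_ai.voice.event_type", eventType);
        if (contentRecording) {
            attrs.put("gen_ai.event.content", json); // may include base64 audio; can be very large
        }
        return attrs;
    }
}
```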

Content Recording

When enabled, the gen_ai.event.content attribute on span events contains the full JSON payload. For audio events like input_audio_buffer.append, this includes the base64-encoded audio data — payloads can be very large.

Privacy consideration: Content recording is off by default. When enabled, it captures system instructions, user messages, assistant responses, function call arguments/results, and raw audio data.

Session-Level Counter Tracking

The tracer accumulates counters throughout the session using thread-safe AtomicLong/AtomicReference fields. These are flushed as attributes on the connect span at session close.

Counter Sources

| Counter | Incremented By | Mechanism |
| --- | --- | --- |
| audioBytesSent | input_audio_buffer.append send events | Base64-decodes the audio field, counts decoded byte length |
| audioBytesReceived | response.audio.delta recv events | Counts getDelta() byte array length |
| turnCount | response.done recv events | Increments by 1 per completed response |
| interruptionCount | response.cancel send events | Increments by 1 per user interruption |
| firstTokenLatencyMs | First response.audio.delta after response.create | (System.nanoTime() - createTimestamp) / 1_000_000 |
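The counter mechanics above (decoded-byte accounting and the first-token latency window) can be sketched with JDK atomics. This is an illustrative model of the behavior, not the SDK's actual field layout:

```java
import java.util.Base64;
import java.util.concurrent.atomic.AtomicLong;

public class SessionCounters {
    final AtomicLong audioBytesSent = new AtomicLong();
    final AtomicLong turnCount = new AtomicLong();
    volatile long responseCreateNanos = -1L;   // set on response.create
    volatile double firstTokenLatencyMs = -1.0;

    /** input_audio_buffer.append: count decoded bytes, not base64 characters. */
    void onAudioAppend(String base64Audio) {
        audioBytesSent.addAndGet(Base64.getDecoder().decode(base64Audio).length);
    }

    void onResponseCreate(long nowNanos) {
        responseCreateNanos = nowNanos;
    }

    /** Only the FIRST response.audio.delta after response.create fixes the latency. */
    void onFirstAudioDelta(long nowNanos) {
        if (responseCreateNanos >= 0 && firstTokenLatencyMs < 0) {
            firstTokenLatencyMs = (nowNanos - responseCreateNanos) / 1_000_000.0;
        }
    }

    /** response.done: one completed turn. */
    void onResponseDone() {
        turnCount.incrementAndGet();
    }
}
```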

Session Config Tracking

When a session.update event is sent, the tracer parses the JSON payload (without a full JSON parser, using lightweight string extraction) to capture:

  • instructions → gen_ai.system_instructions
  • temperature → gen_ai.request.temperature
  • max_response_output_tokens → gen_ai.request.max_output_tokens
  • input_audio_sampling_rate → gen_ai.voice.input_sample_rate
  • input_audio_format / output_audio_format → audio format attributes
  • tools array → gen_ai.request.tools

These are extracted using simple indexOf-based parsing (no Jackson dependency in the tracer), which is sufficient for the flat/shallow JSON structures of session.update.
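A minimal sketch of this style of indexOf-based extraction, under the same assumption the design makes (flat, shallow JSON with no escaped quotes inside values). The helper name is hypothetical:

```java
public class LightweightJson {
    /**
     * Extracts the value of a "key": "value" or "key": number pair using
     * indexOf-based scanning. Sufficient for flat session.update payloads;
     * deliberately NOT a general JSON parser (no escapes, no nesting awareness).
     */
    static String extractScalar(String json, String key) {
        int k = json.indexOf("\"" + key + "\"");
        if (k < 0) return null;
        int colon = json.indexOf(':', k);
        if (colon < 0) return null;
        int i = colon + 1;
        while (i < json.length() && Character.isWhitespace(json.charAt(i))) i++;
        if (i >= json.length()) return null;
        if (json.charAt(i) == '"') {                 // string value
            int end = json.indexOf('"', i + 1);
            return end < 0 ? null : json.substring(i + 1, end);
        }
        int end = i;                                  // number / boolean literal
        while (end < json.length() && ",}]".indexOf(json.charAt(end)) < 0) end++;
        return json.substring(i, end).trim();
    }
}
```

This mirrors the trade-off described above: no Jackson dependency in the tracer, at the cost of only handling the flat structures session.update actually produces.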

Error Handling

Server Errors (SessionUpdateError)

When a recv error span is created for a SessionUpdateError event:

  • A gen_ai.voice.error span event is added with error.code and error.message
  • The span itself is not marked as error status (per design — the session may continue)

Connection/Close Errors

When endConnectSpan(error) is called with a non-null Throwable:

  • The connect span is set to StatusCode.ERROR
  • The exception is recorded via span.recordException(error)
  • error.type is set to the exception's canonical class name
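The error.type rule can be shown as a tiny helper (illustrative name; the real code sets this via OTel span APIs alongside recordException):

```java
public class ErrorAttrs {
    /**
     * Maps a close-time Throwable to the error.type attribute value:
     * the exception's canonical class name, or null when the session
     * closed without error (in which case no error.type is set).
     */
    static String errorType(Throwable error) {
        return error == null ? null : error.getClass().getCanonicalName();
    }
}
```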

All SDK Contribution checklist:

  • The pull request does not introduce breaking changes.
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

Copilot AI review requested due to automatic review settings March 25, 2026 22:01

Copilot AI left a comment


Pull request overview

Adds built-in OpenTelemetry-based tracing instrumentation to the azure-ai-voicelive library, emitting spans for WebSocket session lifecycle and message operations, plus samples/tests/docs to demonstrate and validate the behavior.

Changes:

  • Introduces VoiceLiveTracer and wires it into VoiceLiveSessionAsyncClient for connect/send/recv/close spans and session counters.
  • Extends VoiceLiveClientBuilder with openTelemetry(OpenTelemetry) and enableContentRecording(boolean) and plumbs tracing config through VoiceLiveAsyncClient.
  • Adds tracing-focused tests, samples, README section, and changelog entry; updates module metadata and dependencies.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| sdk/voicelive/azure-ai-voicelive/src/main/java/com/azure/ai/voicelive/VoiceLiveTracer.java | New tracer implementation (span creation, attributes, counters). |
| sdk/voicelive/azure-ai-voicelive/src/main/java/com/azure/ai/voicelive/VoiceLiveSessionAsyncClient.java | Starts/ends session span and traces send/recv/close; adds recv payload handling changes. |
| sdk/voicelive/azure-ai-voicelive/src/main/java/com/azure/ai/voicelive/VoiceLiveClientBuilder.java | Adds OpenTelemetry + content recording knobs and creates an OTel Tracer. |
| sdk/voicelive/azure-ai-voicelive/src/main/java/com/azure/ai/voicelive/VoiceLiveAsyncClient.java | Plumbs tracer/config to session creation. |
| sdk/voicelive/azure-ai-voicelive/src/main/java/module-info.java | Declares module requirements for OTel API/context. |
| sdk/voicelive/azure-ai-voicelive/src/test/java/com/azure/ai/voicelive/VoiceLiveTracerTest.java | New unit tests validating spans/attrs/counters. |
| sdk/voicelive/azure-ai-voicelive/src/test/java/com/azure/ai/voicelive/VoiceLiveClientBuilderTest.java | Adds builder tests for explicit/default OpenTelemetry behavior. |
| sdk/voicelive/azure-ai-voicelive/src/samples/java/com/azure/ai/voicelive/TelemetrySample.java | New runnable tracing sample. |
| sdk/voicelive/azure-ai-voicelive/src/samples/java/com/azure/ai/voicelive/VoiceAssistantSample.java | Adds optional CLI flag to enable tracing in the sample. |
| sdk/voicelive/azure-ai-voicelive/src/samples/java/com/azure/ai/voicelive/ReadmeSamples.java | Adds README snippet methods for tracing usage. |
| sdk/voicelive/azure-ai-voicelive/pom.xml | Adds OpenTelemetry dependencies and module-level enforcer stanza. |
| sdk/voicelive/azure-ai-voicelive/checkstyle-suppressions.xml | Suppresses illegal import + external-dependency-exposed checks for tracing changes. |
| sdk/voicelive/azure-ai-voicelive/README.md | Adds “Telemetry and tracing” section and code snippets. |
| sdk/voicelive/azure-ai-voicelive/CHANGELOG.md | Documents new tracing feature + sample. |
Comments suppressed due to low confidence (2)

sdk/voicelive/azure-ai-voicelive/src/main/java/com/azure/ai/voicelive/VoiceLiveSessionAsyncClient.java:314

  • The session "connect" span is never ended on a normal WebSocket completion (remote close). In connect(...), the lifecycle subscriber onComplete path logs and cleans up but doesn’t call voiceLiveTracer.endConnectSpan(null) (or emit a close span). This will leave spans open and missing final session-level attributes when the server closes the socket without closeAsync() being called.

Consider ending the connect span (and optionally emitting traceClose()) in the onComplete handler, or in the closeSignal.asMono().doFinally(...) block so it runs for both error and normal completion.

        }, () -> {
            LOGGER.info("WebSocket handler completed");
            connectionCloseSignalRef.compareAndSet(closeSignal, null);
            disposeLifecycleSubscription();
        });

sdk/voicelive/azure-ai-voicelive/src/main/java/com/azure/ai/voicelive/VoiceLiveSessionAsyncClient.java:415

  • The send span is created and ended before the actual WebSocket send happens. traceSend(event, json) runs inside fromCallable, but the real send (and possible failure) occurs later in flatMap(this::send). If sendSink.tryEmitNext(...) fails (or any downstream send error occurs), the send span will still appear successful and won’t capture the exception/status.

To align with the PR intent (“span per send operation”), consider moving span lifecycle to wrap the actual send Mono, and set span status/recordException when the send fails.

        return Mono.fromCallable(() -> {
            try {
                String json = serializer.serialize(event, SerializerEncoding.JSON);

                // Trace the send operation
                if (voiceLiveTracer != null) {
                    voiceLiveTracer.traceSend(event, json);
                }

                return BinaryData.fromString(json);
            } catch (IOException e) {
                throw LOGGER.logExceptionAsError(new RuntimeException("Failed to serialize event", e));
            }
        }).flatMap(this::send);

@github-actions github-actions bot commented Mar 25, 2026

API Change Check

APIView identified API level changes in this PR and created the following API reviews

com.azure:azure-ai-voicelive

@xitzhang xitzhang requested a review from a team as a code owner April 4, 2026 06:30
@xitzhang xitzhang changed the title [draft][VoiceLive] Add built-in OpenTelemetry tracing support [VoiceLive] Add built-in OpenTelemetry tracing support Apr 4, 2026
@xitzhang xitzhang enabled auto-merge (squash) April 6, 2026 18:04
@xitzhang xitzhang disabled auto-merge April 6, 2026 18:04
@samvaity samvaity requested a review from trask April 10, 2026 18:05