
Commit a3f900f

xitzhang, Xiting Zhang, and Copilot authored
[VoiceLive] Add support for built-in web search and file search tools (#46723)
* azure-ai-voicelive 1.2.0 GA: Update API version to 2026-04-10 with new features
  - Web Search & File Search support (ResponseWebSearchCallItem, ResponseFileSearchCallItem)
  - Avatar enhancements (voice sync, idle/speaking states, video delta, output buffer)
  - Transcription improvements (TranscriptionPhrase, TranscriptionWord, new models)
  - New SessionIncludeOption enum
  - Personal voice model updates (DragonHDOmniLatestNeural, MAI-Voice-1)
  - Fix ServerEvent.deserialize -> _deserialize for new model_base
  - Update tests for new models, enums, and breaking changes
  - Consolidate CHANGELOG for 1.2.0 GA release
* Update version to 1.2.0 GA
* Potential fix for pull request finding
  Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Potential fix for pull request finding
  Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Potential fix for pull request finding
  Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Fix azure/_version.py to 1.2.0 GA (was 1.0.0b1)
* remove _version
* Fix sphinx docstring formatting in ServerEventConversationItemCreated
* update cspell
* update cespell file
* update status
* Remove unused conn_self param from _extract_send_event_ids
* regenerate from typespec
* Update CHANGELOG.md
* fix bulletlist in docs
* Fix pylint implicit string concatenation in aio patch

---------

Co-authored-by: Xiting Zhang <xitzhang@microsoft.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
1 parent 55e6dcf commit a3f900f

29 files changed

Lines changed: 5717 additions & 288 deletions

sdk/voicelive/azure-ai-voicelive/CHANGELOG.md

Lines changed: 54 additions & 2 deletions
@@ -1,15 +1,67 @@
 # Release History
 
-## 1.2.0b6 (Unreleased)
+## 1.2.0 (Unreleased)
 
 ### Features Added
 
+- **Web Search & File Search**: Added support for built-in web search and file search tools:
+  - New item types: `ResponseWebSearchCallItem`, `ResponseFileSearchCallItem`
+  - New server events for web/file search lifecycle (`searching`, `in_progress`, `completed`)
+  - New models: `ActionFind`, `ActionOpenPage`, `ActionSearch`, `ActionSearchSource`, `FileSearchResult`
+  - New enum values: `ItemType.WEB_SEARCH_CALL`, `ItemType.FILE_SEARCH_CALL`
+  - New `SessionIncludeOption` enum for controlling what data is included in session responses
+- **MCP (Model Context Protocol) Support**: Added comprehensive support for Model Context Protocol integration:
+  - `MCPServer` tool type for defining MCP server configurations with authorization, headers, and approval requirements
+  - `MCPTool` model for representing MCP tool definitions with input schemas and annotations
+  - `MCPApprovalType` enum for controlling approval workflows (`never`, `always`, or tool-specific)
+  - New item types for MCP approval and call workflows
+  - New server events for MCP tool listing, call lifecycle, and approval flows
+- **Avatar Enhancements**:
+  - Added `AzureAvatarVoiceSyncVoice` for avatar voice sync configuration
+  - Added `ServerEventSessionAvatarSwitchToIdle` and `ServerEventSessionAvatarSwitchToSpeaking` events
+  - Added `ServerEventResponseVideoDelta` for avatar video frame streaming
+  - Added `ClientEventOutputAudioBufferClear` and `ServerEventOutputAudioBufferCleared` for output buffer management
+  - Added `AvatarConfigTypes` enum with support for `video-avatar` and `photo-avatar` types
+  - Added `AvatarOutputProtocol` enum for avatar streaming protocols (`webrtc`, `websocket`)
+  - Added `Scene` model for controlling avatar zoom, position, rotation, and movement amplitude
+  - Added `output_audit_audio` field to `AvatarConfig`
+- **OpenTelemetry Tracing Support**: Added `VoiceLiveInstrumentor` for opt-in OpenTelemetry-based
+  tracing of VoiceLive WebSocket connections, following Azure SDK and GenAI semantic conventions.
+  - Enable via `AZURE_EXPERIMENTAL_ENABLE_GENAI_TRACING=true` environment variable
+  - Content recording controlled by `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT`
+  - Comprehensive session-level telemetry: session ID, audio format, first-token latency,
+    turn count, interruption count, audio bytes sent/received, message size
+  - Response & function call ID tracking for end-to-end tracing
+  - Agent v2 telemetry with agent identity and configuration tracking
+  - MCP telemetry with tool call and approval flow tracking
+- **Agent Session Configuration**: Added `AgentSessionConfig` for configuring Azure AI Foundry agents
+  at connection time with `agent_name`, `project_name`, `agent_version`, `conversation_id`, and more
+- **Transcription Improvements**:
+  - Added `TranscriptionPhrase` and `TranscriptionWord` models for detailed transcription data
+  - Added `ServerEventResponseAudioTranscriptAnnotationAdded` event
+  - Added `gpt-4o-transcribe-diarize` and `mai-transcribe-1` transcription model support
+- **Interim Response Configuration**: Added `StaticInterimResponseConfig` and `LlmInterimResponseConfig`
+  for generating interim responses during latency or tool calls
+- **Image Content Support**: Added `RequestImageContentPart` for image inputs in conversations
+- **Reasoning Effort Control**: Added `reasoning_effort` field with `ReasoningEffort` enum
+- **Response Metadata**: Added `metadata` field to `Response` and `ResponseCreateParams`
+- **Server Warning Events**: Added `ServerEventWarning` for handling non-fatal warnings
+- **Personal Voice Models**: Added `DragonHDOmniLatestNeural` and `MAI-Voice-1` model options
+- **Enhanced OpenAI Voices**: Added `marin` and `cedar` voices to `OpenAIVoiceName` enum
+- **Enhanced Azure Personal Voice**: Added `custom_lexicon_url`, `prefer_locales`, `locale`, `style`,
+  `pitch`, `rate`, and `volume` properties
+- **Pre-generated Assistant Messages**: Added `pre_generated_assistant_message` in `ResponseCreateParams`
+- **Explicit Null Values**: Enhanced `RequestSession` to properly serialize explicitly set `None` values
+
 ### Breaking Changes
 
-### Bugs Fixed
+- Removed Foundry Agent Tool classes (`FoundryAgentTool`, `ResponseFoundryAgentCallItem`, etc.) —
+  use `AgentSessionConfig` with `connect()` instead
 
 ### Other Changes
 
+- Updated default API version to `2026-04-10`
+
 ## 1.2.0b5 (2026-04-06)
 
 ### Features Added
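The changelog above names three lifecycle phases for web and file search (`searching`, `in_progress`, `completed`), and this commit's apiview-properties.json lists six matching event classes. A minimal sketch of how an application might classify them, assuming it has each event's class name available (for example via `type(event).__name__`). The helpers `classify_search_event` and `latest_phase` are hypothetical and not part of the SDK, and no ordering of the phases is assumed.

```python
import re

# Class-name pattern for the six search lifecycle event classes listed in this commit.
_SEARCH_EVENT = re.compile(
    r"^ServerEventResponse(Web|File)SearchCall(Searching|InProgress|Completed)$"
)
# Map the class-name suffix to the phase spelling used in the changelog.
_PHASES = {"Searching": "searching", "InProgress": "in_progress", "Completed": "completed"}


def classify_search_event(class_name):
    """Return (tool, phase) for a search lifecycle event class name, else None."""
    match = _SEARCH_EVENT.match(class_name)
    if match is None:
        return None
    tool = "web_search" if match.group(1) == "Web" else "file_search"
    return tool, _PHASES[match.group(2)]


def latest_phase(class_names):
    """Hypothetical helper: keep the last phase seen for each search tool."""
    state = {}
    for name in class_names:
        classified = classify_search_event(name)
        if classified is not None:
            tool, phase = classified
            state[tool] = phase
    return state
```

Matching on class names keeps the sketch independent of the wire-level event type strings, which this diff does not show.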
Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 {
-  "apiVersion": "2026-01-01-preview",
+  "apiVersion": "2026-04-10",
   "apiVersions": {
-    "VoiceLive": "2026-01-01-preview"
+    "VoiceLive": "2026-04-10"
   }
 }
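The hunk above moves both version fields from a preview API version to a dated one. A minimal sketch of a consistency check over such a metadata dict, assuming preview versions always end in `-preview` as the old value here does. The helpers `is_preview` and `check_metadata` are hypothetical and are not SDK tooling.

```python
import json

# The "after" side of the metadata hunk above.
METADATA = json.loads(
    '{"apiVersion": "2026-04-10", "apiVersions": {"VoiceLive": "2026-04-10"}}'
)


def is_preview(api_version):
    # Assumption: preview versions end in "-preview", as "2026-01-01-preview" does.
    return api_version.endswith("-preview")


def check_metadata(metadata):
    """Hypothetical helper: return a list of problems found in the metadata dict."""
    problems = []
    default = metadata["apiVersion"]
    if is_preview(default):
        problems.append(f"default API version {default} is a preview")
    for service, version in metadata["apiVersions"].items():
        if version != default:
            problems.append(f"{service} uses {version}, default is {default}")
    return problems
```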

sdk/voicelive/azure-ai-voicelive/apiview-properties.json

Lines changed: 27 additions & 3 deletions
@@ -1,6 +1,10 @@
 {
     "CrossLanguagePackageId": "VoiceLive",
     "CrossLanguageDefinitionId": {
+        "azure.ai.voicelive.models.ActionFind": "VoiceLive.ActionFind",
+        "azure.ai.voicelive.models.ActionOpenPage": "VoiceLive.ActionOpenPage",
+        "azure.ai.voicelive.models.ActionSearch": "VoiceLive.ActionSearch",
+        "azure.ai.voicelive.models.ActionSearchSource": "VoiceLive.ActionSearchSource",
         "azure.ai.voicelive.models.AgentConfig": "VoiceLive.AgentConfig",
         "azure.ai.voicelive.models.Animation": "VoiceLive.Animation",
         "azure.ai.voicelive.models.ConversationRequestItem": "VoiceLive.ConversationRequestItem",
@@ -11,6 +15,7 @@
         "azure.ai.voicelive.models.AudioNoiseReduction": "VoiceLive.AudioNoiseReduction",
         "azure.ai.voicelive.models.AvatarConfig": "VoiceLive.AvatarConfig",
         "azure.ai.voicelive.models.AzureVoice": "VoiceLive.AzureVoice",
+        "azure.ai.voicelive.models.AzureAvatarVoiceSyncVoice": "VoiceLive.AzureAvatarVoiceSyncVoice",
         "azure.ai.voicelive.models.AzureCustomVoice": "VoiceLive.AzureCustomVoice",
         "azure.ai.voicelive.models.AzurePersonalVoice": "VoiceLive.AzurePersonalVoice",
         "azure.ai.voicelive.models.EouDetection": "VoiceLive.EouDetection",
@@ -37,13 +42,15 @@
         "azure.ai.voicelive.models.ClientEventInputAudioTurnCancel": "VoiceLive.ClientEventInputAudioTurnCancel",
         "azure.ai.voicelive.models.ClientEventInputAudioTurnEnd": "VoiceLive.ClientEventInputAudioTurnEnd",
         "azure.ai.voicelive.models.ClientEventInputAudioTurnStart": "VoiceLive.ClientEventInputAudioTurnStart",
+        "azure.ai.voicelive.models.ClientEventOutputAudioBufferClear": "VoiceLive.ClientEventOutputAudioBufferClear",
         "azure.ai.voicelive.models.ClientEventResponseCancel": "VoiceLive.ClientEventResponseCancel",
         "azure.ai.voicelive.models.ClientEventResponseCreate": "VoiceLive.ClientEventResponseCreate",
         "azure.ai.voicelive.models.ClientEventSessionAvatarConnect": "VoiceLive.ClientEventSessionAvatarConnect",
         "azure.ai.voicelive.models.ClientEventSessionUpdate": "VoiceLive.ClientEventSessionUpdate",
         "azure.ai.voicelive.models.ContentPart": "VoiceLive.ContentPart",
         "azure.ai.voicelive.models.ConversationItemBase": "VoiceLive.ConversationItemBase",
         "azure.ai.voicelive.models.ErrorResponse": "VoiceLive.ErrorResponse",
+        "azure.ai.voicelive.models.FileSearchResult": "VoiceLive.FileSearchResult",
         "azure.ai.voicelive.models.FunctionCallItem": "VoiceLive.FunctionCallItem",
         "azure.ai.voicelive.models.FunctionCallOutputItem": "VoiceLive.FunctionCallOutputItem",
         "azure.ai.voicelive.models.Tool": "VoiceLive.Tool",
@@ -73,6 +80,7 @@
         "azure.ai.voicelive.models.ResponseCreateParams": "VoiceLive.ResponseCreateParams",
         "azure.ai.voicelive.models.ResponseFailedDetails": "VoiceLive.ResponseFailedDetails",
         "azure.ai.voicelive.models.ResponseItem": "VoiceLive.ResponseItem",
+        "azure.ai.voicelive.models.ResponseFileSearchCallItem": "VoiceLive.ResponseFileSearchCallItem",
         "azure.ai.voicelive.models.ResponseFunctionCallItem": "VoiceLive.ResponseFunctionCallItem",
         "azure.ai.voicelive.models.ResponseFunctionCallOutputItem": "VoiceLive.ResponseFunctionCallOutputItem",
         "azure.ai.voicelive.models.ResponseIncompleteDetails": "VoiceLive.ResponseIncompleteDetails",
@@ -83,6 +91,7 @@
         "azure.ai.voicelive.models.ResponseMessageItem": "VoiceLive.ResponseMessageItem",
         "azure.ai.voicelive.models.ResponseSession": "VoiceLive.ResponseSession",
         "azure.ai.voicelive.models.ResponseTextContentPart": "VoiceLive.ResponseTextContentPart",
+        "azure.ai.voicelive.models.ResponseWebSearchCallItem": "VoiceLive.ResponseWebSearchCallItem",
         "azure.ai.voicelive.models.Scene": "VoiceLive.Scene",
         "azure.ai.voicelive.models.ServerEvent": "VoiceLive.ServerEvent",
         "azure.ai.voicelive.models.ServerEventConversationItemCreated": "VoiceLive.ServerEventConversationItemCreated",
@@ -101,6 +110,7 @@
         "azure.ai.voicelive.models.ServerEventMcpListToolsCompleted": "VoiceLive.ServerEventMcpListToolsCompleted",
         "azure.ai.voicelive.models.ServerEventMcpListToolsFailed": "VoiceLive.ServerEventMcpListToolsFailed",
         "azure.ai.voicelive.models.ServerEventMcpListToolsInProgress": "VoiceLive.ServerEventMcpListToolsInProgress",
+        "azure.ai.voicelive.models.ServerEventOutputAudioBufferCleared": "VoiceLive.ServerEventOutputAudioBufferCleared",
         "azure.ai.voicelive.models.ServerEventResponseAnimationBlendshapeDelta": "VoiceLive.ServerEventResponseAnimationBlendshapeDelta",
         "azure.ai.voicelive.models.ServerEventResponseAnimationBlendshapeDone": "VoiceLive.ServerEventResponseAnimationBlendshapeDone",
         "azure.ai.voicelive.models.ServerEventResponseAnimationVisemeDelta": "VoiceLive.ServerEventResponseAnimationVisemeDelta",
@@ -109,12 +119,16 @@
         "azure.ai.voicelive.models.ServerEventResponseAudioDone": "VoiceLive.ServerEventResponseAudioDone",
         "azure.ai.voicelive.models.ServerEventResponseAudioTimestampDelta": "VoiceLive.ServerEventResponseAudioTimestampDelta",
         "azure.ai.voicelive.models.ServerEventResponseAudioTimestampDone": "VoiceLive.ServerEventResponseAudioTimestampDone",
+        "azure.ai.voicelive.models.ServerEventResponseAudioTranscriptAnnotationAdded": "VoiceLive.ServerEventResponseAudioTranscriptAnnotationAdded",
         "azure.ai.voicelive.models.ServerEventResponseAudioTranscriptDelta": "VoiceLive.ServerEventResponseAudioTranscriptDelta",
         "azure.ai.voicelive.models.ServerEventResponseAudioTranscriptDone": "VoiceLive.ServerEventResponseAudioTranscriptDone",
         "azure.ai.voicelive.models.ServerEventResponseContentPartAdded": "VoiceLive.ServerEventResponseContentPartAdded",
         "azure.ai.voicelive.models.ServerEventResponseContentPartDone": "VoiceLive.ServerEventResponseContentPartDone",
         "azure.ai.voicelive.models.ServerEventResponseCreated": "VoiceLive.ServerEventResponseCreated",
         "azure.ai.voicelive.models.ServerEventResponseDone": "VoiceLive.ServerEventResponseDone",
+        "azure.ai.voicelive.models.ServerEventResponseFileSearchCallCompleted": "VoiceLive.ServerEventResponseFileSearchCallCompleted",
+        "azure.ai.voicelive.models.ServerEventResponseFileSearchCallInProgress": "VoiceLive.ServerEventResponseFileSearchCallInProgress",
+        "azure.ai.voicelive.models.ServerEventResponseFileSearchCallSearching": "VoiceLive.ServerEventResponseFileSearchCallSearching",
         "azure.ai.voicelive.models.ServerEventResponseFunctionCallArgumentsDelta": "VoiceLive.ServerEventResponseFunctionCallArgumentsDelta",
         "azure.ai.voicelive.models.ServerEventResponseFunctionCallArgumentsDone": "VoiceLive.ServerEventResponseFunctionCallArgumentsDone",
         "azure.ai.voicelive.models.ServerEventResponseMcpCallArgumentsDelta": "VoiceLive.ServerEventResponseMcpCallArgumentsDelta",
@@ -126,7 +140,13 @@
         "azure.ai.voicelive.models.ServerEventResponseOutputItemDone": "VoiceLive.ServerEventResponseOutputItemDone",
         "azure.ai.voicelive.models.ServerEventResponseTextDelta": "VoiceLive.ServerEventResponseTextDelta",
         "azure.ai.voicelive.models.ServerEventResponseTextDone": "VoiceLive.ServerEventResponseTextDone",
+        "azure.ai.voicelive.models.ServerEventResponseVideoDelta": "VoiceLive.ServerEventResponseVideoDelta",
+        "azure.ai.voicelive.models.ServerEventResponseWebSearchCallCompleted": "VoiceLive.ServerEventResponseWebSearchCallCompleted",
+        "azure.ai.voicelive.models.ServerEventResponseWebSearchCallInProgress": "VoiceLive.ServerEventResponseWebSearchCallInProgress",
+        "azure.ai.voicelive.models.ServerEventResponseWebSearchCallSearching": "VoiceLive.ServerEventResponseWebSearchCallSearching",
         "azure.ai.voicelive.models.ServerEventSessionAvatarConnecting": "VoiceLive.ServerEventSessionAvatarConnecting",
+        "azure.ai.voicelive.models.ServerEventSessionAvatarSwitchToIdle": "VoiceLive.ServerEventSessionAvatarSwitchToIdle",
+        "azure.ai.voicelive.models.ServerEventSessionAvatarSwitchToSpeaking": "VoiceLive.ServerEventSessionAvatarSwitchToSpeaking",
         "azure.ai.voicelive.models.ServerEventSessionCreated": "VoiceLive.ServerEventSessionCreated",
         "azure.ai.voicelive.models.ServerEventSessionUpdated": "VoiceLive.ServerEventSessionUpdated",
         "azure.ai.voicelive.models.ServerEventWarning": "VoiceLive.ServerEventWarning",
@@ -138,6 +158,8 @@
         "azure.ai.voicelive.models.TokenUsage": "VoiceLive.TokenUsage",
         "azure.ai.voicelive.models.ToolChoiceSelection": "VoiceLive.ToolChoiceObject",
         "azure.ai.voicelive.models.ToolChoiceFunctionSelection": "VoiceLive.ToolChoiceFunctionObject",
+        "azure.ai.voicelive.models.TranscriptionPhrase": "VoiceLive.TranscriptionPhrase",
+        "azure.ai.voicelive.models.TranscriptionWord": "VoiceLive.TranscriptionWord",
         "azure.ai.voicelive.models.UserMessageItem": "VoiceLive.UserMessageItem",
         "azure.ai.voicelive.models.VideoCrop": "VoiceLive.VideoCrop",
         "azure.ai.voicelive.models.VideoParams": "VoiceLive.VideoParams",
@@ -156,6 +178,8 @@
         "azure.ai.voicelive.models.ToolType": "VoiceLive.ToolType",
         "azure.ai.voicelive.models.MCPApprovalType": "VoiceLive.MCPApprovalType",
         "azure.ai.voicelive.models.ReasoningEffort": "VoiceLive.ReasoningEffort",
+        "azure.ai.voicelive.models.InterimResponseConfigType": "VoiceLive.InterimResponseConfigType",
+        "azure.ai.voicelive.models.InterimResponseTrigger": "VoiceLive.InterimResponseTrigger",
         "azure.ai.voicelive.models.AnimationOutputType": "VoiceLive.AnimationOutputType",
         "azure.ai.voicelive.models.InputAudioFormat": "VoiceLive.InputAudioFormat",
         "azure.ai.voicelive.models.TurnDetectionType": "VoiceLive.TurnDetectionType",
@@ -165,11 +189,11 @@
         "azure.ai.voicelive.models.AvatarOutputProtocol": "VoiceLive.AvatarOutputProtocol",
         "azure.ai.voicelive.models.AudioTimestampType": "VoiceLive.AudioTimestampType",
         "azure.ai.voicelive.models.ToolChoiceLiteral": "VoiceLive.ToolChoiceLiteral",
-        "azure.ai.voicelive.models.InterimResponseConfigType": "VoiceLive.InterimResponseConfigType",
-        "azure.ai.voicelive.models.InterimResponseTrigger": "VoiceLive.InterimResponseTrigger",
+        "azure.ai.voicelive.models.SessionIncludeOption": "VoiceLive.SessionIncludeOption",
         "azure.ai.voicelive.models.ResponseStatus": "VoiceLive.ResponseStatus",
         "azure.ai.voicelive.models.ResponseItemStatus": "VoiceLive.ResponseItemStatus",
         "azure.ai.voicelive.models.RequestImageContentPartDetail": "VoiceLive.RequestImageContentPartDetail",
         "azure.ai.voicelive.models.ServerEventType": "VoiceLive.ServerEventType"
-    }
+    },
+    "CrossLanguageVersion": "86299c665983"
 }
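The map above pairs each Python model path with its cross-language definition ID. Most pairs share a class name, but a few differ (for example `ToolChoiceSelection` maps to `VoiceLive.ToolChoiceObject`). A minimal sketch for listing those mismatches, run against a four-entry excerpt copied from the diff. The helper `renamed_definitions` is hypothetical and is not part of any Azure SDK tooling.

```python
import json

# Excerpt of the CrossLanguageDefinitionId map shown in the diff above.
EXCERPT = json.loads("""
{
  "azure.ai.voicelive.models.ActionFind": "VoiceLive.ActionFind",
  "azure.ai.voicelive.models.FileSearchResult": "VoiceLive.FileSearchResult",
  "azure.ai.voicelive.models.ToolChoiceSelection": "VoiceLive.ToolChoiceObject",
  "azure.ai.voicelive.models.ToolChoiceFunctionSelection": "VoiceLive.ToolChoiceFunctionObject"
}
""")


def renamed_definitions(mapping):
    """Hypothetical helper: map Python class names to differing cross-language names."""
    renamed = {}
    for python_path, cross_language_id in mapping.items():
        # Compare only the final dotted component on each side.
        python_name = python_path.rsplit(".", 1)[-1]
        cross_name = cross_language_id.rsplit(".", 1)[-1]
        if python_name != cross_name:
            renamed[python_name] = cross_name
    return renamed


print(renamed_definitions(EXCERPT))
```

On this excerpt the helper reports only the two `ToolChoice*` entries. The same function should work on the full `CrossLanguageDefinitionId` object loaded from the file.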
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+# coding=utf-8
+# --------------------------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License. See License.txt in the project root for license information.
+# Code generated by Microsoft (R) Python Code Generator.
+# Changes may cause incorrect behavior and will be lost if the code is regenerated.
+# --------------------------------------------------------------------------
+
+from typing import TYPE_CHECKING, Union
+
+if TYPE_CHECKING:
+    from .ai.voicelive import models as _models
+Voice = Union[str, "_models.OpenAIVoiceName", "_models.OpenAIVoice", "_models.AzureVoice"]
+InterimResponseConfig = Union["_models.StaticInterimResponseConfig", "_models.LlmInterimResponseConfig"]
+ToolChoice = Union[str, "_models.ToolChoiceLiteral", "_models.ToolChoiceSelection"]
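The new module above imports `models` only under `TYPE_CHECKING` and refers to its classes through quoted names, so the aliases can be built without importing the models package at runtime. A minimal sketch of that pattern, using the standard library's `decimal` module as a stand-in for the SDK's models module. The alias `Number` and the helper `describe` are hypothetical and exist only for illustration.

```python
from typing import TYPE_CHECKING, Union

if TYPE_CHECKING:
    # Seen only by static type checkers, so nothing is imported at runtime.
    import decimal as _models  # stand-in for the SDK's models module (assumption)

# The quoted member becomes a forward reference; it is not resolved when the alias is built.
Number = Union[int, "_models.Decimal"]


def describe(alias):
    """Return the member names of a Union alias without resolving forward references."""
    names = []
    for arg in alias.__args__:
        if isinstance(arg, type):
            names.append(arg.__name__)
        else:
            # typing.ForwardRef keeps the original string in __forward_arg__.
            names.append(arg.__forward_arg__)
    return names
```

Because `TYPE_CHECKING` is `False` at runtime, the name `_models` is never bound there. That is why each model class in the aliases has to be written as a string.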
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+# --------------------------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License. See License.txt in the project root for license information.
+# Code generated by Microsoft (R) Python Code Generator.
+# Changes may cause incorrect behavior and will be lost if the code is regenerated.
+# --------------------------------------------------------------------------
