This sample demonstrates how to use the Azure Voice Live API with an avatar, implemented in Python. The Voice Live SDK logic runs entirely on the server side (Python/FastAPI), while the browser handles the UI, audio capture/playback, and avatar video rendering.
```
┌─────────────────────────┐          ┌─────────────────────────┐          ┌──────────────────┐
│   Browser (Frontend)    │◄───WS───►│ Python Server (FastAPI) │◄──SDK───►│ Azure Voice Live │
│                         │          │                         │          │     Service      │
│ • Audio capture (mic)   │          │ • Session management    │          └──────────────────┘
│ • Audio playback        │          │ • Voice Live SDK calls  │                   │
│ • Avatar video          │          │ • Event relay           │                   │
│ • Settings UI           │          │ • Avatar SDP relay      │                   │
│ • Chat messages         │          └─────────────────────────┘                   │
│                         │◄───────────── WebRTC (peer-to-peer video) ────────────┘
└─────────────────────────┘
```
Key design: The Python backend acts as a bridge between the browser and Azure Voice Live service. All SDK operations (session creation, configuration, audio forwarding, event processing) happen in Python. The browser only handles:
- Microphone capture → sends PCM16 audio via WebSocket
- Audio playback ← receives PCM16 audio via WebSocket
- WebRTC signaling relay for avatar video (SDP offer/answer exchanged through Python backend)
- Avatar video rendering via direct WebRTC peer connection to Azure
- WebSocket video mode: receives fMP4 video chunks via WebSocket for MediaSource Extensions playback
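The browser-to-server audio leg above is a simple message-framing exercise: raw PCM16 samples are base64-encoded and wrapped in a JSON WebSocket message. The sketch below is in Python for brevity (the real client is JavaScript in `static/app.js`); the `audio_chunk`/`audio` field names follow the message tables later in this README, but the exact payload shape is an assumption.

```python
import base64
import json
import struct

def make_audio_chunk_message(samples):
    """Pack 16-bit PCM samples little-endian, base64-encode them, and
    wrap them in a JSON WebSocket message (field names are assumptions
    based on this README's message tables)."""
    raw = struct.pack(f"<{len(samples)}h", *samples)  # PCM16 = 2 bytes/sample
    return json.dumps({
        "type": "audio_chunk",
        "audio": base64.b64encode(raw).decode("ascii"),
    })

msg = make_audio_chunk_message([0, 1000, -1000, 32767])
```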
- Python 3.10+
- An active Azure account. If you don't have an Azure account, you can create an account here.
- A Microsoft Foundry resource created in one of the supported regions. For more information about region availability, see the voice live overview documentation.
The avatar feature is currently available in the following service regions: Southeast Asia, North Europe, West Europe, Sweden Central, South Central US, East US 2, and West US 2.
1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Configure environment (optional):

   You can optionally set environment variables to pre-fill settings:

   - `AZURE_VOICELIVE_ENDPOINT` - Your Azure AI Services endpoint
   - `AZURE_VOICELIVE_API_KEY` - Your API key
   - `VOICELIVE_MODEL` - Model to use (default: `gpt-4o-realtime`)
   - `VOICELIVE_VOICE` - Voice name (default: `en-US-AvaMultilingualNeural`)
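For illustration, the server might read these variables with `os.getenv`, falling back to the documented defaults. This is a sketch; the actual lookup lives in the sample's Python code:

```python
import os

# Pre-fill settings from optional environment variables; the defaults
# mirror the ones documented above.
settings = {
    "endpoint": os.getenv("AZURE_VOICELIVE_ENDPOINT", ""),
    "api_key": os.getenv("AZURE_VOICELIVE_API_KEY", ""),
    "model": os.getenv("VOICELIVE_MODEL", "gpt-4o-realtime"),
    "voice": os.getenv("VOICELIVE_VOICE", "en-US-AvaMultilingualNeural"),
}
```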
3. Run the server:

   ```bash
   python app.py
   ```

   Or with uvicorn directly:

   ```bash
   uvicorn app:app --host 0.0.0.0 --port 3000 --reload
   ```

4. Open the browser:

   Navigate to http://localhost:3000
To run the sample using Docker, navigate to the folder containing this README.md:

```bash
cd ./python/voice-live-avatar/
```

Build the Docker image:

```bash
docker build -t voice-live-avatar-python .
```

Start the container:

```bash
docker run --rm -p 3000:3000 voice-live-avatar-python
```

Then open your web browser and navigate to http://localhost:3000.
- Step 1: Under the `Connection Settings` section, fill in `Azure AI Services Endpoint` and `Subscription Key`, which can be obtained from the `Keys and Endpoint` tab in your Azure AI Services resource. The endpoint can be the regional endpoint (e.g., `https://<region>.api.cognitive.microsoft.com/`) or a custom domain endpoint (e.g., `https://<custom-domain>.cognitiveservices.azure.com/`).
- Step 2: Under the `Conversation Settings` section, configure the avatar:
  - Enable Avatar: Toggle the `Avatar` switch to enable the avatar feature.
  - Avatar Type: By default, a prebuilt avatar is used. Select a character from the `Avatar Character` dropdown list.
    - To use a photo avatar, toggle the `Use Photo Avatar` switch and select a prebuilt photo avatar character from the dropdown list.
    - To use a custom avatar, toggle the `Use Custom Avatar` switch and enter the character name in the `Character` field.
  - Avatar Output Mode: Choose between `WebRTC` (default, real-time streaming) and `WebSocket` (streams video data over the WebSocket connection).
  - Avatar Background Image URL (optional): Enter a URL to set a custom background image for the avatar.
  - Scene Settings (photo avatar only): When using a photo avatar, adjust scene parameters such as `Zoom`, `Position X/Y`, `Rotation X/Y/Z`, and `Amplitude`. These settings can also be adjusted live after connecting.
- Step 3: Click the `Connect` button to start the conversation. Once connected, you should see the avatar appear on the page, and you can click `Turn on microphone` and start talking with the avatar.
- Step 4: At the top of the page, you can toggle the `Developer mode` switch to enable developer mode, which shows the chat history in text plus additional logs useful for debugging.
This sample can be deployed to the cloud for global access. The recommended hosting platform is Azure Container Apps. Here are the steps to deploy this sample to Azure Container Apps:

- Step 1: Push the Docker image to a container registry, such as Azure Container Registry:

  ```bash
  docker tag voice-live-avatar-python <your-registry-name>.azurecr.io/voice-live-avatar-python:latest
  docker push <your-registry-name>.azurecr.io/voice-live-avatar-python:latest
  ```

- Step 2: Create an `Azure Container App` and deploy the Docker image built in the steps above, following Deploy from an existing container image.
- Step 3: Once the `Azure Container App` is created, access the sample by navigating to the URL of the `Azure Container App` in your browser.
```
voice-live-avatar/
├── app.py              # FastAPI server, WebSocket endpoint, static file serving
├── voice_handler.py    # Voice Live SDK session management, event processing
├── requirements.txt    # Python dependencies
├── Dockerfile          # Docker container configuration
├── README.md           # This file
└── static/
    ├── index.html      # Main UI page
    ├── style.css       # Styles
    └── app.js          # Client-side JS (audio, WebRTC, WebSocket, UI)
```
| Message Type | Description |
|---|---|
| `start_session` | Start Voice Live session with configuration |
| `stop_session` | Stop the active session |
| `audio_chunk` | Send microphone audio (base64 PCM16) |
| `send_text` | Send a text message |
| `avatar_sdp_offer` | Forward WebRTC SDP offer for avatar |
| `interrupt` | Cancel current assistant response |
| `update_scene` | Update photo avatar scene settings (live) |
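On the server side, these messages lend themselves to simple dispatch on the `type` field. Below is a minimal sketch; the handler wiring and the error-reply shape are illustrative assumptions, while the real routing lives in `app.py` and `voice_handler.py`:

```python
import json

def handle_client_message(raw: str, handlers: dict):
    """Route an incoming WebSocket message to a handler by its `type`."""
    msg = json.loads(raw)
    handler = handlers.get(msg.get("type"))
    if handler is None:
        # Reporting unknown types as a session_error is an assumption.
        return {"type": "session_error",
                "error": f"unknown message type: {msg.get('type')}"}
    return handler(msg)

# Illustrative handlers for two of the message types listed above.
handlers = {
    "start_session": lambda msg: {"type": "session_started"},
    "stop_session": lambda msg: {"type": "session_closed"},
}

reply = handle_client_message('{"type": "start_session"}', handlers)
```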
| Message Type | Description |
|---|---|
| `session_started` | Session ready |
| `session_error` | Error starting/during session |
| `ice_servers` | ICE server config for avatar WebRTC |
| `avatar_sdp_answer` | Server's SDP answer for avatar WebRTC |
| `audio_data` | Assistant audio (base64 PCM16, 24kHz) |
| `video_data` | Avatar video chunk (base64 fMP4, WebSocket mode) |
| `transcript_delta` | Streaming transcript text |
| `transcript_done` | Completed transcript |
| `text_delta` | Streaming text response |
| `text_done` | Text response completed |
| `response_created` | New response started |
| `response_done` | Response completed |
| `speech_started` | User started speaking (barge-in) |
| `speech_stopped` | User stopped speaking |
| `avatar_connecting` | Avatar WebRTC connection in progress |
| `session_closed` | Session ended |
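As a worked example of the `audio_data` format: base64-decoded PCM16 at 24kHz means two bytes per sample, so a chunk's playback duration can be computed directly. This is a sketch using the field format from the table above; the helper name is hypothetical.

```python
import base64

def pcm16_duration_seconds(b64_audio: str, sample_rate: int = 24000) -> float:
    """Playback duration of a base64-encoded PCM16 mono chunk."""
    raw = base64.b64decode(b64_audio)
    n_samples = len(raw) // 2  # PCM16 = 2 bytes per sample
    return n_samples / sample_rate

# One second of silence: 24000 samples * 2 bytes each.
one_second = base64.b64encode(b"\x00\x00" * 24000).decode("ascii")
print(pcm16_duration_seconds(one_second))  # → 1.0
```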