| sidebar_position | 3 |
|---|---|
| title | Avatar Integration |
| description | Add a digital avatar to your Conversational AI agent. |
You can attach a digital avatar to your voice agent so that users see a visual representation of the AI speaking.
Avatars are currently supported only with the cascading ASR + LLM + TTS pipeline. MLLM sessions handle audio end-to-end and do not support avatars at this time.
| Provider | Class | Required TTS Sample Rate |
|---|---|---|
| LiveAvatar | LiveAvatarAvatar |
24000 Hz |
| HeyGen (deprecated alias) | HeyGenAvatar |
24000 Hz |
| Akool | AkoolAvatar |
16000 Hz |
| Anam | AnamAvatar |
None |
| Generic | GenericAvatar |
None |
| SenseTime (CN) | SenseTimeAvatar |
None |
The agent and avatar join the same RTC channel with separate UIDs. The agent token is scoped to agent_uid; avatar.params.agora_token is scoped to the avatar agora_uid.
When using AgentSession.start(), agora_token is optional for LiveAvatar, HeyGen, Generic, and SenseTime avatars. If omitted, AgentKit generates it with the same ConvoAI token path as the agent, using the avatar UID. You can still pass agora_token explicitly.
Each avatar vendor requires a specific TTS sample rate. The SDK validates this when you add TTS or avatar configuration and again when the session starts. If the TTS sample rate does not match, a ValueError is raised:
ValueError: Avatar requires TTS sample rate of 24000 Hz, but TTS is configured with 16000 Hz. Please update your TTS sample_rate to 24000.
Python raises this as a ValueError — there is no compile-time check as in statically typed languages.
Additionally, if the TTS sample rate is not explicitly available, the SDK issues a warning through the session warning callback:
Warning: LiveAvatar avatar detected but TTS sample_rate is not explicitly set. LiveAvatar requires 24,000 Hz. Please ensure your TTS provider is configured for 24kHz.
HeyGen requires a TTS vendor configured at 24000 Hz:
from agora_agent import Agent, Agora, Area, OpenAI, ElevenLabsTTS, DeepgramSTT, HeyGenAvatar
import time
client = Agora(
area=Area.US,
app_id='your-app-id',
app_certificate='your-app-certificate',
)
agent = (
Agent(client=client)
.with_llm(OpenAI(
api_key='your-openai-key',
base_url='https://api.openai.com/v1/chat/completions',
model='gpt-4o-mini',
system_messages=[{'role': 'system', 'content': 'You are a helpful assistant with a visual avatar.'}],
))
.with_tts(ElevenLabsTTS(
key='your-elevenlabs-key',
model_id='eleven_flash_v2_5',
voice_id='your-voice-id',
base_url='wss://api.elevenlabs.io/v1',
sample_rate=24000, # Must be 24000 for HeyGen
))
.with_stt(DeepgramSTT(api_key='your-deepgram-key', language='en-US'))
.with_avatar(HeyGenAvatar(
api_key='your-heygen-key',
quality='medium',
agora_uid='2',
avatar_id='your-avatar-id',
))
)
session = agent.create_session(channel=f"demo-channel-{int(time.time())}", agent_uid='1', remote_uids=['100'], name=f"conversation-{int(time.time())}")
agent_id = session.start()
session.say('Hello! I am your visual assistant.')
session.stop()GenericAvatar supports custom avatar providers. agora_appid, agora_channel, and agora_token are optional when using AgentSession.start().
from agora_agent import GenericAvatar
agent = agent.with_avatar(GenericAvatar(
api_key='your-avatar-provider-key',
api_base_url='https://avatar-provider.example.com',
avatar_id='avatar-123',
agora_uid='2',
))SenseTimeAvatar is available for Area.CN sessions. Provide agora_uid, app_key, and sceneList when constructing the avatar. agora_token is optional and is generated at session start when omitted, like LiveAvatar and Generic avatars.
from agora_agent import Agora, Area, CNAgent, MiniMaxCNTTS, SenseTimeAvatar, TencentSTT
client = Agora(
area=Area.CN,
app_id="your-app-id",
app_certificate="your-app-certificate",
)
agent = (
CNAgent(client=client)
.with_stt(TencentSTT(key="...", app_id="...", secret="...", engine_model_type="16k_zh", voice_id="..."))
.with_tts(MiniMaxCNTTS(model="speech_2_6_turbo", voice_id="your-voice-id"))
.with_avatar(SenseTimeAvatar(
agora_uid="2",
app_key="your-sensetime-app-key",
sceneList=[{"digital_role": {"face_feature_id": "role-1"}}],
appId="your-sensetime-app-id",
))
)Akool requires a TTS vendor configured at 16000 Hz:
from agora_agent import Agent, Agora, Area, ElevenLabsTTS, AkoolAvatar, DeepgramSTT, OpenAI
client = Agora(area=Area.US, app_id='your-app-id', app_certificate='your-app-certificate')
agent = (
Agent(client=client)
.with_llm(OpenAI(
api_key='your-openai-key',
base_url='https://api.openai.com/v1/chat/completions',
model='gpt-4o-mini',
system_messages=[{'role': 'system', 'content': 'You are a helpful assistant.'}],
))
.with_tts(ElevenLabsTTS(
key='your-elevenlabs-key',
model_id='eleven_flash_v2_5',
voice_id='your-voice-id',
base_url='wss://api.elevenlabs.io/v1',
sample_rate=16000, # Must be 16000 for Akool
))
.with_stt(DeepgramSTT(api_key='your-deepgram-key', language='en-US'))
.with_avatar(AkoolAvatar(
api_key='your-akool-key',
agora_uid='2',
avatar_id='your-avatar-id',
))
)This example shows what happens when the TTS sample rate does not match the avatar's requirement:
from agora_agent import Agent, Agora, Area, ElevenLabsTTS, HeyGenAvatar, DeepgramSTT, OpenAI
client = Agora(area=Area.US, app_id='your-app-id', app_certificate='your-app-certificate')
# This raises ValueError at build time
agent = (
Agent(client=client)
.with_llm(OpenAI(
api_key='your-openai-key',
base_url='https://api.openai.com/v1/chat/completions',
model='gpt-4o-mini',
system_messages=[{'role': 'system', 'content': 'You are a helpful assistant.'}],
))
.with_tts(ElevenLabsTTS(
key='your-elevenlabs-key',
model_id='eleven_flash_v2_5',
voice_id='your-voice-id',
base_url='wss://api.elevenlabs.io/v1',
sample_rate=16000, # 16 kHz
))
.with_stt(DeepgramSTT(api_key='your-deepgram-key', language='en-US'))
.with_avatar(HeyGenAvatar( # Requires 24 kHz — mismatch!
api_key='your-heygen-key',
quality='medium',
agora_uid='2',
))
)
# ValueError: Avatar requires TTS sample rate of 24000 Hz, but TTS is configured
# with 16000 Hz. Please update your TTS sample_rate to 24000.Fix: Change sample_rate=16000 to sample_rate=24000 on the TTS vendor.
The with_avatar() call validates against the currently configured TTS. Always call with_tts() before with_avatar():
from agora_agent import Agent, Agora, Area, ElevenLabsTTS, HeyGenAvatar
client = Agora(area=Area.US, app_id='your-app-id', app_certificate='your-app-certificate')
# Correct order: TTS first, then avatar
agent = (
Agent(client=client)
.with_tts(ElevenLabsTTS(key='your-elevenlabs-key', model_id='eleven_flash_v2_5', voice_id='your-voice-id', base_url='wss://api.elevenlabs.io/v1', sample_rate=24000))
.with_avatar(HeyGenAvatar(api_key='your-heygen-key', quality='medium', agora_uid='2'))
)If you call with_avatar() before with_tts(), the sample rate check is deferred to session.start(), which validates the configuration before making the API call.
| Parameter | Type | Required | Description |
|---|---|---|---|
api_key |
str |
Yes | HeyGen API key |
quality |
str |
Yes | Avatar quality: low, medium, or high |
agora_uid |
str |
Yes | Agora UID for the avatar video stream |
agora_token |
str |
No | Avatar token, generated at session start when omitted |
avatar_id |
str |
No | Avatar ID |
disable_idle_timeout |
bool |
No | Disable idle timeout |
activity_idle_timeout |
int |
No | Idle timeout in seconds |
| Parameter | Type | Required | Description |
|---|---|---|---|
api_key |
str |
Yes | Akool API key |
avatar_id |
str |
No | Avatar ID |
| Parameter | Type | Required | Description |
|---|---|---|---|
agora_token |
str |
No | Avatar publisher RTC token; generated at session start when omitted |
agora_uid |
str |
Yes | Avatar publisher RTC UID |
app_key |
str |
Yes | SenseTime application key |
sceneList |
List[Dict[str, Any]] |
Yes | SenseTime scene configuration list |
appId |
str |
No | SenseTime application ID |
enable |
bool |
No | Whether to enable the avatar |
additional_params |
Dict[str, Any] |
No | Additional SenseTime avatar parameters |