Merged
2 changes: 1 addition & 1 deletion .fern/metadata.json
@@ -1,5 +1,5 @@
 {
-    "cliVersion": "3.76.0",
+    "cliVersion": "4.4.4",
     "generatorName": "fernapi/fern-python-sdk",
     "generatorVersion": "4.37.0",
     "generatorConfig": {
10 changes: 5 additions & 5 deletions .fernignore
@@ -1,11 +1,11 @@
 # Specify files that shouldn't be modified by Fern
-src/agoraio/pool_client.py
-src/agoraio/__init__.py
-src/agoraio/core/domain.py
+src/agora_agent/pool_client.py
+src/agora_agent/__init__.py
+src/agora_agent/core/domain.py
 changelog.md

-# Wrapper layer - custom code that should not be overwritten
-src/agoraio/wrapper/
+# Agentkit layer - custom code that should not be overwritten
+src/agora_agent/agentkit/

 # Documentation - managed manually, not generated by Fern
 docs/
62 changes: 62 additions & 0 deletions README.md
@@ -0,0 +1,62 @@
# Agora Conversational AI Python SDK

[![fern shield](https://img.shields.io/badge/%F0%9F%8C%BF-Built%20with%20Fern-brightgreen)](https://buildwithfern.com?utm_source=github&utm_medium=github&utm_campaign=readme&utm_source=https%3A%2F%2Fgithub.com%2FAgoraIO-Conversational-AI%2Fagora-agent-python-sdk)
[![pypi](https://img.shields.io/pypi/v/agora-agent-sdk)](https://pypi.python.org/pypi/agora-agent-sdk)

The Agora Conversational AI SDK provides convenient access to the Agora Conversational AI APIs, enabling you to build voice-powered AI agents with support for both **cascading flows** (ASR → LLM → TTS) and **multimodal flows** (MLLM) for real-time audio processing.

## Installation

```sh
pip install agora-agent-sdk
```

## Quick Start

Use the **builder pattern** with `Agent` and `AgentSession`:

```python
from agora_agent import Agora, Area
from agora_agent.agentkit import Agent
from agora_agent.agentkit.vendors import OpenAI, ElevenLabsTTS, DeepgramSTT

client = Agora(
    area=Area.US,
    app_id="your-app-id",
    app_certificate="your-app-certificate",
)

agent = (
    Agent(name="support-assistant", instructions="You are a helpful voice assistant.")
    .with_llm(OpenAI(api_key="your-openai-key", model="gpt-4o-mini"))
    .with_tts(ElevenLabsTTS(key="your-elevenlabs-key", model_id="eleven_flash_v2_5", voice_id="your-voice-id", sample_rate=24000))
    .with_stt(DeepgramSTT(api_key="your-deepgram-key", language="en-US"))
)

session = agent.create_session(client, channel="support-room-123", agent_uid="1", remote_uids=["100"])
agent_id = session.start()
session.say("Hello! How can I help you today?")
session.stop()
```

For async usage, use `AsyncAgora` and `await session.start()`, `await session.say()`, etc. See [Quick Start](docs/getting-started/quick-start.md).

## Documentation

| Topic | Link |
|-------|------|
| **API docs** | [docs.agora.io](https://docs.agora.io/en/conversational-ai/overview) |
| **Installation** | [docs/getting-started/installation.md](docs/getting-started/installation.md) |
| **Authentication** | [docs/getting-started/authentication.md](docs/getting-started/authentication.md) |
| **Quick Start** | [docs/getting-started/quick-start.md](docs/getting-started/quick-start.md) |
| **Cascading flow** | [docs/guides/cascading-flow.md](docs/guides/cascading-flow.md) |
| **MLLM flow** | [docs/guides/mllm-flow.md](docs/guides/mllm-flow.md) |
| **Low-level API** | [docs/guides/low-level-api.md](docs/guides/low-level-api.md) |
| **Error handling** | [docs/guides/error-handling.md](docs/guides/error-handling.md) |
| **Pagination** | [docs/guides/pagination.md](docs/guides/pagination.md) |
| **Advanced** | [docs/guides/advanced.md](docs/guides/advanced.md) |
| **API reference** | [reference.md](reference.md) |

## Contributing

This library is generated programmatically. Contributions to the README and docs are welcome. For code changes, open an issue first to discuss.
6 changes: 3 additions & 3 deletions changelog.md
@@ -1,8 +1,8 @@
 ## 1.0.0 - 2026-02-20
-* refactor: rename SDK package from agoraio-sdk to agora-agent-sdk
-* Major refactoring to rename the Python SDK package from "agoraio-sdk" to "agora-agent-sdk" for better clarity and branding alignment. This change improves package naming consistency and removes outdated documentation.
+* refactor: rename SDK package from agora_agent-sdk to agora-agent-sdk
+* Major refactoring to rename the Python SDK package from "agora_agent-sdk" to "agora-agent-sdk" for better clarity and branding alignment. This change improves package naming consistency and removes outdated documentation.
 * Key changes:
-* Rename package from "agoraio-sdk" to "agora-agent-sdk" in pyproject.toml
+* Rename package from "agora_agent-sdk" to "agora-agent-sdk" in pyproject.toml
 * Update SDK name header in client wrapper to use new package name
 * Update version metadata import to use new package name
 * Make TTS and LLM properties optional in StartAgentsRequestProperties
147 changes: 147 additions & 0 deletions docs/concepts/agent.md
@@ -0,0 +1,147 @@
---
sidebar_position: 2
title: Agent
description: The Agent builder — configure an AI agent with LLM, TTS, STT, and more.
---

# Agent

The `Agent` class is a fluent builder for configuring AI agent properties. It collects vendor settings (LLM, TTS, STT, MLLM, avatar) and session parameters, then produces a fully configured `AgentSession` when you call `create_session()`.

## Constructor

```python
from agora_agent.agentkit import Agent

agent = Agent(
    name='support-assistant',
    instructions='You are a helpful voice assistant.',
    greeting='Hello! How can I help you?',
    failure_message='Sorry, something went wrong.',
    max_history=20,
)
```

| Parameter | Type | Required | Description |
|---|---|---|---|
| `name` | `str` | No | Agent display name (used as session name if not overridden) |
| `instructions` | `str` | No | System prompt for the LLM |
| `greeting` | `str` | No | Message spoken when the agent joins |
| `failure_message` | `str` | No | Message spoken on error |
| `max_history` | `int` | No | Maximum conversation history length |
| `turn_detection` | `TurnDetectionConfig` | No | Turn detection settings |
| `sal` | `SalConfig` | No | SAL (Selective Attention Locking) configuration |
| `advanced_features` | `Dict[str, Any]` | No | Advanced features (e.g., `{'enable_mllm': True}`) |
| `parameters` | `SessionParams` | No | Additional session parameters |

## Builder Methods

Each `with_*` method returns a **new** `Agent` instance — the original is unchanged. This immutability lets you safely reuse a base configuration for multiple sessions.

### Vendor Methods

| Method | Accepts | Purpose |
|---|---|---|
| `with_llm(vendor)` | `BaseLLM` | Set the LLM provider |
| `with_tts(vendor)` | `BaseTTS` | Set the TTS provider |
| `with_stt(vendor)` | `BaseSTT` | Set the STT provider |
| `with_mllm(vendor)` | `BaseMLLM` | Set the MLLM provider (for multimodal flow) |
| `with_avatar(vendor)` | `BaseAvatar` | Set the avatar provider |

### Configuration Methods

| Method | Accepts | Purpose |
|---|---|---|
| `with_instructions(text)` | `str` | Override the system prompt |
| `with_greeting(text)` | `str` | Override the greeting message |
| `with_name(name)` | `str` | Override the agent name |
| `with_turn_detection(config)` | `TurnDetectionConfig` | Override turn detection settings |

## Chaining Example

```python
from agora_agent.agentkit import Agent
from agora_agent.agentkit.vendors import OpenAI, ElevenLabsTTS, DeepgramSTT

agent = (
    Agent(name='my-agent', instructions='You are a helpful assistant.')
    .with_llm(OpenAI(api_key='your-openai-key', model='gpt-4o-mini'))
    .with_tts(ElevenLabsTTS(key='your-elevenlabs-key', model_id='eleven_flash_v2_5', voice_id='your-voice-id'))
    .with_stt(DeepgramSTT(api_key='your-deepgram-key', language='en-US'))
)
```

## Immutable Reuse

Because each `with_*` call returns a new `Agent`, you can build a base configuration and create multiple sessions from it:

```python
from agora_agent import Agora, Area
from agora_agent.agentkit import Agent
from agora_agent.agentkit.vendors import OpenAI, ElevenLabsTTS, DeepgramSTT

client = Agora(area=Area.US, app_id='your-app-id', app_certificate='your-app-certificate')

base = (
    Agent(instructions='You are a helpful assistant.')
    .with_llm(OpenAI(api_key='your-openai-key', model='gpt-4o-mini'))
    .with_tts(ElevenLabsTTS(key='your-elevenlabs-key', model_id='eleven_flash_v2_5', voice_id='your-voice-id'))
    .with_stt(DeepgramSTT(api_key='your-deepgram-key', language='en-US'))
)

# Same agent config, different channels
session_a = base.create_session(client, channel='room-a', agent_uid='1', remote_uids=['100'])
session_b = base.create_session(client, channel='room-b', agent_uid='1', remote_uids=['200'])
```
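
The copy-on-write behaviour behind `with_*` can be illustrated with a small stand-in class. This is a sketch of the pattern only, not the SDK's implementation: `SketchAgent` and its fields are hypothetical, and plain dicts stand in for the vendor objects.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class SketchAgent:
    """Hypothetical stand-in for Agent; illustrates the pattern only."""

    instructions: Optional[str] = None
    llm: Optional[Dict[str, Any]] = None
    tts: Optional[Dict[str, Any]] = None

    def with_llm(self, llm: Dict[str, Any]) -> 'SketchAgent':
        # replace() builds a new frozen instance; self is left untouched.
        return replace(self, llm=dict(llm))

    def with_tts(self, tts: Dict[str, Any]) -> 'SketchAgent':
        return replace(self, tts=dict(tts))


base = SketchAgent(instructions='You are a helpful assistant.')

# Two variants derived from the same base; neither call modifies `base`.
first = base.with_llm({'model': 'gpt-4o-mini'}).with_tts({'voice_id': 'voice-a'})
second = base.with_tts({'voice_id': 'voice-b'})
```

Because every call returns a fresh object, `base` still has no LLM or TTS after both variants are built, which is what makes sharing one base configuration across sessions safe.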

## `create_session()`

Creates a new `AgentSession` bound to a client and channel.

```python
session = agent.create_session(
    client,
    channel='my-channel',
    agent_uid='1',
    remote_uids=['100'],
    name='optional-session-name',
    token='optional-pre-built-token',
    idle_timeout=300,
    enable_string_uid=True,
)
```

| Parameter | Type | Required | Description |
|---|---|---|---|
| `client` | `Agora` or `AsyncAgora` | Yes | The authenticated client |
| `channel` | `str` | Yes | Agora channel name |
| `agent_uid` | `str` | Yes | UID for the agent in the channel |
| `remote_uids` | `List[str]` | Yes | UIDs of remote participants to listen to |
| `name` | `str` | No | Session name (defaults to agent name or auto-generated) |
| `token` | `str` | No | Pre-built RTC token (if not provided, generated from client credentials) |
| `idle_timeout` | `int` | No | Idle timeout in seconds |
| `enable_string_uid` | `bool` | No | Enable string UIDs |

## Avatar Sample Rate Constraint

When using `with_avatar()`, the SDK validates that the TTS sample rate matches the avatar's requirement. If there is a mismatch, a `ValueError` is raised at build time:

```
ValueError: Avatar requires TTS sample rate of 24000 Hz, but TTS is configured with 16000 Hz. Please update your TTS sample_rate to 24000.
```

See [Avatar Integration](../guides/avatars.md) for details.
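
The check described above amounts to comparing two integers and failing early. The sketch below reproduces the error message with a hypothetical helper, `check_avatar_sample_rate`; the SDK's internal function name and where it runs are assumptions, not taken from the source.

```python
def check_avatar_sample_rate(tts_sample_rate: int, required_sample_rate: int) -> None:
    """Hypothetical helper: raise if the TTS rate differs from the avatar's."""
    if tts_sample_rate != required_sample_rate:
        raise ValueError(
            f'Avatar requires TTS sample rate of {required_sample_rate} Hz, '
            f'but TTS is configured with {tts_sample_rate} Hz. '
            f'Please update your TTS sample_rate to {required_sample_rate}.'
        )


# Matching rates pass silently.
check_avatar_sample_rate(tts_sample_rate=24000, required_sample_rate=24000)

# A mismatch raises ValueError; capture the message for inspection.
try:
    check_avatar_sample_rate(tts_sample_rate=16000, required_sample_rate=24000)
except ValueError as exc:
    message = str(exc)
```

In application code, catching `ValueError` around the builder chain is one way to surface this misconfiguration before a session is started.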

## Properties

| Property | Type | Description |
|---|---|---|
| `agent.name` | `Optional[str]` | Agent name |
| `agent.instructions` | `Optional[str]` | System prompt |
| `agent.greeting` | `Optional[str]` | Greeting message |
| `agent.llm` | `Optional[Dict]` | LLM configuration dict |
| `agent.tts` | `Optional[Dict]` | TTS configuration dict |
| `agent.stt` | `Optional[Dict]` | STT configuration dict |
| `agent.mllm` | `Optional[Dict]` | MLLM configuration dict |
| `agent.turn_detection` | `Optional[TurnDetectionConfig]` | Turn detection settings |
| `agent.config` | `Dict[str, Any]` | Full configuration dict |
103 changes: 103 additions & 0 deletions docs/concepts/architecture.md
@@ -0,0 +1,103 @@
---
sidebar_position: 1
title: Architecture
description: How the Python SDK layers are structured and when to use each.
---

# Architecture

## Three-Layer Design

The Python SDK has three layers: two hand-written layers on top of a Fern-generated core.

```
+--------------------------------------------------+
|                  Developer API                   |
|      Agent · AgentSession · Vendors · Token      | <- agora_agent.agentkit (hand-written)
+--------------------------------------------------+
|            Agora / AsyncAgora + Pool             | <- agora_agent.pool_client (hand-written)
+--------------------------------------------------+
|            Fern-generated Client Core            |
|   AgentsClient · TelephonyClient · TypeSystem    | <- auto-generated
+--------------------------------------------------+
```

### Agentkit Layer (`agora_agent.agentkit`)

This is the primary developer-facing API. It provides:

- **`Agent`** — a fluent builder for configuring AI agents with LLM, TTS, STT, MLLM, and avatar vendors
- **`AgentSession` / `AsyncAgentSession`** — lifecycle management for running agents (start, stop, say, interrupt)
- **Vendor classes** — typed configuration for 28+ vendor integrations across 5 categories
- **`generate_rtc_token()`** — helper for building RTC tokens

### Pool Client Layer (`agora_agent.pool_client`)

`Agora` and `AsyncAgora` extend the Fern-generated base client with regional routing:

- Automatic DNS-based domain selection
- Region prefix cycling on failures
- Support for US, EU, AP, and CN areas
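
As a rough illustration of prefix cycling on failure, the sketch below tries a list of prefixes in order and returns the first successful response. Everything here is hypothetical: `call_with_prefix_cycling`, the placeholder prefixes, and the fake transport are stand-ins, and the real pool client's retry policy and domain names are not shown in this document.

```python
from typing import Callable, List, Sequence


def call_with_prefix_cycling(
    prefixes: Sequence[str],
    send: Callable[[str], str],
) -> str:
    """Try each region prefix in order until one request succeeds."""
    errors: List[Exception] = []
    for prefix in prefixes:
        try:
            return send(prefix)
        except ConnectionError as exc:
            # Remember the failure and move on to the next prefix.
            errors.append(exc)
    raise ConnectionError(f'all {len(prefixes)} prefixes failed: {errors}')


# Placeholder prefixes and a fake transport in which the first prefix is down.
attempted: List[str] = []


def fake_send(prefix: str) -> str:
    attempted.append(prefix)
    if prefix == 'prefix-a':
        raise ConnectionError('prefix-a unreachable')
    return f'ok via {prefix}'


result = call_with_prefix_cycling(['prefix-a', 'prefix-b'], fake_send)
```

Here the first prefix fails, so the call falls through to the second; if every prefix failed, the helper would raise a single `ConnectionError` summarising the attempts.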

### Fern-Generated Layer

The auto-generated core provides typed HTTP methods for every Agora API endpoint. You rarely need this directly, but it is accessible via `session.raw` for advanced use cases or new endpoints that the agentkit layer does not yet cover.

## Sync vs. Async

The SDK provides two parallel client hierarchies:

| Sync | Async | HTTP Backend |
|---|---|---|
| `Agora` | `AsyncAgora` | `httpx.Client` / `httpx.AsyncClient` |
| `AgentSession` | `AsyncAgentSession` | Blocking calls / Coroutines |

### When to Use Each

**Use `Agora` (sync)** when:
- You are writing scripts, CLI tools, or batch jobs
- Your web framework is synchronous (Flask, Django without async views)
- You want the simplest possible code

**Use `AsyncAgora` (async)** when:
- Your application uses `asyncio` (FastAPI, Starlette, aiohttp)
- You need to manage multiple concurrent agent sessions
- You want non-blocking I/O

### Key Difference

Every method on `AgentSession` that makes an HTTP call (`start()`, `stop()`, `say()`, `interrupt()`, `update()`, `get_history()`, `get_info()`) has an `async` equivalent on `AsyncAgentSession` that must be called with `await`:

```python
# Sync
agent_id = session.start()
session.say('Hello!')
session.stop()

# Async
agent_id = await session.start()
await session.say('Hello!')
await session.stop()
```

The `Agent` builder class is the same for both — it does not make HTTP calls, so it has no async variant.
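
To show why the async variant suits multiple concurrent sessions, the sketch below starts and stops several sessions with `asyncio.gather`. `StubAsyncSession` is a hypothetical stand-in that makes no network calls; with the real SDK the same shape should apply to `AsyncAgentSession` objects, but that substitution is an assumption.

```python
import asyncio
from typing import List


class StubAsyncSession:
    """Hypothetical stand-in for AsyncAgentSession; makes no network calls."""

    def __init__(self, channel: str) -> None:
        self.channel = channel
        self.running = False

    async def start(self) -> str:
        await asyncio.sleep(0)  # stands in for the HTTP round trip
        self.running = True
        return f'agent-{self.channel}'

    async def stop(self) -> None:
        await asyncio.sleep(0)
        self.running = False


async def run_all(channels: List[str]) -> List[str]:
    sessions = [StubAsyncSession(channel) for channel in channels]
    # gather() runs the start() coroutines concurrently and keeps input order.
    agent_ids = await asyncio.gather(*(s.start() for s in sessions))
    await asyncio.gather(*(s.stop() for s in sessions))
    return list(agent_ids)


agent_ids = asyncio.run(run_all(['room-a', 'room-b']))
```

With the sync classes, the equivalent loop would start each session one after another, so total latency grows with the number of sessions.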

## Import Paths

```python
# Top-level client and types
from agora_agent import Agora, AsyncAgora, Area, Pool

# Agentkit layer
from agora_agent.agentkit import Agent, AgentSession
from agora_agent.agentkit.agent_session import AsyncAgentSession

# Vendor classes
from agora_agent.agentkit.vendors import OpenAI, ElevenLabsTTS, DeepgramSTT

# Token helpers
from agora_agent.agentkit.token import generate_rtc_token

# Also available from top-level
from agora_agent import Agent, AgentSession, AsyncAgentSession, generate_rtc_token
```