Closed
22 changes: 12 additions & 10 deletions .github/workflows/ci.yml
@@ -1,5 +1,7 @@
 name: ci
-on: [push]
+on:
+  push:
+  workflow_dispatch:
 jobs:
   compile:
     runs-on: ubuntu-latest
@@ -31,14 +33,15 @@ jobs:
           curl -sSL https://install.python-poetry.org | python - -y --version 1.5.1
       - name: Install dependencies
         run: poetry install

       - name: Test
         run: poetry run pytest -rP .

   publish:
     needs: [compile, test]
-    if: github.event_name == 'push' && contains(github.ref, 'refs/tags/')
+    if: (github.event_name == 'push' && contains(github.ref, 'refs/tags/')) || github.event_name == 'workflow_dispatch'
     runs-on: ubuntu-latest
+    permissions:
+      id-token: write
     steps:
       - name: Checkout repo
         uses: actions/checkout@v4
@@ -51,10 +54,9 @@ jobs:
           curl -sSL https://install.python-poetry.org | python - -y --version 1.5.1
       - name: Install dependencies
         run: poetry install
-      - name: Publish to pypi
-        run: |
-          poetry config repositories.remote https://upload.pypi.org/legacy/
-          poetry --no-interaction -v publish --build --repository remote --username "$PYPI_USERNAME" --password "$PYPI_PASSWORD"
-        env:
-          PYPI_USERNAME: ${{ secrets.PYPI_USERNAME }}
-          PYPI_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
+      - name: Build package
+        run: poetry build
+      - name: Publish to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          password: ${{ secrets.PYPI_API_TOKEN }}
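
The new `if:` expression on the `publish` job can be modelled in a few lines of Python to see which triggers pass the gate. This is only an illustrative sketch: `should_publish` is a hypothetical helper, not part of the workflow or the SDK, and it assumes GitHub's documented behaviour that `==` and `contains()` ignore case when applied to strings.

```python
def should_publish(event_name: str, ref: str) -> bool:
    """Model the `if:` expression on the publish job (illustrative only)."""
    event = event_name.lower()
    # contains(github.ref, 'refs/tags/') is a case-insensitive substring check.
    is_tag_push = event == "push" and "refs/tags/" in ref.lower()
    # A manually dispatched run passes regardless of the ref it was started from.
    is_manual_run = event == "workflow_dispatch"
    return is_tag_push or is_manual_run


if __name__ == "__main__":
    cases = [
        ("push", "refs/tags/v1.2.3"),
        ("push", "refs/heads/main"),
        ("workflow_dispatch", "refs/heads/main"),
    ]
    for event_name, ref in cases:
        print(f"{event_name:18} {ref:18} -> {should_publish(event_name, ref)}")
```

Under this model, a manual run started from a branch also reaches the publish step, so the version in the package metadata at that commit is what would be uploaded.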
76 changes: 72 additions & 4 deletions README.md
@@ -1,19 +1,21 @@
 # Agora Agent Server SDK for Python

 [![fern shield](https://img.shields.io/badge/%F0%9F%8C%BF-Built%20with%20Fern-brightgreen)](https://buildwithfern.com?utm_source=github&utm_medium=github&utm_campaign=readme&utm_source=https%3A%2F%2Fgithub.com%2FAgoraIO-Conversational-AI%2Fagent-server-sdk-python)
-[![pypi](https://img.shields.io/pypi/v/agora-agent-server-sdk)](https://pypi.python.org/pypi/agora-agent-server-sdk)
+[![pypi](https://img.shields.io/pypi/v/agent-server-sdk-python)](https://pypi.python.org/pypi/agent-server-sdk-python)

-The Agora Conversational AI SDK provides convenient access to the Agora Conversational AI APIs,
-enabling you to build voice-powered AI agents with support for both cascading flows (ASR -> LLM -> TTS)
+The Agora Conversational AI SDK provides convenient access to the Agora Conversational AI APIs,
+enabling you to build voice-powered AI agents with support for both cascading flows (ASR -> LLM -> TTS)
 and multimodal flows (MLLM) for real-time audio processing.

+
 ## Table of Contents

 - [Installation](#installation)
 - [Quick Start](#quick-start)
 - [Documentation](#documentation)
 - [Reference](#reference)
 - [Mllm Flow Multimodal](#mllm-flow-multimodal)
+- [Mllm Flow Multimodal](#mllm-flow-multimodal)
 - [Usage](#usage)
 - [Async Client](#async-client)
 - [Exception Handling](#exception-handling)
@@ -28,7 +30,7 @@ and multimodal flows (MLLM) for real-time audio processing.
 ## Installation

 ```sh
-pip install agora-agent-server-sdk
+pip install agent-server-sdk-python
 ```

 ## Quick Start
@@ -152,6 +154,71 @@ A full reference for this library is available [here](https://github.com/AgoraIO

 For real-time audio processing using OpenAI's Realtime API or Google Gemini Live, use the MLLM (Multimodal Large Language Model) flow instead of the cascading ASR -> LLM -> TTS flow. See the [MLLM Overview](https://docs.agora.io/en/conversational-ai/models/mllm/overview) for more details.

+```python
+from agora_agent import Agora, Area
+from agora_agent.agentkit import (
+    AdvancedFeatures,
+    TurnDetectionConfig,
+    TurnDetectionTypeValues,
+)
+from agora_agent.agents import (
+    StartAgentsRequestProperties,
+    StartAgentsRequestPropertiesMllm,
+    StartAgentsRequestPropertiesMllmVendor,
+    StartAgentsRequestPropertiesTts,
+    StartAgentsRequestPropertiesTtsVendor,
+    StartAgentsRequestPropertiesLlm,
+)
+
+client = Agora(
+    area=Area.US,
+    app_id="YOUR_APP_ID",
+    app_certificate="YOUR_APP_CERTIFICATE",
+)
+
+client.agents.start(
+    client.app_id,
+    name="mllm_agent",
+    properties=StartAgentsRequestProperties(
+        channel="channel_name",
+        token="your_token",
+        agent_rtc_uid="1001",
+        remote_rtc_uids=["1002"],
+        idle_timeout=120,
+        advanced_features=AdvancedFeatures(enable_mllm=True),
+        mllm=StartAgentsRequestPropertiesMllm(
+            url="wss://api.openai.com/v1/realtime",
+            api_key="<your_openai_api_key>",
+            vendor=StartAgentsRequestPropertiesMllmVendor.OPENAI,
+            params={
+                "model": "gpt-4o-realtime-preview",
+                "voice": "alloy",
+            },
+            input_modalities=["audio"],
+            output_modalities=["text", "audio"],
+            greeting_message="Hello! I'm ready to chat in real-time.",
+        ),
+        turn_detection=TurnDetectionConfig(
+            type=TurnDetectionTypeValues.SERVER_VAD, # deprecated; use config.end_of_speech instead
+            threshold=0.5,
+            silence_duration_ms=500,
+        ),
+        # TTS and LLM are still required but not used when MLLM is enabled
+        tts=StartAgentsRequestPropertiesTts(
+            vendor=StartAgentsRequestPropertiesTtsVendor.MICROSOFT,
+            params={},
+        ),
+        llm=StartAgentsRequestPropertiesLlm(
+            url="https://api.openai.com/v1/chat/completions",
+        ),
+    ),
+)
+```
+
+## MLLM Flow (Multimodal)
+
+For real-time audio processing using OpenAI's Realtime API or Google Gemini Live, use the MLLM (Multimodal Large Language Model) flow instead of the cascading ASR -> LLM -> TTS flow. See the [MLLM Overview](https://docs.agora.io/en/conversational-ai/models/mllm/overview) for more details.
+
 ```python
 from agora-agent-server-sdk import Agora
 from agora-agent-server-sdk.agents import (
@@ -212,6 +279,7 @@ client.agents.start(
 )
 ```

+
 ## Usage

 Instantiate and use the client with the following: