Inference gateway integration (#22)

bcherry · web-flow · commit 423d4a2edce0 · 2025-10-01T12:50:02.000-06:00
diff --git a/.env.example b/.env.example
@@ -1,7 +1,3 @@
 LIVEKIT_URL=
 LIVEKIT_API_KEY=
 LIVEKIT_API_SECRET=
-
-OPENAI_API_KEY=
-DEEPGRAM_API_KEY=
-CARTESIA_API_KEY=
diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml
@@ -28,5 +28,7 @@ jobs:
         
     - name: Run tests
       env:
-        OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+        LIVEKIT_URL: ${{ secrets.LIVEKIT_URL }}
+        LIVEKIT_API_KEY: ${{ secrets.LIVEKIT_API_KEY }}
+        LIVEKIT_API_SECRET: ${{ secrets.LIVEKIT_API_SECRET }}
       run: uv run pytest -v
diff --git a/.gitignore b/.gitignore
@@ -9,4 +9,4 @@ KMS
 .vscode
 *.egg-info
 .pytest_cache
-.ruff_cache
+.ruff_cache
diff --git a/README.md b/README.md
@@ -4,17 +4,18 @@
 
 # LiveKit Agents Starter - Python
 
-A complete starter project for building voice AI apps with [LiveKit Agents for Python](https://github.com/livekit/agents).
+A complete starter project for building voice AI apps with [LiveKit Agents for Python](https://github.com/livekit/agents) and [LiveKit Cloud](https://cloud.livekit.io/).
 
 The starter project includes:
 
-- A simple voice AI assistant based on the [Voice AI quickstart](https://docs.livekit.io/agents/start/voice-ai/)
-- Voice AI pipeline based on [OpenAI](https://docs.livekit.io/agents/integrations/llm/openai/), [Cartesia](https://docs.livekit.io/agents/integrations/tts/cartesia/), and [Deepgram](https://docs.livekit.io/agents/integrations/llm/deepgram/)
-  - Easily integrate your preferred [LLM](https://docs.livekit.io/agents/integrations/llm/), [STT](https://docs.livekit.io/agents/integrations/stt/), and [TTS](https://docs.livekit.io/agents/integrations/tts/) instead, or swap to a realtime model like the [OpenAI Realtime API](https://docs.livekit.io/agents/integrations/realtime/openai)
+- A simple voice AI assistant, ready for extension and customization
+- A voice AI pipeline with [models](https://docs.livekit.io/agents/models) from OpenAI, Cartesia, and AssemblyAI served through LiveKit Cloud
+  - Easily integrate your preferred [LLM](https://docs.livekit.io/agents/models/llm/), [STT](https://docs.livekit.io/agents/models/stt/), and [TTS](https://docs.livekit.io/agents/models/tts/) instead, or swap to a realtime model like the [OpenAI Realtime API](https://docs.livekit.io/agents/models/realtime/openai)
 - Eval suite based on the LiveKit Agents [testing & evaluation framework](https://docs.livekit.io/agents/build/testing/)
 - [LiveKit Turn Detector](https://docs.livekit.io/agents/build/turns/turn-detector/) for contextually-aware speaker detection, with multilingual support
-- [LiveKit Cloud enhanced noise cancellation](https://docs.livekit.io/home/cloud/noise-cancellation/)
+- [Background voice cancellation](https://docs.livekit.io/home/cloud/noise-cancellation/)
 - Integrated [metrics and logging](https://docs.livekit.io/agents/build/metrics/)
+- A Dockerfile ready for [production deployment](https://docs.livekit.io/agents/ops/deployment/)
 
 This starter app is compatible with any [custom web/mobile frontend](https://docs.livekit.io/agents/start/frontend/) or [SIP-based telephony](https://docs.livekit.io/agents/start/telephony/).
 
@@ -27,19 +28,17 @@ cd agent-starter-python
 uv sync
 ```
 
-Set up the environment by copying `.env.example` to `.env.local` and filling in the required values:
+Sign up for [LiveKit Cloud](https://cloud.livekit.io/), then set up the environment by copying `.env.example` to `.env.local` and filling in the required keys:
 
-- `LIVEKIT_URL`: Use [LiveKit Cloud](https://cloud.livekit.io/) or [run your own](https://docs.livekit.io/home/self-hosting/)
+- `LIVEKIT_URL`
 - `LIVEKIT_API_KEY`
 - `LIVEKIT_API_SECRET`
-- `OPENAI_API_KEY`: [Get a key](https://platform.openai.com/api-keys) or use your [preferred LLM provider](https://docs.livekit.io/agents/integrations/llm/)
-- `DEEPGRAM_API_KEY`: [Get a key](https://console.deepgram.com/) or use your [preferred STT provider](https://docs.livekit.io/agents/integrations/stt/)
-- `CARTESIA_API_KEY`: [Get a key](https://play.cartesia.ai/keys) or use your [preferred TTS provider](https://docs.livekit.io/agents/integrations/tts/)
 
 You can load the LiveKit environment automatically using the [LiveKit CLI](https://docs.livekit.io/home/cli/cli-setup):
 
 ```bash
-lk app env -w .env.local
+lk cloud auth
+lk app env -w -d .env.local
 ```
 
 ## Run the agent
@@ -100,12 +99,16 @@ Once you've started your own project based on this repo, you should:
 
 2. **Remove the git tracking test**: Delete the "Check files not tracked in git" step from `.github/workflows/tests.yml` since you'll now want this file to be tracked. These are just there for development purposes in the template repo itself.
 
-3. **Add your own repository secrets**: You must [add secrets](https://docs.github.com/en/actions/how-tos/writing-workflows/choosing-what-your-workflow-does/using-secrets-in-github-actions) for `OPENAI_API_KEY` or your other LLM provider so that the tests can run in CI.
+3. **Add your own repository secrets**: You must [add secrets](https://docs.github.com/en/actions/how-tos/writing-workflows/choosing-what-your-workflow-does/using-secrets-in-github-actions) for `LIVEKIT_URL`, `LIVEKIT_API_KEY`, and `LIVEKIT_API_SECRET` so that the tests can run in CI.
 
 ## Deploying to production
 
 This project is production-ready and includes a working `Dockerfile`. To deploy it to LiveKit Cloud or another environment, see the [deploying to production](https://docs.livekit.io/agents/ops/deployment/) guide.
 
+## Self-hosted LiveKit
+
+You can also self-host LiveKit instead of using LiveKit Cloud. See the [self-hosting](https://docs.livekit.io/home/self-hosting/) guide for more information. If you choose to self-host, you'll also need to use [model plugins](https://docs.livekit.io/agents/models/#plugins) instead of LiveKit Inference and remove the [LiveKit Cloud noise cancellation](https://docs.livekit.io/home/cloud/noise-cancellation/) plugin.
+
 ## License
 
 This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
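Review note on the README change above: after this commit, the only keys the README asks for in `.env.local` are `LIVEKIT_URL`, `LIVEKIT_API_KEY`, and `LIVEKIT_API_SECRET`. A minimal sketch of a startup check for those three keys follows. The helper name `missing_livekit_keys` is hypothetical and is not part of this repository or of LiveKit Agents.

```python
import os

# The three keys listed in the updated README and .env.example.
REQUIRED_KEYS = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")


def missing_livekit_keys(env=None):
    """Return the required keys that are unset or empty in `env`.

    Hypothetical helper for illustration; defaults to the process environment.
    """
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_livekit_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All LiveKit environment variables are set.")
```

A check like this could run before the agent starts, or in CI, to catch a missing repository secret early.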
diff --git a/pyproject.toml b/pyproject.toml
@@ -9,7 +9,7 @@ description = "Simple voice AI assistant built with LiveKit Agents for Python"
 requires-python = ">=3.9"
 
 dependencies = [
-    "livekit-agents[openai,turn-detector,silero,cartesia,deepgram]~=1.2",
+    "livekit-agents[silero,turn-detector]~=1.2",
     "livekit-plugins-noise-cancellation~=0.2",
     "python-dotenv",
 ]
diff --git a/src/agent.py b/src/agent.py
@@ -2,21 +2,17 @@
 
 from dotenv import load_dotenv
 from livekit.agents import (
-    NOT_GIVEN,
     Agent,
-    AgentFalseInterruptionEvent,
     AgentSession,
     JobContext,
     JobProcess,
     MetricsCollectedEvent,
     RoomInputOptions,
-    RunContext,
     WorkerOptions,
     cli,
     metrics,
 )
-from livekit.agents.llm import function_tool
-from livekit.plugins import cartesia, deepgram, noise_cancellation, openai, silero
+from livekit.plugins import noise_cancellation, silero
 from livekit.plugins.turn_detector.multilingual import MultilingualModel
 
 logger = logging.getLogger("agent")
@@ -27,27 +23,28 @@
 class Assistant(Agent):
     def __init__(self) -> None:
         super().__init__(
-            instructions="""You are a helpful voice AI assistant.
+            instructions="""You are a helpful voice AI assistant. The user is interacting with you via voice, even if you perceive the conversation as text.
             You eagerly assist users with their questions by providing information from your extensive knowledge.
             Your responses are concise, to the point, and without any complex formatting or punctuation including emojis, asterisks, or other symbols.
             You are curious, friendly, and have a sense of humor.""",
         )
 
-    # all functions annotated with @function_tool will be passed to the LLM when this
-    # agent is active
-    @function_tool
-    async def lookup_weather(self, context: RunContext, location: str):
-        """Use this tool to look up current weather information in the given location.
-
-        If the location is not supported by the weather service, the tool will indicate this. You must tell the user the location's weather is unavailable.
-
-        Args:
-            location: The location to look up weather information for (e.g. city name)
-        """
-
-        logger.info(f"Looking up weather for {location}")
-
-        return "sunny with a temperature of 70 degrees."
+    # To add tools, use the @function_tool decorator.
+    # Here's an example that adds a simple weather tool.
+    # You also have to add `from livekit.agents import RunContext` and `from livekit.agents.llm import function_tool` to the top of this file
+    # @function_tool
+    # async def lookup_weather(self, context: RunContext, location: str):
+    #     """Use this tool to look up current weather information in the given location.
+    #
+    #     If the location is not supported by the weather service, the tool will indicate this. You must tell the user the location's weather is unavailable.
+    #
+    #     Args:
+    #         location: The location to look up weather information for (e.g. city name)
+    #     """
+    #
+    #     logger.info(f"Looking up weather for {location}")
+    #
+    #     return "sunny with a temperature of 70 degrees."
 
 
 def prewarm(proc: JobProcess):
@@ -61,17 +58,17 @@ async def entrypoint(ctx: JobContext):
         "room": ctx.room.name,
     }
 
-    # Set up a voice AI pipeline using OpenAI, Cartesia, Deepgram, and the LiveKit turn detector
+    # Set up a voice AI pipeline using OpenAI, Cartesia, AssemblyAI, and the LiveKit turn detector
     session = AgentSession(
-        # A Large Language Model (LLM) is your agent's brain, processing user input and generating a response
-        # See all providers at https://docs.livekit.io/agents/integrations/llm/
-        llm=openai.LLM(model="gpt-4o-mini"),
         # Speech-to-text (STT) is your agent's ears, turning the user's speech into text that the LLM can understand
-        # See all providers at https://docs.livekit.io/agents/integrations/stt/
-        stt=deepgram.STT(model="nova-3", language="multi"),
+        # See all available models at https://docs.livekit.io/agents/models/stt/
+        stt="assemblyai/universal-streaming:en",
+        # A Large Language Model (LLM) is your agent's brain, processing user input and generating a response
+        # See all available models at https://docs.livekit.io/agents/models/llm/
+        llm="openai/gpt-4.1-mini",
         # Text-to-speech (TTS) is your agent's voice, turning the LLM's text into speech that the user can hear
-        # See all providers at https://docs.livekit.io/agents/integrations/tts/
-        tts=cartesia.TTS(voice="6f84f4b8-58a2-430c-8c79-688dad597532"),
+        # See all available models as well as voice selections at https://docs.livekit.io/agents/models/tts/
+        tts="cartesia/sonic-2:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
         # VAD and turn detection are used to determine when the user is speaking and when the agent should respond
         # See more at https://docs.livekit.io/agents/build/turns
         turn_detection=MultilingualModel(),
@@ -81,19 +78,16 @@ async def entrypoint(ctx: JobContext):
         preemptive_generation=True,
     )
 
-    # To use a realtime model instead of a voice pipeline, use the following session setup instead:
+    # To use a realtime model instead of a voice pipeline, use the following session setup instead.
+    # (Note: This is for the OpenAI Realtime API. For other providers, see https://docs.livekit.io/agents/models/realtime/)
+    # 1. Install livekit-agents[openai]
+    # 2. Set OPENAI_API_KEY in .env.local
+    # 3. Add `from livekit.plugins import openai` to the top of this file
+    # 4. Use the following session setup instead of the version above
     # session = AgentSession(
-    #     # See all providers at https://docs.livekit.io/agents/integrations/realtime/
     #     llm=openai.realtime.RealtimeModel(voice="marin")
     # )
 
-    # sometimes background noise could interrupt the agent session, these are considered false positive interruptions
-    # when it's detected, you may resume the agent's speech
-    @session.on("agent_false_interruption")
-    def _on_agent_false_interruption(ev: AgentFalseInterruptionEvent):
-        logger.info("false positive interruption, resuming")
-        session.generate_reply(instructions=ev.extra_instructions or NOT_GIVEN)
-
     # Metrics collection, to measure pipeline performance
     # For more information, see https://docs.livekit.io/agents/build/metrics/
     usage_collector = metrics.UsageCollector()
@@ -110,9 +104,9 @@ async def log_usage():
     ctx.add_shutdown_callback(log_usage)
 
     # # Add a virtual avatar to the session, if desired
-    # # For other providers, see https://docs.livekit.io/agents/integrations/avatar/
+    # # For other providers, see https://docs.livekit.io/agents/models/avatar/
     # avatar = hedra.AvatarSession(
-    #   avatar_id="...",  # See https://docs.livekit.io/agents/integrations/avatar/hedra
+    #   avatar_id="...",  # See https://docs.livekit.io/agents/models/avatar/plugins/hedra
     # )
     # # Start the avatar and wait for it to join
     # await avatar.start(session, room=ctx.room)
@@ -122,9 +116,7 @@ async def log_usage():
         agent=Assistant(),
         room=ctx.room,
         room_input_options=RoomInputOptions(
-            # LiveKit Cloud enhanced noise cancellation
-            # - If self-hosting, omit this parameter
-            # - For telephony applications, use `BVCTelephony` for best results
+            # For telephony applications, use `BVCTelephony` for best results
             noise_cancellation=noise_cancellation.BVC(),
         ),
     )
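Review note on the `src/agent.py` change above: the plugin objects are replaced by string descriptors such as `"assemblyai/universal-streaming:en"`, `"openai/gpt-4.1-mini"`, and `"cartesia/sonic-2:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"`. Judging only from those three strings, the shape appears to be `provider/model` with an optional `:suffix` (a language for STT, a voice ID for TTS). The sketch below splits a descriptor under that assumption. `ModelDescriptor` and `split_descriptor` are hypothetical names, and this is not how LiveKit Agents parses these strings internally.

```python
from typing import NamedTuple, Optional


class ModelDescriptor(NamedTuple):
    provider: str
    model: str
    option: Optional[str]  # assumed: language for STT, voice ID for TTS


def split_descriptor(descriptor: str) -> ModelDescriptor:
    """Split an assumed "provider/model[:option]" string into its parts."""
    provider, slash, rest = descriptor.partition("/")
    model, _, option = rest.partition(":")
    if not slash or not provider or not model:
        raise ValueError(f"expected 'provider/model[:option]', got {descriptor!r}")
    # An absent or empty suffix is reported as None.
    return ModelDescriptor(provider, model, option or None)


if __name__ == "__main__":
    for text in (
        "assemblyai/universal-streaming:en",
        "openai/gpt-4.1-mini",
        "cartesia/sonic-2:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
    ):
        print(split_descriptor(text))
```

Reading the strings this way makes it easier to see which part to change when swapping the STT language or the TTS voice. The model documentation linked in the code comments remains the authority on the accepted format.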
diff --git a/tests/test_agent.py b/tests/test_agent.py
@@ -1,12 +1,11 @@
 import pytest
-from livekit.agents import AgentSession, llm, mock_tools
-from livekit.plugins import openai
+from livekit.agents import AgentSession, inference, llm
 
 from agent import Assistant
 
 
 def _llm() -> llm.LLM:
-    return openai.LLM(model="gpt-4o-mini")
+    return inference.LLM(model="openai/gpt-4.1-mini")
 
 
 @pytest.mark.asyncio
@@ -41,118 +40,6 @@ async def test_offers_assistance() -> None:
         result.expect.no_more_events()
 
 
-@pytest.mark.asyncio
-async def test_weather_tool() -> None:
-    """Unit test for the weather tool combined with an evaluation of the agent's ability to incorporate its results."""
-    async with (
-        _llm() as llm,
-        AgentSession(llm=llm) as session,
-    ):
-        await session.start(Assistant())
-
-        # Run an agent turn following the user's request for weather information
-        result = await session.run(user_input="What's the weather in Tokyo?")
-
-        # Test that the agent calls the weather tool with the correct arguments
-        result.expect.next_event().is_function_call(
-            name="lookup_weather", arguments={"location": "Tokyo"}
-        )
-
-        # Test that the tool invocation works and returns the correct output
-        # To mock the tool output instead, see https://docs.livekit.io/agents/build/testing/#mock-tools
-        result.expect.next_event().is_function_call_output(
-            output="sunny with a temperature of 70 degrees."
-        )
-
-        # Evaluate the agent's response for accurate weather information
-        await (
-            result.expect.next_event()
-            .is_message(role="assistant")
-            .judge(
-                llm,
-                intent="""
-                Informs the user that the weather is sunny with a temperature of 70 degrees.
-
-                Optional context that may or may not be included (but the response must not contradict these facts)
-                - The location for the weather report is Tokyo
-                """,
-            )
-        )
-
-        # Ensures there are no function calls or other unexpected events
-        result.expect.no_more_events()
-
-
-@pytest.mark.asyncio
-async def test_weather_unavailable() -> None:
-    """Evaluation of the agent's ability to handle tool errors."""
-    async with (
-        _llm() as llm,
-        AgentSession(llm=llm) as sess,
-    ):
-        await sess.start(Assistant())
-
-        # Simulate a tool error
-        with mock_tools(
-            Assistant,
-            {"lookup_weather": lambda: RuntimeError("Weather service is unavailable")},
-        ):
-            result = await sess.run(user_input="What's the weather in Tokyo?")
-            result.expect.skip_next_event_if(type="message", role="assistant")
-            result.expect.next_event().is_function_call(
-                name="lookup_weather", arguments={"location": "Tokyo"}
-            )
-            result.expect.next_event().is_function_call_output()
-            await result.expect.next_event(type="message").judge(
-                llm,
-                intent="""
-                Acknowledges that the weather request could not be fulfilled and communicates this to the user.
-
-                The response should convey that there was a problem getting the weather information, but can be expressed in various ways such as:
-                - Mentioning an error, service issue, or that it couldn't be retrieved
-                - Suggesting alternatives or asking what else they can help with
-                - Being apologetic or explaining the situation
-
-                The response does not need to use specific technical terms like "weather service error" or "temporary".
-                """,
-            )
-
-            # leaving this commented, some LLMs may occasionally try to retry.
-            # result.expect.no_more_events()
-
-
-@pytest.mark.asyncio
-async def test_unsupported_location() -> None:
-    """Evaluation of the agent's ability to handle a weather response with an unsupported location."""
-    async with (
-        _llm() as llm,
-        AgentSession(llm=llm) as sess,
-    ):
-        await sess.start(Assistant())
-
-        with mock_tools(Assistant, {"lookup_weather": lambda: "UNSUPPORTED_LOCATION"}):
-            result = await sess.run(user_input="What's the weather in Tokyo?")
-
-            # Evaluate the agent's response for an unsupported location
-            await result.expect.next_event(type="message").judge(
-                llm,
-                intent="""
-                Communicates that the weather request for the specific location could not be fulfilled.
-
-                The response should indicate that weather information is not available for the requested location, but can be expressed in various ways such as:
-                - Saying they can't get weather for that location
-                - Explaining the location isn't supported or available
-                - Suggesting alternatives or asking what else they can help with
-                - Being apologetic about the limitation
-
-                The response does not need to explicitly state "unsupported" or discourage retrying.
-                """,
-            )
-
-        # Ensures there are no function calls or other unexpected events
-        result.expect.no_more_events()
-
-
 @pytest.mark.asyncio
 async def test_grounding() -> None:
     """Evaluation of the agent's ability to refuse to answer when it doesn't know something."""
