diff --git a/docs/llms-full.txt b/docs/llms-full.txt
index d6996ee332..dddad2545d 100644
--- a/docs/llms-full.txt
+++ b/docs/llms-full.txt
@@ -38,8 +38,7 @@ The Agents SDK delivers a focused set of Python primitives—agents, tools, guar
 - [Realtime guide](https://openai.github.io/openai-agents-python/realtime/guide/): Deep dive into realtime session lifecycle, structured input, approvals, interruptions, and low-level transport control.
 
 ## Models and Provider Integrations
-- [Model catalog](https://openai.github.io/openai-agents-python/models/): Lists supported OpenAI and partner models with guidance on selecting capabilities for different workloads.
-- [LiteLLM integration](https://openai.github.io/openai-agents-python/models/litellm/): Configure LiteLLM as a provider, map model aliases, and route requests across heterogeneous backends.
+- [Model catalog](https://openai.github.io/openai-agents-python/models/): Covers OpenAI model selection, non-OpenAI provider patterns, websocket transport, and the SDK's best-effort LiteLLM guidance in one place.
 
 ## API Reference – Agents SDK Core
 - [API index](https://openai.github.io/openai-agents-python/ref/index/): Directory of all documented modules, classes, and functions in the SDK.
diff --git a/docs/llms.txt b/docs/llms.txt
index cbd6312a3f..a96401c0c0 100644
--- a/docs/llms.txt
+++ b/docs/llms.txt
@@ -52,8 +52,7 @@ The SDK focuses on a concise set of primitives so you can orchestrate multi-agen
 - [Extensions](https://openai.github.io/openai-agents-python/ref/extensions/handoff_filters/): Extend the SDK with custom handoff filters, prompts, LiteLLM integration, and SQLAlchemy session memory.
 
 ## Models and Providers
-- [Model catalog](https://openai.github.io/openai-agents-python/models/): Overview of supported model families and configuration guidance.
-- [LiteLLM integration](https://openai.github.io/openai-agents-python/models/litellm/): Configure LiteLLM as a provider to fan out across multiple model backends.
+- [Model catalog](https://openai.github.io/openai-agents-python/models/): Overview of OpenAI models, non-OpenAI provider patterns, websocket transport, and the SDK's best-effort LiteLLM guidance.
 
 ## Optional
 - [Release notes](https://openai.github.io/openai-agents-python/release/): Track SDK changes, migration notes, and deprecations.
diff --git a/docs/models/index.md b/docs/models/index.md
index c510f0dd85..3ec1b573c8 100644
--- a/docs/models/index.md
+++ b/docs/models/index.md
@@ -7,18 +7,21 @@ The Agents SDK comes with out-of-the-box support for OpenAI models in two flavor
 
 ## Choosing a model setup
 
-Use this page in the following order depending on your setup:
+Start with the simplest path that fits your setup:
 
-| Goal | Start here |
-| --- | --- |
-| Use OpenAI-hosted models with SDK defaults | [OpenAI models](#openai-models) |
-| Use OpenAI Responses API over websocket transport | [Responses WebSocket transport](#responses-websocket-transport) |
-| Use non-OpenAI providers | [Non-OpenAI models](#non-openai-models) |
-| Mix models/providers in one workflow | [Advanced model selection and mixing](#advanced-model-selection-and-mixing) and [Mixing models across providers](#mixing-models-across-providers) |
-| Debug provider compatibility issues | [Troubleshooting non-OpenAI providers](#troubleshooting-non-openai-providers) |
+| If you are trying to... | Recommended path | Read more |
+| --- | --- | --- |
+| Use OpenAI models only | Use the default OpenAI provider with the Responses model path | [OpenAI models](#openai-models) |
+| Use OpenAI Responses API over websocket transport | Keep the Responses model path and enable websocket transport | [Responses WebSocket transport](#responses-websocket-transport) |
+| Use one non-OpenAI provider | Start with the built-in provider integration points | [Non-OpenAI models](#non-openai-models) |
+| Mix models or providers across agents | Select providers per run or per agent and review feature differences | [Mixing models in one workflow](#mixing-models-in-one-workflow) and [Mixing models across providers](#mixing-models-across-providers) |
+| Tune advanced OpenAI Responses request settings | Use `ModelSettings` on the OpenAI Responses path | [Advanced OpenAI Responses settings](#advanced-openai-responses-settings) |
+| Use LiteLLM for non-OpenAI Chat Completions providers | Treat LiteLLM as a beta fallback | [LiteLLM](#litellm) |
 
 ## OpenAI models
 
+For most OpenAI-only apps, the recommended path is to use string model names with the default OpenAI provider and stay on the Responses model path.
+
 When you don't specify a model when initializing an `Agent`, the default model will be used. The default is currently [`gpt-4.1`](https://developers.openai.com/api/docs/models/gpt-4.1) for compatibility and low latency. If you have access, we recommend setting your agents to [`gpt-5.4`](https://developers.openai.com/api/docs/models/gpt-5.4) for higher quality while keeping explicit `model_settings`.
 
 If you want to switch to other models like [`gpt-5.4`](https://developers.openai.com/api/docs/models/gpt-5.4), there are two ways to configure your agents.
@@ -97,6 +100,8 @@ These features are rejected on Chat Completions models and on non-Responses back
 
 By default, OpenAI Responses API requests use HTTP transport. You can opt in to websocket transport when using OpenAI-backed models.
 
+#### Basic setup
+
 ```python
 from agents import set_default_openai_responses_transport
 
@@ -107,6 +112,8 @@ This affects OpenAI Responses models resolved by the default OpenAI provider (in
 
 Transport selection happens when the SDK resolves a model name into a model instance. If you pass a concrete [`Model`][agents.models.interface.Model] object, its transport is already fixed: [`OpenAIResponsesWSModel`][agents.models.openai_responses.OpenAIResponsesWSModel] uses websocket, [`OpenAIResponsesModel`][agents.models.openai_responses.OpenAIResponsesModel] uses HTTP, and [`OpenAIChatCompletionsModel`][agents.models.openai_chatcompletions.OpenAIChatCompletionsModel] stays on Chat Completions. If you pass `RunConfig(model_provider=...)`, that provider controls transport selection instead of the global default.
 
+#### Provider or run-level setup
+
 You can also configure websocket transport per provider or per run:
 
 ```python
@@ -126,6 +133,8 @@ result = await Runner.run(
 )
 ```
 
+#### Advanced routing with `MultiProvider`
+
 If you need prefix-based model routing (for example mixing `openai/...` and `litellm/...` model names in one run), use [`MultiProvider`][agents.MultiProvider] and set `openai_use_responses_websocket=True` there instead.
 
 `MultiProvider` keeps two historical defaults:
@@ -163,7 +172,7 @@ Use `openai_prefix_mode="model_id"` when a backend expects the literal `openai/.
 
 If you use a custom OpenAI-compatible endpoint or proxy, websocket transport also requires a compatible websocket `/responses` endpoint. In those setups you may need to set `websocket_base_url` explicitly.
 
-Notes:
+#### Notes
 
 - This is the Responses API over websocket transport, not the [Realtime API](../realtime/guide.md). It does not apply to Chat Completions or non-OpenAI providers unless they support the Responses websocket `/responses` endpoint.
 - Install the `websockets` package if it is not already available in your environment.
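The prefix-based routing mentioned under "Advanced routing with `MultiProvider`" can be pictured as splitting a model name on its first slash. The sketch below is illustrative only: `split_model_prefix` is a hypothetical helper, not an SDK function, and it does not reproduce `MultiProvider`'s actual defaults or its `openai_prefix_mode` handling.

```python
from typing import Optional, Tuple


def split_model_prefix(model_name: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper (not part of the SDK): split "prefix/rest" on the first slash."""
    if "/" in model_name:
        prefix, rest = model_name.split("/", 1)
        return prefix, rest
    # No prefix present: a router would fall back to its default provider.
    return None, model_name


# Assumed example names, in the `openai/...` and `litellm/...` style used above.
for name in (
    "openai/gpt-4.1",
    "litellm/anthropic/claude-3-5-sonnet-20240620",
    "gpt-4.1",
):
    print(name, "->", split_model_prefix(name))
```

Only the first slash is treated as the separator in this sketch, so the remainder (for example `anthropic/claude-3-5-sonnet-20240620`) stays intact for whichever backend the prefix selects.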
@@ -171,34 +180,30 @@ Notes:
 
 ## Non-OpenAI models
 
-You can use most other non-OpenAI models via the [LiteLLM integration](./litellm.md). First, install the litellm dependency group:
-
-```bash
-pip install "openai-agents[litellm]"
-```
-
-Then, use any of the [supported models](https://docs.litellm.ai/docs/providers) with the `litellm/` prefix:
+If you need a non-OpenAI provider, start with the SDK's built-in provider integration points. In many setups, this is enough without adding LiteLLM. Examples for each pattern live in [examples/model_providers](https://github.com/openai/openai-agents-python/tree/main/examples/model_providers/).
 
-```python
-claude_agent = Agent(model="litellm/anthropic/claude-3-5-sonnet-20240620", ...)
-gemini_agent = Agent(model="litellm/gemini/gemini-2.5-flash-preview-04-17", ...)
-```
+### Ways to integrate non-OpenAI providers
 
-### Other ways to use non-OpenAI models
+| Approach | Use it when | Scope |
+| --- | --- | --- |
+| [`set_default_openai_client`][agents.set_default_openai_client] | One OpenAI-compatible endpoint should be the default for most or all agents | Global default |
+| [`ModelProvider`][agents.models.interface.ModelProvider] | One custom provider should apply to a single run | Per run |
+| [`Agent.model`][agents.agent.Agent.model] | Different agents need different providers or concrete model objects | Per agent |
+| LiteLLM (beta) | You need LiteLLM-specific provider coverage or routing | See [LiteLLM](#litellm) |
 
-You can integrate other LLM providers in 3 more ways (examples [here](https://github.com/openai/openai-agents-python/tree/main/examples/model_providers/)):
+You can integrate other LLM providers with these built-in paths:
 
 1. [`set_default_openai_client`][agents.set_default_openai_client] is useful in cases where you want to globally use an instance of `AsyncOpenAI` as the LLM client. This is for cases where the LLM provider has an OpenAI compatible API endpoint, and you can set the `base_url` and `api_key`. See a configurable example in [examples/model_providers/custom_example_global.py](https://github.com/openai/openai-agents-python/tree/main/examples/model_providers/custom_example_global.py).
 2. [`ModelProvider`][agents.models.interface.ModelProvider] is at the `Runner.run` level. This lets you say "use a custom model provider for all agents in this run". See a configurable example in [examples/model_providers/custom_example_provider.py](https://github.com/openai/openai-agents-python/tree/main/examples/model_providers/custom_example_provider.py).
-3. [`Agent.model`][agents.agent.Agent.model] lets you specify the model on a specific Agent instance. This enables you to mix and match different providers for different agents. See a configurable example in [examples/model_providers/custom_example_agent.py](https://github.com/openai/openai-agents-python/tree/main/examples/model_providers/custom_example_agent.py). An easy way to use most available models is via the [LiteLLM integration](./litellm.md).
+3. [`Agent.model`][agents.agent.Agent.model] lets you specify the model on a specific Agent instance. This enables you to mix and match different providers for different agents. See a configurable example in [examples/model_providers/custom_example_agent.py](https://github.com/openai/openai-agents-python/tree/main/examples/model_providers/custom_example_agent.py).
 
 In cases where you do not have an API key from `platform.openai.com`, we recommend disabling tracing via `set_tracing_disabled()`, or setting up a [different tracing processor](../tracing.md).
 
 !!! note
 
-    In these examples, we use the Chat Completions API/model, because most LLM providers don't yet support the Responses API. If your LLM provider does support it, we recommend using Responses.
+    In these examples, we use the Chat Completions API/model, because many LLM providers still do not support the Responses API. If your LLM provider does support it, we recommend using Responses.
 
-## Advanced model selection and mixing
+## Mixing models in one workflow
 
 Within a single workflow, you may want to use different models for each agent. For example, you could use a smaller, faster model for triage, while using a larger, more capable model for complex tasks. When configuring an [`Agent`][agents.Agent], you can select a specific model by either:
 
@@ -206,7 +211,7 @@ Within a single workflow, you may want to use different models for each agent. F
 2. Passing any model name + a [`ModelProvider`][agents.models.interface.ModelProvider] that can map that name to a Model instance.
 3. Directly providing a [`Model`][agents.models.interface.Model] implementation.
 
-!!!note
+!!! note
 
     While our SDK supports both the [`OpenAIResponsesModel`][agents.models.openai_responses.OpenAIResponsesModel] and the [`OpenAIChatCompletionsModel`][agents.models.openai_chatcompletions.OpenAIChatCompletionsModel] shapes, we recommend using a single model shape for each workflow because the two shapes support a different set of features and tools. If your workflow requires mixing and matching model shapes, make sure that all the features you're using are available on both.
 
@@ -257,19 +262,21 @@ english_agent = Agent(
 )
 ```
 
-#### Common advanced `ModelSettings` options
+## Advanced OpenAI Responses settings
+
+When you are on the OpenAI Responses path and need more control, start with `ModelSettings`.
+
+### Common advanced `ModelSettings` options
 
 When you are using the OpenAI Responses API, several request fields already have direct `ModelSettings` fields, so you do not need `extra_args` for them.
 
-| Field | Use it for |
-| --- | --- |
-| `parallel_tool_calls` | Allow or forbid multiple tool calls in the same turn. |
-| `truncation` | Set `"auto"` to let the Responses API drop the oldest conversation items instead of failing when context would overflow. |
-| `store` | Control whether the generated response is stored server-side for later retrieval. This matters for follow-up workflows that rely on response IDs, and for session compaction flows that may need to fall back to local input when `store=False`. |
-| `prompt_cache_retention` | Keep cached prompt prefixes around longer, for example with `"24h"`. |
-| `response_include` | Request richer response payloads such as `web_search_call.action.sources`, `file_search_call.results`, or `reasoning.encrypted_content`. |
-| `top_logprobs` | Request top-token logprobs for output text. The SDK also adds `message.output_text.logprobs` automatically. |
-| `retry` | Opt in to runner-managed retry settings for model calls. See [Runner-managed retries](#runner-managed-retries). |
+- `parallel_tool_calls`: Allow or forbid multiple tool calls in the same turn.
+- `truncation`: Set `"auto"` to let the Responses API drop the oldest conversation items instead of failing when context would overflow.
+- `store`: Control whether the generated response is stored server-side for later retrieval. This matters for follow-up workflows that rely on response IDs, and for session compaction flows that may need to fall back to local input when `store=False`.
+- `prompt_cache_retention`: Keep cached prompt prefixes around longer, for example with `"24h"`.
+- `response_include`: Request richer response payloads such as `web_search_call.action.sources`, `file_search_call.results`, or `reasoning.encrypted_content`.
+- `top_logprobs`: Request top-token logprobs for output text. The SDK also adds `message.output_text.logprobs` automatically.
+- `retry`: Opt in to runner-managed retry settings for model calls. See [Runner-managed retries](#runner-managed-retries).
 
 ```python
 from agents import Agent, ModelSettings
@@ -290,7 +297,27 @@ research_agent = Agent(
 
 When you set `store=False`, the Responses API does not keep that response available for later server-side retrieval. This is useful for stateless or zero-data-retention style flows, but it also means features that would otherwise reuse response IDs need to rely on locally managed state instead. For example, [`OpenAIResponsesCompactionSession`][agents.memory.openai_responses_compaction_session.OpenAIResponsesCompactionSession] switches its default `"auto"` compaction path to input-based compaction when the last response was not stored. See the [Sessions guide](../sessions/index.md#openai-responses-compaction-sessions).
 
-#### Runner-managed retries
+### Passing `extra_args`
+
+Use `extra_args` when you need provider-specific or newer request fields that the SDK does not expose directly at the top level yet.
+
+Also, when you use OpenAI's Responses API, [there are a few other optional parameters](https://platform.openai.com/docs/api-reference/responses/create) (e.g., `user`, `service_tier`, and so on). If they are not available at the top level, you can use `extra_args` to pass them as well.
+
+```python
+from agents import Agent, ModelSettings
+
+english_agent = Agent(
+    name="English agent",
+    instructions="You only speak English",
+    model="gpt-4.1",
+    model_settings=ModelSettings(
+        temperature=0.1,
+        extra_args={"service_tier": "flex", "user": "user_12345"},
+    ),
+)
+```
+
+## Runner-managed retries
 
 Retries are runtime-only and opt in. The SDK does not retry general model requests unless you set `ModelSettings(retry=...)` and your retry policy chooses to retry.
 
@@ -322,11 +349,15 @@ agent = Agent(
 
 `ModelRetrySettings` has three fields:
 
+