docs: v0.8.0 changes (#2405)

seratch · web-flow · commit 9f0b4bb16fb4 · 2026-02-05T11:51:41.000+09:00
diff --git a/docs/examples.md b/docs/examples.md
@@ -77,13 +77,14 @@ Check out a variety of sample implementations of the SDK in the examples section
     Simple deep research clone that demonstrates complex multi-agent research workflows.
 
 -   **[tools](https://github.com/openai/openai-agents-python/tree/main/examples/tools):**
-    Learn how to implement OAI hosted tools such as:
+    Learn how to implement OAI hosted tools and experimental Codex tooling such as:
 
     -   Web search and web search with filters
     -   File search
     -   Code interpreter
     -   Computer use
     -   Image generation
+    -   Experimental Codex tool workflows (`examples/tools/codex.py`)
 
 -   **[voice](https://github.com/openai/openai-agents-python/tree/main/examples/voice):**
     See examples of voice agents, using our TTS and STT models, including streamed voice examples.
diff --git a/docs/human_in_the_loop.md b/docs/human_in_the_loop.md
@@ -0,0 +1,130 @@
+# Human-in-the-loop
+
+Use the human-in-the-loop (HITL) flow to pause agent execution until a person approves or rejects sensitive tool calls. Tools declare when they need approval, run results surface pending approvals as interruptions, and `RunState` lets you serialize and resume runs after decisions are made.
+
+## Marking tools that need approval
+
+Set `needs_approval` to `True` to always require approval, or provide an async function that decides per call. The callable receives the run context, the parsed tool parameters, and the tool call ID.
+
+```python
+from agents import Agent, Runner, function_tool
+
+
+@function_tool(needs_approval=True)
+async def cancel_order(order_id: int) -> str:
+    return f"Cancelled order {order_id}"
+
+
+async def requires_review(_ctx, params, _call_id) -> bool:
+    return "refund" in params.get("subject", "").lower()
+
+
+@function_tool(needs_approval=requires_review)
+async def send_email(subject: str, body: str) -> str:
+    return f"Sent '{subject}'"
+
+
+agent = Agent(
+    name="Support agent",
+    instructions="Handle tickets and ask for approval when needed.",
+    tools=[cancel_order, send_email],
+)
+```
+
+`needs_approval` is available on [`function_tool`][agents.tool.function_tool], [`Agent.as_tool`][agents.agent.Agent.as_tool], [`ShellTool`][agents.tool.ShellTool], and [`ApplyPatchTool`][agents.tool.ApplyPatchTool]. Local MCP servers also support approvals through `require_approval` on [`MCPServerStdio`][agents.mcp.server.MCPServerStdio], [`MCPServerSse`][agents.mcp.server.MCPServerSse], and [`MCPServerStreamableHttp`][agents.mcp.server.MCPServerStreamableHttp]. Hosted MCP servers support approvals via [`HostedMCPTool`][agents.tool.HostedMCPTool] with `tool_config={"require_approval": "always"}` and an optional `on_approval_request` callback. Shell and apply_patch tools accept an `on_approval` callback if you want to auto-approve or auto-reject without surfacing an interruption.
+
+## How the approval flow works
+
+1. When the model emits a tool call, the runner evaluates `needs_approval`.
+2. If an approval decision for that tool call is already stored in the [`RunContextWrapper`][agents.run_context.RunContextWrapper] (for example, from `always_approve=True`), the runner proceeds without prompting. Per-call approvals are scoped to the specific call ID; use `always_approve=True` to allow future calls automatically.
+3. Otherwise, execution pauses and `RunResult.interruptions` (or `RunResultStreaming.interruptions`) contains `ToolApprovalItem` entries with details such as `agent.name`, `name`, and `arguments`.
+4. Convert the result to a `RunState` with `result.to_state()`, call `state.approve(...)` or `state.reject(...)` (optionally passing `always_approve` or `always_reject`), and then resume with `Runner.run(agent, state)` or `Runner.run_streamed(agent, state)`.
+5. The resumed run continues where it left off and will re-enter this flow if new approvals are needed.
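The difference between a per-call decision and a sticky `always_approve` / `always_reject` decision in the steps above can be sketched with a small in-memory store. This is only an illustration of the scoping rule; `ApprovalStore`, `record`, and `lookup` are hypothetical names and not how the SDK stores decisions internally.

```python
from typing import Dict, Optional


class ApprovalStore:
    """Hypothetical record of approval decisions (not the SDK's implementation)."""

    def __init__(self) -> None:
        self._per_call: Dict[str, bool] = {}  # call ID -> one-off decision
        self._per_tool: Dict[str, bool] = {}  # tool name -> sticky decision

    def record(self, tool_name: str, call_id: str, approved: bool, sticky: bool = False) -> None:
        # A sticky decision applies to every later call of the same tool;
        # otherwise the decision is scoped to this one call ID.
        if sticky:
            self._per_tool[tool_name] = approved
        else:
            self._per_call[call_id] = approved

    def lookup(self, tool_name: str, call_id: str) -> Optional[bool]:
        # None means "no decision yet", i.e. the run would pause for approval.
        if tool_name in self._per_tool:
            return self._per_tool[tool_name]
        return self._per_call.get(call_id)


store = ApprovalStore()
store.record("send_email", "call_1", approved=True)
first_call = store.lookup("send_email", "call_1")   # decided for this call only
second_call = store.lookup("send_email", "call_2")  # still undecided
store.record("send_email", "call_2", approved=True, sticky=True)
third_call = store.lookup("send_email", "call_3")   # covered by the sticky decision
```

A one-off approval leaves later calls undecided, which is why the run can re-enter the approval flow after resuming.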
+
+## Example: pause, approve, resume
+
+The snippet below mirrors the JavaScript HITL guide: it pauses when a tool needs approval, persists state to disk, reloads it, and resumes after collecting a decision.
+
+```python
+import asyncio
+import json
+from pathlib import Path
+
+from agents import Agent, Runner, RunState, function_tool
+
+
+async def needs_oakland_approval(_ctx, params, _call_id) -> bool:
+    return "Oakland" in params.get("city", "")
+
+
+@function_tool(needs_approval=needs_oakland_approval)
+async def get_temperature(city: str) -> str:
+    return f"The temperature in {city} is 20° Celsius"
+
+
+agent = Agent(
+    name="Weather assistant",
+    instructions="Answer weather questions with the provided tools.",
+    tools=[get_temperature],
+)
+
+STATE_PATH = Path(".cache/hitl_state.json")
+
+
+def prompt_approval(tool_name: str, arguments: str | None) -> bool:
+    answer = input(f"Approve {tool_name} with {arguments}? [y/N]: ").strip().lower()
+    return answer in {"y", "yes"}
+
+
+async def main() -> None:
+    result = await Runner.run(agent, "What is the temperature in Oakland?")
+
+    while result.interruptions:
+        # Persist the paused state.
+        state = result.to_state()
+        STATE_PATH.parent.mkdir(parents=True, exist_ok=True)
+        STATE_PATH.write_text(state.to_string())
+
+        # Load the state later (could be a different process).
+        stored = json.loads(STATE_PATH.read_text())
+        state = await RunState.from_json(agent, stored)
+
+        for interruption in result.interruptions:
+            approved = await asyncio.get_running_loop().run_in_executor(
+                None, prompt_approval, interruption.name or "unknown_tool", interruption.arguments
+            )
+            if approved:
+                state.approve(interruption, always_approve=False)
+            else:
+                state.reject(interruption)
+
+        result = await Runner.run(agent, state)
+
+    print(result.final_output)
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
+```
+
+In this example, `prompt_approval` is synchronous because it uses `input()`, so it runs through `run_in_executor(...)` to avoid blocking the event loop. If your approval source is already asynchronous (for example, an HTTP request or an async database query), define an `async def` function and `await` it directly instead.
+
+To stream output while waiting for approvals, call `Runner.run_streamed`, consume `result.stream_events()` until it completes, and then follow the same `result.to_state()` and resume steps shown above.
+
+## Other patterns in this repository
+
+- **Streaming approvals**: `examples/agent_patterns/human_in_the_loop_stream.py` shows how to drain `stream_events()` and then approve pending tool calls before resuming with `Runner.run_streamed(agent, state)`.
+- **Agent as tool approvals**: `Agent.as_tool(..., needs_approval=...)` applies the same interruption flow when delegated agent tasks need review.
+- **Shell and apply_patch tools**: `ShellTool` and `ApplyPatchTool` also support `needs_approval`. Use `state.approve(interruption, always_approve=True)` or `state.reject(..., always_reject=True)` to cache the decision for future calls. For automatic decisions, provide `on_approval` (see `examples/tools/shell.py`); for manual decisions, handle interruptions (see `examples/tools/shell_human_in_the_loop.py`).
+- **Local MCP servers**: Use `require_approval` on `MCPServerStdio` / `MCPServerSse` / `MCPServerStreamableHttp` to gate MCP tool calls (see `examples/mcp/get_all_mcp_tools_example/main.py` and `examples/mcp/tool_filter_example/main.py`).
+- **Hosted MCP servers**: Set `require_approval` to `"always"` on `HostedMCPTool` to force HITL, optionally providing `on_approval_request` to auto-approve or reject (see `examples/hosted_mcp/human_in_the_loop.py` and `examples/hosted_mcp/on_approval.py`). Use `"never"` for trusted servers (`examples/hosted_mcp/simple.py`).
+- **Sessions and memory**: Pass a session to `Runner.run` so approvals and conversation history survive multiple turns. SQLite and OpenAI Conversations session variants are in `examples/memory/memory_session_hitl_example.py` and `examples/memory/openai_session_hitl_example.py`.
+- **Realtime agents**: The realtime demo exposes WebSocket messages that approve or reject tool calls via `approve_tool_call` / `reject_tool_call` on the `RealtimeSession` (see `examples/realtime/app/server.py` for the server-side handlers).
+
+## Long-running approvals
+
+`RunState` is designed to be durable. Use `state.to_json()` or `state.to_string()` to store pending work in a database or queue and recreate it later with `RunState.from_json(...)` or `RunState.from_string(...)`. Pass `context_override` if you do not want to persist sensitive context data in the serialized payload.
+
+## Versioning pending tasks
+
+If approvals may sit for a while, store a version marker for your agent definitions or SDK alongside the serialized state. You can then route deserialization to the matching code path to avoid incompatibilities when models, prompts, or tool definitions change.
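
The versioning advice above can be sketched as a small envelope around the serialized state. This is a minimal sketch that assumes the state is already a string (for example, the output of `state.to_string()`); `STATE_VERSION`, `wrap_state`, `unwrap_state`, and the loader table are hypothetical names and not part of the SDK.

```python
import json
from typing import Callable, Dict

STATE_VERSION = "support-agent-v3"  # hypothetical marker for your agent definitions


def wrap_state(serialized_state: str, version: str = STATE_VERSION) -> str:
    """Store the version marker next to the serialized run state."""
    return json.dumps({"version": version, "state": serialized_state})


def unwrap_state(payload: str, loaders: Dict[str, Callable[[str], object]]) -> object:
    """Route deserialization to the loader registered for the stored version."""
    envelope = json.loads(payload)
    loader = loaders.get(envelope["version"])
    if loader is None:
        raise LookupError(f"No loader registered for version {envelope['version']!r}")
    return loader(envelope["state"])


# Stand-in loader; a real one would rebuild the matching agent and restore the state.
loaders = {"support-agent-v3": lambda raw: ("v3", raw)}
restored = unwrap_state(wrap_state('{"pending": true}'), loaders)
```

Failing loudly on an unknown version is a deliberate choice here: a pending approval that cannot be matched to its code path is easier to handle as an explicit error than as a silently mis-deserialized run.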
diff --git a/docs/mcp.md b/docs/mcp.md
@@ -24,6 +24,33 @@ matrix below summarises the options that the Python SDK supports.
 
 The sections below walk through each option, how to configure it, and when to prefer one transport over another.
 
+## Agent-level MCP configuration
+
+In addition to choosing a transport, you can tune how MCP tools are prepared by setting `Agent.mcp_config`.
+
+```python
+from agents import Agent
+
+agent = Agent(
+    name="Assistant",
+    mcp_servers=[server],
+    mcp_config={
+        # Try to convert MCP tool schemas to strict JSON schema.
+        "convert_schemas_to_strict": True,
+        # If None, MCP tool failures are raised as exceptions instead of
+        # returning model-visible error text.
+        "failure_error_function": None,
+    },
+)
+```
+
+Notes:
+
+- `convert_schemas_to_strict` is best-effort. If a schema cannot be converted, the original schema is used.
+- `failure_error_function` controls how MCP tool call failures are surfaced to the model.
+- When `failure_error_function` is unset, the SDK uses the default tool error formatter.
+- Server-level `failure_error_function` overrides `Agent.mcp_config["failure_error_function"]` for that server.
+
 ## 1. Hosted MCP server tools
 
 Hosted tools push the entire tool round-trip into OpenAI's infrastructure. Instead of your code listing and calling tools, the
@@ -178,6 +205,61 @@ The constructor accepts additional options:
 - `use_structured_content` toggles whether `tool_result.structured_content` is preferred over textual output.
 - `max_retry_attempts` and `retry_backoff_seconds_base` add automatic retries for `list_tools()` and `call_tool()`.
 - `tool_filter` lets you expose only a subset of tools (see [Tool filtering](#tool-filtering)).
+- `require_approval` enables human-in-the-loop approval policies on local MCP tools.
+- `failure_error_function` customizes model-visible MCP tool failure messages; set it to `None` to raise errors instead.
+- `tool_meta_resolver` injects per-call MCP `_meta` payloads before `call_tool()`.
+
+### Approval policies for local MCP servers
+
+`MCPServerStdio`, `MCPServerSse`, and `MCPServerStreamableHttp` all accept `require_approval`.
+
+Supported forms:
+
+- `"always"` or `"never"` for all tools.
+- `True` / `False` (equivalent to always/never).
+- A per-tool map, for example `{"delete_file": "always", "read_file": "never"}`.
+- A grouped object:
+  `{"always": {"tool_names": [...]}, "never": {"tool_names": [...]}}`.
+
+```python
+async with MCPServerStreamableHttp(
+    name="Filesystem MCP",
+    params={"url": "http://localhost:8000/mcp"},
+    require_approval={"always": {"tool_names": ["delete_file"]}},
+) as server:
+    ...
+```
+
+For a full pause/resume flow, see [Human-in-the-loop](human_in_the_loop.md) and `examples/mcp/get_all_mcp_tools_example/main.py`.
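
To make the four `require_approval` forms listed above concrete, the sketch below resolves each form to a yes/no answer for a single tool. It is illustrative only and is not the SDK's resolution code; `needs_approval` is a hypothetical helper, and treating unlisted tools as not requiring approval is an assumption.

```python
from typing import Mapping, Union

Policy = Union[str, bool, Mapping[str, object]]


def needs_approval(policy: Policy, tool_name: str) -> bool:
    """Hypothetical resolver for the documented policy forms."""
    if isinstance(policy, bool):
        return policy  # True / False
    if isinstance(policy, str):
        return policy == "always"  # "always" / "never"

    always = policy.get("always")
    never = policy.get("never")
    if isinstance(always, Mapping) or isinstance(never, Mapping):
        # Grouped form: {"always": {"tool_names": [...]}, "never": {"tool_names": [...]}}
        if isinstance(always, Mapping) and tool_name in always.get("tool_names", []):
            return True
        return False  # assumption: tools not listed under "always" run without approval

    # Per-tool map: {"delete_file": "always", "read_file": "never"}
    return policy.get(tool_name, "never") == "always"  # assumption: default is "never"


per_tool = {"delete_file": "always", "read_file": "never"}
grouped = {"always": {"tool_names": ["delete_file"]}}
```

Both `per_tool` and `grouped` gate `delete_file` and leave `read_file` ungated under these assumptions.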
+
+### Per-call metadata with `tool_meta_resolver`
+
+Use `tool_meta_resolver` when your MCP server expects request metadata in `_meta` (for example, tenant IDs or trace context). The example below assumes you pass a `dict` as `context` to `Runner.run(...)`.
+
+```python
+from agents.mcp import MCPServerStreamableHttp, MCPToolMetaContext
+
+
+def resolve_meta(context: MCPToolMetaContext) -> dict[str, str] | None:
+    run_context_data = context.run_context.context or {}
+    tenant_id = run_context_data.get("tenant_id")
+    if tenant_id is None:
+        return None
+    return {"tenant_id": str(tenant_id), "source": "agents-sdk"}
+
+
+server = MCPServerStreamableHttp(
+    name="Metadata-aware MCP",
+    params={"url": "http://localhost:8000/mcp"},
+    tool_meta_resolver=resolve_meta,
+)
+```
+
+If your run context is a Pydantic model, dataclass, or custom class, read the tenant ID with attribute access instead.
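
A sketch of the attribute-access variant mentioned above, assuming a dataclass run context. `AppContext` is a hypothetical type, and `SimpleNamespace` stands in for `MCPToolMetaContext` so the snippet runs without the SDK; only the `context.run_context.context` shape is taken from the example above.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Dict, Optional


@dataclass
class AppContext:  # hypothetical run context type
    tenant_id: Optional[str] = None


def resolve_meta(context) -> Optional[Dict[str, str]]:
    # Attribute access instead of dict.get(); getattr also tolerates a missing context.
    tenant_id = getattr(context.run_context.context, "tenant_id", None)
    if tenant_id is None:
        return None
    return {"tenant_id": str(tenant_id), "source": "agents-sdk"}


# Stand-ins for the object the SDK would pass to the resolver.
with_tenant = SimpleNamespace(run_context=SimpleNamespace(context=AppContext(tenant_id="acme")))
without_context = SimpleNamespace(run_context=SimpleNamespace(context=None))

meta = resolve_meta(with_tenant)
no_meta = resolve_meta(without_context)
```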
+
+### MCP tool outputs: text and images
+
+When an MCP tool returns image content, the SDK maps it to image tool output entries automatically. Mixed text/image responses are forwarded as a list of output items, so agents can consume MCP image results the same way they consume image output from regular function tools.
 
 ## 3. HTTP with SSE MCP servers
 
diff --git a/docs/ref/agent_tool_input.md b/docs/ref/agent_tool_input.md
@@ -0,0 +1,3 @@
+# `Agent Tool Input`
+
+::: agents.agent_tool_input
diff --git a/docs/ref/agent_tool_state.md b/docs/ref/agent_tool_state.md
@@ -0,0 +1,3 @@
+# `Agent Tool State`
+
+::: agents.agent_tool_state
diff --git a/docs/ref/memory/session_settings.md b/docs/ref/memory/session_settings.md
@@ -0,0 +1,3 @@
+# `Session Settings`
+
+::: agents.memory.session_settings
diff --git a/docs/ref/run_config.md b/docs/ref/run_config.md
@@ -0,0 +1,3 @@
+# `Run Config`
+
+::: agents.run_config
diff --git a/docs/ref/run_error_handlers.md b/docs/ref/run_error_handlers.md
@@ -0,0 +1,3 @@
+# `Run Error Handlers`
+
+::: agents.run_error_handlers
diff --git a/docs/ref/run_internal/agent_runner_helpers.md b/docs/ref/run_internal/agent_runner_helpers.md
@@ -0,0 +1,3 @@
+# `Agent Runner Helpers`
+
+::: agents.run_internal.agent_runner_helpers
diff --git a/docs/ref/run_internal/approvals.md b/docs/ref/run_internal/approvals.md
@@ -0,0 +1,3 @@
+# `Approvals`
+
+::: agents.run_internal.approvals
diff --git a/docs/ref/run_internal/error_handlers.md b/docs/ref/run_internal/error_handlers.md
@@ -0,0 +1,3 @@
+# `Error Handlers`
+
+::: agents.run_internal.error_handlers
diff --git a/docs/ref/run_internal/guardrails.md b/docs/ref/run_internal/guardrails.md
@@ -0,0 +1,3 @@
+# `Guardrails`
+
+::: agents.run_internal.guardrails
diff --git a/docs/ref/run_internal/items.md b/docs/ref/run_internal/items.md
@@ -0,0 +1,3 @@
+# `Items`
+
+::: agents.run_internal.items
diff --git a/docs/ref/run_internal/oai_conversation.md b/docs/ref/run_internal/oai_conversation.md
@@ -0,0 +1,3 @@
+# `Oai Conversation`
+
+::: agents.run_internal.oai_conversation
diff --git a/docs/ref/run_internal/run_loop.md b/docs/ref/run_internal/run_loop.md
@@ -0,0 +1,3 @@
+# `Run Loop`
+
+::: agents.run_internal.run_loop
diff --git a/docs/ref/run_internal/run_steps.md b/docs/ref/run_internal/run_steps.md
@@ -0,0 +1,3 @@
+# `Run Steps`
+
+::: agents.run_internal.run_steps
diff --git a/docs/ref/run_internal/session_persistence.md b/docs/ref/run_internal/session_persistence.md
@@ -0,0 +1,3 @@
+# `Session Persistence`
+
+::: agents.run_internal.session_persistence
diff --git a/docs/ref/run_internal/streaming.md b/docs/ref/run_internal/streaming.md
@@ -0,0 +1,3 @@
+# `Streaming`
+
+::: agents.run_internal.streaming
diff --git a/docs/ref/run_internal/tool_actions.md b/docs/ref/run_internal/tool_actions.md
@@ -0,0 +1,3 @@
+# `Tool Actions`
+
+::: agents.run_internal.tool_actions
diff --git a/docs/ref/run_internal/tool_execution.md b/docs/ref/run_internal/tool_execution.md
@@ -0,0 +1,3 @@
+# `Tool Execution`
+
+::: agents.run_internal.tool_execution
diff --git a/docs/ref/run_internal/tool_planning.md b/docs/ref/run_internal/tool_planning.md
@@ -0,0 +1,3 @@
+# `Tool Planning`
+
+::: agents.run_internal.tool_planning
diff --git a/docs/ref/run_internal/tool_use_tracker.md b/docs/ref/run_internal/tool_use_tracker.md
@@ -0,0 +1,3 @@
+# `Tool Use Tracker`
+
+::: agents.run_internal.tool_use_tracker
diff --git a/docs/ref/run_internal/turn_preparation.md b/docs/ref/run_internal/turn_preparation.md
@@ -0,0 +1,3 @@
+# `Turn Preparation`
+
+::: agents.run_internal.turn_preparation
diff --git a/docs/ref/run_internal/turn_resolution.md b/docs/ref/run_internal/turn_resolution.md
@@ -0,0 +1,3 @@
+# `Turn Resolution`
+
+::: agents.run_internal.turn_resolution
diff --git a/docs/ref/run_state.md b/docs/ref/run_state.md
@@ -0,0 +1,3 @@
+# `Run State`
+
+::: agents.run_state
diff --git a/docs/ref/tracing/context.md b/docs/ref/tracing/context.md
@@ -0,0 +1,3 @@
+# `Context`
+
+::: agents.tracing.context
diff --git a/docs/ref/tracing/model_tracing.md b/docs/ref/tracing/model_tracing.md
@@ -0,0 +1,3 @@
+# `Model Tracing`
+
+::: agents.tracing.model_tracing
diff --git a/docs/release.md b/docs/release.md
@@ -19,6 +19,20 @@ We will increment `Z` for non-breaking changes:
 
 ## Breaking change changelog
 
+### 0.8.0
+
+In this version, two runtime behavior changes may require migration work:
+
+- Function tools wrapping **synchronous** Python callables now execute on worker threads via `asyncio.to_thread(...)` instead of running on the event loop thread. If your tool logic depends on thread-local state or thread-affine resources, migrate to an async tool implementation or make thread affinity explicit in your tool code.
+- Local MCP tool failure handling is now configurable, and the default behavior can return model-visible error output instead of failing the whole run. If you rely on fail-fast semantics, set `mcp_config={"failure_error_function": None}`. Server-level `failure_error_function` values override the agent-level setting, so set `failure_error_function=None` on each local MCP server that has an explicit handler.
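
The thread-affinity caveat in the first bullet can be reproduced with the standard library alone. The sketch below does not use the SDK: `read_thread_local` and `read_context_var` are hypothetical stand-ins for a synchronous tool body. It shows that `threading.local` state set on the event loop thread is not visible inside `asyncio.to_thread(...)`, while a `contextvars.ContextVar` is propagated.

```python
import asyncio
import contextvars
import threading

_local = threading.local()
request_id_var = contextvars.ContextVar("request_id", default="<unset>")


def read_thread_local() -> str:
    # Thread-local storage is per thread, so a worker thread sees no value.
    return getattr(_local, "request_id", "<unset>")


def read_context_var() -> str:
    # asyncio.to_thread copies the current contextvars context into the worker.
    return request_id_var.get()


async def main() -> tuple:
    _local.request_id = "req-123"  # set on the event loop thread
    request_id_var.set("req-123")

    inline = read_thread_local()                              # same thread
    offloaded = await asyncio.to_thread(read_thread_local)    # worker thread
    propagated = await asyncio.to_thread(read_context_var)    # worker thread
    return inline, offloaded, propagated


result = asyncio.run(main())
print(result)  # ('req-123', '<unset>', 'req-123')
```

If a synchronous tool reads per-request state, passing it explicitly or through a context variable is one way to make it independent of which thread runs the tool.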
+
+### 0.7.0
+
+In this version, two behavior changes can affect existing applications:
+
+- Nested handoff history is now **opt-in** (disabled by default). If you depended on the v0.6.x default nested behavior, explicitly set `RunConfig(nest_handoff_history=True)`.
+- The default `reasoning.effort` for `gpt-5.1` / `gpt-5.2` changed to `"none"` (from the previous default `"low"` configured by SDK defaults). If your prompts or quality/cost profile relied on `"low"`, set it explicitly in `model_settings`.
+
 ### 0.6.0
 
 In this version, the default handoff history is now packaged into a single assistant message instead of exposing the raw user/assistant turns, giving downstream agents a concise, predictable recap
diff --git a/docs/results.md b/docs/results.md
@@ -52,3 +52,30 @@ The [`raw_responses`][agents.result.RunResultBase.raw_responses] property contai
 ### Original input
 
 The [`input`][agents.result.RunResultBase.input] property contains the original input you provided to the `run` method. In most cases you won't need this, but it's available in case you do.
+
+### Interruptions and resuming runs
+
+If a run pauses for tool approval, pending approvals are exposed in [`interruptions`][agents.result.RunResultBase.interruptions]. Convert the result into a [`RunState`][agents.run_state.RunState] with `to_state()`, approve or reject the interruption(s), and resume with `Runner.run(...)` or `Runner.run_streamed(...)`.
+
+```python
+from agents import Agent, Runner
+
+agent = Agent(name="Assistant", instructions="Use tools when needed.")
+result = await Runner.run(agent, "Delete temp files that are no longer needed.")
+
+if result.interruptions:
+    state = result.to_state()
+    for interruption in result.interruptions:
+        state.approve(interruption)
+    result = await Runner.run(agent, state)
+```
+
+Both [`RunResult`][agents.result.RunResult] and [`RunResultStreaming`][agents.result.RunResultStreaming] support `to_state()`.
+
+### Convenience helpers
+
+`RunResultBase` includes a few helper methods and properties that are useful in production flows:
+
+- [`final_output_as(...)`][agents.result.RunResultBase.final_output_as] casts final output to a specific type (optionally with runtime type checking).
+- [`last_response_id`][agents.result.RunResultBase.last_response_id] returns the latest model response ID, useful for response chaining.
+- [`release_agents(...)`][agents.result.RunResultBase.release_agents] drops strong references to agents when you want to reduce memory pressure after inspecting results.
diff --git a/docs/running_agents.md b/docs/running_agents.md
diff --git a/docs/tools.md b/docs/tools.md
diff --git a/mkdocs.yml b/mkdocs.yml
