
Python: [Bug]: background=True causes infinite tool-call loop, tool-result submissions inherit background mode #5394

@Laende

Description


Correction (updated after deeper investigation):
My initial diagnosis blamed background=True leaking into tool-result submissions. That was wrong. The actual culprit is continuation_token leaking into mutable_options across tool-loop iterations, causing the SDK to repeatedly GET the same completed response instead of POSTing tool results.

The HTTP log is the smoking gun: 1 POST + ~40 GETs, all to the same response_id, and no tool-result POST ever happens. See the updated Description below. A verified runtime monkeypatch that fixes it is included at the bottom.

background=True + function tools: SDK loops on responses.retrieve() of the same response, never POSTs tool results, returns empty text

Description

SDK version: agent-framework-core==1.1.0, agent-framework-openai==1.1.0
Python version: 3.14

When an agent runs with background=True and has local function tools, the SDK gets stuck retrieving the same completed background response on every tool-loop iteration. Tool results are never POSTed back. After max_iterations (default 40) the loop exits and response.text is empty.

What actually happens

Verified from httpx INFO logs: the traffic is 1 POST followed by roughly 40 GETs, and every GET hits the same response_id.

  1. Caller does agent.run(messages, options={background: True}). This POSTs and gets back a continuation_token.
  2. Caller polls with agent.run(session=session, options={continuation_token: X}) in a loop.
  3. Eventually RawOpenAIChatClient._inner_get_response() sees continuation_token in options and calls client.responses.retrieve(response_id). The retrieve returns a completed response with function_call output items.
  4. FunctionInvocationLayer executes the tools locally and enters the next tool-loop iteration.
  5. mutable_options (the shared dict inside FunctionInvocationLayer.get_response()) still contains continuation_token, so the next super_get_response() call does another responses.retrieve() to the same response_id.
  6. Same completed response, same function_call items. Tools run again.
  7. Repeat up to 40 times. No tool result is ever POSTed. The model never gets the tool outputs. Final response text is empty.

Root cause

FunctionInvocationLayer.get_response() copies all options into a mutable_options dict (around line 59 of _tools.py) and reuses that same dict for every super_get_response() call in the tool loop. When the caller's polling code passes continuation_token in options, it leaks into every iteration. _inner_get_response() sees it and keeps doing GET retrieves instead of POSTing tool results.

background=True has the same problem. If only continuation_token were fixed, tool-result POSTs would still start new background jobs instead of running synchronously.

For comparison, conversation_id is handled correctly: _update_continuation_state updates it on mutable_options after each response. continuation_token needs the inverse treatment: it should be removed from mutable_options once the background job returns a completed response.
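The leak and the proposed inverse treatment can be illustrated with a toy model of the tool loop. Every name below is a stand-in for illustration, not the SDK's actual internals:

```python
def inner_get_response(options: dict) -> dict:
    """Stand-in for RawOpenAIChatClient._inner_get_response."""
    if options.get("continuation_token") is not None:
        # Token present: GET-retrieve the completed background response.
        return {"verb": "GET", "completed": True}
    # No token: POST (a new request or tool results).
    return {"verb": "POST", "completed": True}


def tool_loop(options: dict, strip_after_completion: bool) -> list:
    """Stand-in for FunctionInvocationLayer.get_response()'s tool loop."""
    mutable_options = dict(options)  # one dict shared by every iteration
    verbs = []
    for _ in range(3):  # stands in for max_iterations
        response = inner_get_response(mutable_options)
        verbs.append(response["verb"])
        if strip_after_completion and response["completed"]:
            # The proposed inverse of _update_continuation_state: drop the
            # background fields once a completed response has come back.
            mutable_options.pop("continuation_token", None)
            mutable_options.pop("background", None)
    return verbs


token_options = {"continuation_token": {"response_id": "resp_x"}, "background": True}
# The bug: every iteration re-GETs the same completed response.
assert tool_loop(token_options, strip_after_completion=False) == ["GET", "GET", "GET"]
# With the fix: one GET, then the loop POSTs tool results.
assert tool_loop(token_options, strip_after_completion=True) == ["GET", "POST", "POST"]
```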

Code Sample

Minimal reproduction. One tool, background mode, structured output.

import asyncio
from agent_framework.openai import OpenAIChatClient, OpenAIChatOptions
from agent_framework import AgentSession, Message, tool
from pydantic import BaseModel


@tool(name="search", description="Search documents for information")
def search(query: str) -> str:
    return f"Document contains info about: {query}"


class Output(BaseModel):
    answer: str


client = OpenAIChatClient(
    api_key="YOUR_KEY",
    model="gpt-4.1",
    azure_endpoint="https://YOUR_ENDPOINT",
    api_version="preview",
)

agent = client.as_agent(
    instructions="Use the search tool to find info, then answer.",
    name="repro_agent",
    tools=[search],
    default_options=OpenAIChatOptions(response_format=Output),
)


async def reproduce():
    session = AgentSession()
    messages = [Message(role="user", contents=["Search for 'project name'"])]

    response = await agent.run(
        messages, session=session, options=OpenAIChatOptions(background=True)
    )

    poll_count = 0
    while response.continuation_token is not None:
        poll_count += 1
        await asyncio.sleep(2.0)
        response = await agent.run(
            session=session,
            options=OpenAIChatOptions(continuation_token=response.continuation_token),
        )

    print(f"Polls: {poll_count}")
    print(f"response.text: '{response.text}'")  # empty


asyncio.run(reproduce())

Expected: response.text contains valid JSON matching Output. Tool called roughly once.

Actual: response.text is empty. Tool called 40+ times with identical arguments.

Error Messages / Stack Traces

No exception is raised. The bug is silent. The SDK exhausts all 40 iterations and returns an empty response.
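Because the failure is silent, a small caller-side guard can at least turn it into a loud error once polling finishes. This is a hypothetical helper for calling code, not part of the SDK:

```python
def ensure_final_text(response_text: str) -> str:
    """Fail loudly on the silent-empty-response symptom of this bug."""
    if not response_text:
        # The bug manifests as empty final text after the tool loop
        # exhausts its iterations; raise instead of returning "".
        raise RuntimeError(
            "empty final response text after background polling; "
            "likely the continuation_token leak described in this issue"
        )
    return response_text
```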

HTTP traffic (before fix)

From httpx INFO logs. Note the identical response_id on every GET.

POST .../openai/v1/responses?api-version=preview "HTTP/1.1 200 OK"
  Poll 1...
GET  .../openai/v1/responses/resp_058e072487625bc8...?api-version=preview "HTTP/1.1 200 OK"
  Poll 2...
GET  .../openai/v1/responses/resp_058e072487625bc8...?api-version=preview "HTTP/1.1 200 OK"
  Poll 3...
GET  .../openai/v1/responses/resp_058e072487625bc8...?api-version=preview "HTTP/1.1 200 OK"
  (~40 more GETs to the SAME resp_058e072487625bc8... id)
GET  .../openai/v1/responses/resp_058e072487625bc8...?api-version=preview "HTTP/1.1 200 OK"

Exactly one POST. Zero tool-result POSTs. Every other request is a GET retrieve of the same response_id. This is what proves the SDK is stuck retrieving the same completed response instead of looping through fresh tool-result submissions.

SDK log when the loop exhausts

Maximum iterations reached (40). Requesting final response without tools.

Observed output

--- Result (polls=3) ---
Text length: 0
Text (raw):

Tool calls made: 120
  {'tool': 'search_documents', 'query': 'prosjektnavn adresse byggherre...', 'hits': 0}
  {'tool': 'search_documents', 'query': 'tilbudsinnbydelse prosjekt byggherre adresse', 'hits': 0}
  {'tool': 'search_documents', 'query': 'konkurransegrunnlag prosjektbeskrivelse...', 'hits': 0}
  (same 3 queries repeated 40 times = 120 calls)

*** EMPTY RESPONSE TEXT ***

The same tool calls repeat identically on every iteration because the SDK retrieves the same completed background response every time. Tool results are never POSTed, so the model never produces a final answer.

Additional Context

Affected code paths

agent_framework._tools.FunctionInvocationLayer.get_response() (around line 59):
  mutable_options = dict(options). Copies continuation_token and background into the dict shared across all loop iterations.

agent_framework._tools.FunctionInvocationLayer.get_response() (around line 189):
  super_get_response(options=mutable_options). Reuses the same dict on every iteration, including after a retrieve has already returned a completed response.

agent_framework_openai._chat_client.RawOpenAIChatClient._inner_get_response():
  Reads continuation_token from options. If set, calls client.responses.retrieve(response_id) instead of POSTing. Never strips it after a completed retrieve.

Workaround

Do not combine background mode with tools:

use_background = tools is None

Runtime monkeypatch (verified)

Patch RawOpenAIChatClient._inner_get_response so that once a responses.retrieve() returns a non-in-progress response, it removes continuation_token and background from the shared options dict. The FunctionInvocationLayer passes the same dict reference across iterations, so mutating it in place is enough to make the next iteration POST tool results properly.

from agent_framework_openai._chat_client import RawOpenAIChatClient

# Save the original method exactly once, so re-running this patch
# module stays idempotent.
if not hasattr(RawOpenAIChatClient, "_true_original_inner_get_response"):
    RawOpenAIChatClient._true_original_inner_get_response = RawOpenAIChatClient._inner_get_response


def _patched_inner_get_response(self, *, messages, options, stream=False, **kwargs):
    continuation_token = options.get("continuation_token")
    if not stream and continuation_token is not None:
        async def _run():
            validated_options = await self._validate_options(options)
            try:
                raw_response = await self.client.responses.retrieve(continuation_token["response_id"])
            except Exception as ex:
                self._handle_request_error(ex)  # expected to re-raise
            chat_response = self._parse_response_from_openai(raw_response, options=validated_options)
            # Once the background job is complete (no continuation_token on
            # the parsed response), strip the background fields from the
            # shared options dict so the next tool-loop iteration POSTs
            # tool results instead of re-retrieving the same response.
            if chat_response.continuation_token is None and isinstance(options, dict):
                options.pop("continuation_token", None)
                options.pop("background", None)
            return chat_response
        return _run()
    # Everything else goes through the original implementation unchanged.
    return RawOpenAIChatClient._true_original_inner_get_response(
        self, messages=messages, options=options, stream=stream, **kwargs
    )


RawOpenAIChatClient._inner_get_response = _patched_inner_get_response

HTTP traffic with the monkeypatch applied

POST .../openai/v1/responses?api-version=preview "HTTP/1.1 200 OK"
  Poll 1...
GET  .../openai/v1/responses/resp_01a797ec4c3e877b...?api-version=preview "HTTP/1.1 200 OK"
  Poll 2...
GET  .../openai/v1/responses/resp_01a797ec4c3e877b...?api-version=preview "HTTP/1.1 200 OK"
  Poll 3...
GET  .../openai/v1/responses/resp_01a797ec4c3e877b...?api-version=preview "HTTP/1.1 200 OK"
  [MONKEYPATCH] stripped continuation_token+background from mutable_options
POST .../openai/v1/responses?api-version=preview "HTTP/1.1 200 OK"
POST .../openai/v1/responses?api-version=preview "HTTP/1.1 200 OK"
  (roughly 16 tool-result POSTs)
POST .../openai/v1/responses?api-version=preview "HTTP/1.1 200 OK"

Result: valid JSON response matching the schema, tools called with varied queries (normal agent behavior) rather than the same 3 queries repeated 40 times.
