Responses streaming treats failed and incomplete terminal events as success #3106

@Aphroq

Please read this first

  • Have you read the docs? Agents SDK docs
  • Have you searched for related issues? Others may have faced similar issues.

Describe the bug

Responses streaming treats the response.failed and response.incomplete terminal events as usable final responses in three code paths. As a result, a run can finish successfully even though the Responses API reported a failed or incomplete response.

Relevant code:

  • src/agents/models/openai_responses.py: stream_response() stores the response payload from response.failed and response.incomplete as final_response.
  • src/agents/run_internal/run_loop.py: streamed runner finalization converts response.failed and response.incomplete payloads into ModelResponse.
  • src/agents/models/openai_responses.py: websocket get_response() also accepts response.failed and response.incomplete terminal payloads.

Current tests also encode this behavior for both model-level websocket handling and streamed runner handling.

Debug information

  • Agents SDK version: main at 3854c124cb8e3e51fb660f5714405ee39ee86c5e
  • Python version: Python 3.12

Repro steps

Minimal reproducer outline:

from collections.abc import AsyncIterator

import pytest
from openai.types.responses import Response, ResponseOutputMessage, ResponseOutputText

from agents import Agent, Runner
from agents.items import TResponseStreamEvent
from agents.models.interface import Model, ModelTracing
from agents.model_settings import ModelSettings


class FailedStreamingModel(Model):
    async def get_response(self, *args, **kwargs):
        raise NotImplementedError

    async def stream_response(
        self,
        system_instructions,
        input,
        model_settings: ModelSettings,
        tools,
        output_schema,
        handoffs,
        tracing: ModelTracing,
        *,
        previous_response_id=None,
        conversation_id=None,
        prompt=None,
    ) -> AsyncIterator[TResponseStreamEvent]:
        response = Response(
            id="resp_failed",
            created_at=0,
            model="fake",
            object="response",
            output=[
                ResponseOutputMessage(
                    id="msg_1",
                    content=[
                        ResponseOutputText(
                            text="partial output from failed response",
                            type="output_text",
                            annotations=[],
                        )
                    ],
                    role="assistant",
                    type="message",
                    status="completed",
                )
            ],
            tool_choice="none",
            tools=[],
            parallel_tool_calls=False,
        )
        response.status = "failed"

        event = type(
            "ResponseFailedEvent",
            (),
            {
                "type": "response.failed",
                "response": response,
                "sequence_number": 0,
            },
        )()
        yield event


@pytest.mark.asyncio
async def test_failed_stream_should_not_succeed():
    agent = Agent(name="test", model=FailedStreamingModel())

    result = Runner.run_streamed(agent, "hello")
    async for _ in result.stream_events():
        pass

    # On main this prints the partial text instead of raising an error.
    print(result.final_output)

Current behavior on main: the streamed run can complete and expose partial output from the failed response.

partial output from failed response

Expected behavior

response.failed and response.incomplete terminal events should fail the model/run path instead of becoming successful final responses. The raised error should include the terminal event type and any available response status/error/incomplete details.
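The expected behavior above could be sketched roughly as follows. This is a minimal illustration, not the SDK's implementation: ResponseTerminalError, FAILURE_EVENT_TYPES, collect_final_response, and fake_stream are hypothetical names, and events are duck-typed objects that carry only the type and response attributes mentioned in this issue. Raising when the stream ends without any terminal event is a further assumption.

```python
import asyncio
from collections.abc import AsyncIterator
from types import SimpleNamespace
from typing import Any, Optional


class ResponseTerminalError(RuntimeError):
    """Hypothetical error raised for a non-successful terminal stream event."""

    def __init__(self, event_type: str, response: Any) -> None:
        self.event_type = event_type
        # Copy whatever diagnostic fields the payload exposes; missing ones become None.
        self.status = getattr(response, "status", None)
        self.error = getattr(response, "error", None)
        self.incomplete_details = getattr(response, "incomplete_details", None)
        super().__init__(
            f"Responses stream ended with {event_type}: "
            f"status={self.status!r}, error={self.error!r}, "
            f"incomplete_details={self.incomplete_details!r}"
        )


# Terminal event types that should fail the run instead of finishing it.
FAILURE_EVENT_TYPES = frozenset({"response.failed", "response.incomplete"})


async def collect_final_response(events: AsyncIterator[Any]) -> Any:
    """Return the payload of response.completed; raise on failed or incomplete."""
    final_response: Optional[Any] = None
    async for event in events:
        event_type = getattr(event, "type", None)
        if event_type in FAILURE_EVENT_TYPES:
            raise ResponseTerminalError(event_type, getattr(event, "response", None))
        if event_type == "response.completed":
            final_response = event.response
    if final_response is None:
        # Assumption: a stream with no terminal event is also treated as an error.
        raise RuntimeError("Stream ended without a terminal response event")
    return final_response


async def fake_stream(event_type: str, status: str) -> AsyncIterator[Any]:
    """Hypothetical one-event stream used only to exercise the sketch."""
    response = SimpleNamespace(status=status, error=None, incomplete_details=None)
    yield SimpleNamespace(type=event_type, response=response)


if __name__ == "__main__":
    try:
        asyncio.run(collect_final_response(fake_stream("response.failed", "failed")))
    except ResponseTerminalError as exc:
        print(exc)
```

With a check like this, only response.completed can ever populate the final response, so partial output attached to a failed or incomplete payload would not reach result.final_output.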
