Please read this first
Describe the bug
Responses streaming treats response.failed and response.incomplete terminal events as usable final responses in a few paths. That can allow a run to finish successfully even though the Responses API reported a failed or incomplete response.
Relevant code:
- src/agents/models/openai_responses.py: stream_response() stores the response payload from response.failed and response.incomplete as final_response.
- src/agents/run_internal/run_loop.py: streamed runner finalization converts response.failed and response.incomplete payloads into ModelResponse.
- src/agents/models/openai_responses.py: websocket get_response() also accepts response.failed and response.incomplete terminal payloads.
Current tests also encode this behavior for both model-level websocket handling and streamed runner handling.
Debug information
- Agents SDK version: main at 3854c124cb8e3e51fb660f5714405ee39ee86c5e
- Python version: Python 3.12
Repro steps
Minimal reproducer outline:
from collections.abc import AsyncIterator

import pytest
from openai.types.responses import Response, ResponseOutputMessage, ResponseOutputText

from agents import Agent, Runner
from agents.items import TResponseStreamEvent
from agents.models.interface import Model, ModelTracing
from agents.model_settings import ModelSettings
from agents.usage import Usage


class FailedStreamingModel(Model):
    async def get_response(self, *args, **kwargs):
        raise NotImplementedError

    async def stream_response(
        self,
        system_instructions,
        input,
        model_settings: ModelSettings,
        tools,
        output_schema,
        handoffs,
        tracing: ModelTracing,
        *,
        previous_response_id=None,
        conversation_id=None,
        prompt=None,
    ) -> AsyncIterator[TResponseStreamEvent]:
        response = Response(
            id="resp_failed",
            created_at=0,
            model="fake",
            object="response",
            output=[
                ResponseOutputMessage(
                    id="msg_1",
                    content=[
                        ResponseOutputText(
                            text="partial output from failed response",
                            type="output_text",
                            annotations=[],
                        )
                    ],
                    role="assistant",
                    type="message",
                    status="completed",
                )
            ],
            tool_choice="none",
            tools=[],
            parallel_tool_calls=False,
        )
        response.status = "failed"
        # Stand-in object shaped like a response.failed stream event.
        event = type(
            "ResponseFailedEvent",
            (),
            {
                "type": "response.failed",
                "response": response,
                "sequence_number": 0,
            },
        )()
        yield event


@pytest.mark.asyncio
async def test_failed_stream_should_not_succeed():
    agent = Agent(name="test", model=FailedStreamingModel())
    result = Runner.run_streamed(agent, "hello")
    async for _ in result.stream_events():
        pass
    print(result.final_output)
Current behavior on main: the streamed run can complete and expose partial output from the failed response.
partial output from failed response
Expected behavior
response.failed and response.incomplete terminal events should fail the model/run path instead of becoming successful final responses. The raised error should include the terminal event type and any available response status/error/incomplete details.
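One possible shape for that check, as a rough sketch rather than a proposed patch: TerminalResponseError, describe_terminal_event and raise_for_terminal_event are hypothetical names that do not exist in the SDK, and the events here are plain stand-in objects rather than real openai types. It assumes the terminal event exposes type and response, and that the response may carry status, error and incomplete_details.

```python
from types import SimpleNamespace
from typing import Any

# Hypothetical names: neither this exception nor these helpers exist in the SDK.
FAILED_TERMINAL_EVENT_TYPES = frozenset({"response.failed", "response.incomplete"})


class TerminalResponseError(RuntimeError):
    """Raised when a stream ends with a failed or incomplete response."""


def describe_terminal_event(event: Any) -> str:
    """Build an error message from the event type and any available details."""
    response = getattr(event, "response", None)
    parts = [f"event={event.type}"]
    for field in ("status", "error", "incomplete_details"):
        value = getattr(response, field, None)
        if value is not None:
            parts.append(f"{field}={value!r}")
    return "Responses stream ended with " + ", ".join(parts)


def raise_for_terminal_event(event: Any) -> None:
    """Raise instead of letting the payload become a final response."""
    if getattr(event, "type", None) in FAILED_TERMINAL_EVENT_TYPES:
        raise TerminalResponseError(describe_terminal_event(event))


# Usage with a stand-in event shaped like the one in the reproducer.
event = SimpleNamespace(
    type="response.failed",
    response=SimpleNamespace(status="failed", error=None, incomplete_details=None),
)
try:
    raise_for_terminal_event(event)
except TerminalResponseError as exc:
    print(exc)  # Responses stream ended with event=response.failed, status='failed'
```

Calling a check like this at each of the three places listed under "Relevant code" would keep the behavior consistent across the HTTP streaming, websocket, and runner paths. Where exactly it belongs is left open here.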