fix: Chat Completions streaming output indexes #3108

Aphroq wants to merge 7 commits into openai:main
Conversation
@codex review

Codex Review: Didn't find any major issues. Another round soon, please!
seratch left a comment
Thanks for the PR. I think this is directionally valuable: the fallback-only duplicate `output_index` bug is real, and the explicit integer helper for assistant message indexes also fixes the `True` vs `1` issue after reasoning output.
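For context, a minimal sketch of the bool-vs-int pitfall (variable names here are illustrative, not the SDK's):

```python
# Sketch of the True-vs-1 pitfall: a bool expression leaks into the
# event payload instead of a plain int.
has_reasoning = True  # a reasoning item already occupies output_index 0

output_index = has_reasoning      # bool, not int
assert output_index == 1          # passes: bool subclasses int...
print(repr(output_index))         # True -- json.dumps renders true, not 1

# Computing the index explicitly keeps the payload a plain int:
output_index = 1 if has_reasoning else 0
print(repr(output_index))         # 1
```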
Before merging, could you cover and fix one mixed case as well? If one tool call is first seen without enough data to enter the real-time streaming path, and a later tool call does enter the real-time streaming path, the stream event indexes can still disagree with the final response.output order.
A minimal shape is:
- tool call index 0 arrives without `id`, so it remains fallback-only
- tool call index 1 arrives with `id` and `name`, so it starts streaming immediately
- the completed response output order is `[fallback_first, streamed_second]`
With the current PR, the streamed call can emit `response.output_item.added` at `output_index=0`, while the fallback call is emitted later at `output_index=1`. That makes the event indexes map to `[streamed_second, fallback_first]`, which does not match the completed response output order.
Could you add a regression test for mixed fallback + streamed tool calls and adjust the index assignment so all tool calls use stable indexes that match the final output order?
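To make the shape concrete, the two tool-call deltas can be built with the openai Python package's chunk types (a sketch of the inputs only; how a test feeds them to the handler is left out):

```python
from openai.types.chat.chat_completion_chunk import (
    ChoiceDeltaToolCall,
    ChoiceDeltaToolCallFunction,
)

# Delta 1: arguments only, no id/name -- the adapter can only track this
# call via the fallback path.
fallback_only = ChoiceDeltaToolCall(
    index=0,
    function=ChoiceDeltaToolCallFunction(arguments='{"x": 1}'),
)

# Delta 2: id and name present -- qualifies for real-time streaming
# immediately.
streamed = ChoiceDeltaToolCall(
    index=1,
    id="call_2",
    type="function",
    function=ChoiceDeltaToolCallFunction(name="get_weather", arguments="{}"),
)
```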
Thanks for the detailed catch. Agreed; I'll add regression coverage for this path and update the implementation accordingly.
Force-pushed from ba02386 to 49142b5
Updated based on the feedback. I kept the fix focused on the Chat Completions streaming adapter index semantics: each function call now gets a stable `output_index` that matches the final response output order. Added/expanded regression coverage for:
- fallback-only tool calls
- mixed fallback + streamed tool calls
- text before tool calls
- reasoning before tool calls

Full local verification passed.
Thanks for the update. The mixed fallback + streamed tool call case from the previous review is fixed now, and the added coverage looks good for fallback-only, mixed tool calls, text-before-tools, and reasoning-before-tools. I found one remaining ordering regression before merging: when a tool call is first seen before any assistant text, and assistant text is streamed later, the new "assign the function call `output_index` when first tracked" approach can reserve the first output position for the tool call, even though the completed response places the assistant message first.

Minimal shape:
- a tool call delta arrives before any assistant text
- assistant text is streamed afterwards
- the completed response output order is `[message_first, tool_call_second]`

On the PR head, the function call keeps the `output_index` reserved when it was first tracked, so the event indexes disagree with the completed response output order. Could you add a regression test for tool-call-before-text and adjust the stable index assignment so late-created assistant messages can still occupy their final output position?
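An illustrative event trace for that shape (event names follow the Responses API; the indexes shown are the problematic ones, not the desired ones):

```python
# Hypothetical trace: the tool call reserves index 0 at first sight, so
# the late assistant message is pushed to index 1 even though the
# completed response puts the message first.
emitted = [
    ("response.output_item.added", 0, "function_call"),  # first-seen reservation
    ("response.output_item.added", 1, "message"),        # late assistant text
]
final_output = ["message", "function_call"]  # completed response order

# The events place the message at 1, but the completed response places
# it at 0: a mismatch.
event_positions = {kind: idx for _, idx, kind in emitted}
assert event_positions["message"] != final_output.index("message")
```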
Updated the PR to address the remaining tool-call-before-text regression. The fallback function-call path no longer keeps the absolute `output_index` reserved when the call is first tracked; indexes are now resolved against the final output layout, so a late-created assistant message can still occupy its final output position. Added regression coverage for tool-call-before-text ordering.
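One way to picture the stable assignment, assuming a final layout of optional reasoning item, then optional assistant message, then tool calls (the helper name and layout rule are illustrative, not the adapter's actual code):

```python
def resolve_output_index(
    kind: str, tool_position: int, has_reasoning: bool, has_message: bool
) -> int:
    """Map an item to its position in the assumed final output layout."""
    base = 1 if has_reasoning else 0
    if kind == "reasoning":
        return 0
    if kind == "message":
        return base
    # Tool calls follow the message (if any) in their final order.
    return base + (1 if has_message else 0) + tool_position
```

Under this rule, a tool call first seen before any text still resolves to a position after the message once the message is known to exist.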
I also checked adjacent ordering permutations. The remaining review case is covered in this PR; broader real-time-streamed-tool-call-before-late-text behavior appears to involve a separate tradeoff between preserving immediate argument streaming and delaying events until the final output ordering is known. I did not expand this PR into that behavior change.
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 11b0cedaba
Updated the PR to address the streamed-tool-before-late-text ordering issue from the latest review. The Chat Completions streaming adapter now defers function-call added/delta events when a streamed tool call is observed before the final assistant-message position is known. Once text/refusal appears, or once the stream ends, the buffered function-call events are flushed using the final output layout. This keeps the event `output_index` values aligned with the completed response output order. Added regression coverage for a streamed tool call observed before late assistant text.
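A minimal sketch of that defer-and-flush shape (the state fields and function names are simplified stand-ins for the adapter's internals):

```python
from dataclasses import dataclass, field

@dataclass
class PendingToolEvents:
    buffered: list = field(default_factory=list)
    layout_known: bool = False  # set once text/refusal appears or the stream ends

def observe_tool_event(state: PendingToolEvents, event) -> list:
    if not state.layout_known:
        state.buffered.append(event)  # hold until final positions are known
        return []
    return [event]

def flush(state: PendingToolEvents) -> list:
    state.layout_known = True
    flushed, state.buffered = state.buffered, []
    return flushed  # caller re-indexes these against the final layout
```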
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: abb916e82c
```python
if not (force or cls._can_emit_function_call_events(state)):
    return
```
Restore real-time streaming for pure tool-call turns
This guard prevents flushing function-call added/delta events until text/refusal exists, so the common Chat Completions turn that emits only `tool_calls` now buffers every argument delta until the stream ends via the final forced flush. That regresses the documented real-time function-call streaming behavior and delays consumers from seeing long tool arguments as they arrive.
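A toy model of the effect (the field and helper names are simplified stand-ins, not the SDK's actual state):

```python
class State:
    text_content = None     # never set in a tool-calls-only turn
    refusal_content = None

def can_emit_function_call_events(state: State) -> bool:
    return state.text_content is not None or state.refusal_content is not None

state, buffered, force = State(), [], False
for delta in ['{"city": ', '"Paris"}']:
    if not (force or can_emit_function_call_events(state)):
        buffered.append(delta)  # every argument delta is deferred
# Only the end-of-stream forced flush (force=True) emits anything:
print(buffered)  # ['{"city": ', '"Paris"}'] -- one burst, not real-time
```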
Summary
- fallback tool calls now receive distinct `output_index` values
- the assistant message `output_index` is computed explicitly instead of passing a bool expression into event constructors

Test plan
- `uv run pytest tests/models/test_openai_chatcompletions_stream.py -k 'fallback_tool_calls_use_distinct_output_indexes'`
- `uv run pytest tests/models/test_reasoning_content.py -k 'stream_response_yields_events_for_reasoning_content'`
- `bash .agents/skills/code-change-verification/scripts/run.sh`

Issue number
Closes #3104
Closes #3109
Checks
- `make lint` and `make format`