fix: Chat Completions streaming output indexes #3108

Aphroq wants to merge 7 commits into openai:main
Conversation
@codex review

Codex Review: Didn't find any major issues. Another round soon, please!
seratch left a comment
Thanks for the PR. I think this is directionally valuable: the fallback-only duplicate `output_index` bug is real, and the explicit integer helper for assistant message indexes also fixes the `True` vs `1` issue after reasoning output.
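For context, a minimal sketch of the bool-vs-int pitfall (variable names here are illustrative, not the SDK's):

```python
# Sketch of the True-vs-1 pitfall: a bool expression leaks into the
# event payload instead of a plain int.
has_reasoning = True  # a reasoning item already occupies output_index 0

output_index = has_reasoning      # bool, not int
assert output_index == 1          # passes: bool subclasses int...
print(repr(output_index))         # True -- json.dumps renders true, not 1

# Computing the index explicitly keeps the payload a plain int:
output_index = 1 if has_reasoning else 0
print(repr(output_index))         # 1
```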
Before merging, could you cover and fix one mixed case as well? If one tool call is first seen without enough data to enter the real-time streaming path, and a later tool call does enter the real-time streaming path, the stream event indexes can still disagree with the final response.output order.
A minimal shape is:
- tool call index 0 arrives without `id`, so it remains fallback-only
- tool call index 1 arrives with `id` and `name`, so it starts streaming immediately
- the completed response output order is `[fallback_first, streamed_second]`
With the current PR, the streamed call can emit `response.output_item.added` at `output_index=0`, while the fallback call is emitted later at `output_index=1`. That makes the event indexes map to `[streamed_second, fallback_first]`, which does not match the completed response output order.
Could you add a regression test for mixed fallback + streamed tool calls and adjust the index assignment so all tool calls use stable indexes that match the final output order?
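To make the shape concrete, the two tool-call deltas can be built with the openai Python package's chunk types (a sketch of the inputs only; how a test feeds them to the handler is left out):

```python
from openai.types.chat.chat_completion_chunk import (
    ChoiceDeltaToolCall,
    ChoiceDeltaToolCallFunction,
)

# Delta 1: arguments only, no id/name -- the adapter can only track this
# call via the fallback path.
fallback_only = ChoiceDeltaToolCall(
    index=0,
    function=ChoiceDeltaToolCallFunction(arguments='{"x": 1}'),
)

# Delta 2: id and name present -- qualifies for real-time streaming
# immediately.
streamed = ChoiceDeltaToolCall(
    index=1,
    id="call_2",
    type="function",
    function=ChoiceDeltaToolCallFunction(name="get_weather", arguments="{}"),
)
```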
Thanks for the detailed catch. Agreed; I'll add regression coverage for this path and update the implementation accordingly.
Force-pushed from ba02386 to 49142b5
Updated based on the feedback. I kept the fix focused on the Chat Completions streaming adapter index semantics: each function call now gets a stable `output_index` that matches the final response output order. Added/expanded regression coverage for:
- fallback-only tool calls
- mixed fallback + streamed tool calls
- text before tool calls
- reasoning before tool calls

Full local verification passed.
Thanks for the update. The mixed fallback + streamed tool call case from the previous review is fixed now, and the added coverage looks good for fallback-only, mixed tool calls, text-before-tools, and reasoning-before-tools. I found one remaining ordering regression before merging: when a tool call is first seen before any assistant text, and assistant text is streamed later, the new "assign the function call `output_index` when first tracked" approach can reserve the first output position for the tool call, even though the completed response places the assistant message first.

Minimal shape:
- a tool call delta arrives before any assistant text
- assistant text is streamed afterwards
- the completed response output order is `[message_first, tool_call_second]`

On the PR head, the function call keeps the `output_index` reserved when it was first tracked, so the event indexes disagree with the completed response output order. Could you add a regression test for tool-call-before-text and adjust the stable index assignment so late-created assistant messages can still occupy their final output position?
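An illustrative event trace for that shape (event names follow the Responses API; the indexes shown are the problematic ones, not the desired ones):

```python
# Hypothetical trace: the tool call reserves index 0 at first sight, so
# the late assistant message is pushed to index 1 even though the
# completed response puts the message first.
emitted = [
    ("response.output_item.added", 0, "function_call"),  # first-seen reservation
    ("response.output_item.added", 1, "message"),        # late assistant text
]
final_output = ["message", "function_call"]  # completed response order

# The events place the message at 1, but the completed response places
# it at 0: a mismatch.
event_positions = {kind: idx for _, idx, kind in emitted}
assert event_positions["message"] != final_output.index("message")
```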
Updated the PR to address the remaining tool-call-before-text regression. The fallback function-call path no longer keeps the absolute `output_index` reserved when the call is first tracked; indexes are now resolved against the final output layout, so a late-created assistant message can still occupy its final output position. Added regression coverage for tool-call-before-text ordering.
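One way to picture the stable assignment, assuming a final layout of optional reasoning item, then optional assistant message, then tool calls (the helper name and layout rule are illustrative, not the adapter's actual code):

```python
def resolve_output_index(
    kind: str, tool_position: int, has_reasoning: bool, has_message: bool
) -> int:
    """Map an item to its position in the assumed final output layout."""
    base = 1 if has_reasoning else 0
    if kind == "reasoning":
        return 0
    if kind == "message":
        return base
    # Tool calls follow the message (if any) in their final order.
    return base + (1 if has_message else 0) + tool_position
```

Under this rule, a tool call first seen before any text still resolves to a position after the message once the message is known to exist.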
I also checked adjacent ordering permutations. The remaining review case is covered in this PR; broader real-time-streamed-tool-call-before-late-text behavior appears to involve a separate tradeoff between preserving immediate argument streaming and delaying events until the final output ordering is known. I did not expand this PR into that behavior change.
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 11b0cedaba
Updated the PR to address the streamed-tool-before-late-text ordering issue from the latest review. The Chat Completions streaming adapter now defers function-call added/delta events when a streamed tool call is observed before the final assistant-message position is known. Once text/refusal appears, or once the stream ends, the buffered function-call events are flushed using the final output layout. This keeps the event `output_index` values aligned with the completed response output order. Added regression coverage for a streamed tool call observed before late assistant text.
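A minimal sketch of that defer-and-flush shape (the state fields and function names are simplified stand-ins for the adapter's internals):

```python
from dataclasses import dataclass, field

@dataclass
class PendingToolEvents:
    buffered: list = field(default_factory=list)
    layout_known: bool = False  # set once text/refusal appears or the stream ends

def observe_tool_event(state: PendingToolEvents, event) -> list:
    if not state.layout_known:
        state.buffered.append(event)  # hold until final positions are known
        return []
    return [event]

def flush(state: PendingToolEvents) -> list:
    state.layout_known = True
    flushed, state.buffered = state.buffered, []
    return flushed  # caller re-indexes these against the final layout
```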
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: abb916e82c
```python
if not (force or cls._can_emit_function_call_events(state)):
    return
```
Restore real-time streaming for pure tool-call turns
This guard prevents flushing function-call added/delta events until text/refusal exists, so the common Chat Completions turn that emits only `tool_calls` now buffers every argument delta until the stream ends via the final forced flush. That regresses the documented real-time function-call streaming behavior and delays consumers from seeing long tool arguments as they arrive.
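A toy model of the effect (the field and helper names are simplified stand-ins, not the SDK's actual state):

```python
class State:
    text_content = None     # never set in a tool-calls-only turn
    refusal_content = None

def can_emit_function_call_events(state: State) -> bool:
    return state.text_content is not None or state.refusal_content is not None

state, buffered, force = State(), [], False
for delta in ['{"city": ', '"Paris"}']:
    if not (force or can_emit_function_call_events(state)):
        buffered.append(delta)  # every argument delta is deferred
# Only the end-of-stream forced flush (force=True) emits anything:
print(buffered)  # ['{"city": ', '"Paris"}'] -- one burst, not real-time
```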
Summary
- fallback tool calls now receive distinct `output_index` values
- the assistant message `output_index` is computed explicitly instead of passing a bool expression into event constructors

Test plan
- `uv run pytest tests/models/test_openai_chatcompletions_stream.py -k 'fallback_tool_calls_use_distinct_output_indexes'`
- `uv run pytest tests/models/test_reasoning_content.py -k 'stream_response_yields_events_for_reasoning_content'`
- `bash .agents/skills/code-change-verification/scripts/run.sh`

Issue number
Closes #3104
Closes #3109
Checks
- `make lint` and `make format`