
fix(openai oauth): drop item_reference + stored-item id refs on OAuth path#2523

Open
yetone wants to merge 1 commit into Wei-Shaw:main from yetone:fix-oauth-item-reference-404


@yetone yetone commented May 16, 2026

Problem

Every continuation request (i.e. any /v1/responses request with function_call_output items, or item_reference items, in input) routed through the OAuth path is currently guaranteed to fail with HTTP 502, masking an upstream HTTP 404:

OpenAI upstream error 404 (account=N platform=openai type=oauth):
{
  "error": {
    "message": "Item with id 'fc_<...>' not found. Items are not persisted when `store` is set to false. Try again with `store` set to true, or remove this item from your input.",
    "type": "invalid_request_error",
    "param": "input"
  }
}

In production this manifests as a 100% failure rate for any client that uses tool-calling continuation through OAuth — no successful turn after the first one is ever possible.

Root cause

The OAuth path forces `store=false` on the upstream request, both in `applyCodexOAuthTransform` (`openai_codex_transform.go` around line 138) and again, as a belt-and-braces measure, in `normalizeOpenAIPassthroughOAuthBody` (`openai_gateway_service.go` around line 5828). With `store=false`, ChatGPT's internal Responses backend never persists any item, so any `item_reference` (and any other item carrying an `id` from a prior turn) is guaranteed to 404.

How we got into this state:

  • 70eaa45 ('fix(网关): 修复工具续链校验与存储策略', in English: "fix(gateway): fix tool-continuation validation and storage policy") tried to fix tool continuation by (a) forcing `store=true` on continuation turns and (b) preserving `item_reference` items + per-item `id` fields in the filtered input. The two halves are load-bearing for each other: preserving references only makes sense if `store=true` means upstream actually has those items.
  • 3663951 ('fix(网关): OAuth 请求强制 store=false', in English: "fix(gateway): force store=false on OAuth requests") reverted half (a) the next day after upstream returned 'Store must be set to false', restoring the unconditional `store=false` block. It did not revert half (b): `filterCodexInputWithOptions` was left with `PreserveReferences: needsToolContinuation`.

So every continuation request now goes upstream with `store=false` plus preserved `item_reference` items / `id` fields → guaranteed 404 → 502 to the client.
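The failing combination can be checked mechanically. The following is a minimal sketch, not gateway code: `storedItemRefs` is a hypothetical helper, and it assumes only the request fields named in this description (`store`, `input`, and per-item `id`).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// storedItemRefs is a hypothetical helper (not part of the gateway). It returns
// the item IDs in a /v1/responses request body that upstream could only resolve
// if items had been persisted. It reports nothing unless store is explicitly false.
func storedItemRefs(body []byte) ([]string, error) {
	var req struct {
		Store *bool           `json:"store"`
		Input json.RawMessage `json:"input"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, err
	}
	if req.Store == nil || *req.Store {
		return nil, nil
	}
	// input may be a plain string; in that case there are no items to inspect.
	var items []map[string]any
	if err := json.Unmarshal(req.Input, &items); err != nil {
		return nil, nil
	}
	var refs []string
	for _, item := range items {
		// item_reference items and items copied from a prior turn both carry "id".
		if id, ok := item["id"].(string); ok && id != "" {
			refs = append(refs, id)
		}
	}
	return refs, nil
}

func main() {
	body := []byte(`{"store":false,"input":[
		{"type":"item_reference","id":"fc_123"},
		{"type":"function_call_output","call_id":"fc_123","output":"ok"}]}`)
	refs, err := storedItemRefs(body)
	fmt.Println(refs, err)
}
```

Any non-empty result for a body with `store=false` corresponds to the request shape that produces the 404 described above.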

Fix

Pass `PreserveReferences: false` unconditionally on the OAuth path. The function_call + matching function_call_output items inlined in the same request body already carry the full continuation context — that's the standard Responses-API continuation shape and upstream accepts it without any stored-item references.
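For illustration, the inlined continuation shape can be sketched as below. The type and function names (`inputItem`, `continuationItems`, `inlineContinuation`) are hypothetical, the example values are made up, and other request fields such as the model are omitted; only the item types and the `call_id` pairing come from this description.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// inputItem models only the fields needed for this sketch.
type inputItem struct {
	Type      string `json:"type"`
	CallID    string `json:"call_id,omitempty"`
	Name      string `json:"name,omitempty"`
	Arguments string `json:"arguments,omitempty"`
	Output    string `json:"output,omitempty"`
}

// continuationItems returns a function_call and its matching
// function_call_output, linked by call_id and carrying no stored-item id.
func continuationItems(callID, name, args, output string) []inputItem {
	return []inputItem{
		{Type: "function_call", CallID: callID, Name: name, Arguments: args},
		{Type: "function_call_output", CallID: callID, Output: output},
	}
}

// inlineContinuation builds a request body with store=false whose input
// carries the whole tool round trip inline.
func inlineContinuation(callID, name, args, output string) ([]byte, error) {
	return json.Marshal(map[string]any{
		"store": false,
		"input": continuationItems(callID, name, args, output),
	})
}

func main() {
	body, err := inlineContinuation("fc_1", "get_weather", `{"city":"Paris"}`, "sunny")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Because both halves of the tool call travel in the same body, nothing in it depends on upstream having persisted an earlier turn.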

Drop the now-vestigial `needsToolContinuation` local (it was only read by the line we're changing).

Update the three tests that had been written to lock in 70eaa45's intermediate behavior (`TestApplyCodexOAuthTransform_ToolContinuationPreservesInput`, `...PreservesNativeMessageAndReasoningIDs`, `...NormalizesToolReferenceIDsOnly`) so they assert the corrected behavior:

  • `item_reference` items dropped on the OAuth path
  • stored-item `id` fields stripped from surviving items
  • `function_call_output` items kept; `call_id` normalized `call_` → `fc_` (unchanged)
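The contract in the list above can be sketched as a standalone filter. This is a hypothetical stand-in, not the real `filterCodexInputWithOptions`: `filterOAuthInput` is an invented name, and treating the `call_` → `fc_` normalization as a plain prefix swap applied to every item with a `call_id` is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// filterOAuthInput is a hypothetical illustration of the asserted behavior:
// drop item_reference items, strip "id" from surviving items, and normalize
// call_id prefixes from "call_" to "fc_" (assumed to be a simple prefix swap).
func filterOAuthInput(items []map[string]any) []map[string]any {
	out := make([]map[string]any, 0, len(items))
	for _, item := range items {
		if t, _ := item["type"].(string); t == "item_reference" {
			continue // upstream cannot resolve references when store=false
		}
		// Copy the item so the caller's map is not mutated.
		clean := make(map[string]any, len(item))
		for k, v := range item {
			if k == "id" {
				continue // stored-item id from a prior turn
			}
			clean[k] = v
		}
		if callID, ok := clean["call_id"].(string); ok && strings.HasPrefix(callID, "call_") {
			clean["call_id"] = "fc_" + strings.TrimPrefix(callID, "call_")
		}
		out = append(out, clean)
	}
	return out
}

func main() {
	in := []map[string]any{
		{"type": "item_reference", "id": "rs_1"},
		{"type": "function_call", "id": "fc_9", "call_id": "call_abc", "name": "f", "arguments": "{}"},
		{"type": "function_call_output", "call_id": "call_abc", "output": "ok"},
	}
	for _, item := range filterOAuthInput(in) {
		fmt.Println(item)
	}
}
```

Normalizing `call_id` on both the call and its output keeps the pair matched after filtering.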

Verification (production)

Deployed this patch on top of `cumora-prod` in our fork and rolled it to GKE. Sampled `/v1/responses` status codes on the OAuth path before and after a probe wake of 4 agent pods:

| Status | Before fix | After fix (2 min sample) |
| --- | --- | --- |
| 200 OK | 0 | 18 |
| 502 (this bug: 'Items are not persisted when store is set to false') | high | 0 |
| 502 (unrelated 'No tool call found …', a separate client-side issue) | masked | 6 |

The original failure mode is fully eliminated. The remaining 502s carry a different upstream message and trace to a separate client-side bug unrelated to this PR.

Test plan

  • `go build ./...` succeeds
  • `go vet` clean
  • Three updated tests pass and now express the correct contract (item_reference dropped, stored-item id stripped, function_call_output preserved with call_id normalized)
  • Verified end-to-end against a live ChatGPT OAuth account through `normalizeOpenAIPassthroughOAuthBody` → `/v1/responses` upstream

… path

ChatGPT's internal /v1/responses backend rejects any item_reference (or
any other item carrying an `id` from a prior turn) with HTTP 404:

  "Item with id '<fc_*|rs_*|...>' not found. Items are not persisted
   when `store` is set to false. Try again with `store` set to true,
   or remove this item from your input."

We force `store=false` on OAuth (line ~138, and again in
normalizeOpenAIPassthroughOAuthBody — line ~5828), so by construction
upstream never has any persisted item to reference. Every continuation
request was therefore guaranteed to 404.

How we got here: 70eaa45 ("修复工具续链校验与存储策略", in English "fix
tool-continuation validation and storage policy") tried to fix
continuation by forcing store=true on continuation turns AND preserving
item_reference items in the input. 3663951 reverted the store=true half
the next day after upstream returned "Store must be set to false", but
left the preserve-references half in place. The two halves were
load-bearing for each other; with only one of them, every continuation
request 404s.

Fix: pass PreserveReferences=false unconditionally on the OAuth path.
The function_call + matching function_call_output items inlined in the
same request body already carry the full continuation context — that's
the standard Responses-API continuation shape and upstream accepts it
fine without any stored-item references.

Drop the now-vestigial `needsToolContinuation` variable (it was only
read by the line we're changing). Update the three tests that locked
in the broken intermediate behavior.