Fix xAI Responses input normalization for replayed histories #3734

Open
GGBond6 wants to merge 2 commits into router-for-me:dev from GGBond6:fix/xai-responses-input-normalization

Conversation

GGBond6 commented Jun 5, 2026

Summary

  • Normalize replayed custom tool call input items to xAI-compatible function call items
  • Drop replayed web search trace items that xAI does not accept as ModelInput
  • Remove replayed reasoning encrypted_content before forwarding requests to xAI

Test

  • go test -count=1 ./internal/runtime/executor
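
As an illustration of the third Summary item, here is a minimal sketch of stripping `encrypted_content` from replayed reasoning items. It is not the PR's code: it uses only `encoding/json` so it stands alone, and the names `stripReasoningEncryptedContent` and `hasItemField` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stripReasoningEncryptedContent deletes the "encrypted_content" field from
// every item of type "reasoning" in the request's "input" array.
// Hypothetical helper; the PR's real function may behave differently.
func stripReasoningEncryptedContent(body []byte) ([]byte, error) {
	var req map[string]json.RawMessage
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, err
	}
	var items []json.RawMessage
	if err := json.Unmarshal(req["input"], &items); err != nil {
		// "input" is missing or not an array (for example a plain string).
		return body, nil
	}
	changed := false
	for i, raw := range items {
		var item map[string]json.RawMessage
		if err := json.Unmarshal(raw, &item); err != nil {
			continue // not a JSON object; keep the item as-is
		}
		var itemType string
		if err := json.Unmarshal(item["type"], &itemType); err != nil || itemType != "reasoning" {
			continue
		}
		if _, ok := item["encrypted_content"]; !ok {
			continue
		}
		delete(item, "encrypted_content")
		rewritten, err := json.Marshal(item)
		if err != nil {
			return nil, err
		}
		items[i] = rewritten
		changed = true
	}
	if !changed {
		return body, nil
	}
	newInput, err := json.Marshal(items)
	if err != nil {
		return nil, err
	}
	req["input"] = newInput
	return json.Marshal(req)
}

// hasItemField reports whether input[index] is an object containing field.
func hasItemField(body []byte, index int, field string) bool {
	var req struct {
		Input []map[string]json.RawMessage `json:"input"`
	}
	if err := json.Unmarshal(body, &req); err != nil || index < 0 || index >= len(req.Input) {
		return false
	}
	_, ok := req.Input[index][field]
	return ok
}

func main() {
	body := []byte(`{"input":[{"type":"reasoning","summary":[],"encrypted_content":"opaque"},{"type":"message","role":"user","content":"hi"}]}`)
	out, err := stripReasoningEncryptedContent(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	fmt.Println(hasItemField(out, 0, "encrypted_content"))
}
```

Only items whose `type` is `reasoning` are touched, so an unrelated item that happens to carry a field of the same name passes through unchanged.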

github-actions Bot changed the base branch from main to dev June 5, 2026 18:01

github-actions Bot commented Jun 5, 2026

This pull request targeted main.

The base branch has been automatically changed to dev.

GGBond6 changed the base branch from dev to main June 5, 2026 18:02

github-actions Bot commented Jun 5, 2026

This pull request targeted main.

The base branch has been automatically changed to dev.

github-actions Bot changed the base branch from main to dev June 5, 2026 18:02

gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request introduces input normalization and filtering for the xAI executor, including converting custom tool calls to standard function calls, dropping web search calls, and removing encrypted content from reasoning items. The reviewer noted that dropping 'web_search_call' items without also dropping their corresponding 'web_search_call_output' items will lead to validation errors, and suggested updating both the implementation and the tests to handle this case.

Comment on lines +858 to +864
	for _, item := range input.Array() {
		if item.Get("type").String() == "web_search_call" {
			changed = true
			continue
		}
		items = append(items, json.RawMessage(item.Raw))
	}

high

When dropping web_search_call items from the replayed history, any corresponding web_search_call_output items must also be dropped. Leaving orphaned tool outputs in the history will cause xAI (and other OpenAI-compatible APIs) to reject the request with a validation error (e.g., unmatched tool response).

	for _, item := range input.Array() {
		itemType := item.Get(
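
The suggested change is cut off in this capture. The following is a minimal sketch of the idea the comment describes, not the reviewer's or the PR's code. It uses `encoding/json` instead of gjson so it stands alone, and the names `dropWebSearchItems` and `inputTypes` are hypothetical. Whether `web_search_call_output` items actually occur in replayed histories is the reviewer's premise and is not verified here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dropWebSearchItems removes "web_search_call" items and, following the
// review comment, "web_search_call_output" items from the "input" array.
// Hypothetical helper written for illustration only.
func dropWebSearchItems(body []byte) ([]byte, error) {
	var req map[string]json.RawMessage
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, err
	}
	var items []json.RawMessage
	if err := json.Unmarshal(req["input"], &items); err != nil {
		// "input" is missing or not an array: nothing to filter.
		return body, nil
	}
	kept := make([]json.RawMessage, 0, len(items))
	for _, item := range items {
		var head struct {
			Type string `json:"type"`
		}
		// Non-object items fail to decode, keep an empty type, and are retained.
		_ = json.Unmarshal(item, &head)
		if head.Type == "web_search_call" || head.Type == "web_search_call_output" {
			continue
		}
		kept = append(kept, item)
	}
	if len(kept) == len(items) {
		return body, nil // nothing dropped; return the original bytes
	}
	newInput, err := json.Marshal(kept)
	if err != nil {
		return nil, err
	}
	req["input"] = newInput
	return json.Marshal(req)
}

// inputTypes lists the "type" of each item in "input", in order.
func inputTypes(body []byte) []string {
	var req struct {
		Input []struct {
			Type string `json:"type"`
		} `json:"input"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return nil
	}
	types := make([]string, 0, len(req.Input))
	for _, item := range req.Input {
		types = append(types, item.Type)
	}
	return types
}

func main() {
	body := []byte(`{"input":[{"type":"message","role":"user","content":"hi"},{"type":"web_search_call","status":"completed"},{"type":"web_search_call_output","output":"results"},{"type":"function_call","call_id":"call_1","name":"lookup","arguments":"{}"}]}`)
	out, err := dropWebSearchItems(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(inputTypes(out))
}
```

Filtering both types in one pass keeps the relative order of the remaining items, which matters because each `function_call_output` still has to follow its `function_call`.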

Comment on lines +675 to +696
func TestDropXAIInputWebSearchCalls(t *testing.T) {
	body := []byte(`{"input":[{"type":"message","role":"user","content":"hi"},{"type":"web_search_call","status":"completed","action":{"type":"search","query":"test","queries":["test"]}},{"type":"function_call","call_id":"call_1","name":"lookup","arguments":"{}"},{"type":"function_call_output","call_id":"call_1","output":"ok"}]}`)
	out := dropXAIInputWebSearchCalls(body)

	if got := len(gjson.GetBytes(out, "input").Array()); got != 3 {
		t.Fatalf("input length = %d, want 3: %s", got, string(out))
	}
	if got := gjson.GetBytes(out, "input.0.type").String(); got != "message" {
		t.Fatalf("input.0.type = %q, want message: %s", got, string(out))
	}
	if got := gjson.GetBytes(out, "input.1.type").String(); got != "function_call" {
		t.Fatalf("input.1.type = %q, want function_call: %s", got, string(out))
	}
	if got := gjson.GetBytes(out, "input.2.type").String(); got != "function_call_output" {
		t.Fatalf("input.2.type = %q, want function_call_output: %s", got, string(out))
	}
	for _, item := range gjson.GetBytes(out, "input").Array() {
		if item.Get("type").String() == "web_search_call" {
			t.Fatalf("web_search_call should be dropped before sending to xAI: %s", string(out))
		}
	}
}

medium

Update the test to include a web_search_call_output item and verify that both the call and its output are correctly dropped.

func TestDropXAIInputWebSearchCalls(t *testing.T) {
	body := []byte(`{"input":[{"type":"message","role":"user","content":"hi"},{"type":"web_search_call","status":"completed","action":{"type":"search","query":"test","queries":["test"]}},{"type":"web_search_call_output","output":"results"},{"type":"function_call","call_id":"call_1","name":"lookup","arguments":"{}"},{"type":"function_call_output","call_id":"call_1","output":"ok"}]}`)
	out := dropXAIInputWebSearchCalls(body)

	if got := len(gjson.GetBytes(out, "input").Array()); got != 3 {
		t.Fatalf("input length = %d, want 3: %s", got, string(out))
	}
	if got := gjson.GetBytes(out, "input.0.type").String(); got != "message" {
		t.Fatalf("input.0.type = %q, want message: %s", got, string(out))
	}
	if got := gjson.GetBytes(out, "input.1.type").String(); got != "function_call" {
		t.Fatalf("input.1.type = %q, want function_call: %s", got, string(out))
	}
	if got := gjson.GetBytes(out, "input.2.type").String(); got != "function_call_output" {
		t.Fatalf("input.2.type = %q, want function_call_output: %s", got, string(out))
	}
	for _, item := range gjson.GetBytes(out, "input").Array() {
		itemType := item.Get("type").String()
		if itemType == "web_search_call" || itemType == "web_search_call_output" {
			t.Fatalf("%s should be dropped before sending to xAI: %s", itemType, string(out))
		}
	}
}

chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 63a3bf3b69

body = dropXAIInputWebSearchCalls(body)
body = normalizeXAIInputReasoningItems(body)
body = normalizeCodexInstructions(body)
body = appendXAICodexProgressInstructions(body, opts)

P2: Gate Codex progress instructions to Codex requests

This call runs for every xAI Responses request prepared by Execute, ExecuteStream, and CountTokens, including ordinary OpenAI-compatible Grok chats where opts.SourceFormat is not Codex-specific. In that scenario a user’s normal request gets extra Codex coding progress/update requirements injected into instructions, which can change visible model behavior and token accounting; this should be gated on the Codex workflow/request metadata rather than applied unconditionally.
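
One way to read this suggestion is as a simple guard on the request's source format. The sketch below is an assumption-heavy illustration: `requestOptions`, the `"codex"` format value, and the placeholder progress text are all hypothetical and are not taken from the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// requestOptions is a hypothetical stand-in for the executor's options type;
// only the SourceFormat field named in the review comment is modeled.
type requestOptions struct {
	SourceFormat string
}

// assumedCodexSourceFormat is an assumed identifier for Codex-originated
// requests; the real value used by the project is not shown in this thread.
const assumedCodexSourceFormat = "codex"

// progressNote is placeholder text standing in for the real instructions.
const progressNote = "Report progress after each step."

// appendProgressIfCodex appends the progress note only for Codex requests and
// leaves every other request's instructions untouched.
func appendProgressIfCodex(instructions string, opts requestOptions) string {
	if opts.SourceFormat != assumedCodexSourceFormat {
		return instructions
	}
	if strings.Contains(instructions, progressNote) {
		return instructions // already present; avoid appending twice
	}
	if instructions == "" {
		return progressNote
	}
	return instructions + "\n\n" + progressNote
}

func main() {
	fmt.Printf("%q\n", appendProgressIfCodex("Be brief.", requestOptions{SourceFormat: "openai-response"}))
	fmt.Printf("%q\n", appendProgressIfCodex("Be brief.", requestOptions{SourceFormat: assumedCodexSourceFormat}))
}
```

With a guard like this, ordinary Grok chats keep their original instructions, so visible model behavior and token accounting do not change for them.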

GGBond6 force-pushed the fix/xai-responses-input-normalization branch from 63a3bf3 to e17fe95 June 5, 2026 19:51

I verified the replayed-history xAI cases in this PR against the v7.1.50 release binary and they still reproduce.

Environment:

  • Release binary: v7.1.50 (4f55ecca, linux amd64)
  • Isolated local process on a non-default localhost port, using copied xAI OAuth auth entries
  • Model route: grok-default/grok-build-0.1
  • Endpoint: POST /v1/responses, stream: false

Results on v7.1.50:

custom_tool_call input item -> HTTP 422
{"error":"Failed to deserialize the JSON body into the target type: data did not match any variant of untagged enum ModelInput"}

web_search_call input item -> HTTP 422
{"error":"Failed to deserialize the JSON body into the target type: data did not match any variant of untagged enum ModelInput"}

custom_tool_call_output input item -> HTTP 200
empty tools + tool_choice auto -> HTTP 200
simple one-shot input -> HTTP 200

I also checked the same custom_tool_call and web_search_call sanitized payloads on v7.1.33; they fail there too, so this is not unique to v7.1.50, but it is still present in the latest release. The patch here appears to address the exact gap: v7.1.50 only runs normalizeXAITools, normalizeXAIToolChoiceForTools, and normalizeXAIInputReasoningItems, while this PR adds normalizeXAIInputCustomToolCalls and dropXAIInputWebSearchCalls before forwarding to xAI.

No token or auth values are included here.
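
For readers unfamiliar with the failing case, the sketch below shows one plausible way a replayed `custom_tool_call` item could be rewritten into a `function_call` item before forwarding. The field mapping is an assumption: the item's free-form `input` string is wrapped as `{"input": ...}` and stored in `arguments`. The PR's normalizeXAIInputCustomToolCalls may map fields differently, and the helper names here are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// normalizeCustomToolCalls rewrites items of type "custom_tool_call" into
// "function_call" items. Assumed mapping: the string "input" field becomes
// the JSON-encoded arguments object {"input": "<original input>"}.
func normalizeCustomToolCalls(body []byte) ([]byte, error) {
	var req map[string]json.RawMessage
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, err
	}
	var items []json.RawMessage
	if err := json.Unmarshal(req["input"], &items); err != nil {
		return body, nil // "input" is missing or not an array
	}
	changed := false
	for i, raw := range items {
		var item map[string]json.RawMessage
		if err := json.Unmarshal(raw, &item); err != nil {
			continue // not a JSON object; keep the item as-is
		}
		var itemType string
		if err := json.Unmarshal(item["type"], &itemType); err != nil || itemType != "custom_tool_call" {
			continue
		}
		var input string
		_ = json.Unmarshal(item["input"], &input) // missing input stays ""
		args, err := json.Marshal(map[string]string{"input": input})
		if err != nil {
			return nil, err
		}
		// "arguments" is itself a JSON string, so encode the object once more.
		argsField, err := json.Marshal(string(args))
		if err != nil {
			return nil, err
		}
		item["type"] = json.RawMessage(`"function_call"`)
		item["arguments"] = argsField
		delete(item, "input")
		rewritten, err := json.Marshal(item)
		if err != nil {
			return nil, err
		}
		items[i] = rewritten
		changed = true
	}
	if !changed {
		return body, nil
	}
	newInput, err := json.Marshal(items)
	if err != nil {
		return nil, err
	}
	req["input"] = newInput
	return json.Marshal(req)
}

// itemStringField returns input[index][field] when it is a JSON string.
func itemStringField(body []byte, index int, field string) string {
	var req struct {
		Input []map[string]json.RawMessage `json:"input"`
	}
	if err := json.Unmarshal(body, &req); err != nil || index < 0 || index >= len(req.Input) {
		return ""
	}
	var value string
	_ = json.Unmarshal(req.Input[index][field], &value)
	return value
}

func main() {
	body := []byte(`{"input":[{"type":"custom_tool_call","call_id":"call_1","name":"shell","input":"ls -la"}]}`)
	out, err := normalizeCustomToolCalls(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Keeping `call_id` and `name` unchanged means a later output item that references the same `call_id` still pairs with the rewritten call.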

I built and tested this together with PR #3726 on top of the v7.1.50 release commit.

Build under test:

Focused tests passed:

  • go test -count=1 ./sdk/cliproxy/auth ./internal/runtime/executor

Live xAI/Grok Build verification against the combined build:

/v1/models contains:
  grok-default/grok-build-0.1
  grok-green/grok-build-0.1
  grok-yellow/grok-build-0.1
  go-ds-flash

simple /v1/responses:
  grok-default/grok-build-0.1 -> 200, expected marker returned
  grok-green/grok-build-0.1 -> 200, expected marker returned
  grok-yellow/grok-build-0.1 -> 200, expected marker returned

previously failing replayed-history cases:
  custom_tool_call input item -> 200, expected marker returned
  web_search_call input item -> 200, expected marker returned

control cases:
  custom_tool_call_output input item -> 200, expected marker returned
  empty tools + tool_choice auto -> 200, expected marker returned
  grok-default streaming /v1/responses -> 200
  non-Grok go-ds-flash /v1/chat/completions -> 200, expected marker returned

This confirms #3734 fixes the deterministic xAI 422 failures I saw on both v7.1.33 and v7.1.50 for replayed custom_tool_call and web_search_call history items.

No token or auth values are included here.
