Skip to content

fix: preserve images in Claude tool_result when converting to OpenAI chat completions#4873

Open
playeriv65 wants to merge 3 commits into
QuantumNous:mainfrom
playeriv65:feat/tool-result-image-fix
Open

fix: preserve images in Claude tool_result when converting to OpenAI chat completions#4873
playeriv65 wants to merge 3 commits into
QuantumNous:mainfrom
playeriv65:feat/tool-result-image-fix

Conversation

@playeriv65
Copy link
Copy Markdown

@playeriv65 playeriv65 commented May 15, 2026

变更描述

修复 Claude tool_result 中的图片在转换为 OpenAI Chat Completions 格式时丢失的问题。

根因:service/convert.go 的 ClaudeToOpenAIRequest 处理 tool_result 时,把非字符串内容直接 Marshal 成 JSON 字符串塞进 tool message 的 content 字段。上游收到的 tool 消息里 content 是一段 JSON 文本,模型拿不到图片输入,只能靠文本猜测,产生幻觉。

修复:按 OpenAI Chat Completions 规范拆分:

  • tool_result 的文本内容 -> role:tool,字符串 content
  • tool_result 的图片内容 -> 拆成 role:tool(文本说明)+ role:user(image_url content part)

OpenAI 规范里 role:tool 只支持 text content part,图片必须放在 user message 中。

变更类型

  • Bug 修复

提交前检查项

  • 确认:已亲自整理描述,未直接粘贴 AI 输出
  • 已搜索 Issues 和 PRs,确认非重复提交
  • Bug fix 说明:已通过 curl 实验和端到端复现验证
  • 变更范围仅限 service/convert.go 的 tool_result 分支
  • 无无关代码改动
  • 本地测试 8 个用例全部通过,curl 最小复现和 Claude Code 端到端均验证通过
  • 代码中无凭据泄露

运行证明

修复前 dump(outbound JSON):

  • role=tool, content=[{"type":"image","source":{"type":"base64",...}}]
  • 模型回答:\u0022像素风格樱桃/水果图标\u0022

修复后 dump:

  • role=tool, content=\u0022[Tool result contained image content...]\u0022
  • role=user, type=image_url, len=24234
  • 模型回答:\u0022Windows Recycle Bin icon\u0022

测试命令:go test ./service -run TestClaudeToOpenAIRequest_ToolResultImages -v

Summary by CodeRabbit

  • New Features

    • Improved conversion of tool-result content: non-text media are parsed so text remains in tool messages while images are emitted as separate follow-up user messages.
    • Supports both direct image URLs and base64 image data (converted to data: URIs), preserves text alongside images, and skips empty image data.
    • Unknown block types are serialized into tool message content.
  • Tests

    • Added comprehensive tests covering mixed text/image tool results, base64 and URL images, and edge cases.

Review Change Stack

…chat completions

Previously, when converting Claude tool_result content blocks to OpenAI
format, non-string content was marshaled into a JSON string. This caused
image blocks returned by tools to be lost as vision input, since the
image data was treated as plain text.

This change converts tool_result text blocks to role:tool string content,
and image blocks to role:user content[].image_url, matching OpenAI
Chat Completions schema. Images must be in user messages per the spec,
not in tool messages.
…chat completions

Previously, when converting Claude tool_result content blocks to OpenAI
format, non-string content was marshaled into a JSON string. This caused
image blocks returned by tools to be lost as vision input, since the
image data was treated as plain text.

This change converts tool_result text blocks to role:tool string content,
and image blocks to role:user content[].image_url, matching OpenAI
Chat Completions schema. Images must be in user messages per the spec,
not in tool messages.

Tests added (8 cases):
- pure string tool_result (unchanged)
- text block tool_result -> tool text
- image block tool_result -> tool text notice + user image_url
- mixed text + image -> both preserved
- multiple images in single tool_result
- image with URL source instead of base64 data
- empty data image is skipped
- unknown block type serialized as text
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 15, 2026

Walkthrough

ClaudeToOpenAIRequest now handles Claude tool_result messages containing images by parsing media blocks, extracting text and image content, creating a tool message with text content and an optional user message with image media when images are present.

Changes

Claude tool_result to OpenAI conversion with image handling

Layer / File(s) Summary
Tool message initialization and media parsing
service/convert.go
Tool message is initialized by resolving the tool name via SearchToolNameByToolCallId, then non-string tool results are parsed to extract textual content (including unknown block types serialized to JSON) and image blocks (converted to OpenAI media with URL or base64 data: URI).
Image content extraction and user message creation
service/convert.go
When image content is detected in the parsed media, an additional OpenAI "user" message is appended containing the image media content, prefixed with a notice indicating the tool returned images.
Test coverage for tool result image conversion
service/convert_test.go
TestClaudeToOpenAIRequest_ToolResultImages validates conversion across variants: text-only, text+image, image-only, multiple images, base64 vs. URL images, empty base64 skipping, and unknown block serialization.

Sequence Diagram

sequenceDiagram
  participant Claude as Claude tool_result
  participant Parser as Media Parser
  participant TextExtract as Text Extractor
  participant ImageExtract as Image Extractor
  participant ToolMsg as OpenAI tool message
  participant UserMsg as OpenAI user message
  
  Claude->>Parser: tool_result with media blocks
  Parser->>TextExtract: parse media content for text blocks
  TextExtract->>ToolMsg: provide extracted text (+ image notice if images present)
  Parser->>ImageExtract: parse image blocks (url/base64)
  ImageExtract->>ImageExtract: convert base64 to data: URI (default image/png if missing)
  ImageExtract->>UserMsg: emit image_url media blocks with notice referencing tool call id
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • QuantumNous/new-api#1531: Both PRs modify service/convert.go's ClaudeToOpenAIRequest handling of Claude tool_result messages, extending tool name resolution and media conversion logic.

Suggested reviewers

  • seefs001

Poem

🐰 A rabbit hops through images bright,
Converting Claude's tools to OpenAI's light,
With media parsed and images spread,
User messages sent where images tread,
Tests confirm each byte and thread.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main change: preserving images in Claude tool_result when converting to OpenAI format, which is the core fix described in the PR objectives.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
service/convert_test.go (1)

250-267: ⚡ Quick win

Consider using a named constant for ChannelType.

The magic number 1 at line 262 reduces code clarity. If a named constant exists (e.g., ChannelTypeOpenAI), use it to make the test setup more self-documenting.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@service/convert_test.go` around lines 250 - 267, Replace the magic number
used for ChannelMeta.ChannelType in the test setup with the appropriate named
constant (e.g., ChannelTypeOpenAI) to improve clarity: locate the test where
RelayInfo and ChannelMeta are constructed (the variable info with
&common.RelayInfo{ ChannelMeta: &common.ChannelMeta{ ChannelType: 1 }} ) and
change ChannelType to the existing constant defined in your codebase (or add one
if missing) so ClaudeToOpenAIRequest receives a self-documenting channel type.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@service/convert_test.go`:
- Around line 295-311: The test currently only captures the last image URL
because the loop in the assertion overwrites foundURL; update the test to either
(A) validate all images by changing the test case field tc.expectImageURL ->
tc.expectImageURLs []string, collect every image URL from
imgUserMsg.ParseContent() into a slice (checking block.Type ==
dto.ContentTypeImageURL and extracting from *dto.MessageImageUrl or
map[string]any), then assert the collected slice contains exactly (or at least)
the expected URLs for the "multiple images in tool_result" case, or (B) if only
the last image is intended, add a clear comment and change the test
name/expectation to reflect that only the final image is preserved; adjust
assertions accordingly around imgUserMsg.ParseContent() and tc fields.

---

Nitpick comments:
In `@service/convert_test.go`:
- Around line 250-267: Replace the magic number used for ChannelMeta.ChannelType
in the test setup with the appropriate named constant (e.g., ChannelTypeOpenAI)
to improve clarity: locate the test where RelayInfo and ChannelMeta are
constructed (the variable info with &common.RelayInfo{ ChannelMeta:
&common.ChannelMeta{ ChannelType: 1 }} ) and change ChannelType to the existing
constant defined in your codebase (or add one if missing) so
ClaudeToOpenAIRequest receives a self-documenting channel type.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7b89c640-f36d-4be3-8755-dfbe25c92939

📥 Commits

Reviewing files that changed from the base of the PR and between 8789891 and a23b352.

📒 Files selected for processing (2)
  • service/convert.go
  • service/convert_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • service/convert.go

Comment thread service/convert_test.go
Comment on lines +295 to +311
if tc.expectUserImageMsg {
require.NotNil(t, imgUserMsg, "user message with image_url should exist")
content := imgUserMsg.ParseContent()
var foundURL string
for _, block := range content {
if block.Type == dto.ContentTypeImageURL && block.ImageUrl != nil {
if url, ok := block.ImageUrl.(*dto.MessageImageUrl); ok {
foundURL = url.Url
} else if urlMap, ok := block.ImageUrl.(map[string]any); ok {
foundURL, _ = urlMap["url"].(string)
}
}
}
require.Equal(t, tc.expectImageURL, foundURL)
} else {
require.Nil(t, imgUserMsg, "user message with image_url should not exist for text-only")
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The multiple-images test case only validates the last image URL.

The assertion loop (lines 299-307) overwrites foundURL on each iteration, so the test case "multiple images in tool_result" (line 126) only verifies that the last image (image2) is present. If the conversion should preserve all images in the user message, this test would not detect if earlier images were dropped.

Consider iterating through all expected images and asserting that each is present in the user message content, or add a comment clarifying that only the last image is expected by design.

🔍 Suggested enhancement to validate all images

If all images should be preserved, replace the single expectImageURL field with expectImageURLs []string in the test case struct and update the assertion logic:

 if tc.expectUserImageMsg {
     require.NotNil(t, imgUserMsg, "user message with image_url should exist")
     content := imgUserMsg.ParseContent()
-    var foundURL string
+    var foundURLs []string
     for _, block := range content {
         if block.Type == dto.ContentTypeImageURL && block.ImageUrl != nil {
             if url, ok := block.ImageUrl.(*dto.MessageImageUrl); ok {
-                foundURL = url.Url
+                foundURLs = append(foundURLs, url.Url)
             } else if urlMap, ok := block.ImageUrl.(map[string]any); ok {
-                foundURL, _ = urlMap["url"].(string)
+                if url, ok := urlMap["url"].(string); ok {
+                    foundURLs = append(foundURLs, url)
+                }
             }
         }
     }
-    require.Equal(t, tc.expectImageURL, foundURL)
+    require.ElementsMatch(t, tc.expectImageURLs, foundURLs)
 } else {

Then update the "multiple images" test case (line 161):

-expectImageURL:     "data:image/jpeg;base64,image2",
+expectImageURLs:    []string{"data:image/png;base64,image1", "data:image/jpeg;base64,image2"},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@service/convert_test.go` around lines 295 - 311, The test currently only
captures the last image URL because the loop in the assertion overwrites
foundURL; update the test to either (A) validate all images by changing the test
case field tc.expectImageURL -> tc.expectImageURLs []string, collect every image
URL from imgUserMsg.ParseContent() into a slice (checking block.Type ==
dto.ContentTypeImageURL and extracting from *dto.MessageImageUrl or
map[string]any), then assert the collected slice contains exactly (or at least)
the expected URLs for the "multiple images in tool_result" case, or (B) if only
the last image is intended, add a clear comment and change the test
name/expectation to reflect that only the final image is preserved; adjust
assertions accordingly around imgUserMsg.ParseContent() and tc fields.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant