fix: certain APIs return SSE-style string responses by parsing them as JSON and reconstructing a ChatCompletion object #7280

Open
mastertion wants to merge 1 commit into AstrBotDevs:master from mastertion:patch-1

@mastertion mastertion commented Apr 1, 2026

Adds compatibility for certain APIs that force SSE-format output by converting the string response into a ChatCompletion object.

Summary by Sourcery

Bug Fixes:

  • Fix failures when certain APIs return SSE-style string responses by parsing them as JSON and reconstructing a ChatCompletion object.

Compatibility fix for the bug where certain APIs force SSE-format responses
@auto-assign auto-assign bot requested review from LIghtJUNction and anka-afk April 1, 2026 14:58
@dosubot dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Apr 1, 2026

@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 4 issues, and left some high level feedback:

  • Consider preserving the original exception when auto-fix fails by using raise ... from e rather than creating a fresh Exception, so that debugging retains the full stack trace and error context.
  • Using ChatCompletion.construct(**completion_dict) bypasses Pydantic validation; if possible, prefer the standard initializer (e.g., ChatCompletion.model_validate(...) or equivalent) to catch malformed responses early.
  • The SSE compatibility logic assumes a single data: payload; if upstream APIs send multi-line or batched SSE messages, you may need to split and join the relevant data: lines before JSON parsing to avoid partial or invalid JSON.

Comment on lines +467 to +476
        if isinstance(completion, str):
            logger.warning(f"Detected that the API returned a string instead of an object; attempting automatic repair: {completion[:100]}...")
            try:
                # If the response is in data:{...} format, strip "data:" and parse the JSON
                json_str = completion.strip()
                if json_str.startswith("data:"):
                    json_str = json_str[5:].strip()

                # Try to parse the JSON
                completion_dict = json.loads(json_str)

issue (bug_risk): SSE-like data: responses can contain multiple lines and trailing markers which this logic currently ignores.

Some gateways send multi-line SSE chunks (e.g. data:{...}\n\ndata:[DONE]) or extra newlines. Since this code only strips a single leading data:, json.loads will fail when there are multiple data: lines or a [DONE] sentinel. Consider splitting on newlines, discarding [DONE]/empty lines, and parsing only the last valid data: JSON line.
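
A minimal sketch of the line-splitting approach described above, assuming the gateway wraps one or more single-line JSON payloads in `data:` lines and may append a `[DONE]` sentinel. The helper name `extract_sse_json` is hypothetical and not part of the PR. It returns the last payload that parses as JSON, which is only one of several reasonable policies.

```python
import json
from typing import Any, Optional


def extract_sse_json(raw: str) -> Optional[Any]:
    """Return the last JSON payload found in an SSE-style string, or None.

    Hypothetical helper for illustration; not part of the PR.
    """
    last_payload = None
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines separate SSE events
        if line.startswith("data:"):
            line = line[len("data:"):].strip()
        if not line or line == "[DONE]":
            continue  # skip the end-of-stream sentinel
        try:
            last_payload = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON, e.g. "event:" fields
    return last_payload


raw = 'data: {"id": "abc", "choices": []}\n\ndata: [DONE]\n'
print(extract_sse_json(raw))
```

Because the helper parses line by line, a plain single-line JSON string without any `data:` prefix is also accepted, but pretty-printed JSON spread over several lines is not. A caller would need a fallback such as `json.loads(raw)` for that case.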

Comment on lines +475 to +481
                # Try to parse the JSON
                completion_dict = json.loads(json_str)

                # Reconstruct the ChatCompletion object
                completion = ChatCompletion.construct(**completion_dict)
                logger.info("Successfully converted the string response into a ChatCompletion object.")

suggestion (bug_risk): Using ChatCompletion.construct bypasses validation and may admit malformed data.

Because construct skips validation and type coercion, malformed or partially invalid JSON can become a ChatCompletion instance that violates its invariants and fails later in harder-to-debug ways. Prefer a validated constructor (e.g. ChatCompletion(**completion_dict) or a proper from_* helper) so bad responses fail fast with validation errors.

Suggested change
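-                # Try to parse the JSON
-                completion_dict = json.loads(json_str)
-                # Reconstruct the ChatCompletion object
-                completion = ChatCompletion.construct(**completion_dict)
-                logger.info("Successfully converted the string response into a ChatCompletion object.")
+                # Try to parse the JSON
+                completion_dict = json.loads(json_str)
+                # Reconstruct the ChatCompletion object (use the validating constructor rather than construct)
+                completion = ChatCompletion(**completion_dict)
+                logger.info("Successfully converted the string response into a ChatCompletion object.")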
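
To make the difference between unvalidated and validated construction concrete, here is a small sketch using plain Pydantic v2 models. The `Message`, `Choice`, and `Completion` classes are stand-ins invented for this example, not the openai SDK types. Whether the SDK's own `ChatCompletion.construct` behaves the same way depends on the SDK version and is not verified here.

```python
from typing import List

from pydantic import BaseModel, ValidationError


# Stand-in models for illustration only; these are NOT the openai SDK classes.
class Message(BaseModel):
    content: str


class Choice(BaseModel):
    message: Message


class Completion(BaseModel):
    choices: List[Choice]


payload = {"choices": [{"message": {"content": "hi"}}]}

# model_construct skips validation, so nested dicts stay plain dicts.
unvalidated = Completion.model_construct(**payload)
print(type(unvalidated.choices[0]).__name__)

# model_validate builds the nested models and rejects malformed input.
validated = Completion.model_validate(payload)
print(validated.choices[0].message.content)

try:
    Completion.model_validate({"choices": "not-a-list"})
except ValidationError as exc:
    print("rejected:", exc.error_count(), "error(s)")
```

With plain Pydantic models, attribute access such as `unvalidated.choices[0].message` would fail because the element is still a `dict`, while the validated instance supports it.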

Comment on lines +482 to +485
            except Exception as e:
                logger.error(f"Automatic repair failed: {e}")
                # If the repair fails, re-raise the original error
                raise Exception(f"The API response is malformed and could not be repaired: {type(completion)}: {completion}.")

issue (bug_risk): Re-raising a new generic Exception here loses the original traceback and error type.

The comment promises to rethrow the original error, but the code creates a new Exception, discarding the original stack trace and specific error from json.loads / ChatCompletion. To keep debugging context, either use a bare raise to rethrow e, or raise a more specific/custom exception with raise ... from e so the root cause is preserved via exception chaining.
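
A short sketch of the exception-chaining option mentioned above, using only the standard library. `ResponseFormatError` and `parse_response` are hypothetical names chosen for this example; the PR itself raises a generic `Exception`.

```python
import json


class ResponseFormatError(Exception):
    """Hypothetical error type for unparseable API responses."""


def parse_response(raw: str) -> dict:
    try:
        return json.loads(raw)
    except json.JSONDecodeError as e:
        # "from e" stores the original error on __cause__, so the traceback
        # shows both the JSON failure and this higher-level error.
        raise ResponseFormatError(
            f"response is not valid JSON (length={len(raw)})"
        ) from e


try:
    parse_response("data: not json")
except ResponseFormatError as err:
    print(type(err.__cause__).__name__)
```

The message here reports only the length of the response, which also sidesteps the concern about echoing raw payloads into error text.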


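        # --- New: work around the bug where certain APIs force SSE-format responses ---
        if isinstance(completion, str):
            logger.warning(f"Detected that the API returned a string instead of an object; attempting automatic repair: {completion[:100]}...")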

🚨 suggestion (security): Logging the raw response snippet may expose sensitive content and could be toned down or guarded.

This substring can still include user prompts or other private data, which may be sensitive depending on where logs are stored or shipped. Please consider masking/redacting the payload, logging only metadata (e.g., length or content type), or gating this detailed snippet behind a debug-only flag.

Suggested implementation:

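        if isinstance(completion, str):
            # Avoid exposing the raw response content in logs; record only metadata
            logger.warning("Detected that the API returned a string instead of an object; attempting automatic repair. Response content omitted to protect privacy.")
            # At debug level, more detailed information (e.g. length) can be logged, without the actual text
            try:
                response_length = len(completion)
            except Exception:
                response_length = None
            if logger.isEnabledFor(logging.DEBUG):
                logger.debug("Raw string response metadata: length=%s", response_length)
            try: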

If not already present at the top of astrbot/core/provider/sources/openai_source.py, add import logging and ensure logger is configured appropriately for your project’s logging setup.

@dosubot dosubot bot added the area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. label Apr 1, 2026

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a mechanism to handle cases where the OpenAI API returns SSE-formatted strings instead of objects. The reviewer identified that using ChatCompletion.construct is insufficient because it does not recursively parse nested dictionaries, which would lead to AttributeErrors later. Additionally, the reviewer recommended enhancing the parsing logic to correctly handle multi-line SSE responses and exclude the [DONE] marker to prevent JSON decoding failures.

Comment on lines +467 to +485
        if isinstance(completion, str):
            logger.warning(f"Detected that the API returned a string instead of an object; attempting automatic repair: {completion[:100]}...")
            try:
                # If the response is in data:{...} format, strip "data:" and parse the JSON
                json_str = completion.strip()
                if json_str.startswith("data:"):
                    json_str = json_str[5:].strip()

                # Try to parse the JSON
                completion_dict = json.loads(json_str)

                # Reconstruct the ChatCompletion object
                completion = ChatCompletion.construct(**completion_dict)
                logger.info("Successfully converted the string response into a ChatCompletion object.")

            except Exception as e:
                logger.error(f"Automatic repair failed: {e}")
                # If the repair fails, re-raise the original error
                raise Exception(f"The API response is malformed and could not be repaired: {type(completion)}: {completion}.")

high

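There are two main issues here:

  1. Non-recursive construction: ChatCompletion.construct (or model_construct in Pydantic v2) is not recursive. Nested dictionaries in completion_dict (such as the items in the choices list) are therefore not converted into Pydantic model objects and remain plain dicts. Later code (e.g. line 731) would then raise an AttributeError when accessing choice.message.content. Consider using ChatCompletion.model_validate() instead.
  2. Multi-line SSE handling: if the API returns a multi-line SSE response (for example, one ending with data: [DONE]), the current json.loads call fails because the string contains non-JSON characters. Consider iterating over splitlines() and extracting the first valid JSON data block.

In addition, consider refining the exception handling to improve robustness.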

        if isinstance(completion, str):
            logger.warning(f"Detected that the API returned a string instead of an object; attempting automatic repair: {completion[:100]}...")
            try:
                # Support multi-line SSE: extract the first data line that contains valid JSON
                json_str = None
                for line in completion.splitlines():
                    line = line.strip()
                    if line.startswith("data:"):
                        content = line[5:].strip()
                        if content and content != "[DONE]":
                            json_str = content
                            break
                
                if not json_str:
                    json_str = completion.strip()
                
                completion_dict = json.loads(json_str)
                
                # Use model_validate so nested objects (e.g. choices, message) are parsed into Pydantic models
                # construct is not recursive, which would cause an AttributeError on later attribute access
                completion = ChatCompletion.model_validate(completion_dict)
                logger.info("Successfully converted the string response into a ChatCompletion object.")
                
            except Exception as e:
                logger.error(f"Automatic repair failed: {e}")
                raise Exception(f"The API response is malformed and could not be repaired: {type(completion)}: {completion}.")

Comment on lines +465 to +486

        # --- New: work around the bug where certain APIs force SSE-format responses ---
        if isinstance(completion, str):
            logger.warning(f"Detected that the API returned a string instead of an object; attempting automatic repair: {completion[:100]}...")
            try:
                # If the response is in data:{...} format, strip "data:" and parse the JSON
                json_str = completion.strip()
                if json_str.startswith("data:"):
                    json_str = json_str[5:].strip()

                # Try to parse the JSON
                completion_dict = json.loads(json_str)

                # Reconstruct the ChatCompletion object
                completion = ChatCompletion.construct(**completion_dict)
                logger.info("Successfully converted the string response into a ChatCompletion object.")

            except Exception as e:
                logger.error(f"Automatic repair failed: {e}")
                # If the repair fails, re-raise the original error
                raise Exception(f"The API response is malformed and could not be repaired: {type(completion)}: {completion}.")
        # ---------------------------------------------------

Suggested change
-        # --- New: work around the bug where certain APIs force SSE-format responses ---
-        if isinstance(completion, str):
-            logger.warning(f"Detected that the API returned a string instead of an object; attempting automatic repair: {completion[:100]}...")
-            try:
-                # If the response is in data:{...} format, strip "data:" and parse the JSON
-                json_str = completion.strip()
-                if json_str.startswith("data:"):
-                    json_str = json_str[5:].strip()
-                # Try to parse the JSON
-                completion_dict = json.loads(json_str)
-                # Reconstruct the ChatCompletion object
-                completion = ChatCompletion.construct(**completion_dict)
-                logger.info("Successfully converted the string response into a ChatCompletion object.")
-            except Exception as e:
-                logger.error(f"Automatic repair failed: {e}")
-                # If the repair fails, re-raise the original error
-                raise Exception(f"The API response is malformed and could not be repaired: {type(completion)}: {completion}.")
-        # ---------------------------------------------------
+        if isinstance(completion, str):
+            try:
+                # see #7280
+                json_str = completion.strip().removeprefix("data:").strip()
+                completion_dict = json.loads(json_str)
+                completion = ChatCompletion.construct(**completion_dict)
+            except Exception as e:
+                raise Exception(
+                    f"The API returned a string response that cannot be parsed as a ChatCompletion. Response: {completion[:200]}... Error: {e}"
+                )

@Soulter Soulter changed the title from "Compatibility fix for the bug where certain APIs force SSE-format responses" to "fix: certain APIs return SSE-style string responses by parsing them as JSON and reconstructing a ChatCompletion object" Apr 3, 2026