fix(openai-compatible): preserve non-stream reasoning content #3099

Open
Epochex wants to merge 1 commit into langgenius:main from Epochex:fix/openai-compatible-nonstream-reasoning

@Epochex
Contributor

@Epochex Epochex commented May 13, 2026

Summary

Related to #2945.

The OpenAI-compatible streaming path already normalizes both delta.reasoning and delta.reasoning_content into Dify's <think>...</think> format. The non-streaming chat response path still only used message.content, so vLLM/SGLang-style responses that put the reasoning trace in message.reasoning dropped that trace before Dify could render or filter it.
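As a rough illustration of the streaming behavior described above, here is a minimal sketch of folding reasoning deltas into a `<think>...</think>` block. It is not the plugin's actual code: the function name `normalize_stream_deltas`, the dict-shaped deltas, and the exact newline placement around the tags are all assumptions made for the example.

```python
from typing import Any, Iterable, Iterator, Mapping


def normalize_stream_deltas(deltas: Iterable[Mapping[str, Any]]) -> Iterator[str]:
    """Yield text chunks, wrapping consecutive reasoning deltas in <think> tags.

    Hypothetical helper: each delta is assumed to be a plain dict that may
    carry "reasoning", "reasoning_content", and/or "content".
    """
    in_reasoning = False
    for delta in deltas:
        # Either field name is treated as the reasoning trace.
        reasoning = delta.get("reasoning") or delta.get("reasoning_content")
        content = delta.get("content")
        if reasoning:
            if not in_reasoning:
                in_reasoning = True
                yield "<think>\n"  # open the block on the first reasoning chunk
            yield reasoning
        if content:
            if in_reasoning:
                in_reasoning = False
                yield "\n</think>\n"  # close the block when the answer starts
            yield content
    if in_reasoning:
        # The stream ended while still reasoning; close the block anyway.
        yield "\n</think>\n"
```

Joining the chunks for a stream that sends two reasoning deltas followed by one content delta gives a single `<think>` block ahead of the answer text.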

This PR applies the same normalization to non-streaming chat responses:

  • wraps message.reasoning and message.reasoning_content before the final answer
  • leaves content that already starts with <think> unchanged to avoid double wrapping
  • keeps existing tool call extraction and usage handling unchanged
  • keeps existing thinking-disabled filtering behavior working after the response is wrapped
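The behavior in the list above can be sketched roughly as follows. This is an illustrative approximation, not the code in this PR: the helper names `wrap_non_stream_reasoning` and `strip_think_block`, the dict-shaped `message`, the assumption that `content` is a string or `None`, and the whitespace around the tags are all hypothetical.

```python
import re
from typing import Any, Mapping

THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"

# Matches one leading <think>...</think> block plus surrounding whitespace.
_LEADING_THINK = re.compile(r"^\s*<think>.*?</think>\s*", re.DOTALL)


def wrap_non_stream_reasoning(message: Mapping[str, Any]) -> str:
    """Prepend the reasoning trace to the answer as a <think> block.

    Hypothetical helper: `message` stands in for choices[].message.
    """
    content = message.get("content") or ""
    # Accept either field name for the reasoning trace.
    reasoning = message.get("reasoning") or message.get("reasoning_content") or ""
    if not isinstance(reasoning, str) or not reasoning.strip():
        return content  # nothing to wrap
    if content.lstrip().startswith(THINK_OPEN):
        return content  # already wrapped upstream; avoid double wrapping
    return f"{THINK_OPEN}\n{reasoning}\n{THINK_CLOSE}\n{content}"


def strip_think_block(text: str) -> str:
    """Drop a leading <think> block, as a thinking-disabled filter might."""
    return _LEADING_THINK.sub("", text, count=1)


message = {"content": "4", "reasoning": "2 + 2 = 4"}
wrapped = wrap_non_stream_reasoning(message)
print(wrapped)
print(strip_think_block(wrapped))
```

Because the wrapped text has the same shape as the streaming output, a filter that already strips a leading `<think>` block should keep working on non-streaming responses without changes.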

This is separate from the earlier streaming fix in #2741 and the extensions/openai_compatible endpoint fix in #2676; it covers the model plugin's non-streaming choices[].message handler.

Change Type

  • Documentation / non-plugin change
  • Non-LLM plugin (tools, extensions, datasource, etc.)
  • LLM plugin

Screenshots / Videos

N/A. This is a response-normalization fix covered by unit and package tests.

Before: Non-streaming responses with message.reasoning only surfaced message.content.
After: Reasoning is preserved as <think>...</think> before the final answer, matching the streaming path.

LLM Plugin Checklist

Areas affected by this change (check all that apply)
  • Message flow (system messages, user -> assistant turn-taking)
  • Tool interaction flow (multi-round usage, Agent App and Agent Node)
  • Multimodal input (images, PDFs, audio, video, etc.)
  • Multimodal output (images, audio, video, etc.)
  • Structured output (JSON, XML, etc.)
  • Token consumption metrics
  • Other LLM functionality (reasoning, grounding, prompt caching, etc.)
  • New models / model parameter fixes

Version

  • Bumped top-level version in manifest.yaml (not the one under meta)
  • dify_plugin>=0.5.0 is declared in pyproject.toml and locked in uv.lock

Note: this PR bumps openai_api_compatible to 0.0.51 because open PRs #3091 and #3092 already use the pending 0.0.50 bump for the same plugin.

Testing

  • uv run --project models/openai_api_compatible --frozen --with pytest pytest models/openai_api_compatible/tests
  • uv run --project models/openai_api_compatible --frozen python -m py_compile models/openai_api_compatible/models/llm/llm.py models/openai_api_compatible/tests/test_handle_response.py
  • uv run --with requests --with dify_plugin python .scripts/toolkit/uploader/upload-package.py -d models/openai_api_compatible -t dummy-token --plugin-daemon-path .scripts/dify-plugin-windows-amd64.exe -u https://marketplace.dify.ai -f --test
  • .\.scripts\dify-plugin-windows-amd64.exe plugin package models/openai_api_compatible --output_path test-openai-api-compatible-package.difypkg
  • Unpacked the generated package and ran uv run --frozen --with pytest pytest
  • Local deployment - Dify version:
  • SaaS (cloud.dify.ai)

Local result: 28 passed, 2 warnings from existing dependencies.

@dosubot dosubot Bot added the size:S label (This PR changes 10-29 lines, ignoring generated files.) May 13, 2026
@Epochex Epochex temporarily deployed to models/openai_api_compatible May 13, 2026 19:32 — with GitHub Actions Inactive
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request adds support for handling reasoning traces in non-streaming responses for OpenAI-compatible models, such as those from vLLM or SGLang. A new method, _wrap_non_stream_reasoning_content, was implemented to extract reasoning data from the reasoning or reasoning_content fields and wrap it in <think> tags if not already present in the main content. Additionally, the version was bumped to 0.0.51, and several unit tests were added to ensure correct handling of reasoning content alongside tool calls and existing thinking blocks. I have no feedback to provide.

@Epochex Epochex force-pushed the fix/openai-compatible-nonstream-reasoning branch from 2c15474 to 2d06469 Compare May 15, 2026 12:11
@Epochex Epochex temporarily deployed to models/openai_api_compatible May 15, 2026 12:12 — with GitHub Actions Inactive