feat: Support deployment-level thinking defaults for chat templates#11047
feat: Support deployment-level thinking defaults for chat templates#11047indrajit96 wants to merge 3 commits into
Conversation
Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
| chat_template_kwargs = apply_default_thinking_mode_to_template_kwargs( | ||
| chat_template_kwargs, | ||
| default_thinking_mode, | ||
| request_has_root_thinking=(isinstance(request, dict) and "thinking" in request), | ||
| ) | ||
| chat_template_kwargs["reasoning_effort"] = request_for_sampling.reasoning_effort |
There was a problem hiding this comment.
🚩 Behavioral difference between SGLang and vLLM/Rust in how reasoning_effort interacts with default_thinking_mode
In the SGLang path, _normalize_openai_thinking_template_kwargs (components/src/dynamo/frontend/sglang_prepost.py:366-401) processes reasoning_effort: "none" into thinking: False / enable_thinking: False via setdefault_reasoning(False) BEFORE apply_default_thinking_mode_to_template_kwargs runs. This means reasoning_effort: "none" effectively blocks any deployment default from being applied.
In the vLLM path (components/src/dynamo/frontend/prepost.py:139-145) and the Rust preprocessor (lib/llm/src/preprocessor.rs:402-442), reasoning_effort is NOT checked before the default is applied — it's simply added as a separate template kwarg afterward. So a request with reasoning_effort: "none" + default_thinking_mode: "enabled" would end up with both enable_thinking: true (from the default) and reasoning_effort: "none" in the template kwargs. Whether this is a problem depends on the chat template's precedence rules for these fields.
This is likely intentional — SGLang normalizes OpenAI thinking controls into template flags, while vLLM/Rust let the template interpret reasoning_effort directly — but it's worth confirming the desired behavior for mixed-signal scenarios.
Was this helpful? React with 👍 or 👎 to provide feedback.
WalkthroughAdds a deployment-level ChangesDefault Thinking Mode Feature
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/bindings/python/rust/backend.rs (1)
297-345: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winAppend the new
backend.WorkerConfigparameter instead of inserting it mid-signature.This changes the positional constructor contract for
backend.WorkerConfig(...): every argument afterreasoning_parsernow shifts one slot. Please movedefault_thinking_modeto the end of the PyO3 signature/parameter list and updatelib/bindings/python/src/dynamo/_core.pyito match.Suggested fix
#[pyo3(signature = ( namespace, component = "backend".to_string(), endpoint = "generate".to_string(), model_name = String::new(), served_model_name = None, model_input = ModelInput::Tokens, endpoint_types = "chat,completions".to_string(), custom_jinja_template = None, tool_call_parser = None, reasoning_parser = None, - default_thinking_mode = None, exclude_tools_when_tool_choice_none = true, enable_local_indexer = true, enable_kv_routing = true, metrics_labels = Vec::new(), runtime = None, disaggregation_mode = DisaggregationMode::Aggregated, health_check_payload = None, structural_tag_mode = "off".to_string(), structural_tag_scope = "auto".to_string(), structural_tag_schema = "auto".to_string(), route_to_encoder = false, + default_thinking_mode = None, ))] @@ custom_jinja_template: Option<String>, tool_call_parser: Option<String>, reasoning_parser: Option<String>, - default_thinking_mode: Option<String>, exclude_tools_when_tool_choice_none: bool, enable_local_indexer: bool, enable_kv_routing: bool, metrics_labels: Vec<(String, String)>, runtime: Option<RuntimeConfig>, disaggregation_mode: DisaggregationMode, health_check_payload: Option<PyObject>, structural_tag_mode: String, structural_tag_scope: String, structural_tag_schema: String, route_to_encoder: bool, + default_thinking_mode: Option<String>, ) -> PyResult<Self> {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/bindings/python/rust/backend.rs` around lines 297 - 345, The backend.WorkerConfig constructor signature changed the positional argument order by inserting a new parameter in the middle, which breaks existing callers. Update the PyO3 constructor in backend.rs so default_thinking_mode is appended at the end of the signature and parameter list after route_to_encoder, and mirror the same ordering in the backend.WorkerConfig type stub in lib/bindings/python/src/dynamo/_core.pyi to keep the Python API contract stable.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@components/src/dynamo/common/backend/worker.py`:
- Line 120: Move the new WorkerConfig field default_thinking_mode to the end of
the class so the dataclass stays append-only and existing positional callers
keep binding correctly. Update the WorkerConfig definition by appending
default_thinking_mode after the current optional fields, rather than inserting
it before exclude_tools_when_tool_choice_none or any existing parameters.
---
Outside diff comments:
In `@lib/bindings/python/rust/backend.rs`:
- Around line 297-345: The backend.WorkerConfig constructor signature changed
the positional argument order by inserting a new parameter in the middle, which
breaks existing callers. Update the PyO3 constructor in backend.rs so
default_thinking_mode is appended at the end of the signature and parameter list
after route_to_encoder, and mirror the same ordering in the backend.WorkerConfig
type stub in lib/bindings/python/src/dynamo/_core.pyi to keep the Python API
contract stable.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b0055b16-3d34-4118-ad8d-6ab9157ae694
📒 Files selected for processing (17)
components/src/dynamo/common/backend/tests/test_backend_bindings.pycomponents/src/dynamo/common/backend/worker.pycomponents/src/dynamo/common/configuration/groups/runtime_args.pycomponents/src/dynamo/frontend/prepost.pycomponents/src/dynamo/frontend/sglang_prepost.pycomponents/src/dynamo/frontend/sglang_processor.pycomponents/src/dynamo/frontend/tests/test_sglang_processor_unit.pycomponents/src/dynamo/frontend/tests/test_vllm_processor_unit.pycomponents/src/dynamo/frontend/thinking.pycomponents/src/dynamo/frontend/vllm_processor.pycomponents/src/dynamo/sglang/register.pycomponents/src/dynamo/trtllm/workers/llm_worker.pycomponents/src/dynamo/vllm/main.pylib/backend-common/src/worker.rslib/bindings/python/rust/backend.rslib/bindings/python/src/dynamo/_core.pyilib/llm/src/preprocessor.rs
| custom_jinja_template: Optional[str] = None | ||
| tool_call_parser: Optional[str] = None | ||
| reasoning_parser: Optional[str] = None | ||
| default_thinking_mode: Optional[str] = None |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Keep WorkerConfig append-only.
Line 120 inserts default_thinking_mode before existing optional fields, but this class already documents that new fields must be appended to preserve positional callers. Any positional WorkerConfig(...) call that used exclude_tools_when_tool_choice_none or anything after it will now silently bind the wrong values.
Suggested fix
- default_thinking_mode: Optional[str] = None
exclude_tools_when_tool_choice_none: bool = True
enable_local_indexer: bool = True
@@
# Appended at the END of the dataclass to keep positional callers
# working -- inserting mid-class would silently shift downstream args.
route_to_encoder: bool = False
+ default_thinking_mode: Optional[str] = None📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| default_thinking_mode: Optional[str] = None | |
| exclude_tools_when_tool_choice_none: bool = True | |
| enable_local_indexer: bool = True | |
| # Appended at the END of the dataclass to keep positional callers | |
| # working -- inserting mid-class would silently shift downstream args. | |
| route_to_encoder: bool = False | |
| default_thinking_mode: Optional[str] = None |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@components/src/dynamo/common/backend/worker.py` at line 120, Move the new
WorkerConfig field default_thinking_mode to the end of the class so the
dataclass stays append-only and existing positional callers keep binding
correctly. Update the WorkerConfig definition by appending default_thinking_mode
after the current optional fields, rather than inserting it before
exclude_tools_when_tool_choice_none or any existing parameters.
…namo#11047) Adapt and apply ai-dynamo/dynamo PR ai-dynamo#11047 ("Support deployment-level thinking defaults for chat templates") onto feat/glm51-v12-20260610. Adds --dyn-default-thinking-mode enabled|disabled / DYN_DEFAULT_THINKING_MODE, published through model runtime metadata (runtime_data.default_thinking_mode) and applied in the SGLang/vLLM Python frontends and the Rust OpenAI preprocessor when a request carries no thinking control of its own. Request-level thinking/chat_template_args/chat_template_kwargs always win. Adaptations for this branch's older base (~855 commits behind the PR base): - preprocessor.rs: this branch's NvCreateChatCompletionRequest has no first-class `thinking` field, so request-level precedence checks `unsupported_fields` (where a root-level `thinking` lands) instead. - worker.rs: this branch's EngineConfig has no `runtime_data`, so the default-thinking runtime_data map is built fresh rather than cloned. - sglang_prepost.py: this branch lacks the newer-main `_normalize_openai_thinking_template_kwargs` helper; the deployment default is injected at the top of preprocess_chat_request (before reasoning gating and rendering) so both observe the same thinking state. - vllm/main.py: add missing `import json` for the set_engine_specific call. - The PR's frontend Python unit tests were not ported (written against newer-main fixtures); the Rust preprocessor unit tests are included. Original-author: Indrajit Bhosale <iamindrajitb@gmail.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Overview:
Adds a deployment-level default thinking toggle for chat-template rendering.
This lets operators set whether a deployment should default to thinking enabled or disabled, while preserving request-level override priority.
Details:
Validation:
Summary by CodeRabbit
New Features
Tests