Refactor chat template render context serialization #158
Pull request overview
This PR refactors chat-template preprocessing by removing the global TemplateWorkarounds mutation path and instead applying handler-specific, typed TemplateRenderContext serialization at the render-context boundary. This keeps typed multimodal message parts intact until final Jinja context construction, while making tool-call JSON shape transformations explicit per handler.
Changes:
- Replaced `TemplateWorkarounds` with `TemplateRenderContext` + `TemplateToolCallSerialization` to control tool-call normalization/schema/content-embedding at render time.
- Updated template handlers to build Jinja `messages` via `templateMessages(...)`, and added per-format tool-call serialization policies.
- Reworked/added unit tests to lock the tool-call serialization policy matrix and preserve multimodal/system-merge behaviors.
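As a rough illustration of the per-handler policy idea described in this overview, the following Dart sketch applies a policy to one message map at serialization time without mutating its input. Everything in it is an assumption made for illustration: `SketchToolCallPolicy`, `serializeMessage`, and the OpenAI-style `tool_calls`/`function`/`arguments` message shape are hypothetical, and although the value names `normalizeOnly` and `genericSchemaInContent` are borrowed from this review's file summary, their actual semantics in the PR may differ.

```dart
import 'dart:convert';

/// Hypothetical policy enum. The value names come from the PR's file
/// summary; the behavior attached to them here is an assumption.
enum SketchToolCallPolicy { normalizeOnly, genericSchemaInContent }

/// Returns a JSON-ready copy of [message] for a template context.
/// The input map and its nested maps are left untouched.
Map<String, Object?> serializeMessage(
  Map<String, Object?> message,
  SketchToolCallPolicy policy,
) {
  final out = Map<String, Object?>.of(message);
  final calls = message['tool_calls'];
  if (calls is! List) return out; // nothing to transform

  // Assumed normalization: decode string-encoded `arguments` into maps.
  final normalized = <Map<String, Object?>>[
    for (final call in calls)
      if (call is Map) _normalizeCall(call),
  ];

  switch (policy) {
    case SketchToolCallPolicy.normalizeOnly:
      out['tool_calls'] = normalized;
      break;
    case SketchToolCallPolicy.genericSchemaInContent:
      // Assumed behavior: embed the calls as JSON text in `content`.
      out.remove('tool_calls');
      out['content'] = jsonEncode({'tool_calls': normalized});
      break;
  }
  return out;
}

/// Copies [call] and decodes `function.arguments` when it is a JSON string.
/// Invalid JSON makes `jsonDecode` throw a FormatException.
Map<String, Object?> _normalizeCall(Map call) {
  final function = Map<String, Object?>.from(
    (call['function'] as Map?) ?? const {},
  );
  final args = function['arguments'];
  if (args is String) {
    function['arguments'] = jsonDecode(args);
  }
  return {...call.cast<String, Object?>(), 'function': function};
}
```

Calling `serializeMessage(message, SketchToolCallPolicy.normalizeOnly)` on an assistant message whose `arguments` field is the string `{"x":1}` would return a copy in which `arguments` is a decoded map, while the original message keeps the string. Keeping the transformation in a pure function is one way to avoid the kind of global mutation this PR removes.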
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/unit/core/template/template_workarounds_test.dart | Removed tests tied to deleted TemplateWorkarounds. |
| test/unit/core/template/template_render_context_test.dart | Added coverage for render-context serialization, system-merge behavior, and moved tool-call encoding. |
| test/unit/core/template/chat_template_engine_test.dart | Added regression tests for content-only routing and the per-format serialization policy matrix; kept GLM-OCR marker coverage. |
| lib/src/core/template/template_workarounds.dart | Removed global workaround/mutation implementation. |
| lib/src/core/template/template_render_context.dart | Introduced render-context serialization utilities and TemplateToolCallSerialization policy. |
| lib/src/core/template/handlers/xiaomi_mimo_handler.dart | Switched handler context to templateMessages(...). |
| lib/src/core/template/handlers/seed_oss_handler.dart | Switched to templateMessages(...) and set normalizeOnly tool-call policy. |
| lib/src/core/template/handlers/qwen3_coder_xml_handler.dart | Switched to templateMessages(...) and set normalizeOnly tool-call policy. |
| lib/src/core/template/handlers/nemotron_v2_handler.dart | Switched handler context to templateMessages(...). |
| lib/src/core/template/handlers/mistral_handler.dart | Switched to templateMessages(...) and set normalizeOnly tool-call policy. |
| lib/src/core/template/handlers/minimax_m2_handler.dart | Switched to templateMessages(...) and set normalizeOnly tool-call policy. |
| lib/src/core/template/handlers/magistral_handler.dart | Switched handler context to templateMessages(...). |
| lib/src/core/template/handlers/llama3_handler.dart | Switched to templateMessages(...) and set normalizeOnly tool-call policy. |
| lib/src/core/template/handlers/kimi_k2_handler.dart | Switched handler context to templateMessages(...). |
| lib/src/core/template/handlers/hermes_handler.dart | Switched handler context to templateMessages(...). |
| lib/src/core/template/handlers/granite_handler.dart | Switched to templateMessages(...) and set genericSchemaInContent policy. |
| lib/src/core/template/handlers/glm45_handler.dart | Switched to templateMessages(...) and set normalizeOnly tool-call policy. |
| lib/src/core/template/handlers/generic_handler.dart | Parameterized handler format/policy to support content-only & PEG routing without inheriting generic tool-call serialization. |
| lib/src/core/template/handlers/gemma_handler.dart | Switched handler context to templateMessages(...). |
| lib/src/core/template/handlers/functionary_v32_handler.dart | Switched handler context to templateMessages(...). |
| lib/src/core/template/handlers/functionary_v31_llama31_handler.dart | Switched handler context to templateMessages(...). |
| lib/src/core/template/handlers/firefunction_v2_handler.dart | Switched handler context to templateMessages(...). |
| lib/src/core/template/handlers/exaone_moe_handler.dart | Switched handler context to templateMessages(...). |
| lib/src/core/template/handlers/deepseek_r1_handler.dart | Switched handler context to templateMessages(...). |
| lib/src/core/template/handlers/command_r7b_handler.dart | Switched to templateMessages(...) and set normalizeOnly tool-call policy. |
| lib/src/core/template/handlers/apriel15_handler.dart | Switched handler context to templateMessages(...). |
| lib/src/core/template/handlers/apertus_handler.dart | Switched handler context to templateMessages(...). |
| lib/src/core/template/chat_template_handler.dart | Added handler-level tool-call serialization policy and centralized templateMessages(...) context building. |
| lib/src/core/template/chat_template_engine.dart | Replaced system-message workaround call-site and adjusted handler routing to ensure PEG/content-only formats don’t inherit generic tool-call serialization. |
Codecov Report
❌ Patch coverage is …
Coverage diff:
| | main | #158 | +/- |
|---|---|---|---|
| Coverage | 78.67% | 78.67% | -0.01% |
| Files | 76 | 76 | |
| Lines | 9915 | 9910 | -5 |
| Hits | 7801 | 7797 | -4 |
| Misses | 2114 | 2113 | -1 |
a1e5182 to c0fe105
… E2E
Hardens `llama_cpp_template_detection_integration_test` against updated/new llama.cpp templates, adding Bielik, GLM-4.7, GigaChat, SmolLM3, LFM2, Qwen3.5, Reka, StepFun, DeepSeek-V3.2, Gemma-4, Granite-4.0, etc. Also hardens `run_llama_cpp_chat_tests.sh` against target renames, dynamic library path lookup (`DYLD_LIBRARY_PATH`/`LD_LIBRARY_PATH`), and full test-chat server/mtmd build requirements, with script unit tests.
Summary
- Replaces the `TemplateWorkarounds` preprocessing path with typed `TemplateRenderContext` serialization
- Makes tool-call serialization explicit per handler via `TemplateToolCallSerialization`
- `ToolChoice.none` disables tool-use template routing, tool definitions, tool grammar/triggers, and parallel tool-call behavior for format-specific handlers

Regression coverage
- `chat_template.jinja` prompt shape, including exactly one `<__media__>` placeholder, preserved `Text Extraction:` prompt text, assistant generation marker, and no raw `file://`, `image_url`, or base64 image-source leakage
- `ToolChoice.none` on GLM/format-specific handlers does not select the metadata tool-use template, expose tool definitions to Jinja, or emit grammar/preserved tokens/triggers
- … system text, image/media, user text)
- … `LlamaInferenceException`

Review notes
- `ToolChoice.none` could still leave format-specific handler tool prompting/grammar enabled when tools were supplied. This is now fixed and covered by a GLM regression test.
- … `LlamaInferenceException.details`; the original stack is preserved with `Error.throwWithStackTrace`
- … `cb04d877b56a438cf112a1d0f4f25128fdf86525` and generated no new comments.

Verification
- `dart format --output=none --set-exit-if-changed lib/src/core/template/chat_template_engine.dart test/unit/core/template/chat_template_engine_test.dart` → 0 changed
- `git diff --check` → passed
- `dart analyze lib/src/core/template/chat_template_engine.dart test/unit/core/template/chat_template_engine_test.dart` → no issues found
- `dart test -p vm test/unit/core/template/chat_template_engine_test.dart --exclude-tags local-only` → `+36: All tests passed!`
- `dart test -p vm,chrome test/unit/core/template/chat_template_engine_test.dart test/unit/core/template/template_render_context_test.dart test/integration/core/template --exclude-tags local-only` → `+238: All tests passed!`
- `dart test -p vm test/unit/core/template test/unit/tooling/run_llama_cpp_chat_tests_script_test.dart --exclude-tags local-only` → `+289: All tests passed!`
- `dart test test/unit/core/template/chat_template_engine_test.dart test/unit/core/template/template_render_context_test.dart` → `+90: All tests passed!` across VM and Chrome/Dart2Js
- `dart test -p vm test/integration/core/template --exclude-tags local-only` → `+84 ~43: All tests passed!`
- `dart --packages=.dart_tool/package_config.json /tmp/glm_ocr_issue156_smoke.dart` with local GLM-OCR GGUF + mmproj + marker image → `GLM_OCR_E2E_PASS` (`supportsVision=true`, output included `GLM OCR TEST` / `Issue 156: image marker`)
- `dart run tool/testing/run_local_e2e.dart --scenario root-native-tool-e2e` → `+1: All tests passed!` with real Qwen3.5-0.8B GGUF prompt/template path
- `dart run tool/testing/run_local_e2e.dart --scenario root-template-e2e` → partial local-only E2E: the FunctionGemma Jinja-template real-model path passed, but the upstream llama.cpp chat-test subtests failed before reaching this Dart refactor path, because the local llama.cpp CMake checkout did not provide the requested `test-chat-parser` target and then `libggml-cpu.dylib` was not on the runtime loader path
- `npm run build` from `website/` → Docusaurus build succeeded
- … `cb04d877b56a438cf112a1d0f4f25128fdf86525`:

Coverage note
Note: The local-only upstream template E2E caveat is an environment/upstream llama.cpp setup issue, not evidence of a Dart refactor regression. Generated `.dart_tool/llama_cpp*` artifacts were removed after local validation.

Additional local E2E confidence pass (2026-05-21)
Ran after the latest head `cb04d877b56a438cf112a1d0f4f25128fdf86525` to validate real-model behavior beyond fixture/template coverage:
- `dart test -p vm,chrome test/unit/core/template/chat_template_engine_test.dart test/unit/core/template/template_render_context_test.dart test/integration/core/template --exclude-tags local-only` → `+238: All tests passed!`
- `dart run tool/testing/run_local_e2e.dart --scenario root-native-tool-e2e` → real cached `Qwen3.5-0.8B-Q4_K_M.gguf` native chat-template/tool prompt path, `+1: All tests passed!`
- `dart run tool/testing/run_local_e2e.dart --scenario root-template-e2e` → upstream llama.cpp chat-template E2E selection + full chat suite, `+3: All tests passed!`
- `dart run tool/testing/run_local_e2e.dart --scenario qwen35-multimodal-macos-repro` with cached `Qwen3.5-0.8B-Q4_K_M.gguf`, `Qwen3.5-0.8B-mmproj-F16.gguf`, CPU backend, and image input → `supportsVision: true`, non-empty generation, `+1: All tests passed!`
- … `/opt/UnitySrc/personal/llama/models/glm-ocr/GLM-OCR.i1-Q4_K_M.gguf`, matching `GLM-OCR.mmproj-Q8_0.gguf`, and a deterministic text image containing `GLM OCR TEST` / `Issue 156 image marker` → `supportsVision: true`, output recovered the marker text, `GLM_OCR_E2E_PASS`.
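
The Review notes mention that the original stack is preserved with `Error.throwWithStackTrace` when a failure is surfaced through `LlamaInferenceException.details`. A minimal sketch of that wrapping pattern follows. `Error.throwWithStackTrace` is a real `dart:core` API (Dart 2.16+), but `SketchInferenceException` and `wrapRenderErrors` are hypothetical stand-ins, and the project's actual exception fields and call sites are assumptions here.

```dart
/// Hypothetical stand-in for the project's exception type; the real
/// `LlamaInferenceException` may carry different fields.
class SketchInferenceException implements Exception {
  SketchInferenceException(this.message, {this.details});

  final String message;
  final Object? details;

  @override
  String toString() => 'SketchInferenceException: $message ($details)';
}

/// Runs [render] and rethrows any failure as a [SketchInferenceException],
/// reusing the stack trace captured at the original throw site.
T wrapRenderErrors<T>(T Function() render) {
  try {
    return render();
  } on SketchInferenceException {
    rethrow; // already wrapped, keep it unchanged
  } catch (error, stackTrace) {
    // Throws the wrapper with the original trace instead of a new one
    // that would start at this catch block.
    Error.throwWithStackTrace(
      SketchInferenceException('template render failed', details: error),
      stackTrace,
    );
  }
}
```

With this shape, a caller that catches `SketchInferenceException` can inspect `details` for the underlying error while the reported stack trace still points at the code that originally failed.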