Refactor chat template render context serialization by leehack · Pull Request #158 · leehack/llamadart

leehack · 2026-05-20T14:34:59Z

Summary

replace the global TemplateWorkarounds preprocessing path with typed TemplateRenderContext serialization
move tool-call serialization policy onto handlers via TemplateToolCallSerialization
keep typed multimodal content intact until final template-context conversion
lock the old workaround format matrix with regression coverage, including PEG/content-only formats using no serialization policy
harden no-system-role template merging so system text stays before leading media parts
fail loudly when template-specific tool-call serialization cannot build the required render context
ensure ToolChoice.none disables tool-use template routing, tool definitions, tool grammar/triggers, and parallel tool-call behavior for format-specific handlers
add small render-context cleanup/optimization: value equality for serialization policy, one-pass tool-call detection, reused default pretty JSON encoder, and removal of no-op try/rethrow wrappers

Regression coverage

GLM-OCR image marker preservation
actual upstream GLM-OCR chat_template.jinja prompt shape, including exactly one <__media__> placeholder, preserved Text Extraction: prompt text, assistant generation marker, and no raw file://, image_url, or base64 image-source leakage
GLM-OCR image prompt preservation while GLM tool-call normalization is active
ToolChoice.none on GLM/format-specific handlers does not select the metadata tool-use template, expose tool definitions to Jinja, or emit grammar/preserved tokens/triggers
system-message merge preserving typed multimodal parts and ordering (system text, image/media, user text)
generic tool-instruction injection preserving multimodal user parts and ordering
tool-call argument normalization without mutating source messages
invalid template-specific tool-call serialization raising LlamaInferenceException
generic/granite schema-in-content rendering
content-only and PEG formats not inheriting generic tool-call serialization
text-oriented tool-call content moves stringifying list content, while multimodal paths explicitly preserve structured image/audio parts
audio content part preservation when tool-call serialization moves tool calls into multimodal content

Review notes

Independent local review found no blockers after the first hardening pass.
A second independent review of the additional hardening found no blockers or obvious regressions.
The latest review found one actionable gap: ToolChoice.none could still leave format-specific handler tool prompting/grammar enabled when tools were supplied. This is now fixed and covered by a GLM regression test.
Previous Copilot comments were addressed and resolved:
- preserved list-shaped assistant content when moving tool calls into content
- added non-string-key tool-call argument normalization coverage
- avoided embedding full stack traces in LlamaInferenceException.details; the original stack is preserved with Error.throwWithStackTrace
Copilot was re-requested on latest pushed head cb04d877b56a438cf112a1d0f4f25128fdf86525 and generated no new comments.
There are no unresolved review/Copilot threads.
Known prompt-shape caveat: generic tool-instruction injection now uses the shared leading-system merge path, so the separator changed from a manual blank line to the centralized one-newline merge behavior. This is intentional to preserve typed multimodal content.

Verification

dart format --output=none --set-exit-if-changed lib/src/core/template/chat_template_engine.dart test/unit/core/template/chat_template_engine_test.dart → 0 changed
git diff --check → passed
static added-line scan for hardcoded secrets, dangerous execution, and conflict markers → 0 hits
dart analyze lib/src/core/template/chat_template_engine.dart test/unit/core/template/chat_template_engine_test.dart → no issues found
dart test -p vm test/unit/core/template/chat_template_engine_test.dart --exclude-tags local-only → +36: All tests passed!
dart test -p vm,chrome test/unit/core/template/chat_template_engine_test.dart test/unit/core/template/template_render_context_test.dart test/integration/core/template --exclude-tags local-only → +238: All tests passed!
dart test -p vm test/unit/core/template test/unit/tooling/run_llama_cpp_chat_tests_script_test.dart --exclude-tags local-only → +289: All tests passed!
Earlier local validation on this PR also included:
- dart test test/unit/core/template/chat_template_engine_test.dart test/unit/core/template/template_render_context_test.dart → +90: All tests passed! across VM and Chrome/Dart2Js
- dart test -p vm test/integration/core/template --exclude-tags local-only → +84 ~43: All tests passed!
- dart --packages=.dart_tool/package_config.json /tmp/glm_ocr_issue156_smoke.dart with local GLM-OCR GGUF + mmproj + marker image → GLM_OCR_E2E_PASS (supportsVision=true, output included GLM OCR TEST / Issue 156: image marker)
- dart run tool/testing/run_local_e2e.dart --scenario root-native-tool-e2e → +1: All tests passed! with real Qwen3.5-0.8B GGUF prompt/template path
- dart run tool/testing/run_local_e2e.dart --scenario root-template-e2e → partial local-only E2E: FunctionGemma Jinja-template real-model path passed, but the upstream llama.cpp chat-test subtests failed before this Dart refactor path because the local llama.cpp CMake checkout did not provide the requested test-chat-parser target and then libggml-cpu.dylib was not on the runtime loader path
- npm run build from website/ → Docusaurus build succeeded
CI passed on latest head cb04d877b56a438cf112a1d0f4f25128fdf86525:
- Analyze & Lint
- Docs Build Check
- Test Linux & Web (with Coverage)
- Test Native (macos-latest)
- Test Native (windows-latest)
- Native Prompt Reuse Parity

Coverage note

Codecov reports project coverage passing.
Patch coverage is advisory and has previously reported below 90% on this PR; required CI/check runs are green.

Note: The local-only upstream template E2E caveat is an environment/upstream llama.cpp setup issue, not evidence of a Dart refactor regression. Generated .dart_tool/llama_cpp* artifacts were removed after local validation.

Additional local E2E confidence pass (2026-05-21)

Ran after the latest head cb04d877b56a438cf112a1d0f4f25128fdf86525 to validate real-model behavior beyond fixture/template coverage:

dart test -p vm,chrome test/unit/core/template/chat_template_engine_test.dart test/unit/core/template/template_render_context_test.dart test/integration/core/template --exclude-tags local-only → +238: All tests passed!
dart run tool/testing/run_local_e2e.dart --scenario root-native-tool-e2e → real cached Qwen3.5-0.8B-Q4_K_M.gguf native chat-template/tool prompt path, +1: All tests passed!
dart run tool/testing/run_local_e2e.dart --scenario root-template-e2e → upstream llama.cpp chat-template E2E selection + full chat suite, +3: All tests passed!
dart run tool/testing/run_local_e2e.dart --scenario qwen35-multimodal-macos-repro with cached Qwen3.5-0.8B-Q4_K_M.gguf, Qwen3.5-0.8B-mmproj-F16.gguf, CPU backend, and image input → supportsVision: true, non-empty generation, +1: All tests passed!
Direct GLM-OCR native public-API smoke with /opt/UnitySrc/personal/llama/models/glm-ocr/GLM-OCR.i1-Q4_K_M.gguf, matching GLM-OCR.mmproj-Q8_0.gguf, and a deterministic text image containing GLM OCR TEST / Issue 156 image marker → supportsVision: true, output recovered the marker text, GLM_OCR_E2E_PASS.

Copilot

Pull request overview

This PR refactors chat-template preprocessing by removing the global TemplateWorkarounds mutation path and instead applying handler-specific, typed TemplateRenderContext serialization at the render-context boundary. This keeps typed multimodal message parts intact until final Jinja context construction, while making tool-call JSON shape transformations explicit per handler.

Changes:

Replaced TemplateWorkarounds with TemplateRenderContext + TemplateToolCallSerialization to control tool-call normalization/schema/content-embedding at render time.
Updated template handlers to build Jinja messages via templateMessages(...), and added per-format tool-call serialization policies.
Reworked/added unit tests to lock the tool-call serialization policy matrix and preserve multimodal/system-merge behaviors.

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
test/unit/core/template/template_workarounds_test.dart	Removed tests tied to deleted `TemplateWorkarounds`.
test/unit/core/template/template_render_context_test.dart	Added coverage for render-context serialization, system-merge behavior, and moved tool-call encoding.
test/unit/core/template/chat_template_engine_test.dart	Added regression tests for content-only routing and the per-format serialization policy matrix; kept GLM-OCR marker coverage.
lib/src/core/template/template_workarounds.dart	Removed global workaround/mutation implementation.
lib/src/core/template/template_render_context.dart	Introduced render-context serialization utilities and `TemplateToolCallSerialization` policy.
lib/src/core/template/handlers/xiaomi_mimo_handler.dart	Switched handler context to `templateMessages(...)`.
lib/src/core/template/handlers/seed_oss_handler.dart	Switched to `templateMessages(...)` and set `normalizeOnly` tool-call policy.
lib/src/core/template/handlers/qwen3_coder_xml_handler.dart	Switched to `templateMessages(...)` and set `normalizeOnly` tool-call policy.
lib/src/core/template/handlers/nemotron_v2_handler.dart	Switched handler context to `templateMessages(...)`.
lib/src/core/template/handlers/mistral_handler.dart	Switched to `templateMessages(...)` and set `normalizeOnly` tool-call policy.
lib/src/core/template/handlers/minimax_m2_handler.dart	Switched to `templateMessages(...)` and set `normalizeOnly` tool-call policy.
lib/src/core/template/handlers/magistral_handler.dart	Switched handler context to `templateMessages(...)`.
lib/src/core/template/handlers/llama3_handler.dart	Switched to `templateMessages(...)` and set `normalizeOnly` tool-call policy.
lib/src/core/template/handlers/kimi_k2_handler.dart	Switched handler context to `templateMessages(...)`.
lib/src/core/template/handlers/hermes_handler.dart	Switched handler context to `templateMessages(...)`.
lib/src/core/template/handlers/granite_handler.dart	Switched to `templateMessages(...)` and set `genericSchemaInContent` policy.
lib/src/core/template/handlers/glm45_handler.dart	Switched to `templateMessages(...)` and set `normalizeOnly` tool-call policy.
lib/src/core/template/handlers/generic_handler.dart	Parameterized handler format/policy to support content-only & PEG routing without inheriting generic tool-call serialization.
lib/src/core/template/handlers/gemma_handler.dart	Switched handler context to `templateMessages(...)`.
lib/src/core/template/handlers/functionary_v32_handler.dart	Switched handler context to `templateMessages(...)`.
lib/src/core/template/handlers/functionary_v31_llama31_handler.dart	Switched handler context to `templateMessages(...)`.
lib/src/core/template/handlers/firefunction_v2_handler.dart	Switched handler context to `templateMessages(...)`.
lib/src/core/template/handlers/exaone_moe_handler.dart	Switched handler context to `templateMessages(...)`.
lib/src/core/template/handlers/deepseek_r1_handler.dart	Switched handler context to `templateMessages(...)`.
lib/src/core/template/handlers/command_r7b_handler.dart	Switched to `templateMessages(...)` and set `normalizeOnly` tool-call policy.
lib/src/core/template/handlers/apriel15_handler.dart	Switched handler context to `templateMessages(...)`.
lib/src/core/template/handlers/apertus_handler.dart	Switched handler context to `templateMessages(...)`.
lib/src/core/template/chat_template_handler.dart	Added handler-level tool-call serialization policy and centralized `templateMessages(...)` context building.
lib/src/core/template/chat_template_engine.dart	Replaced system-message workaround call-site and adjusted handler routing to ensure PEG/content-only formats don’t inherit generic tool-call serialization.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

codecov-commenter · 2026-05-20T14:43:10Z

Codecov Report

❌ Patch coverage is 88.15166% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.67%. Comparing base (ce760d6) to head (cb04d87).

Files with missing lines	Patch %	Lines
lib/src/core/template/template_render_context.dart	82.40%	22 Missing ⚠️
...b/src/core/template/handlers/apriel15_handler.dart	0.00%	1 Missing ⚠️
...src/core/template/handlers/exaone_moe_handler.dart	0.00%	1 Missing ⚠️
...late/handlers/functionary_v31_llama31_handler.dart	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #158      +/-   ##
==========================================
- Coverage   78.67%   78.67%   -0.01%     
==========================================
  Files          76       76              
  Lines        9915     9910       -5     
==========================================
- Hits         7801     7797       -4     
+ Misses       2114     2113       -1

Flag	Coverage Δ
unittests	`78.67% <88.15%> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated no new comments.

Copilot

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 1 comment.

Copilot

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated no new comments.

… E2E Hardens llama_cpp_template_detection_integration_test against updated/new llama.cpp templates, adding Bielik, GLM-4.7, GigaChat, SmolLM3, LFM2, Qwen3.5, Reka, StepFun, DeepSeek-V3.2, Gemma-4, Granite-4.0, etc. Also hardens run_llama_cpp_chat_tests.sh against target renames, dynamic library path lookup (DYLD_LIBRARY_PATH/LD_LIBRARY_PATH), and full test-chat server/mtmd build requirements, with script unit tests.

Copilot

Pull request overview

Copilot reviewed 34 out of 34 changed files in this pull request and generated no new comments.

Copilot

Pull request overview

Copilot reviewed 34 out of 34 changed files in this pull request and generated no new comments.

Copilot

Pull request overview

Copilot reviewed 34 out of 34 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings May 20, 2026 14:35

Copilot started reviewing on behalf of leehack May 20, 2026 14:35 View session

Copilot AI reviewed May 20, 2026

View reviewed changes

Comment thread lib/src/core/template/template_render_context.dart

Comment thread test/unit/core/template/template_render_context_test.dart

Refactor template render context serialization

c0fe105

leehack force-pushed the refactor/template-render-context branch from a1e5182 to c0fe105 Compare May 20, 2026 14:47

leehack requested a review from Copilot May 20, 2026 14:48

Copilot started reviewing on behalf of leehack May 20, 2026 14:48 View session

Copilot AI reviewed May 20, 2026

View reviewed changes

fix: harden template render context multimodal handling

17b9ce9

leehack requested a review from Copilot May 21, 2026 13:51

Copilot started reviewing on behalf of leehack May 21, 2026 13:52 View session