Skip to content

Refactor chat template render context serialization#158

Merged
leehack merged 7 commits into
mainfrom
refactor/template-render-context
May 22, 2026
Merged

Refactor chat template render context serialization#158
leehack merged 7 commits into
mainfrom
refactor/template-render-context

Conversation

@leehack
Copy link
Copy Markdown
Owner

@leehack leehack commented May 20, 2026

Summary

  • replace the global TemplateWorkarounds preprocessing path with typed TemplateRenderContext serialization
  • move tool-call serialization policy onto handlers via TemplateToolCallSerialization
  • keep typed multimodal content intact until final template-context conversion
  • lock the old workaround format matrix with regression coverage, including PEG/content-only formats using no serialization policy
  • harden no-system-role template merging so system text stays before leading media parts
  • fail loudly when template-specific tool-call serialization cannot build the required render context
  • ensure ToolChoice.none disables tool-use template routing, tool definitions, tool grammar/triggers, and parallel tool-call behavior for format-specific handlers
  • add small render-context cleanup/optimization: value equality for serialization policy, one-pass tool-call detection, reused default pretty JSON encoder, and removal of no-op try/rethrow wrappers

Regression coverage

  • GLM-OCR image marker preservation
  • actual upstream GLM-OCR chat_template.jinja prompt shape, including exactly one <__media__> placeholder, preserved Text Extraction: prompt text, assistant generation marker, and no raw file://, image_url, or base64 image-source leakage
  • GLM-OCR image prompt preservation while GLM tool-call normalization is active
  • ToolChoice.none on GLM/format-specific handlers does not select the metadata tool-use template, expose tool definitions to Jinja, or emit grammar/preserved tokens/triggers
  • system-message merge preserving typed multimodal parts and ordering (system text, image/media, user text)
  • generic tool-instruction injection preserving multimodal user parts and ordering
  • tool-call argument normalization without mutating source messages
  • invalid template-specific tool-call serialization raising LlamaInferenceException
  • generic/granite schema-in-content rendering
  • content-only and PEG formats not inheriting generic tool-call serialization
  • text-oriented tool-call content moves stringifying list content, while multimodal paths explicitly preserve structured image/audio parts
  • audio content part preservation when tool-call serialization moves tool calls into multimodal content

Review notes

  • Independent local review found no blockers after the first hardening pass.
  • A second independent review of the additional hardening found no blockers or obvious regressions.
  • The latest review found one actionable gap: ToolChoice.none could still leave format-specific handler tool prompting/grammar enabled when tools were supplied. This is now fixed and covered by a GLM regression test.
  • Previous Copilot comments were addressed and resolved:
    • preserved list-shaped assistant content when moving tool calls into content
    • added non-string-key tool-call argument normalization coverage
    • avoided embedding full stack traces in LlamaInferenceException.details; the original stack is preserved with Error.throwWithStackTrace
  • Copilot was re-requested on latest pushed head cb04d877b56a438cf112a1d0f4f25128fdf86525 and generated no new comments.
  • There are no unresolved review/Copilot threads.
  • Known prompt-shape caveat: generic tool-instruction injection now uses the shared leading-system merge path, so the separator changed from a manual blank line to the centralized one-newline merge behavior. This is intentional to preserve typed multimodal content.

Verification

  • dart format --output=none --set-exit-if-changed lib/src/core/template/chat_template_engine.dart test/unit/core/template/chat_template_engine_test.dart → 0 changed
  • git diff --check → passed
  • static added-line scan for hardcoded secrets, dangerous execution, and conflict markers → 0 hits
  • dart analyze lib/src/core/template/chat_template_engine.dart test/unit/core/template/chat_template_engine_test.dart → no issues found
  • dart test -p vm test/unit/core/template/chat_template_engine_test.dart --exclude-tags local-only+36: All tests passed!
  • dart test -p vm,chrome test/unit/core/template/chat_template_engine_test.dart test/unit/core/template/template_render_context_test.dart test/integration/core/template --exclude-tags local-only+238: All tests passed!
  • dart test -p vm test/unit/core/template test/unit/tooling/run_llama_cpp_chat_tests_script_test.dart --exclude-tags local-only+289: All tests passed!
  • Earlier local validation on this PR also included:
    • dart test test/unit/core/template/chat_template_engine_test.dart test/unit/core/template/template_render_context_test.dart+90: All tests passed! across VM and Chrome/Dart2Js
    • dart test -p vm test/integration/core/template --exclude-tags local-only+84 ~43: All tests passed!
    • dart --packages=.dart_tool/package_config.json /tmp/glm_ocr_issue156_smoke.dart with local GLM-OCR GGUF + mmproj + marker image → GLM_OCR_E2E_PASS (supportsVision=true, output included GLM OCR TEST / Issue 156: image marker)
    • dart run tool/testing/run_local_e2e.dart --scenario root-native-tool-e2e+1: All tests passed! with real Qwen3.5-0.8B GGUF prompt/template path
    • dart run tool/testing/run_local_e2e.dart --scenario root-template-e2e → partial local-only E2E: FunctionGemma Jinja-template real-model path passed, but the upstream llama.cpp chat-test subtests failed before this Dart refactor path because the local llama.cpp CMake checkout did not provide the requested test-chat-parser target and then libggml-cpu.dylib was not on the runtime loader path
    • npm run build from website/ → Docusaurus build succeeded
  • CI passed on latest head cb04d877b56a438cf112a1d0f4f25128fdf86525:
    • Analyze & Lint
    • Docs Build Check
    • Test Linux & Web (with Coverage)
    • Test Native (macos-latest)
    • Test Native (windows-latest)
    • Native Prompt Reuse Parity

Coverage note

  • Codecov reports project coverage passing.
  • Patch coverage is advisory and has previously reported below 90% on this PR; required CI/check runs are green.

Note: The local-only upstream template E2E caveat is an environment/upstream llama.cpp setup issue, not evidence of a Dart refactor regression. Generated .dart_tool/llama_cpp* artifacts were removed after local validation.

Additional local E2E confidence pass (2026-05-21)

Ran after the latest head cb04d877b56a438cf112a1d0f4f25128fdf86525 to validate real-model behavior beyond fixture/template coverage:

  • dart test -p vm,chrome test/unit/core/template/chat_template_engine_test.dart test/unit/core/template/template_render_context_test.dart test/integration/core/template --exclude-tags local-only+238: All tests passed!
  • dart run tool/testing/run_local_e2e.dart --scenario root-native-tool-e2e → real cached Qwen3.5-0.8B-Q4_K_M.gguf native chat-template/tool prompt path, +1: All tests passed!
  • dart run tool/testing/run_local_e2e.dart --scenario root-template-e2e → upstream llama.cpp chat-template E2E selection + full chat suite, +3: All tests passed!
  • dart run tool/testing/run_local_e2e.dart --scenario qwen35-multimodal-macos-repro with cached Qwen3.5-0.8B-Q4_K_M.gguf, Qwen3.5-0.8B-mmproj-F16.gguf, CPU backend, and image input → supportsVision: true, non-empty generation, +1: All tests passed!
  • Direct GLM-OCR native public-API smoke with /opt/UnitySrc/personal/llama/models/glm-ocr/GLM-OCR.i1-Q4_K_M.gguf, matching GLM-OCR.mmproj-Q8_0.gguf, and a deterministic text image containing GLM OCR TEST / Issue 156 image markersupportsVision: true, output recovered the marker text, GLM_OCR_E2E_PASS.

Copilot AI review requested due to automatic review settings May 20, 2026 14:35
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors chat-template preprocessing by removing the global TemplateWorkarounds mutation path and instead applying handler-specific, typed TemplateRenderContext serialization at the render-context boundary. This keeps typed multimodal message parts intact until final Jinja context construction, while making tool-call JSON shape transformations explicit per handler.

Changes:

  • Replaced TemplateWorkarounds with TemplateRenderContext + TemplateToolCallSerialization to control tool-call normalization/schema/content-embedding at render time.
  • Updated template handlers to build Jinja messages via templateMessages(...), and added per-format tool-call serialization policies.
  • Reworked/added unit tests to lock the tool-call serialization policy matrix and preserve multimodal/system-merge behaviors.

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
test/unit/core/template/template_workarounds_test.dart Removed tests tied to deleted TemplateWorkarounds.
test/unit/core/template/template_render_context_test.dart Added coverage for render-context serialization, system-merge behavior, and moved tool-call encoding.
test/unit/core/template/chat_template_engine_test.dart Added regression tests for content-only routing and the per-format serialization policy matrix; kept GLM-OCR marker coverage.
lib/src/core/template/template_workarounds.dart Removed global workaround/mutation implementation.
lib/src/core/template/template_render_context.dart Introduced render-context serialization utilities and TemplateToolCallSerialization policy.
lib/src/core/template/handlers/xiaomi_mimo_handler.dart Switched handler context to templateMessages(...).
lib/src/core/template/handlers/seed_oss_handler.dart Switched to templateMessages(...) and set normalizeOnly tool-call policy.
lib/src/core/template/handlers/qwen3_coder_xml_handler.dart Switched to templateMessages(...) and set normalizeOnly tool-call policy.
lib/src/core/template/handlers/nemotron_v2_handler.dart Switched handler context to templateMessages(...).
lib/src/core/template/handlers/mistral_handler.dart Switched to templateMessages(...) and set normalizeOnly tool-call policy.
lib/src/core/template/handlers/minimax_m2_handler.dart Switched to templateMessages(...) and set normalizeOnly tool-call policy.
lib/src/core/template/handlers/magistral_handler.dart Switched handler context to templateMessages(...).
lib/src/core/template/handlers/llama3_handler.dart Switched to templateMessages(...) and set normalizeOnly tool-call policy.
lib/src/core/template/handlers/kimi_k2_handler.dart Switched handler context to templateMessages(...).
lib/src/core/template/handlers/hermes_handler.dart Switched handler context to templateMessages(...).
lib/src/core/template/handlers/granite_handler.dart Switched to templateMessages(...) and set genericSchemaInContent policy.
lib/src/core/template/handlers/glm45_handler.dart Switched to templateMessages(...) and set normalizeOnly tool-call policy.
lib/src/core/template/handlers/generic_handler.dart Parameterized handler format/policy to support content-only & PEG routing without inheriting generic tool-call serialization.
lib/src/core/template/handlers/gemma_handler.dart Switched handler context to templateMessages(...).
lib/src/core/template/handlers/functionary_v32_handler.dart Switched handler context to templateMessages(...).
lib/src/core/template/handlers/functionary_v31_llama31_handler.dart Switched handler context to templateMessages(...).
lib/src/core/template/handlers/firefunction_v2_handler.dart Switched handler context to templateMessages(...).
lib/src/core/template/handlers/exaone_moe_handler.dart Switched handler context to templateMessages(...).
lib/src/core/template/handlers/deepseek_r1_handler.dart Switched handler context to templateMessages(...).
lib/src/core/template/handlers/command_r7b_handler.dart Switched to templateMessages(...) and set normalizeOnly tool-call policy.
lib/src/core/template/handlers/apriel15_handler.dart Switched handler context to templateMessages(...).
lib/src/core/template/handlers/apertus_handler.dart Switched handler context to templateMessages(...).
lib/src/core/template/chat_template_handler.dart Added handler-level tool-call serialization policy and centralized templateMessages(...) context building.
lib/src/core/template/chat_template_engine.dart Replaced system-message workaround call-site and adjusted handler routing to ensure PEG/content-only formats don’t inherit generic tool-call serialization.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lib/src/core/template/template_render_context.dart
Comment thread test/unit/core/template/template_render_context_test.dart
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 20, 2026

Codecov Report

❌ Patch coverage is 88.15166% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.67%. Comparing base (ce760d6) to head (cb04d87).

Files with missing lines Patch % Lines
lib/src/core/template/template_render_context.dart 82.40% 22 Missing ⚠️
...b/src/core/template/handlers/apriel15_handler.dart 0.00% 1 Missing ⚠️
...src/core/template/handlers/exaone_moe_handler.dart 0.00% 1 Missing ⚠️
...late/handlers/functionary_v31_llama31_handler.dart 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #158      +/-   ##
==========================================
- Coverage   78.67%   78.67%   -0.01%     
==========================================
  Files          76       76              
  Lines        9915     9910       -5     
==========================================
- Hits         7801     7797       -4     
+ Misses       2114     2113       -1     
Flag Coverage Δ
unittests 78.67% <88.15%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@leehack leehack force-pushed the refactor/template-render-context branch from a1e5182 to c0fe105 Compare May 20, 2026 14:47
@leehack leehack requested a review from Copilot May 20, 2026 14:48
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated no new comments.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 1 comment.

Comment thread lib/src/core/template/chat_template_handler.dart Outdated
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated no new comments.

leehack added 3 commits May 21, 2026 09:20
… E2E

Hardens llama_cpp_template_detection_integration_test against updated/new llama.cpp templates, adding Bielik, GLM-4.7, GigaChat, SmolLM3, LFM2, Qwen3.5, Reka, StepFun, DeepSeek-V3.2, Gemma-4, Granite-4.0, etc.

Also hardens run_llama_cpp_chat_tests.sh against target renames, dynamic library path lookup (DYLD_LIBRARY_PATH/LD_LIBRARY_PATH), and full test-chat server/mtmd build requirements, with script unit tests.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 34 out of 34 changed files in this pull request and generated no new comments.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 34 out of 34 changed files in this pull request and generated no new comments.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 34 out of 34 changed files in this pull request and generated no new comments.

@leehack leehack merged commit 8625cbf into main May 22, 2026
10 checks passed
@leehack leehack deleted the refactor/template-render-context branch May 22, 2026 01:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants