Add consistency report and project glossary by jserv · Pull Request #82 · sysprog21/zhtw-mcp

jserv · 2026-05-06T05:55:56Z

A real-world deployment study [1] reported mainland-Chinese terms slipping past the linter in published zh-TW articles, blockquote citation contexts producing ~50 false positives across a 72-article corpus, and ASCII quotes auto-converted to 「」 inside YAML frontmatter breaking downstream parsers.

User-facing additions:

- '--consistency' reports mixed regional usage of one concept (both 線程 and 執行緒 in the same document). Groups by the rule's "english" anchor; skips TM-suppressed terms.
- '--exempt-blockquotes' (CLI + '[markdown]' config) excludes pulldown-cmark 'Tag::BlockQuote' ranges from scanning. Off by default: adopted blockquote prose is real content.
- YAML frontmatter preserves ASCII '"' / ''' scalar delimiters. Body prose still converts to 「」.
- '[glossary]' section in '.zhtw-mcp.toml': banned / preferred / proper_nouns lists. Banned terms inject synthetic Errors that TM cannot downgrade; proper_nouns suppress matching issues; both honor exclusion zones.
- Per-rule 'editorial_confidence' (low / medium / high) flows through issue inflation into MCP explain output. Low forces auto_fix_safe = false and needs_review = true. 優化, 算法, 場景 tagged low because both regional forms are valid zh-TW.
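
The grouping behind '--consistency' can be pictured with a small sketch. This is not the code in src/engine/consistency.rs; the `Hit` struct and `mixed_usage` function are hypothetical names, and the sketch assumes each match already carries the "english" anchor of the rule that produced it.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A term occurrence found in a document, tagged with the English anchor
/// of the rule that matched it. Struct and field names are hypothetical.
struct Hit<'a> {
    english: &'a str,
    term: &'a str,
}

/// Returns every English anchor that appears under more than one
/// distinct surface form, together with those forms.
fn mixed_usage<'a>(hits: &[Hit<'a>]) -> BTreeMap<&'a str, BTreeSet<&'a str>> {
    let mut groups: BTreeMap<&'a str, BTreeSet<&'a str>> = BTreeMap::new();
    for hit in hits {
        groups.entry(hit.english).or_default().insert(hit.term);
    }
    // Keep only concepts written in two or more different ways.
    groups.retain(|_, forms| forms.len() > 1);
    groups
}
```

Given hits for 線程 and 執行緒 under the anchor "thread" plus a single 元資料 hit under "metadata", only the "thread" group survives the filter, which matches the mixed-usage case described above.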

Calque-audit refinements:

- 消息 gains positional_clues; 好消息 / 壞消息 / 消息來源 no longer fire.
- Symmetric 元資料 rule mirrors 元數據; both use to: [] plus english: "metadata", surfacing the English original as the preferred form. 詮釋資料 and 後設資料 (NAER terminology bank) remain unflagged as acceptable zh-TW alternatives.
- Real-world regression fixture pins the 14 documented blind-spot terms.

[1] https://ai-muninn.com/zh-TW/blog/zhtw-mcp-calque-blindspot-sweep

Summary by cubic

Adds a document-wide terminology consistency report and a project glossary to enforce preferred terms. Reduces false positives in blockquote citations and preserves ASCII quotes in YAML frontmatter.

New Features
- --consistency: groups by a rule’s english anchor to catch mixed regional terms in one doc; ignores TM‑suppressed issues.
- Project glossary in .zhtw-mcp.toml ([glossary]): banned (always error), preferred (guides suggestions), proper_nouns (suppress); all honor exclusion zones.
- Blockquote exemption: --exempt-blockquotes and [markdown].exempt_blockquotes = true exclude Markdown blockquotes from scanning (off by default).
- Per‑rule editorial_confidence (low/medium/high): included in diagnostics; low forces auto_fix_safe = false and needs_review = true.
Bug Fixes
- YAML frontmatter now preserves ASCII " and ' scalar delimiters; body prose still converts to 「」.
- Calque rules: added positional clues for 消息 (no more false positives like 好消息/壞消息/消息來源); added a symmetric 元資料 rule mirroring 元數據 with english: "metadata" while keeping 詮釋資料/後設資料 acceptable.
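
The editorial_confidence rule stated above (low forces auto_fix_safe = false and needs_review = true) can be sketched as a small pure function. The type and function names are hypothetical, and leaving the flags unchanged for medium and high is an assumption; the PR text only specifies the low case.

```rust
/// Editorial confidence levels named in the PR description.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum EditorialConfidence {
    Low,
    Medium,
    High,
}

/// Hypothetical slice of an issue record; only the two flags the PR
/// mentions are modelled.
#[derive(Debug, PartialEq, Eq)]
struct IssueFlags {
    auto_fix_safe: bool,
    needs_review: bool,
}

/// Low confidence always disables auto-fix and requests review.
/// Assumption: medium and high leave the incoming flags untouched.
fn apply_confidence(flags: IssueFlags, confidence: EditorialConfidence) -> IssueFlags {
    match confidence {
        EditorialConfidence::Low => IssueFlags {
            auto_fix_safe: false,
            needs_review: true,
        },
        EditorialConfidence::Medium | EditorialConfidence::High => flags,
    }
}
```

Under this sketch, a rule such as 優化 tagged low can never be auto-fixed, regardless of what the rule itself declares.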

Written for commit 269c8dd. Summary will update on new commits.

cubic-dev-ai

7 issues found across 29 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/rules/glossary.rs">

<violation number="1" location="src/rules/glossary.rs:86">
P1: Skipping synthetic banned issues when a regular issue already exists can let TM downgrade the only report, breaking the intended `banned > TM` precedence.</violation>
</file>

<file name="src/engine/consistency.rs">

<violation number="1" location="src/engine/consistency.rs:148">
P1: Do not bypass group matching when only one term group exists; it can select unrelated glossary terms and create false consistency diagnostics.</violation>
</file>

<file name="src/mcp/tools.rs">

<violation number="1" location="src/mcp/tools.rs:853">
P1: Fix-mode ordering lets TM downgrade glossary-banned synthetic errors, breaking the intended `banned > TM` precedence.</violation>

<violation number="2" location="src/mcp/tools.rs:1083">
P2: `tools/list` schema was not updated for the new `exempt_blockquotes`/`glossary`/`consistency` arguments, causing API contract drift.</violation>
</file>

<file name="src/main.rs">

<violation number="1" location="src/main.rs:869">
P2: The new `--exempt-blockquotes` mode is not represented in scan-cache keys, so cached results can be incorrect when toggling the flag.</violation>

<violation number="2" location="src/main.rs:1194">
P1: Cache-hit paths can drop source text needed by the new glossary/consistency features, causing missed findings.</violation>
</file>

<file name="tests/realworld_calques.rs">

<violation number="1" location="tests/realworld_calques.rs:62">
P2: Match on containment here so the collocation regression is caught even when the scanner reports the full phrase instead of the bare term.</violation>
</file>

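
Two of the P1 findings concern the intended `banned > TM` precedence. One way to make that precedence independent of pass ordering is to tag synthetic banned errors and have the TM pass skip them. The sketch below is only an illustration of that idea: `Issue`, `from_banned_glossary`, `apply_tm_downgrade`, and the `Info` downgrade target are all hypothetical and are not taken from src/rules/glossary.rs or src/mcp/tools.rs.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Severity {
    Info,
    Error,
}

/// Hypothetical issue record. `from_banned_glossary` marks the synthetic
/// errors injected for `[glossary]` banned terms.
#[derive(Debug)]
struct Issue {
    term: String,
    severity: Severity,
    from_banned_glossary: bool,
}

/// Downgrades issues whose term is covered by TM, but never touches
/// glossary-banned errors.
fn apply_tm_downgrade(issues: &mut [Issue], tm_terms: &[&str]) {
    for issue in issues.iter_mut() {
        if issue.from_banned_glossary {
            continue; // banned > TM: leave the Error in place
        }
        if tm_terms.contains(&issue.term.as_str()) {
            issue.severity = Severity::Info;
        }
    }
}
```

With a guard like this, a banned error stays an Error whether the TM pass runs before or after glossary injection. It also suggests always emitting the synthetic issue rather than skipping it when a regular issue for the same span exists, since the regular issue remains eligible for downgrade.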

cubic-dev-ai · 2026-05-06T06:02:21Z

    if params.relaxed {
        cfg = cfg.with_relaxed();
    }
+    if params.exempt_blockquotes {
+        cfg = cfg.with_exempt_blockquotes(true);
+    }

P2: The new --exempt-blockquotes mode is not represented in scan-cache keys, so cached results can be incorrect when toggling the flag.

Prompt for AI agents

Check if this issue is valid; if so, understand the root cause and fix it. At src/main.rs, line 869:

<comment>The new `--exempt-blockquotes` mode is not represented in scan-cache keys, so cached results can be incorrect when toggling the flag.</comment>

<file context>
@@ -825,6 +866,9 @@ fn run_lint_batch(params: &LintBatchParams<'_>) -> Result<()> {
     if params.relaxed {
         cfg = cfg.with_relaxed();
     }
+    if params.exempt_blockquotes {
+        cfg = cfg.with_exempt_blockquotes(true);
+    }
</file context>
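
A minimal sketch of the kind of fix this comment asks for: fold every result-affecting option into the cache key so that toggling a flag cannot hit a stale entry. The `ScanOptions` struct and `cache_key` function are hypothetical; the project's actual scan-cache key format is not shown in this thread.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical subset of the scan options that change lint results.
#[derive(Hash)]
struct ScanOptions {
    relaxed: bool,
    exempt_blockquotes: bool,
}

/// Derives a cache key from the file content and every result-affecting
/// option, so toggling a flag produces a different key.
fn cache_key(content: &str, options: &ScanOptions) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    options.hash(&mut hasher);
    hasher.finish()
}
```

`DefaultHasher` is used only to keep the sketch dependency-free. Its algorithm is not guaranteed to stay the same across Rust releases, so a cache persisted on disk would want a hash with a stable definition.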


cubic-dev-ai Bot reviewed May 6, 2026

jserv force-pushed the refine branch from cfb7884 to 269c8dd May 6, 2026 09:07

jserv merged commit cbddee7 into main May 6, 2026
4 checks passed

jserv deleted the refine branch May 6, 2026 09:22

jserv merged 1 commit into main from refine
