Decidable completeness + a soundness-ceiling audit for the TextMate generator (#51) by johnsoncodehk · Pull Request #55 · johnsoncodehk/monogram

johnsoncodehk · 2026-06-20T05:36:27Z

Closes #51. Adds a mechanized, gated account of what src/gen-tm.ts (the TextMate generator) guarantees — and, precisely, what it does not. Two halves, both wired into npm run check.

Completeness (presence) — decidable

The generator's input is a closed, finite combinator algebra, so the obligation set is enumerable and presence completeness — every required construct emitted, reachable, visually scoped — is decidable per grammar. test/tm-completeness.ts decides it with no corpus (token census + structural keyword-literal discharge + repository reachability), and a 20-construct shape-robustness gate asserts each shape-detector still fires under every equivalent factoring (opt-tail / alt-split / via-ref / trailing-comma). test/tm-mutation.ts injects gaps to confirm the checker bites. COMPLETENESS.md is the proof spine + the fixed-denominator ledger.

Soundness — no ceiling, audited

Whether each present construct paints the right scope on all inputs is a separate axis (scope-gap / gap-ledger). One structural question about it is settled here by exhaustive audit: gen-tm has no expressiveness ceiling on well-formed input — every context-dependent scope channel is lex-local, bounded, or delimiter-carried recursion; none needs the unbounded non-delimiter context a ceiling would require, because the obligation space is itself TextMate-bounded. See COMPLETENESS.md §"Soundness — no ceiling".

Honest scope

Decidable per grammar and executed on the six shipped grammars, with the algebra closure as the inductive ∀-grammar backbone — not a closed-form ∀-grammar proof. Presence completeness is decided; the soundness half is strong evidence + a structural argument, not a proof. Ordering is decidable per grammar but not reached by the structural fold (a measured blind spot, not undecidable). COMPLETENESS.md states these bounds.

Fixes (each with a regression guard)

~17 latent completeness gaps — fixed-window detector fragilities, sep-omission silent-drops, declared-scope punctuation in type-param regions, a co-blind markup census — all byte-identical on the six shipped grammars (latent: no shipped grammar triggered them).
2 soundness over-accepts the audit derived — an enum brace-leak, a module/namespace/type declaration-keyword over-accept (module.exports) — root-caused in gen-tm, changing only the ts/tsx grammars, verified by witness + gates.

npm run check 40/40; shipped grammars byte-identical except the two intended soundness fixes.

Add a formal completeness proof for src/gen-tm.ts: for every grammar expressible through the public src/api.ts combinators, every TextMate-representable highlighting obligation is emitted and reachable. This is the dual of the soundness ledger (KNOWN-GAPS.md, which finds WRONG paints) — here we prove there are no MISSING ones. The proof rests on the generator's input being a closed, finite algebra (RuleExpr / TokenPattern), which makes "every obligation" enumerable and reduces completeness to three mechanically-checked layers: - closure — toRuleExpr, the token-pattern compiler, and the shared collectLiterals backbone are total over their unions, so no supported combinator shape is silently dropped; - coverage — every non-skip token has a discharge path (the token census), and every content/keyword obligation leaf is painted (2433/2433 across the six grammars), on a fixed denominator; - reachability — every emitted repository key is reachable from the root patterns or a declared export surface (#expression / canonicalRepoNames / aliasScopes); zero dead keys. The Layer-A audit surfaced one latent silent-drop: getTypeParamElementKeywords omitted `sep`, so a keyword in a sep-list within a type-parameter element lost its keyword role inside `<…>`. Fixed by recursing into sep.element (`not`/`ref` stay omitted on purpose — a forbidden word / a constraint type's own keywords must not be hoisted); the six shipped grammars are byte-identical (the drop is latent), and the checker carries a biting regression guard. Three sites that looked like TextMate impossibilities — variable-width lookbehind for the cast/arrow value test, the balanced-paren arrow confirm, and a regex after a control-flow head — were attacked and refuted: each is expressible in vscode-oniguruma (verified), and the fixed-width forms gen-tm emits are deliberate Onigmo-portability choices. They are a soundness-precision axis, not a completeness gap. COMPLETENESS.md is the proof spine; test/tm-completeness.ts mechanises it (npm run completeness[:check]) and joins `npm run check` as a gate.

A passing checker means nothing if the checker is blind. Add test/tm-mutation.ts: it injects a catalogue of known gaps into the emitted grammar (drop a key / all of a token's includes, neuter a scope to the bare root, add a dead key, a dangling include, mis-scope a token to a wrong role, reorder two disambiguation patterns) and records which detector layer kills each — measuring the detector's power instead of asserting it. Measured: every PRESENCE / REACHABILITY gap is killed corpus-free (16/16, 12/16 by reachability / token-census / the new flat-token neuter check); WRONG-ROLE gaps are caught only by a differential witness (presence ≠ correctness); ORDERING gaps are a measured blind spot — TextMate is order-sensitive and pattern rank lives in the emitted artifact, not the grammar algebra, so no corpus-free structural check reaches it. This is the honest boundary, now empirical: the structural proof covers presence + reachability; ordering / correctness are the soundness axis, reached only by evaluation. The over-claim of an a-priori "no gap can hide" over the whole gap space is dropped — COMPLETENESS.md states the bounded, measured claim. The harness motivated one detector strengthening: tokenCensus now flags a flat token NEUTERED to the bare root scope (an entry that exists but paints inert document text), moving that gap class from differential-only to corpus-free. Wired as a meta-gate in `npm run check`.

The keyword/operator obligation was the one class still checked by the grammar-derived corpus (leaf coverage). Replace it with a structural, a-priori discharge: `literalDischarge` confirms every alphabetic literal the grammar consumes (collectLiterals over every rule + the prec/led tables) appears, as a scoped word, in some REACHABLE pattern whose scope is a keyword family — a finite scan of the emitted artifact, no corpus. 248/248 across the six grammars; non-vacuous (stripping `class` from its patterns reports it undischarged). Completeness is now a decidable structural check end to end — token discharge (census, incl. neuter) + keyword-literal discharge + repository reachability, 952/952 = 100%, no corpus. The leaf-coverage corpus pass is demoted to a redundant differential cross-check on the soundness axis; `tm-mutation`'s structural layer now also kills a dropped/neutered keyword corpus-free. COMPLETENESS.md draws the line correctly: COMPLETENESS (present + reachable + scoped) is decidable — finite G, finite gen-tm(G), an obligation taxonomy bounded by TextMate's finite construct kinds, per-token discharge by structural identity (the flat `match` IS the token's own pattern) — and ∀ G by structural induction over the finite combinator algebra. SOUNDNESS (do the present constructs paint correctly on all inputs — wrong-role, pattern ordering) is the undecidable residual (CFG vs regex-stack-machine over infinite input). The earlier "a-priori completeness is unavailable" was an over-concession: that was soundness's wall, mistaken for completeness's.

An adversarial gap-hunt (5 agents over the classes the structural checker does NOT cover) found that the getTypeParamElementKeywords `sep` drop had un-fixed siblings in the same family: - detectTypeParamConstraintKeywords.scanConstraint omitted `sep`, so a type-parameter CONSTRAINT keyword reached through a `&`/`,`-separated list was not collected — the .tsx generic-arrow⇄JSX no-comma disambiguation lost its constraint signal (mis-scoping the header as a JSX tag). - detectDeclarations.containsBlockRef omitted `sep`, so a declaration whose brace body is reached through a `sep` was not seen as having a body — its #declaration-body member-scoping region was dropped. Both recurse into `sep.element` now, mirroring the prior fix. Byte-identical on all six shipped grammars (latent: every shipped grammar writes constraints as `opt('extends', Type)` and block bodies as direct refs). The hunt found 8 latent completeness gaps total; all were VERIFIED latent — tokenizing the witnesses against the shipped grammars shows ternary, calls, trailing-comma type params, and every prec operator are correctly scoped, so the 0-gap soundness ledger is real, not corpus-blind. The remaining gaps are detector SHAPE-FRAGILITY (fixed-offset window matching in detectTernary / detectCallExpression / isAngleBracketSepRule misses equivalent factorings), an unimplemented `rawBlock` config region, and punctuation-in-region — tracked for follow-up; none triggers on a shipped grammar.

The fixed-window detector fragility the gap-hunt surfaced was a symptom; the root cause is that the shape-detectors pattern-match RAW RuleExpr shapes, so an equivalent factoring of the same construct (an `opt`-tail, a separate args rule, a trailing comma) is a different shape and slips them. Adding a `sep` arm per detector only widens the symptom patch — the next factoring still slips. Fix the structural condition: match the NORMALISED form. expandAlts already canonicalises alt-split / opt-tail / group / quantifier into the same flat adjacency; route the three fragile detectors through it, plus FIRST-set for the one ref-hidden case: - detectTernary, isAngleBracketSepRule → expandAlts (opt-tail and trailing-comma factorings now reduce to the matched adjacency; `sep` stays opaque so the sep node survives). - detectCallExpression → FIRST(next) instead of a literal `(`, so the args may be the inline `(` OR a separate `CallArgs` rule referenced after the callee. Byte-identical on all six shipped grammars (they already write the canonical factoring); the three latent gaps are now emitted for their equivalent forms (verified: #ternary-expression / #function-call / #declaration-type-params). And the detector is no longer blind to this class: a new shape-robustness gate (test/tm-completeness.ts) asserts each construct emits its region for EVERY equivalent factoring — it bites (3 failures) without the normalization. So the class is now caught structurally, not just by an adversarial hunt.

A systematic shape-fragility audit (5 agents running the real emitter over equivalent factorings of each detector's construct) found the fixed-window fragility is broad, not three isolated cases. Fix four more by routing through the normalized form, same pattern as detectTernary: - detectConditionalType — ran its 7-window on raw r.body; now over expandAlts, so an `opt`-tail / grouped / alt-split conditional `?:` is detected. - getTypeParamElementKeywords — early-out demanded an exact 3-item `'<' sep '>'` body; now scans the expandAlts branches for that adjacency, so a trailing `opt(',')` or alt-wrapped type-param list still hoists its constraint keyword. - detectDeclarations.isBlockRule — matched only a raw-seq `{ … }` body; now EVERY expandAlts branch must be `{ … }`-bounded (`.every`, not `.some`, so a `Type` that is only SOMETIMES an object literal is not mis-read as a brace declaration — `.some` regressed `type X = …` to #declaration-body). - detectJsx hasElementShape — matched `<`+ref only in a raw seq; now over the expandAlts branches, so an opt/alt/group factoring of the element qualifies. Byte-identical on all six shipped grammars (verified the `.every` form after a `.some` first attempt changed typescript/tsx output), and each fix is verified to detect its previously-dropped factoring. Seven detectors are now shape-robust; the YAML region detectors and the expression group are the remaining batch.

Reading each expression detector myself (the audit agent's output was malformed): five matched a construct on the raw r.body, so an equivalent factoring slips them. Route them through expandAlts, same pattern as detectTernary/detectCallExpression: - detectBareArrowParam — `ref '=>'`; an opt-tail arrow (`[x, opt('=>', body)]`) was dropped (verified: variable.parameter now emitted for that factoring). - detectPropertyAccess — `'.'`/`'?.'` before a token ref. - detectParenArrowParams + detectArrowParamDelims — the deliberate pair that read the same arrow param-list production; routed identically so they still cannot disagree. - detectDirectParamKeywords — keyword directly before `(`; also recurse `sep`. detectConstructorKeywords already expands (no change). Byte-identical on all six shipped grammars. Twelve detectors are now shape-robust. The YAML region detectors are deferred to a semantics-aware pass: their fixed shapes encode DELIBERATE YAML semantics (detectFold stops at a leaf token and does NOT follow a rule-ref by design — an Indent+rule-ref is a SIBLING node, not a fold), so the audit's heuristic "follow the ref" fix would break the fold-vs-sibling meaning. Not every fixed shape is fragility — some are intent.

) Worked the YAML region detectors semantics-aware (not the audit's heuristic). Key finding: the audit's "route through expandAlts" fix is WRONG for these — they match STRUCTURAL nodes (a `(Newline item)*` quantifier, config-keyed bracket pairs), and expandAlts EXPANDS the very quantifier they depend on. - detectBlockSequence — the one genuine positional rigidity: it matched the `[item, (Newline item)*]` pattern only at items[0]/items[1]. Now scanned pairwise (any adjacent k), so a leading element before the sequence does not hide it. NOT routed through expandAlts (that would expand the quantifier). yaml byte-identical; the full yaml gate group (depth-witnesses, deepest-sibling, compact-nest-sites, flow-sites, blockscalar-depth, issue12) stays green. - detectFold — its visit is ALREADY pairwise; its refsLeaf stopping at a leaf token (not following a rule-ref) is the DELIBERATE fold-vs-sibling distinction (an Indent+rule-ref is a sibling node). No change. - detectExplicitKey — the indicator at items[0] is intrinsic (an explicit-key entry is headed by `?`); the inner already unwraps a quantifier. No change. - detectFlowCollections — topLits is positional-agnostic and unwraps quantifier/group/sep, deliberately stopping at alt/ref to read THIS rule's own structure. No change. So the fixed-window fragility class is closed: 13 detectors made shape-robust where the rigidity was accidental; the rest is deliberate YAML semantics that the heuristic over-flagged. Not every fixed shape is fragility.

getTypeParamElementKeywords collected only KEYWORD literals, so a literal the grammar declares a scope for that is PUNCTUATION (e.g. a `&` scoped punctuation.separator) inside a type-parameter element lost its scope inside `<…>`. Collect any scope-declared literal now, and emit it with the right boundary (`\b` for words, none for punctuation — `\b&\b` never matches). `=` and `,` are skipped in the loop since the dedicated handlers emit them, so the six shipped grammars stay byte-identical (verified the `=` double-emit and excluded it); a scoped `&` in a type-param element is now scoped (verified). This closes the last clean completeness gap from the hunt. The remaining two are not clean completeness gaps: the symbolic-operator case is an ORDERING concern (a short overridden op can shadow a longer non-overridden one across the separate #operator-overrides / #operators patterns — the provably-hard ordering axis, and the op IS scoped, just shadowed), and `rawBlock` is a declared-but-unimplemented IndentConfig field (no shipped grammar sets it; implementing the verbatim region speculatively, with no adopter to verify against, is deferred).

An adversarial review of the "proven no gaps" conclusion found real holes; this acts on them. - Markup co-blindness (the one fixable defect): tokenCensus discharged EVERY markup token via `if (g.markup) bump('markup-region')` with zero verification, so a markup token whose declared scope generateMarkupTm does not model (e.g. a `<?…?>` processing instruction) was reported discharged while the engine painted it the bare root. Now a markup token with an explicit `scope` must have that scope actually emitted, or it is an orphan (verified: the PI counterexample is now caught; html stays 7/7). - COMPLETENESS.md corrections, all overclaims the review caught: - "per-token discharge by structural identity (match IS tokenPatternSource)" was false — the identifier match is widened to identPattern, and the census checks PRESENCE (a reachable non-root-scoped entry), not regex identity. Reworded to presence-not-identity. - "∀ G by structural induction" conflated the algebra CLOSURE (which is ∀ G) with the per-grammar DISCHARGE (executed on the shipped set). Separated. - "ordering is undecidable" was the project's own impossibility-without-proof trap: for a fixed G the pattern list is finite and the winner is a finite index read (gen-tm even sorts it deterministically). It is "not reachable by a corpus-free structural fold," a measurement limit, not undecidability. Residual, honestly stated: the shape class is MEASURED not proven — the robustness gate covers 3 of ~20 detectors, mutation testing mutates only flat tokens (not shape regions), and detectAngleBracketCast keeps a latent ref-factored fragility (type-cast fires on no shipped grammar). 81/81 · 40/40.

…posed (#51) The shape-robustness gate covered 3 constructs (ternary / call / generic-type- params). Extended it to 19 — one per gen-tm shape-detector — each asserting the construct discharges its obligation for several EQUIVALENT factorings (canonical / opt-tail / alt-split / via-ref / trailing-comma). The builders were authored by running the real emitter over each factoring; an `xfail` list records a factoring a detector is known to drop, so a NEW drop OR a stale xfail (a landed fix) both go red — never a silent false-green. Building it out caught two residual fixed-shape fragilities (both latent — neither construct fires on a shipped grammar — so both fixes are byte-identical on all six): - detectAngleBracketCast dropped a cast whose `<Type>` head is its own rule (the `<`/`>` hidden behind a ref). Now resolves a ref to a `'<' type '>'` cast-head rule by name, mirroring detectCallExpression reaching its args through a rule. - detectTypeParamConstraintKeywords extracted the constraint keyword only from a `?`-quantifier `[kw, ref]`, dropping `alt([…,kw,type],[…])` (optionality via an alt branch) and `opt(kw, sep(type))` (type behind a sep). Now reads the constraint as the optional `[kw, type]` segment by which one expandAlts branch extends a prefix-shorter sibling — uniform across opt / alt / sep, and a leading modifier (not a prefix extension) is still excluded. (A first attempt keyed on the follower being a type-flagged rule; the gate caught that it broke the canonical factoring when the constraint type is not type-flagged.) 19 constructs all green; the YAML/markup detectors are positive-controlled (their fixed shapes are deliberate, so they guard the canonical form, not factorings). 40/40 · 96/96 completeness checks.

Rounds the gate to 20 constructs — one per shape-detector. detectJsx's hasElementShape walks expandAlts branches for the `<`+ref element lead, so the attribute list inline or wrapped in opt/alt both surface #jsx-element-in- expression; verified robust. 97/97 completeness checks.

The soundness-ceiling audit (which found gen-tm has no expressiveness ceiling) surfaced two over-accepts by derivation, not corpus — both real divergences from the parser's scoping, both root-caused in gen-tm. Not byte-identical: the two enum/namespace-bearing grammars (ts, tsx) change; the fix is verified by witness + gates + scope-gap, not by byte-identity. - enum brace-leak: #enum-body was the lone emitBracketRegion call whose body lacked a self-balancing first include (its siblings #code-block / #declaration-body both carry one), so a nested `{…}` in a member initializer (`enum E { A = { x: 1 }, B }`) let the inner `}` close the enum region early and leak the following members out of enum scope. Prepend the shared #code-block balancing region (an inner brace is an ordinary expression block, not another enum body) so braces balance to any depth. `B` now scopes enummember; the shorthand-object key case stays correct. - declaration-keyword over-accept: a `storage.type.*` keyword that is also a valid identifier (`module`, `namespace`, `type`, `interface`) was painted by the flat global match unconditionally, so `module.exports` / `namespace.foo` / `type.foo` mis-read the property-access base as the declaration keyword. Demote the whole class — the same contextual-keyword machinery gen-tm already uses for `as`/`keyof`/`public` — to an accessor-guarded `*-decl` rule (`(?!\s*(?:\.|\?\.))`), so a real declaration head still wins while a `.`/`?.`-adjacent base falls through to identifier scoping. Verified `module X {}` / `type X = …` / `namespace N {}` keep their keyword scope. Both DERIVED (agnostic 9/9 — keyed on scope family + declaration/reserved sets, no hardcoded word); npm run check 40/40; scope-gap ts unchanged (the corpus is blind to both witnesses — exactly why the audit derived them).

Adds the symmetric half to the completeness spine: a "Soundness — no ceiling" section stating the audited result that gen-tm has no TextMate expressiveness ceiling on well-formed input — because its obligation space (lex-local + delimiter-carried recursion, no general rule-context→scope channel) is itself TextMate-bounded — with the honest limits (well-formed input only; strong evidence + structural argument, not a closed-form ∀-grammar proof; the two region-leak over-accepts the audit derived were fixable bugs, not ceilings).

johnsoncodehk added 14 commits June 20, 2026 13:35

johnsoncodehk changed the title ~~Prove TextMate-generator completeness (#51)~~ Decidable completeness + a soundness-ceiling audit for the TextMate generator (#51) Jun 21, 2026

johnsoncodehk merged commit 987847e into master Jun 21, 2026
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Decidable completeness + a soundness-ceiling audit for the TextMate generator (#51)#55

Decidable completeness + a soundness-ceiling audit for the TextMate generator (#51)#55
johnsoncodehk merged 14 commits into
masterfrom
tm-completeness-proof

johnsoncodehk commented Jun 20, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

johnsoncodehk commented Jun 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Completeness (presence) — decidable

Soundness — no ceiling, audited

Honest scope

Fixes (each with a regression guard)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

johnsoncodehk commented Jun 20, 2026 •

edited

Loading