Skip to content

Decidable completeness + a soundness-ceiling audit for the TextMate generator (#51)#55

Merged
johnsoncodehk merged 14 commits into
masterfrom
tm-completeness-proof
Jun 21, 2026
Merged

Decidable completeness + a soundness-ceiling audit for the TextMate generator (#51)#55
johnsoncodehk merged 14 commits into
masterfrom
tm-completeness-proof

Conversation

@johnsoncodehk

@johnsoncodehk johnsoncodehk commented Jun 20, 2026

Copy link
Copy Markdown
Owner

Closes #51. Adds a mechanized, gated account of what src/gen-tm.ts (the TextMate generator) guarantees — and, precisely, what it does not. Two halves, both wired into npm run check.

Completeness (presence) — decidable

The generator's input is a closed, finite combinator algebra, so the obligation set is enumerable and presence completeness — every required construct emitted, reachable, visually scoped — is decidable per grammar. test/tm-completeness.ts decides it with no corpus (token census + structural keyword-literal discharge + repository reachability), and a 20-construct shape-robustness gate asserts each shape-detector still fires under every equivalent factoring (opt-tail / alt-split / via-ref / trailing-comma). test/tm-mutation.ts injects gaps to confirm the checker bites. COMPLETENESS.md is the proof spine + the fixed-denominator ledger.

Soundness — no ceiling, audited

Whether each present construct paints the right scope on all inputs is a separate axis (scope-gap / gap-ledger). One structural question about it is settled here by exhaustive audit: gen-tm has no expressiveness ceiling on well-formed input — every context-dependent scope channel is lex-local, bounded, or delimiter-carried recursion; none needs the unbounded non-delimiter context a ceiling would require, because the obligation space is itself TextMate-bounded. See COMPLETENESS.md §"Soundness — no ceiling".

Honest scope

Decidable per grammar and executed on the six shipped grammars, with the algebra closure as the inductive ∀-grammar backbone — not a closed-form ∀-grammar proof. Presence completeness is decided; the soundness half is strong evidence + a structural argument, not a proof. Ordering is decidable per grammar but not reached by the structural fold (a measured blind spot, not undecidable). COMPLETENESS.md states these bounds.

Fixes (each with a regression guard)

  • ~17 latent completeness gaps — fixed-window detector fragilities, sep-omission silent-drops, declared-scope punctuation in type-param regions, a co-blind markup census — all byte-identical on the six shipped grammars (latent: no shipped grammar triggered them).
  • 2 soundness over-accepts the audit derived — an enum brace-leak, a module/namespace/type declaration-keyword over-accept (module.exports) — root-caused in gen-tm, changing only the ts/tsx grammars, verified by witness + gates.

npm run check 40/40; shipped grammars byte-identical except the two intended soundness fixes.

Add a formal completeness proof for src/gen-tm.ts: for every grammar
expressible through the public src/api.ts combinators, every
TextMate-representable highlighting obligation is emitted and reachable.
This is the dual of the soundness ledger (KNOWN-GAPS.md, which finds
WRONG paints) — here we prove there are no MISSING ones.

The proof rests on the generator's input being a closed, finite algebra
(RuleExpr / TokenPattern), which makes "every obligation" enumerable and
reduces completeness to three mechanically-checked layers:

- closure — toRuleExpr, the token-pattern compiler, and the shared
  collectLiterals backbone are total over their unions, so no supported
  combinator shape is silently dropped;
- coverage — every non-skip token has a discharge path (the token
  census), and every content/keyword obligation leaf is painted
  (2433/2433 across the six grammars), on a fixed denominator;
- reachability — every emitted repository key is reachable from the root
  patterns or a declared export surface (#expression / canonicalRepoNames
  / aliasScopes); zero dead keys.

The Layer-A audit surfaced one latent silent-drop:
getTypeParamElementKeywords omitted `sep`, so a keyword in a sep-list
within a type-parameter element lost its keyword role inside `<…>`. Fixed
by recursing into sep.element (`not`/`ref` stay omitted on purpose — a
forbidden word / a constraint type's own keywords must not be hoisted);
the six shipped grammars are byte-identical (the drop is latent), and the
checker carries a biting regression guard.

Three sites that looked like TextMate impossibilities — variable-width
lookbehind for the cast/arrow value test, the balanced-paren arrow
confirm, and a regex after a control-flow head — were attacked and
refuted: each is expressible in vscode-oniguruma (verified), and the
fixed-width forms gen-tm emits are deliberate Onigmo-portability choices.
They are a soundness-precision axis, not a completeness gap.

COMPLETENESS.md is the proof spine; test/tm-completeness.ts mechanises it
(npm run completeness[:check]) and joins `npm run check` as a gate.
A passing checker means nothing if the checker is blind. Add
test/tm-mutation.ts: it injects a catalogue of known gaps into the emitted
grammar (drop a key / all of a token's includes, neuter a scope to the bare
root, add a dead key, a dangling include, mis-scope a token to a wrong role,
reorder two disambiguation patterns) and records which detector layer kills
each — measuring the detector's power instead of asserting it.

Measured: every PRESENCE / REACHABILITY gap is killed corpus-free (16/16,
12/16 by reachability / token-census / the new flat-token neuter check);
WRONG-ROLE gaps are caught only by a differential witness (presence ≠
correctness); ORDERING gaps are a measured blind spot — TextMate is
order-sensitive and pattern rank lives in the emitted artifact, not the
grammar algebra, so no corpus-free structural check reaches it. This is the
honest boundary, now empirical: the structural proof covers presence +
reachability; ordering / correctness are the soundness axis, reached only by
evaluation. The over-claim of an a-priori "no gap can hide" over the whole
gap space is dropped — COMPLETENESS.md states the bounded, measured claim.

The harness motivated one detector strengthening: tokenCensus now flags a
flat token NEUTERED to the bare root scope (an entry that exists but paints
inert document text), moving that gap class from differential-only to
corpus-free. Wired as a meta-gate in `npm run check`.
The keyword/operator obligation was the one class still checked by the
grammar-derived corpus (leaf coverage). Replace it with a structural,
a-priori discharge: `literalDischarge` confirms every alphabetic literal the
grammar consumes (collectLiterals over every rule + the prec/led tables)
appears, as a scoped word, in some REACHABLE pattern whose scope is a keyword
family — a finite scan of the emitted artifact, no corpus. 248/248 across the
six grammars; non-vacuous (stripping `class` from its patterns reports it
undischarged).

Completeness is now a decidable structural check end to end — token discharge
(census, incl. neuter) + keyword-literal discharge + repository reachability,
952/952 = 100%, no corpus. The leaf-coverage corpus pass is demoted to a
redundant differential cross-check on the soundness axis; `tm-mutation`'s
structural layer now also kills a dropped/neutered keyword corpus-free.

COMPLETENESS.md draws the line correctly: COMPLETENESS (present + reachable +
scoped) is decidable — finite G, finite gen-tm(G), an obligation taxonomy
bounded by TextMate's finite construct kinds, per-token discharge by
structural identity (the flat `match` IS the token's own pattern) — and ∀ G by
structural induction over the finite combinator algebra. SOUNDNESS (do the
present constructs paint correctly on all inputs — wrong-role, pattern
ordering) is the undecidable residual (CFG vs regex-stack-machine over infinite
input). The earlier "a-priori completeness is unavailable" was an
over-concession: that was soundness's wall, mistaken for completeness's.
An adversarial gap-hunt (5 agents over the classes the structural checker
does NOT cover) found that the getTypeParamElementKeywords `sep` drop had
un-fixed siblings in the same family:

- detectTypeParamConstraintKeywords.scanConstraint omitted `sep`, so a
  type-parameter CONSTRAINT keyword reached through a `&`/`,`-separated list
  was not collected — the .tsx generic-arrow⇄JSX no-comma disambiguation lost
  its constraint signal (mis-scoping the header as a JSX tag).
- detectDeclarations.containsBlockRef omitted `sep`, so a declaration whose
  brace body is reached through a `sep` was not seen as having a body — its
  #declaration-body member-scoping region was dropped.

Both recurse into `sep.element` now, mirroring the prior fix. Byte-identical
on all six shipped grammars (latent: every shipped grammar writes constraints
as `opt('extends', Type)` and block bodies as direct refs).

The hunt found 8 latent completeness gaps total; all were VERIFIED latent —
tokenizing the witnesses against the shipped grammars shows ternary, calls,
trailing-comma type params, and every prec operator are correctly scoped, so
the 0-gap soundness ledger is real, not corpus-blind. The remaining gaps are
detector SHAPE-FRAGILITY (fixed-offset window matching in detectTernary /
detectCallExpression / isAngleBracketSepRule misses equivalent factorings),
an unimplemented `rawBlock` config region, and punctuation-in-region — tracked
for follow-up; none triggers on a shipped grammar.
The fixed-window detector fragility the gap-hunt surfaced was a symptom; the
root cause is that the shape-detectors pattern-match RAW RuleExpr shapes, so an
equivalent factoring of the same construct (an `opt`-tail, a separate args
rule, a trailing comma) is a different shape and slips them. Adding a `sep` arm
per detector only widens the symptom patch — the next factoring still slips.

Fix the structural condition: match the NORMALISED form. expandAlts already
canonicalises alt-split / opt-tail / group / quantifier into the same flat
adjacency; route the three fragile detectors through it, plus FIRST-set for the
one ref-hidden case:

- detectTernary, isAngleBracketSepRule → expandAlts (opt-tail and trailing-comma
  factorings now reduce to the matched adjacency; `sep` stays opaque so the sep
  node survives).
- detectCallExpression → FIRST(next) instead of a literal `(`, so the args may be
  the inline `(` OR a separate `CallArgs` rule referenced after the callee.

Byte-identical on all six shipped grammars (they already write the canonical
factoring); the three latent gaps are now emitted for their equivalent forms
(verified: #ternary-expression / #function-call / #declaration-type-params).

And the detector is no longer blind to this class: a new shape-robustness gate
(test/tm-completeness.ts) asserts each construct emits its region for EVERY
equivalent factoring — it bites (3 failures) without the normalization. So the
class is now caught structurally, not just by an adversarial hunt.
A systematic shape-fragility audit (5 agents running the real emitter over
equivalent factorings of each detector's construct) found the fixed-window
fragility is broad, not three isolated cases. Fix four more by routing through
the normalized form, same pattern as detectTernary:

- detectConditionalType — ran its 7-window on raw r.body; now over expandAlts,
  so an `opt`-tail / grouped / alt-split conditional `?:` is detected.
- getTypeParamElementKeywords — early-out demanded an exact 3-item `'<' sep '>'`
  body; now scans the expandAlts branches for that adjacency, so a trailing
  `opt(',')` or alt-wrapped type-param list still hoists its constraint keyword.
- detectDeclarations.isBlockRule — matched only a raw-seq `{ … }` body; now
  EVERY expandAlts branch must be `{ … }`-bounded (`.every`, not `.some`, so a
  `Type` that is only SOMETIMES an object literal is not mis-read as a brace
  declaration — `.some` regressed `type X = …` to #declaration-body).
- detectJsx hasElementShape — matched `<`+ref only in a raw seq; now over the
  expandAlts branches, so an opt/alt/group factoring of the element qualifies.

Byte-identical on all six shipped grammars (verified the `.every` form after a
`.some` first attempt changed typescript/tsx output), and each fix is verified
to detect its previously-dropped factoring. Seven detectors are now shape-robust;
the YAML region detectors and the expression group are the remaining batch.
Reading each expression detector myself (the audit agent's output was
malformed): five matched a construct on the raw r.body, so an equivalent
factoring slips them. Route them through expandAlts, same pattern as
detectTernary/detectCallExpression:

- detectBareArrowParam — `ref '=>'`; an opt-tail arrow (`[x, opt('=>', body)]`)
  was dropped (verified: variable.parameter now emitted for that factoring).
- detectPropertyAccess — `'.'`/`'?.'` before a token ref.
- detectParenArrowParams + detectArrowParamDelims — the deliberate pair that
  read the same arrow param-list production; routed identically so they still
  cannot disagree.
- detectDirectParamKeywords — keyword directly before `(`; also recurse `sep`.

detectConstructorKeywords already expands (no change). Byte-identical on all six
shipped grammars. Twelve detectors are now shape-robust. The YAML region
detectors are deferred to a semantics-aware pass: their fixed shapes encode
DELIBERATE YAML semantics (detectFold stops at a leaf token and does NOT follow
a rule-ref by design — an Indent+rule-ref is a SIBLING node, not a fold), so the
audit's heuristic "follow the ref" fix would break the fold-vs-sibling meaning.
Not every fixed shape is fragility — some are intent.
)

Worked the YAML region detectors semantics-aware (not the audit's heuristic).
Key finding: the audit's "route through expandAlts" fix is WRONG for these —
they match STRUCTURAL nodes (a `(Newline item)*` quantifier, config-keyed
bracket pairs), and expandAlts EXPANDS the very quantifier they depend on.

- detectBlockSequence — the one genuine positional rigidity: it matched the
  `[item, (Newline item)*]` pattern only at items[0]/items[1]. Now scanned
  pairwise (any adjacent k), so a leading element before the sequence does not
  hide it. NOT routed through expandAlts (that would expand the quantifier).
  yaml byte-identical; the full yaml gate group (depth-witnesses, deepest-sibling,
  compact-nest-sites, flow-sites, blockscalar-depth, issue12) stays green.
- detectFold — its visit is ALREADY pairwise; its refsLeaf stopping at a leaf
  token (not following a rule-ref) is the DELIBERATE fold-vs-sibling distinction
  (an Indent+rule-ref is a sibling node). No change.
- detectExplicitKey — the indicator at items[0] is intrinsic (an explicit-key
  entry is headed by `?`); the inner already unwraps a quantifier. No change.
- detectFlowCollections — topLits is positional-agnostic and unwraps
  quantifier/group/sep, deliberately stopping at alt/ref to read THIS rule's own
  structure. No change.

So the fixed-window fragility class is closed: 13 detectors made shape-robust
where the rigidity was accidental; the rest is deliberate YAML semantics that
the heuristic over-flagged. Not every fixed shape is fragility.
getTypeParamElementKeywords collected only KEYWORD literals, so a literal the
grammar declares a scope for that is PUNCTUATION (e.g. a `&` scoped
punctuation.separator) inside a type-parameter element lost its scope inside
`<…>`. Collect any scope-declared literal now, and emit it with the right
boundary (`\b` for words, none for punctuation — `\b&\b` never matches). `=`
and `,` are skipped in the loop since the dedicated handlers emit them, so the
six shipped grammars stay byte-identical (verified the `=` double-emit and
excluded it); a scoped `&` in a type-param element is now scoped (verified).

This closes the last clean completeness gap from the hunt. The remaining two are
not clean completeness gaps: the symbolic-operator case is an ORDERING concern (a
short overridden op can shadow a longer non-overridden one across the separate
#operator-overrides / #operators patterns — the provably-hard ordering axis, and
the op IS scoped, just shadowed), and `rawBlock` is a declared-but-unimplemented
IndentConfig field (no shipped grammar sets it; implementing the verbatim region
speculatively, with no adopter to verify against, is deferred).
An adversarial review of the "proven no gaps" conclusion found real holes;
this acts on them.

- Markup co-blindness (the one fixable defect): tokenCensus discharged EVERY
  markup token via `if (g.markup) bump('markup-region')` with zero
  verification, so a markup token whose declared scope generateMarkupTm does
  not model (e.g. a `<?…?>` processing instruction) was reported discharged
  while the engine painted it the bare root. Now a markup token with an
  explicit `scope` must have that scope actually emitted, or it is an orphan
  (verified: the PI counterexample is now caught; html stays 7/7).

- COMPLETENESS.md corrections, all overclaims the review caught:
  - "per-token discharge by structural identity (match IS tokenPatternSource)"
    was false — the identifier match is widened to identPattern, and the census
    checks PRESENCE (a reachable non-root-scoped entry), not regex identity.
    Reworded to presence-not-identity.
  - "∀ G by structural induction" conflated the algebra CLOSURE (which is ∀ G)
    with the per-grammar DISCHARGE (executed on the shipped set). Separated.
  - "ordering is undecidable" was the project's own impossibility-without-proof
    trap: for a fixed G the pattern list is finite and the winner is a finite
    index read (gen-tm even sorts it deterministically). It is "not reachable by
    a corpus-free structural fold," a measurement limit, not undecidability.

Residual, honestly stated: the shape class is MEASURED not proven — the
robustness gate covers 3 of ~20 detectors, mutation testing mutates only flat
tokens (not shape regions), and detectAngleBracketCast keeps a latent
ref-factored fragility (type-cast fires on no shipped grammar). 81/81 · 40/40.
…posed (#51)

The shape-robustness gate covered 3 constructs (ternary / call / generic-type-
params). Extended it to 19 — one per gen-tm shape-detector — each asserting the
construct discharges its obligation for several EQUIVALENT factorings (canonical
/ opt-tail / alt-split / via-ref / trailing-comma). The builders were authored by
running the real emitter over each factoring; an `xfail` list records a factoring
a detector is known to drop, so a NEW drop OR a stale xfail (a landed fix) both go
red — never a silent false-green.

Building it out caught two residual fixed-shape fragilities (both latent — neither
construct fires on a shipped grammar — so both fixes are byte-identical on all six):

- detectAngleBracketCast dropped a cast whose `<Type>` head is its own rule (the
  `<`/`>` hidden behind a ref). Now resolves a ref to a `'<' type '>'` cast-head
  rule by name, mirroring detectCallExpression reaching its args through a rule.
- detectTypeParamConstraintKeywords extracted the constraint keyword only from a
  `?`-quantifier `[kw, ref]`, dropping `alt([…,kw,type],[…])` (optionality via an
  alt branch) and `opt(kw, sep(type))` (type behind a sep). Now reads the
  constraint as the optional `[kw, type]` segment by which one expandAlts branch
  extends a prefix-shorter sibling — uniform across opt / alt / sep, and a leading
  modifier (not a prefix extension) is still excluded. (A first attempt keyed on
  the follower being a type-flagged rule; the gate caught that it broke the
  canonical factoring when the constraint type is not type-flagged.)

19 constructs all green; the YAML/markup detectors are positive-controlled (their
fixed shapes are deliberate, so they guard the canonical form, not factorings).
40/40 · 96/96 completeness checks.
Rounds the gate to 20 constructs — one per shape-detector. detectJsx's
hasElementShape walks expandAlts branches for the `<`+ref element lead, so the
attribute list inline or wrapped in opt/alt both surface #jsx-element-in-
expression; verified robust. 97/97 completeness checks.
The soundness-ceiling audit (which found gen-tm has no expressiveness ceiling)
surfaced two over-accepts by derivation, not corpus — both real divergences from
the parser's scoping, both root-caused in gen-tm. Not byte-identical: the two
enum/namespace-bearing grammars (ts, tsx) change; the fix is verified by witness
+ gates + scope-gap, not by byte-identity.

- enum brace-leak: #enum-body was the lone emitBracketRegion call whose body
  lacked a self-balancing first include (its siblings #code-block /
  #declaration-body both carry one), so a nested `{…}` in a member initializer
  (`enum E { A = { x: 1 }, B }`) let the inner `}` close the enum region early and
  leak the following members out of enum scope. Prepend the shared #code-block
  balancing region (an inner brace is an ordinary expression block, not another
  enum body) so braces balance to any depth. `B` now scopes enummember; the
  shorthand-object key case stays correct.

- declaration-keyword over-accept: a `storage.type.*` keyword that is also a valid
  identifier (`module`, `namespace`, `type`, `interface`) was painted by the flat
  global match unconditionally, so `module.exports` / `namespace.foo` / `type.foo`
  mis-read the property-access base as the declaration keyword. Demote the whole
  class — the same contextual-keyword machinery gen-tm already uses for
  `as`/`keyof`/`public` — to an accessor-guarded `*-decl` rule (`(?!\s*(?:\.|\?\.))`),
  so a real declaration head still wins while a `.`/`?.`-adjacent base falls through
  to identifier scoping. Verified `module X {}` / `type X = …` / `namespace N {}`
  keep their keyword scope.

Both DERIVED (agnostic 9/9 — keyed on scope family + declaration/reserved sets, no
hardcoded word); npm run check 40/40; scope-gap ts unchanged (the corpus is blind
to both witnesses — exactly why the audit derived them).
Adds the symmetric half to the completeness spine: a "Soundness — no ceiling"
section stating the audited result that gen-tm has no TextMate expressiveness
ceiling on well-formed input — because its obligation space (lex-local +
delimiter-carried recursion, no general rule-context→scope channel) is itself
TextMate-bounded — with the honest limits (well-formed input only; strong
evidence + structural argument, not a closed-form ∀-grammar proof; the two
region-leak over-accepts the audit derived were fixable bugs, not ceilings).
@johnsoncodehk johnsoncodehk changed the title Prove TextMate-generator completeness (#51) Decidable completeness + a soundness-ceiling audit for the TextMate generator (#51) Jun 21, 2026
@johnsoncodehk johnsoncodehk merged commit 987847e into master Jun 21, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Prove full completeness of the TextMate generator

1 participant