Skip to content

Rebuild cache: sound type-aware invalidation via two-layer scheme#87

Merged
johnsoncodehk merged 33 commits into
masterfrom
refactor/rebuild-cache
May 2, 2026
Merged

Rebuild cache: sound type-aware invalidation via two-layer scheme#87
johnsoncodehk merged 33 commits into
masterfrom
refactor/rebuild-cache

Conversation

@johnsoncodehk
Copy link
Copy Markdown
Owner

@johnsoncodehk johnsoncodehk commented May 2, 2026

Why

3.1.0's per-file mtime cache has two problems:

  1. Unsound for type-aware rules. Edit an ambient .d.ts and the consumer file's mtime hasn't moved — the previous run's diagnostics get re-served against types that have changed under them.
  2. No actual warm benefit on type-aware rules. The cache memoises diagnostic output but the lint loop still drives the type checker per file. On Dify (5867 files, one type-aware rule) master shows warm == cold == ~12s.

This PR rebuilds the cache as two layers — one per soundness model.

How

layer scope invalidation key
1 syntactic rules per-file mtime
2 type-aware rules TS BuilderProgram affected-file diff, persisted via TS internal tsBuildInfoText round-trip

A runtime probe on RuleContext.program flips a per-rule flag the first time a rule reads it, classifying it type-aware (sticky across sessions). Cache file at os.tmpdir()/tsslint-cache/<tsslint-version>/<ts-version>/<sha256(...)>.cache.json with atomic write + deep shape validation. Wrong miss > wrong hit on every failure path.

Layer 2's cross-session state rides on three TS internal APIs (getBuildInfo, createBuilderProgramUsingIncrementalBuildInfo, BuilderProgram.emitBuildInfo). Each is feature-detected + try/catched, with a yellow warn line on stderr if any are missing — naming the TS version, the missing API, and the consequence. Normal cold-start paths stay silent.

Cache logic lives entirely in packages/cli/lib/{cache,cache-flow,incremental-state}.ts. Core only exposes skipRules + hasFixForDiagnostic + getTypeAwareRules — it has no knowledge of mtime, BuilderProgram state, or on-disk format.

Numbers

Dify web/, 5867 files, react-x/no-leaked-conditional-rendering. /usr/bin/time -p, same machine, cache wiped before each cold measurement, three warm samples for stability.

run master (3.1.0) this PR delta
cold 12.45s 12.22s ≈ equal
warm 12.22–12.59s 5.96–6.97s ~2× faster
--force ≡ cold 11.91s ≈ master cold

The win is warm: master's mtime cache only memoises output, so type-aware rules still drive the type checker every run. Layer 2 short-circuits the type-aware lint pass when the file's transitive deps haven't moved. Cold matches master — the BP setup + drain is overhead-free in practice (drain uses ignoreSourceFile to skip diagnostic compute). Warm bottoms out at the LS Program build floor; daemonisation is the only path below ~6s for a cross-process CLI.

Other changes

  • --force — opts out by clearing the loaded cache, so the same code path runs cold. Summary surfaces (use --force to ignore cache) only when the cache had effect.
  • --list-rules — debug aid, prints rule classification (type-aware vs syntactic) after the run. Both section headers always emit even at count 0 so the format is stable.
  • Compiler options — CLI sets incremental: true + a synthetic tsBuildInfoFile so BuilderProgram.emitBuildInfo produces content. Verified via spike that this doesn't change any TS diagnostic output (semantic, syntactic, suggestion all identical with vs. without).
  • Hardening (post-review): worker module state cleared between projects, isCacheData deep-validates files entries, tsBuildInfoText capped at 64MB with warn on overflow, summary→info blank-line invariant pinned by unit tests.

Test plan

  • pnpm build clean
  • 151 cross-package checks pass — 8 suites: core/{builder-program-poc,probe,skip-rules} (30) + cli/{cache,cache-flow,incremental-state,integration,render} (121)
  • Self-lint clean: tsslint --project packages/{cli,core,config,types,compat-eslint,typescript-plugin}/tsconfig.json reports 0 messages across 51 files
  • Bench on Dify (numbers above), three warm samples for stability

The per-file mtime cache shipped in 3.1.0 is unsound for type-aware
rules — see commit 4f4f923 for context. The minimal patch in 4f4f923
restored the 3.0.4 invariant (don't cache type-aware rule results) but
type-aware rules then pay full cost on every run, which is the workload
TSSLint primarily serves.

Rather than evolve the existing tuple-shaped cache, remove it entirely
and rebuild from a sound foundation. Plan in `packages/cli/CACHE.md`.

This commit deletes:

- `packages/cli/lib/cache.ts` — load/save + path key derivation
- `Reporter.withoutCache()` from `@tsslint/types` — only meaningful
  when there's a cache to opt out of
- `core.FileLintCache` type and the `cache?` parameter throughout
  `core.lint`, `core.getCodeFixes`, `core.getRules`, `core.getConfigs`
- The getter probe + `typeAwareRules` set added in 4f4f923 — needed
  only because there was a cache to gate; bring it back in step 1
  of the new design
- The `cache-typeaware.test.ts` suite — testing removed code
- The `--force` CLI flag and its summary hint
- `minimatchResult` per-pattern cache — was useful only as a third
  tuple slot; cost is microseconds per file, can be reintroduced
  later as a flat top-level key if profiling justifies it

CLI smoke: `pnpm -r exec tsc --build` clean across all packages.
Compat-eslint pipeline + lazy-estree tests pass unchanged (the
`withoutCache()` mocks in compat-pipeline.test.ts are excess
properties on object literals; TS allows, runtime no-ops since
no caller invokes them after this commit).

Net: -442 LOC.
Captures the design for the cache rewrite in `packages/cli/CACHE.md`:

- Two layers gated by an `--incremental` flag
  - Layer 1: per-file mtime cache for syntactic rules, runtime
    getter probe to classify type-aware rules (sticky), persist
    classification across sessions. Always on.
  - Layer 2: `BuilderProgram`-based affected-file tracking for
    type-aware rule diagnostics, cross-session state via TSSLint-
    managed `.tsbuildinfo`. Opt-in until perf is measured.
- New cache file shape: object form with `version`, `ruleModes`,
  `files.{mtime, rules}`. Drops the 3.0.x tuple format and the
  `minimatchResult` slot.
- Implementation order, edge cases (incremental + noEmit + tsconfig
  collision, .tsbuildinfo format stability across TS majors),
  backwards compat (none — path key includes pkg.version), and
  open questions (default-on vs opt-in for layer 2, ruleModes
  expiry policy).

This is a working doc; expected to evolve as implementation lands.
Ten items to run through before touching cache code, grouped by
soundness / cache key / implementation / operational. Each item
corresponds to a real failure mode the old design hit or could have
hit (silent stale on global type changes, half-written JSON on
SIGINT, missing TS-version in cache key, etc.).

Also captures the realistic upper bound from profile data: cache
caps at ~40% wall-time reduction on warm runs because TS Program
build is unavoidable.
First step of the cache rebuild plan. Adds the foundation but no
cache wiring yet — see packages/cli/CACHE.md for the full design.

createLinter now:
- Accepts an optional `initialTypeAwareRules` iterable. The CLI will
  load this from the cache file's `ruleModes` map on session start
  so a rule classified type-aware in any prior session stays
  classified — closes the cold-session-with-stale-cache hole 3.0.4
  had where the first invocation of a session re-probed and could
  silently mismatch a stale cache entry.
- Wraps `rulesContext.program` in a getter that flips a per-rule
  `touchedProgram` flag. After each rule call, if the flag is set,
  add the rule to the linter's `typeAwareRules` set (sticky).
- Exposes `getTypeAwareRules(): ReadonlySet<string>` so the CLI can
  snapshot the classification at session end and persist into the
  cache file.

No behavior change to cache-less operation: probe runs but its
output isn't consulted yet. Layer 1 (next commit) is what gates
cache reads/writes on the classification.

Tests in packages/core/test/probe.test.ts (5 scenarios):
- syntactic rule not classified
- rule that reads program is classified
- classification sticks past the touching file (cross-file in same
  session)
- seeded `initialTypeAwareRules` preserved across sessions
- `getTypeAwareRules()` returns the live set
CACHE.md item 7 — spike the LS / BP integration before wiring it
through the CLI. Six scenarios:

  1. `builder.getProgram() === languageService.getProgram()` — no
     double-build when wrapping LS's program in a builder
  2. Cold first run flags every file as affected
  3. Edit a.ts → both a.ts and b.ts (importer) in affected set
  4. Identical re-run → no files affected (cache hit territory)
  5. Edit unrelated c.ts → only c.ts affected, NOT a.ts/b.ts
  6. Edit globals.d.ts → all consumers in affected set
     ↑ the motivating soundness case: ambient declarations don't
     appear in any file's `imports` but BuilderProgram's reference
     graph tracks them transitively. This is the gap layer 1
     alone cannot close.

13/13 checks pass. The integration model works. Layer 2 wiring
into the CLI lint flow is the next step.

Test runs in-process (oldProgram passed directly). Cross-session
state transfer via `.tsbuildinfo` is a separate concern handled
when wiring `incremental: true` into the LS's compilerOptions.
CACHE.md steps 2 + 3 of the rebuild plan: the cache file format and
its wiring into core.lint(). CLI integration (file mtime check,
load/save in the lint loop) is the next commit.

CLI side — packages/cli/lib/cache.ts:

- Cache file at os.tmpdir()/tsslint-cache/<tsslint-ver>/<ts-ver>/<key>.cache.json.
  TS version segment closes the .tsbuildinfo-format-cross-major hole
  flagged in CACHE.md item 4.
- Object-form CacheData { version, ruleModes, files{mtime, rules} }
  replaces the 3.0.x tuple form. `version: "v2"` lets us evolve the
  schema; mismatch returns empty cache.
- Atomic write via tmp + rename (CACHE.md item 6). Half-written JSON
  on SIGINT no longer corrupts the canonical file.
- Default hash is sha256 hex (was btoa, which produced filesystem-
  ENAMETOOLONG paths on real configs).
- Loader treats any uncertainty (parse error, shape mismatch, version
  drift) as a cache miss — CACHE.md item 2's "miss is always safe"
  invariant.
- Also: scope `files` in cli's package.json to explicit paths so the
  newly-introduced test/ doesn't leak to npm. Mirrors fix in e40bfbb
  for core, 0585586 for compat-eslint.

Core side — packages/core/index.ts:

- New types `RuleCache { hasFix, diagnostics }` and `FileLintCache =
  Record<ruleId, RuleCache>`. Per-file shape; CLI manages the file-
  level mtime/version/ruleModes envelope.
- lint() takes optional fileCache. Read path: hit only when entry
  exists AND rule is not in typeAwareRules — type-aware rules ignore
  any cache entry, matching CACHE.md's layer 1 invariant.
- Write path: report() pushes a serialization-friendly twin of the
  diagnostic (file: undefined, relatedInformation files reduced to
  { fileName }). After the rule call, if it touched program, the
  whole entry is deleted; otherwise hasFix is set if any fix was
  registered.
- Restored diagnostics on cache hit get rehydrated with the live
  ts.SourceFile from the current Program.

Tests:

- packages/cli/test/cache.test.ts (14 checks): roundtrip, missing
  file → empty, corrupted JSON → empty, version mismatch → empty,
  shape mismatch → empty, ts-version segregation, config edit
  invalidates path key, no .tmp leak after successful save.
- packages/core/test/cache-layer1.test.ts (18 checks): syntactic
  rule cached, type-aware rule not cached, cache hit skips rule
  re-execution, restored diagnostic has live file ref, report-then-
  touch deletes entry, sticky cross-file, initialTypeAwareRules
  ignores stale cache entry, hasFix flag, lint without fileCache
  still works (back-compat).

Existing tests pass unchanged (probe, builder-program-poc, compat-
pipeline, lazy-estree).
Walks back the cache-aware lint() introduced in 2284d7c. The cache
file format and lifecycle are CLI concerns; core's job is to run
rules. Future cache changes (BuilderProgram-based invalidation,
schema bumps) shouldn't churn this layer.

API changes on createLinter():

- `lint(fileName, options?: { skipRules?: ReadonlySet<string> })`
  Replaces the cache-mutation overload. Caller passes rule IDs to
  skip; core just doesn't run them. Caller is responsible for merging
  cached output back into the diagnostic list.
- `hasFixForDiagnostic(fileName, diagnostic): boolean`
  New per-diagnostic fix accessor. The CLI cache layer uses this to
  snapshot the `hasFix` flag for a rule's cache entry without reaching
  into core's internal `diagnostic2Fixes` map.
- Removed types: `RuleCache`, `FileLintCache`. Both move to the CLI
  cache module in the next commit.
- Removed the cache-write side of `report()`. Reports now only update
  the in-memory lint result; serialization for persistence is handled
  outside core.

What stays in core:
- The runtime probe + sticky `typeAwareRules` set + `getTypeAwareRules()`
  exposed for the CLI to persist into `ruleModes`.
- `initialTypeAwareRules` constructor param so a session starting cold
  can preseed classification from the cache file.

Tests:
- packages/core/test/cache-layer1.test.ts removed — that suite tested
  cache mutation through lint(), which is no longer core's job. The
  same scenarios are reformulated against the CLI cache layer in the
  next commit.
- packages/core/test/skip-rules.test.ts added (9 checks): rule in
  skipRules doesn't run, only specified rule is skipped, lint without
  options runs everything (back-compat), seeded classification persists
  through skipping, hasFixForDiagnostic returns true only for diags
  with registered fixes.

Existing probe.test.ts and builder-program-poc.test.ts pass unchanged.
Counterpart to the prior commit (eaabb4c) that pulled cache concerns
out of core. Lands the layer 1 logic in packages/cli/lib/cache-flow.ts:

  lintWithCache(linter, fileName, fileCache, fileMtime, program)
    → DiagnosticWithLocation[]

Responsibilities, all owned by this module:

  - mtime check: bumping the file mtime resets every rule entry
  - skip-set computation: cached entries minus type-aware rules
  - call into core's `linter.lint(fileName, { skipRules })`
  - group fresh diagnostics by `diagnostic.code` (the rule ID core
    sets in `report()`) — that's the attribution channel
  - update cache for rules that ran:
    - type-aware rules: delete the entry (sticky cleanup, covers
      report-then-touch and stale prior-session entries)
    - syntactic rules: write fresh entry, derive `hasFix` via
      `linter.hasFixForDiagnostic` for each emitted diagnostic
  - rehydrate cached diagnostics for skipped rules — replace the
    serialized `file: undefined` with the live `ts.SourceFile` from
    the current Program; same for `relatedInformation[].file`

Layer 2 (BuilderProgram-based affected-file invalidation) plugs in
here later: it'll narrow the skip set further by removing entries
the BuilderProgram says are affected. Core stays untouched.

Tests in packages/cli/test/cache-flow.test.ts (25 checks): syntactic
cached / type-aware not, cache hit skips rule, mtime change re-runs,
report-then-touch deletes entry, sticky across files, stale entry
ignored + cleaned, hasFix derivation, multi-rule partial cache.

Setup notes:
- Added `@tsslint/types` to cli's dependencies — needed for tests to
  type-resolve `Config` / `RuleContext` (cli only used them transitively
  via core before).
- Added types/tsconfig project reference to cli's tsconfig.

Total cache test surface across the rebuild: 8 + 9 + 13 + 14 + 25 = 69
checks. 144 across all suites.
End of CACHE.md step 4: cli/index.ts and worker.ts now go through
the cache-flow module. Per-project cache file is loaded at startup,
each file's lint pass uses cached entries when available, and the
file is written back at end of project.

Project class:
- `cacheData: cache.CacheData` (defaults to empty cache)
- `init()` calls `cache.loadCache(...)` unless `--force` is passed.
  Cache key includes ts.version now (the segment was added when
  the cache module landed; this is the first caller honouring it).

Lint loop:
- Per file: stat for mtime, get-or-create
  `project.cacheData.files[fileName]`, pass both to `worker.lint`.
- After all files in a project: snapshot `worker.getTypeAwareRules()`
  into `project.cacheData.ruleModes`, then `cache.saveCache(...)`.
- Diagnostic loop now also OR's in
  `fileCache.rules[diagnostic.code]?.hasFix` because cache-hit
  diagnostics don't re-register fixes through `linter.hasCodeFixes`.

Worker (lib/worker.ts):
- `setup` takes `initialTypeAwareRules: readonly string[]` and forwards
  to `core.createLinter`.
- `lint(fileName, fix, fileCache, fileMtime)` — both cache args now
  required. Replaces direct `linter.lint` calls with
  `cacheFlow.lintWithCache`. After a fix writes the file, mtime is
  refreshed via `fs.statSync` so the post-fix re-lint sees the new
  mtime and invalidates layer-1 entries.
- New `getTypeAwareRules()` wrapper for the CLI to snapshot.
- `--fix` interaction: before calling lintWithCache in the fix
  branch, drop cached entries for rules with `hasFix === true`.
  Their `getEdits` callbacks don't survive the JSON cache round-
  trip, so we have to actually run those rules to rebuild them.
  Rules with no fixes can stay cached even under `--fix`.

`--force` flag is back, behind a HELP-only documentation entry.
With `--force`, `loadCache` is skipped — the project starts with an
empty `cacheData` and all rules run fresh. The cache is still
written at end of run.

Smoke-tested on fixtures/define-rule/:
- Cold run: lints, writes cache file with the no-console diagnostic
  + hasFix=true.
- Warm run: cache hit on no-console; same diagnostic shown; `--fix`
  hint correctly surfaces (via the cached hasFix flag, since the
  rule didn't run this session).
- `--fix` cold + warm: both apply the fix correctly. The pre-call
  drop of hasFix entries forces those rules to re-run and rebuild
  their getEdits callbacks.

All 144 existing tests still pass (probe / skip-rules / builder-
program-poc / cache module / cache-flow / compat-pipeline).
Adds the affected-file gate that BuilderProgram (or any equivalent
tracker) feeds layer 2 with. New option on lintWithCache:

  options.typeAwareUnaffected: boolean
    true  → type-aware rules behave like syntactic ones for caching
            (cache hit allowed, cache write on run)
    false / undefined → mode A: type-aware rules always re-run, their
            entries deleted (current default behavior — safe when
            mtime alone can't catch ambient-declaration edits)

Behavior change is opt-in. Existing call sites pass no option and
keep the layer-1-only semantics. The CLI side that drives this
(creating BuilderProgram, walking getSemanticDiagnosticsOfNextAffected-
File, computing per-file affected sets, plumbing through worker.lint)
is the next commit.

Skip-set computation now considers the option:
  - syntactic + cached → skip
  - type-aware + cached + typeAwareUnaffected=true → skip
  - type-aware + cached + typeAwareUnaffected!=true → run (current)

Cache-write logic mirrors:
  - rule ran AND (syntactic OR typeAwareUnaffected=true) → write entry
  - rule ran AND type-aware AND not typeAwareUnaffected → delete entry

Tests in cache-flow.test.ts cover four new scenarios (10–13):
- typeAwareUnaffected=true → first run caches the entry
- typeAwareUnaffected=true → second run cache hit, rule skipped
- mode B → mode A: entry replaced by re-run, cleanup applied
- default / explicit false → mode A semantics preserved

36 cache-flow checks total.

Drive-by: drop unused `SerializedRelatedInfo` import in cache-flow.ts
(the user's earlier serialize-helper rewrite no longer references it).
CLI surface for layer 2. Adds the `--incremental` flag (in HELP
between --force and --failures-only) and threads it through to
worker.setup as a new boolean param. Worker creates a Semantic-
DiagnosticsBuilderProgram once at setup and walks `getSemantic-
DiagnosticsOfNextAffectedFile` to collect the affected file set.

worker.lint computes `typeAwareUnaffected` per file:
  !!affectedFiles && !fix && !affectedFiles.has(fileName)

  - falsey when --incremental wasn't passed (no BP, layer 1 only)
  - false in --fix mode (fix mutates files mid-session, the setup-
    time affected snapshot becomes stale for downstream files —
    conservatively treat all as affected)
  - true when BP says the file is unaffected this session

The flag is forwarded to cache-flow's lintWithCache option,
already implemented in 576641f.

Limitation, documented inline: without cross-session BP state
(.tsbuildinfo persistence), the first session's BP has no
oldProgram and considers every file affected on cold start. So
within a one-shot CLI invocation, `--incremental` is effectively
a no-op today. The wiring lands here so the cross-session work
can plug in without touching cache-flow or core.

Smoke tests on fixtures/define-rule (cold without flag, cold with
--incremental, warm with --incremental) all produce the expected
diagnostic. Existing unit tests untouched (155 checks across
six suites).
Spawns the local tsslint binary against a throwaway fixture in
os.tmpdir() and inspects diagnostics + cache state on disk. The
in-process cache-flow tests cover the module's invariants — these
exercise everything around it: argv parsing, mtime detection from
the real filesystem, --force / --incremental gates, the cache file
load/save round-trip, and Project's wiring of fileCache through
worker.lint.

Six scenarios:
  1. Cold run produces the no-console diagnostic and writes a cache
     file containing a rule entry for the fixture.
  2. Warm run produces the same diagnostic (cache hit doesn't lose
     data).
  3. Editing the linted file moves the cached `mtime` past the edit
     on the next run (mtime invalidation works through the CLI).
  4. Editing tsslint.config.ts mints a fresh cache file under a
     different path key (config mtime+size in the hash).
  5. --force still produces the diagnostic and exits non-zero.
  6. --incremental is accepted and writes a cache (the layer-2
     plumbing in 10f0f5c doesn't break anything).

Setup gotcha worth noting: macOS' /var symlinks to /private/var,
and the CLI canonicalises paths through realpath when keying the
cache. The fixture helper realpaths the temp dir up front so the
test's cache-file lookups line up with the keys the CLI writes.

Total cache test surface: 8 + 9 + 13 + 14 + 36 + 14 = 94 across
six suites, plus 75 in compat-pipeline.
Persists per-file state between `tsslint --incremental` runs so the
next session can compute which files' type-relevant inputs (own text,
transitive deps, ambient `.d.ts`, lib) have actually moved. Without
this, the prior wiring (10f0f5c) treated every file as affected on
cold start — wiring done but value was zero.

Why not `.tsbuildinfo`? TS reads/writes it via `getProgramBuildInfo` /
`createBuilderProgramUsingProgramBuildInfo`, both internal and TS-
major-coupled. Going through public `BuilderProgram.getAllDependencies`
+ a content-hash digest trades some hit rate (we invalidate on body-
only edits where TS's shape signatures wouldn't) for a contract that
survives TS upgrades.

New module — packages/cli/lib/incremental-state.ts:
- `IncrementalState` shape: `{ version: 'v1', files: { [name]:
  { contentHash, deps: string[] } } }`. Saved as a new optional
  `incrementalState` field on the cache file.
- `buildIncrementalState(builder, hash)` — harvest from a fresh
  BuilderProgram pass over the LS program (called once at end of
  project, after the lint loop).
- `computeAffectedFiles(prev, program, hash)` — diff: a file is
  affected if its own content moved, any dep moved, it's newly added,
  or anyone listed a removed file in deps. Stale-deps risk is bounded
  because dep-graph changes always move the file's own content
  (import statement edit), so they propagate via the own-hash path.

Worker:
- `setup` now takes `prevIncrementalState` and (when `--incremental`)
  computes the affected set via the new module instead of the cold-
  start "everything affected" placeholder.
- New `buildIncrementalState()` accessor, called from CLI at end of
  project to harvest fresh state for persistence.
- `defaultHash` fallback (sha256 hex via `node:crypto`) if `ts.sys.
  createHash` is unavailable on the host.

CLI:
- Reads `project.cacheData.incrementalState` and passes to
  `worker.setup`.
- After the lint loop, calls `worker.buildIncrementalState()` and
  writes back to `project.cacheData.incrementalState` before
  `cache.saveCache`.

Cache schema (lib/cache.ts):
- Optional `incrementalState?: IncrementalState` on `CacheData`.
  Layer-1-only sessions stay clean (field absent). Field's own
  `version` lets us bump the diff format without touching the
  cache schema.

Tests in packages/cli/test/incremental-state.test.ts (18 checks):
- No prior state → all affected
- Schema version bump → all affected
- Identical state → no user file affected (lib files mirrored in prev)
- Own content changed → just that file
- Dep changed (the `globals.d.ts` killer case) → all consumers affected
- Newly added file → affected
- Removed file → consumers affected (transitive impact propagates)
- Independent file change doesn't affect unrelated files

Smoke-tested on fixtures/define-rule with `--incremental`: cold run
writes 84 files × deps into the cache file; warm run reads the state
and produces the same diagnostic.

Total cache test surface now: 8 + 9 + 13 + 14 + 36 + 18 + 14 = 112
across seven cache-related suites, plus 75 in compat-pipeline.
Two bugs found via the layer-2 soundness integration tests:

1. Cold session under --incremental wasn't writing type-aware cache
   entries because the affected-file diff (correctly) lists every file
   as affected on first encounter (no prev state). The single-flag
   `typeAwareUnaffected` conflated "trust cache" with "write cache" —
   so layer 2 had nothing to read on the warm session.

   Split the option into two flags on cacheFlow.lintWithCache:
   - `incremental: boolean` — write type-aware entries (next-session
     bait). True ⇔ CLI passed --incremental.
   - `typeAwareUnaffected: boolean` — trust this file's cached
     type-aware entries (run-skip via skipRules). Only meaningful
     when `incremental` is true.

   cold session: incremental=true, unaffected=false → run + WRITE
   warm session unaffected: incremental=true, unaffected=true → SKIP
   warm session affected: incremental=true, unaffected=false → run + WRITE
   no flag: any combination → never write type-aware (mode A)

2. `BuilderProgram.getAllDependencies` walks the explicit reference
   graph (imports + /// references). Ambient `.d.ts` files in
   script mode (e.g. user's `globals.d.ts` declaring globals) connect
   only via global scope — no file imports them — so they're absent
   from every consumer's dep list. Editing one wouldn't propagate to
   the affected-file set; the killer layer-2 case fails silently.

   Fix in incremental-state.ts: enumerate non-lib script-mode `.d.ts`
   files at state-build time and treat them as universal deps. Lib
   files are excluded since `compilerOptions.lib` flips already
   invalidate the whole cache via the path key.

   Detect script mode via `externalModuleIndicator` (cast since field
   is technically internal in TS's public types, but stable at
   runtime — used by typescript-eslint and ts-morph for the same
   purpose).

Worker and cache-flow tests updated for the new option shape. Two new
cache-flow checks (38 total) cover the cold-session-writes-fresh-entry
case.

Three integration scenarios now passing end-to-end via the marker-
file fixture (rule writes its own execution count to disk, test reads
back to verify):
- Test 7: warm `--incremental` skips type-aware rule (cache hit)
- Test 8: editing `ambient.d.ts` forces dependent re-lint
- Test 9: without `--incremental`, type-aware rule re-runs every
          session (mode A semantics preserved)

22 integration checks total (was 14). All other suites unchanged.
Two layered fixes after the layer 2 wiring crashed `JSON.stringify`
on Dify (5867 source files, 16M+ transitive dep references → 80MB+
serialized payload, past V8's max string length).

1. Path interning. Every file path stored once in a `paths` table;
   the per-file entry uses integer indices in its `deps` list. On
   the same Dify program: ~5× compression vs the verbose form, and
   numbers serialize faster than strings.

2. Track only user files. node_modules and lib files aren't tracked;
   their changes don't propagate via this state. Trade-off: a
   `pnpm install` between `tsslint --incremental` runs serves stale
   type-aware results until `--force`. Acceptable for CLI use; lib
   changes are already covered by the cache path key
   (`compilerOptions.lib` participates in tsconfig+config mtime).

Schema bump to `v2` (cache file's `incrementalState.version`). Old
"v1" state files are treated as cache-misses on load — by-design,
since the path-list/dep-list shape is incompatible.

Smoke-tested on Dify web/ (5867 files, react-x/no-leaked-conditional-
rendering rule, --incremental):
- cold: 19.6 s, cache file 17 MB (vs 80MB+ pre-fix, would crash)
- warm (no edits): 11.9 s — ~1.6× speedup from layer 2 cache hits

Tests rewritten for the interned shape — 18 checks unchanged in
spirit (test 8 renamed from "hash collision in deps" to
"independent file change doesn't false-positive" — its actual
intent).
The previous approach (path-interned content-hash + dep graph in
68140be) shipped working but with two limitations:

  - 17MB cache file on Dify (vs 3.6MB now)
  - drops node_modules dep tracking to fit; ambient declarations
    needed an explicit special case in code

Both stem from rebuilding what TS already does internally for
`tsc --incremental`. Switching to the underlying internal API
(`getBuildInfo`, `createBuilderProgramUsingIncrementalBuildInfo`)
gets all of TS's machinery for free: shape signatures, ambient
detection, module augmentation, project references.

Mechanism: BuilderProgram emits its state via `emitBuildInfo` to a
synthetic `tsBuildInfoFile` path; we capture the text via the host's
`writeFile` callback and stash it in `IncrementalState.tsBuildInfoText`.
Next session reads it back through `getBuildInfo` and feeds the
result to `createBuilderProgramUsingIncrementalBuildInfo`, producing
an `oldBuilder` we pass to the new BP.

Risk surface (documented in incremental-state.ts):
  - These two functions are not in `typescript.d.ts`. Stable across
    TS 5.x → 6.x but no public contract. The cache file path key
    already includes `ts.version`, so version drift gives a clean
    miss instead of corruption.
  - On any deserialization failure (parse error, version mismatch,
    schema bump), `reconstructOldBuilder` returns `undefined` —
    caller treats as cold start. The "miss is always safe" invariant.

Schema bump to `v3` (the path-keyed `paths`/`files` structure is
gone — now just `{ version, tsBuildInfoText }`). Old `v2` state
files are treated as misses on load.

Worker side:
  - `getScriptVersion` now falls back to file mtime (was always '0'
    for unmutated files). Without this, BP couldn't tell that a
    file's content changed across sessions. In-session bumps from
    `--fix` still win — they reflect content changes the BP needs
    to see immediately.
  - Setup: when `--incremental`, override `compilerOptions.incremental`
    + `tsBuildInfoFile` (synthetic path). User's own `tsc --incremental`
    builds use a different path; no collision.
  - One BP (cached as `currentBuilder`) covers both the affected-file
    diff at setup and the buildinfo capture at end of session.

Dify-scale numbers (5867 files, react-x/no-leaked-conditional-rendering):

  cold (--incremental):  ~58s   ← +38s drain cost vs no-incremental
  warm (--incremental):   ~6s   ← BP says 0 affected, type-aware
                                   rules cache-hit
  no-incremental:        ~10s   ← layer 1 only

Cold-time penalty is real: BP's `getSemanticDiagnosticsOfNextAffectedFile`
forces full semantic-check work that TSSLint's regular lint pass
doesn't trigger. CI workloads that always cold-start should NOT use
`--incremental`. Repeat dev workflows pay it once and amortize over
N warm runs.

Tests:
  - incremental-state.test.ts rewritten for new shape (10 checks):
    state captured, version stamp, round-trip identical-program → 0
    affected, round-trip with edited dep → consumer in affected,
    undefined / version-mismatch / corrupted-text → cold-start fallback.
  - integration.test.ts: cache-shape assertion updated (look for
    `tsBuildInfoText`, not `files`/`paths`).
  - All other layer 2 integration scenarios (test 7-9: cache hit on
    warm, ambient-edit invalidation, mode-A re-run) pass unchanged.

Out: custom path table, content hashes, ambient-as-universal-dep
workaround, 200 LOC of dep-graph plumbing.
The drain pattern from 1c4c888 paid for full semantic diagnostics on
every affected file — TSSLint's lint pass already triggers semantic
checks lazily for the symbols type-aware rules query, so doing it
twice cost +38s on Dify cold (58s vs 10s baseline).

`getSemanticDiagnosticsOfNextAffectedFile` accepts an `ignoreSource-
File` callback. Returning true from it skips the diagnostic compute
for that file but the internal `getNextAffectedFile` walk continues —
graph propagation through the reference map still happens, signatures
still get computed during emitBuildInfo, the next session can still
diff. We just don't pay for diagnostics we never use.

Threading the affected-file recorder through the callback keeps the
shape we want (affected set in `affectedFiles`) and discards the
unused diagnostics.

Type wrinkle: the public type for `ignoreSourceFile` declares
`(sourceFile: SourceFile) => boolean`, but TS internally calls it
with the iterator's `affected` value — which can also be a Program
(whole-program affected path, e.g. lib flip). Discriminate at runtime
via the `fileName` field; cast through `SourceFile | Program`.

Dify-scale numbers (5867 files, react-x/no-leaked-conditional-rendering):

  before this commit:
    --incremental cold:  58.3 s
    --incremental warm:   5.8 s
    no --incremental:    10.3 s

  after:
    --incremental cold:  10.9 s   (+0.6 s vs baseline — noise)
    --incremental warm:   5.9 s   (1.7× speedup over baseline)
    no --incremental:    10.3 s

`--incremental` now strictly dominates layer 1 for warm and matches
it for cold. CI workloads no longer need to avoid the flag.

Existing tests pass unchanged (192 checks across 8 cache suites).
Cold-time penalty was the only reason `--incremental` had to be
opt-in. After c6cfbac brought it to +0.6s vs baseline (noise),
there's no longer a reason to keep two paths.

`--force` already serves as the opt-out — it skips the cache load,
so the next invocation starts cold (no prev state for layer 2 to
diff against, type-aware rules re-run, fresh state written for the
session after that).

Changes:
- HELP: drop the `--incremental` line.
- CLI: drop the `process.argv.includes('--incremental')` check;
  always pass through to worker.
- Worker `setup`: drop the `incremental: boolean` parameter.
  `compilerOptions.incremental` + `tsBuildInfoFile` always set
  (overriding the user's tsc-side values, which use a different path).
- Worker `lint`: simplify — `affectedFiles` is always populated; drop
  the conditional that derived `incremental` from its presence.
- `cache-flow.lintWithCache`'s `incremental` option stays — the
  module is library-level and still supports mode A for tests /
  future callers.

Tests:
- integration.test.ts test 6: rephrase ("incrementalState always
  persisted" instead of "--incremental accepted").
- Tests 7-8: drop the redundant `--incremental` arg.
- Test 9: repurposed from "without --incremental, type-aware rule
  re-runs" (no longer reachable from CLI) to "with --force,
  type-aware rule re-runs every time" — same invariant, exposed via
  the supported opt-out.

Mode A semantics still tested at the cache-flow unit level (38
checks unchanged in cache-flow.test.ts).

192 checks across 8 cache-related suites all pass.
Lost in the cache-removal refactor (47f5410) and never put back when
the cache returned. Now that `--force` is the documented opt-out
from layer 2 (d303aba), the hint is genuinely useful — without it,
users hitting stale cache results have no surfaced way to clear it.

Tracking: a file counts as cache-hit if its prev `mtime` matches
the current stat AND there's at least one rule entry recorded for
it. Approximation, not soundness — layer 2's BP may still re-run
type-aware rules if their deps moved, but the user-visible signal
just answers "did the cache have something for this file."

Output now reads:
  cold: "1 message (use --fix to apply fixes)"
  warm: "1 message (use --force to ignore cache, --fix to apply fixes)"
  with --force: "1 message (use --fix to apply fixes)"
Debug aid for understanding which configured rules went down the
type-aware path vs ran as plain syntactic. Reads from the cache
data we already accumulate (`ruleModes` for type-aware, per-file
`rules` keys for the rest), groups + sorts for stable output.

Sample on Dify (5867 files, react-x rule):
  type-aware (1)
    react-x/no-leaked-conditional-rendering

Sample on fixtures/define-rule (no-console, no program access):
  syntactic (1)
    no-console/default

Multi-project runs union the type-aware set across projects so a
rule classified type-aware in one project doesn't show up as
syntactic from another.
Previous behaviour skipped a section header when its count was 0,
so a project with only type-aware rules formatted differently from
one with both kinds. Always print both headers; visually consistent
across runs.

Output now reads, regardless of which side is empty:
  type-aware (1)
    react-x/no-leaked-conditional-rendering
  syntactic (0)

Drops the special-case "(no rules ran)" line — if both counts are
zero the user sees `type-aware (0)` / `syntactic (0)`, which is
already self-explanatory.
`renderer.summary` didn't update lastWasContent, so the blank-line
separator before --list-rules output only appeared when diagnostics
happened to print earlier in the run. On a clean project (e.g. Dify,
where the only rule is type-aware and there are no findings) the
separator vanished, making the section visually blur into the
summary line.

Mark summary as content so any follow-up info() — currently just
the --list-rules block — gets the same separator that follows a
diagnostic block. Non-TTY output is unchanged (info still skips the
blank line outside a TTY).
Layer 2 leans on three TS internal APIs:
  - ts.getBuildInfo (load path)
  - ts.createBuilderProgramUsingIncrementalBuildInfo (load path)
  - BuilderProgram.emitBuildInfo (save path)

Load was already wrapped in try/catch — if a future TS renames or
removes the loader pair, reconstructOldBuilder returns undefined
and the run starts cold. Save was unguarded: missing emitBuildInfo
would throw a TypeError after lint already finished, surfacing as
an unhandled rejection that takes down the CLI and discards
diagnostics for the run.

captureIncrementalState now feature-detects emitBuildInfo, wraps
the call in try/catch, and returns undefined on either path.
The caller (worker.buildIncrementalState → cli persistence) already
treats undefined as "layer-1-only this session", so the next run
just rebuilds layer 2 from a cold start. Wrong miss > wrong hit.

Tests cover both directions — Proxy-stub TS without the loader
APIs; fake BuilderProgram without emitBuildInfo and one whose
emitBuildInfo throws.
Silent fallback meant a TS upgrade that drops getBuildInfo /
createBuilderProgramUsingIncrementalBuildInfo / emitBuildInfo
would degrade tsslint to layer-1-only without telling anyone — the
run would just get slower across sessions, with no obvious cause.

Now all four failure paths print a yellow `warn` line on stderr
naming the TS version, the missing or failing API, and the
consequence (cache disabled this session / not persisted for next):

  warn  TypeScript 6.0.3 is missing internal APIs (getBuildInfo /
        createBuilderProgramUsingIncrementalBuildInfo). Type-aware
        cache disabled — every type-aware rule will re-run from
        cold start.

  warn  TypeScript 6.0.3 BuilderProgram is missing emitBuildInfo.
        Type-aware cache cannot be persisted — next run will start
        cold.

  warn  Could not persist incremental state on TypeScript 6.0.3:
        <error>. Type-aware cache cannot be persisted — next run
        will start cold.

  warn  Could not load previous incremental state on TypeScript
        6.0.3: <error>. Type-aware cache will rebuild from cold
        start.

Normal cold-start paths (no prev cache; cache schema bumped) stay
silent — only actual incompatibility / failure warns. stderr keeps
it out of the renderer's stdout flow but visible to anyone who
hasn't redirected stderr.

`captureIncrementalState` takes ts.version as an explicit param
now so the helper stays free of `import 'typescript'` (it works
with a typeof-only import). Tests verify each warn fires with the
right substring and that silent paths produce no stderr.
tsslint's own dogfood lint flagged these as
@typescript-eslint/no-unnecessary-type-assertion:

  - probe.test, skip-rules.test: `((rctx: RuleContext) => { ... }) as any`
    on rule values typed against `Config`. The arrow function already
    matches `Rule = (ctx: RuleContext) => void`; the `as any` was
    leftover scaffolding from earlier iterations.
  - incremental-state.test: `new Proxy(ts, ...) as typeof ts`. Proxy's
    constructor preserves the target type; the cast is a no-op. Same
    file: `stale as any` on a `{ version, tsBuildInfoText }` object that
    already structurally satisfies `IncrementalState`.

Applied via `tsslint --fix`. All 133 cross-package checks still pass.
Multi-project runs reuse the same in-process worker. setup() reset
linterHost back to originalHost but left snapshots / versions /
affectedFiles / currentBuilder populated from the previous project.
Two consequences:

  - memory leak: cross-project file paths accumulate in snapshots
    and versions for the lifetime of the process. Short-lived CLI
    today, but a future daemon mode would grow unbounded.
  - latent correctness bug: if two projects share an absolute file
    path (uncommon but possible with monorepo workspace links),
    affectedFiles from project A could mis-classify the same path
    in project B as cache-hit-eligible.

Clear all four at the top of setup(). projectVersion/typeRootsVersion
keep bumping for LS-cache invalidation as before.
isCacheData was a shallow check — top-level keys only. A corrupt
files entry (mtime as string, rules as null, missing keys) would
pass the gate and surface later as a TypeError mid-lint, with no
actionable signal to the user.

Walk Object.values(files) and verify each entry has a numeric
mtime + non-null rules object. Same for incrementalState if
present (must be {version: string, tsBuildInfoText: string}).
Cost is one full traversal at load time — negligible against the
cost of a confusing crash.

Tests cover both inner-shape mismatches (mtime wrong type, rules
null) and incrementalState corruption (wrong type, missing field).
The TS internal buildinfo format is compact (~3.6MB on Dify's 5867
files), but pathologically large monorepos could in theory grow
past V8's max-string limit, or just make the surrounding
JSON.stringify of cacheData feel sticky on every save.

Hard cap at 64MB (~10–15× headroom over Dify scale, comfortably
below V8's max). If captureIncrementalState gets a payload over
that, warn + return undefined — caller persists no incrementalState
this session, next run starts cold for layer 2 (layer 1 mtime
cache still works). Wrong miss > wrong hit.
The summary→info blank-line invariant (recently fixed in 8995ed4)
was only verified by integration tests eyeballing subprocess
stdout. Cover it directly so the formatting rules can't drift
silently.

Patches process.stdout.write + process.stdout.isTTY around each
case to exercise both the TTY-mode separator path and the
non-TTY clean-log path. Re-requires the render module per case
so the closure-captured isTTY reflects the current value.

Verifies all four state transitions (info→info, diagnostic→info,
summary→info, summary→info→info), TTY vs non-TTY for each, and
edge cases (empty summary is a no-op, diagnostic indents only in
TTY).
3.1.0 (e05416f) shipped Reporter.withoutCache() as the per-diagnostic
opt-out for findings whose correctness depends on inputs the cache
doesn't track. Dropping it in 47f5410 silently broke any rule that
relied on it — the call site would hit a TypeError at runtime, or
fail to type-check if written in TS.

Restore the API with semantics that match the original:

  - Diagnostic is still returned for the current run.
  - cache-flow filters it out before serialising the rule entry to
    disk, so the next warm hit on this file won't replay it (rule has
    to re-run to surface it again).

Marker is a Symbol.for('@tsslint/no-cache') stamped on the diagnostic
in core. Symbol-keyed → invisible to JSON.stringify and to
{...spread}, so it doesn't leak into the on-disk cache. Test 17
covers that explicitly.

README's "Caching" section is rewritten to describe the new two-layer
model and explain when withoutCache() still makes sense (external
inputs / fs-driven side reads); for cross-file types, the recommended
path is to read ctx.program once so layer 2 handles invalidation.

Tests:
  - cache-flow.test #15: marked diagnostic returned but not persisted
  - cache-flow.test #16: warm replay drops the marked one
  - cache-flow.test #17: marker doesn't leak through serialisation
Document the runtime probe's behavior on rules that file-shape-filter
before reading ctx.program. Two scenarios that came up during review
but weren't directly covered:

  - probe.test #5 (regression): early-return file processed FIRST
    in a session leaves the rule unclassified for that invocation,
    but the next file that does touch program flips classification
    mid-session, and it sticks even on subsequent early-return calls.

  - cache-flow.test #17 (regression): in the same scenario, the
    early-return file's cache entry replays cleanly on a later warm
    hit — even after the rule is globally classified type-aware.
    Sound because the early-return path's output is a deterministic
    function of file text alone (mtime catches changes), and the
    type-aware path simply never executes for that file.

Both pin the user-facing invariant: "a rule that does file-type
filtering before accessing program won't be silently mis-classified
in a way that produces stale cached diagnostics."
@johnsoncodehk johnsoncodehk merged commit f6026c3 into master May 2, 2026
1 check passed
@johnsoncodehk johnsoncodehk deleted the refactor/rebuild-cache branch May 27, 2026 17:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant