Skip to content

Commit a5b1907

Browse files
committed
feat(strict-typing-luau): add --!strict conversion skill and eval harness
Add the strict-typing-luau project skill, which converts Nevermore Luau files to --!strict following the repo's typing conventions (export-type pattern, dot-syntax methods, ServiceBag services, sanctioned :: any escapes with a bare-any default, and precise Rx/Promise return typing). Ships with a deterministic eval harness under evals/: a no-LLM gold smoke test with cases mined from main history, worker-eval primitives, judge-based tests for description triggering, tooling-scope, and parallelism, an intra-package dependency planner with per-node model routing, and a whole-package conversion driver. Validated by converting the full settings package (9 files) to strict and clean.
1 parent 08fb7f9 commit a5b1907

18 files changed

Lines changed: 1324 additions & 0 deletions

File tree

Lines changed: 201 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,201 @@
1+
---
2+
name: strict-typing-luau
3+
description: Convert a Nevermore Luau file from --!nonstrict (or untyped/--!nocheck) to --!strict, adding the project's explicit type annotations and fixing every type error the checker reports. Use this whenever the user asks to "strictly type", "add types to", "make strict", "type-annotate", "convert to --!strict", or clean up the typing on a .lua/.luau file in this repo — including when they just select a file with a `--!nonstrict` header and say "type this" or point you at a legacy module. Also use it when a strict file is throwing luau-lsp type errors and the user wants them resolved following Nevermore conventions.
4+
---
5+
6+
# Strictly typing Luau files
7+
8+
Convert a file to `--!strict` and make it pass the type checker cleanly. This is mostly
9+
**mechanical pattern application** — move fast and correctly. The checker is your fast
10+
feedback loop; lean on it instead of guessing.
11+
12+
## Triage first — match effort to the file
13+
14+
- **Plain util / data module** (`local X = {}` of functions, or a `return {...}` table; no
15+
`setmetatable`): flip the header, give every function typed params/returns, run single-file
16+
analyze. Usually no `export type` block needed.
17+
- **Class** (`setmetatable({}, ...)` + constructor + methods): the real work — needs the
18+
`export type` block. Follow the patterns below and `references/conventions.md`.
19+
- **Already `--!strict` but erroring**: skip conversion; just resolve the reported errors.
20+
21+
## Why it's not find-and-replace
22+
23+
Luau can't infer fields through `setmetatable`, so strict mode turns that blind spot into
24+
errors. The fix is an explicit `export type` block enumerating every `self` field, plus
25+
dot-syntax methods that name `self`. Flipping the header without these just produces a wall
26+
of errors — supplying the types the checker can't infer *is* the job.
27+
28+
## The verification loop (fast inner loop)
29+
30+
`luau-lsp analyze` is the same engine as `lint:luau`, pointed at one file (~2.5s). One-time
31+
setup if `sourcemap.json` / `globalTypes.d.lua` are missing at repo root: `npm run prelint:luau`.
32+
33+
```bash
34+
luau-lsp analyze --sourcemap=sourcemap.json --base-luaurc=.luaurc \
35+
--defs=globalTypes.d.lua --flag:LuauSolverV2=false --ignore='**/node_modules/**' \
36+
src/<package>/src/<Realm>/<File>.lua
37+
```
38+
39+
Clean = only the `[INFO] Loading...` line. `LuauSolverV2=false` is required (repo pins the old
40+
solver). Iterate until clean, then run `npm run lint:luau` **once** as the final gate — single-file
41+
analyze can't see files that depend on *yours*, and tightening a type ripples to subclasses and
42+
callers. Triage new downstream errors: pre-existing → leave & flag; your type is genuinely too
43+
tight for the real contract → loosen *your* type (often `T?` not `T`); a small obvious follow-on
44+
→ fix it.
45+
46+
## Core patterns
47+
48+
**Class with a parent (the common case):**
49+
50+
```lua
51+
--!strict
52+
local require = require(script.Parent.loader).load(script)
53+
54+
local BaseObject = require("BaseObject")
55+
56+
local MyClass = setmetatable({}, BaseObject)
57+
MyClass.ClassName = "MyClass"
58+
MyClass.__index = MyClass
59+
60+
export type MyClass = typeof(setmetatable(
61+
{} :: {
62+
_serviceBag: ServiceBag.ServiceBag, -- EVERY self field, with its type
63+
_enabled: ValueObject.ValueObject<boolean>,
64+
},
65+
{} :: typeof({ __index = MyClass })
66+
)) & BaseObject.BaseObject -- intersection pulls in inherited _maid, _obj, etc.
67+
68+
function MyClass.new(serviceBag: ServiceBag.ServiceBag): MyClass
69+
local self: MyClass = setmetatable(BaseObject.new() :: any, MyClass)
70+
self._serviceBag = assert(serviceBag, "No serviceBag")
71+
return self
72+
end
73+
```
74+
75+
**Methods use dot syntax with explicit `self`** (colon syntax loses the `self` type in strict
76+
mode). Callers still write `obj:Method()`; only the definition changes:
77+
78+
```lua
79+
function MyClass.GetEnabled(self: MyClass): boolean
80+
return self._enabled.Value
81+
end
82+
```
83+
84+
## The export type rule — always `typeof(setmetatable(...))`, never hand-list methods
85+
86+
There is **one** way to write a class's export type, and it holds for generic, inherited, and
87+
dynamic-`__index` classes alike:
88+
89+
```lua
90+
export type MyClass = typeof(setmetatable(
91+
{} :: { ...only the instance FIELDS... },
92+
{} :: typeof({ __index = MyClass })
93+
)) [& Parent.Parent]
94+
```
95+
96+
Methods come from `typeof({ __index = MyClass })` — never hand-write `Method: (self, ...) -> ...`
97+
records (they drift and cost enormous churn on big classes). Inheritance: list only the child's
98+
*own new* fields; inherited ones arrive via `& Parent.Parent`. Generics: keep `<T>` load-bearing
99+
by putting it in a field (e.g. a virtual `Value: T`); never collapse `<T>` to `any`. The one tax:
100+
a metatable'd type needs `self :: any` for dynamic self-access (`rawget(self :: any, k)`).
101+
102+
## When `:: any` is acceptable
103+
104+
Confined to boundaries the checker genuinely can't model:
105+
106+
- `setmetatable(ParentClass.new(...) :: any, MyClass)` — the metatable transform
107+
- `Binder.new("Tag", MyClass :: any) :: Binder.Binder<MyClass>` — binder registration
108+
- `local t: any = require("t")``t` (and the rare library like it) isn't strict-friendly
109+
- **Rx / reactive chains** (`:Pipe({...})`, `switchMap`/`map` closures, `RxSignal`, `Signal.new()`)
110+
— the #1 time-sink. Don't try to thread types through them; cast to `any` on the *first* analyze
111+
error and move on, keeping the public return type precise. See `references/rx.md`.
112+
113+
**Cast the chain body, never the public return type.** Type every public method's return precisely
114+
(`Promise.Promise<T>`, `Observable.Observable<T>`); only the internal *expression* producing it may
115+
be `any`. Casting the return type itself is the single biggest source of avoidable looseness — a
116+
`GetX(self): any` leaks `any` to every caller. See the Rx/Promise examples in `references/conventions.md`.
117+
118+
Reaching for `:: any` inside a method body means the `export type` block is wrong — fix it there.
119+
A call-site `(x :: any):Method()` to dodge a *signature mismatch* is a smell: usually the upstream
120+
signature is too strict (a param that should be optional) — fix it upstream if in scope, else flag
121+
it. Don't bury a real type bug under a cast.
122+
123+
## Hard cases — try the precise type first, loosen the narrowest spot only
124+
125+
Concrete metatable types (`Signal.Signal<T>`, `ValueObject.ValueObject<T>`, `Maid.Maid`) import
126+
and check cleanly almost always — write them out. Only these justify deviating:
127+
128+
1. **A type you cannot import** (sibling is still nonstrict, exports no type) → write a **precise
129+
structural interface** of the exact surface you use (real signatures, real field types) — this
130+
is typing by another means, not a loosening. Better, if cheap: add the `export type` upstream.
131+
2. **A heavy cyclic generic** the old solver can't hold (a deep class wrapped across many
132+
members) → type each occurrence as `any` with JUST the intended type in a trailing comment:
133+
`_indexObservers: any, -- ObservableSubscriptionTable<T?>`. Find the one culprit type and sweep
134+
**all** its occurrences in one pass (fields, params, **returns**), don't bisect.
135+
3. **Require cycle** between a class and its definition/factory → hoist shared shapes into a
136+
types-only `<Package>Types.lua` (requires nothing at runtime, so importing it can't recreate the
137+
cycle). Keep the public API precise; only the internal back-reference may give.
138+
139+
Time-box it: if a file won't go clean within ~2 iterations on the *same* cyclic/"too complex"
140+
error, take the escape for that member and move on. The public API must stay precise; internal
141+
plumbing can absorb imprecision.
142+
143+
## Comment discipline
144+
145+
**Default: write loose `any`s BARE — no explanatory comment.** Write `_cmdrService: any,` — NOT
146+
`_cmdrService: any, -- CmdrServiceClient (nonstrict, no exported type)`. This holds for *every* `any`
147+
whose source module simply exports no type yet (the common case). The comment reads as "blessed,
148+
intentional" and discourages the next person from tightening it — and we *want* these `any`s gone. A
149+
bare `any` is honest debt; a commented one looks finished. Don't explain yourself.
150+
151+
Only two exceptions earn a comment:
152+
- **A forced cyclic-complexity `any`** (Hard case #2) records its intended type so it's recoverable:
153+
`_indexObservers: any, -- ObservableSubscriptionTable<T?>`. The comment is *information*, not an apology.
154+
- **A non-obvious deliberate structure** (e.g. a structural type dodging a require cycle) gets a line
155+
so no one "fixes" it back into the cycle.
156+
157+
If you're tempted to comment an `any` for any other reason — don't.
158+
159+
## Report your hand-offs
160+
161+
End with a short **"Hand-offs"** list: every spot where you took an escape instead of a precise
162+
type — file, member, the `any`/structural fallback, and the intended type. Skip routine sanctioned
163+
boundaries (metatable casts, `t: any`, binder registration). This is the deliverable that lets the
164+
user tighten later; a run that loosened things silently is not done.
165+
166+
## Pitfalls
167+
168+
- **`--!strict` must be the file's FIRST line** — before the `--[=[ ]=]` docstring. Luau only honors
169+
a mode directive at the very top; placed after the docstring it's silently ignored and the file is
170+
never strict-checked (analyze looks "clean" but isn't).
171+
- Don't leave `--!nonstrict`/`--!nocheck` "for safety" — the point is `--!strict`.
172+
- Don't run repo-wide `lint:luau` on every edit — single-file analyze is the inner loop, `lint:luau`
173+
is the final gate.
174+
- Don't invent fields — the `export type` block must list exactly what `self` gets.
175+
- Keep legacy/deprecation docstrings; typing doesn't change deprecation status.
176+
177+
## Tooling
178+
179+
- `references/conventions.md` — full canonical examples + a common-error → fix table.
180+
`references/rx.md` — the Rx/reactive escape in detail.
181+
- **Converting a whole package?** Don't eyeball the order. Run the planner for a
182+
dependency-ordered, model-routed conversion plan — which files, in what order, and which form
183+
cyclic clusters to convert together:
184+
185+
```bash
186+
# from the repo root (the script cd's to the repo top itself, so CWD doesn't matter):
187+
.claude/skills/strict-typing-luau/evals/lib/run.sh plan src/<package> # e.g. src/settings
188+
```
189+
190+
It defaults to **real mode** (targets the files that aren't `--!strict` yet); convert leaves-first
191+
in the order it prints. (`--eval-gold` switches it to the harness's gold-bearing view.)
192+
- **Parallelize at scale.** Files in the same dependency layer of the plan are independent — they
193+
don't require each other, so they can convert concurrently. When a package has **more than ~3
194+
files to convert**, fan out **one sub-agent per file within a layer** (launch them in a single
195+
message so they run in parallel), wait for the layer, then move to the next. Each sub-agent
196+
converts ONE file against its already-converted deps. This makes wall-clock ≈ the slowest file per
197+
layer instead of the sum, and keeps each context small. At ≤3 files, just convert sequentially —
198+
the fan-out overhead isn't worth it.
199+
- That same **`run.sh`** with no arguments lists every command. `plan` is the one generally useful
200+
for real conversions; the rest (`gold`, `convert`, `place`/`score`/`restore`, `triggers`) *evaluate*
201+
the skill — see `evals/README.md` for those workflows.
Lines changed: 115 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,115 @@
1+
# strict-typing-luau — mechanical eval harness
2+
3+
A fast, deterministic, **no-LLM** way to validate the skill. Cases are mined from **main
4+
history**: real nonstrict→strict conversions the maintainers shipped. Each case stores the
5+
pre-conversion blob (`input`) and anchors `gold` to **main HEAD**, so single-file analyze always
6+
matches the main-built sourcemap (no staleness). See `manifest.json`.
7+
8+
## The checks
9+
10+
### 1. Gold smoke test (~20s, no tokens) — validates the harness itself
11+
12+
```bas
13+
bash evals/lib/run.sh gold
14+
```
15+
16+
Scores the maintainer's gold (main) for every case. Every positive must come back
17+
`strict=true, analyze=0`; negatives must be `strict=false`. If this is green, the pairs, the
18+
scorer, and the plumbing are sound. Run it after editing the manifest or scorer.
19+
20+
### 2. Worker eval — measures an actual skill run
21+
22+
Per case, drive the worker (an agent running the skill) with these primitives:
23+
24+
```bash
25+
bash evals/lib/run.sh place <case_id> # lay the nonstrict INPUT at the file's real path
26+
# ... run the skill on that path to convert it in place ...
27+
bash evals/lib/run.sh score <case_id> # -> {strict, analyze_errors, any, any_gold}
28+
bash evals/lib/run.sh restore <case_id> # undo: restore the package to main
29+
```
30+
31+
A conversion **passes** when `strict=true`, `analyze_errors=0`, and `any_nonrx <= any_gold_nonrx`
32+
(no looser than the maintainer's output, *excluding* Rx-chain casts — those are deliberate policy,
33+
see `references/rx.md`). `score.sh` reports both the total `any` and the budgeted `any_nonrx`. Run
34+
`npm run lint:luau` once at the end as the downstream gate.
35+
36+
**Whole-package runs.** Cases sharing an `input` commit belong to one package conversion — e.g.
37+
the six `settings-*` cases (all `input: b1bc1512aa`, `src/settings`, the `PlayerSettings` package
38+
the maintainers strict-typed in PR #550). `place` any one of them and the whole package drops to
39+
its untyped state; the agent then converts every file (leaves first — see their manifest order),
40+
and you `score` each case. If the agent **added** a file during conversion (a shared `Types.lua`),
41+
`run.sh sync` before scoring so single-file analyze can resolve it, then `restore` + `sync` again
42+
to return the sourcemap to main. (Single-file conversions that add nothing don't need `sync`.)
43+
44+
### 3. Triggering test (~5s, one judge call) — validates the `description`
45+
46+
```bash
47+
bash evals/lib/triggers.sh
48+
```
49+
50+
Does the skill's frontmatter `description` fire on the right prompts and stay quiet on near-misses?
51+
Cases live in `triggers.json` (positives + hard negatives like TypeScript-strict, tsconfig-strict,
52+
formatting). A judge model (`claude -p`) sees only the *live* description from SKILL.md and each
53+
prompt, returns fire/skip per prompt, and we score against `expect`. The judge reads the current
54+
description, so this catches a description edit that quietly breaks (or over-broadens) triggering.
55+
Add a case = one line in `triggers.json`.
56+
57+
### 4. Tooling-scope test (~6s, one judge call) — guards against over-orchestration
58+
59+
```bash
60+
bash evals/lib/run.sh tooling
61+
```
62+
63+
Now that SKILL.md surfaces the planner and harness, a single-file worker could wrongly run `plan`
64+
or the eval harness instead of just converting its file. A judge reads the *live* SKILL.md and
65+
classifies each task in `tooling.json` as `planner` (whole-package only) or `direct` (single file /
66+
scoped node / error-fix), scored against `expect`. Catches a SKILL.md edit that over-broadens the
67+
planner's apparent scope. Add a case = one line in `tooling.json`.
68+
69+
### 5. Parallelism test (~9s, one judge call) — execution strategy at scale
70+
71+
```bash
72+
bash evals/lib/run.sh parallelism
73+
```
74+
75+
A judge reads the live SKILL.md and classifies each `parallelism.json` task as `parallel` (fan out
76+
one sub-agent per file within a dependency layer, for a package > ~3 files) or `sequential` (single
77+
file / handful / error-fix). Guards the real-use regression where the agent converted a whole package
78+
serially because the skill never told it to parallelize.
79+
80+
## Design notes
81+
82+
- **Files are placed at package granularity** (`src/<pkg>`): a package is the unit of
83+
type-consistency, so a cyclic file scored against nonstrict siblings won't error falsely.
84+
- **`restore` unstages and cleans** gold-only new files (e.g. a new `*Types.lua`) so cases can't
85+
leak into each other.
86+
- **Budget enforcement belongs here, not in the skill prose.** Cap the worker's tokens/iterations
87+
in the runner; if it can't finish a case in budget, that's a recorded failure, not a reason to
88+
add more instructions to SKILL.md.
89+
- **Negatives** come from the revert commit `9027756cd7` (a naive flip that broke downstream and
90+
was reverted). They exist to prove the downstream `lint:luau` gate fires.
91+
- **Model routing (per node, not per file).** `plan` tags each unit with a suggested worker model:
92+
**opus** for class / cyclic-cluster / Rx-importing nodes (precision + comprehension matter),
93+
**sonnet** for mechanical leaf/util nodes (speed is free, little to cast loosely). Backed by
94+
probes on a hard Rx class: Sonnet is fastest (3.1 min vs Opus 5.5) but casts ~3× more `any`
95+
(34 vs 12) — fine where looseness is cheap, not where it isn't. **Do not** split one hard file
96+
across models as a Sonnet-draft→Opus-polish pipeline: measured 2× the cost of Opus-alone with no
97+
precision gain (the polish pass re-comprehends from scratch; post-hoc `any`-tightening is harder
98+
than typing precisely up front). Route whole *nodes*, never halves of a file.
99+
- **`any`-budget gate** (for a Sonnet-first driver): after scoring a node, if `any_nonrx >
100+
any_gold_nonrx + slack`, re-run that node on Opus. Uses the budgeted count, so heavy Rx casting
101+
(sanctioned) never trips the gate — only genuine non-Rx looseness does.
102+
103+
## Adding a case
104+
105+
Pick a file currently `--!strict` on main, find the commit that introduced it, and use its parent
106+
as `input` (both must be reachable from main):
107+
108+
```bash
109+
f=src/<pkg>/src/<Realm>/<File>.lua
110+
c=$(git log -S'--!strict' --reverse --format='%H' -- "$f" | head -1)
111+
git merge-base --is-ancestor "$c" main && echo "on main: input=$(git rev-parse --short "$c^")"
112+
```
113+
114+
Add `{ id, archetype, path, input: "<C^ sha>", gold: "main", polarity: "positive" }` to
115+
`manifest.json`, then re-run the gold smoke test.

0 commit comments

Comments
 (0)