Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
201 changes: 201 additions & 0 deletions .claude/skills/strict-typing-luau/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,201 @@
---
name: strict-typing-luau
description: Convert a Nevermore Luau file from --!nonstrict (or untyped/--!nocheck) to --!strict, adding the project's explicit type annotations and fixing every type error the checker reports. Use this whenever the user asks to "strictly type", "add types to", "make strict", "type-annotate", "convert to --!strict", or clean up the typing on a .lua/.luau file in this repo — including when they just select a file with a `--!nonstrict` header and say "type this" or point you at a legacy module. Also use it when a strict file is throwing luau-lsp type errors and the user wants them resolved following Nevermore conventions.
---

# Strictly typing Luau files

Convert a file to `--!strict` and make it pass the type checker cleanly. This is mostly
**mechanical pattern application** — move fast and correctly. The checker is your fast
feedback loop; lean on it instead of guessing.

## Triage first — match effort to the file

- **Plain util / data module** (`local X = {}` of functions, or a `return {...}` table; no
`setmetatable`): flip the header, give every function typed params/returns, run single-file
analyze. Usually no `export type` block needed.
- **Class** (`setmetatable({}, ...)` + constructor + methods): the real work — needs the
`export type` block. Follow the patterns below and `references/conventions.md`.
- **Already `--!strict` but erroring**: skip conversion; just resolve the reported errors.

## Why it's not find-and-replace

Luau can't infer fields through `setmetatable`, so strict mode turns that blind spot into
errors. The fix is an explicit `export type` block enumerating every `self` field, plus
dot-syntax methods that name `self`. Flipping the header without these just produces a wall
of errors — supplying the types the checker can't infer *is* the job.

## The verification loop (fast inner loop)

`luau-lsp analyze` is the same engine as `lint:luau`, pointed at one file (~2.5s). One-time
setup if `sourcemap.json` / `globalTypes.d.lua` are missing at repo root: `npm run prelint:luau`.

```bash
luau-lsp analyze --sourcemap=sourcemap.json --base-luaurc=.luaurc \
--defs=globalTypes.d.lua --flag:LuauSolverV2=false --ignore='**/node_modules/**' \
src/<package>/src/<Realm>/<File>.lua
```

Clean = only the `[INFO] Loading...` line. `LuauSolverV2=false` is required (repo pins the old
solver). Iterate until clean, then run `npm run lint:luau` **once** as the final gate — single-file
analyze can't see files that depend on *yours*, and tightening a type ripples to subclasses and
callers. Triage new downstream errors: pre-existing → leave & flag; your type is genuinely too
tight for the real contract → loosen *your* type (often `T?` not `T`); a small obvious follow-on
→ fix it.

## Core patterns

**Class with a parent (the common case):**

```lua
--!strict
local require = require(script.Parent.loader).load(script)

local BaseObject = require("BaseObject")

local MyClass = setmetatable({}, BaseObject)
MyClass.ClassName = "MyClass"
MyClass.__index = MyClass

export type MyClass = typeof(setmetatable(
{} :: {
_serviceBag: ServiceBag.ServiceBag, -- EVERY self field, with its type
_enabled: ValueObject.ValueObject<boolean>,
},
{} :: typeof({ __index = MyClass })
)) & BaseObject.BaseObject -- intersection pulls in inherited _maid, _obj, etc.

function MyClass.new(serviceBag: ServiceBag.ServiceBag): MyClass
local self: MyClass = setmetatable(BaseObject.new() :: any, MyClass)
self._serviceBag = assert(serviceBag, "No serviceBag")
return self
end
```

**Methods use dot syntax with explicit `self`** (colon syntax loses the `self` type in strict
mode). Callers still write `obj:Method()`; only the definition changes:

```lua
function MyClass.GetEnabled(self: MyClass): boolean
return self._enabled.Value
end
```

## The export type rule — always `typeof(setmetatable(...))`, never hand-list methods

There is **one** way to write a class's export type, and it holds for generic, inherited, and
dynamic-`__index` classes alike:

```lua
export type MyClass = typeof(setmetatable(
{} :: { ...only the instance FIELDS... },
{} :: typeof({ __index = MyClass })
)) [& Parent.Parent]
```

Methods come from `typeof({ __index = MyClass })` — never hand-write `Method: (self, ...) -> ...`
records (they drift and cost enormous churn on big classes). Inheritance: list only the child's
*own new* fields; inherited ones arrive via `& Parent.Parent`. Generics: keep `<T>` load-bearing
by putting it in a field (e.g. a virtual `Value: T`); never collapse `<T>` to `any`. The one tax:
a metatable'd type needs `self :: any` for dynamic self-access (`rawget(self :: any, k)`).

## When `:: any` is acceptable

Confined to boundaries the checker genuinely can't model:

- `setmetatable(ParentClass.new(...) :: any, MyClass)` — the metatable transform
- `Binder.new("Tag", MyClass :: any) :: Binder.Binder<MyClass>` — binder registration
- `local t: any = require("t")` — `t` (and the rare library like it) isn't strict-friendly
- **Rx / reactive chains** (`:Pipe({...})`, `switchMap`/`map` closures, `RxSignal`, `Signal.new()`)
— the #1 time-sink. Don't try to thread types through them; cast to `any` on the *first* analyze
error and move on, keeping the public return type precise. See `references/rx.md`.

**Cast the chain body, never the public return type.** Type every public method's return precisely
(`Promise.Promise<T>`, `Observable.Observable<T>`); only the internal *expression* producing it may
be `any`. Casting the return type itself is the single biggest source of avoidable looseness — a
`GetX(self): any` leaks `any` to every caller. See the Rx/Promise examples in `references/conventions.md`.

Reaching for `:: any` inside a method body means the `export type` block is wrong — fix it there.
A call-site `(x :: any):Method()` to dodge a *signature mismatch* is a smell: usually the upstream
signature is too strict (a param that should be optional) — fix it upstream if in scope, else flag
it. Don't bury a real type bug under a cast.

## Hard cases — try the precise type first, loosen the narrowest spot only

Concrete metatable types (`Signal.Signal<T>`, `ValueObject.ValueObject<T>`, `Maid.Maid`) import
and check cleanly almost always — write them out. Only these justify deviating:

1. **A type you cannot import** (sibling is still nonstrict, exports no type) → write a **precise
structural interface** of the exact surface you use (real signatures, real field types) — this
is typing by another means, not a loosening. Better, if cheap: add the `export type` upstream.
2. **A heavy cyclic generic** the old solver can't hold (a deep class wrapped across many
members) → type each occurrence as `any` with JUST the intended type in a trailing comment:
`_indexObservers: any, -- ObservableSubscriptionTable<T?>`. Find the one culprit type and sweep
**all** its occurrences in one pass (fields, params, **returns**), don't bisect.
3. **Require cycle** between a class and its definition/factory → hoist shared shapes into a
types-only `<Package>Types.lua` (requires nothing at runtime, so importing it can't recreate the
cycle). Keep the public API precise; only the internal back-reference may give.

Time-box it: if a file won't go clean within ~2 iterations on the *same* cyclic/"too complex"
error, take the escape for that member and move on. The public API must stay precise; internal
plumbing can absorb imprecision.

## Comment discipline

**Default: write loose `any`s BARE — no explanatory comment.** Write `_cmdrService: any,` — NOT
`_cmdrService: any, -- CmdrServiceClient (nonstrict, no exported type)`. This holds for *every* `any`
whose source module simply exports no type yet (the common case). The comment reads as "blessed,
intentional" and discourages the next person from tightening it — and we *want* these `any`s gone. A
bare `any` is honest debt; a commented one looks finished. Don't explain yourself.

Only two exceptions earn a comment:
- **A forced cyclic-complexity `any`** (Hard case #2) records its intended type so it's recoverable:
`_indexObservers: any, -- ObservableSubscriptionTable<T?>`. The comment is *information*, not an apology.
- **A non-obvious deliberate structure** (e.g. a structural type dodging a require cycle) gets a line
so no one "fixes" it back into the cycle.

If you're tempted to comment an `any` for any other reason — don't.

## Report your hand-offs

End with a short **"Hand-offs"** list: every spot where you took an escape instead of a precise
type — file, member, the `any`/structural fallback, and the intended type. Skip routine sanctioned
boundaries (metatable casts, `t: any`, binder registration). This is the deliverable that lets the
user tighten later; a run that loosened things silently is not done.

## Pitfalls

- **`--!strict` must be the file's FIRST line** — before the `--[=[ ]=]` docstring. Luau only honors
a mode directive at the very top; placed after the docstring it's silently ignored and the file is
never strict-checked (analyze looks "clean" but isn't).
- Don't leave `--!nonstrict`/`--!nocheck` "for safety" — the point is `--!strict`.
- Don't run repo-wide `lint:luau` on every edit — single-file analyze is the inner loop, `lint:luau`
is the final gate.
- Don't invent fields — the `export type` block must list exactly what `self` gets.
- Keep legacy/deprecation docstrings; typing doesn't change deprecation status.

## Tooling

- `references/conventions.md` — full canonical examples + a common-error → fix table.
`references/rx.md` — the Rx/reactive escape in detail.
- **Converting a whole package?** Don't eyeball the order. Run the planner for a
dependency-ordered, model-routed conversion plan — which files, in what order, and which form
cyclic clusters to convert together:

```bash
# from the repo root (the script cd's to the repo top itself, so CWD doesn't matter):
.claude/skills/strict-typing-luau/evals/lib/run.sh plan src/<package> # e.g. src/settings
```

It defaults to **real mode** (targets the files that aren't `--!strict` yet); convert leaves-first
in the order it prints. (`--eval-gold` switches it to the harness's gold-bearing view.)
- **Parallelize at scale.** Files in the same dependency layer of the plan are independent — they
don't require each other, so they can convert concurrently. When a package has **more than ~3
files to convert**, fan out **one sub-agent per file within a layer** (launch them in a single
message so they run in parallel), wait for the layer, then move to the next. Each sub-agent
converts ONE file against its already-converted deps. This makes wall-clock ≈ the slowest file per
layer instead of the sum, and keeps each context small. At ≤3 files, just convert sequentially —
the fan-out overhead isn't worth it.
- That same **`run.sh`** with no arguments lists every command. `plan` is the one generally useful
for real conversions; the rest (`gold`, `convert`, `place`/`score`/`restore`, `triggers`) *evaluate*
the skill — see `evals/README.md` for those workflows.
115 changes: 115 additions & 0 deletions .claude/skills/strict-typing-luau/evals/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,115 @@
# strict-typing-luau — mechanical eval harness

A fast, deterministic, **no-LLM** way to validate the skill. Cases are mined from **main
history**: real nonstrict→strict conversions the maintainers shipped. Each case stores the
pre-conversion blob (`input`) and anchors `gold` to **main HEAD**, so single-file analyze always
matches the main-built sourcemap (no staleness). See `manifest.json`.

## The checks

### 1. Gold smoke test (~20s, no tokens) — validates the harness itself

```bas
bash evals/lib/run.sh gold
```

Scores the maintainer's gold (main) for every case. Every positive must come back
`strict=true, analyze=0`; negatives must be `strict=false`. If this is green, the pairs, the
scorer, and the plumbing are sound. Run it after editing the manifest or scorer.

### 2. Worker eval — measures an actual skill run

Per case, drive the worker (an agent running the skill) with these primitives:

```bash
bash evals/lib/run.sh place <case_id> # lay the nonstrict INPUT at the file's real path
# ... run the skill on that path to convert it in place ...
bash evals/lib/run.sh score <case_id> # -> {strict, analyze_errors, any, any_gold}
bash evals/lib/run.sh restore <case_id> # undo: restore the package to main
```

A conversion **passes** when `strict=true`, `analyze_errors=0`, and `any_nonrx <= any_gold_nonrx`
(no looser than the maintainer's output, *excluding* Rx-chain casts — those are deliberate policy,
see `references/rx.md`). `score.sh` reports both the total `any` and the budgeted `any_nonrx`. Run
`npm run lint:luau` once at the end as the downstream gate.

**Whole-package runs.** Cases sharing an `input` commit belong to one package conversion — e.g.
the six `settings-*` cases (all `input: b1bc1512aa`, `src/settings`, the `PlayerSettings` package
the maintainers strict-typed in PR #550). `place` any one of them and the whole package drops to
its untyped state; the agent then converts every file (leaves first — see their manifest order),
and you `score` each case. If the agent **added** a file during conversion (a shared `Types.lua`),
`run.sh sync` before scoring so single-file analyze can resolve it, then `restore` + `sync` again
to return the sourcemap to main. (Single-file conversions that add nothing don't need `sync`.)

### 3. Triggering test (~5s, one judge call) — validates the `description`

```bash
bash evals/lib/triggers.sh
```

Does the skill's frontmatter `description` fire on the right prompts and stay quiet on near-misses?
Cases live in `triggers.json` (positives + hard negatives like TypeScript-strict, tsconfig-strict,
formatting). A judge model (`claude -p`) sees only the *live* description from SKILL.md and each
prompt, returns fire/skip per prompt, and we score against `expect`. The judge reads the current
description, so this catches a description edit that quietly breaks (or over-broadens) triggering.
Add a case = one line in `triggers.json`.

### 4. Tooling-scope test (~6s, one judge call) — guards against over-orchestration

```bash
bash evals/lib/run.sh tooling
```

Now that SKILL.md surfaces the planner and harness, a single-file worker could wrongly run `plan`
or the eval harness instead of just converting its file. A judge reads the *live* SKILL.md and
classifies each task in `tooling.json` as `planner` (whole-package only) or `direct` (single file /
scoped node / error-fix), scored against `expect`. Catches a SKILL.md edit that over-broadens the
planner's apparent scope. Add a case = one line in `tooling.json`.

### 5. Parallelism test (~9s, one judge call) — execution strategy at scale

```bash
bash evals/lib/run.sh parallelism
```

A judge reads the live SKILL.md and classifies each `parallelism.json` task as `parallel` (fan out
one sub-agent per file within a dependency layer, for a package > ~3 files) or `sequential` (single
file / handful / error-fix). Guards the real-use regression where the agent converted a whole package
serially because the skill never told it to parallelize.

## Design notes

- **Files are placed at package granularity** (`src/<pkg>`): a package is the unit of
type-consistency, so a cyclic file scored against nonstrict siblings won't error falsely.
- **`restore` unstages and cleans** gold-only new files (e.g. a new `*Types.lua`) so cases can't
leak into each other.
- **Budget enforcement belongs here, not in the skill prose.** Cap the worker's tokens/iterations
in the runner; if it can't finish a case in budget, that's a recorded failure, not a reason to
add more instructions to SKILL.md.
- **Negatives** come from the revert commit `9027756cd7` (a naive flip that broke downstream and
was reverted). They exist to prove the downstream `lint:luau` gate fires.
- **Model routing (per node, not per file).** `plan` tags each unit with a suggested worker model:
**opus** for class / cyclic-cluster / Rx-importing nodes (precision + comprehension matter),
**sonnet** for mechanical leaf/util nodes (speed is free, little to cast loosely). Backed by
probes on a hard Rx class: Sonnet is fastest (3.1 min vs Opus 5.5) but casts ~3× more `any`
(34 vs 12) — fine where looseness is cheap, not where it isn't. **Do not** split one hard file
across models as a Sonnet-draft→Opus-polish pipeline: measured 2× the cost of Opus-alone with no
precision gain (the polish pass re-comprehends from scratch; post-hoc `any`-tightening is harder
than typing precisely up front). Route whole *nodes*, never halves of a file.
- **`any`-budget gate** (for a Sonnet-first driver): after scoring a node, if `any_nonrx >
any_gold_nonrx + slack`, re-run that node on Opus. Uses the budgeted count, so heavy Rx casting
(sanctioned) never trips the gate — only genuine non-Rx looseness does.

## Adding a case

Pick a file currently `--!strict` on main, find the commit that introduced it, and use its parent
as `input` (both must be reachable from main):

```bash
f=src/<pkg>/src/<Realm>/<File>.lua
c=$(git log -S'--!strict' --reverse --format='%H' -- "$f" | head -1)
git merge-base --is-ancestor "$c" main && echo "on main: input=$(git rev-parse --short "$c^")"
```

Add `{ id, archetype, path, input: "<C^ sha>", gold: "main", polarity: "positive" }` to
`manifest.json`, then re-run the gold smoke test.
Loading
Loading