diff --git a/.claude/skills/strict-typing-luau/SKILL.md b/.claude/skills/strict-typing-luau/SKILL.md new file mode 100644 index 0000000000..02ad12578f --- /dev/null +++ b/.claude/skills/strict-typing-luau/SKILL.md @@ -0,0 +1,201 @@ +--- +name: strict-typing-luau +description: Convert a Nevermore Luau file from --!nonstrict (or untyped/--!nocheck) to --!strict, adding the project's explicit type annotations and fixing every type error the checker reports. Use this whenever the user asks to "strictly type", "add types to", "make strict", "type-annotate", "convert to --!strict", or clean up the typing on a .lua/.luau file in this repo — including when they just select a file with a `--!nonstrict` header and say "type this" or point you at a legacy module. Also use it when a strict file is throwing luau-lsp type errors and the user wants them resolved following Nevermore conventions. +--- + +# Strictly typing Luau files + +Convert a file to `--!strict` and make it pass the type checker cleanly. This is mostly +**mechanical pattern application** — move fast and correctly. The checker is your fast +feedback loop; lean on it instead of guessing. + +## Triage first — match effort to the file + +- **Plain util / data module** (`local X = {}` of functions, or a `return {...}` table; no + `setmetatable`): flip the header, give every function typed params/returns, run single-file + analyze. Usually no `export type` block needed. +- **Class** (`setmetatable({}, ...)` + constructor + methods): the real work — needs the + `export type` block. Follow the patterns below and `references/conventions.md`. +- **Already `--!strict` but erroring**: skip conversion; just resolve the reported errors. + +## Why it's not find-and-replace + +Luau can't infer fields through `setmetatable`, so strict mode turns that blind spot into +errors. The fix is an explicit `export type` block enumerating every `self` field, plus +dot-syntax methods that name `self`. Flipping the header without these just produces a wall +of errors — supplying the types the checker can't infer *is* the job. + +## The verification loop (fast inner loop) + +`luau-lsp analyze` is the same engine as `lint:luau`, pointed at one file (~2.5s). One-time +setup if `sourcemap.json` / `globalTypes.d.lua` are missing at repo root: `npm run prelint:luau`. + +```bash +luau-lsp analyze --sourcemap=sourcemap.json --base-luaurc=.luaurc \ + --defs=globalTypes.d.lua --flag:LuauSolverV2=false --ignore='**/node_modules/**' \ + src//src//.lua +``` + +Clean = only the `[INFO] Loading...` line. `LuauSolverV2=false` is required (repo pins the old +solver). Iterate until clean, then run `npm run lint:luau` **once** as the final gate — single-file +analyze can't see files that depend on *yours*, and tightening a type ripples to subclasses and +callers. Triage new downstream errors: pre-existing → leave & flag; your type is genuinely too +tight for the real contract → loosen *your* type (often `T?` not `T`); a small obvious follow-on +→ fix it. + +## Core patterns + +**Class with a parent (the common case):** + +```lua +--!strict +local require = require(script.Parent.loader).load(script) + +local BaseObject = require("BaseObject") + +local MyClass = setmetatable({}, BaseObject) +MyClass.ClassName = "MyClass" +MyClass.__index = MyClass + +export type MyClass = typeof(setmetatable( + {} :: { + _serviceBag: ServiceBag.ServiceBag, -- EVERY self field, with its type + _enabled: ValueObject.ValueObject, + }, + {} :: typeof({ __index = MyClass }) +)) & BaseObject.BaseObject -- intersection pulls in inherited _maid, _obj, etc. + +function MyClass.new(serviceBag: ServiceBag.ServiceBag): MyClass + local self: MyClass = setmetatable(BaseObject.new() :: any, MyClass) + self._serviceBag = assert(serviceBag, "No serviceBag") + return self +end +``` + +**Methods use dot syntax with explicit `self`** (colon syntax loses the `self` type in strict +mode). Callers still write `obj:Method()`; only the definition changes: + +```lua +function MyClass.GetEnabled(self: MyClass): boolean + return self._enabled.Value +end +``` + +## The export type rule — always `typeof(setmetatable(...))`, never hand-list methods + +There is **one** way to write a class's export type, and it holds for generic, inherited, and +dynamic-`__index` classes alike: + +```lua +export type MyClass = typeof(setmetatable( + {} :: { ...only the instance FIELDS... }, + {} :: typeof({ __index = MyClass }) +)) [& Parent.Parent] +``` + +Methods come from `typeof({ __index = MyClass })` — never hand-write `Method: (self, ...) -> ...` +records (they drift and cost enormous churn on big classes). Inheritance: list only the child's +*own new* fields; inherited ones arrive via `& Parent.Parent`. Generics: keep `` load-bearing +by putting it in a field (e.g. a virtual `Value: T`); never collapse `` to `any`. The one tax: +a metatable'd type needs `self :: any` for dynamic self-access (`rawget(self :: any, k)`). + +## When `:: any` is acceptable + +Confined to boundaries the checker genuinely can't model: + +- `setmetatable(ParentClass.new(...) :: any, MyClass)` — the metatable transform +- `Binder.new("Tag", MyClass :: any) :: Binder.Binder` — binder registration +- `local t: any = require("t")` — `t` (and the rare library like it) isn't strict-friendly +- **Rx / reactive chains** (`:Pipe({...})`, `switchMap`/`map` closures, `RxSignal`, `Signal.new()`) + — the #1 time-sink. Don't try to thread types through them; cast to `any` on the *first* analyze + error and move on, keeping the public return type precise. See `references/rx.md`. + +**Cast the chain body, never the public return type.** Type every public method's return precisely +(`Promise.Promise`, `Observable.Observable`); only the internal *expression* producing it may +be `any`. Casting the return type itself is the single biggest source of avoidable looseness — a +`GetX(self): any` leaks `any` to every caller. See the Rx/Promise examples in `references/conventions.md`. + +Reaching for `:: any` inside a method body means the `export type` block is wrong — fix it there. +A call-site `(x :: any):Method()` to dodge a *signature mismatch* is a smell: usually the upstream +signature is too strict (a param that should be optional) — fix it upstream if in scope, else flag +it. Don't bury a real type bug under a cast. + +## Hard cases — try the precise type first, loosen the narrowest spot only + +Concrete metatable types (`Signal.Signal`, `ValueObject.ValueObject`, `Maid.Maid`) import +and check cleanly almost always — write them out. Only these justify deviating: + +1. **A type you cannot import** (sibling is still nonstrict, exports no type) → write a **precise + structural interface** of the exact surface you use (real signatures, real field types) — this + is typing by another means, not a loosening. Better, if cheap: add the `export type` upstream. +2. **A heavy cyclic generic** the old solver can't hold (a deep class wrapped across many + members) → type each occurrence as `any` with JUST the intended type in a trailing comment: + `_indexObservers: any, -- ObservableSubscriptionTable`. Find the one culprit type and sweep + **all** its occurrences in one pass (fields, params, **returns**), don't bisect. +3. **Require cycle** between a class and its definition/factory → hoist shared shapes into a + types-only `Types.lua` (requires nothing at runtime, so importing it can't recreate the + cycle). Keep the public API precise; only the internal back-reference may give. + +Time-box it: if a file won't go clean within ~2 iterations on the *same* cyclic/"too complex" +error, take the escape for that member and move on. The public API must stay precise; internal +plumbing can absorb imprecision. + +## Comment discipline + +**Default: write loose `any`s BARE — no explanatory comment.** Write `_cmdrService: any,` — NOT +`_cmdrService: any, -- CmdrServiceClient (nonstrict, no exported type)`. This holds for *every* `any` +whose source module simply exports no type yet (the common case). The comment reads as "blessed, +intentional" and discourages the next person from tightening it — and we *want* these `any`s gone. A +bare `any` is honest debt; a commented one looks finished. Don't explain yourself. + +Only two exceptions earn a comment: +- **A forced cyclic-complexity `any`** (Hard case #2) records its intended type so it's recoverable: + `_indexObservers: any, -- ObservableSubscriptionTable`. The comment is *information*, not an apology. +- **A non-obvious deliberate structure** (e.g. a structural type dodging a require cycle) gets a line + so no one "fixes" it back into the cycle. + +If you're tempted to comment an `any` for any other reason — don't. + +## Report your hand-offs + +End with a short **"Hand-offs"** list: every spot where you took an escape instead of a precise +type — file, member, the `any`/structural fallback, and the intended type. Skip routine sanctioned +boundaries (metatable casts, `t: any`, binder registration). This is the deliverable that lets the +user tighten later; a run that loosened things silently is not done. + +## Pitfalls + +- **`--!strict` must be the file's FIRST line** — before the `--[=[ ]=]` docstring. Luau only honors + a mode directive at the very top; placed after the docstring it's silently ignored and the file is + never strict-checked (analyze looks "clean" but isn't). +- Don't leave `--!nonstrict`/`--!nocheck` "for safety" — the point is `--!strict`. +- Don't run repo-wide `lint:luau` on every edit — single-file analyze is the inner loop, `lint:luau` + is the final gate. +- Don't invent fields — the `export type` block must list exactly what `self` gets. +- Keep legacy/deprecation docstrings; typing doesn't change deprecation status. + +## Tooling + +- `references/conventions.md` — full canonical examples + a common-error → fix table. + `references/rx.md` — the Rx/reactive escape in detail. +- **Converting a whole package?** Don't eyeball the order. Run the planner for a + dependency-ordered, model-routed conversion plan — which files, in what order, and which form + cyclic clusters to convert together: + + ```bash + # from the repo root (the script cd's to the repo top itself, so CWD doesn't matter): + .claude/skills/strict-typing-luau/evals/lib/run.sh plan src/ # e.g. src/settings + ``` + + It defaults to **real mode** (targets the files that aren't `--!strict` yet); convert leaves-first + in the order it prints. (`--eval-gold` switches it to the harness's gold-bearing view.) +- **Parallelize at scale.** Files in the same dependency layer of the plan are independent — they + don't require each other, so they can convert concurrently. When a package has **more than ~3 + files to convert**, fan out **one sub-agent per file within a layer** (launch them in a single + message so they run in parallel), wait for the layer, then move to the next. Each sub-agent + converts ONE file against its already-converted deps. This makes wall-clock ≈ the slowest file per + layer instead of the sum, and keeps each context small. At ≤3 files, just convert sequentially — + the fan-out overhead isn't worth it. +- That same **`run.sh`** with no arguments lists every command. `plan` is the one generally useful + for real conversions; the rest (`gold`, `convert`, `place`/`score`/`restore`, `triggers`) *evaluate* + the skill — see `evals/README.md` for those workflows. diff --git a/.claude/skills/strict-typing-luau/evals/README.md b/.claude/skills/strict-typing-luau/evals/README.md new file mode 100644 index 0000000000..5422dd021e --- /dev/null +++ b/.claude/skills/strict-typing-luau/evals/README.md @@ -0,0 +1,115 @@ +# strict-typing-luau — mechanical eval harness + +A fast, deterministic, **no-LLM** way to validate the skill. Cases are mined from **main +history**: real nonstrict→strict conversions the maintainers shipped. Each case stores the +pre-conversion blob (`input`) and anchors `gold` to **main HEAD**, so single-file analyze always +matches the main-built sourcemap (no staleness). See `manifest.json`. + +## The checks + +### 1. Gold smoke test (~20s, no tokens) — validates the harness itself + +```bas +bash evals/lib/run.sh gold +``` + +Scores the maintainer's gold (main) for every case. Every positive must come back +`strict=true, analyze=0`; negatives must be `strict=false`. If this is green, the pairs, the +scorer, and the plumbing are sound. Run it after editing the manifest or scorer. + +### 2. Worker eval — measures an actual skill run + +Per case, drive the worker (an agent running the skill) with these primitives: + +```bash +bash evals/lib/run.sh place # lay the nonstrict INPUT at the file's real path +# ... run the skill on that path to convert it in place ... +bash evals/lib/run.sh score # -> {strict, analyze_errors, any, any_gold} +bash evals/lib/run.sh restore # undo: restore the package to main +``` + +A conversion **passes** when `strict=true`, `analyze_errors=0`, and `any_nonrx <= any_gold_nonrx` +(no looser than the maintainer's output, *excluding* Rx-chain casts — those are deliberate policy, +see `references/rx.md`). `score.sh` reports both the total `any` and the budgeted `any_nonrx`. Run +`npm run lint:luau` once at the end as the downstream gate. + +**Whole-package runs.** Cases sharing an `input` commit belong to one package conversion — e.g. +the six `settings-*` cases (all `input: b1bc1512aa`, `src/settings`, the `PlayerSettings` package +the maintainers strict-typed in PR #550). `place` any one of them and the whole package drops to +its untyped state; the agent then converts every file (leaves first — see their manifest order), +and you `score` each case. If the agent **added** a file during conversion (a shared `Types.lua`), +`run.sh sync` before scoring so single-file analyze can resolve it, then `restore` + `sync` again +to return the sourcemap to main. (Single-file conversions that add nothing don't need `sync`.) + +### 3. Triggering test (~5s, one judge call) — validates the `description` + +```bash +bash evals/lib/triggers.sh +``` + +Does the skill's frontmatter `description` fire on the right prompts and stay quiet on near-misses? +Cases live in `triggers.json` (positives + hard negatives like TypeScript-strict, tsconfig-strict, +formatting). A judge model (`claude -p`) sees only the *live* description from SKILL.md and each +prompt, returns fire/skip per prompt, and we score against `expect`. The judge reads the current +description, so this catches a description edit that quietly breaks (or over-broadens) triggering. +Add a case = one line in `triggers.json`. + +### 4. Tooling-scope test (~6s, one judge call) — guards against over-orchestration + +```bash +bash evals/lib/run.sh tooling +``` + +Now that SKILL.md surfaces the planner and harness, a single-file worker could wrongly run `plan` +or the eval harness instead of just converting its file. A judge reads the *live* SKILL.md and +classifies each task in `tooling.json` as `planner` (whole-package only) or `direct` (single file / +scoped node / error-fix), scored against `expect`. Catches a SKILL.md edit that over-broadens the +planner's apparent scope. Add a case = one line in `tooling.json`. + +### 5. Parallelism test (~9s, one judge call) — execution strategy at scale + +```bash +bash evals/lib/run.sh parallelism +``` + +A judge reads the live SKILL.md and classifies each `parallelism.json` task as `parallel` (fan out +one sub-agent per file within a dependency layer, for a package > ~3 files) or `sequential` (single +file / handful / error-fix). Guards the real-use regression where the agent converted a whole package +serially because the skill never told it to parallelize. + +## Design notes + +- **Files are placed at package granularity** (`src/`): a package is the unit of + type-consistency, so a cyclic file scored against nonstrict siblings won't error falsely. +- **`restore` unstages and cleans** gold-only new files (e.g. a new `*Types.lua`) so cases can't + leak into each other. +- **Budget enforcement belongs here, not in the skill prose.** Cap the worker's tokens/iterations + in the runner; if it can't finish a case in budget, that's a recorded failure, not a reason to + add more instructions to SKILL.md. +- **Negatives** come from the revert commit `9027756cd7` (a naive flip that broke downstream and + was reverted). They exist to prove the downstream `lint:luau` gate fires. +- **Model routing (per node, not per file).** `plan` tags each unit with a suggested worker model: + **opus** for class / cyclic-cluster / Rx-importing nodes (precision + comprehension matter), + **sonnet** for mechanical leaf/util nodes (speed is free, little to cast loosely). Backed by + probes on a hard Rx class: Sonnet is fastest (3.1 min vs Opus 5.5) but casts ~3× more `any` + (34 vs 12) — fine where looseness is cheap, not where it isn't. **Do not** split one hard file + across models as a Sonnet-draft→Opus-polish pipeline: measured 2× the cost of Opus-alone with no + precision gain (the polish pass re-comprehends from scratch; post-hoc `any`-tightening is harder + than typing precisely up front). Route whole *nodes*, never halves of a file. +- **`any`-budget gate** (for a Sonnet-first driver): after scoring a node, if `any_nonrx > + any_gold_nonrx + slack`, re-run that node on Opus. Uses the budgeted count, so heavy Rx casting + (sanctioned) never trips the gate — only genuine non-Rx looseness does. + +## Adding a case + +Pick a file currently `--!strict` on main, find the commit that introduced it, and use its parent +as `input` (both must be reachable from main): + +```bash +f=src//src//.lua +c=$(git log -S'--!strict' --reverse --format='%H' -- "$f" | head -1) +git merge-base --is-ancestor "$c" main && echo "on main: input=$(git rev-parse --short "$c^")" +``` + +Add `{ id, archetype, path, input: "", gold: "main", polarity: "positive" }` to +`manifest.json`, then re-run the gold smoke test. diff --git a/.claude/skills/strict-typing-luau/evals/lib/convert.sh b/.claude/skills/strict-typing-luau/evals/lib/convert.sh new file mode 100755 index 0000000000..5c04645ae0 --- /dev/null +++ b/.claude/skills/strict-typing-luau/evals/lib/convert.sh @@ -0,0 +1,108 @@ +#!/usr/bin/env bash +# Whole-package conversion driver — the orchestration capstone. Walks the `plan` order, spawns a +# fresh model-routed worker per node (claude -p), scores each, applies the any-budget gate +# (escalate a loose Sonnet node to Opus), then one sourcemap sync + lint:luau at the end. +# +# convert.sh [--run] [--limit N] +# package dir, e.g. src/settings +# git ref holding the pre-conversion (untyped) package, e.g. b1bc1512aa +# --run ACTUALLY spawn workers (spends tokens/time). Omitted = DRY RUN. +# --limit N only the first N convertible units (for a bounded slice) +# +# Converts ONLY files that are --!strict on main (real strict gold to score against); files left +# nonstrict on main stay at main, so each target converts against the same conditions the gold did. +# Fresh worker per node = fresh context, so each node imports its already-converted deps' real types. +set -euo pipefail +ROOT="$(git rev-parse --show-toplevel)"; cd "$ROOT" +HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +SKILL="$HERE/../../SKILL.md" +PKG="${1:?usage: convert.sh [--run] [--limit N]}" +INPUT="${2:?need input ref}" +RUN=0; LIMIT=0 +args=("$@") +for ((i = 2; i < ${#args[@]}; i++)); do + [ "${args[$i]}" = "--run" ] && RUN=1 + [ "${args[$i]}" = "--limit" ] && LIMIT="${args[$((i + 1))]:-0}" +done +SLACK=2 # any_nonrx may exceed gold_nonrx by this much before the gate escalates to Opus + +# convertible units (>=1 target), honoring --limit: step \t model \t kind \t comma-joined TARGET rels +units_tsv() { + node -e ' + const {execSync}=require("child_process"); + const p=JSON.parse(execSync(`node '"$HERE"'/plan.js '"$PKG"' json --eval-gold`).toString()); + let u=p.units.filter(x=>x.targets.length); + const lim='"$LIMIT"'; if(lim>0) u=u.slice(0,lim); + for(const x of u) console.log([x.step,x.model,x.kind,x.targets.join(",")].join("\t")); + ' +} +all_targets() { units_tsv | cut -f4 | tr ',' '\n' | sed '/^$/d'; } + +worker_prompt() { # model "rel1,rel2" + local files; files="$(echo "$1" | tr ',' '\n' | sed "s#^#$PKG/src/#")" + cat </dev/null 2>&1 || true; } + +# ---- DRY RUN (default) ---- +if [ "$RUN" -ne 1 ]; then + echo "DRY RUN — convert $PKG from $INPUT${LIMIT:+ (limit ${LIMIT})} (no workers; pass --run to execute)" + echo "Would: place each target's untyped $INPUT version (non-targets stay at main), then per node:" + echo + printf '%-4s %-7s %-8s %s\n' STEP MODEL KIND TARGETS + while IFS=$'\t' read -r step model kind targets; do + printf '%-4s %-7s %-8s %s\n' "$step" "$model" "$kind" "$targets" + done < <(units_tsv) + echo + echo "Per node: fresh \`claude -p --model \` -> score.sh -> if sonnet & any_nonrx >" + echo "any_gold_nonrx+$SLACK (or not clean), re-run node on opus. Then sync + npm run lint:luau." + exit 0 +fi + +# ---- REAL RUN ---- +echo ">> placing $(all_targets | wc -l | tr -d ' ') target file(s) at $INPUT (non-targets stay at main)" +git checkout main -- "$PKG"; git reset -q HEAD -- "$PKG"; git clean -fdq -- "$PKG" +while read -r t; do + full="$PKG/src/$t" + git show "$INPUT:$full" > "$full" 2>/dev/null || { echo " !! $t missing at $INPUT — aborting"; git checkout main -- "$PKG"; exit 1; } +done < <(all_targets) +npm run build:sourcemap >/dev/null 2>&1 + +printf '%-4s %-7s %-7s %-7s %-11s %s\n' STEP MODEL STRICT ERRORS ANYnonrx FILE +while IFS=$'\t' read -r step model kind targets; do + run_worker "$model" "$targets" + npm run build:sourcemap >/dev/null 2>&1 # sync before scoring — a worker may have shifted the + # tree; a stale sourcemap yields phantom analyze errors + for rel in ${targets//,/ }; do + f="$PKG/src/$rel" + row="$(bash "$HERE/score.sh" "$f" main)" + get() { node -e "process.stdout.write(String(JSON.parse(process.argv[1]).$1))" "$row"; } + strict="$(get strict)"; errs="$(get analyze_errors)"; anr="$(get any_nonrx)"; gnr="$(get any_gold_nonrx)" + gate="" + if [ "$model" = sonnet ] && { [ "$strict" != true ] || [ "$errs" -ne 0 ] || [ "$anr" -gt $((gnr + SLACK)) ]; }; then + gate="-> escalated opus" + run_worker opus "$targets" + row="$(bash "$HERE/score.sh" "$f" main)"; strict="$(get strict)"; errs="$(get analyze_errors)"; anr="$(get any_nonrx)" + fi + printf '%-4s %-7s %-7s %-7s %-11s %s %s\n' "$step" "$model" "$strict" "$errs" "$anr/$gnr" "$rel" "$gate" + done +done < <(units_tsv) + +echo; echo ">> final gate: npm run lint:luau (new errors in $PKG only)" +npm run build:sourcemap >/dev/null 2>&1 +npm run lint:luau 2>&1 | grep -E "$PKG" | grep -iE 'error' | head -20 || echo " none" +echo ">> conversion left in tree. Restore: git checkout main -- $PKG && git reset -q HEAD -- $PKG && git clean -fdq -- $PKG && npm run build:sourcemap" diff --git a/.claude/skills/strict-typing-luau/evals/lib/parallelism.js b/.claude/skills/strict-typing-luau/evals/lib/parallelism.js new file mode 100755 index 0000000000..094fe87834 --- /dev/null +++ b/.claude/skills/strict-typing-luau/evals/lib/parallelism.js @@ -0,0 +1,58 @@ +#!/usr/bin/env node +// Execution-strategy test helper. Mirrors tooling.js: given the LIVE SKILL.md, does an agent +// following it fan out parallel sub-agents (package > ~3 files) or convert sequentially? +// parallelism.js prompt -> print the batched judge prompt (full SKILL.md + cases) +// parallelism.js score -> parse judge reply, score vs expected, print table, exit 1 on fail +const fs = require("fs"); +const path = require("path"); +const HERE = __dirname; +const SKILL = path.join(HERE, "..", "..", "SKILL.md"); +const cases = JSON.parse(fs.readFileSync(path.join(HERE, "..", "parallelism.json"), "utf8")).cases; + +if (process.argv[2] === "prompt") { + const numbered = cases.map((c, i) => `${i + 1}. ${c.prompt}`).join("\n"); + process.stdout.write( +`You simulate an agent that has loaded the strict-typing-luau skill (full text below) and is handed a +task. For each task, decide HOW the skill directs the agent to EXECUTE it: + "parallel" = fan out multiple sub-agents concurrently — one per independent file within a + dependency layer — to convert several files at once. + "sequential" = convert the file(s) one at a time, no sub-agent fan-out. + +Rule the skill intends: fan out (parallel) only when a PACKAGE has more than ~3 files to convert. +A single file, a couple of named files, ~3 files, or error-fixing in one file = "sequential". + +SKILL.md: +----- +${fs.readFileSync(SKILL, "utf8")} +----- + +Tasks: +${numbered} + +Output ONLY a compact JSON array, one object per task, no prose, no code fences: +[{"n":1,"action":"sequential"},{"n":2,"action":"parallel"}, ...] +`); + process.exit(0); +} + +if (process.argv[2] === "score") { + const raw = fs.readFileSync(process.argv[3], "utf8"); + const arr = JSON.parse(raw.slice(raw.indexOf("["), raw.lastIndexOf("]") + 1)); + const got = new Map(arr.map((o) => [o.n, String(o.action || "").toLowerCase()])); + let fails = 0; + const pad = (s, n) => String(s).padEnd(n); + console.log(pad("#", 3) + pad("TAG", 18) + pad("EXPECT", 12) + pad("GOT", 12) + "VERDICT PROMPT"); + cases.forEach((c, i) => { + const n = i + 1, g = got.get(n), ok = g === c.expect; + if (!ok) fails++; + console.log(pad(n, 3) + pad(c.tag, 18) + pad(c.expect, 12) + pad(g || "?", 12) + + (ok ? "ok " : "MISS ") + c.prompt.slice(0, 48)); + }); + console.log(""); + if (fails === 0) console.log(`PARALLELISM TEST: all ${cases.length} cases correct`); + else { console.log(`PARALLELISM TEST: ${fails}/${cases.length} MISCLASSIFIED`); process.exit(1); } + process.exit(0); +} + +console.error("usage: parallelism.js prompt | score "); +process.exit(2); diff --git a/.claude/skills/strict-typing-luau/evals/lib/parallelism.sh b/.claude/skills/strict-typing-luau/evals/lib/parallelism.sh new file mode 100755 index 0000000000..71732cf509 --- /dev/null +++ b/.claude/skills/strict-typing-luau/evals/lib/parallelism.sh @@ -0,0 +1,14 @@ +#!/usr/bin/env bash +# Execution-strategy test: does the skill tell the agent to fan out parallel sub-agents for a +# package (> ~3 files) vs convert sequentially for a single file / handful? Guards the real-use +# regression where the agent converted a whole package serially. One batched judge call via +# headless `claude -p`, scored against parallelism.json. ~30s. +set -euo pipefail +HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +JS="$HERE/parallelism.js" +TMP="$(mktemp)"; trap 'rm -f "$TMP"' EXIT + +PROMPT="$(node "$JS" prompt)" +echo "Judging $(node -e "console.log(require('$HERE/../parallelism.json').cases.length)") tasks via claude -p ..." +claude -p "$PROMPT" > "$TMP" 2>/dev/null || { echo "judge call failed"; cat "$TMP"; exit 1; } +node "$JS" score "$TMP" diff --git a/.claude/skills/strict-typing-luau/evals/lib/plan.js b/.claude/skills/strict-typing-luau/evals/lib/plan.js new file mode 100644 index 0000000000..1501c04dd5 --- /dev/null +++ b/.claude/skills/strict-typing-luau/evals/lib/plan.js @@ -0,0 +1,126 @@ +#!/usr/bin/env node +// Compute the INTRA-PACKAGE conversion plan for a Nevermore package: the order in which to +// strict-type its files so every file is converted only after the siblings it requires. +// +// plan.js [json] +// package dir, e.g. src/settings +// json emit a machine-readable plan (for a workflow driver) instead of the human table +// +// Method: edges = require("X") where X resolves to a sibling file in THIS package. Tarjan's SCC +// collapses cycles into clusters; the condensation is a DAG; Tarjan emits it in dependency-first +// order, which IS conversion order (a file's deps come before it). Each unit is either one file +// (mechanical single-file conversion) or a cyclic CLUSTER (use the cyclic-types playbook). +// +// SCOPE — deliberately intra-package only. It does NOT: +// * order across packages (a require into another package is treated as already-typed/external) +// * account for the downstream blast radius of high-fan-out foundational files (Maid, Promise) +// Those are a separate monorepo-level dependency problem. The plan ASSUMES cross-package deps are +// already strict (true for Maid/Promise/BaseObject/ServiceBag on main); when one isn't, that's the +// "type upstream first" case — handle it with the skill's structural-interface escape meanwhile. +const fs = require("fs"); +const cp = require("child_process"); +const path = require("path"); + +const pkg = process.argv[2]; +const rest = process.argv.slice(3); +const asJson = rest.includes("json"); +const evalGold = rest.includes("--eval-gold"); // eval harness only — see isTarget below +if (!pkg) { console.error("usage: plan.js [json] [--eval-gold]"); process.exit(2); } + +const listRef = evalGold ? "main" : "HEAD"; +const files = cp.execSync(`git ls-tree -r --name-only ${listRef} -- ${pkg}`).toString().trim().split("\n") + .filter((f) => f.endsWith(".lua") && !f.endsWith(".spec.lua") && !f.includes("jest.config") && !f.includes("/test/")); +if (!files.length) { console.error(`no source .lua files under ${pkg}`); process.exit(1); } + +const nameToFile = {}; +for (const f of files) nameToFile[path.basename(f, ".lua")] = f; + +const adj = {}; // file -> [sibling files it requires] +const traits = {}; // file -> { isClass, usesRx } from a cheap source scan +for (const f of files) { + const src = fs.readFileSync(f, "utf8"); + const reqs = [...src.matchAll(/require\("([^"]+)"\)/g)].map((m) => m[1]); + adj[f] = [...new Set(reqs.filter((r) => nameToFile[r] && nameToFile[r] !== f).map((r) => nameToFile[r]))]; + traits[f] = { + isClass: /setmetatable\(\{\}/.test(src), + usesRx: reqs.some((r) => /^(Rx|RxSignal|Observable|Brio)$/.test(r)), + }; +} + +// Target selection is MODE-AWARE (this is the key real-vs-eval distinction): +// default (real conversion): target = files NOT yet --!strict in the working tree — i.e. the +// files that actually need converting. (A package you point at is nonstrict precisely because +// it hasn't been converted.) +// --eval-gold (eval harness only): target = files that are --!strict on MAIN — the gold-bearing +// set, so the eval respects exactly what the maintainer chose to convert and leaves the rest. +// Using strict-on-main as the target predicate in real mode would skip every file you asked to +// convert, so it must never be the unconditional default. +const isStrictHeader = (text) => text.split("\n", 1)[0].trim() === "--!strict"; +function isTarget(f) { + if (evalGold) { + try { return isStrictHeader(cp.execSync(`git show main:${f}`).toString()); } + catch (e) { return false; } + } + return !isStrictHeader(fs.readFileSync(f, "utf8")); +} +const skipReason = evalGold ? "nonstrict on main" : "already strict"; + +// Suggested worker model per unit (see README "model routing"): the precision/comprehension-heavy +// nodes go to Opus, the mechanical ones to Sonnet. Empirically Sonnet is fastest but casts to `any` +// far more, so reserve it for nodes where looseness is cheap (leaf util, no metatable, no Rx). +function modelFor(comp) { + const tgts = comp.filter(isTarget); + const hard = tgts.some((f) => traits[f].isClass || traits[f].usesRx) || comp.length > 1; + return hard ? "opus" : "sonnet"; +} + +// Tarjan SCC — emits components in dependency-first (reverse-topological) order = conversion order. +let idx = 0; +const I = {}, low = {}, onStack = {}, stack = [], sccs = []; +function strongconnect(v) { + I[v] = low[v] = idx++; stack.push(v); onStack[v] = true; + for (const w of adj[v]) { + if (I[w] === undefined) { strongconnect(w); low[v] = Math.min(low[v], low[w]); } + else if (onStack[w]) low[v] = Math.min(low[v], I[w]); + } + if (low[v] === I[v]) { + const comp = []; let w; + do { w = stack.pop(); onStack[w] = false; comp.push(w); } while (w !== v); + sccs.push(comp); + } +} +for (const f of files) if (I[f] === undefined) strongconnect(f); + +const rel = (f) => f.replace(`${pkg}/src/`, ""); +const compIndex = {}; sccs.forEach((c, i) => c.forEach((f) => (compIndex[f] = i))); +const units = sccs.map((comp, i) => ({ + step: i + 1, + kind: comp.length === 1 ? "single" : "cluster", + model: modelFor(comp), + files: comp.map(rel), + // `targets` = the subset actually converted (mode-aware, see isTarget). The driver converts only + // these and leaves the rest as-is. A unit with no targets is skipped entirely. + targets: comp.filter(isTarget).map(rel), + // external sibling deps already satisfied by earlier steps (for display only) + needs: [...new Set(comp.flatMap((f) => adj[f]).filter((d) => compIndex[d] !== i).map(rel))], +})); + +if (asJson) { process.stdout.write(JSON.stringify({ pkg, units }, null, 2) + "\n"); process.exit(0); } + +const mark = (r, u) => u.targets.includes(r) ? r : `${r} (${skipReason} — skipped)`; +const convertUnits = units.filter((u) => u.targets.length); +console.log(`${pkg}: ${files.length} source files -> ${convertUnits.length} convertible units ` + + `(${units.length - convertUnits.length} units skipped: ${skipReason}). Dependency order:\n`); +for (const u of units) { + if (!u.targets.length) { console.log(`${String(u.step).padStart(2)}. [skip ] ${u.files.join(", ")} (${skipReason})`); continue; } + const tag = `${u.kind === "single" ? "single " : "cluster"} · ${u.model.padEnd(6)}`; + if (u.kind === "single") console.log(`${String(u.step).padStart(2)}. [${tag}] ${mark(u.files[0], u)}`); + else { + console.log(`${String(u.step).padStart(2)}. [${tag}] CLUSTER — convert ${u.targets.length}/${u.files.length} (cyclic-types playbook)`); + u.files.forEach((f) => console.log(` ${mark(f, u)}`)); + } +} +const onOpus = convertUnits.filter((u) => u.model === "opus").length; +console.log(`\nConvert ${convertUnits.length} units: ${onOpus} opus (class/cyclic/Rx), ${convertUnits.length - onOpus} sonnet (mechanical).`); +console.log(`Intra-package only; cross-package deps assumed already strict; ${skipReason} files left as-is.` + + (evalGold ? "" : " [real mode: targets = not-yet-strict files; pass --eval-gold for the harness view]")); diff --git a/.claude/skills/strict-typing-luau/evals/lib/run.sh b/.claude/skills/strict-typing-luau/evals/lib/run.sh new file mode 100755 index 0000000000..1049e23546 --- /dev/null +++ b/.claude/skills/strict-typing-luau/evals/lib/run.sh @@ -0,0 +1,84 @@ +#!/usr/bin/env bash +# Mechanical eval harness for the strict-typing-luau skill. No LLM in --gold mode. +# +# run.sh gold Smoke test: lay down the maintainer's GOLD for each case and score it. +# Every positive case MUST come back strict + 0 analyze errors. Proves the +# pairs, the scorer, and the plumbing are all sound. ~minutes, no tokens. +# +# run.sh place Worker loop primitive: put the pre-conversion (nonstrict) INPUT at +# the file's real path so an agent can convert it in place. +# run.sh score Score whatever is at the file's path now, vs gold. +# run.sh restore Restore the file's package back to main (undo place/convert). +# run.sh plan Print the INTRA-PACKAGE conversion order (dependency-first, cyclic +# clusters collapsed). The orchestration artifact a driver walks. Add +# `json` for machine-readable output. Intra-package only — see plan.js. +# run.sh convert [--run] +# Whole-package conversion driver: walk the plan, fresh model-routed worker +# per node, any-budget gate, final lint:luau. DRY-RUN unless --run (which +# spawns claude -p workers and spends tokens). See convert.sh. +# run.sh sync Rebuild sourcemap.json to match the current tree. Call it AFTER a +# whole-package conversion in which the agent ADDED files (e.g. a new +# shared Types.lua) — otherwise single-file analyze can't resolve them +# (the stale-sourcemap trap). After restoring such a run, `sync` once +# more to return the sourcemap to main. Single-file runs don't need it. +# run.sh triggers Judge whether the skill's DESCRIPTION fires correctly (one claude -p call). +# run.sh tooling Judge whether the skill scopes the right ACTION — planner for a whole +# package, direct conversion for a single file/scoped node (guards against +# single-file workers over-orchestrating). One claude -p call. +# run.sh parallelism Judge the execution STRATEGY — fan out parallel sub-agents for a package +# (> ~3 files) vs sequential for a single file/handful. One claude -p call. +# +# Files are laid down at PACKAGE granularity (src/) because a package is the unit of +# type-consistency: a cyclic file scored against nonstrict siblings would error falsely. +set -euo pipefail +ROOT="$(git rev-parse --show-toplevel)"; cd "$ROOT" +HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +MANIFEST="$HERE/../manifest.json" + +field() { node -e "const m=require('$MANIFEST');const c=m.cases.find(c=>c.id==='$1');if(!c){process.exit(3)}process.stdout.write(c['$2']||'')"; } +ids() { node -e "const m=require('$MANIFEST');console.log(m.cases.map(c=>c.id).join(' '))"; } +pkg_of(){ echo "$1" | cut -d/ -f1-3 | sed -E 's#(src/[^/]+)/.*#\1#'; } + +place_gold(){ local p; p="$(field "$1" path)"; git checkout "$(field "$1" gold)" -- "$(pkg_of "$p")"; } +place_in() { local p; p="$(field "$1" path)"; git checkout "$(field "$1" input)" -- "$(pkg_of "$p")"; } +restore() { local p pk; p="$(field "$1" path)"; pk="$(pkg_of "$p")"; + # back to main, unstage, and remove gold-only NEW files (e.g. a new *Types.lua) + git checkout main -- "$pk" 2>/dev/null || true + git reset -q HEAD -- "$pk"; git clean -fdq -- "$pk"; } +score() { bash "$HERE/score.sh" "$(field "$1" path)" "$(field "$1" gold)"; } + +case "${1:-}" in + place) place_in "$2" ;; + score) score "$2" ;; + restore) restore "$2" ;; + sync) npm run build:sourcemap >/dev/null 2>&1 && echo "sourcemap rebuilt to current tree" ;; + plan) shift; node "$HERE/plan.js" "$@" ;; + convert) shift; bash "$HERE/convert.sh" "$@" ;; + triggers) bash "$HERE/triggers.sh" ;; + tooling) bash "$HERE/tooling.sh" ;; + parallelism) bash "$HERE/parallelism.sh" ;; + gold) + printf '%-22s %-8s %-6s %-8s %-5s %s\n' CASE POLARITY STRICT ANALYZE ANY VERDICT + fails=0 + for id in $(ids); do + pol="$(field "$id" polarity)" + place_gold "$id" + row="$(score "$id")" + restore "$id" + strict="$(node -e "console.log(JSON.parse(process.argv[1]).strict)" "$row")" + errs="$(node -e "console.log(JSON.parse(process.argv[1]).analyze_errors)" "$row")" + any="$(node -e "console.log(JSON.parse(process.argv[1]).any)" "$row")" + # gold expectation: positive => strict & 0 errors; negative => nonstrict (reverted) + verdict=PASS + if [ "$pol" = positive ]; then + { [ "$strict" = true ] && [ "$errs" -eq 0 ]; } || { verdict=FAIL; fails=$((fails+1)); } + else + [ "$strict" = false ] || { verdict=FAIL; fails=$((fails+1)); } + fi + printf '%-22s %-8s %-6s %-8s %-5s %s\n' "$id" "$pol" "$strict" "$errs" "$any" "$verdict" + done + echo + [ "$fails" -eq 0 ] && echo "GOLD SMOKE TEST: all cases passed" || { echo "GOLD SMOKE TEST: $fails FAILED"; exit 1; } + ;; + *) sed -n '2,20p' "${BASH_SOURCE[0]}"; exit 1 ;; +esac diff --git a/.claude/skills/strict-typing-luau/evals/lib/score.sh b/.claude/skills/strict-typing-luau/evals/lib/score.sh new file mode 100755 index 0000000000..a21e98ed33 --- /dev/null +++ b/.claude/skills/strict-typing-luau/evals/lib/score.sh @@ -0,0 +1,40 @@ +#!/usr/bin/env bash +# Score ONE converted file already sitting at its real path, against its gold blob. +# Mechanical only — no LLM. Emits a one-line JSON object on stdout. +# +# Usage: score.sh +# real repo path, e.g. src/foo/src/Shared/Foo.lua (file already written there) +# git ref of the maintainer-approved version, e.g. b9790d3612 +set -euo pipefail +ROOT="$(git rev-parse --show-toplevel)"; cd "$ROOT" +FILE="$1"; GOLD_REF="$2" + +# --- header: is it --!strict? +HEADER="$(head -1 "$FILE" | tr -d '\r')" +STRICT=false; [ "$HEADER" = "--!strict" ] && STRICT=true + +# --- single-file analyze (same engine/flags as lint:luau, pinned old solver) +ANALYZE_OUT="$(luau-lsp analyze \ + --sourcemap=sourcemap.json --base-luaurc=.luaurc --defs=globalTypes.d.lua \ + --flag:LuauSolverV2=false --ignore='**/node_modules/**' \ + "$FILE" 2>&1 || true)" +# Count only canonical findings located IN THE TARGET FILE — luau-lsp prints "(line,col): ...". +# Counting every "error" line over-counts pre-existing dependency noise (node_modules, nonstrict +# siblings) and stale-sourcemap "Unknown require" churn that isn't the conversion's fault; the final +# lint:luau gate catches genuine cross-file breakage separately. awk is pipefail-safe. +ERRORS="$(printf '%s\n' "$ANALYZE_OUT" | awk -v f="$FILE(" 'substr($0,1,length(f))==f{c++} END{print c+0}')" + +# --- looseness: `any` casts/annotations vs the gold (lower-or-equal is good). +# `any` = total anys. `any_nonrx` = anys EXCLUDING sanctioned Rx-chain lines (casting Rx +# machinery is deliberate policy — see references/rx.md — so the budget gate uses the non-Rx count). +RX='Pipe|switchMap|combineLatest|Rx[.]|RxSignal|Observable|Brio|Signal[.]new' +count_any() { awk '{n+=gsub(/:: *any|: *any/,"")} END{print n+0}'; } +count_any_nonrx() { awk -v rx="$RX" '$0 ~ rx {next} {n+=gsub(/:: *any|: *any/,"")} END{print n+0}'; } +ANY_FILE="$(count_any < "$FILE")" +ANY_FILE_NONRX="$(count_any_nonrx < "$FILE")" +GOLD_SRC="$(git show "$GOLD_REF:$FILE" 2>/dev/null || true)" +ANY_GOLD="$(printf '%s' "$GOLD_SRC" | count_any)" +ANY_GOLD_NONRX="$(printf '%s' "$GOLD_SRC" | count_any_nonrx)" + +printf '{"file":"%s","strict":%s,"analyze_errors":%s,"any":%s,"any_gold":%s,"any_nonrx":%s,"any_gold_nonrx":%s}\n' \ + "$FILE" "$STRICT" "$ERRORS" "$ANY_FILE" "$ANY_GOLD" "$ANY_FILE_NONRX" "$ANY_GOLD_NONRX" diff --git a/.claude/skills/strict-typing-luau/evals/lib/tooling.js b/.claude/skills/strict-typing-luau/evals/lib/tooling.js new file mode 100755 index 0000000000..c01d7c82a7 --- /dev/null +++ b/.claude/skills/strict-typing-luau/evals/lib/tooling.js @@ -0,0 +1,63 @@ +#!/usr/bin/env node +// Action-scoping test helper. Mirrors triggers.js but tests internal behavior: given the LIVE +// SKILL.md, does an agent following it reach for the package planner (whole-package only) or +// convert directly (single file / scoped node), and never run the eval harness as part of a job? +// tooling.js prompt -> print the batched judge prompt (full SKILL.md + cases) +// tooling.js score -> parse judge reply, score vs expected, print table, exit 1 on fail +const fs = require("fs"); +const path = require("path"); +const HERE = __dirname; +const SKILL = path.join(HERE, "..", "..", "SKILL.md"); +const MANIFEST = path.join(HERE, "..", "tooling.json"); +const cases = JSON.parse(fs.readFileSync(MANIFEST, "utf8")).cases; + +if (process.argv[2] === "prompt") { + const numbered = cases.map((c, i) => `${i + 1}. ${c.prompt}`).join("\n"); + process.stdout.write( +`You simulate an agent that has loaded the strict-typing-luau skill (full text below) and is handed a +task. For each task, decide what the skill directs the agent to do: + "planner" = consult/run the package planner (\`run.sh plan \`) and do a whole-package, + dependency-ordered conversion. + "direct" = just convert the file(s) directly — single file, a scoped node ("convert only X, deps + ready"), or fixing errors in one file. Do NOT run the planner, and never run eval-harness + commands (gold/convert/place/score/triggers) as part of doing a conversion. + +Rule the skill intends: the planner is ONLY for converting a WHOLE PACKAGE. One file, a few named +files, a scoped node, or error-fixing in a single file = "direct". When unsure, prefer "direct" — +over-orchestrating a single file is the failure we are guarding against. + +SKILL.md: +----- +${fs.readFileSync(SKILL, "utf8")} +----- + +Tasks: +${numbered} + +Output ONLY a compact JSON array, one object per task, no prose, no code fences: +[{"n":1,"action":"direct"},{"n":2,"action":"planner"}, ...] +`); + process.exit(0); +} + +if (process.argv[2] === "score") { + const raw = fs.readFileSync(process.argv[3], "utf8"); + const arr = JSON.parse(raw.slice(raw.indexOf("["), raw.lastIndexOf("]") + 1)); + const got = new Map(arr.map((o) => [o.n, String(o.action || "").toLowerCase()])); + let fails = 0; + const pad = (s, n) => String(s).padEnd(n); + console.log(pad("#", 3) + pad("TAG", 18) + pad("EXPECT", 9) + pad("GOT", 9) + "VERDICT PROMPT"); + cases.forEach((c, i) => { + const n = i + 1, g = got.get(n), ok = g === c.expect; + if (!ok) fails++; + console.log(pad(n, 3) + pad(c.tag, 18) + pad(c.expect, 9) + pad(g || "?", 9) + + (ok ? "ok " : "MISS ") + c.prompt.slice(0, 50)); + }); + console.log(""); + if (fails === 0) console.log(`TOOLING-SCOPE TEST: all ${cases.length} cases correct`); + else { console.log(`TOOLING-SCOPE TEST: ${fails}/${cases.length} MISCLASSIFIED`); process.exit(1); } + process.exit(0); +} + +console.error("usage: tooling.js prompt | score "); +process.exit(2); diff --git a/.claude/skills/strict-typing-luau/evals/lib/tooling.sh b/.claude/skills/strict-typing-luau/evals/lib/tooling.sh new file mode 100755 index 0000000000..2d43c29415 --- /dev/null +++ b/.claude/skills/strict-typing-luau/evals/lib/tooling.sh @@ -0,0 +1,14 @@ +#!/usr/bin/env bash +# Action-scoping test: does the skill guide agents to the right action — planner only for a whole +# package, direct conversion for a single file / scoped node — and never run the eval harness as +# part of a job? Guards against single-file workers over-orchestrating now that SKILL.md surfaces +# the tooling. One batched judge call via headless `claude -p`, scored against tooling.json. ~30s. +set -euo pipefail +HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +JS="$HERE/tooling.js" +TMP="$(mktemp)"; trap 'rm -f "$TMP"' EXIT + +PROMPT="$(node "$JS" prompt)" +echo "Judging $(node -e "console.log(require('$HERE/../tooling.json').cases.length)") tasks via claude -p ..." +claude -p "$PROMPT" > "$TMP" 2>/dev/null || { echo "judge call failed"; cat "$TMP"; exit 1; } +node "$JS" score "$TMP" diff --git a/.claude/skills/strict-typing-luau/evals/lib/triggers.js b/.claude/skills/strict-typing-luau/evals/lib/triggers.js new file mode 100755 index 0000000000..f0b28dbfe9 --- /dev/null +++ b/.claude/skills/strict-typing-luau/evals/lib/triggers.js @@ -0,0 +1,70 @@ +#!/usr/bin/env node +// Triggering-test helper. Two modes: +// triggers.js prompt -> print the batched judge prompt (live SKILL.md description + cases) +// triggers.js score -> parse the judge's JSON reply, score vs expected, print table, exit 1 on fail +const fs = require("fs"); +const path = require("path"); +const HERE = __dirname; +const SKILL = path.join(HERE, "..", "..", "SKILL.md"); +const MANIFEST = path.join(HERE, "..", "triggers.json"); + +function description() { + const t = fs.readFileSync(SKILL, "utf8"); + const m = t.match(/^description:\s*(.*)$/m); + if (!m) throw new Error("no description in SKILL.md frontmatter"); + return m[1].trim(); +} +const cases = JSON.parse(fs.readFileSync(MANIFEST, "utf8")).cases; + +if (process.argv[2] === "prompt") { + const numbered = cases.map((c, i) => `${i + 1}. ${c.prompt}`).join("\n"); + process.stdout.write( +`You are the skill-router for an agentic coding tool working in the NevermoreEngine repo \ +(a Luau/Roblox monorepo; TypeScript tooling lives under tools/). You may select AT MOST ONE \ +skill per user message, and ONLY when the skill's description clearly matches the request. The \ +tool also has general coding ability plus separate tooling for formatting (stylua), running \ +tests, editing TypeScript, and writing docs — requests better served by those must NOT select \ +a skill. Judge ONLY from the description below; do not invent capabilities. + +Skill under test: + name: strict-typing-luau + description: ${description()} + +For each user message, decide whether THIS skill should be invoked. Assume any "this file" / \ +"selected file" is a Luau (.lua) source file in this repo unless the message says otherwise. + +Output ONLY a compact JSON array, one object per message, no prose, no code fences: +[{"n":1,"fire":true},{"n":2,"fire":false}, ...] + +Messages: +${numbered} +`); + process.exit(0); +} + +if (process.argv[2] === "score") { + const raw = fs.readFileSync(process.argv[3], "utf8"); + const arr = JSON.parse(raw.slice(raw.indexOf("["), raw.lastIndexOf("]") + 1)); + const got = new Map(arr.map((o) => [o.n, !!o.fire])); + let fails = 0; + const pad = (s, n) => String(s).padEnd(n); + console.log(pad("#", 3) + pad("TAG", 20) + pad("EXPECT", 8) + pad("GOT", 6) + "VERDICT PROMPT"); + cases.forEach((c, i) => { + const n = i + 1; + const g = got.get(n); + const ok = g === c.expect; + if (!ok) fails++; + console.log( + pad(n, 3) + pad(c.tag, 20) + pad(c.expect ? "fire" : "skip", 8) + + pad(g === undefined ? "?" : g ? "fire" : "skip", 6) + + (ok ? "ok " : "MISS ") + c.prompt.slice(0, 52) + ); + }); + console.log(""); + if (fails === 0) console.log(`TRIGGER TEST: all ${cases.length} cases correct`); + else { console.log(`TRIGGER TEST: ${fails}/${cases.length} MISCLASSIFIED`); process.exit(1); } + process.exit(0); +} + +console.error("usage: triggers.js prompt | score "); +process.exit(2); diff --git a/.claude/skills/strict-typing-luau/evals/lib/triggers.sh b/.claude/skills/strict-typing-luau/evals/lib/triggers.sh new file mode 100755 index 0000000000..523e6b7138 --- /dev/null +++ b/.claude/skills/strict-typing-luau/evals/lib/triggers.sh @@ -0,0 +1,13 @@ +#!/usr/bin/env bash +# Triggering test: does the skill's `description` fire on the right prompts and stay quiet on +# near-misses? One batched judge call via headless `claude -p`, scored against triggers.json. +# Mechanical compare, single LLM call. ~30s. +set -euo pipefail +HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +JS="$HERE/triggers.js" +TMP="$(mktemp)"; trap 'rm -f "$TMP"' EXIT + +PROMPT="$(node "$JS" prompt)" +echo "Judging $(node -e "console.log(require('$HERE/../triggers.json').cases.length)") prompts via claude -p ..." +claude -p "$PROMPT" > "$TMP" 2>/dev/null || { echo "judge call failed"; cat "$TMP"; exit 1; } +node "$JS" score "$TMP" diff --git a/.claude/skills/strict-typing-luau/evals/manifest.json b/.claude/skills/strict-typing-luau/evals/manifest.json new file mode 100644 index 0000000000..bc764b803a --- /dev/null +++ b/.claude/skills/strict-typing-luau/evals/manifest.json @@ -0,0 +1,105 @@ +{ + "_doc": "Eval cases mined from MAIN history only (stable long-term). Each case = a real conversion the maintainers shipped to main. `input` is the pre-conversion (nonstrict) blob the worker starts from; `gold` is always \"main\" — the file as it lives on main HEAD now, which by construction matches the main-built sourcemap (so single-file analyze can never go stale). The harness lays `input` at the file's real path, runs the worker to convert it, then scores mechanically: header is --!strict, single-file analyze is clean, and `any` count vs the gold. Negatives come from the revert commit: a naive flip that broke downstream and was reverted, so on main the correct end-state is nonstrict.", + "skill_name": "strict-typing-luau", + "cases": [ + { + "id": "trivial-util", + "archetype": "plain data/constants module — header flip, no class block", + "path": "src/deathreport/src/Shared/DeathReportServiceConstants.lua", + "input": "7f6f73dc51", + "gold": "main", + "polarity": "positive" + }, + { + "id": "util-typed-fns", + "archetype": "util module — type the function signatures (pure typing conversion)", + "path": "src/deathreport/src/Shared/Stats/PlayerKillTrackerUtils.lua", + "input": "7f6f73dc51", + "gold": "main", + "polarity": "positive" + }, + { + "id": "class-pane", + "archetype": "inherited UI class — export type block + & Parent (Snackbar < TransitionModel)", + "path": "src/snackbar/src/Client/Gui/Snackbar.lua", + "input": "b1bc1512aa", + "gold": "main", + "polarity": "positive" + }, + { + "id": "class-stateful", + "archetype": "stateful class with methods — dot-syntax self + export type", + "path": "src/viewport/src/Client/Viewport.lua", + "input": "c3e5c8977f", + "gold": "main", + "polarity": "positive" + }, + { + "id": "t-quickpath", + "archetype": "module importing t — `local t: any = require(\"t\")` boundary", + "path": "src/soundplayer/src/Client/Loops/Layered/LayeredLoopedSoundPlayer.lua", + "input": "ed5f2db1f7", + "gold": "main", + "polarity": "positive" + }, + + { + "id": "settings-constants", + "archetype": "[settings pkg] trivial constants module", + "path": "src/settings/src/Shared/Player/PlayerSettingsConstants.lua", + "input": "b1bc1512aa", + "gold": "main", + "polarity": "positive" + }, + { + "id": "settings-utils", + "archetype": "[settings pkg] util module with typed signatures", + "path": "src/settings/src/Shared/Player/PlayerSettingsUtils.lua", + "input": "b1bc1512aa", + "gold": "main", + "polarity": "positive" + }, + { + "id": "settings-property", + "archetype": "[settings pkg] core class — export type block", + "path": "src/settings/src/Shared/Setting/SettingProperty.lua", + "input": "b1bc1512aa", + "gold": "main", + "polarity": "positive" + }, + { + "id": "settings-base", + "archetype": "[settings pkg] base class others inherit from", + "path": "src/settings/src/Shared/Player/PlayerSettingsBase.lua", + "input": "b1bc1512aa", + "gold": "main", + "polarity": "positive" + }, + { + "id": "settings-playersettings", + "archetype": "[settings pkg] headline class — inherits PlayerSettingsBase (& Parent)", + "path": "src/settings/src/Server/Player/PlayerSettings.lua", + "input": "b1bc1512aa", + "gold": "main", + "polarity": "positive" + }, + { + "id": "settings-client", + "archetype": "[settings pkg] client-realm class", + "path": "src/settings/src/Client/Player/PlayerSettingsClient.lua", + "input": "b1bc1512aa", + "gold": "main", + "polarity": "positive" + }, + + { + "id": "neg-breaks-downstream", + "archetype": "naive flip broke downstream; maintainer reverted — correct end-state is nonstrict", + "path": "src/tie/test/modules/Server/OpenableInterface.lua", + "input": "0edcac1e5c", + "gold": "main", + "polarity": "negative", + "_note": "input is the naive --!strict flip (pre-revert); gold (main) is the reverted --!nonstrict. A worker that blindly flips this must be caught by the downstream lint:luau gate." + } + ] +} diff --git a/.claude/skills/strict-typing-luau/evals/parallelism.json b/.claude/skills/strict-typing-luau/evals/parallelism.json new file mode 100644 index 0000000000..3de0c32f49 --- /dev/null +++ b/.claude/skills/strict-typing-luau/evals/parallelism.json @@ -0,0 +1,16 @@ +{ + "_doc": "Execution-strategy tests for the skill's parallelism guidance. SKILL.md tells the agent to fan out one sub-agent per file within a dependency layer when a PACKAGE has more than ~3 files to convert, and to stay sequential below that. A judge reads the live SKILL.md and classifies each task as 'parallel' (fan out sub-agents per layer) or 'sequential', scored against expect. Guards against a real-use regression we saw: the agent converted a whole package serially because the skill never told it to parallelize. Run via lib/parallelism.sh — one batched claude -p call.", + "skill_name": "strict-typing-luau", + "cases": [ + { "expect": "sequential", "tag": "one-file", "prompt": "Strictly type src/animationtrackutils/src/Shared/AnimationTrackUtils.lua" }, + { "expect": "sequential", "tag": "type-this", "prompt": "type this" }, + { "expect": "sequential", "tag": "fix-errors-one", "prompt": "this --!strict file is throwing luau-lsp type errors, fix the typing" }, + { "expect": "sequential", "tag": "two-named", "prompt": "Strictly type src/foo/src/Shared/A.lua and src/foo/src/Shared/B.lua" }, + { "expect": "sequential", "tag": "three-files", "prompt": "Strictly type these three files in src/foo: A.lua, B.lua, C.lua" }, + + { "expect": "parallel", "tag": "whole-settings", "prompt": "Convert the entire src/settings package (about 9 files) to --!strict." }, + { "expect": "parallel", "tag": "whole-gameconfig", "prompt": "Strictly type all of src/gameconfig, handling the dependencies properly." }, + { "expect": "parallel", "tag": "large-package", "prompt": "Convert this whole 14-file package to --!strict." }, + { "expect": "parallel", "tag": "package-order", "prompt": "I need to strict-type all of src/rogue-properties — convert the whole package." } + ] +} diff --git a/.claude/skills/strict-typing-luau/evals/tooling.json b/.claude/skills/strict-typing-luau/evals/tooling.json new file mode 100644 index 0000000000..66afc380e8 --- /dev/null +++ b/.claude/skills/strict-typing-luau/evals/tooling.json @@ -0,0 +1,17 @@ +{ + "_doc": "Action-scoping tests for the skill's Tooling guidance. Now that SKILL.md surfaces the package planner (run.sh plan) and the eval harness, a worker could over-apply it: a SINGLE-file or scoped-node task must convert directly and must NOT run the planner or any eval-harness command (gold/convert/place/score). Only a genuine WHOLE-PACKAGE task should reach for the planner. A judge reads the live SKILL.md and decides, per task, whether the right action is 'planner' or 'direct'; we score against expect. Run via lib/tooling.sh — one batched claude -p call.", + "skill_name": "strict-typing-luau", + "cases": [ + { "expect": "direct", "tag": "one-util", "prompt": "Strictly type src/animationtrackutils/src/Shared/AnimationTrackUtils.lua" }, + { "expect": "direct", "tag": "type-this", "prompt": "type this" }, + { "expect": "direct", "tag": "fix-errors", "prompt": "this --!strict file is throwing luau-lsp type errors, fix the typing" }, + { "expect": "direct", "tag": "scoped-node", "prompt": "Convert ONLY src/settings/src/Shared/Setting/SettingProperty.lua to --!strict; its dependencies are already strict." }, + { "expect": "direct", "tag": "one-class", "prompt": "make src/actionmanager/src/Client/ActionManager.lua strict" }, + { "expect": "direct", "tag": "no-eval-harness", "prompt": "Convert src/maid/src/Shared/Maid.lua to --!strict." }, + { "expect": "direct", "tag": "two-named-files", "prompt": "Strictly type src/foo/src/Shared/A.lua and src/foo/src/Shared/B.lua" }, + + { "expect": "planner", "tag": "whole-package", "prompt": "Convert the entire src/settings package to --!strict." }, + { "expect": "planner", "tag": "package-order", "prompt": "I need to strict-type all of src/rogue-properties — what order should I convert the files in?" }, + { "expect": "planner", "tag": "whole-with-deps", "prompt": "Make the whole gameconfig package strict, handling the inter-file dependencies properly." } + ] +} diff --git a/.claude/skills/strict-typing-luau/evals/triggers.json b/.claude/skills/strict-typing-luau/evals/triggers.json new file mode 100644 index 0000000000..f6f8159c5d --- /dev/null +++ b/.claude/skills/strict-typing-luau/evals/triggers.json @@ -0,0 +1,24 @@ +{ + "_doc": "Triggering tests for the skill's frontmatter `description`. A judge model is shown ONLY the skill's name+description and each user message, and decides whether the skill should fire. We score `fire` against `expect`. Positives are real ways users ask for the conversion; negatives are near-misses that an over-broad description would wrongly capture (TypeScript strict, tsconfig, formatting, feature work, wrong language). Run via lib/triggers.sh — one batched claude -p call.", + "cases": [ + { "expect": true, "tag": "explicit-strict", "prompt": "src/ragdoll/src/Server/Classes/RagdollCameraShake.lua is still --!nonstrict. Can you strictly type it?" }, + { "expect": true, "tag": "convert-strict", "prompt": "convert this file to --!strict please" }, + { "expect": true, "tag": "make-strict", "prompt": "make ActionManager.lua strict, it's still on the old --!nonstrict header" }, + { "expect": true, "tag": "add-types-lua", "prompt": "can you add types to this Luau module?" }, + { "expect": true, "tag": "type-this-selected", "prompt": "type this" }, + { "expect": true, "tag": "resolve-strict-errs","prompt": "this --!strict file is throwing a bunch of luau-lsp type errors, fix the typing following our conventions" }, + { "expect": true, "tag": "type-annotate", "prompt": "type-annotate RoguePropertyTable.lua for me" }, + { "expect": true, "tag": "legacy-cleanup", "prompt": "clean up the typing on this legacy nonstrict module" }, + + { "expect": false, "tag": "format", "prompt": "format this file with stylua" }, + { "expect": false, "tag": "bugfix", "prompt": "fix the off-by-one bug in this loop" }, + { "expect": false, "tag": "ts-strict", "prompt": "turn on strict mode for this TypeScript file under tools/" }, + { "expect": false, "tag": "tsconfig-strict", "prompt": "set \"strict\": true in our tsconfig.json" }, + { "expect": false, "tag": "add-method", "prompt": "add a new :Reset() method to this class" }, + { "expect": false, "tag": "run-tests", "prompt": "run the tests for the ragdoll package" }, + { "expect": false, "tag": "docstring", "prompt": "write a moonwave docstring for this function" }, + { "expect": false, "tag": "rename", "prompt": "rename _enabled to _isEnabled across this file" }, + { "expect": false, "tag": "perf", "prompt": "make this function faster" }, + { "expect": false, "tag": "loader-refactor", "prompt": "convert this Lua file to use the new module loader pattern" } + ] +} diff --git a/.claude/skills/strict-typing-luau/references/conventions.md b/.claude/skills/strict-typing-luau/references/conventions.md new file mode 100644 index 0000000000..202655c671 --- /dev/null +++ b/.claude/skills/strict-typing-luau/references/conventions.md @@ -0,0 +1,210 @@ +# Strict typing — canonical examples & error table + +Full, copyable shapes drawn from real strict files in this repo. The shapes match the project's +VS Code snippets (`tools/nevermore-vscode/snippets/luau.code-snippets`: `class`, `binder`, +`service`, `classtype`, `lib`, `pane`) — reproduce them verbatim for minimal diffs. + +## Plain util / library module (no metatable) + +```lua +--!strict +local require = require(script.Parent.loader).load(script) + +local MyUtils = {} + +function MyUtils.add(a: number, b: number): number + return a + b +end + +-- shared shapes get an export type; otherwise none is needed +export type Entry = { name: string, count: number } + +return MyUtils +``` + +## BaseObject class + +```lua +--!strict +local require = require(script.Parent.loader).load(script) + +local BaseObject = require("BaseObject") +local ValueObject = require("ValueObject") + +local MyClass = setmetatable({}, BaseObject) +MyClass.ClassName = "MyClass" +MyClass.__index = MyClass + +export type MyClass = typeof(setmetatable( + {} :: { + _enabled: ValueObject.ValueObject, + }, + {} :: typeof({ __index = MyClass }) +)) & BaseObject.BaseObject + +function MyClass.new(): MyClass + local self: MyClass = setmetatable(BaseObject.new() :: any, MyClass) + self._enabled = self._maid:Add(ValueObject.new(false)) + return self +end + +function MyClass.SetEnabled(self: MyClass, enabled: boolean): () + self._enabled.Value = enabled +end + +return MyClass +``` + +## Binder-bound class + +Same as a BaseObject class, but registered via a binder. The `:: any` on the class and the +`Binder.Binder` cast on the result are sanctioned: + +```lua +return Binder.new("MyTag", MyClass :: any) :: Binder.Binder +``` + +The binder constructor receives `(instance, serviceBag)`; type the constructor accordingly: +`function MyClass.new(obj: Instance, serviceBag: ServiceBag.ServiceBag): MyClass`. + +**Tightening a binder-bound constructor's param ripples to the BinderProvider that registers +it.** A `*BindersClient.lua` / `*BindersServer.lua` does `Binder.new("Tag", require("X"), bag)`; +once `X.new` takes a concrete type (e.g. `IntValue`) instead of being nonstrict, that registration +fails the `(Instance) -> any | ClassDefinition | ...` union and needs the sanctioned `:: any`. +But **do NOT write `require("X") :: any`** — luau-lsp's require resolver only fires on a *bare* +`require(...)` call, so casting the call expression makes it emit a spurious +`TypeError: Unknown require: .../X.lua`. Hoist the require to a module-level local and cast the +**local** instead: + +```lua +local TeamKillTracker = require("TeamKillTracker") +-- ... +self:Add(Binder.new("TeamKillTracker", TeamKillTracker :: any, serviceBag)) +``` + +## Generic class + +`T` must be load-bearing — appear in a field — or `MyClass` and `MyClass` +collapse. If the value is dynamic, give it a virtual field: + +```lua +export type ValueObject = typeof(setmetatable( + {} :: { + Value: T, -- virtual: served by a function __index, but keeps T flowing + _value: T, + }, + {} :: typeof({ __index = ValueObject }) +)) + +function ValueObject.Observe(self: ValueObject): Observable.Observable + -- ... +end +``` + +Dynamic / function `__index`: assign the runtime metamethods through an `any` view so they don't +fight the typed `MyClass.__index = MyClass`: + +```lua +local rawMyClass = MyClass :: any +rawMyClass.__index = function(self, index) ... end +``` + +## ServiceBag service + +A service is a plain table (no parent metatable) with a `.ServiceName`, an `Init(self, serviceBag)` +and optional `Start(self)` lifecycle, and dependencies pulled from the bag via `serviceBag:GetService`. +Model the instance with `typeof(setmetatable(...))` — same as a class, but no `& Parent` intersection. + +```lua +--!strict +local require = require(script.Parent.loader).load(script) + +local ServiceBag = require("ServiceBag") + +local MyService = {} +MyService.ServiceName = "MyService" + +export type MyService = typeof(setmetatable( + {} :: { + _serviceBag: ServiceBag.ServiceBag, + _otherService: OtherService.OtherService, + }, + {} :: typeof({ __index = MyService }) +)) + +function MyService.Init(self: MyService, serviceBag: ServiceBag.ServiceBag) + -- the field isn't set yet, so the typed `self` narrows it to nil — cast for the guard: + assert(not (self :: any)._serviceBag, "MyService is already initialized!") + self._serviceBag = assert(serviceBag, "No serviceBag") + self._otherService = serviceBag:GetService(require("OtherService")) +end + +function MyService.Start(self: MyService) + -- ... +end + +return MyService +``` + +## The `t` boundary + +`t` (the runtime type-checker) isn't strict-friendly. Scope an `any` to the import: + +```lua +local t: any = require("t") +``` + +## Typing Rx + Promises returns +Always type public return methods like `PromiseRestoreDefault` for example, instead of casting the return type to `any` + +```lua +function AssetServiceCache.PromiseBundleDetails(self: AssetServiceCache, bundleId: number): Promise.Promise + assert(type(bundleId) == "number", "Bad bundleId") + + self:_ensureInit() + + return self._promiseBundleDetails(bundleId) +end +``` + +For observables, you'll want to type-cast these like this + +```lua +function RxBodyColorsDataUtils.observeFromAttributes(instance: Instance): Observable.Observable +``` + +## Typing `MyClass:__index()` and `MyClass:__newindex` metamethods + +Type these meta-methods like this if you're getting type errors. + +Before: +```lua +function AdorneeValue:__index(self, value) + -- implementation here +end +``` + +After: +```lua +(AdorneeValue :: any).__index = function(self, index) + -- implementation here +end +``` + +## Common strict-mode errors → mechanical fix + +| Error | Fix | +|---|---| +| `Key 'self' not found` / `self` has type `unknown` in a method | Method uses colon syntax — change `function C:M(...)` to `function C.M(self: C, ...)`. | +| `Cannot add property '_x' to table` in the constructor | `_x` is missing from the `export type` field record — add it with its type. | +| `Type 'X' could not be converted into 'Y'` at `setmetatable(...)` | Add `:: any` to the parent constructor result: `setmetatable(Base.new() :: any, C)`. | +| `Unknown require` / sibling exports no type | Add `export type` upstream (preferred), or write a precise structural interface of the surface you use. | +| `Type contains a self-recursive construct` / "too complex" on one heavy generic | `any` + intended-type comment on every occurrence (fields, params, returns); sweep in one pass. | +| `Key 'k' not found` on `rawget(self, k)` / `self[dynamicKey]` | Cast the receiver: `rawget(self :: any, k)`. | +| `T` not flowing (generic collapses) | Make `T` load-bearing — add a `Value: T` (or similar) field. | +| Required arg the method ignores, or optional arg passed as required | Fix the upstream signature (make optional / remove dead param) if in scope; else flag it. Don't paper with a call-site `:: any`. | + +## Common type imports + +`ServiceBag.ServiceBag`, `Maid.Maid`, `Observable.Observable`, `Brio.Brio`, +`ValueObject.ValueObject`, `Signal.Signal`, `BaseObject.BaseObject`, `Binder.Binder`. diff --git a/.claude/skills/strict-typing-luau/references/rx.md b/.claude/skills/strict-typing-luau/references/rx.md new file mode 100644 index 0000000000..70879ef897 --- /dev/null +++ b/.claude/skills/strict-typing-luau/references/rx.md @@ -0,0 +1,61 @@ +# Rx / reactive chains — cast first, don't fight + +Reactive machinery (`Rx.*` pipe chains, `RxSignal`, `Observable`/`Brio` used across many members) +is the **single biggest time-sink** in a strict conversion. The old solver can't thread types +through `:Pipe({...})`, `switchMap`, or a heavy reactive field, and every attempt spawns a fresh +"too complex" / duplicate-`Observable`-module error. Do **not** iterate on these. The instant an +analyze error mentions a Pipe chain, an Rx operator, `RxSignal`, or an `Observable` unification, +**cast to `any` and move on** — that's the established answer, not a failure. + +The discipline that keeps this honest: **type the public surface precisely, cast only the internal +chain.** A method's *declared* return type imports fine — it's the *expression* producing it that +won't check. + +## Patterns — what to cast, what to keep precise + +**Pipe chains** — keep the declared return; cast the chain body: +```lua +function MyClass.Observe(self: MyClass): Observable.Observable -- precise public return + return (self._value :: any):Pipe({ -- cast the receiver/chain + Rx.distinct(), + Rx.map(function(v): any -- operator closures: -> any + return transform(v) + end), + }) +end +``` + +**Operator closures** (`Rx.map`, `Rx.switchMap`, `Rx.combineLatest`, …) — annotate the closure +return as `any`; don't try to type the intermediate stream value: +```lua +Rx.switchMap(function(playerSettings): any + return playerSettings:Observe() +end) +``` + +**Heavy reactive FIELDS that trip "too complex"** (`RxSignal.RxSignal`, or `Observable`/`Brio` +used in lots of fields/returns) — `any` with the intended type in a trailing comment, immediately: +```lua +Changed: any, -- RxSignal.RxSignal (heavy metatable; old solver can't hold it) +``` + +**Constructors of reactive primitives** — cast at the boundary: +```lua +local signal = Signal.new() :: any +local rxSignal = RxSignal.new((self:Observe() :: any):Pipe({ ... })) +``` + +## What you STILL type precisely (don't over-cast) + +- **Declared method return types**: `Observable.Observable`, `Brio.Brio`, `Promise.Promise<()>` + as the *signature* — these import cleanly. Only the producing expression gets the cast. +- **Plain `ValueObject`/`Signal` FIELDS** that aren't in a chain and don't trip "too complex" — + write the real type (`_enabled: ValueObject.ValueObject`). Cast is for the chain and the + fields that actually overflow the solver, not every reactive-looking thing. + +## The rule + +Rx is the canonical case of the skill's time-box: **first analyze error mentioning a Pipe / +operator / `RxSignal` / `Observable`-unification → cast that spot to `any` and continue.** Do not +re-run hoping to thread it precisely; you'll spend dozens of analyze loops to land on the same +`any`. Keep the public return types precise; let the internal chain be `any`.