test(jelly-micro): add per-fixture recall floors #1409
Replace the trivially-passing recall >= 0 gate with a RECALL_FLOORS map keyed by fixture name. Fixtures that already reach 100% are locked at 1.0; partially-resolved fixtures (classes, defineProperty, super, super2) are locked at their current baseline so a single lost edge fails CI. Unresolvable fixtures (0% baseline) continue to default to 0. Closes #1387.
Greptile Summary
This PR replaces the no-op
Confidence Score: 4/5
Safe to merge pending clarification that unlisted fixtures with non-zero recall are intentionally excluded from the floor map. The listed floor values are all mathematically correct and the stale-key guard works as intended. The one concern is that the PR description's claim ('fixtures that currently resolve 0 edges stay at 0') does not match the aggregate numbers: 47 baseline TPs minus the 41 TPs represented in RECALL_FLOORS leaves 6 TPs spread across unlisted fixtures that can regress to 0% without failing CI.
tests/benchmarks/resolution/jelly-micro.test.ts: specifically, whether the 6 TPs from fixtures not listed in RECALL_FLOORS are intentionally left unguarded.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Module load] --> B{tests.length > 0?}
B -- No --> C[Suite skipped - fixtures absent in CI]
B -- Yes --> D[Sanity-check RECALL_FLOORS keys against discovered fixtures]
D -- Stale key found --> E[throw Error - aborts test collection]
D -- All keys valid --> F[Run describe suite]
F --> G[For each fixture: beforeAll - buildGraph and query DB]
G --> H[test: named-edge recall]
H --> I[Compute recall = TP / named]
I --> J[Lookup floor from RECALL_FLOORS, default 0]
J --> K{recall >= floor?}
K -- Yes --> L[Pass]
K -- No --> M[Fail: regression detected]
F --> N[afterAll: print aggregate summary]
Reviews (10): Last reviewed commit: "Merge branch 'main' into test/jelly-micr..."
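The per-fixture check in the flowchart above (compute recall = TP / named, look up the floor in RECALL_FLOORS with a default of 0, then assert recall >= floor) can be sketched as a small pure function. This is a minimal illustration under stated assumptions, not the test file's code: FixtureCounts, RecallResult, and checkRecall are hypothetical names, the map is a trimmed copy of three RECALL_FLOORS entries, and the handling of fixtures with no named edges is an assumption.

```typescript
// Hypothetical shape of the per-fixture counts gathered in beforeAll.
interface FixtureCounts {
  truePositives: number; // expected named edges the resolver found
  named: number; // expected named edges in the fixture
}

// Trimmed, illustrative copy of a few RECALL_FLOORS entries.
const RECALL_FLOORS: Record<string, number> = {
  classes: 0.2, // 6/30
  super: 0.38, // 5/13
  this: 1.0, // 1/1
};

interface RecallResult {
  recall: number;
  floor: number;
  pass: boolean;
}

// Hypothetical helper: compute recall = TP / named, look up the floor
// (unlisted fixtures default to 0), and apply the recall >= floor gate.
function checkRecall(
  fixture: string,
  counts: FixtureCounts,
  floors: Record<string, number> = RECALL_FLOORS,
): RecallResult {
  // Assumption: a fixture with no named edges counts as fully recalled,
  // so 0 / 0 never yields NaN (which would fail every comparison).
  const recall = counts.named === 0 ? 1 : counts.truePositives / counts.named;
  // hasOwnProperty avoids picking up inherited keys such as "constructor".
  const floor = Object.prototype.hasOwnProperty.call(floors, fixture)
    ? floors[fixture]
    : 0;
  return { recall, floor, pass: recall >= floor };
}

console.log(checkRecall("super", { truePositives: 5, named: 13 }).pass); // true
console.log(checkRecall("super", { truePositives: 4, named: 13 }).pass); // false
console.log(checkRecall("not-listed", { truePositives: 0, named: 7 }).pass); // true
```

The last call shows the trade-off Greptile raises: an unlisted fixture passes even at 0% recall, because its floor silently defaults to 0.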
```typescript
// Minimum named-edge recall per fixture; each comment records the baseline TP/named.
// Fixtures not listed here default to a floor of 0.
const RECALL_FLOORS: Record<string, number> = {
  accessors3: 1.0, // 1/1
  arguments: 1.0, // 1/1
  classes: 0.2, // 6/30
  defineProperty: 0.5, // 3/6
  fun: 1.0, // 4/4
  generators: 1.0, // 9/9
  more1: 1.0, // 10/10
  'receiver-callee-mixup': 1.0, // 1/1
  rest: 1.0, // 1/1
  spread: 1.0, // 4/4
  super: 0.38, // 5/13 (≈0.385, rounded down)
  super2: 0.4, // 2/5
  super3: 1.0, // 3/3
  this: 1.0, // 1/1
};
```
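The listed floors look like each fixture's baseline recall rounded down to two decimals (5/13 ≈ 0.385 becomes 0.38); that rounding rule is inferred from the values, not stated in the PR. The sketch below, with a hypothetical deriveFloor helper and a BASELINES table copied from the comments in the map, checks that each floor matches, that the baseline passes it, and that losing one TP falls below it.

```typescript
// [TP, named, listed floor], copied from the RECALL_FLOORS comments.
const BASELINES: Record<string, [number, number, number]> = {
  accessors3: [1, 1, 1.0],
  arguments: [1, 1, 1.0],
  classes: [6, 30, 0.2],
  defineProperty: [3, 6, 0.5],
  fun: [4, 4, 1.0],
  generators: [9, 9, 1.0],
  more1: [10, 10, 1.0],
  "receiver-callee-mixup": [1, 1, 1.0],
  rest: [1, 1, 1.0],
  spread: [4, 4, 1.0],
  super: [5, 13, 0.38],
  super2: [2, 5, 0.4],
  super3: [3, 3, 1.0],
  this: [1, 1, 1.0],
};

// Hypothetical helper: baseline recall rounded down to two decimals.
// Dividing the integer tp * 100 by named avoids float artefacts that
// come from multiplying an already-fractional recall by 100.
function deriveFloor(tp: number, named: number): number {
  return Math.floor((tp * 100) / named) / 100;
}

for (const fixture of Object.keys(BASELINES)) {
  const [tp, named, listed] = BASELINES[fixture];
  const floor = deriveFloor(tp, named);
  const baselinePasses = tp / named >= floor;
  // Losing a single TP must drop recall below the floor for CI to catch it.
  const oneLostFails = (tp - 1) / named < floor;
  console.log(fixture, floor === listed, baselinePasses, oneLostFails);
}
```

Each line should print the fixture name followed by true true true. The one-lost-TP property holds here because these fixtures are small; for a hypothetical 199/200 fixture, rounding down gives 0.99, and 198/200 = 0.99 would still pass.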
Stale RECALL_FLOORS keys won't surface fixture renames
RECALL_FLOORS entries that don't match any discovered fixture name are silently ignored, so the affected test falls back to floor = 0 instead of the intended floor. If a fixture directory is renamed (e.g. super → super-v2) or a key is mistyped here, the regression protection is lost without any warning. A quick sanity check after discoverTests(), verifying that every key in Object.keys(RECALL_FLOORS) appears in tests, would catch this at suite startup rather than silently lowering the bar.
Fixed — added a startup sanity-check after discoverTests() that throws if any key in RECALL_FLOORS does not match a discovered fixture directory. Gated on tests.length > 0 so it does not fire in CI environments where the jelly-micro directory is gitignored (the suite is skipped there anyway).
…tures
Add a startup check after discoverTests() that throws if any key in RECALL_FLOORS does not match an actual fixture directory. A renamed or mistyped fixture key would otherwise silently lower the recall floor to 0 with no warning, defeating the regression gate. The check is gated on tests.length > 0 so it does not fire in CI environments where the jelly-micro directory is gitignored.
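A minimal sketch of the startup guard this commit describes, assuming discoverTests() returns objects with a name field (the real return shape is not shown here); DiscoveredFixture and assertFloorKeysMatch are hypothetical names. It collects every stale key rather than stopping at the first, so a single run reports all renames.

```typescript
// Hypothetical stand-in for a discovered fixture: only the name matters here.
interface DiscoveredFixture {
  name: string;
}

// Every RECALL_FLOORS key must name a discovered fixture; otherwise a rename
// or typo would silently lower that fixture's floor to the default of 0.
function assertFloorKeysMatch(
  floors: Record<string, number>,
  tests: DiscoveredFixture[],
): void {
  // Mirror the PR: skip the check when no fixtures were discovered
  // (the jelly-micro directory is gitignored in CI).
  if (tests.length === 0) return;
  const names = new Set(tests.map((t) => t.name));
  const stale = Object.keys(floors).filter((key) => !names.has(key));
  if (stale.length > 0) {
    throw new Error(
      `RECALL_FLOORS has keys with no matching fixture: ${stale.join(", ")}`,
    );
  }
}

// Usage: a renamed fixture directory (super -> super-v2) is caught at load time.
const floors = { super: 0.38, this: 1.0 };
try {
  assertFloorKeysMatch(floors, [{ name: "super-v2" }, { name: "this" }]);
} catch (err) {
  // prints "RECALL_FLOORS has keys with no matching fixture: super"
  console.log((err as Error).message);
}
```

Throwing at module load, rather than failing one test, aborts collection for the whole suite, which matches the "throw Error - aborts test collection" branch in the flowchart.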
…387' into test/jelly-micro-recall-floors-1387-wt
Addressed CI failures: removed stale The
…387' into test/jelly-micro-recall-floors-1387
Fixed CI failure: the Corrected to
Summary
- Replace the trivially-passing recall >= 0 gate with a RECALL_FLOORS map keyed by fixture name
- Fixtures that already reach 100% recall are locked at 1.0, so any regression immediately fails CI
- Partially-resolved fixtures (classes 20%, defineProperty 50%, super 38%, super2 40%) are locked at their current baseline; losing even a single TP trips the assertion
- Fixtures that currently resolve 0 edges default to 0 (they can only improve)
- Baseline from the origin/main run: precision=65.3%, recall=40.9%, TP=47 FP=25 FN=68

Test plan
- npx vitest run tests/benchmarks/resolution/jelly-micro.test.ts: all 65 pass
- npx biome check: no lint/format issues

Closes #1387.
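The aggregate baseline follows the usual definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN); the flowchart's recall = TP / named is the same ratio if named = TP + FN, which is an assumption about how the suite counts. A quick arithmetic check, with hypothetical helper names:

```typescript
// Standard definitions; the counts are the aggregate baseline from the summary.
function precision(tp: number, fp: number): number {
  return tp / (tp + fp);
}

function recall(tp: number, fn: number): number {
  return tp / (tp + fn);
}

// Percentage with one decimal, matching the summary's formatting.
const pct = (x: number): string => `${(x * 100).toFixed(1)}%`;

const tp = 47, fp = 25, fn = 68;
console.log(`precision=${pct(precision(tp, fp))}`); // precision=65.3%
console.log(`recall=${pct(recall(tp, fn))}`); // recall=40.9%
// Assumes named = TP + FN, i.e. every expected edge is either found or missed.
console.log(`named edges=${tp + fn}`); // named edges=115
```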