test(jelly-micro): add per-fixture recall floors#1409

Merged
carlos-alm merged 9 commits into main from test/jelly-micro-recall-floors-1387 on Jun 9, 2026

Conversation

@carlos-alm

Contributor

Summary

  • Replaces the trivially-passing recall >= 0 gate with a RECALL_FLOORS map keyed by fixture name
  • Fixtures at 100% recall are locked to 1.0 — any regression immediately fails CI
  • Partially-resolved fixtures (classes 20%, defineProperty 50%, super 38%, super2 40%) are locked at their current baseline; losing even a single TP trips the assertion
  • Fixtures that currently resolve 0 edges stay at 0 (they can only improve)
  • Baseline sourced from a clean origin/main run: precision=65.3%, recall=40.9%, TP=47 FP=25 FN=68
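
The gate described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from jelly-micro.test.ts: `recallFloorFor` and `passesRecallGate` are hypothetical helper names, only three floor entries are reproduced, and treating a fixture with zero named edges as recall 0 is an assumption.

```typescript
// Per-fixture recall floors (illustrative subset of the map in this PR).
const RECALL_FLOORS: Record<string, number> = {
  defineProperty: 0.5, // 3/6
  super: 0.38, // 5/13
  this: 1.0, // 1/1
};

/** Hypothetical helper: the floor for a fixture, or 0 when it is not listed. */
function recallFloorFor(fixture: string): number {
  return RECALL_FLOORS[fixture] ?? 0;
}

/** Hypothetical helper: true when recall = TP / named meets the fixture's floor. */
function passesRecallGate(fixture: string, truePositives: number, namedEdges: number): boolean {
  // Assumption: a fixture with no named edges counts as recall 0.
  const recall = namedEdges === 0 ? 0 : truePositives / namedEdges;
  return recall >= recallFloorFor(fixture);
}

console.log(passesRecallGate("super", 5, 13)); // baseline 5/13 is above the 0.38 floor
console.log(passesRecallGate("super", 4, 13)); // one lost TP (4/13) falls below the floor
console.log(passesRecallGate("unlisted-fixture", 0, 7)); // unlisted fixtures use floor 0
```

Defaulting unlisted fixtures to 0 keeps the previous behaviour for fixtures that resolve nothing today, while any listed fixture fails as soon as it drops below its recorded baseline.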

Test plan

  • npx vitest run tests/benchmarks/resolution/jelly-micro.test.ts — all 65 pass
  • npx biome check — no lint/format issues

Closes #1387.

Replace the trivially-passing recall >= 0 gate with a RECALL_FLOORS
map keyed by fixture name.  Fixtures that already reach 100% are locked
at 1.0; partially-resolved fixtures (classes, defineProperty, super,
super2) are locked at their current baseline so a single lost edge fails
CI.  Unresolvable fixtures (0% baseline) continue to default to 0.

Closes #1387.
@greptile-apps

greptile-apps Bot commented Jun 8, 2026

Contributor

Greptile Summary

This PR replaces the no-op recall >= 0 assertion with a per-fixture RECALL_FLOORS map, locking fully-resolved fixtures at 1.0 and partially-resolved ones at their current baseline fractions. It also adds the stale-key sanity check raised in the previous review thread.

  • The 13 fixtures listed in RECALL_FLOORS are correctly configured: every stored floor sits above (TP−1)/named and at or below TP/named, so a single lost TP trips the assertion.
  • The stale-key guard (module-level throw gated on tests.length > 0) prevents silently-degraded floors when a fixture directory is renamed or a key is mistyped.
  • The aggregate TP count from the listed floor entries (41) is 6 less than the baseline total of 47 stated in the PR description, suggesting at least a few unlisted fixtures have non-zero recall but a floor of 0 — those regressions would still go undetected.
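
The single-lost-TP property from the first bullet can be checked mechanically. The sketch below is illustrative only: `FloorEntry` and `tripsOnSingleLoss` are hypothetical names, and the (TP, named) pairs are taken from the inline comments on the RECALL_FLOORS map in this thread, with classes at its corrected 6/31 baseline.

```typescript
interface FloorEntry {
  floor: number; // value stored in RECALL_FLOORS
  tp: number; // true positives at baseline
  named: number; // named expected edges in the fixture
}

/** True when the baseline passes the floor and losing exactly one TP fails it. */
function tripsOnSingleLoss({ floor, tp, named }: FloorEntry): boolean {
  const baselinePasses = tp / named >= floor;
  const oneLostFails = (tp - 1) / named < floor;
  return baselinePasses && oneLostFails;
}

// Subset of entries, copied from the map's "// TP/named" comments.
const floorEntries: Record<string, FloorEntry> = {
  classes: { floor: 0.19, tp: 6, named: 31 },
  defineProperty: { floor: 0.5, tp: 3, named: 6 },
  super: { floor: 0.38, tp: 5, named: 13 },
  super2: { floor: 0.4, tp: 2, named: 5 },
  this: { floor: 1.0, tp: 1, named: 1 },
};

for (const [name, entry] of Object.entries(floorEntries)) {
  console.log(name, tripsOnSingleLoss(entry));
}
```

A floor that is set too high (for example 0.2 against a 6/31 baseline) fails the first condition, which is how a miscounted denominator would show up.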

Confidence Score: 4/5

Safe to merge pending clarification that unlisted fixtures with non-zero recall are intentionally excluded from the floor map.

The listed floor values are all mathematically correct and the stale-key guard works as intended. The one concern is that the PR description's claim — 'fixtures that currently resolve 0 edges stay at 0' — does not match the aggregate numbers: 47 baseline TPs minus the 41 TPs represented in RECALL_FLOORS leaves 6 TPs spread across unlisted fixtures that can regress to 0% without failing CI.

tests/benchmarks/resolution/jelly-micro.test.ts — specifically whether the 6 TPs from fixtures not listed in RECALL_FLOORS are intentionally left unguarded.
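
The 41-versus-47 gap can be reproduced by summing the TP counts annotated on the listed floors. This is a small sketch under the assumption that the per-fixture TP counts in the map's inline comments (with more1 removed) are the ones being summed; `listedTruePositives` and `BASELINE_TP` are illustrative names.

```typescript
// TP counts per listed fixture, taken from the "// TP/named" comments on RECALL_FLOORS.
const listedTruePositives: Record<string, number> = {
  accessors3: 1,
  arguments: 1,
  classes: 6,
  defineProperty: 3,
  fun: 4,
  generators: 9,
  "receiver-callee-mixup": 1,
  rest: 1,
  spread: 4,
  super: 5,
  super2: 2,
  super3: 3,
  this: 1,
};

// Aggregate TP count from the PR description's origin/main baseline.
const BASELINE_TP = 47;

const guardedTp = Object.values(listedTruePositives).reduce((sum, tp) => sum + tp, 0);
const unguardedTp = BASELINE_TP - guardedTp;

console.log({
  listedFixtures: Object.keys(listedTruePositives).length,
  guardedTp,
  unguardedTp,
});
```

Any positive `unguardedTp` means some fixture outside the map currently resolves edges while its effective floor is still 0.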

Important Files Changed

| Filename | Overview |
| --- | --- |
| tests/benchmarks/resolution/jelly-micro.test.ts | Replaces the trivially-passing recall >= 0 gate with per-fixture RECALL_FLOORS. All listed floor values are mathematically correct (a single lost TP trips each), and the stale-key sanity check is properly guarded. However, the aggregate TP count from listed entries (41) falls 6 short of the baseline total (47), meaning at least some unlisted fixtures have non-zero recall but a floor of 0, leaving their regressions undetected. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Module load] --> B{tests.length > 0?}
    B -- No --> C[Suite skipped - fixtures absent in CI]
    B -- Yes --> D[Sanity-check RECALL_FLOORS keys against discovered fixtures]
    D -- Stale key found --> E[throw Error - aborts test collection]
    D -- All keys valid --> F[Run describe suite]
    F --> G[For each fixture: beforeAll - buildGraph and query DB]
    G --> H[test: named-edge recall]
    H --> I[Compute recall = TP / named]
    I --> J[Lookup floor from RECALL_FLOORS, default 0]
    J --> K{recall >= floor?}
    K -- Yes --> L[Pass]
    K -- No --> M[Fail: regression detected]
    F --> N[afterAll: print aggregate summary]

Reviews (10): Last reviewed commit: "Merge branch 'main' into test/jelly-micr..."

Comment on lines +70 to +85
const RECALL_FLOORS: Record<string, number> = {
  accessors3: 1.0, // 1/1
  arguments: 1.0, // 1/1
  classes: 0.2, // 6/30
  defineProperty: 0.5, // 3/6
  fun: 1.0, // 4/4
  generators: 1.0, // 9/9
  more1: 1.0, // 10/10
  'receiver-callee-mixup': 1.0, // 1/1
  rest: 1.0, // 1/1
  spread: 4/4 === 1 ? 1.0 : 1.0, // 4/4
  super: 0.38, // 5/13
  super2: 0.4, // 2/5
  super3: 1.0, // 3/3
  this: 1.0, // 1/1
};

Contributor

P2 Stale RECALL_FLOORS keys won't surface fixture renames

RECALL_FLOORS entries that don't match any discovered fixture name are silently ignored: the affected test falls back to floor = 0 instead of the intended floor. If a fixture directory is renamed (e.g. super → super-v2) or a key is mistyped here, the regression protection is lost without any warning. Adding a quick sanity check after discoverTests() that verifies every Object.keys(RECALL_FLOORS) entry appears in tests would catch this at suite startup rather than silently lowering the bar.

Contributor Author

Fixed — added a startup sanity-check after discoverTests() that throws if any key in RECALL_FLOORS does not match a discovered fixture directory. Gated on tests.length > 0 so it does not fire in CI environments where the jelly-micro directory is gitignored (the suite is skipped there anyway).
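
A startup check of this kind might look roughly like the sketch below. It is not the code merged in this PR: `findStaleFloorKeys` and `assertNoStaleFloors` are hypothetical names, and the discovered fixtures are assumed to be available as a plain list of directory names (the real `tests` array may hold richer objects).

```typescript
/** Hypothetical helper: floor keys that match no discovered fixture name. */
function findStaleFloorKeys(
  floors: Record<string, number>,
  discovered: readonly string[],
): string[] {
  const known = new Set(discovered);
  return Object.keys(floors).filter((key) => !known.has(key));
}

/** Hypothetical helper: throw at startup when a floor key is stale. */
function assertNoStaleFloors(
  floors: Record<string, number>,
  discovered: readonly string[],
): void {
  // Gate: when no fixtures were discovered the suite is skipped anyway,
  // so there is nothing to validate against.
  if (discovered.length === 0) return;

  const stale = findStaleFloorKeys(floors, discovered);
  if (stale.length > 0) {
    throw new Error(`RECALL_FLOORS has keys with no matching fixture: ${stale.join(", ")}`);
  }
}

// Example: a key left behind after its fixture moved to another set.
const exampleFloors = { more1: 1.0, this: 1.0 };
console.log(findStaleFloorKeys(exampleFloors, ["this", "super"]));
assertNoStaleFloors(exampleFloors, []); // no fixtures discovered: check is skipped
```

Throwing at module level rather than inside a test makes the failure visible during collection, before any fixture runs with a silently lowered floor.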

…tures

Add a startup check after discoverTests() that throws if any key in
RECALL_FLOORS does not match an actual fixture directory. A renamed or
mistyped fixture key would otherwise silently lower the recall floor to
0 with no warning, defeating the regression gate.

The check is gated on tests.length > 0 so it does not fire in CI
environments where the jelly-micro directory is gitignored.
@carlos-alm

Contributor Author

@greptileai

carlos-alm and others added 4 commits June 8, 2026 11:29
more1 was moved from jelly-micro to the pts-javascript fixture set in
#1383 (commit ddfc14c). The RECALL_FLOORS map still referenced it,
causing the startup sanity-check to throw in CI where the merged branch
no longer includes more1 in jelly-micro.
…387' into test/jelly-micro-recall-floors-1387-wt
@carlos-alm

Contributor Author

Addressed CI failures: removed stale more1 key from RECALL_FLOORS.

The more1 fixture was moved from jelly-micro to pts-javascript in #1383 (commit ddfc14c). The sanity-check guard (added in commit 48aeea8 per Greptile's suggestion) then correctly caught this as a stale entry, failing CI on all platforms. Removed more1: 1.0 from the map and updated the comment to explain why it's absent.

@carlos-alm

Contributor Author

@greptileai

@carlos-alm

Contributor Author

Fixed CI failure: the classes recall floor was set to 0.2 assuming 6/30 edges, but the fixture's expected-edge set (after deduplication by the new Set() keying on name@basename) has 31 unique entries — so the actual baseline recall is 6/31 ≈ 0.1935, which fails the 0.2 gate.

Corrected to classes: 0.19, // 6/31. The floor 0.19 satisfies 6/31 = 0.1935... >= 0.19 and would fail at 5/31 = 0.1613, preserving the single-TP regression sensitivity.
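
The arithmetic behind this correction can be sketched as follows. The `name@basename` key format comes from the comment above; the `ExpectedEdge` shape, the helper names, and the choice to round the baseline down to two decimals are assumptions made for illustration.

```typescript
// Hypothetical shape for an expected edge; the real fixture format may differ.
interface ExpectedEdge {
  name: string;
  file: string;
}

/** Key used for deduplication: edge name plus the file's basename. */
function edgeKey(edge: ExpectedEdge): string {
  const basename = edge.file.split("/").pop() ?? edge.file;
  return `${edge.name}@${basename}`;
}

/** Number of unique expected edges after deduplicating through a Set. */
function uniqueEdgeCount(edges: readonly ExpectedEdge[]): number {
  return new Set(edges.map(edgeKey)).size;
}

/** Assumed convention: a floor is the baseline recall rounded down to two decimals. */
function floorFor(truePositives: number, namedEdges: number): number {
  return Math.floor((truePositives / namedEdges) * 100) / 100;
}

// Edges that share a name and basename collapse into a single entry.
const sampleEdges: ExpectedEdge[] = [
  { name: "f", file: "a/x.js" },
  { name: "f", file: "b/x.js" },
  { name: "g", file: "a/x.js" },
];
console.log(uniqueEdgeCount(sampleEdges));
console.log(floorFor(6, 31), floorFor(5, 31));
```

Rounding down rather than to nearest keeps the floor at or below the measured baseline, so the current run always passes, while the gap to the next-lower recall value still catches a single lost edge.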

@carlos-alm

Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit f414be4 into main Jun 9, 2026
22 checks passed
@carlos-alm carlos-alm deleted the test/jelly-micro-recall-floors-1387 branch June 9, 2026 22:34
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 9, 2026

Development

Successfully merging this pull request may close these issues.

test(jelly-micro): add per-fixture recall floor to jelly-micro.test.ts

1 participant