refactor(ci): let tracer-validation gate reuse benchmark artifact#1171
Conversation
The pre-publish-benchmark job's `Run resolution benchmark` step builds codegraphs for ~34 language fixtures, computes precision/recall, and writes resolution-result.json. The `Gate on resolution thresholds` step that follows then ran the same vitest suite which independently copied every fixture and rebuilt the graphs again — doubling the most expensive slice of the publish pipeline. Extend the script's per-language LangResult to include falsePositiveEdges and falseNegativeEdges so the gate test has everything it needs for the existing precision/recall threshold assertions and failure messages. Refactor the gate test to consume that artifact when RESOLUTION_RESULT_JSON is set, falling back to the build-from-fixtures path when unset so devs can still run `npx vitest run tests/benchmarks/resolution/...` standalone. Wire the env var through the workflow's Gate step. Verified locally: gate test in artifact mode passes 170/170 in ~0.5s against an artifact produced by scripts/resolution-benchmark.ts, and the legacy build-from-fixtures path still passes for the javascript fixture. Closes #1052
…acts (#1167) - scripts/resolution-benchmark.ts: stop rounding precision/recall to 3 decimals before writing the artifact. The rounding let a near-miss like 0.8497 round up to 0.850 and silently clear a 0.85 threshold in CI artifact mode while failing in fixture mode. - tests/benchmarks/resolution/resolution-benchmark.test.ts: validate numeric fields in metricsFromArtifact so a stale or malformed artifact surfaces a clear 'regenerate' error instead of a confusing TypeError at the threshold assertions. - tests/benchmarks/resolution/resolution-benchmark.test.ts: reject an empty artifact in loadArtifact. Without this guard, an empty {} would register zero describe blocks and vitest would exit 0 with '0 tests', silently passing the gate.
The pre-publish-benchmark job's `Run resolution benchmark` step already runs the per-language tracer once for every fixture (the script writes dynamic-edge counts into resolution-result.json for telemetry). The `Run tracer validation` step that follows then re-runs the same tracer subprocess once more per language to compute same-file recall — doubling tracer cost on the publish path. Spun off from #1052 / #1167, which deduped the resolution gate the same way. The tracer side wasn't trivial to fold into that PR because runDynamicTracer and runTracer had different toolchain-missing semantics (empty array vs null), so the artifact needed a status field to round-trip the "skipped" signal. - Extend the script's TracerArtifact to `{ status: 'ok' | 'skipped', edges }`, attached to every LangResult as the new `tracer` block. `skipped` is set when run-tracer.mjs reports an error and produced no edges — mirroring tracer-validation.test.ts's skip semantics so it round-trips through the artifact. - Refactor tracer-validation.test.ts to consume the artifact when RESOLUTION_RESULT_JSON is set (driving the language list from the artifact keys so we never silently drop a language), falling back to spawning run-tracer.mjs when unset so local `npx vitest run …` runs still work standalone. - Wire RESOLUTION_RESULT_JSON into the workflow's tracer validation step, same pattern as the resolution gate. Verified locally: artifact mode correctly distinguishes status=ok (runs recall computation, hits threshold check) from status=skipped (graceful skip), and the missing-tracer-block error path surfaces a clear "regenerate" message. Fallback mode still drives the suite from the filesystem when the env var is unset. Closes #1166
|
Claude finished @carlos-alm's task —— View job PR Review: Tracer Validation DeduplicationReview Progress
Phase 1: Root Cause Analysis ✅Problem Being Solved: The pre-publish-benchmark job was running per-language tracers twice - once in "Run resolution benchmark" and again in "Run tracer validation". This PR eliminates the duplicate work by having the tracer validation reuse the benchmark artifact. Why This Approach: The approach is well-justified. The duplication was identified as inefficiency in issue #1166. Rather than re-spawning tracer subprocesses, the validation now reads pre-captured edges from the resolution benchmark artifact, cutting CI time in half for this step. Risk Assessment: Low risk. This is a pure optimization that preserves all existing semantics. The fallback mode ensures local development workflows remain unchanged. Phase 2: Configuration Change Detection ✅No high-risk changes detected:
Phase 3: Test Integrity Assessment ✅The changes preserve test integrity well: ✅ Preserved semantics: The ✅ Robust fallback: Local development unaffected - when ✅ Artifact validation: Strong validation logic rejects malformed artifacts and provides clear error messages ✅ Improved safety: Artifact mode drives test discovery from artifact keys instead of filesystem, preventing silent skips when a language the script reported is missing Phase 4: Code Quality Assessment ✅Implementation Quality: Excellent
Specific Strengths:
No significant concerns found Phase 5: Backlog Compliance ✅Backlog Compliance
Critical ConcernsNone identified. This is a well-executed infrastructure optimization with thorough error handling and backward compatibility. Final Assessment |
Greptile SummaryThis PR eliminates a redundant per-language tracer subprocess invocation in CI by storing the tracer edges captured during
Confidence Score: 5/5Safe to merge; the deduplication logic is well-guarded and the local fallback is preserved. All three changed files have narrow, well-scoped changes. The artifact-read path has robust error handling for missing files, malformed JSON, and empty results. The existing workflow step ordering guarantees resolution-result.json is written before it is consumed. The local fallback via runTracer is untouched. No files require special attention beyond the note about artifact-mode language discovery in tracer-validation.test.ts. Important Files Changed
Sequence DiagramsequenceDiagram
participant RB as Run resolution benchmark
participant FS as resolution-result.json
participant RG as Gate on resolution thresholds
participant TV as Run tracer validation
RB->>RB: runDynamicTracer(lang)
RB->>FS: write JSON with tracer block per lang
RG->>FS: read via RESOLUTION_RESULT_JSON
RG->>RG: validate thresholds
TV->>FS: read via RESOLUTION_RESULT_JSON
TV->>TV: loadArtifact() returns artifact map
TV->>TV: tracerFromArtifact per lang
TV->>TV: "computeSameFileRecall assert >= threshold"
note over RB,TV: Before this PR TV spawned run-tracer.mjs per language. After TV reuses edges from RB.
Reviews (4): Last reviewed commit: "fix(test): wrap JSON.parse with helpful ..." | Re-trigger Greptile |
| function loadArtifact(artifactPath: string): Record<string, ArtifactLangResult> { | ||
| if (!fs.existsSync(artifactPath)) { | ||
| throw new Error( | ||
| `RESOLUTION_RESULT_JSON=${artifactPath} not found — run scripts/resolution-benchmark.ts first.`, | ||
| ); | ||
| } | ||
| const parsed = JSON.parse(fs.readFileSync(artifactPath, 'utf-8')) as Record< | ||
| string, | ||
| ArtifactLangResult | ||
| >; | ||
| // Refuse to proceed on an empty artifact: with zero languages, vitest would | ||
| // register no test cases and exit 0, silently passing the gate. | ||
| if (!parsed || typeof parsed !== 'object' || Object.keys(parsed).length === 0) { | ||
| throw new Error( | ||
| `RESOLUTION_RESULT_JSON=${artifactPath} contains no language results — regenerate with scripts/resolution-benchmark.ts.`, | ||
| ); | ||
| } | ||
| return parsed; | ||
| } |
There was a problem hiding this comment.
JSON parse error not wrapped with a helpful message
If resolution-result.json exists but contains malformed JSON, JSON.parse will throw a raw SyntaxError with no reference to RESOLUTION_RESULT_JSON or what to do next. The existing helpful error messages ("not found", "contains no language results") cover the empty-file and missing-file cases, but a corrupt-file scenario produces a confusing traceback. Wrapping JSON.parse in a try/catch and re-throwing with the same "regenerate with scripts/resolution-benchmark.ts" context message would keep the DX consistent.
There was a problem hiding this comment.
Fixed — wrapped JSON.parse in try/catch and re-throw with the 'regenerate with scripts/resolution-benchmark.ts' guidance, matching the helpful-error pattern used for the missing-file and empty-file cases. Verified locally that a malformed resolution-result.json now produces a clear error message instead of a raw SyntaxError traceback.
Codegraph Impact Analysis6 functions changed → 4 callers affected across 2 files
|
…tifact loader (#1171) When RESOLUTION_RESULT_JSON points to a file with malformed JSON, the artifact loader previously threw a raw SyntaxError with no context. Mirror the existing 'regenerate with scripts/resolution-benchmark.ts' guidance so the DX is consistent with the missing-file and empty-file cases.
Summary
Run resolution benchmark(telemetry intoresolution-result.json) and a second time inRun tracer validation(same-file recall). This PR deduplicates the second run by reusing the artifact.runDynamicTracerandrunTracerhad different toolchain-missing semantics (empty array vsnull), so the artifact needed a status field to round-trip the "skipped" signal.Changes
scripts/resolution-benchmark.ts— Extend the per-languageLangResultwith a new requiredtracer: { status: 'ok' | 'skipped', edges }block.runDynamicTracernow mirrorstracer-validation.test.ts's skip semantics (error && edges.length === 0→skipped). ExistingdynamicEdges/dynamicConfirmedcount fields are preserved for the report scripts.tests/benchmarks/resolution/tracer/tracer-validation.test.ts— AddloadArtifact/tracerFromArtifactconsumingRESOLUTION_RESULT_JSONwhen set. In artifact mode the suite drives the language list from artifact keys (so a language the script reported can never be silently skipped). When the env var is unset, fall back to spawningrun-tracer.mjsso localnpx vitest run …still works standalone..github/workflows/publish.yml— WireRESOLUTION_RESULT_JSONinto theRun tracer validationstep, mirroring the pattern already in place for the resolution gate.Stacked on
This branch is stacked on #1167 (
refactor/1052-dedupe-resolution-bench) — base will need to retargetmainafter that lands.Test plan
npm run lint)npx vitest run tests/benchmarks/resolution/tracer/tracer-validation.test.ts -t "javascript"passes (spawnsrun-tracer.mjslike before)resolution-result.jsonwithtracer.status="ok"exercises the recall computation and threshold checktracer.status="skipped"→ test reportsstatus: skippedrow, no failureregenerate with the current scripts/resolution-benchmark.tserrorRun tracer validationstep picks upRESOLUTION_RESULT_JSONfrom the workspace and reuses the artifact instead of re-spawning the tracer subprocessCloses #1166