Skip to content

Commit 4d8df7b

Browse files
authored
refactor(ci): let tracer-validation gate reuse benchmark artifact (#1171)
* refactor(ci): let resolution gate reuse benchmark artifact The pre-publish-benchmark job's `Run resolution benchmark` step builds codegraphs for ~34 language fixtures, computes precision/recall, and writes resolution-result.json. The `Gate on resolution thresholds` step that follows then ran the same vitest suite which independently copied every fixture and rebuilt the graphs again — doubling the most expensive slice of the publish pipeline. Extend the script's per-language LangResult to include falsePositiveEdges and falseNegativeEdges so the gate test has everything it needs for the existing precision/recall threshold assertions and failure messages. Refactor the gate test to consume that artifact when RESOLUTION_RESULT_JSON is set, falling back to the build-from-fixtures path when unset so devs can still run `npx vitest run tests/benchmarks/resolution/...` standalone. Wire the env var through the workflow's Gate step. Verified locally: gate test in artifact mode passes 170/170 in ~0.5s against an artifact produced by scripts/resolution-benchmark.ts, and the legacy build-from-fixtures path still passes for the javascript fixture. Closes #1052 * fix(ci): preserve full precision and guard malformed resolution artifacts (#1167) - scripts/resolution-benchmark.ts: stop rounding precision/recall to 3 decimals before writing the artifact. The rounding let a near-miss like 0.8497 round up to 0.850 and silently clear a 0.85 threshold in CI artifact mode while failing in fixture mode. - tests/benchmarks/resolution/resolution-benchmark.test.ts: validate numeric fields in metricsFromArtifact so a stale or malformed artifact surfaces a clear 'regenerate' error instead of a confusing TypeError at the threshold assertions. - tests/benchmarks/resolution/resolution-benchmark.test.ts: reject an empty artifact in loadArtifact. Without this guard, an empty {} would register zero describe blocks and vitest would exit 0 with '0 tests', silently passing the gate. * refactor(ci): let tracer-validation gate reuse benchmark artifact The pre-publish-benchmark job's `Run resolution benchmark` step already runs the per-language tracer once for every fixture (the script writes dynamic-edge counts into resolution-result.json for telemetry). The `Run tracer validation` step that follows then re-runs the same tracer subprocess once more per language to compute same-file recall — doubling tracer cost on the publish path. Spun off from #1052 / #1167, which deduped the resolution gate the same way. The tracer side wasn't trivial to fold into that PR because runDynamicTracer and runTracer had different toolchain-missing semantics (empty array vs null), so the artifact needed a status field to round-trip the "skipped" signal. - Extend the script's TracerArtifact to `{ status: 'ok' | 'skipped', edges }`, attached to every LangResult as the new `tracer` block. `skipped` is set when run-tracer.mjs reports an error and produced no edges — mirroring tracer-validation.test.ts's skip semantics so it round-trips through the artifact. - Refactor tracer-validation.test.ts to consume the artifact when RESOLUTION_RESULT_JSON is set (driving the language list from the artifact keys so we never silently drop a language), falling back to spawning run-tracer.mjs when unset so local `npx vitest run …` runs still work standalone. - Wire RESOLUTION_RESULT_JSON into the workflow's tracer validation step, same pattern as the resolution gate. Verified locally: artifact mode correctly distinguishes status=ok (runs recall computation, hits threshold check) from status=skipped (graceful skip), and the missing-tracer-block error path surfaces a clear "regenerate" message. Fallback mode still drives the suite from the filesystem when the env var is unset. Closes #1166 * fix(test): wrap JSON.parse with helpful error in tracer-validation artifact loader (#1171) When RESOLUTION_RESULT_JSON points to a file with malformed JSON, the artifact loader previously threw a raw SyntaxError with no context. Mirror the existing 'regenerate with scripts/resolution-benchmark.ts' guidance so the DX is consistent with the missing-file and empty-file cases.
1 parent 1a6ee7b commit 4d8df7b

3 files changed

Lines changed: 128 additions & 22 deletions

File tree

.github/workflows/publish.yml

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -343,6 +343,12 @@ jobs:
343343

344344
- name: Run tracer validation (same-file edge recall)
345345
timeout-minutes: 10
346+
# Reuse the tracer edges captured by Run resolution benchmark instead
347+
# of re-spawning the per-language tracer subprocess (issue #1166). The
348+
# test falls back to spawning run-tracer.mjs when this env var is
349+
# unset, so local runs still work standalone.
350+
env:
351+
RESOLUTION_RESULT_JSON: ${{ github.workspace }}/resolution-result.json
346352
run: npx vitest run tests/benchmarks/resolution/tracer/tracer-validation.test.ts --reporter=verbose
347353

348354
- name: Merge resolution into build result

scripts/resolution-benchmark.ts

Lines changed: 42 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -55,6 +55,14 @@ interface DynamicEdge {
5555
target_file: string;
5656
}
5757

58+
interface TracerArtifact {
59+
// `skipped` means the tracer's toolchain was unavailable (no edges captured,
60+
// not a recall failure). `ok` means the tracer ran — the edges array may
61+
// still be empty if the language's tracer legitimately produced no calls.
62+
status: 'ok' | 'skipped';
63+
edges: DynamicEdge[];
64+
}
65+
5866
interface LangResult {
5967
precision: number;
6068
recall: number;
@@ -68,6 +76,10 @@ interface LangResult {
6876
// instead of rebuilding fixtures from scratch (see issue #1052).
6977
falsePositiveEdges: string[];
7078
falseNegativeEdges: string[];
79+
// Raw tracer output so tests/benchmarks/resolution/tracer/tracer-validation.test.ts
80+
// can reuse it instead of re-running the per-language tracer subprocess
81+
// (see issue #1166).
82+
tracer: TracerArtifact;
7183
dynamicEdges?: number;
7284
dynamicConfirmed?: number;
7385
}
@@ -99,7 +111,10 @@ function edgeKey(sourceName: string, sourceFile: string, targetName: string, tar
99111
return `${sourceName}@${normalizeFile(sourceFile)} -> ${targetName}@${normalizeFile(targetFile)}`;
100112
}
101113

102-
function computeMetrics(resolvedEdges: ResolvedEdge[], expectedEdges: ExpectedEdge[]): LangResult {
114+
function computeMetrics(
115+
resolvedEdges: ResolvedEdge[],
116+
expectedEdges: ExpectedEdge[],
117+
): Omit<LangResult, 'tracer' | 'dynamicEdges' | 'dynamicConfirmed'> {
103118
const resolvedSet = new Set(
104119
resolvedEdges.map((e) => edgeKey(e.source_name, e.source_file, e.target_name, e.target_file)),
105120
);
@@ -162,10 +177,15 @@ const TRACER_SCRIPT = path.join(root, 'tests', 'benchmarks', 'resolution', 'trac
162177

163178
/**
164179
* Attempt to run the dynamic call tracer for a language fixture.
165-
* Returns captured edges on success, empty array on failure or unavailability.
180+
*
181+
* Returns `{ status: 'ok', edges }` on a successful run (edges may be empty if
182+
* the tracer produced no calls) and `{ status: 'skipped', edges: [] }` when
183+
* the tracer subprocess failed or its toolchain wasn't available. The status
184+
* distinction lets tests reusing this artifact (see issue #1166) preserve the
185+
* tracer-validation suite's skip-on-toolchain-missing behavior.
166186
*/
167-
function runDynamicTracer(lang: string): DynamicEdge[] {
168-
if (!fs.existsSync(TRACER_SCRIPT)) return [];
187+
function runDynamicTracer(lang: string): TracerArtifact {
188+
if (!fs.existsSync(TRACER_SCRIPT)) return { status: 'skipped', edges: [] };
169189

170190
const fixtureDir = path.join(FIXTURES_DIR, lang);
171191
try {
@@ -176,12 +196,17 @@ function runDynamicTracer(lang: string): DynamicEdge[] {
176196
stdio: ['pipe', 'pipe', 'pipe'],
177197
});
178198
const parsed = JSON.parse(result);
179-
if (parsed.error) {
199+
const edges: DynamicEdge[] = Array.isArray(parsed.edges) ? parsed.edges : [];
200+
// Mirror tracer-validation.test.ts: when run-tracer.mjs reports an error
201+
// and produced no edges, treat the run as a toolchain-missing skip
202+
// rather than a recall failure.
203+
if (parsed.error && edges.length === 0) {
180204
console.error(` Dynamic tracer for ${lang}: ${parsed.error}`);
205+
return { status: 'skipped', edges: [] };
181206
}
182-
return Array.isArray(parsed.edges) ? parsed.edges : [];
207+
return { status: 'ok', edges };
183208
} catch {
184-
return [];
209+
return { status: 'skipped', edges: [] };
185210
}
186211
}
187212

@@ -285,20 +310,23 @@ try {
285310
const expectedEdges: ExpectedEdge[] = manifest.edges;
286311

287312
// Run dynamic tracer if available
288-
const dynamicEdges = runDynamicTracer(lang);
289-
const { dynamicConfirmed } = mergeWithDynamic(expectedEdges, dynamicEdges);
313+
const tracer = runDynamicTracer(lang);
314+
const { dynamicConfirmed } = mergeWithDynamic(expectedEdges, tracer.edges);
290315

291316
// Use only expected edges for metrics (dynamic edges are supplemental)
292-
const metrics = computeMetrics(resolvedEdges, expectedEdges);
293-
if (dynamicEdges.length > 0) {
294-
metrics.dynamicEdges = dynamicEdges.length;
317+
const metrics: LangResult = {
318+
...computeMetrics(resolvedEdges, expectedEdges),
319+
tracer,
320+
};
321+
if (tracer.edges.length > 0) {
322+
metrics.dynamicEdges = tracer.edges.length;
295323
metrics.dynamicConfirmed = dynamicConfirmed;
296324
}
297325
results[lang] = metrics;
298326

299327
const dynamicInfo =
300-
dynamicEdges.length > 0
301-
? ` dynamic=${dynamicEdges.length} confirmed=${dynamicConfirmed}`
328+
tracer.edges.length > 0
329+
? ` dynamic=${tracer.edges.length} confirmed=${dynamicConfirmed}`
302330
: '';
303331
console.error(
304332
` ${lang}: precision=${(metrics.precision * 100).toFixed(1)}% recall=${(metrics.recall * 100).toFixed(1)}%${dynamicInfo}`,

tests/benchmarks/resolution/tracer/tracer-validation.test.ts

Lines changed: 80 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,13 @@
1111
* 3. Load expected-edges.json and filter to mode === "same-file"
1212
* 4. Compute recall = matched / total same-file expected edges
1313
* 5. Assert recall >= per-language threshold
14+
*
15+
* **Artifact mode (CI):** when `RESOLUTION_RESULT_JSON` points at a result
16+
* file produced by `scripts/resolution-benchmark.ts`, the suite reads the
17+
* pre-captured tracer edges from that artifact instead of re-running the
18+
* per-language tracer subprocess — avoiding the duplicate work that doubled
19+
* pre-publish tracer cost (issue #1166). Local runs without the env var
20+
* fall back to spawning `run-tracer.mjs` directly.
1421
*/
1522

1623
import { execFileSync } from 'node:child_process';
@@ -135,6 +142,65 @@ function runTracer(lang: string): TracerEdge[] | null {
135142
}
136143
}
137144

145+
// ── Artifact loading (CI dedup, issue #1166) ─────────────────────────────
146+
147+
interface ArtifactTracer {
148+
status: 'ok' | 'skipped';
149+
edges: TracerEdge[];
150+
}
151+
152+
interface ArtifactLangResult {
153+
tracer?: ArtifactTracer;
154+
}
155+
156+
const ARTIFACT_PATH = process.env.RESOLUTION_RESULT_JSON;
157+
158+
function loadArtifact(artifactPath: string): Record<string, ArtifactLangResult> {
159+
if (!fs.existsSync(artifactPath)) {
160+
throw new Error(
161+
`RESOLUTION_RESULT_JSON=${artifactPath} not found — run scripts/resolution-benchmark.ts first.`,
162+
);
163+
}
164+
const raw = fs.readFileSync(artifactPath, 'utf-8');
165+
let parsed: Record<string, ArtifactLangResult>;
166+
try {
167+
parsed = JSON.parse(raw) as Record<string, ArtifactLangResult>;
168+
} catch (err) {
169+
const reason = err instanceof Error ? err.message : String(err);
170+
throw new Error(
171+
`RESOLUTION_RESULT_JSON=${artifactPath} contains malformed JSON (${reason}) — regenerate with scripts/resolution-benchmark.ts.`,
172+
);
173+
}
174+
// Refuse to proceed on an empty artifact: with zero languages, vitest would
175+
// register no test cases and exit 0, silently passing the gate.
176+
if (!parsed || typeof parsed !== 'object' || Object.keys(parsed).length === 0) {
177+
throw new Error(
178+
`RESOLUTION_RESULT_JSON=${artifactPath} contains no language results — regenerate with scripts/resolution-benchmark.ts.`,
179+
);
180+
}
181+
return parsed;
182+
}
183+
184+
function tracerFromArtifact(
185+
lang: string,
186+
raw: ArtifactLangResult | undefined,
187+
): TracerEdge[] | null {
188+
if (!raw?.tracer) {
189+
throw new Error(
190+
`Resolution artifact for ${lang} is missing the tracer block — regenerate with the current scripts/resolution-benchmark.ts.`,
191+
);
192+
}
193+
// Preserve the runTracer null-as-skipped contract so toolchain-missing
194+
// languages are skipped gracefully in artifact mode too.
195+
if (raw.tracer.status === 'skipped') return null;
196+
if (!Array.isArray(raw.tracer.edges)) {
197+
throw new Error(
198+
`Resolution artifact for ${lang} has malformed tracer.edges — regenerate with the current scripts/resolution-benchmark.ts.`,
199+
);
200+
}
201+
return raw.tracer.edges;
202+
}
203+
138204
function loadExpectedEdges(lang: string): ExpectedEdge[] {
139205
const edgesFile = path.join(FIXTURES_DIR, lang, 'expected-edges.json');
140206
if (!fs.existsSync(edgesFile)) return [];
@@ -180,13 +246,19 @@ function computeSameFileRecall(
180246

181247
// ── Test Suite ───────────────────────────────────────────────────────────
182248

183-
// Discover all fixture languages that have expected-edges.json
184-
const languages = fs
185-
.readdirSync(FIXTURES_DIR, { withFileTypes: true })
186-
.filter((d) => d.isDirectory())
187-
.map((d) => d.name)
188-
.filter((lang) => fs.existsSync(path.join(FIXTURES_DIR, lang, 'expected-edges.json')))
189-
.sort();
249+
const artifact = ARTIFACT_PATH ? loadArtifact(ARTIFACT_PATH) : null;
250+
251+
// In artifact mode, drive the suite from the keys in the artifact so we never
252+
// silently skip a language the script reported. In local mode, discover from
253+
// the filesystem like before.
254+
const languages = artifact
255+
? Object.keys(artifact).sort()
256+
: fs
257+
.readdirSync(FIXTURES_DIR, { withFileTypes: true })
258+
.filter((d) => d.isDirectory())
259+
.map((d) => d.name)
260+
.filter((lang) => fs.existsSync(path.join(FIXTURES_DIR, lang, 'expected-edges.json')))
261+
.sort();
190262

191263
describe('Dynamic Tracer — Same-File Edge Recall', () => {
192264
// Summary table printed after all tests
@@ -211,7 +283,7 @@ describe('Dynamic Tracer — Same-File Edge Recall', () => {
211283
return;
212284
}
213285

214-
const tracerEdges = runTracer(lang);
286+
const tracerEdges = artifact ? tracerFromArtifact(lang, artifact[lang]) : runTracer(lang);
215287

216288
// If tracer couldn't run (toolchain missing), skip gracefully.
217289
// Many languages require runtimes (rustc, ghc, ruby, etc.) that aren't

0 commit comments

Comments
 (0)