Commit 4d8df7b
refactor(ci): let tracer-validation gate reuse benchmark artifact (#1171)
* refactor(ci): let resolution gate reuse benchmark artifact
The pre-publish-benchmark job's `Run resolution benchmark` step builds
codegraphs for ~34 language fixtures, computes precision/recall, and
writes resolution-result.json. The `Gate on resolution thresholds` step
that follows then ran the same vitest suite, which independently copied
every fixture and rebuilt the graphs, doubling the most expensive
slice of the publish pipeline.
Extend the script's per-language LangResult to include
falsePositiveEdges and falseNegativeEdges so the gate test has
everything it needs for the existing precision/recall threshold
assertions and failure messages. Refactor the gate test to consume
that artifact when RESOLUTION_RESULT_JSON is set, falling back to the
build-from-fixtures path when unset so devs can still run
`npx vitest run tests/benchmarks/resolution/...` standalone. Wire the
env var through the workflow's Gate step.
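The artifact-or-fixtures switch described above might look roughly like the sketch below. It is an illustration only: the `LangResult` field types, the `MetricsSource` union, the `resolveMetricsSource` helper and its injectable `read` parameter are assumptions made for this example, not the code in the test file.

```typescript
import { readFileSync } from "node:fs";

// Assumed shape: the commit only says LangResult gained falsePositiveEdges
// and falseNegativeEdges; the element types here are a guess.
interface LangResult {
  precision: number;
  recall: number;
  falsePositiveEdges: string[];
  falseNegativeEdges: string[];
}

// Hypothetical union describing where the gate gets its metrics from.
type MetricsSource =
  | { mode: "artifact"; results: Record<string, LangResult> }
  | { mode: "fixtures" };

// Picks artifact mode when RESOLUTION_RESULT_JSON is set, otherwise falls
// back to building graphs from fixtures. `read` is injectable so the
// decision logic can be exercised without touching the filesystem.
function resolveMetricsSource(
  env: Record<string, string | undefined>,
  read: (path: string) => string = (path) => readFileSync(path, "utf8"),
): MetricsSource {
  const artifactPath = env["RESOLUTION_RESULT_JSON"];
  if (!artifactPath) {
    return { mode: "fixtures" };
  }
  const results = JSON.parse(read(artifactPath)) as Record<string, LangResult>;
  return { mode: "artifact", results };
}
```

In a test file this would be called as `resolveMetricsSource(process.env)`, with the `fixtures` branch keeping the standalone `npx vitest run` path working.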
Verified locally: gate test in artifact mode passes 170/170 in ~0.5s
against an artifact produced by scripts/resolution-benchmark.ts, and
the legacy build-from-fixtures path still passes for the javascript
fixture.
Closes #1052
* fix(ci): preserve full precision and guard malformed resolution artifacts (#1167)
- scripts/resolution-benchmark.ts: stop rounding precision/recall to 3
decimals before writing the artifact. The rounding let a near-miss like
0.8497 round up to 0.850 and silently clear a 0.85 threshold in CI
artifact mode while failing in fixture mode.
- tests/benchmarks/resolution/resolution-benchmark.test.ts: validate
numeric fields in metricsFromArtifact so a stale or malformed artifact
surfaces a clear 'regenerate' error instead of a confusing TypeError at
the threshold assertions.
- tests/benchmarks/resolution/resolution-benchmark.test.ts: reject an
empty artifact in loadArtifact. Without this guard, an empty {} would
register zero describe blocks and vitest would exit 0 with '0 tests',
silently passing the gate.
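A minimal sketch of the three fixes above, under stated assumptions: `roundTo3`, `readMetric`, and `assertNonEmptyArtifact` are hypothetical helpers written for this illustration and only approximate what `metricsFromArtifact` and `loadArtifact` do; the 0.85 threshold and the 0.8497 near-miss are the values quoted in the message.

```typescript
const THRESHOLD = 0.85;

// The old behaviour: round to 3 decimals before writing the artifact.
const roundTo3 = (x: number): number => Math.round(x * 1000) / 1000;

// A near-miss clears the gate only after rounding.
const raw = 0.8497;
const passesRounded = roundTo3(raw) >= THRESHOLD; // true: 0.8497 rounds to 0.85
const passesRaw = raw >= THRESHOLD; // false: full precision fails the gate

// Hypothetical numeric guard: fail with a 'regenerate' hint instead of
// letting a non-number reach the threshold assertions.
function readMetric(lang: string, entry: Record<string, unknown>, field: string): number {
  const value = entry[field];
  if (typeof value !== "number" || !Number.isFinite(value)) {
    throw new Error(
      `Artifact entry for ${lang} has an invalid ${field}; ` +
        "regenerate with scripts/resolution-benchmark.ts",
    );
  }
  return value;
}

// Hypothetical empty-artifact guard: an empty {} would otherwise register
// zero describe blocks and let the run exit 0 with no tests.
function assertNonEmptyArtifact(artifact: Record<string, unknown>): void {
  if (Object.keys(artifact).length === 0) {
    throw new Error(
      "Resolution artifact contains no languages; " +
        "regenerate with scripts/resolution-benchmark.ts",
    );
  }
}
```

Throwing at load time is what keeps the failure loud: the guard runs before any describe block is registered, so a bad artifact cannot turn into a green run with zero tests.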
* refactor(ci): let tracer-validation gate reuse benchmark artifact
The pre-publish-benchmark job's `Run resolution benchmark` step already
runs the per-language tracer once for every fixture (the script writes
dynamic-edge counts into resolution-result.json for telemetry). The
`Run tracer validation` step that follows then re-runs the same tracer
subprocess once more per language to compute same-file recall — doubling
tracer cost on the publish path.
Spun off from #1052 / #1167, which deduped the resolution gate the
same way. The tracer side wasn't trivial to fold into that PR because
runDynamicTracer and runTracer had different toolchain-missing
semantics (empty array vs null), so the artifact needed a status field
to round-trip the "skipped" signal.
- Extend the script's TracerArtifact to `{ status: 'ok' | 'skipped',
edges }`, attached to every LangResult as the new `tracer` block.
`skipped` is set when run-tracer.mjs reports an error and produces
no edges, mirroring tracer-validation.test.ts's skip semantics so
it round-trips through the artifact.
- Refactor tracer-validation.test.ts to consume the artifact when
RESOLUTION_RESULT_JSON is set (driving the language list from the
artifact keys so we never silently drop a language), falling back
to spawning run-tracer.mjs when unset so local
`npx vitest run …` runs still work standalone.
- Wire RESOLUTION_RESULT_JSON into the workflow's tracer validation
step, same pattern as the resolution gate.
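The status round-trip described in the list above could be sketched as follows. Only the `{ status: 'ok' | 'skipped', edges }` shape comes from the message; the `TracerEdge` fields, the `TracerRun` input type, and the `toTracerArtifact` and `planTracerValidation` helpers are assumptions made for illustration.

```typescript
// Hypothetical edge shape; the real tracer edge type is not shown in the commit.
type TracerEdge = { source: string; target: string };

// Shape stated in the commit message.
type TracerArtifact = { status: "ok" | "skipped"; edges: TracerEdge[] };

// Assumed summary of what one run-tracer.mjs invocation reports.
interface TracerRun {
  error?: string;
  edges: TracerEdge[];
}

// Producer side: "skipped" only when the tracer errored AND produced no edges.
function toTracerArtifact(run: TracerRun): TracerArtifact {
  const skipped = run.error !== undefined && run.edges.length === 0;
  return { status: skipped ? "skipped" : "ok", edges: run.edges };
}

type ValidationPlan = { lang: string; action: "validate" | "skip"; edges: TracerEdge[] };

// Consumer side: the language list comes from the artifact keys, so a
// language cannot be dropped silently; a missing tracer block is an error.
function planTracerValidation(
  artifact: Record<string, { tracer?: TracerArtifact }>,
): ValidationPlan[] {
  return Object.entries(artifact).map(([lang, result]) => {
    const tracer = result.tracer;
    if (!tracer) {
      throw new Error(
        `Artifact entry for ${lang} has no tracer block; ` +
          "regenerate with scripts/resolution-benchmark.ts",
      );
    }
    const action = tracer.status === "skipped" ? "skip" : "validate";
    return { lang, action, edges: tracer.edges };
  });
}
```

Carrying an explicit status avoids overloading an empty edge list, which could otherwise mean either "toolchain missing" or "tracer ran and found nothing".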
Verified locally: artifact mode correctly distinguishes status=ok
(runs recall computation, hits threshold check) from status=skipped
(graceful skip), and the missing-tracer-block error path surfaces a
clear "regenerate" message. Fallback mode still drives the suite
from the filesystem when the env var is unset.
Closes #1166
* fix(test): wrap JSON.parse with helpful error in tracer-validation artifact loader (#1171)
When RESOLUTION_RESULT_JSON points to a file with malformed JSON, the
artifact loader previously threw a raw SyntaxError with no context.
Mirror the existing 'regenerate with scripts/resolution-benchmark.ts'
guidance so the DX is consistent with the missing-file and empty-file
cases.
1 parent 1a6ee7b commit 4d8df7b
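For the last fix in the message, wrapping `JSON.parse` in the artifact loader, a minimal sketch might look like this. `parseArtifactJson` is a hypothetical helper name and the error wording is illustrative; only the 'regenerate with scripts/resolution-benchmark.ts' guidance comes from the message.

```typescript
// Hypothetical wrapper: turns a bare SyntaxError into an error that names
// the file and tells the reader how to recover.
function parseArtifactJson(raw: string, path: string): unknown {
  try {
    return JSON.parse(raw);
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    throw new Error(
      `${path} is not valid JSON (${reason}); ` +
        "regenerate with scripts/resolution-benchmark.ts",
    );
  }
}
```

Keeping the original parser message inside the new error preserves the position information while adding the context the raw SyntaxError lacked.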
3 files changed
Lines changed: 128 additions & 22 deletions
File tree
- .github/workflows
- scripts
- tests/benchmarks/resolution/tracer
Lines changed: 80 additions & 8 deletions