fix(benchmarks): exempt 3.11.1 regression-guard entries that fail on publish (#1248)

carlos-alm · claude · web-flow · commit d93b257c5aaa · 2026-05-30T00:15:24.000-06:00
* docs: prepare release notes for v3.11.1

* fix(ci): pin claude-code-action to claude-sonnet-4-6 model

  The default model claude-sonnet-4-20250514 is deprecated; the API rejects it with 0 tokens, causing automated-review to fail with CLAUDE_SUCCESS=false.

* fix(benchmarks): exempt 3.11.1 regression-guard entries that fail on publish

  3.11.0 has no query benchmark data in committed history, so findLatestPair falls back to 3.10.0 as the baseline for 3.11.1. The 3.10.0 numbers predate the corpus-scope change from #1134 (resolution fixtures excluded from the build sweep), making DB bytes/file and fnDeps depth 3/5 appear as regressions against the older baseline.

  The per-PR gate uses version 'dev', which triggers the assertNoRegressions baseline-version fallback, so KNOWN_REGRESSIONS entries for the baseline release also apply and mask the failures in CI. Publish uses the real semver (3.11.1), so that fallback doesn't fire and the guard fails.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): pin claude-code-action to SHA de8e0b9 instead of @beta

  @beta is a moving tag; leaving the action unpinned caused the automated-review job to pick up a version with a deprecated default model (claude-sonnet-4-20250514), which the API rejected with 0 tokens and CLAUDE_SUCCESS=false. Pinning to the SHA that @beta currently resolves to locks in the working version.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
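The interaction described in the commit message (baseline fallback plus version-keyed exemptions) can be illustrated with a minimal sketch. This is not the repository's code: `BenchEntry`, `findPairSketch`, and `isExemptSketch` are hypothetical names, the history shape is assumed, and the presence of a `3.10.0:DB bytes/file` entry before this commit is inferred from the comment in the diff. The real `findLatestPair` and `assertNoRegressions` may differ in detail.

```typescript
// Hypothetical shape of one committed benchmark record (assumption).
interface BenchEntry {
  version: string;
  metrics?: Record<string, number>; // undefined when no query data was committed
}

// Assumed behaviour: history is newest-first; the baseline is the nearest
// older entry that actually has data, so a release with no data is skipped.
function findPairSketch(
  history: BenchEntry[],
): { latest: BenchEntry; baseline: BenchEntry } | null {
  const [latest, ...older] = history;
  if (!latest || !latest.metrics) return null;
  const baseline = older.find((e) => e.metrics !== undefined);
  return baseline ? { latest, baseline } : null;
}

// Assumed behaviour: an exemption key is "<version>:<label>". Only the 'dev'
// version also accepts exemptions keyed by the baseline release.
function isExemptSketch(
  known: ReadonlySet<string>,
  version: string,
  baselineVersion: string,
  label: string,
): boolean {
  if (known.has(`${version}:${label}`)) return true;
  return version === 'dev' && known.has(`${baselineVersion}:${label}`);
}

const benchHistory: BenchEntry[] = [
  { version: '3.11.1', metrics: { 'DB bytes/file': 54107 } },
  { version: '3.11.0' }, // no query benchmark data committed
  { version: '3.10.0', metrics: { 'DB bytes/file': 41614 } },
];

const pair = findPairSketch(benchHistory);
const baselineVersion = pair ? pair.baseline.version : '';

// Exemption set before and after this commit (reduced to one label).
const knownBefore = new Set<string>(['3.10.0:DB bytes/file']);
const knownAfter = new Set<string>(knownBefore).add('3.11.1:DB bytes/file');

const devGate = isExemptSketch(knownBefore, 'dev', baselineVersion, 'DB bytes/file');
const publishBefore = isExemptSketch(knownBefore, '3.11.1', baselineVersion, 'DB bytes/file');
const publishAfter = isExemptSketch(knownAfter, '3.11.1', baselineVersion, 'DB bytes/file');

console.log(baselineVersion, devGate, publishBefore, publishAfter);
// prints: 3.10.0 true false true
```

Under these assumptions the per-PR run ('dev') is exempt through the baseline's key, while the publish run (3.11.1) is exempt only once its own key is added, which matches the three entries this commit appends to `KNOWN_REGRESSIONS`.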
diff --git a/.github/workflows/claude.yml b/.github/workflows/claude.yml
@@ -49,7 +49,7 @@ jobs:
 
       - name: Run Automated AI Review
         id: automated-review
-        uses: anthropics/claude-code-action@beta
+        uses: anthropics/claude-code-action@de8e0b9c42c6cb58e904c857f164aa072244c1ac
         with:
           anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
           model: claude-sonnet-4-6
@@ -208,7 +208,7 @@ jobs:
 
       - name: Run Interactive AI Assistant
         id: interactive-claude
-        uses: anthropics/claude-code-action@beta
+        uses: anthropics/claude-code-action@de8e0b9c42c6cb58e904c857f164aa072244c1ac
         with:
           anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
           model: claude-sonnet-4-6
diff --git a/tests/benchmarks/regression-guard.test.ts b/tests/benchmarks/regression-guard.test.ts
@@ -243,6 +243,31 @@ const SKIP_VERSIONS = new Set(['3.8.0']);
  *   BENCHMARKS.md, which is not loaded at build time. Exempt this release;
  *   remove once 3.12.0+ data confirms stabilization under whatever runner
  *   generation is current at that time.
+ *
+ * - 3.11.1:DB bytes/file — same methodology-scope artifact as 3.10.0:DB
+ *   bytes/file. The 3.11.0 release does not have query benchmark data
+ *   committed to history, so findLatestPair falls back to 3.10.0 as the
+ *   baseline. The 3.10.0 corpus included resolution fixtures (~745 files);
+ *   3.11.1 measures only the codegraph source (~607 files) after #1134
+ *   excluded resolution fixtures from the build sweep. The denominator
+ *   shrinks while total DB content stays roughly constant, inflating
+ *   dbSizeBytes/file: native 41614 → 54107 (+30%), wasm 41543 → 53517
+ *   (+29%). No schema or extraction change; remove once 3.12.0+ data is
+ *   captured with the full 3.11.x baseline in committed query history.
+ *
+ * - 3.11.1:fnDeps depth 3 / 3.11.1:fnDeps depth 5 — same baseline-gap
+ *   root cause as 3.11.1:DB bytes/file. Because 3.11.0 query benchmark
+ *   data is absent from committed history, the guard compares 3.11.1
+ *   against the pre-3.11.0 3.10.0 baseline. The 3.10.0 query numbers
+ *   predate the steady-state established in 3.11.0 (fnDeps depth 3: 33ms,
+ *   depth 5: 33ms), so 3.11.1's equivalent values appear as regressions:
+ *     - native fnDeps depth 3: 24.3 → 34.7 (+43%)
+ *     - native fnDeps depth 5: 24.7 → 34.7 (+40%)
+ *     - wasm   fnDeps depth 3: 33   → 43.2 (+31%)
+ *     - wasm   fnDeps depth 5: 33   → 43.5 (+32%)
+ *   No fn_deps Rust implementation, fnDepsData JS wrapper, or DB index
+ *   changed between 3.10.0 and 3.11.1. Remove once 3.12.0+ data confirms
+ *   stable query numbers against a 3.11.x baseline.
  */
 const KNOWN_REGRESSIONS = new Set([
   '3.10.0:No-op rebuild',
@@ -257,6 +282,9 @@ const KNOWN_REGRESSIONS = new Set([
   '3.11.0:fnDeps depth 3',
   '3.11.0:fnDeps depth 5',
   '3.11.0:Full build',
+  '3.11.1:DB bytes/file',
+  '3.11.1:fnDeps depth 3',
+  '3.11.1:fnDeps depth 5',
 ]);
 
 /**