handling autogenerated tests suites by ignaciojimenezr · Pull Request #1906 · MCPJam/inspector

ignaciojimenezr · 2026-04-23T18:53:30Z

Backend PR

Store auto-suite state on the workspace server row with:
- autoEvalSuiteId
- autoEvalSuiteSuppressedAt
Add one backend path to decide whether to:
- return the existing auto-suite
- create a new one
- or do nothing because auto-create was suppressed
When a user deletes an auto-created suite, remember that and stop auto-creating it again
If the user explicitly recreates it later, clear suppression and allow a fresh suite
Drop the old legacy fallback so we only follow the new server-linked behavior going forward

Inspector PR

Update Playground to use the new backend auto-suite logic
Keep auto-creating suites for connected servers when appropriate
Stop silently recreating auto-suites after the user deletes them
Let users manually recreate a deleted auto-suite from Playground
Only generate starter test cases when the server is connected
If the server is disconnected, still create the suite but leave it empty
Keep a short-lived UI-only suppression marker per workspace + server so Playground doesn’t briefly auto-recreate a just-deleted suite before the backend suppression state finishes syncing

chelojimenez · 2026-04-23T18:53:42Z

✅ Snyk checks have passed. No issues have been found so far.

Status	Scan Engine	Critical	High	Medium	Low	Total (0)
✅	Open Source Security	0	0	0	0	0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

github-actions · 2026-04-23T18:57:31Z

Internal preview

Preview URL will appear in Railway after the deploy finishes.
Deployed commit: 87b22b3
PR head commit: afd315e
Backend target: preview requested.
Access is employee-only in non-production environments.

chatgpt-codex-connector · 2026-04-23T19:11:34Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

coderabbitai · 2026-04-23T19:17:06Z

Walkthrough

This pull request introduces an auto-evaluation suite creation mechanism that uses a new ensureAutoEvalSuiteMutation hook to intelligently manage suite creation and suppression state. Server data now tracks autoEvalSuiteId and autoEvalSuiteSuppressedAt metadata. The EvalsTab component tracks per-workspace suppression state, skipping auto-creation when suppression is active or a server has a suppression timestamp set. Manual suite creation can now recreate suppressed auto-suites. Supporting types, serialization logic, and test coverage have been updated to reflect this functionality.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx (1)
343-551: Meaningful coverage across the suppression matrix.

The three workspace/suppression scenarios plus the connected/disconnected manual-recreate pair exercise the genuinely risky paths. No concerns on correctness.

One optional polish: the mocks.useEvalQueries.mockImplementation(...) block is copy-pasted near-verbatim across the three new tests. Extracting a small helper (e.g. makeSingleSuiteQueryState(["server-b"], "suite-b")) would trim ~60 lines and make future schema tweaks a one-liner.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx` around
lines 343 - 551, The tests duplicate the same
mocks.useEvalQueries.mockImplementation block across multiple cases; extract
that repeated setup into a helper function (e.g., makeSingleSuiteQueryState or
getSingleSuiteUseEvalMock) and replace the inline mockImplementation calls with
mocks.useEvalQueries.mockImplementation(makeSingleSuiteQueryState(["server-b"],
"suite-b")); ensure the helper returns the identical object shape used currently
(suiteOverview, suiteDetails, suiteRuns, selectedSuiteEntry, selectedSuite,
sortedIterations, runsForSelectedSuite, activeIterations, sortedSuites,
isOverviewLoading, isSuiteDetailsLoading, isSuiteRunsLoading,
enableOverviewQuery, enableSuiteDetailsQuery) so existing tests keep the same
behavior and only the repeated block is removed.
mcpjam-inspector/client/src/components/EvalsTab.tsx (1)
276-374: Auto-suite effect: tight in-flight guard, but dependency on appState.servers may over-fire.

The explorePrefetchInFlight module-scoped guard plus the workspace-keyed workspaceServerKey correctly prevent duplicate creates across remount or across workspaces — well thought through.

One small observation: appState.servers is listed as a dependency (Line 365). Because that object is rebuilt on essentially any server mutation (connection status flips, OAuth token refreshes, etc.), this effect will re-run on churn that has nothing to do with suppression. The explorePrefetchInFlight and serversWithExploreSuite checks make this safe, but on a chatty workspace you'll see repeated no-op invocations. If that ever shows up in a profiler, narrowing to a memoized suppression map (e.g. useMemo(() => new Set(Object.entries(appState.servers).filter(([, s]) => s?.autoEvalSuiteSuppressedAt !== undefined).map(([k]) => k)), [appState.servers])) would cut the churn without changing semantics.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/EvalsTab.tsx` around lines 276 - 374,
The effect re-runs too often because appState.servers is a volatile dependency;
compute a memoized suppression set with useMemo (e.g. derive Set of server names
where entry.autoEvalSuiteSuppressedAt !== undefined from appState.servers) and
replace direct checks of appState.servers[name]?.autoEvalSuiteSuppressedAt in
the serversNeedingSuite filter with membership checks against that memoized Set,
then remove appState.servers from the useEffect dependency array and add the
memoized suppression set (and its variable name) instead; keep
explorePrefetchInFlight, workspaceServerKey, locallySuppressedServerKeys, and
other deps as-is.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@mcpjam-inspector/client/src/components/EvalsTab.tsx`:
- Around line 413-469: The manual-recreate branch dereferences result.suite._id,
result.suite.name and result.suite.description before verifying result.status;
add a defensive guard after the call to ensureAutoEvalSuiteMutation (the result
variable from ensureAutoEvalSuiteMutation) that checks result.status ===
"created" (or handles non-created statuses like "suppressed") and returns or
handles the error case early to avoid accessing result.suite when undefined;
keep subsequent calls to updateTestSuiteMutation, generateAndPersistEvalTests,
setLocallySuppressedServerKeys, toast.success and navigatePlaygroundEvalsRoute
only within the created-success path so result.suite is only used when present.

---

Nitpick comments:
In `@mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx`:
- Around line 343-551: The tests duplicate the same
mocks.useEvalQueries.mockImplementation block across multiple cases; extract
that repeated setup into a helper function (e.g., makeSingleSuiteQueryState or
getSingleSuiteUseEvalMock) and replace the inline mockImplementation calls with
mocks.useEvalQueries.mockImplementation(makeSingleSuiteQueryState(["server-b"],
"suite-b")); ensure the helper returns the identical object shape used currently
(suiteOverview, suiteDetails, suiteRuns, selectedSuiteEntry, selectedSuite,
sortedIterations, runsForSelectedSuite, activeIterations, sortedSuites,
isOverviewLoading, isSuiteDetailsLoading, isSuiteRunsLoading,
enableOverviewQuery, enableSuiteDetailsQuery) so existing tests keep the same
behavior and only the repeated block is removed.

In `@mcpjam-inspector/client/src/components/EvalsTab.tsx`:
- Around line 276-374: The effect re-runs too often because appState.servers is
a volatile dependency; compute a memoized suppression set with useMemo (e.g.
derive Set of server names where entry.autoEvalSuiteSuppressedAt !== undefined
from appState.servers) and replace direct checks of
appState.servers[name]?.autoEvalSuiteSuppressedAt in the serversNeedingSuite
filter with membership checks against that memoized Set, then remove
appState.servers from the useEffect dependency array and add the memoized
suppression set (and its variable name) instead; keep explorePrefetchInFlight,
workspaceServerKey, locallySuppressedServerKeys, and other deps as-is.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 95a3914f-be3b-4caf-9d91-b323f06aa4cb

📥 Commits

Reviewing files that changed from the base of the PR and between 8635c4d and afd315e.

📒 Files selected for processing (6)

mcpjam-inspector/client/src/components/EvalsTab.tsx
mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx
mcpjam-inspector/client/src/components/evals/use-eval-mutations.ts
mcpjam-inspector/client/src/hooks/useWorkspaces.ts
mcpjam-inspector/client/src/lib/workspace-serialization.ts
mcpjam-inspector/client/src/state/app-types.ts

coderabbitai · 2026-04-23T19:17:09Z

+        if (singleServerName && isSuppressedAutoSuiteRecreate) {
+          const result = await mutations.ensureAutoEvalSuiteMutation({
+            workspaceId,
+            serverName: singleServerName,
+            mode: "manual",
+          });
+
+          const nextDescription = payload.description ?? "";
+          const suiteUpdates: {
+            suiteId: string;
+            name?: string;
+            description?: string;
+          } = {
+            suiteId: result.suite._id,
+          };
+
+          if (payload.name !== result.suite.name) {
+            suiteUpdates.name = payload.name;
+          }
+          if (nextDescription !== (result.suite.description ?? "")) {
+            suiteUpdates.description = nextDescription;
+          }
+          if (
+            suiteUpdates.name !== undefined ||
+            suiteUpdates.description !== undefined
+          ) {
+            await mutations.updateTestSuiteMutation(suiteUpdates);
+          }
+
+          if (
+            result.status === "created" &&
+            connectedServerNames.has(singleServerName)
+          ) {
+            await generateAndPersistEvalTests({
+              convex,
+              getAccessToken,
+              workspaceId,
+              suiteId: result.suite._id,
+              serverIds: [singleServerName],
+              createTestCase: createTestCaseMutation,
+            });
+          }
+
+          setLocallySuppressedServerKeys((previous) =>
+            previous.filter(
+              (serverKey) =>
+                serverKey !== workspaceServerKey(workspaceId, singleServerName),
+            ),
+          );
+
+          toast.success("Suite created");
+          navigatePlaygroundEvalsRoute({
+            type: "suite-overview",
+            suiteId: result.suite._id,
+          });
+          return;
+        }


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Locate the Convex function definition and inspect its return shape. fd -t f 'testSuites' | head -50 rg -nP -C5 'ensureAutoEvalSuiteForServer' --type=ts

Repository: MCPJam/inspector

Length of output: 2314

🏁 Script executed:

fd -t f -e ts 'testSuites' --search-path '.'

Repository: MCPJam/inspector

Length of output: 42

🏁 Script executed:

# Find convex backend directory git ls-files | grep -E '^convex/' | head -20

Repository: MCPJam/inspector

Length of output: 42

🏁 Script executed:

# Understand repository structure ls -la cat README.md 2>/dev/null | head -50 git ls-files | head -30

Repository: MCPJam/inspector

Length of output: 4791

🏁 Script executed:

# Search for the backend function definition rg -t ts "ensureAutoEvalSuiteForServer" --no-heading

Repository: MCPJam/inspector

Length of output: 341

🏁 Script executed:

# Check test file to understand expected return shapes rg -A 30 "ensureAutoEvalSuiteForServer" mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx

Repository: MCPJam/inspector

Length of output: 1203

🏁 Script executed:

# Find mutation mock setup in test file rg -B 5 -A 15 "ensureAutoEvalSuiteMutation.*mock" mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx

Repository: MCPJam/inspector

Length of output: 1991

🏁 Script executed:

# Search for all ensureAutoEvalSuiteMutation mock return values in tests rg -B 2 -A 8 'ensureAutoEvalSuiteMutation.mockResolvedValue' mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx

Repository: MCPJam/inspector

Length of output: 661

🏁 Script executed:

# Search for all status values returned by the mock in tests rg 'status: "' mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx | grep ensureAutoEvalSuite -A 5 -B 5

Repository: MCPJam/inspector

Length of output: 42

🏁 Script executed:

# Search for mock return values with status in test file rg -A 5 'ensureAutoEvalSuiteMutation.mockResolvedValue|status:' mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx | grep -E "(status:|ensureAutoEvalSuiteMutation)"

Repository: MCPJam/inspector

Length of output: 258

🏁 Script executed:

# Get complete mock return value blocks rg -B 2 -A 6 'ensureAutoEvalSuiteMutation.mockResolvedValue' mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx | head -60

Repository: MCPJam/inspector

Length of output: 517

🏁 Script executed:

# Find the auto path that handles suppressed status (lines 322-333 mentioned in review) sed -n '300,350p' mcpjam-inspector/client/src/components/EvalsTab.tsx

Repository: MCPJam/inspector

Length of output: 1666

🏁 Script executed:

# Check for "existing" status in tests and code rg '"existing"' mcpjam-inspector/client/src/

Repository: MCPJam/inspector

Length of output: 1573

🏁 Script executed:

# Check the manual branch code in EvalsTab.tsx (lines 413-469) sed -n '413,469p' mcpjam-inspector/client/src/components/EvalsTab.tsx

Repository: MCPJam/inspector

Length of output: 1858

🏁 Script executed:

# Search for "existing" as a suite status value in the entire codebase rg -t ts '"existing"' --no-heading mcpjam-inspector/client/src/components/ | grep -i suite

Repository: MCPJam/inspector

Length of output: 42

🏁 Script executed:

# Search for documentation or types related to ensureAutoEvalSuite return values rg -t tsx -t ts "ensureAutoEvalSuite" -A 20 mcpjam-inspector/client/src/components/evals/

Repository: MCPJam/inspector

Length of output: 87

🏁 Script executed:

# Search for ensureAutoEvalSuite references with context rg "ensureAutoEvalSuite" -A 20 mcpjam-inspector/client/src/components/evals/ --type=ts

Repository: MCPJam/inspector

Length of output: 5462

Add defensive status guard to manual-recreate branch.

The manual branch (line 429 onward) unconditionally dereferences result.suite._id, result.suite.name, and result.suite.description before checking result.status. The auto path handles "suppressed" defensively at line 323, but here a non-"created" response would throw TypeError: Cannot read properties of undefined at line 429.

While the backend may never return "suppressed" for mode: "manual", the client shouldn't assume that contract tacitly. Add a brief status check:

Proposed guard

if (singleServerName && isSuppressedAutoSuiteRecreate) { const result = await mutations.ensureAutoEvalSuiteMutation({ workspaceId, serverName: singleServerName, mode: "manual", }); + if (result.status !== "created") { + throw new Error( + `Unexpected ensureAutoEvalSuiteForServer status: ${result.status}`, + ); + } + const nextDescription = payload.description ?? "";

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@mcpjam-inspector/client/src/components/EvalsTab.tsx` around lines 413 - 469, The manual-recreate branch dereferences result.suite._id, result.suite.name and result.suite.description before verifying result.status; add a defensive guard after the call to ensureAutoEvalSuiteMutation (the result variable from ensureAutoEvalSuiteMutation) that checks result.status === "created" (or handles non-created statuses like "suppressed") and returns or handles the error case early to avoid accessing result.suite when undefined; keep subsequent calls to updateTestSuiteMutation, generateAndPersistEvalTests, setLocallySuppressedServerKeys, toast.success and navigatePlaygroundEvalsRoute only within the created-success path so result.suite is only used when present.

github-actions · 2026-04-23T20:00:00Z

Preview watchdog

The "preview requested" state has been stuck for ~20446 minutes.
The backend callback probably dropped (token expired, dispatch failed, or the backend job crashed).

Recover: re-run the upsert-preview workflow for this PR, or push an empty commit.

Watchdog runs every 15 minutes; this comment updates in place when conditions change.

handling autogenerated tests suites

4b2ffc7

ignaciojimenezr deployed to preview-pr-1906 April 23, 2026 18:53 — with GitHub Actions View deployment

Merge branch 'main' into autogenerating-tests

afd315e

ignaciojimenezr had a problem deploying to preview-pr-1906 April 23, 2026 19:07 — with GitHub Actions Failure

ignaciojimenezr marked this pull request as ready for review April 23, 2026 19:11

dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. enhancement New feature or request labels Apr 23, 2026

coderabbitai Bot reviewed Apr 23, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

handling autogenerated tests suites#1906

handling autogenerated tests suites#1906
ignaciojimenezr wants to merge 2 commits intomainfrom
autogenerating-tests

ignaciojimenezr commented Apr 23, 2026 •

edited

Loading

Uh oh!

chelojimenez commented Apr 23, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 23, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot commented Apr 23, 2026

Uh oh!

coderabbitai Bot commented Apr 23, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 23, 2026

Uh oh!

github-actions Bot commented Apr 23, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

ignaciojimenezr commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Backend PR

Inspector PR

Uh oh!

chelojimenez commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Snyk checks have passed. No issues have been found so far.

Uh oh!

github-actions Bot commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Internal preview

Uh oh!

chatgpt-codex-connector Bot commented Apr 23, 2026

Uh oh!

coderabbitai Bot commented Apr 23, 2026

Walkthrough

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Preview watchdog

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ignaciojimenezr commented Apr 23, 2026 •

edited

Loading

chelojimenez commented Apr 23, 2026 •

edited

Loading

github-actions Bot commented Apr 23, 2026 •

edited

Loading

github-actions Bot commented Apr 23, 2026 •

edited

Loading