Skip to content

handling autogenerated tests suites#1906

Open
ignaciojimenezr wants to merge 2 commits intomainfrom
autogenerating-tests
Open

handling autogenerated tests suites#1906
ignaciojimenezr wants to merge 2 commits intomainfrom
autogenerating-tests

Conversation

@ignaciojimenezr
Copy link
Copy Markdown
Collaborator

@ignaciojimenezr ignaciojimenezr commented Apr 23, 2026

Backend PR

  • Store auto-suite state on the workspace server row with:
    • autoEvalSuiteId
    • autoEvalSuiteSuppressedAt
  • Add one backend path to decide whether to:
    • return the existing auto-suite
    • create a new one
    • or do nothing because auto-create was suppressed
  • When a user deletes an auto-created suite, remember that and stop auto-creating it again
  • If the user explicitly recreates it later, clear suppression and allow a fresh suite
  • Drop the old legacy fallback so we only follow the new server-linked behavior going forward

Inspector PR

  • Update Playground to use the new backend auto-suite logic
  • Keep auto-creating suites for connected servers when appropriate
  • Stop silently recreating auto-suites after the user deletes them
  • Let users manually recreate a deleted auto-suite from Playground
  • Only generate starter test cases when the server is connected
  • If the server is disconnected, still create the suite but leave it empty
  • Keep a short-lived UI-only suppression marker per workspace + server so Playground doesn’t briefly auto-recreate a just-deleted suite before the backend suppression state finishes syncing

@chelojimenez
Copy link
Copy Markdown
Contributor

chelojimenez commented Apr 23, 2026

Snyk checks have passed. No issues have been found so far.

Status Scan Engine Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 23, 2026

Internal preview

Preview URL will appear in Railway after the deploy finishes.
Deployed commit: 87b22b3
PR head commit: afd315e
Backend target: preview requested.
Access is employee-only in non-production environments.

@ignaciojimenezr ignaciojimenezr marked this pull request as ready for review April 23, 2026 19:11
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. enhancement New feature or request labels Apr 23, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 23, 2026

Walkthrough

This pull request introduces an auto-evaluation suite creation mechanism that uses a new ensureAutoEvalSuiteMutation hook to intelligently manage suite creation and suppression state. Server data now tracks autoEvalSuiteId and autoEvalSuiteSuppressedAt metadata. The EvalsTab component tracks per-workspace suppression state, skipping auto-creation when suppression is active or a server has a suppression timestamp set. Manual suite creation can now recreate suppressed auto-suites. Supporting types, serialization logic, and test coverage have been updated to reflect this functionality.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx (1)

343-551: Meaningful coverage across the suppression matrix.

The three workspace/suppression scenarios plus the connected/disconnected manual-recreate pair exercise the genuinely risky paths. No concerns on correctness.

One optional polish: the mocks.useEvalQueries.mockImplementation(...) block is copy-pasted near-verbatim across the three new tests. Extracting a small helper (e.g. makeSingleSuiteQueryState(["server-b"], "suite-b")) would trim ~60 lines and make future schema tweaks a one-liner.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx` around
lines 343 - 551, The tests duplicate the same
mocks.useEvalQueries.mockImplementation block across multiple cases; extract
that repeated setup into a helper function (e.g., makeSingleSuiteQueryState or
getSingleSuiteUseEvalMock) and replace the inline mockImplementation calls with
mocks.useEvalQueries.mockImplementation(makeSingleSuiteQueryState(["server-b"],
"suite-b")); ensure the helper returns the identical object shape used currently
(suiteOverview, suiteDetails, suiteRuns, selectedSuiteEntry, selectedSuite,
sortedIterations, runsForSelectedSuite, activeIterations, sortedSuites,
isOverviewLoading, isSuiteDetailsLoading, isSuiteRunsLoading,
enableOverviewQuery, enableSuiteDetailsQuery) so existing tests keep the same
behavior and only the repeated block is removed.
mcpjam-inspector/client/src/components/EvalsTab.tsx (1)

276-374: Auto-suite effect: tight in-flight guard, but dependency on appState.servers may over-fire.

The explorePrefetchInFlight module-scoped guard plus the workspace-keyed workspaceServerKey correctly prevent duplicate creates across remount or across workspaces — well thought through.

One small observation: appState.servers is listed as a dependency (Line 365). Because that object is rebuilt on essentially any server mutation (connection status flips, OAuth token refreshes, etc.), this effect will re-run on churn that has nothing to do with suppression. The explorePrefetchInFlight and serversWithExploreSuite checks make this safe, but on a chatty workspace you'll see repeated no-op invocations. If that ever shows up in a profiler, narrowing to a memoized suppression map (e.g. useMemo(() => new Set(Object.entries(appState.servers).filter(([, s]) => s?.autoEvalSuiteSuppressedAt !== undefined).map(([k]) => k)), [appState.servers])) would cut the churn without changing semantics.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/EvalsTab.tsx` around lines 276 - 374,
The effect re-runs too often because appState.servers is a volatile dependency;
compute a memoized suppression set with useMemo (e.g. derive Set of server names
where entry.autoEvalSuiteSuppressedAt !== undefined from appState.servers) and
replace direct checks of appState.servers[name]?.autoEvalSuiteSuppressedAt in
the serversNeedingSuite filter with membership checks against that memoized Set,
then remove appState.servers from the useEffect dependency array and add the
memoized suppression set (and its variable name) instead; keep
explorePrefetchInFlight, workspaceServerKey, locallySuppressedServerKeys, and
other deps as-is.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@mcpjam-inspector/client/src/components/EvalsTab.tsx`:
- Around line 413-469: The manual-recreate branch dereferences result.suite._id,
result.suite.name and result.suite.description before verifying result.status;
add a defensive guard after the call to ensureAutoEvalSuiteMutation (the result
variable from ensureAutoEvalSuiteMutation) that checks result.status ===
"created" (or handles non-created statuses like "suppressed") and returns or
handles the error case early to avoid accessing result.suite when undefined;
keep subsequent calls to updateTestSuiteMutation, generateAndPersistEvalTests,
setLocallySuppressedServerKeys, toast.success and navigatePlaygroundEvalsRoute
only within the created-success path so result.suite is only used when present.

---

Nitpick comments:
In `@mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx`:
- Around line 343-551: The tests duplicate the same
mocks.useEvalQueries.mockImplementation block across multiple cases; extract
that repeated setup into a helper function (e.g., makeSingleSuiteQueryState or
getSingleSuiteUseEvalMock) and replace the inline mockImplementation calls with
mocks.useEvalQueries.mockImplementation(makeSingleSuiteQueryState(["server-b"],
"suite-b")); ensure the helper returns the identical object shape used currently
(suiteOverview, suiteDetails, suiteRuns, selectedSuiteEntry, selectedSuite,
sortedIterations, runsForSelectedSuite, activeIterations, sortedSuites,
isOverviewLoading, isSuiteDetailsLoading, isSuiteRunsLoading,
enableOverviewQuery, enableSuiteDetailsQuery) so existing tests keep the same
behavior and only the repeated block is removed.

In `@mcpjam-inspector/client/src/components/EvalsTab.tsx`:
- Around line 276-374: The effect re-runs too often because appState.servers is
a volatile dependency; compute a memoized suppression set with useMemo (e.g.
derive Set of server names where entry.autoEvalSuiteSuppressedAt !== undefined
from appState.servers) and replace direct checks of
appState.servers[name]?.autoEvalSuiteSuppressedAt in the serversNeedingSuite
filter with membership checks against that memoized Set, then remove
appState.servers from the useEffect dependency array and add the memoized
suppression set (and its variable name) instead; keep explorePrefetchInFlight,
workspaceServerKey, locallySuppressedServerKeys, and other deps as-is.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 95a3914f-be3b-4caf-9d91-b323f06aa4cb

📥 Commits

Reviewing files that changed from the base of the PR and between 8635c4d and afd315e.

📒 Files selected for processing (6)
  • mcpjam-inspector/client/src/components/EvalsTab.tsx
  • mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx
  • mcpjam-inspector/client/src/components/evals/use-eval-mutations.ts
  • mcpjam-inspector/client/src/hooks/useWorkspaces.ts
  • mcpjam-inspector/client/src/lib/workspace-serialization.ts
  • mcpjam-inspector/client/src/state/app-types.ts

Comment on lines +413 to +469
if (singleServerName && isSuppressedAutoSuiteRecreate) {
const result = await mutations.ensureAutoEvalSuiteMutation({
workspaceId,
serverName: singleServerName,
mode: "manual",
});

const nextDescription = payload.description ?? "";
const suiteUpdates: {
suiteId: string;
name?: string;
description?: string;
} = {
suiteId: result.suite._id,
};

if (payload.name !== result.suite.name) {
suiteUpdates.name = payload.name;
}
if (nextDescription !== (result.suite.description ?? "")) {
suiteUpdates.description = nextDescription;
}
if (
suiteUpdates.name !== undefined ||
suiteUpdates.description !== undefined
) {
await mutations.updateTestSuiteMutation(suiteUpdates);
}

if (
result.status === "created" &&
connectedServerNames.has(singleServerName)
) {
await generateAndPersistEvalTests({
convex,
getAccessToken,
workspaceId,
suiteId: result.suite._id,
serverIds: [singleServerName],
createTestCase: createTestCaseMutation,
});
}

setLocallySuppressedServerKeys((previous) =>
previous.filter(
(serverKey) =>
serverKey !== workspaceServerKey(workspaceId, singleServerName),
),
);

toast.success("Suite created");
navigatePlaygroundEvalsRoute({
type: "suite-overview",
suiteId: result.suite._id,
});
return;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Locate the Convex function definition and inspect its return shape.
fd -t f 'testSuites' | head -50
rg -nP -C5 'ensureAutoEvalSuiteForServer' --type=ts

Repository: MCPJam/inspector

Length of output: 2314


🏁 Script executed:

fd -t f -e ts 'testSuites' --search-path '.'

Repository: MCPJam/inspector

Length of output: 42


🏁 Script executed:

# Find convex backend directory
git ls-files | grep -E '^convex/' | head -20

Repository: MCPJam/inspector

Length of output: 42


🏁 Script executed:

# Understand repository structure
ls -la
cat README.md 2>/dev/null | head -50
git ls-files | head -30

Repository: MCPJam/inspector

Length of output: 4791


🏁 Script executed:

# Search for the backend function definition
rg -t ts "ensureAutoEvalSuiteForServer" --no-heading

Repository: MCPJam/inspector

Length of output: 341


🏁 Script executed:

# Check test file to understand expected return shapes
rg -A 30 "ensureAutoEvalSuiteForServer" mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx

Repository: MCPJam/inspector

Length of output: 1203


🏁 Script executed:

# Find mutation mock setup in test file
rg -B 5 -A 15 "ensureAutoEvalSuiteMutation.*mock" mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx

Repository: MCPJam/inspector

Length of output: 1991


🏁 Script executed:

# Search for all ensureAutoEvalSuiteMutation mock return values in tests
rg -B 2 -A 8 'ensureAutoEvalSuiteMutation.mockResolvedValue' mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx

Repository: MCPJam/inspector

Length of output: 661


🏁 Script executed:

# Search for all status values returned by the mock in tests
rg 'status: "' mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx | grep ensureAutoEvalSuite -A 5 -B 5

Repository: MCPJam/inspector

Length of output: 42


🏁 Script executed:

# Search for mock return values with status in test file
rg -A 5 'ensureAutoEvalSuiteMutation.mockResolvedValue|status:' mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx | grep -E "(status:|ensureAutoEvalSuiteMutation)"

Repository: MCPJam/inspector

Length of output: 258


🏁 Script executed:

# Get complete mock return value blocks
rg -B 2 -A 6 'ensureAutoEvalSuiteMutation.mockResolvedValue' mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx | head -60

Repository: MCPJam/inspector

Length of output: 517


🏁 Script executed:

# Find the auto path that handles suppressed status (lines 322-333 mentioned in review)
sed -n '300,350p' mcpjam-inspector/client/src/components/EvalsTab.tsx

Repository: MCPJam/inspector

Length of output: 1666


🏁 Script executed:

# Check for "existing" status in tests and code
rg '"existing"' mcpjam-inspector/client/src/

Repository: MCPJam/inspector

Length of output: 1573


🏁 Script executed:

# Check the manual branch code in EvalsTab.tsx (lines 413-469)
sed -n '413,469p' mcpjam-inspector/client/src/components/EvalsTab.tsx

Repository: MCPJam/inspector

Length of output: 1858


🏁 Script executed:

# Search for "existing" as a suite status value in the entire codebase
rg -t ts '"existing"' --no-heading mcpjam-inspector/client/src/components/ | grep -i suite

Repository: MCPJam/inspector

Length of output: 42


🏁 Script executed:

# Search for documentation or types related to ensureAutoEvalSuite return values
rg -t tsx -t ts "ensureAutoEvalSuite" -A 20 mcpjam-inspector/client/src/components/evals/

Repository: MCPJam/inspector

Length of output: 87


🏁 Script executed:

# Search for ensureAutoEvalSuite references with context
rg "ensureAutoEvalSuite" -A 20 mcpjam-inspector/client/src/components/evals/ --type=ts

Repository: MCPJam/inspector

Length of output: 5462


Add defensive status guard to manual-recreate branch.

The manual branch (line 429 onward) unconditionally dereferences result.suite._id, result.suite.name, and result.suite.description before checking result.status. The auto path handles "suppressed" defensively at line 323, but here a non-"created" response would throw TypeError: Cannot read properties of undefined at line 429.

While the backend may never return "suppressed" for mode: "manual", the client shouldn't assume that contract tacitly. Add a brief status check:

Proposed guard
         if (singleServerName && isSuppressedAutoSuiteRecreate) {
           const result = await mutations.ensureAutoEvalSuiteMutation({
             workspaceId,
             serverName: singleServerName,
             mode: "manual",
           });
 
+          if (result.status !== "created") {
+            throw new Error(
+              `Unexpected ensureAutoEvalSuiteForServer status: ${result.status}`,
+            );
+          }
+
           const nextDescription = payload.description ?? "";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/EvalsTab.tsx` around lines 413 - 469,
The manual-recreate branch dereferences result.suite._id, result.suite.name and
result.suite.description before verifying result.status; add a defensive guard
after the call to ensureAutoEvalSuiteMutation (the result variable from
ensureAutoEvalSuiteMutation) that checks result.status === "created" (or handles
non-created statuses like "suppressed") and returns or handles the error case
early to avoid accessing result.suite when undefined; keep subsequent calls to
updateTestSuiteMutation, generateAndPersistEvalTests,
setLocallySuppressedServerKeys, toast.success and navigatePlaygroundEvalsRoute
only within the created-success path so result.suite is only used when present.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 23, 2026

Preview watchdog

The "preview requested" state has been stuck for ~20446 minutes.
The backend callback probably dropped (token expired, dispatch failed, or the backend job crashed).

Recover: re-run the upsert-preview workflow for this PR, or push an empty commit.

Watchdog runs every 15 minutes; this comment updates in place when conditions change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants