handling autogenerated tests suites#1906
Conversation
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
WalkthroughThis pull request introduces an auto-evaluation suite creation mechanism that uses a new Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx (1)
343-551: Meaningful coverage across the suppression matrix.The three workspace/suppression scenarios plus the connected/disconnected manual-recreate pair exercise the genuinely risky paths. No concerns on correctness.
One optional polish: the
mocks.useEvalQueries.mockImplementation(...)block is copy-pasted near-verbatim across the three new tests. Extracting a small helper (e.g.makeSingleSuiteQueryState(["server-b"], "suite-b")) would trim ~60 lines and make future schema tweaks a one-liner.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx` around lines 343 - 551, The tests duplicate the same mocks.useEvalQueries.mockImplementation block across multiple cases; extract that repeated setup into a helper function (e.g., makeSingleSuiteQueryState or getSingleSuiteUseEvalMock) and replace the inline mockImplementation calls with mocks.useEvalQueries.mockImplementation(makeSingleSuiteQueryState(["server-b"], "suite-b")); ensure the helper returns the identical object shape used currently (suiteOverview, suiteDetails, suiteRuns, selectedSuiteEntry, selectedSuite, sortedIterations, runsForSelectedSuite, activeIterations, sortedSuites, isOverviewLoading, isSuiteDetailsLoading, isSuiteRunsLoading, enableOverviewQuery, enableSuiteDetailsQuery) so existing tests keep the same behavior and only the repeated block is removed.mcpjam-inspector/client/src/components/EvalsTab.tsx (1)
276-374: Auto-suite effect: tight in-flight guard, but dependency onappState.serversmay over-fire.The
explorePrefetchInFlightmodule-scoped guard plus the workspace-keyedworkspaceServerKeycorrectly prevent duplicate creates across remount or across workspaces — well thought through.One small observation:
appState.serversis listed as a dependency (Line 365). Because that object is rebuilt on essentially any server mutation (connection status flips, OAuth token refreshes, etc.), this effect will re-run on churn that has nothing to do with suppression. TheexplorePrefetchInFlightandserversWithExploreSuitechecks make this safe, but on a chatty workspace you'll see repeated no-op invocations. If that ever shows up in a profiler, narrowing to a memoized suppression map (e.g.useMemo(() => new Set(Object.entries(appState.servers).filter(([, s]) => s?.autoEvalSuiteSuppressedAt !== undefined).map(([k]) => k)), [appState.servers])) would cut the churn without changing semantics.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mcpjam-inspector/client/src/components/EvalsTab.tsx` around lines 276 - 374, The effect re-runs too often because appState.servers is a volatile dependency; compute a memoized suppression set with useMemo (e.g. derive Set of server names where entry.autoEvalSuiteSuppressedAt !== undefined from appState.servers) and replace direct checks of appState.servers[name]?.autoEvalSuiteSuppressedAt in the serversNeedingSuite filter with membership checks against that memoized Set, then remove appState.servers from the useEffect dependency array and add the memoized suppression set (and its variable name) instead; keep explorePrefetchInFlight, workspaceServerKey, locallySuppressedServerKeys, and other deps as-is.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@mcpjam-inspector/client/src/components/EvalsTab.tsx`:
- Around line 413-469: The manual-recreate branch dereferences result.suite._id,
result.suite.name and result.suite.description before verifying result.status;
add a defensive guard after the call to ensureAutoEvalSuiteMutation (the result
variable from ensureAutoEvalSuiteMutation) that checks result.status ===
"created" (or handles non-created statuses like "suppressed") and returns or
handles the error case early to avoid accessing result.suite when undefined;
keep subsequent calls to updateTestSuiteMutation, generateAndPersistEvalTests,
setLocallySuppressedServerKeys, toast.success and navigatePlaygroundEvalsRoute
only within the created-success path so result.suite is only used when present.
---
Nitpick comments:
In `@mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx`:
- Around line 343-551: The tests duplicate the same
mocks.useEvalQueries.mockImplementation block across multiple cases; extract
that repeated setup into a helper function (e.g., makeSingleSuiteQueryState or
getSingleSuiteUseEvalMock) and replace the inline mockImplementation calls with
mocks.useEvalQueries.mockImplementation(makeSingleSuiteQueryState(["server-b"],
"suite-b")); ensure the helper returns the identical object shape used currently
(suiteOverview, suiteDetails, suiteRuns, selectedSuiteEntry, selectedSuite,
sortedIterations, runsForSelectedSuite, activeIterations, sortedSuites,
isOverviewLoading, isSuiteDetailsLoading, isSuiteRunsLoading,
enableOverviewQuery, enableSuiteDetailsQuery) so existing tests keep the same
behavior and only the repeated block is removed.
In `@mcpjam-inspector/client/src/components/EvalsTab.tsx`:
- Around line 276-374: The effect re-runs too often because appState.servers is
a volatile dependency; compute a memoized suppression set with useMemo (e.g.
derive Set of server names where entry.autoEvalSuiteSuppressedAt !== undefined
from appState.servers) and replace direct checks of
appState.servers[name]?.autoEvalSuiteSuppressedAt in the serversNeedingSuite
filter with membership checks against that memoized Set, then remove
appState.servers from the useEffect dependency array and add the memoized
suppression set (and its variable name) instead; keep explorePrefetchInFlight,
workspaceServerKey, locallySuppressedServerKeys, and other deps as-is.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 95a3914f-be3b-4caf-9d91-b323f06aa4cb
📒 Files selected for processing (6)
mcpjam-inspector/client/src/components/EvalsTab.tsxmcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsxmcpjam-inspector/client/src/components/evals/use-eval-mutations.tsmcpjam-inspector/client/src/hooks/useWorkspaces.tsmcpjam-inspector/client/src/lib/workspace-serialization.tsmcpjam-inspector/client/src/state/app-types.ts
| if (singleServerName && isSuppressedAutoSuiteRecreate) { | ||
| const result = await mutations.ensureAutoEvalSuiteMutation({ | ||
| workspaceId, | ||
| serverName: singleServerName, | ||
| mode: "manual", | ||
| }); | ||
|
|
||
| const nextDescription = payload.description ?? ""; | ||
| const suiteUpdates: { | ||
| suiteId: string; | ||
| name?: string; | ||
| description?: string; | ||
| } = { | ||
| suiteId: result.suite._id, | ||
| }; | ||
|
|
||
| if (payload.name !== result.suite.name) { | ||
| suiteUpdates.name = payload.name; | ||
| } | ||
| if (nextDescription !== (result.suite.description ?? "")) { | ||
| suiteUpdates.description = nextDescription; | ||
| } | ||
| if ( | ||
| suiteUpdates.name !== undefined || | ||
| suiteUpdates.description !== undefined | ||
| ) { | ||
| await mutations.updateTestSuiteMutation(suiteUpdates); | ||
| } | ||
|
|
||
| if ( | ||
| result.status === "created" && | ||
| connectedServerNames.has(singleServerName) | ||
| ) { | ||
| await generateAndPersistEvalTests({ | ||
| convex, | ||
| getAccessToken, | ||
| workspaceId, | ||
| suiteId: result.suite._id, | ||
| serverIds: [singleServerName], | ||
| createTestCase: createTestCaseMutation, | ||
| }); | ||
| } | ||
|
|
||
| setLocallySuppressedServerKeys((previous) => | ||
| previous.filter( | ||
| (serverKey) => | ||
| serverKey !== workspaceServerKey(workspaceId, singleServerName), | ||
| ), | ||
| ); | ||
|
|
||
| toast.success("Suite created"); | ||
| navigatePlaygroundEvalsRoute({ | ||
| type: "suite-overview", | ||
| suiteId: result.suite._id, | ||
| }); | ||
| return; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Locate the Convex function definition and inspect its return shape.
fd -t f 'testSuites' | head -50
rg -nP -C5 'ensureAutoEvalSuiteForServer' --type=tsRepository: MCPJam/inspector
Length of output: 2314
🏁 Script executed:
fd -t f -e ts 'testSuites' --search-path '.'Repository: MCPJam/inspector
Length of output: 42
🏁 Script executed:
# Find convex backend directory
git ls-files | grep -E '^convex/' | head -20Repository: MCPJam/inspector
Length of output: 42
🏁 Script executed:
# Understand repository structure
ls -la
cat README.md 2>/dev/null | head -50
git ls-files | head -30Repository: MCPJam/inspector
Length of output: 4791
🏁 Script executed:
# Search for the backend function definition
rg -t ts "ensureAutoEvalSuiteForServer" --no-headingRepository: MCPJam/inspector
Length of output: 341
🏁 Script executed:
# Check test file to understand expected return shapes
rg -A 30 "ensureAutoEvalSuiteForServer" mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsxRepository: MCPJam/inspector
Length of output: 1203
🏁 Script executed:
# Find mutation mock setup in test file
rg -B 5 -A 15 "ensureAutoEvalSuiteMutation.*mock" mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsxRepository: MCPJam/inspector
Length of output: 1991
🏁 Script executed:
# Search for all ensureAutoEvalSuiteMutation mock return values in tests
rg -B 2 -A 8 'ensureAutoEvalSuiteMutation.mockResolvedValue' mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsxRepository: MCPJam/inspector
Length of output: 661
🏁 Script executed:
# Search for all status values returned by the mock in tests
rg 'status: "' mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx | grep ensureAutoEvalSuite -A 5 -B 5Repository: MCPJam/inspector
Length of output: 42
🏁 Script executed:
# Search for mock return values with status in test file
rg -A 5 'ensureAutoEvalSuiteMutation.mockResolvedValue|status:' mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx | grep -E "(status:|ensureAutoEvalSuiteMutation)"Repository: MCPJam/inspector
Length of output: 258
🏁 Script executed:
# Get complete mock return value blocks
rg -B 2 -A 6 'ensureAutoEvalSuiteMutation.mockResolvedValue' mcpjam-inspector/client/src/components/__tests__/EvalsTab.test.tsx | head -60Repository: MCPJam/inspector
Length of output: 517
🏁 Script executed:
# Find the auto path that handles suppressed status (lines 322-333 mentioned in review)
sed -n '300,350p' mcpjam-inspector/client/src/components/EvalsTab.tsxRepository: MCPJam/inspector
Length of output: 1666
🏁 Script executed:
# Check for "existing" status in tests and code
rg '"existing"' mcpjam-inspector/client/src/Repository: MCPJam/inspector
Length of output: 1573
🏁 Script executed:
# Check the manual branch code in EvalsTab.tsx (lines 413-469)
sed -n '413,469p' mcpjam-inspector/client/src/components/EvalsTab.tsxRepository: MCPJam/inspector
Length of output: 1858
🏁 Script executed:
# Search for "existing" as a suite status value in the entire codebase
rg -t ts '"existing"' --no-heading mcpjam-inspector/client/src/components/ | grep -i suiteRepository: MCPJam/inspector
Length of output: 42
🏁 Script executed:
# Search for documentation or types related to ensureAutoEvalSuite return values
rg -t tsx -t ts "ensureAutoEvalSuite" -A 20 mcpjam-inspector/client/src/components/evals/Repository: MCPJam/inspector
Length of output: 87
🏁 Script executed:
# Search for ensureAutoEvalSuite references with context
rg "ensureAutoEvalSuite" -A 20 mcpjam-inspector/client/src/components/evals/ --type=tsRepository: MCPJam/inspector
Length of output: 5462
Add defensive status guard to manual-recreate branch.
The manual branch (line 429 onward) unconditionally dereferences result.suite._id, result.suite.name, and result.suite.description before checking result.status. The auto path handles "suppressed" defensively at line 323, but here a non-"created" response would throw TypeError: Cannot read properties of undefined at line 429.
While the backend may never return "suppressed" for mode: "manual", the client shouldn't assume that contract tacitly. Add a brief status check:
Proposed guard
if (singleServerName && isSuppressedAutoSuiteRecreate) {
const result = await mutations.ensureAutoEvalSuiteMutation({
workspaceId,
serverName: singleServerName,
mode: "manual",
});
+ if (result.status !== "created") {
+ throw new Error(
+ `Unexpected ensureAutoEvalSuiteForServer status: ${result.status}`,
+ );
+ }
+
const nextDescription = payload.description ?? "";🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@mcpjam-inspector/client/src/components/EvalsTab.tsx` around lines 413 - 469,
The manual-recreate branch dereferences result.suite._id, result.suite.name and
result.suite.description before verifying result.status; add a defensive guard
after the call to ensureAutoEvalSuiteMutation (the result variable from
ensureAutoEvalSuiteMutation) that checks result.status === "created" (or handles
non-created statuses like "suppressed") and returns or handles the error case
early to avoid accessing result.suite when undefined; keep subsequent calls to
updateTestSuiteMutation, generateAndPersistEvalTests,
setLocallySuppressedServerKeys, toast.success and navigatePlaygroundEvalsRoute
only within the created-success path so result.suite is only used when present.
Preview watchdogThe "preview requested" state has been stuck for ~20446 minutes. Recover: re-run the Watchdog runs every 15 minutes; this comment updates in place when conditions change. |
Backend PR
autoEvalSuiteIdautoEvalSuiteSuppressedAtInspector PR