fix(test-studio): return partial result instead of raising when metrics not cached (#358)#368
Merged
Merged
Conversation
…cs not cached (#358) get_test_results() in the test_results_resolver Lambda raised an unhandled ValueError ("Test run ... processing completed, evaluating results") when a run reached a terminal state but the evaluation aggregation never wrote testRunResult — i.e. aggregation is still running, timed out, or failed silently (reproduced on a 3463-document run). The exception surfaced as an opaque error and Test Studio spun on "Loading..." indefinitely. Return a structured partial TestRun (true status, file counts, and metadata; metric fields omitted) instead of raising. The GraphQL TestRun type already makes every metric field nullable, so the partial response is schema-valid. This also prevents a single not-yet-aggregated run from failing an entire compareTestRuns request. The deeper question of why aggregation can stall on very large runs is left as a documented follow-up. Add a regression unit test covering the terminal-status / no-cached- metrics path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #358. The
getTestRunresolver (nested/appsync/src/lambda/test_results_resolver/index.py,get_test_results()) raised an unhandledValueError:when a test run reached a terminal processing state but the evaluation-aggregation step never cached
testRunResult. This happens when aggregation is still running, timed out, or failed silently on a large run (the reporter hit it with 3 463 documents). The exception surfaced as an opaque error and the run spun on "Loading…" forever in Test Studio.Change
Instead of raising when
testRunResultis missing, return a structured partialTestRun— the truestatus, file counts, and metadata, with the metric fields omitted. The GraphQLTestRuntype already declares every metric field nullable (onlytestRunId,status, andfilesCountare non-null), so the partial response is schema-valid and the UI renders the in-progress/terminal state instead of erroring.Side benefit:
compareTestRunscallsget_test_results()per run, so previously a single not-yet-aggregated run would raise and fail the entire comparison. It now contributes a partial entry.Scope / follow-up
This PR addresses the resolver-error half of the issue (the user-facing breakage). The issue's second point — investigating why aggregation can stall on very large runs (possible Lambda 15-min timeout) — is a separate, deeper investigation and is intentionally out of scope here; left as a documented follow-up.
Testing
Added a regression test in
lib/idp_common_pkg/tests/unit/test_results_resolver.pythat drivesget_test_results()with a terminal-status run whose metadata has notestRunResult, asserting it returns a partial dict (true status + counts + metadata) rather than raising.Risk
Low. Replaces a raise with a structured return on a path that was already a hard error; no behavior change on the success (cached-metrics) path.
🤖 Generated with Claude Code