
fix(test-studio): return partial result instead of raising when metrics not cached (#358) #368

Merged
rstrahan merged 2 commits into develop from fix/358-test-results-evaluating-state
Jun 22, 2026

Conversation

@rstrahan

Contributor

Summary

Fixes #358. The getTestRun resolver (nested/appsync/src/lambda/test_results_resolver/index.py, get_test_results()) raised an unhandled ValueError:

Test run … processing completed, evaluating results

when a test run reached a terminal processing state but the evaluation-aggregation step never cached testRunResult. This happens when aggregation is still running, has timed out, or has failed silently on a large run (the reporter hit it with 3,463 documents). The exception surfaced as an opaque error, and Test Studio showed "Loading…" for the run indefinitely.

Change

Instead of raising when testRunResult is missing, return a structured partial TestRun — the true status, file counts, and metadata, with the metric fields omitted. The GraphQL TestRun type already declares every metric field nullable (only testRunId, status, and filesCount are non-null), so the partial response is schema-valid and the UI renders the in-progress/terminal state instead of erroring.
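As a rough illustration of the shape of this change (not the actual resolver code), the sketch below builds a response from a metadata dict and only merges metrics when testRunResult is present. The helper name `to_test_run`, the status value `"COMPLETE"`, and the metric name `overallAccuracy` are assumptions made up for the example; only testRunId, status, filesCount, and testRunResult come from the description above.

```python
from typing import Any, Dict


def to_test_run(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: build a TestRun-shaped dict from run metadata."""
    # The three fields the description lists as non-null are always returned.
    result: Dict[str, Any] = {
        "testRunId": metadata["testRunId"],
        "status": metadata["status"],
        "filesCount": metadata["filesCount"],
    }
    cached = metadata.get("testRunResult")
    if cached:
        # Success path: cached metrics are merged in unchanged.
        result.update(cached)
    # Missing metrics: return the partial dict instead of raising.
    return result


# "COMPLETE" and "overallAccuracy" are placeholder values, not real schema names.
partial = to_test_run(
    {"testRunId": "run-1", "status": "COMPLETE", "filesCount": 3463}
)
full = to_test_run(
    {
        "testRunId": "run-2",
        "status": "COMPLETE",
        "filesCount": 10,
        "testRunResult": {"overallAccuracy": 0.97},
    }
)
```

Because the metric keys are simply absent from `partial`, a GraphQL layer with nullable metric fields would resolve them to null rather than reject the response.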

Side benefit: compareTestRuns calls get_test_results() per run, so previously a single not-yet-aggregated run would raise and fail the entire comparison. It now contributes a partial entry.
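The effect on comparisons can be sketched with a toy per-run loop. Everything here is a stand-in: `compare_runs`, `fetch_raising`, `fetch_partial`, the `AGGREGATED` table, the error text, and the metric name are hypothetical and only mimic the call-per-run pattern described above.

```python
from typing import Any, Callable, Dict, List

# Hypothetical store: only run-a has aggregated metrics.
AGGREGATED: Dict[str, Dict[str, Any]] = {"run-a": {"overallAccuracy": 0.9}}


def compare_runs(
    run_ids: List[str], fetch: Callable[[str], Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Fetch each run in turn; any exception aborts the whole comparison."""
    return [fetch(run_id) for run_id in run_ids]


def fetch_raising(run_id: str) -> Dict[str, Any]:
    """Old behavior (stand-in): raise when metrics are not cached."""
    if run_id not in AGGREGATED:
        raise ValueError("metrics not cached")  # placeholder message
    return {"testRunId": run_id, **AGGREGATED[run_id]}


def fetch_partial(run_id: str) -> Dict[str, Any]:
    """New behavior (stand-in): return a partial entry without metrics."""
    return {"testRunId": run_id, **AGGREGATED.get(run_id, {})}


try:
    compare_runs(["run-a", "run-b"], fetch_raising)
    old_failed = False
except ValueError:
    old_failed = True  # one un-aggregated run fails the whole request

new_entries = compare_runs(["run-a", "run-b"], fetch_partial)
```

With the raising fetcher the first run's valid result is lost along with the second; with the partial fetcher both entries come back and the caller can decide how to display the one without metrics.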

Scope / follow-up

This PR addresses the resolver-error half of the issue (the user-facing breakage). The issue's second point — investigating why aggregation can stall on very large runs (possible Lambda 15-min timeout) — is a separate, deeper investigation and is intentionally out of scope here; left as a documented follow-up.

Testing

Added a regression test in lib/idp_common_pkg/tests/unit/test_results_resolver.py that drives get_test_results() with a terminal-status run whose metadata has no testRunResult, asserting it returns a partial dict (true status + counts + metadata) rather than raising.

$ pytest tests/unit/test_results_resolver.py -q
8 passed
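For readers who want the general shape of such a regression test, here is a minimal sketch. It does not reproduce the test added in this PR: `get_results` is a stand-in for the function under test, the injected `load_metadata` callable is an assumed seam for faking the metadata lookup, and `"COMPLETE"` is a placeholder status.

```python
from typing import Any, Callable, Dict
from unittest.mock import Mock


def get_results(
    run_id: str, load_metadata: Callable[[str], Dict[str, Any]]
) -> Dict[str, Any]:
    """Stand-in for the function under test, NOT the real get_test_results()."""
    meta = load_metadata(run_id)
    out = {
        "testRunId": run_id,
        "status": meta["status"],
        "filesCount": meta["filesCount"],
    }
    out.update(meta.get("testRunResult") or {})
    return out


def test_terminal_run_without_cached_metrics_returns_partial() -> None:
    # Arrange: terminal status, but no testRunResult in the metadata.
    loader = Mock(return_value={"status": "COMPLETE", "filesCount": 3463})

    # Act: this must return rather than raise.
    result = get_results("run-1", loader)

    # Assert: true status and counts are present, no metric fields.
    loader.assert_called_once_with("run-1")
    assert result == {
        "testRunId": "run-1",
        "status": "COMPLETE",
        "filesCount": 3463,
    }
```

Asserting on the full dict, rather than only checking that no exception was raised, also guards against metric keys leaking into the partial response.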

Risk

Low. Replaces a raise with a structured return on a path that was already a hard error; no behavior change on the success (cached-metrics) path.

🤖 Generated with Claude Code

rstrahan and others added 2 commits June 22, 2026 16:42
…cs not cached (#358)

get_test_results() in the test_results_resolver Lambda raised an
unhandled ValueError ("Test run ... processing completed, evaluating
results") when a run reached a terminal state but the evaluation
aggregation never wrote testRunResult — i.e. aggregation is still
running, timed out, or failed silently (reproduced on a 3463-document
run). The exception surfaced as an opaque error and Test Studio spun on
"Loading..." indefinitely.

Return a structured partial TestRun (true status, file counts, and
metadata; metric fields omitted) instead of raising. The GraphQL TestRun
type already makes every metric field nullable, so the partial response
is schema-valid. This also prevents a single not-yet-aggregated run from
failing an entire compareTestRuns request.

The deeper question of why aggregation can stall on very large runs is
left as a documented follow-up.

Add a regression unit test covering the terminal-status / no-cached-
metrics path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@rstrahan rstrahan merged commit 548a66a into develop Jun 22, 2026