Commit 4160a09

abrichr and claude committed
docs: update benchmark viewer GIF with multi-task eval results
Replace the old single-task (0% success) GIF with a new animation showing the phase0_multi_domain_v3 evaluation (5 tasks, 2 pass, 3 fail, 40% success rate). The new GIF cycles through the overview, task selection, and step-by-step screenshot replay for both passing and failing tasks across different Windows application domains.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent ffcb41d commit 4160a09

1 file changed

animations/benchmark-viewer.gif

-4.95 KB