| 1 | +# ResearchClawBench Math_000 Example Redesign |
| 2 | + |
| 3 | +## Goal |
| 4 | + |
| 5 | +Make `ResearchClawBench Math_000` the first example on the examples page and rewrite its presentation so a first-time visitor can quickly understand: |
| 6 | + |
| 7 | +- what the task is |
| 8 | +- what the agent is required to do |
| 9 | +- what the expected output is |
| 10 | +- how much the community-guided run improved over the no-community baseline |
| 11 | + |
| 12 | +The current issue is not a lack of content but of hierarchy: the hero uses too much headline text, the benchmark image carries too much of the explanation, and the key user-facing takeaway appears too late on the page. |
| 13 | + |
| 14 | +## Scope |
| 15 | + |
| 16 | +Only redesign the `ResearchClawBench Math_000` example and the example ordering in `Choose An Example`. |
| 17 | + |
| 18 | +Keep these unchanged: |
| 19 | + |
| 20 | +- `Parameter Golf` content structure and copy |
| 21 | +- the existence of the benchmark diagram asset |
| 22 | +- the three follow-on sections: |
| 23 | + - `1. What the user types` |
| 24 | + - `2. Before Vs After Community` |
| 25 | + - the measured-improvement section after that |
| 26 | + |
| 27 | +## Information Architecture |
| 28 | + |
| 29 | +### Example Ordering |
| 30 | + |
| 31 | +In `Choose An Example`, move `ResearchClawBench Math_000` to the first position and keep `Parameter Golf` second. |
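The reordering can be pictured as a move-to-front on the list that feeds the selector. This is only an illustrative sketch: the spec does not say how the examples are stored, so the `examples` list and the `move_to_front` helper are hypothetical names, and the real site may use a different language or data shape.

```python
def move_to_front(examples: list[str], name: str) -> list[str]:
    """Return a new list with `name` first and the other entries in their original order."""
    if name not in examples:
        raise ValueError(f"unknown example: {name}")
    return [name] + [e for e in examples if e != name]


# Hypothetical current ordering of the example selector.
examples = ["Parameter Golf", "ResearchClawBench Math_000"]
ordered = move_to_front(examples, "ResearchClawBench Math_000")
```

Keeping the rest of the list in its original order means `Parameter Golf` stays directly after the promoted entry, as the spec requires.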
| 32 | + |
| 33 | +### Top Section Direction |
| 34 | + |
| 35 | +Use a dashboard-first layout for the `Math_000` hero instead of a poster-style editorial block. |
| 36 | + |
| 37 | +The first screen should prioritize compact, scannable information blocks over one oversized sentence. The diagram remains present, but only as a supporting explainer card, not the main visual anchor. |
| 38 | + |
| 39 | +### Required First-Screen Content |
| 40 | + |
| 41 | +The top section should communicate these four things immediately: |
| 42 | + |
| 43 | +1. `What this is` |
| 44 | +   `ResearchClawBench Math_000` is a multi-object tracking benchmark for crowded scenes. |
| 45 | +2. `Task requirement` |
| 46 | + The agent must convert frame-level detections into stable trajectories under occlusion and low-score conditions, then run the official evaluation workflow. |
| 47 | +3. `Output` |
| 48 | + The deliverable is recovered tracks plus the official benchmark score and analysis outputs. |
| 49 | +4. `Measured lift` |
| 50 | + The community-guided run improved the official score from `26.65` to `31.9`, a `+5.25` lift. |
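The lift figure can be derived from the two scores rather than hard-coded, which keeps the card consistent if either number is updated. The following is a minimal sketch; the variable names and the card text format are assumptions for illustration, not part of the existing page.

```python
# Scores quoted in this spec.
baseline_score = 26.65   # no-community baseline
community_score = 31.9   # community-guided run

# Round to two decimals to avoid floating-point noise in the displayed value.
lift = round(community_score - baseline_score, 2)

# Relative improvement over the baseline, in percent (derived, not quoted in the spec).
relative_lift_pct = round(100 * lift / baseline_score, 1)

# Hypothetical text for the score-lift card.
card_text = f"{baseline_score:.2f} -> {community_score:.2f} ({lift:+.2f})"
```

The relative percentage is included only as a derived convenience; the spec itself quotes the absolute `+5.25` lift.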
| 51 | + |
| 52 | +## Layout Plan |
| 53 | + |
| 54 | +### Hero |
| 55 | + |
| 56 | +Use a dense dashboard composition with: |
| 57 | + |
| 58 | +- a compact title block |
| 59 | +- a prominent score-lift card |
| 60 | +- a small benchmark diagram card |
| 61 | +- short task-definition cards for input, requirement, output, and benchmark type |
| 62 | + |
| 63 | +Avoid long paragraph-first composition. The text should read like a task brief, not a manifesto. |
| 64 | + |
| 65 | +### Section Order |
| 66 | + |
| 67 | +For the `Math_000` example content, keep this order: |
| 68 | + |
| 69 | +1. top dashboard hero with score lift and task explanation |
| 70 | +2. `1. What the user types` |
| 71 | +3. `2. Before Vs After Community` |
| 72 | +4. measured-improvement explanation section |
| 73 | + |
| 74 | +This preserves the existing page logic while making the benchmark understandable before the table and prompt details. |
| 75 | + |
| 76 | +## Copy Direction |
| 77 | + |
| 78 | +Use plain language. Avoid assuming the reader knows what `Math_000`, `SparseTrack`, `DCM`, or `pseudo-depth` mean. |
| 79 | + |
| 80 | +The copy should explain the task at the level of: |
| 81 | + |
| 82 | +- input: frames plus detections |
| 83 | +- work: link detections into object tracks across time |
| 84 | +- output: final trajectories and official scoring result |
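To make "link detections into object tracks across time" concrete for copywriting, the toy sketch below links boxes between consecutive frames by greedy IoU matching. It is an assumption-laden illustration only: it is not the benchmark's evaluation workflow or the method used in either run, it ignores detection scores, and it drops a track as soon as it misses one frame, so it does not handle occlusion. All names (`iou`, `link_detections`, `min_iou`) are hypothetical.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_detections(frames: list[list[Box]], min_iou: float = 0.3) -> dict[int, list[tuple[int, Box]]]:
    """Greedily link per-frame boxes into tracks: track_id -> [(frame_index, box), ...]."""
    tracks: dict[int, list[tuple[int, Box]]] = {}
    last: dict[int, Box] = {}  # boxes of tracks matched in the previous frame
    next_id = 0
    for t, boxes in enumerate(frames):
        # Score every (active track, detection) pair, best overlap first.
        pairs = sorted(
            ((iou(prev, box), tid, j) for tid, prev in last.items() for j, box in enumerate(boxes)),
            reverse=True,
        )
        used_tracks: set[int] = set()
        used_boxes: set[int] = set()
        new_last: dict[int, Box] = {}
        for score, tid, j in pairs:
            if score < min_iou:
                break  # remaining pairs overlap too little to be the same object
            if tid in used_tracks or j in used_boxes:
                continue
            used_tracks.add(tid)
            used_boxes.add(j)
            tracks[tid].append((t, boxes[j]))
            new_last[tid] = boxes[j]
        # Unmatched detections start new tracks.
        for j, box in enumerate(boxes):
            if j not in used_boxes:
                tracks[next_id] = [(t, box)]
                new_last[next_id] = box
                next_id += 1
        last = new_last
    return tracks


# Input: frames plus detections. Output: trajectories keyed by track id.
frames = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(1, 0, 11, 10), (51, 50, 61, 60)],
    [(2, 0, 12, 10)],
]
tracks = link_detections(frames)
```

In this toy input, two objects are detected in the first two frames and only one in the third, so the result is two tracks of different lengths. The hard part of the benchmark, keeping identities stable through occlusion and low-score detections, is exactly what this sketch leaves out.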
| 85 | + |
| 86 | +The benchmark diagram caption should explain what the reader is looking at, not repeat abstract benchmark language. |
| 87 | + |
| 88 | +## Visual Direction |
| 89 | + |
| 90 | +- Keep the existing site look and component language. |
| 91 | +- Reduce giant headline text. |
| 92 | +- Increase scannability with compact cards and stronger value grouping. |
| 93 | +- Make the score lift the clearest visual fact on the page. |
| 94 | +- Let the diagram support comprehension instead of dominating the section. |
| 95 | + |
| 96 | +## Risks And Constraints |
| 97 | + |
| 98 | +- A dashboard-heavy layout can become too dense if every fact is treated equally. |
| 99 | +- The score lift must stay visually strong without making the benchmark explanation feel secondary. |
| 100 | +- The diagram card must stay useful even when it is reduced in prominence. |
| 101 | + |
| 102 | +## Verification |
| 103 | + |
| 104 | +After implementation: |
| 105 | + |
| 106 | +- `ResearchClawBench Math_000` appears first in the example selector |
| 107 | +- the first screen clearly explains what the benchmark is, the task requirement, the output, and the measured lift |
| 108 | +- the benchmark image is visually secondary to the score/task summary |
| 109 | +- the page still builds successfully |
| 110 | +- the examples page remains readable on desktop and mobile |