Commit 50b2998

Add ResearchClawBench example redesign spec
1 parent fabeb4b commit 50b2998

1 file changed

Lines changed: 110 additions & 0 deletions

@@ -0,0 +1,110 @@
# ResearchClawBench Math_000 Example Redesign

## Goal

Make `ResearchClawBench Math_000` the first example on the examples page and rewrite its presentation so a first-time visitor can quickly understand:

- what the task is
- what the agent is required to do
- what the expected output is
- how much the community-guided run improved over the no-community baseline

The current issue is not lack of content. It is hierarchy. The hero uses too much headline text, the benchmark image is doing too much explanation work, and the key user-facing takeaway is visually delayed.

## Scope

Only redesign the `ResearchClawBench Math_000` example and the example ordering in `Choosing An Example`.

Keep these unchanged:

- `Parameter Golf` content structure and copy
- the existence of the benchmark diagram asset
- the three follow-on sections:
  - `1. What the user types`
  - `2. Before Vs After Community`
  - the measured-improvement section after that

## Information Architecture

### Example Ordering

In `Choose An Example`, move `ResearchClawBench Math_000` to the first position and keep `Parameter Golf` second.
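
The reordering can be sketched as a small list operation. This is a minimal illustration only: the `examples` list and the `move_to_front` helper are hypothetical and do not reflect how the site actually stores its example selector.

```python
def move_to_front(items, target):
    """Return a new list with `target` first and all other items in their original order."""
    if target not in items:
        raise ValueError(f"{target!r} is not in the example list")
    return [target] + [item for item in items if item != target]


# Hypothetical current order of the example selector (assumed, not taken from the site).
examples = ["Parameter Golf", "ResearchClawBench Math_000"]

reordered = move_to_front(examples, "ResearchClawBench Math_000")
print(reordered)
```

Keeping the remaining items in their original relative order is what leaves `Parameter Golf` in second position.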

### Top Section Direction

Use a dashboard-first layout for the `Math_000` hero instead of a poster-style editorial block.

The first screen should prioritize compact, scannable information blocks over one oversized sentence. The diagram remains present, but only as a supporting explainer card, not the main visual anchor.

### Required First-Screen Content

The top section should communicate these four things immediately:

1. `What this is`
   `ResearchClawBench Math_000` is a crowded multi-object tracking benchmark.
2. `Task requirement`
   The agent must convert frame-level detections into stable trajectories under occlusion and low-score conditions, then run the official evaluation workflow.
3. `Output`
   The deliverable is recovered tracks plus the official benchmark score and analysis outputs.
4. `Measured lift`
   The community-guided run improved the official score from `26.65` to `31.9`, a `+5.25` lift.
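
The lift figure for the score card can be checked with a few lines of arithmetic. The sketch below is illustrative: `score_lift` is a hypothetical helper, and the relative percentage is derived here rather than stated in this spec. `Decimal` is used so the difference comes out as exactly `5.25` instead of a binary floating-point approximation.

```python
from decimal import Decimal


def score_lift(baseline: str, guided: str) -> tuple[Decimal, Decimal]:
    """Return (absolute lift, relative lift in percent rounded to one decimal place)."""
    base = Decimal(baseline)
    new = Decimal(guided)
    absolute = new - base
    relative_pct = (absolute / base * 100).quantize(Decimal("0.1"))
    return absolute, relative_pct


# Scores quoted in the spec: no-community baseline and community-guided run.
absolute, relative_pct = score_lift("26.65", "31.9")
print(f"+{absolute} points ({relative_pct}% over the baseline)")
```

If the card shows only one number, the absolute `+5.25` matches the wording used throughout this spec.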

## Layout Plan

### Hero

Use a dense dashboard composition with:

- a compact title block
- a prominent score-lift card
- a small benchmark diagram card
- short task-definition cards for input, requirement, output, and benchmark type

Avoid long paragraph-first composition. The text should read like a task brief, not a manifesto.

### Section Order

For the `Math_000` example content, keep this order:

1. top dashboard hero with score lift and task explanation
2. `1. What the user types`
3. `2. Before Vs After Community`
4. measured-improvement explanation section

This preserves the existing page logic while making the benchmark understandable before the table and prompt details.

## Copy Direction

Use plain language. Avoid assuming the reader knows what `Math_000`, `SparseTrack`, `DCM`, or `pseudo-depth` mean.

The copy should explain the task at the level of:

- input: frames plus detections
- work: link detections into object tracks across time
- output: final trajectories and official scoring result

The benchmark diagram caption should explain what the reader is looking at, not repeat abstract benchmark language.
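
For copywriters who want a concrete picture of "link detections into object tracks across time", the toy sketch below may help. It is not the benchmark's method and not `SparseTrack`: it is a deliberately simple greedy matcher based on box overlap (IoU), with no handling of occlusion or low-score detections. All names (`Box`, `Track`, `iou`, `link_detections`) and the `min_iou` threshold are assumptions made for illustration.

```python
from dataclasses import dataclass, field

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    track_id: int
    boxes: list = field(default_factory=list)  # list of (frame_index, box) pairs


def link_detections(frames: list[list[Box]], min_iou: float = 0.3) -> list[Track]:
    """Greedily extend each active track with its best-overlapping detection per frame."""
    tracks: list[Track] = []
    active: list[Track] = []  # tracks that were matched in the previous frame
    for frame_index, detections in enumerate(frames):
        unmatched = list(detections)
        next_active: list[Track] = []
        for track in active:
            if not unmatched:
                break
            last_box = track.boxes[-1][1]
            best = max(unmatched, key=lambda box: iou(last_box, box))
            if iou(last_box, best) >= min_iou:
                track.boxes.append((frame_index, best))
                unmatched.remove(best)
                next_active.append(track)
        # Any detection left over starts a new track.
        for box in unmatched:
            track = Track(track_id=len(tracks), boxes=[(frame_index, box)])
            tracks.append(track)
            next_active.append(track)
        active = next_active  # tracks missed in this frame are simply dropped
    return tracks


# Input: three frames of detections. Output: one trajectory per object.
frames = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(1, 0, 11, 10), (51, 50, 61, 60)],
    [(2, 0, 12, 10)],
]
tracks = link_detections(frames)
for track in tracks:
    print(track.track_id, [frame for frame, _ in track.boxes])
```

The example maps onto the three copy points: `frames` is the input, `link_detections` is the work, and the returned `tracks` are the trajectories. The official scoring step is outside this sketch.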

## Visual Direction

- Keep the existing site look and component language.
- Reduce giant headline text.
- Increase scannability with compact cards and stronger value grouping.
- Make the score lift the clearest visual fact on the page.
- Let the diagram support comprehension instead of dominating the section.

## Risks And Constraints

- A dashboard-heavy layout can become too dense if every fact is treated equally.
- The score lift must stay visually strong without making the benchmark explanation feel secondary.
- The diagram card must stay useful even when it is reduced in prominence.

## Verification

After implementation:

- `ResearchClawBench Math_000` appears first in the example selector
- the first screen explains task, requirement, and output clearly
- the benchmark image is visually secondary to the score/task summary
- the page still builds successfully
- the examples page remains readable on desktop and mobile
