Commit 8fda6b0

Clarify Results Explorer defaults and usage docs
1 parent f8b66aa commit 8fda6b0

2 files changed (+20, -11 lines)

docs/DAYTONA.md

Lines changed: 1 addition & 1 deletion
@@ -128,7 +128,7 @@ runs/staging/{run_name}/
 ```
 
 These artifacts are fully compatible with:
-- **Runs explorer**: `python3 scripts/export_official_results.py`
+- **Runs explorer**: `python3 scripts/export_official_results.py` (defaults to `runs/analysis/`; pass `--runs-dir ./runs/official/` for curated official runs)
 - **IR analysis pipeline**: `python3 scripts/normalize_retrieval_events.py`
 - **QA validation**: `python3 scripts/validate_task_run.py`
 - **Metrics extraction**: `python3 scripts/extract_task_metrics.py`
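
The changed bullet describes a default scan directory that `--runs-dir` can override. A minimal sketch of that behavior, assuming a plain `argparse` setup, is given here; `build_parser` and `resolve_runs_dir` are hypothetical names used for illustration and are not taken from `scripts/export_official_results.py`, whose actual argument handling is not shown in this commit.

```python
import argparse
from pathlib import Path

# Default taken from the doc text above ("defaults to runs/analysis/").
DEFAULT_RUNS_DIR = Path("runs/analysis")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that models only the --runs-dir flag."""
    parser = argparse.ArgumentParser(description="Sketch of run-directory selection")
    parser.add_argument(
        "--runs-dir",
        type=Path,
        default=DEFAULT_RUNS_DIR,
        help="Directory to scan for runs (default: runs/analysis/)",
    )
    return parser


def resolve_runs_dir(argv: list[str]) -> Path:
    """Return the directory that would be scanned for the given arguments."""
    return build_parser().parse_args(argv).runs_dir


if __name__ == "__main__":
    # No flag: falls back to the default directory.
    print(resolve_runs_dir([]))
    # Explicit flag: curated official runs.
    print(resolve_runs_dir(["--runs-dir", "./runs/official/"]))
```

Using `type=Path` normalizes `./runs/official/` and `runs/official` to the same value, which keeps later path comparisons simple.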

docs/OFFICIAL_RESULTS_BROWSER.md

Lines changed: 19 additions & 10 deletions
@@ -1,14 +1,14 @@
 # Official Results Browser
 
-Use this workflow to publish valid official scores with easy-to-view parsed traces.
+Use this workflow to browse scored task results with parsed traces and task metrics.
 
 ## What It Exports
 
-`python3 scripts/export_official_results.py` scans `runs/official/` and exports only valid scored tasks (status `passed`/`failed` with numeric reward) into a static bundle:
+`python3 scripts/export_official_results.py` scans `runs/analysis/` by default and exports only valid scored tasks (status `passed`/`failed` with numeric reward) into a static bundle:
 
 - `docs/official_results/README.md` - run/config score summary
 - `docs/official_results/runs/*.md` - per-run task tables
-- `docs/official_results/tasks/*.md` - per-task metrics and parsed trace/tool summaries
+- `docs/official_results/tasks/*.html` - per-task metrics and parsed trace/tool summaries
 - `docs/official_results/data/official_results.json` - machine-readable data
 - `docs/official_results/audits/*.json` - per-task audit payloads with trace parsing and SHA256 checksums
 - `docs/official_results/traces/*/trajectory.json` - bundled raw trajectory traces
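
The "valid scored tasks" rule in this hunk (status `passed`/`failed` with a numeric reward) can be sketched as a small predicate. This is an illustration under stated assumptions, not the exporter's code: the record keys `status` and `reward` and the function name `is_valid_scored_task` are hypothetical, and rejecting booleans and non-finite values is an extra assumption about what "numeric" means.

```python
import math
from typing import Any, Mapping

# Statuses the doc text names as scored.
SCORED_STATUSES = ("passed", "failed")


def is_valid_scored_task(task: Mapping[str, Any]) -> bool:
    """Return True if a task record looks like a valid scored task.

    Assumes hypothetical keys "status" and "reward" on the record.
    """
    if task.get("status") not in SCORED_STATUSES:
        return False
    reward = task.get("reward")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(reward, bool) or not isinstance(reward, (int, float)):
        return False
    # Assumption: NaN and infinities are not usable scores.
    return math.isfinite(reward)


if __name__ == "__main__":
    records = [
        {"status": "passed", "reward": 1.0},
        {"status": "failed", "reward": 0},
        {"status": "error", "reward": None},
    ]
    print([is_valid_scored_task(r) for r in records])
```

Note that a `failed` task with reward `0` still counts as scored under this reading; only tasks with a missing or non-numeric reward, or with another status, are dropped.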
@@ -31,22 +31,31 @@ artifact-mode configs:
 
 ## Usage
 
-If you promote runs with:
+Default usage (pull from `runs/analysis/`):
 
 ```bash
-python3 scripts/promote_run.py --execute <staging_run_name>
+python3 scripts/export_official_results.py \
+    --output-dir ./docs/official_results/
 ```
 
-`docs/official_results` is refreshed automatically after successful promotion
-and MANIFEST regeneration. Use `--no-export-official-results` to skip that
-step when needed.
+To export curated official runs instead:
 
 ```bash
 python3 scripts/export_official_results.py \
     --runs-dir ./runs/official/ \
     --output-dir ./docs/official_results/
 ```
 
+If you promote runs with:
+
+```bash
+python3 scripts/promote_run.py --execute <staging_run_name>
+```
+
+`docs/official_results` is refreshed automatically after successful promotion
+and MANIFEST regeneration. Use `--no-export-official-results` to skip that
+step when needed.
+
 Filter to specific run(s):
 
 ```bash
@@ -64,5 +73,5 @@ python3 scripts/export_official_results.py --serve
 ## Notes
 
 - The exporter prefers `task_metrics.json` when present and falls back to transcript parsing for tool-call extraction.
-- Task pages link to bundled `audits/*.json` so GitHub viewers can audit without local `runs/official/`.
-- If `runs/official/MANIFEST.json` exists, export is automatically scoped to run directories tracked in the manifest.
+- Task pages link to bundled `audits/*.json` so GitHub viewers can audit without local runs data.
+- If `MANIFEST.json` exists under the selected `--runs-dir`, export is automatically scoped to run directories tracked in the manifest.
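
The last note says export is scoped to manifest-tracked run directories whenever `MANIFEST.json` exists under the selected `--runs-dir`. One way this scoping could work is sketched here. The manifest schema is not shown in this commit, so the top-level `"runs"` key (a list or mapping of run directory names) is purely an assumption, as is the function name `select_run_dirs`.

```python
import json
from pathlib import Path


def select_run_dirs(runs_dir: Path) -> list[Path]:
    """List run directories under runs_dir, scoped by MANIFEST.json if present.

    Assumption: the manifest has a top-level "runs" list or mapping whose
    entries (or keys) are run directory names. The real schema may differ.
    """
    all_runs = sorted(p for p in runs_dir.iterdir() if p.is_dir())
    manifest_path = runs_dir / "MANIFEST.json"
    if not manifest_path.is_file():
        # No manifest: every run directory is a candidate.
        return all_runs
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    # set() over a list yields its items; over a dict it yields its keys.
    tracked = set(manifest.get("runs", []))
    return [p for p in all_runs if p.name in tracked]
```

Because the manifest is looked up relative to `runs_dir`, the same logic applies to `runs/analysis/` and `runs/official/` alike, which matches the wording change in this hunk.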
