Clarify Results Explorer defaults and usage docs

sjarmak · sjarmak · commit 8fda6b067285 · 2026-03-03T18:53:43.000Z
diff --git a/docs/DAYTONA.md b/docs/DAYTONA.md
@@ -128,7 +128,7 @@ runs/staging/{run_name}/
 ```
 
 These artifacts are fully compatible with:
-- **Runs explorer**: `python3 scripts/export_official_results.py`
+- **Runs explorer**: `python3 scripts/export_official_results.py` (defaults to `runs/analysis/`; pass `--runs-dir ./runs/official/` for curated official runs)
 - **IR analysis pipeline**: `python3 scripts/normalize_retrieval_events.py`
 - **QA validation**: `python3 scripts/validate_task_run.py`
 - **Metrics extraction**: `python3 scripts/extract_task_metrics.py`
diff --git a/docs/OFFICIAL_RESULTS_BROWSER.md b/docs/OFFICIAL_RESULTS_BROWSER.md
@@ -1,14 +1,14 @@
 # Official Results Browser
 
-Use this workflow to publish valid official scores with easy-to-view parsed traces.
+Use this workflow to browse scored task results with parsed traces and task metrics.
 
 ## What It Exports
 
-`python3 scripts/export_official_results.py` scans `runs/official/` and exports only valid scored tasks (status `passed`/`failed` with numeric reward) into a static bundle:
+`python3 scripts/export_official_results.py` scans `runs/analysis/` by default and exports only valid scored tasks (status `passed`/`failed` with numeric reward) into a static bundle:
 
 - `docs/official_results/README.md` - run/config score summary
 - `docs/official_results/runs/*.md` - per-run task tables
-- `docs/official_results/tasks/*.md` - per-task metrics and parsed trace/tool summaries
+- `docs/official_results/tasks/*.html` - per-task metrics and parsed trace/tool summaries
 - `docs/official_results/data/official_results.json` - machine-readable data
 - `docs/official_results/audits/*.json` - per-task audit payloads with trace parsing and SHA256 checksums
 - `docs/official_results/traces/*/trajectory.json` - bundled raw trajectory traces
@@ -31,22 +31,31 @@ artifact-mode configs:
 
 ## Usage
 
-If you promote runs with:
+Default usage (pull from `runs/analysis/`):
 
 ```bash
-python3 scripts/promote_run.py --execute <staging_run_name>
+python3 scripts/export_official_results.py \
+  --output-dir ./docs/official_results/
 ```
 
-`docs/official_results` is refreshed automatically after successful promotion
-and MANIFEST regeneration. Use `--no-export-official-results` to skip that
-step when needed.
+To export curated official runs instead:
 
 ```bash
 python3 scripts/export_official_results.py \
   --runs-dir ./runs/official/ \
   --output-dir ./docs/official_results/
 ```
 
+If you promote runs with:
+
+```bash
+python3 scripts/promote_run.py --execute <staging_run_name>
+```
+
+`docs/official_results` is refreshed automatically after successful promotion
+and MANIFEST regeneration. Use `--no-export-official-results` to skip that
+step when needed.
+
 Filter to specific run(s):
 
 ```bash
@@ -64,5 +73,5 @@ python3 scripts/export_official_results.py --serve
 ## Notes
 
 - The exporter prefers `task_metrics.json` when present and falls back to transcript parsing for tool-call extraction.
-- Task pages link to bundled `audits/*.json` so GitHub viewers can audit without local `runs/official/`.
-- If `runs/official/MANIFEST.json` exists, export is automatically scoped to run directories tracked in the manifest.
+- Task pages link to bundled `audits/*.json` so GitHub viewers can audit results without a local copy of the runs data.
+- If `MANIFEST.json` exists under the selected `--runs-dir`, export is automatically scoped to run directories tracked in the manifest.