full 500 queries includes ~48% path-signal-less queries
+```
 
-`Block` is the best default: perfect Hit@1 across both models, lowest cost on Sonnet 4.6 (prompt caching cuts cost by ~60%), and fastest latency. `Beam` and `Vertical` are sensitive to model version — `Block` is the most robust choice.
+Output goes to `bench/runs/<timestamp>__<tier>/`: `report.md`, `summary.json`,
+`per_query.jsonl`.
+
+### Document mode — single long document
+
+Compares retriever algorithms (Block / Beam / Vertical / ...) on one
+hierarchical document. Reports time, LLM calls, token usage with prompt
+caching, and USD cost.
+
+```bash
+python bench/run_document_bench.py \
+  --doc examples/large_doc.json \
+  --config bench/queries.json
+```
 
-These numbers are benchmark snapshots, not hard guarantees; exact cost and latency will vary with model choice, provider pricing, prompt-cache behavior, and corpus shape.
+Queries live in the config JSON as `{"queries": ["...", "..."]}`. Swap in
+any `--doc` and any `--config` to benchmark a different document.
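
For readers who want to try the `--config` swap described in the added README text, here is a minimal sketch of generating such a file from Python. It assumes the config needs only the `queries` key shown in the diff; the benchmark script may accept or require other keys that this hunk does not show. The helper name `write_query_config` and the output file name are hypothetical and not part of the repository.

```python
import json
from pathlib import Path


def write_query_config(queries, path):
    """Write a JSON file shaped like {"queries": ["...", "..."]}.

    Hypothetical helper: assumes `queries` is the only key the
    benchmark config needs, as suggested by the README text.
    """
    # Drop empty or whitespace-only entries and trim the rest.
    cleaned = [q.strip() for q in queries if q and q.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty query is required")

    path = Path(path)
    # Create the parent directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = json.dumps({"queries": cleaned}, indent=2, ensure_ascii=False)
    path.write_text(payload + "\n", encoding="utf-8")
    return path
```

A call such as `write_query_config(["first question", "second question"], "bench/my_queries.json")` would produce a file that could then be passed as `--config bench/my_queries.json`, together with whichever `--doc` is being benchmarked.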