-Path-only filesystem retrieval lands gold in top-10 for ~85% of queries that
-carry any path-level signal. The `hard` tier (261 queries, module-stem
-signal only) is in progress and will be added once Anthropic API rate-limit
-retries are wired in. `all` (the path-signal-less ceiling) has not been run.
-
-### Document mode — single long document
-
-Compares retriever algorithms (Block / Beam / Vertical / ...) on one
-hierarchical document. Reports time, LLM calls, token usage with prompt
-caching, and USD cost.
-
-```bash
-python bench/run_document_bench.py \
---doc examples/large_doc.json \
---config bench/queries.json
-```
+`Block` is the best default: perfect Hit@1 across both models, lowest cost on Sonnet 4.6 (prompt caching cuts cost by ~60%), and fastest latency. `Beam` and `Vertical` are sensitive to model version — `Block` is the most robust choice.

-Queries live in the config JSON as `{"queries": ["...", "..."]}`. Swap in
-any `--doc` and any `--config` to benchmark a different document.
+These numbers are benchmark snapshots, not hard guarantees; exact cost and latency will vary with model choice, provider pricing, prompt-cache behavior, and corpus shape.
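
For readers unfamiliar with the metrics quoted in this change ("gold in top-10", "Hit@1"), here is a minimal sketch of how a hit-at-k rate is commonly computed. It assumes the metric means "the gold item appears among the first k ranked results"; the names `hit_at_k` and `hit_rate` and the toy data are hypothetical and are not taken from the repository's bench scripts.

```python
from typing import Sequence


def hit_at_k(ranked: Sequence[str], gold: str, k: int) -> bool:
    """Return True if the gold item is among the first k ranked results."""
    return gold in ranked[:k]


def hit_rate(results: Sequence[tuple[Sequence[str], str]], k: int) -> float:
    """Fraction of (ranked, gold) pairs whose gold item lands in the top k."""
    if not results:
        return 0.0
    hits = sum(hit_at_k(ranked, gold, k) for ranked, gold in results)
    return hits / len(results)


# Toy data (illustrative only): each entry is (ranked results, gold item).
toy = [
    (["a.py", "b.py", "c.py"], "a.py"),  # gold at rank 1
    (["b.py", "a.py", "c.py"], "a.py"),  # gold at rank 2
    (["c.py", "b.py", "d.py"], "a.py"),  # gold missing
    (["x.py", "y.py", "a.py"], "a.py"),  # gold at rank 3
]

hit1 = hit_rate(toy, 1)  # 1 of 4 queries has gold at rank 1
hit3 = hit_rate(toy, 3)  # 3 of 4 queries have gold within the top 3
print(f"Hit@1={hit1:.2f} Hit@3={hit3:.2f}")
```

Under this definition, "perfect Hit@1" would correspond to a rate of 1.0 at k=1, and "~85% in top-10" to a rate near 0.85 at k=10.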