revert readme

carsontung666 · carsontung666 · commit 2822a3b0de44 · 2026-04-24T14:38:16.000+08:00
diff --git a/README.md b/README.md
@@ -141,63 +141,29 @@ result = db.query(tree_id, "question", strategy="block", beam_size=3)
 
 ## 📈 Benchmark Snapshot
 
-Two benchmarks live under `bench/`.
+The current filesystem benchmark summary lives in [bench/fs_block_beam_vertical.md](bench/fs_block_beam_vertical.md).
 
-### Filesystem mode — SWEBench-FileTree
+Run setup: `fs_query_order=prefix`, `beam_size=3`, `max_turns=10`, `5` filesystem queries on `context7` only.
 
-Runs on [`AmuroEita/SWEBench-FileTree`](https://huggingface.co/datasets/AmuroEita/SWEBench-FileTree),
-a path-only version of SWE-bench code retrieval:
+### Claude Opus 4.6
 
-- 500 GitHub issues as queries
-- 475 `(repo, commit)` repository snapshots as independent retrieval universes
-- 58,058 file paths; no source code, no file summaries
+| Retriever | Avg Time (s) | Avg LLM Calls | Hit@1 | Hit@10 | Total Cost (USD) |
+|---|---:|---:|---:|---:|---:|
+| **Block** | 8.44 | 2.4 | 1.00 | 1.00 | 0.2166 |
+| **Vertical** | 28.18 | 6.8 | 0.40 | 1.00 | 0.2900 |
+| **Beam** | 18.36 | 4.8 | 0.60 | 1.00 | 0.2091 |
 
-Given an issue and one snapshot's file tree, return the file(s) the fix
-touches. Specification: `notes/condb_swebench_filetree_bench.md`.
+### Claude Sonnet 4.6
 
-```bash
-export ANTHROPIC_API_KEY=sk-ant-...
-python bench/run_swebench_filetree.py --tier medium
-```
-
-Tiers (by retriever difficulty; lower difficulty = more path signal in query):
-
-```
-easy     107 queries   gold path appears in query text (sanity check)
-medium   133 queries   gold filename appears in query    (main report)
-hard     261 queries   gold module stem appears          (fuzzy matching)
-all      500 queries   no filter, includes ~48% path-signal-less queries
-```
-
-Output goes to `bench/runs/<timestamp>__<tier>/`: `report.md`, `summary.json`,
-`per_query.jsonl`.
+| Retriever | Avg Time (s) | Avg LLM Calls | Hit@1 | Hit@10 | Total Cost (USD) |
+|---|---:|---:|---:|---:|---:|
+| **Block** | 8.42 | 3.4 | 1.00 | 1.00 | 0.0643 |
+| **Vertical** | 20.78 | 7.0 | 0.40 | 0.80 | 0.1712 |
+| **Beam** | 17.84 | 4.8 | 0.40 | 1.00 | 0.1335 |
 
-#### Snapshot (Claude Sonnet 4.6, `--strategy auto`, top-k=10)
-
-| tier   | n   | hit@1 | hit@3 | hit@5 | hit@10 |  MRR  | nDCG@10 |
-|--------|-----|-------|-------|-------|--------|-------|---------|
-| easy   | 107 | 0.776 | 0.841 | 0.841 | 0.841  | 0.805 | 0.772   |
-| medium | 133 | 0.797 | 0.850 | 0.850 | 0.850  | 0.821 | 0.787   |
-
-Path-only filesystem retrieval lands gold in top-10 for ~85% of queries that
-carry any path-level signal. The `hard` tier (261 queries, module-stem
-signal only) is in progress and will be added once Anthropic API rate-limit
-retries are wired in. `all` (the path-signal-less ceiling) has not been run.
-
-### Document mode — single long document
-
-Compares retriever algorithms (Block / Beam / Vertical / ...) on one
-hierarchical document. Reports time, LLM calls, token usage with prompt
-caching, and USD cost.
-
-```bash
-python bench/run_document_bench.py \
-  --doc examples/large_doc.json \
-  --config bench/queries.json
-```
+On this 5-query run, `Block` is the best default: it reaches perfect Hit@1 on both models, has the lowest average latency on both, and is the cheapest retriever on Sonnet 4.6 (prompt caching cuts cost by ~60%). `Beam` and `Vertical` are sensitive to model version, so `Block` is the most robust choice.
 
-Queries live in the config JSON as `{"queries": ["...", "..."]}`. Swap in
-any `--doc` and any `--config` to benchmark a different document.
+These numbers are benchmark snapshots, not hard guarantees; exact cost and latency will vary with model choice, provider pricing, prompt-cache behavior, and corpus shape.
 
 ---
 
diff --git a/bench/run_swebench_filetree.py b/bench/run_swebench_filetree.py
@@ -30,7 +30,7 @@
 from contextdb.adapter.filesystem import FileSystemAdapter
 from contextdb.api.condb import ConDB
 
-DEFAULT_MODEL = "claude-sonnet-4-6"
+DEFAULT_MODEL = "claude-haiku-4-5"
 DEFAULT_DATA_DIR = Path("data/swebench_pathonly")
 K_VALUES = (1, 3, 5, 10)