howard0su · May 11, 2026 · May 12, 2026
diff --git a/dflash/DEVELOPER.md b/dflash/DEVELOPER.md
@@ -227,7 +227,7 @@ dflash/
 │   └── draft/model.safetensors
 ├── scripts/
 │   ├── server.py               # Main OpenAI/Codex server
-│   ├── server_tools.py         # Legacy fork with tool calling (deprecated)
+│   ├── server_tools.py         # Legacy fork kept for reference; server.py is the tool/Codex path
 │   ├── prefix_cache.py         # LRU prefix cache
 │   ├── _prefill_hook.py        # Speculative prefill compression
 │   ├── run.py                  # CLI text generation

diff --git a/dflash/README.md b/dflash/README.md
@@ -119,7 +119,7 @@ allows capacity checks where the draft and a target layer range share one GPU
 before serving integration. `--target-split-dflash` runs the same split target
 placement through a chain DFlash decode loop and reports acceptance length.
 
-**Python flags on `scripts/run.py`, `scripts/server.py`, `scripts/server_tools.py`:**
+**Python flags on `scripts/run.py` and `scripts/server.py` (`scripts/server_tools.py` is legacy):**
 ```bash
 python3 scripts/run.py --ctk q8_0 --ctv q4_0 --prompt "hello"
 python3 scripts/run.py --cache-type-k q8_0 --cache-type-v q4_0 --prompt "hello"