2 changes: 1 addition & 1 deletion dflash/DEVELOPER.md
@@ -227,7 +227,7 @@ dflash/
│ └── draft/model.safetensors
├── scripts/
│ ├── server.py # Main OpenAI/Codex server
-│ ├── server_tools.py # Legacy fork with tool calling (deprecated)
+│ ├── server_tools.py # Legacy fork kept for reference; server.py is the tool/Codex path
│ ├── prefix_cache.py # LRU prefix cache
│ ├── _prefill_hook.py # Speculative prefill compression
│ ├── run.py # CLI text generation
2 changes: 1 addition & 1 deletion dflash/README.md
@@ -119,7 +119,7 @@ allows capacity checks where the draft and a target layer range share one GPU
before serving integration. `--target-split-dflash` runs the same split target
placement through a chain DFlash decode loop and reports acceptance length.

-**Python flags on `scripts/run.py`, `scripts/server.py`, `scripts/server_tools.py`:**
+**Python flags on `scripts/run.py` and `scripts/server.py` (`scripts/server_tools.py` is legacy):**
```bash
python3 scripts/run.py --ctk q8_0 --ctv q4_0 --prompt "hello"
python3 scripts/run.py --cache-type-k q8_0 --cache-type-v q4_0 --prompt "hello"