Commit d3e02cd
tq_chat.py: native C engine backend — 15.6 tok/s (was 6 tok/s PyTorch)
Redesigned tq_chat.py to use tq_run C engine by default:
- Auto-detects model/tokenizer in HuggingFace cache
- Calls tq_run as subprocess, parses streaming output
- Displays "Native C Inference Engine" in header
- Shows tok/s, threads, kv type in KV analysis
- Falls back to PyTorch if tq_run not built (--engine pytorch)
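The subprocess-and-parse flow above can be sketched roughly as follows. This is a hedged illustration, not the commit's actual code: the `tq_run` flags (`--model`, `--prompt`) and the assumed final stats line (e.g. "15.6 tok/s") are hypothetical, and `parse_stream` just separates generated text from that summary line.

```python
import re
import shutil
import subprocess

# Hypothetical stats format: the engine is assumed to end its output
# with a line containing something like "15.6 tok/s".
STATS_RE = re.compile(r"([\d.]+)\s*tok/s")

def parse_stream(lines):
    """Split streamed engine output into generated text and a tok/s figure."""
    text_parts, tok_s = [], None
    for line in lines:
        m = STATS_RE.search(line)
        if m:
            tok_s = float(m.group(1))  # summary line: keep the number, drop the line
        else:
            text_parts.append(line)
    return "".join(text_parts), tok_s

def run_native(prompt, model_path, engine="tq_run"):
    """Invoke the native engine as a subprocess.

    Returns None when the binary is not on PATH, so the caller can fall
    back to the PyTorch path (mirroring --engine pytorch above).
    """
    if shutil.which(engine) is None:
        return None
    proc = subprocess.run(
        [engine, "--model", str(model_path), "--prompt", prompt],
        capture_output=True, text=True, check=True,
    )
    return parse_stream(proc.stdout.splitlines(keepends=True))
```

The PATH check keeps native mode free of hard dependencies: if the C binary was never built, the chat loop degrades to the slower backend instead of crashing.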
Speed: 15.6 tok/s (native) vs 6.0 tok/s (PyTorch MPS) = 2.6x faster
No Python dependencies needed for native mode.
CLI integration: tq demo now routes to native engine by default
Fixed model path glob for safetensors-00001-of-00001 variant
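A loose sketch of the cache lookup the glob fix concerns, assuming the standard HuggingFace hub cache layout (`models--<org>--<repo>/snapshots/<rev>/`); the function name and default paths are illustrative, not the commit's code. The point is that the wildcard matches both a plain `model.safetensors` and sharded names, including the single-shard `-00001-of-00001` variant that a stricter pattern would miss.

```python
from pathlib import Path

def find_model_weights(cache_dir="~/.cache/huggingface/hub", repo_hint="*"):
    """Locate safetensors weight files in the HuggingFace cache.

    `*.safetensors` matches single-file and sharded layouts alike,
    e.g. 'model.safetensors' or 'model-00001-of-00001.safetensors'.
    """
    root = Path(cache_dir).expanduser()
    pattern = f"models--{repo_hint}/snapshots/*/*.safetensors"
    return sorted(root.glob(pattern))
```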
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parent: 26dfeab
4 files changed
Lines changed: 528 additions & 148 deletions