Commit d3e02cd

unamedkr and claude committed
tq_chat.py: native C engine backend — 15.6 tok/s (was 6 tok/s PyTorch)
Redesigned tq_chat.py to use the tq_run C engine by default:

- Auto-detects model/tokenizer in the HuggingFace cache
- Calls tq_run as a subprocess and parses its streaming output
- Displays "Native C Inference Engine" in the header
- Shows tok/s, threads, and KV type in the KV analysis
- Falls back to PyTorch if tq_run is not built (--engine pytorch)

Speed: 15.6 tok/s (native) vs 6.0 tok/s (PyTorch MPS) = 2.6x faster.
No Python dependencies needed for native mode.

CLI integration: tq demo now routes to the native engine by default.
Fixed the model path glob for the safetensors-00001-of-00001 variant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
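Per the commit message, tq_chat.py invokes tq_run as a subprocess and parses its streaming output. A minimal sketch of that subprocess-streaming pattern, assuming line-oriented stdout; the tq_run flags shown in the comment are hypothetical, not taken from this commit:

```python
import subprocess
from typing import Iterator, List

def stream_engine(cmd: List[str]) -> Iterator[str]:
    """Run an inference engine as a subprocess and yield its stdout line by line."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    assert proc.stdout is not None
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        proc.stdout.close()
        proc.wait()

# Hypothetical invocation; the real tq_run CLI may differ:
# for chunk in stream_engine(["./tq_run", "-m", model_path, "-p", prompt]):
#     print(chunk, end="", flush=True)
```

Streaming line by line (rather than waiting on `communicate()`) is what lets the chat UI display tokens as the C engine produces them.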
1 parent 26dfeab commit d3e02cd

4 files changed

Lines changed: 528 additions & 148 deletions

File tree

docs/assets/hero.png      (117 KB)
docs/assets/hero_v06.png  (1020 KB)
tools/tq

Lines changed: 9 additions & 3 deletions
@@ -230,7 +230,7 @@ commands:
   bench              Run performance benchmark
   +memory MODEL CTX  Calculate memory savings
   +compare           Run A/B comparison (requires build)
-  demo               Interactive chat with Qwen3.5-0.8B
+  demo               Chat with Qwen3.5-0.8B (native C engine)
 
 examples:
   tq info
@@ -239,6 +239,7 @@ examples:
   tq +memory llama-3.2-3b 65536
   tq +memory qwen3.5-0.8b 131072 --json
   tq demo "What is quantization?"
+  tq demo --engine pytorch "What is quantization?"
 """)
     parser.add_argument("--json", dest="json_output", action="store_true", help="JSON output (for AI agents)")
     sub = parser.add_subparsers(dest="command")
@@ -263,8 +264,10 @@ examples:
     sub.add_parser("+compare", help="Run A/B comparison")
 
     # demo
-    p_demo = sub.add_parser("demo", help="Chat with Qwen3.5-0.8B")
+    p_demo = sub.add_parser("demo", help="Chat with Qwen3.5-0.8B (native C engine)")
     p_demo.add_argument("question", nargs="?", help="Question (interactive if omitted)")
+    p_demo.add_argument("--engine", choices=["native", "pytorch"], default="native",
+                        help="Inference engine: native (tq_run, default) or pytorch")
 
     args = parser.parse_args()
 
@@ -281,9 +284,12 @@ examples:
     elif args.command == "+compare":
         return cmd_compare(args)
     elif args.command == "demo":
+        demo_args = ["--engine", args.engine]
+        if args.question:
+            demo_args.append(args.question)
         os.execvp(sys.executable, [sys.executable,
                   os.path.join(os.path.dirname(__file__), "tq_chat.py"),
-                  *([] if not args.question else [args.question])])
+                  *demo_args])
         return EXIT_OK
 
 
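The commit message also mentions fixing the model path glob for the single-shard safetensors-00001-of-00001 variant (that change is in tq_chat.py, not shown above). A hedged sketch of what such shard detection might look like; the helper name and the fallback order are assumptions:

```python
from pathlib import Path
from typing import List

def find_safetensors(model_dir: str) -> List[Path]:
    """Return safetensors weight files in a model directory.

    Prefers sharded names like model-00001-of-00001.safetensors
    (HuggingFace's naming even for single-shard exports), falling
    back to any *.safetensors file.
    """
    root = Path(model_dir)
    shards = sorted(root.glob("model-*-of-*.safetensors"))
    if shards:
        return shards
    return sorted(root.glob("*.safetensors"))
```

A plain `model.safetensors` glob misses the `model-00001-of-00001.safetensors` naming that HuggingFace uses even when a model fits in one shard, which is the kind of variant the commit says it fixed.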
