Commit b8edd00
v0.9.1: DeltaNet NEON optimization + cached Q8 + fast_exp
4 optimizations applied to non-matmul overhead:
A. NEON DeltaNet: fused decay+sk, outer product+output (2 passes vs 3)
B. Batched conv1d+SiLU: 4 channels/NEON, unrolled conv_width=4
C. Cached Q8 quantization: ~90 redundant quantizations eliminated/token
D. fast_expf(): Schraudolph's algorithm for sigmoid/softplus/SiLU/decay
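For readers unfamiliar with item D, the following is a minimal sketch of a Schraudolph-style exponential for 32-bit floats. It is not the code from this commit: the constants, the clamping range, and the helper name `fast_sigmoidf` are assumptions chosen for illustration. The idea is to scale `x` by 2^23 / ln(2), add the exponent bias, and reinterpret the resulting integer as a float, which trades a few percent of relative error for a much cheaper operation.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>   /* memcpy */

static_assert(sizeof(float) == sizeof(int32_t),
              "sketch assumes a 32-bit IEEE-754 float");

/* Approximate exp(x) by building the float's bit pattern directly.
 * Assumed constants (illustrative, not taken from the commit):
 *   scale = 2^23 / ln(2), which moves x / ln(2) into the exponent field
 *   bias  = 127 * 2^23 - 486411, the exponent bias minus a commonly
 *           quoted shift that balances the approximation error
 * NaN input is not handled. */
static inline float fast_expf(float x) {
    /* Clamp so the integer below stays a positive, finite bit pattern. */
    if (x < -87.0f) x = -87.0f;
    if (x >  88.0f) x =  88.0f;

    const float   scale = 12102203.0f;
    const int32_t bias  = 1064866805;

    int32_t bits = (int32_t)(scale * x) + bias;

    float y;
    memcpy(&y, &bits, sizeof y);   /* well-defined type pun */
    return y;
}

/* Hypothetical helper: sigmoid built on the approximation above. */
static inline float fast_sigmoidf(float x) {
    return 1.0f / (1.0f + fast_expf(-x));
}
```

The relative error of this variant stays within a few percent, which is usually tolerable for gating functions such as sigmoid or SiLU but should be checked against the model's own correctness tests.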
Honest speed assessment:
Actual throughput: ~16 tok/s (50 tokens, including model loading)
Previous "38 tok/s" figure excluded model load time; corrected here
DeltaNet optimizations show a modest improvement in the profiler, but
wall-clock time is dominated by model loading (~5s)
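The gap between a decode-only figure and an end-to-end figure can be made concrete with a small sketch. The function names and the numbers in the usage note are hypothetical and are not measurements from this commit; the point is only that a fixed load time enters the denominator of the end-to-end rate.

```c
/* Throughput over the decode phase only (load time excluded). */
static inline double decode_tok_per_s(int n_tokens, double decode_s) {
    return decode_s > 0.0 ? (double)n_tokens / decode_s : 0.0;
}

/* Throughput over the whole run: model load plus decode. */
static inline double end_to_end_tok_per_s(int n_tokens,
                                          double load_s,
                                          double decode_s) {
    double total_s = load_s + decode_s;
    return total_s > 0.0 ? (double)n_tokens / total_s : 0.0;
}
```

With made-up inputs of 50 tokens, 1.25 s of decoding, and 5.0 s of loading, the decode-only rate is 40 tok/s while the end-to-end rate is 8 tok/s, so short runs mostly measure load time.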
19/19 tests pass. Correctness verified: "France = Paris"
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent f8f286d commit b8edd00
4 files changed
Lines changed: 423 additions & 97 deletions