Commit 94a4dfd
feat(dflash): add NVFP4 per-tensor scale2 support
Add support for NVFP4-quantized GGUF models (e.g. LibertAI Qwen3.6-27B-NVFP4)
by loading per-tensor weight scales and applying them in the target graph.
Scale values are read as host-side floats from the GGUF mmap at load time and
applied via ggml_scale(), a scalar multiply whose factor is fixed on the host at graph-build time, with zero extra
kernel launches. This avoids ggml_mul() with [1]-shaped GPU tensors, which
adds 768 kernel launches per forward pass and causes a ~23x slowdown in batched
DDTree verify mode (1001 ms -> 43 ms per step on RTX 5090).
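
As a minimal sketch of the "host-side float from the mmap" step, the helper below copies one float32 out of a mapped byte range. The function name, `base`, and `offset` are hypothetical and are not taken from this commit; the sketch also assumes a little-endian host, which matches GGUF's default byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the commit): read one float32 scalar from a
// memory-mapped file. `base` stands in for the start of the GGUF mapping and
// `offset` for the byte offset of the scale tensor's data.
// Assumes the file and the host share the same (little-endian) byte order.
inline float read_f32_scalar(const std::uint8_t * base, std::size_t offset) {
    float value;
    // memcpy instead of a pointer cast: the tensor data may not be 4-byte
    // aligned, and casting would also violate strict aliasing.
    std::memcpy(&value, base + offset, sizeof(value));
    return value;
}
```

The returned value is an ordinary `float` on the host, so it can be passed directly as the scalar argument of `ggml_scale()` without allocating a GPU tensor for it.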
Supports both naming conventions:
- LibertAI: blk.N.ffn_gate.scale
- Heretic: blk.N.ffn_gate.weight.scale
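
The two-convention lookup could be sketched roughly as follows. `ScaleTable` and `find_scale2` are illustrative names, not the commit's actual identifiers; the table stands in for whatever structure holds the scale values collected at load time. The 1.0f default for a missing entry follows the commit message.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the scale values gathered at load time,
// keyed by GGUF tensor name.
using ScaleTable = std::unordered_map<std::string, float>;

// Look up the per-tensor scale for a weight prefix such as "blk.3.ffn_gate".
// Tries "<prefix>.scale" (LibertAI) first, then "<prefix>.weight.scale"
// (Heretic). Returns 1.0f when neither name is present.
inline float find_scale2(const ScaleTable & scales, const std::string & prefix) {
    static const char * const suffixes[] = {".scale", ".weight.scale"};
    for (const char * suffix : suffixes) {
        auto it = scales.find(prefix + suffix);
        if (it != scales.end()) {
            return it->second;
        }
    }
    return 1.0f;
}
```

Checking the shorter suffix first is an arbitrary choice in this sketch; a model is expected to use only one of the two conventions.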
Non-NVFP4 models (Q4_K_M etc) are unaffected — scale fields default to 1.0f
and apply_scale2() returns early with zero overhead.
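
The early-return behaviour can be illustrated with a toy model. `ToyGraph` below is not ggml; it only records node names so the effect on graph size is visible. The signature of `apply_scale2()` here is an assumption, since the diff body is not shown; in the real code the non-default branch would presumably call `ggml_scale(ctx, x, scale)`.

```cpp
#include <string>
#include <vector>

// Toy stand-in for a compute graph (NOT ggml): it only records node names.
struct ToyGraph {
    std::vector<std::string> nodes;
    int add(const std::string & name) {
        nodes.push_back(name);
        return static_cast<int>(nodes.size()) - 1;
    }
};

// Sketch of the early return described above. Returns the id of the node
// that downstream ops should consume.
inline int apply_scale2(ToyGraph & g, int node, float scale) {
    if (scale == 1.0f) {
        return node;           // default scale: graph left untouched
    }
    return g.add("scale");     // NVFP4 scale present: record one scale node
}
```

Because the default is exactly 1.0f and is never computed, the exact floating-point comparison is safe in this sketch; a non-NVFP4 model builds the same graph it built before the change.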
Also removes the DFLASH27B_USE_BLACKWELL_CONSUMER_FIX CMake workaround, which
incorrectly assumed consumer Blackwell GPUs (RTX 5090) lack FP4 MMA
instructions. The RTX 5090 fully supports sm_120a and native FP4 tensor cores.
Note: full native FP4 MMA performance requires upstream PR ggml-org#22196 to
be merged into the Luce-Org llama.cpp submodule fork. Without it, NVFP4
models still work correctly via the generic dequant-to-Q8_1 fallback path.
1 parent abdde79 commit 94a4dfd
4 files changed
Lines changed: 98 additions & 40 deletions
Diff bodies and file names were not captured; only the hunk positions survive:

| File | Change | Additions | Deletions |
|---|---|---|---|
| 1 | Old lines 72-99 removed | 0 | 28 |
| 2 | New lines 474-475 and 577-632 added | 58 | 0 |
| 3 | New lines 74-93 added | 20 | 0 |
| 4 | New lines 681-688 added; old lines 689-690, 692, 724, 746-747, 853, 888, 892, 895, 903 and 1134 replaced one-for-one (new lines 697-698, 700, 732, 754-755, 861, 896, 900, 903, 911 and 1142) | 20 | 12 |