Commit 6a98465
feat(dflash): add NVFP4 per-tensor scale2 support
Add support for NVFP4-quantized GGUF models (e.g. LibertAI Qwen3.6-27B-NVFP4)
by loading per-tensor weight scales and applying them in the target graph.
Scale values are read as host-side floats from the GGUF mmap at load time and
applied via ggml_scale(), a scalar multiply whose factor is fixed when the
graph is built, with zero extra kernel launches. This avoids ggml_mul() with
[1]-shaped GPU tensors, which adds 768 kernel launches per forward pass and
causes ~30x overhead in batched DDTree verify mode (1001ms with ggml_mul() ->
43ms with ggml_scale() per step on RTX 5090).
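As a rough illustration of the load-time read described above, the sketch below copies one F32 value out of a mapped byte region. The helper name `read_host_f32`, its parameters, and the fallback of 1.0f on an out-of-range offset are assumptions for illustration, not the loader's actual code; it also assumes the host byte order matches the file's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the commit): read one F32 from a mapped file.
// `base` and `size` describe the mapping; `offset` is where the scalar lives.
// Assumes host byte order matches the file. Returns `fallback` if the read
// would run past the end of the mapping.
inline float read_host_f32(const std::uint8_t * base, std::size_t size,
                           std::size_t offset, float fallback = 1.0f) {
    assert(base != nullptr);
    if (offset > size || size - offset < sizeof(float)) {
        return fallback;
    }
    float v;
    // memcpy sidesteps unaligned-load and strict-aliasing problems.
    std::memcpy(&v, base + offset, sizeof(float));
    return v;
}
```

Because the value ends up in an ordinary host `float`, it can be passed straight to a graph-building call that takes a scalar argument, with no device-side tensor involved.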
Supports both naming conventions:
- LibertAI: blk.N.ffn_gate.scale
- Heretic: blk.N.ffn_gate.weight.scale
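One way to resolve both naming conventions is to try each suffix in turn and fall back to 1.0f when neither is present. This is a minimal sketch, not the commit's code: `ScaleTable` is a stand-in for the loader's tensor lookup, and `find_scale2` is a hypothetical helper name.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <unordered_map>

// Stand-in for the loader's tensor table (assumption): tensor name -> scalar.
using ScaleTable = std::unordered_map<std::string, float>;

// Hypothetical helper: look up the per-tensor scale for a projection such as
// "ffn_gate" in layer `layer`, accepting either naming convention.
inline float find_scale2(const ScaleTable & table, int layer,
                         const std::string & proj) {
    assert(layer >= 0);
    const std::string prefix = "blk." + std::to_string(layer) + "." + proj;
    // LibertAI style first, then Heretic style.
    for (const char * suffix : {".scale", ".weight.scale"}) {
        auto it = table.find(prefix + suffix);
        if (it != table.end()) {
            return it->second;
        }
    }
    return 1.0f; // no scale tensor: behave as a non-NVFP4 model
}
```

Checking the shorter name first is an arbitrary choice here; a file is expected to use only one of the two conventions.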
Non-NVFP4 models (Q4_K_M etc) are unaffected — scale fields default to 1.0f
and apply_scale2() returns early with zero overhead.
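The early return can be sketched with a toy graph type. The `Node` and `Graph` structs below are stand-ins, not the ggml API, and the `apply_scale2` signature is assumed; in the real graph the non-trivial branch would presumably call `ggml_scale(ctx, x, scale2)`.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Minimal stand-in for a compute graph node; NOT the ggml API.
struct Node {
    const Node * src;   // input node, nullptr for leaves
    float        scale; // constant factor stored on the node
};

struct Graph {
    std::vector<std::unique_ptr<Node>> nodes;

    Node * add(const Node * src, float scale) {
        nodes.push_back(std::make_unique<Node>(Node{src, scale}));
        return nodes.back().get();
    }
};

// Sketch of the early return: a scale of exactly 1.0f adds no node, so
// models without scale tensors get an unchanged graph.
inline Node * apply_scale2(Graph & g, Node * x, float scale2) {
    assert(x != nullptr);
    if (scale2 == 1.0f) {
        return x;
    }
    return g.add(x, scale2); // with ggml: ggml_scale(ctx, x, scale2)
}
```

Comparing against exactly 1.0f is safe in this sketch because the default is assigned as a literal rather than computed.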
Also removes the DFLASH27B_USE_BLACKWELL_CONSUMER_FIX CMake workaround, which
incorrectly assumed consumer Blackwell GPUs (RTX 5090) lack FP4 MMA
instructions. The RTX 5090 fully supports sm_120a and native FP4 tensor cores.
Note: full native FP4 MMA performance requires upstream PR ggml-org#22196 to
be merged into the Luce-Org llama.cpp submodule fork. Without it, NVFP4
models still work correctly via the generic dequant-to-Q8_1 fallback path.
1 parent 230ff17 commit 6a98465
3 files changed
Lines changed: 99 additions & 12 deletions