Commit 2c5cc3a
feat(hpc): SIMD-accelerate cam_pq + deepnsm hot paths via crate::simd
All consumer code uses crate::simd only. Zero raw intrinsics.
LazyLock dispatch table selects AVX-512 vs AVX2 at startup.
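
The dispatch idea above can be sketched as follows. This is a minimal illustration, not the crate::simd implementation, which this commit does not show: `SimdTier`, `detect_tier`, `ACTIVE_TIER`, and `f32_lanes` are hypothetical names. Note that `std::sync::LazyLock` runs its initializer on first access rather than at process start, so the probe happens once, the first time a kernel is called.

```rust
use std::sync::LazyLock;

/// Hypothetical tier names; the real crate::simd table is not shown in this commit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SimdTier {
    Avx512,
    Avx2,
    Scalar,
}

/// Probe the CPU. On non-x86_64 targets this falls through to Scalar.
pub fn detect_tier() -> SimdTier {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx512f") {
            return SimdTier::Avx512;
        }
        if std::arch::is_x86_feature_detected!("avx2") {
            return SimdTier::Avx2;
        }
    }
    SimdTier::Scalar
}

/// Initialized on first access, then reused by every later call.
pub static ACTIVE_TIER: LazyLock<SimdTier> = LazyLock::new(detect_tier);

/// Number of f32 lanes a kernel for the active tier would process per step.
pub fn f32_lanes() -> usize {
    match *ACTIVE_TIER {
        SimdTier::Avx512 => 16,
        SimdTier::Avx2 => 8,
        SimdTier::Scalar => 1,
    }
}
```

A real table would more likely hold function pointers per kernel; the lane count is used here only to keep the sketch short.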
cam_pq.rs — squared_l2():
- Called 1,536× per CAM-PQ query (6 subspaces × 256 centroids)
- Was: scalar iter().zip().map().sum()
- Now: F32x16 for 16D subvectors (one SIMD lane = one subspace dimension)
- Fast path: n==16 → single load-subtract-multiply-reduce
- Medium path: n>16 → chunked F32x16 with mul_add + scalar remainder
- Estimated speedup of up to 16× on the hot path (theoretical bound from 16 f32 lanes per operation)
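
The two paths listed above can be illustrated with a portable sketch. It assumes nothing about the crate::simd API: a plain `[f32; 16]` array stands in for F32x16, and `accumulate_sq_diff` is a hypothetical helper. Only the control flow (single-step fast path, chunked path, scalar remainder) is taken from the commit message.

```rust
const LANES: usize = 16;

/// Stand-in for one F32x16 step: acc += (a - b)^2 in each lane.
#[inline]
fn accumulate_sq_diff(acc: &mut [f32; LANES], a: &[f32], b: &[f32]) {
    for i in 0..LANES {
        let d = a[i] - b[i];
        acc[i] = d.mul_add(d, acc[i]);
    }
}

/// Squared Euclidean distance between two equal-length vectors.
pub fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    let n = a.len();
    let mut acc = [0.0f32; LANES];

    // Fast path: exactly one 16-wide step, then a horizontal reduce.
    if n == LANES {
        accumulate_sq_diff(&mut acc, a, b);
        return acc.iter().sum();
    }

    // Chunked path: as many full 16-wide steps as fit.
    let chunks = n / LANES;
    for c in 0..chunks {
        let s = c * LANES;
        accumulate_sq_diff(&mut acc, &a[s..s + LANES], &b[s..s + LANES]);
    }
    let mut total: f32 = acc.iter().sum();

    // Scalar remainder for the last n % 16 elements.
    for i in chunks * LANES..n {
        let d = a[i] - b[i];
        total += d * d;
    }
    total
}
```

Because the lane accumulators are summed in a different order than a scalar loop would use, results can differ from the scalar version in the last few bits.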
deepnsm.rs — nsm_decompose() normalization:
- Was: scalar iter().sum() + scalar /= loop
- Now: F32x16 accumulation (4×16=64 elements) + scalar remainder (10)
- Normalize via F32x16 * splat(1/sum) + scalar tail
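
A portable sketch of the normalization described above, with `[f32; 16]` standing in for F32x16. The function name `normalize_in_place` and the zero-sum guard are assumptions added for illustration; the commit message does not say how nsm_decompose() handles a zero sum.

```rust
const LANES: usize = 16;

/// Scale `v` so its elements sum to 1. For 74 elements this runs
/// 4 full 16-wide steps plus a 10-element scalar tail.
pub fn normalize_in_place(v: &mut [f32]) {
    let full = v.len() / LANES * LANES;

    // Lane-wise accumulation over the full chunks.
    let mut acc = [0.0f32; LANES];
    for chunk in v[..full].chunks_exact(LANES) {
        for i in 0..LANES {
            acc[i] += chunk[i];
        }
    }
    // Horizontal reduce plus scalar remainder.
    let sum = acc.iter().sum::<f32>() + v[full..].iter().sum::<f32>();

    // Assumption: leave the vector untouched when the sum is zero.
    if sum == 0.0 {
        return;
    }

    // Multiply by the reciprocal (the splat(1/sum) step), then the tail.
    let inv = 1.0 / sum;
    let (head, tail) = v.split_at_mut(full);
    for chunk in head.chunks_exact_mut(LANES) {
        for x in chunk.iter_mut() {
            *x *= inv;
        }
    }
    for x in tail.iter_mut() {
        *x *= inv;
    }
}
```

Multiplying by a precomputed reciprocal replaces one division per element with one division overall, at the cost of a small rounding difference compared with dividing each element directly.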
deepnsm.rs — nsm_to_fingerprint() XOR:
- Was: scalar for j in 0..1250 { result[j] ^= pattern[j] }
- Now: U8x64 XOR (19×64=1216 bytes) + scalar remainder (34 bytes)
- 64 bytes per SIMD operation vs 1 byte scalar
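
The block split above (19 full 64-byte blocks plus a 34-byte tail) can be sketched without any SIMD types. `xor_into`, `BLOCK`, and `FP_BYTES` are hypothetical names; the inner loop over one 64-byte block is where a U8x64 XOR would go.

```rust
const BLOCK: usize = 64;
pub const FP_BYTES: usize = 1250;

/// XOR `pattern` into `result`, 64 bytes at a time with a scalar tail.
pub fn xor_into(result: &mut [u8; FP_BYTES], pattern: &[u8; FP_BYTES]) {
    let full = FP_BYTES / BLOCK * BLOCK; // 19 * 64 = 1216
    let (r_head, r_tail) = result.split_at_mut(full);
    let (p_head, p_tail) = pattern.split_at(full);

    // One iteration per 64-byte block; stands in for a single U8x64 XOR.
    for (r, p) in r_head.chunks_exact_mut(BLOCK).zip(p_head.chunks_exact(BLOCK)) {
        for i in 0..BLOCK {
            r[i] ^= p[i];
        }
    }

    // Scalar remainder: the last 34 bytes.
    for (r, p) in r_tail.iter_mut().zip(p_tail) {
        *r ^= *p;
    }
}
```

Since XOR is its own inverse, applying the same pattern twice restores the original bytes, which makes a convenient correctness check for the block and tail boundaries.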
deepnsm.rs — nsm_similarity() cosine:
- Was: scalar 3-accumulator loop over 74 elements
- Now: F32x16 with mul_add for dot/mag_a/mag_b (4×16=64) + scalar tail (10)
- Three reductions in one pass
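
A single-pass cosine with three accumulators might look like the sketch below. As before, `[f32; 16]` arrays stand in for F32x16 and `cosine_similarity` is a hypothetical name; returning 0.0 for a zero-magnitude input is an assumption, not something the commit message states.

```rust
const LANES: usize = 16;

/// Cosine similarity computing dot, |a|^2 and |b|^2 in one pass.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    let full = a.len() / LANES * LANES;

    let mut dot = [0.0f32; LANES];
    let mut mag_a = [0.0f32; LANES];
    let mut mag_b = [0.0f32; LANES];

    // Each chunk is loaded once and feeds all three accumulators.
    for (ca, cb) in a[..full].chunks_exact(LANES).zip(b[..full].chunks_exact(LANES)) {
        for i in 0..LANES {
            dot[i] = ca[i].mul_add(cb[i], dot[i]);
            mag_a[i] = ca[i].mul_add(ca[i], mag_a[i]);
            mag_b[i] = cb[i].mul_add(cb[i], mag_b[i]);
        }
    }

    // Horizontal reduces.
    let mut d: f32 = dot.iter().sum();
    let mut ma: f32 = mag_a.iter().sum();
    let mut mb: f32 = mag_b.iter().sum();

    // Scalar tail (10 elements when the input has 74).
    for i in full..a.len() {
        d += a[i] * b[i];
        ma += a[i] * a[i];
        mb += b[i] * b[i];
    }

    let denom = ma.sqrt() * mb.sqrt();
    // Assumption: treat a zero-magnitude input as similarity 0.
    if denom == 0.0 {
        0.0
    } else {
        d / denom
    }
}
```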
23 deepnsm tests + 7 dispatch tests passing. Zero regressions.
https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o71
parent e21fbe1, commit 2c5cc3a
2 files changed
Lines changed: 105 additions & 15 deletions