Commit bad2a55
committed
feat: U8x64 byte-level ops for palette codec, nibble, byte scan (Pumpkin/SD)
Added to all three tiers (AVX-512 / AVX2 / scalar):
cmpeq_mask(other) → u64 — byte-wise equality, returns bitmask
shr_epi16(imm) → Self — shift right 16-bit lanes (nibble extract)
saturating_sub(other) — max(a-b, 0) per byte (delta subtraction)
unpack_lo_epi8(other) — interleave low bytes (nibble interleave)
unpack_hi_epi8(other) — interleave high bytes
These operations are used by:
palette_codec.rs — Minecraft-style variable-width bit packing
nibble.rs — 4-bit light level packing (Pumpkin)
byte_scan.rs — NBT format byte scanning
(future) stable_diffusion/ — VAE latent palette encoding via GGUF
All three are currently using raw _mm256_/_mm512_ intrinsics.
Next step: rewire them to use crate::simd::U8x64 instead.
https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp1 parent 1b06969 commit bad2a55
3 files changed
Lines changed: 150 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
713 | 713 | | |
714 | 714 | | |
715 | 715 | | |
716 | | - | |
| 716 | + | |
717 | 717 | | |
718 | 718 | | |
719 | 719 | | |
720 | 720 | | |
721 | 721 | | |
722 | 722 | | |
723 | 723 | | |
| 724 | + | |
| 725 | + | |
| 726 | + | |
| 727 | + | |
| 728 | + | |
| 729 | + | |
| 730 | + | |
| 731 | + | |
| 732 | + | |
| 733 | + | |
| 734 | + | |
| 735 | + | |
| 736 | + | |
| 737 | + | |
724 | 738 | | |
725 | | - | |
| 739 | + | |
| 740 | + | |
| 741 | + | |
| 742 | + | |
| 743 | + | |
| 744 | + | |
726 | 745 | | |
727 | 746 | | |
728 | 747 | | |
729 | | - | |
| 748 | + | |
| 749 | + | |
| 750 | + | |
| 751 | + | |
| 752 | + | |
| 753 | + | |
| 754 | + | |
| 755 | + | |
| 756 | + | |
| 757 | + | |
| 758 | + | |
730 | 759 | | |
731 | | - | |
| 760 | + | |
732 | 761 | | |
733 | 762 | | |
734 | 763 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
761 | 761 | | |
762 | 762 | | |
763 | 763 | | |
| 764 | + | |
| 765 | + | |
| 766 | + | |
| 767 | + | |
| 768 | + | |
| 769 | + | |
| 770 | + | |
| 771 | + | |
| 772 | + | |
| 773 | + | |
| 774 | + | |
| 775 | + | |
| 776 | + | |
| 777 | + | |
| 778 | + | |
| 779 | + | |
| 780 | + | |
| 781 | + | |
| 782 | + | |
| 783 | + | |
| 784 | + | |
| 785 | + | |
| 786 | + | |
| 787 | + | |
| 788 | + | |
| 789 | + | |
| 790 | + | |
| 791 | + | |
| 792 | + | |
| 793 | + | |
| 794 | + | |
| 795 | + | |
| 796 | + | |
| 797 | + | |
| 798 | + | |
| 799 | + | |
| 800 | + | |
| 801 | + | |
| 802 | + | |
| 803 | + | |
| 804 | + | |
| 805 | + | |
| 806 | + | |
| 807 | + | |
| 808 | + | |
| 809 | + | |
| 810 | + | |
| 811 | + | |
| 812 | + | |
| 813 | + | |
| 814 | + | |
| 815 | + | |
| 816 | + | |
| 817 | + | |
| 818 | + | |
| 819 | + | |
| 820 | + | |
| 821 | + | |
| 822 | + | |
| 823 | + | |
| 824 | + | |
| 825 | + | |
| 826 | + | |
| 827 | + | |
| 828 | + | |
| 829 | + | |
| 830 | + | |
| 831 | + | |
| 832 | + | |
| 833 | + | |
764 | 834 | | |
765 | 835 | | |
766 | 836 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
576 | 576 | | |
577 | 577 | | |
578 | 578 | | |
| 579 | + | |
| 580 | + | |
| 581 | + | |
| 582 | + | |
| 583 | + | |
| 584 | + | |
| 585 | + | |
| 586 | + | |
| 587 | + | |
| 588 | + | |
| 589 | + | |
| 590 | + | |
| 591 | + | |
| 592 | + | |
| 593 | + | |
| 594 | + | |
| 595 | + | |
| 596 | + | |
| 597 | + | |
| 598 | + | |
| 599 | + | |
| 600 | + | |
| 601 | + | |
| 602 | + | |
| 603 | + | |
| 604 | + | |
| 605 | + | |
| 606 | + | |
| 607 | + | |
| 608 | + | |
| 609 | + | |
| 610 | + | |
| 611 | + | |
| 612 | + | |
| 613 | + | |
| 614 | + | |
| 615 | + | |
| 616 | + | |
| 617 | + | |
| 618 | + | |
| 619 | + | |
| 620 | + | |
| 621 | + | |
| 622 | + | |
| 623 | + | |
| 624 | + | |
| 625 | + | |
579 | 626 | | |
580 | 627 | | |
581 | 628 | | |
| |||
0 commit comments