| 1 | +# Jitson Shopping List — Implementation Plan |
| 2 | + |
| 3 | +> **Date:** 2026-03-24 |
| 4 | +> **Scope:** ndarray HPC SIMD upgrades for Pumpkin Minecraft server optimization |
| 5 | +> **Principle:** Upgrade existing scalar code to SIMD, file-by-file, with scalar parity tests |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## Architecture Pattern (All SIMD Code Follows This) |
| 10 | + |
| 11 | +1. **Public dispatch function** → `is_x86_feature_detected!()` → best available backend |
| 12 | +2. **`#[target_feature(enable = "...")]` unsafe inner** → actual intrinsics |
| 13 | +3. **Scalar fallback** always present |
| 14 | +4. **`// SAFETY:` comment** before every unsafe block |
| 15 | +5. **Parity tests** compare SIMD output against scalar reference |
| 16 | + |
| 17 | +Dispatch hierarchy: AVX-512 VPOPCNTDQ > AVX-512 BW > AVX-512 F > AVX2 > SSE4.1 > scalar |
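The five rules above can be sketched on a small byte-counting kernel. This is a minimal illustration, not the code in `byte_scan.rs`: the function names `byte_count`, `byte_count_avx2`, and `byte_count_scalar` are hypothetical, and only the AVX2 tier of the hierarchy is shown.

```rust
/// Scalar fallback and parity reference: counts bytes equal to `needle`.
pub fn byte_count_scalar(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

/// AVX2 backend (hypothetical name): compares 32 bytes per iteration.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn byte_count_avx2(haystack: &[u8], needle: u8) -> usize {
    use std::arch::x86_64::*;
    let mut chunks = haystack.chunks_exact(32);
    let mut count = 0usize;
    // SAFETY: every chunk is exactly 32 bytes long and the load is unaligned,
    // so reading one __m256i from its start stays in bounds.
    unsafe {
        let needle_v = _mm256_set1_epi8(needle as i8);
        for chunk in &mut chunks {
            let v = _mm256_loadu_si256(chunk.as_ptr() as *const __m256i);
            let eq = _mm256_cmpeq_epi8(v, needle_v);
            // One mask bit per byte lane; popcount gives the matches.
            count += (_mm256_movemask_epi8(eq) as u32).count_ones() as usize;
        }
    }
    // The tail shorter than 32 bytes goes through the scalar path.
    count + byte_count_scalar(chunks.remainder(), needle)
}

/// Public dispatch: picks the best backend detected at runtime.
pub fn byte_count(haystack: &[u8], needle: u8) -> usize {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified on the line above.
            return unsafe { byte_count_avx2(haystack, needle) };
        }
    }
    byte_count_scalar(haystack, needle)
}
```

A parity test then only needs to assert `byte_count(data, n) == byte_count_scalar(data, n)` over inputs whose lengths are not multiples of 32, so both the vector body and the scalar tail are exercised.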
| 18 | + |
| 19 | +--- |
| 20 | + |
| 21 | +## Phase 1: Foundation SIMD Upgrades (No New Public API, Parallelizable) |
| 22 | + |
| 23 | +### 1A. `byte_scan.rs` — AVX-512 VPCMPB (64 bytes/iter) |
| 24 | +- Add `byte_find_all_avx512` + `byte_count_avx512` using `_mm512_cmpeq_epi8_mask` |
| 25 | +- Update dispatch: check `avx512bw` before `avx2` |
| 26 | +- **~60 new lines** |
| 27 | + |
| 28 | +### 1B. `property_mask.rs` — AVX-512 VPTERNLOGQ + VPOPCNTDQ |
| 29 | +- Add `test_section_avx512` processing 8 u64s/iter with `_mm512_ternarylogic_epi64` |
| 30 | +- Add `count_section_avx512` with `_mm512_popcnt_epi64` (VPOPCNTDQ) |
| 31 | +- **~80 new lines** |
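The `imm8` argument of `_mm512_ternarylogic_epi64` is an 8-entry truth table, and it can be derived by evaluating the desired bitwise function on the constants `0xF0`, `0xCC`, and `0xAA`. The sketch below shows that derivation plus a scalar model of one 64-bit lane that a parity test could use. The helper names are hypothetical, and the predicate `(word & mask) ^ expected` is only an assumed example of what `test_section` might fuse into one instruction.

```rust
/// Derives the ternary-logic immediate for a three-input bitwise function
/// by evaluating it on the canonical input patterns A=0xF0, B=0xCC, C=0xAA.
pub fn ternlog_imm8(f: impl Fn(u8, u8, u8) -> u8) -> u8 {
    f(0xF0, 0xCC, 0xAA)
}

/// Scalar model of the ternary-logic operation on a single 64-bit lane:
/// each output bit is looked up in `imm8` at index (a_bit, b_bit, c_bit).
pub fn ternlog_u64(a: u64, b: u64, c: u64, imm8: u8) -> u64 {
    let mut out = 0u64;
    for bit in 0..64 {
        let idx = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1);
        out |= (((imm8 >> idx) & 1) as u64) << bit;
    }
    out
}

/// Scalar reference for a popcount over masked words (assumed semantics).
pub fn count_masked_scalar(words: &[u64], mask: u64) -> u32 {
    words.iter().map(|w| (w & mask).count_ones()).sum()
}
```

For the assumed predicate, `ternlog_imm8(|w, m, e| (w & m) ^ e)` evaluates to `0x6A`; a lane of the result is zero exactly when the masked word equals the expected value.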
| 32 | + |
| 33 | +### 1C. `palette_codec.rs` — AVX-512 Unpack All Bit Widths + Pack |
| 34 | +- Add `unpack_generic_avx512` using `_mm512_srlv_epi32` (VPSRLVD) with shift table |
| 35 | +- Add `pack_generic_avx512` using `_mm512_sllv_epi32` (VPSLLVD) + `_mm512_or_epi32` |
| 36 | +- Start with power-of-2 widths (1,2,4,8), then add 3,5,6,7 |
| 37 | +- **~150 new lines** |
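A scalar pack/unpack pair is the natural parity reference for the variable-shift kernels. The sketch below assumes a layout in which indices are packed LSB-first into `u64` words and never straddle a word boundary; the actual layout in `palette_codec.rs` may differ, and the names `pack_scalar` and `unpack_scalar` are hypothetical.

```rust
/// Packs `indices` at `bits` bits each, LSB-first, into u64 words.
/// Assumed layout: an entry never spans two words, so each word holds
/// floor(64 / bits) entries and any leftover high bits stay zero.
pub fn pack_scalar(indices: &[u8], bits: u32) -> Vec<u64> {
    assert!((1..=8).contains(&bits), "bit width must be 1..=8");
    let per_word = (64 / bits) as usize;
    let mask = (1u64 << bits) - 1;
    indices
        .chunks(per_word)
        .map(|chunk| {
            chunk.iter().enumerate().fold(0u64, |word, (i, &v)| {
                word | ((v as u64 & mask) << (i as u32 * bits))
            })
        })
        .collect()
}

/// Inverse of `pack_scalar`: extracts `count` entries of `bits` bits each.
pub fn unpack_scalar(words: &[u64], bits: u32, count: usize) -> Vec<u8> {
    assert!((1..=8).contains(&bits), "bit width must be 1..=8");
    let per_word = (64 / bits) as usize;
    let mask = (1u64 << bits) - 1;
    (0..count)
        .map(|i| {
            // This per-entry shift is what a SIMD shift table would hold.
            let shift = (i % per_word) as u32 * bits;
            ((words[i / per_word] >> shift) & mask) as u8
        })
        .collect()
}
```

A round-trip check over every width from 1 to 8 covers both the power-of-2 widths and the odd ones (3, 5, 6, 7) before either group gets a vector path.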
| 38 | + |
| 39 | +--- |
| 40 | + |
| 41 | +## Phase 2: Nibble Module Expansion |
| 42 | + |
| 43 | +### 2A. `nibble_unpack_avx2` — 32 nibbles/iter |
| 44 | +- Load 16 bytes → AND low, shift+AND high → interleave → store 32 u8s |
| 45 | +- **~50 new lines** |
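One possible shape for this kernel is sketched below. It is a variant of the steps listed above rather than a literal transcription: instead of a separate interleave, it zero-extends each byte to 16 bits so that the low and high nibbles land in adjacent output bytes. The low-nibble-first ordering and all function names are assumptions.

```rust
/// Scalar reference: low nibble goes to the even index, high nibble to the odd.
pub fn nibble_unpack_scalar(packed: &[u8], out: &mut [u8]) {
    assert_eq!(out.len(), packed.len() * 2, "out must hold two bytes per input byte");
    for (i, &b) in packed.iter().enumerate() {
        out[2 * i] = b & 0x0F;
        out[2 * i + 1] = b >> 4;
    }
}

/// AVX2 backend (hypothetical): 16 packed bytes in, 32 unpacked bytes out.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn nibble_unpack_avx2(packed: &[u8], out: &mut [u8]) {
    use std::arch::x86_64::*;
    let mut src = packed.chunks_exact(16);
    let mut dst = out.chunks_exact_mut(32);
    // SAFETY: each source chunk is 16 bytes and each destination chunk is
    // 32 bytes, matching the unaligned 128-bit load and 256-bit store.
    unsafe {
        for (s, d) in (&mut src).zip(&mut dst) {
            let bytes = _mm_loadu_si128(s.as_ptr() as *const __m128i);
            // Zero-extend to u16 lanes: 0xAB becomes 0x00AB.
            let wide = _mm256_cvtepu8_epi16(bytes);
            // Low nibble stays in the low byte of the lane: 0x000B.
            let lo = _mm256_and_si256(wide, _mm256_set1_epi16(0x000F));
            // High nibble moves into the high byte of the lane: 0x0A00.
            let hi = _mm256_and_si256(_mm256_slli_epi16::<4>(wide), _mm256_set1_epi16(0x0F00));
            _mm256_storeu_si256(d.as_mut_ptr() as *mut __m256i, _mm256_or_si256(lo, hi));
        }
    }
    // Fewer than 16 bytes left: finish with the scalar reference.
    nibble_unpack_scalar(src.remainder(), dst.into_remainder());
}

/// Public dispatch with scalar fallback.
pub fn nibble_unpack(packed: &[u8], out: &mut [u8]) {
    assert_eq!(out.len(), packed.len() * 2, "out must hold two bytes per input byte");
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 was detected and the length invariant was checked above.
            unsafe { nibble_unpack_avx2(packed, out) };
            return;
        }
    }
    nibble_unpack_scalar(packed, out);
}
```

Because x86 is little-endian, storing the 16-bit lane `0x0A0B` writes `0x0B` then `0x0A`, which is the low-then-high order the scalar reference produces.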
| 46 | + |
| 47 | +### 2B. `nibble_above_threshold_avx2` — SIMD threshold scan |
| 48 | +- Split lo/hi nibbles, cmpgt threshold, extract bitmask, emit indices |
| 49 | +- **~60 new lines** |
| 50 | + |
| 51 | +### 2C. `nibble_propagate_bfs` — Compose existing kernels |
| 52 | +- `nibble_sub_clamp(packed, delta)` + `nibble_above_threshold(packed, 0)` → frontier |
| 53 | +- **~20 new lines** |
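The composition can be sketched with scalar stand-ins for the two kernels. The signatures below are assumptions (in-place subtraction, indices returned as a `Vec<usize>`, low nibble at the even index); the existing kernels may use different shapes.

```rust
/// Scalar stand-in: subtracts `delta` from every nibble, clamping at zero.
pub fn nibble_sub_clamp(packed: &mut [u8], delta: u8) {
    let delta = delta.min(15);
    for b in packed.iter_mut() {
        let lo = (*b & 0x0F).saturating_sub(delta);
        let hi = (*b >> 4).saturating_sub(delta);
        *b = (hi << 4) | lo;
    }
}

/// Scalar stand-in: returns nibble indices whose value exceeds `threshold`.
pub fn nibble_above_threshold(packed: &[u8], threshold: u8) -> Vec<usize> {
    let mut hits = Vec::new();
    for (i, &b) in packed.iter().enumerate() {
        if (b & 0x0F) > threshold {
            hits.push(2 * i);
        }
        if (b >> 4) > threshold {
            hits.push(2 * i + 1);
        }
    }
    hits
}

/// One propagation step: attenuate every cell, then collect the cells that
/// still carry a nonzero level as the next frontier.
pub fn nibble_propagate_bfs(packed: &mut [u8], delta: u8) -> Vec<usize> {
    nibble_sub_clamp(packed, delta);
    nibble_above_threshold(packed, 0)
}
```

For example, with input bytes `[0xF0, 0x12, 0x00, 0x3A]` and `delta = 2`, the buffer becomes `[0xD0, 0x00, 0x00, 0x18]` and the frontier is nibble indices 1, 6, and 7.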
| 54 | + |
| 55 | +### 2D. `nibble_sub_clamp_avx512` — 64 bytes/iter (128 nibbles) |
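| 56 | +- `_mm512_subs_epu8` (requires `avx512bw`) for saturating subtract, applied separately to the masked low (`& 0x0F`) and high (`& 0xF0`) nibbles so a clamp in one nibble cannot affect its neighbor |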
| 57 | +- **~35 new lines** |
| 58 | + |
| 59 | +--- |
| 60 | + |
| 61 | +## Phase 3: AABB Module |
| 62 | + |
| 63 | +### 3A. AVX-512 Batch Intersect — 16 candidates/iter |
| 64 | +- Broadcast query, gather candidate coords, `_mm512_cmp_ps_mask`, AND 6 kmasks |
| 65 | +- **~80 new lines** |
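A scalar reference for the batch test could look like the sketch below, which returns one bit per candidate in the same way the six ANDed compare masks would. The structure-of-arrays layout, the type names `Aabb` and `AabbBatch16`, and the inclusive overlap rule (touching boxes count as intersecting) are all assumptions.

```rust
/// Axis-aligned box given by its minimum and maximum corners.
#[derive(Clone, Copy, Debug)]
pub struct Aabb {
    pub min: [f32; 3],
    pub max: [f32; 3],
}

/// Sixteen candidate boxes in structure-of-arrays form (hypothetical layout):
/// `min[axis][lane]` and `max[axis][lane]`, one lane per candidate.
pub struct AabbBatch16 {
    pub min: [[f32; 16]; 3],
    pub max: [[f32; 16]; 3],
}

/// Returns a 16-bit mask with bit `lane` set when the query overlaps that
/// candidate. Each `hit &=` line corresponds to one of the six compares.
pub fn intersect_mask_scalar(query: &Aabb, batch: &AabbBatch16) -> u16 {
    let mut mask = 0u16;
    for lane in 0..16 {
        let mut hit = true;
        for axis in 0..3 {
            hit &= query.min[axis] <= batch.max[axis][lane];
            hit &= query.max[axis] >= batch.min[axis][lane];
        }
        mask |= (hit as u16) << lane;
    }
    mask
}
```

With this layout each of the six comparisons reads one contiguous row of 16 floats, so a vector version can use plain loads where the data is already stored this way.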
| 66 | + |
| 67 | +### 3B. Ray-AABB Slab Test — Projectile collision |
| 68 | +- New `Ray` struct, slab method (t_enter/t_exit), scalar + AVX-512 |
| 69 | +- **~120 new lines** |
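The scalar half of the slab test might look like the following sketch. The field layout of `Ray` (origin plus precomputed reciprocal direction) and the function name `ray_aabb_slab` are assumptions; the plan only states that a `Ray` struct and the `t_enter`/`t_exit` method are used.

```rust
/// Ray with a precomputed reciprocal direction (hypothetical layout).
pub struct Ray {
    pub origin: [f32; 3],
    pub inv_dir: [f32; 3],
}

impl Ray {
    pub fn new(origin: [f32; 3], dir: [f32; 3]) -> Self {
        // A zero component yields an infinite reciprocal, which the
        // min/max logic below handles for origins not on a slab plane.
        Ray { origin, inv_dir: [1.0 / dir[0], 1.0 / dir[1], 1.0 / dir[2]] }
    }
}

/// Slab test: returns the entry distance along the ray, or `None` on a miss.
pub fn ray_aabb_slab(ray: &Ray, min: [f32; 3], max: [f32; 3]) -> Option<f32> {
    let mut t_enter = 0.0f32; // clamp to the ray start: no hits behind the origin
    let mut t_exit = f32::INFINITY;
    for axis in 0..3 {
        let t0 = (min[axis] - ray.origin[axis]) * ray.inv_dir[axis];
        let t1 = (max[axis] - ray.origin[axis]) * ray.inv_dir[axis];
        // Latest entry and earliest exit across the three slabs.
        t_enter = t_enter.max(t0.min(t1));
        t_exit = t_exit.min(t0.max(t1));
    }
    if t_enter <= t_exit { Some(t_enter) } else { None }
}
```

The per-axis body is branch-free (two multiplies, a min, and a max), which is what makes it a reasonable candidate for testing many boxes per iteration with vector min/max instructions.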
| 70 | + |
| 71 | +--- |
| 72 | + |
| 73 | +## Phase 4: Spatial Hash SIMD Distance |
| 74 | + |
| 75 | +- `batch_sq_dist_avx2` helper for inner loop |
| 76 | +- New `query_radius_simd` method |
| 77 | +- **~100 new lines** |
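A possible shape for the distance helper is sketched below, assuming the entity coordinates are available as three separate `f32` slices; the signature is hypothetical. The vector path deliberately uses separate multiply and add instructions instead of FMA, so its results match the scalar reference bit for bit and the parity test can use exact equality.

```rust
/// Scalar reference: squared distance from each point to `q`.
pub fn batch_sq_dist_scalar(xs: &[f32], ys: &[f32], zs: &[f32], q: [f32; 3], out: &mut [f32]) {
    for i in 0..out.len() {
        let (dx, dy, dz) = (xs[i] - q[0], ys[i] - q[1], zs[i] - q[2]);
        out[i] = dx * dx + dy * dy + dz * dz;
    }
}

/// AVX2 backend (hypothetical signature): 8 points per iteration.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn batch_sq_dist_avx2(xs: &[f32], ys: &[f32], zs: &[f32], q: [f32; 3], out: &mut [f32]) {
    use std::arch::x86_64::*;
    let n = out.len();
    let mut i = 0;
    // SAFETY: the caller guarantees every input slice holds at least `n`
    // elements, and the loop only touches indices i..i+8 with i + 8 <= n.
    unsafe {
        let (qx, qy, qz) = (_mm256_set1_ps(q[0]), _mm256_set1_ps(q[1]), _mm256_set1_ps(q[2]));
        while i + 8 <= n {
            let dx = _mm256_sub_ps(_mm256_loadu_ps(xs.as_ptr().add(i)), qx);
            let dy = _mm256_sub_ps(_mm256_loadu_ps(ys.as_ptr().add(i)), qy);
            let dz = _mm256_sub_ps(_mm256_loadu_ps(zs.as_ptr().add(i)), qz);
            // Same operation order as the scalar path: (dx² + dy²) + dz².
            let sum = _mm256_add_ps(
                _mm256_add_ps(_mm256_mul_ps(dx, dx), _mm256_mul_ps(dy, dy)),
                _mm256_mul_ps(dz, dz),
            );
            _mm256_storeu_ps(out.as_mut_ptr().add(i), sum);
            i += 8;
        }
    }
    batch_sq_dist_scalar(&xs[i..], &ys[i..], &zs[i..], q, &mut out[i..]);
}

/// Public dispatch with scalar fallback.
pub fn batch_sq_dist(xs: &[f32], ys: &[f32], zs: &[f32], q: [f32; 3], out: &mut [f32]) {
    let n = out.len();
    assert!(xs.len() >= n && ys.len() >= n && zs.len() >= n, "inputs shorter than output");
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 was detected and slice lengths were checked above.
            unsafe { batch_sq_dist_avx2(xs, ys, zs, q, out) };
            return;
        }
    }
    batch_sq_dist_scalar(xs, ys, zs, q, out);
}
```

A radius query can then compare each output against `radius * radius`, which avoids taking a square root per candidate.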
| 78 | + |
| 79 | +--- |
| 80 | + |
| 81 | +## Phase 5: Jitson Templates |
| 82 | + |
| 83 | +### 5A. TerrainFillParams — Baked biome params for JIT fill loop |
| 84 | +### 5B. CompiledNoiseConfig — Flattened octave params for JIT compilation |
| 85 | +- **~140 new lines combined** |
| 86 | + |
| 87 | +--- |
| 88 | + |
| 89 | +## Phase 6: Wiring |
| 90 | +- Re-export new types from `jitson/mod.rs` |
| 91 | +- **~5 lines** |
| 92 | + |
| 93 | +--- |
| 94 | + |
| 95 | +## Total: ~900 new lines across 8 files |
| 96 | + |
| 97 | +## Dependency Graph |
| 98 | + |
| 99 | +``` |
| 100 | +Phase 1 (parallel): byte_scan ─┐ |
| 101 | + prop_mask ──┼── Phase 2 (nibble) ── Phase 3 (aabb) ── Phase 4 (spatial) |
| 102 | + palette_codec┘ │ |
| 103 | + Phase 5 (jitson) |
| 104 | + │ |
| 105 | + Phase 6 (wire) |
| 106 | +``` |