Commit df01951

Merge pull request #30 from AdaWorldAPI/claude/continue-session-0mAVa
feat(hpc): implement jitson shopping list — SIMD upgrades + terrain t…
2 parents e856f30 + e4c5f01 commit df01951

9 files changed

Lines changed: 1534 additions & 6 deletions

Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
# Jitson Shopping List — Implementation Plan

> **Date:** 2026-03-24
> **Scope:** ndarray HPC SIMD upgrades for Pumpkin Minecraft server optimization
> **Principle:** Upgrade existing scalar code to SIMD, file-by-file, with scalar parity tests

---

## Architecture Pattern (All SIMD Code Follows This)

1. **Public dispatch function** → `is_x86_feature_detected!()` → best available backend
2. **`#[target_feature(enable = "...")]` unsafe inner** → actual intrinsics
3. **Scalar fallback** always present
4. **`// SAFETY:` comment** before every unsafe block
5. **Parity tests** compare SIMD output against scalar reference

Dispatch hierarchy: AVX-512 VPOPCNTDQ > AVX-512 BW > AVX-512 F > AVX2 > SSE4.1 > scalar

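The five-point pattern above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: it uses a single AVX2 tier rather than the full hierarchy, and the names `byte_count`, `byte_count_avx2`, and `byte_count_scalar` are hypothetical.

```rust
/// Scalar fallback: always available, and the reference for parity tests.
pub fn byte_count_scalar(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

/// AVX2 inner kernel (hypothetical name). Callers must confirm AVX2 support first.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn byte_count_avx2(haystack: &[u8], needle: u8) -> usize {
    use std::arch::x86_64::*;
    let mut count = 0usize;
    let mut chunks = haystack.chunks_exact(32);
    for chunk in &mut chunks {
        // SAFETY: `chunk` is exactly 32 readable bytes, the load has no alignment
        // requirement, and the caller guarantees AVX2 is available.
        let mask = unsafe {
            let v = _mm256_loadu_si256(chunk.as_ptr() as *const __m256i);
            let eq = _mm256_cmpeq_epi8(v, _mm256_set1_epi8(needle as i8));
            _mm256_movemask_epi8(eq) as u32
        };
        count += mask.count_ones() as usize;
    }
    // The tail shorter than one vector goes through the scalar path.
    count + byte_count_scalar(chunks.remainder(), needle)
}

/// Public dispatch function: picks the best backend detected at runtime.
pub fn byte_count(haystack: &[u8], needle: u8) -> usize {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified on the line above.
            return unsafe { byte_count_avx2(haystack, needle) };
        }
    }
    byte_count_scalar(haystack, needle)
}
```

A parity test in this style asserts `byte_count(data, n) == byte_count_scalar(data, n)` over inputs whose lengths are not multiples of the vector width, so the tail path is exercised too.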

---

## Phase 1: Foundation SIMD Upgrades (No New Public API, Parallelizable)

### 1A. `byte_scan.rs` — AVX-512 VPCMPB (64 bytes/iter)
- Add `byte_find_all_avx512` + `byte_count_avx512` using `_mm512_cmpeq_epi8_mask`
- Update dispatch: check `avx512bw` before `avx2`
- **~60 new lines**
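
A portable model of the mask-based scan may help when writing the parity tests for 1A. The sketch below assumes the AVX-512 kernel yields one 64-bit mask per 64-byte block (bit `i` set when byte `i` matches) and shows how such a mask turns into indices and counts. `eq_mask_64`, `byte_find_all_masked`, and `byte_count_masked` are hypothetical names, and no intrinsics are used.

```rust
/// Portable stand-in for a 64-byte compare-to-mask step:
/// bit i of the result is set when chunk[i] == needle.
fn eq_mask_64(chunk: &[u8], needle: u8) -> u64 {
    debug_assert!(chunk.len() <= 64);
    let mut mask = 0u64;
    for (i, &b) in chunk.iter().enumerate() {
        mask |= ((b == needle) as u64) << i;
    }
    mask
}

/// Collect every index whose byte equals `needle`, one 64-byte block at a time.
pub fn byte_find_all_masked(haystack: &[u8], needle: u8) -> Vec<usize> {
    let mut hits = Vec::new();
    for (block, chunk) in haystack.chunks(64).enumerate() {
        let mut mask = eq_mask_64(chunk, needle);
        while mask != 0 {
            // The lowest set bit gives the next matching offset in this block.
            hits.push(block * 64 + mask.trailing_zeros() as usize);
            mask &= mask - 1; // clear the lowest set bit
        }
    }
    hits
}

/// Count matches by popcounting each block mask.
pub fn byte_count_masked(haystack: &[u8], needle: u8) -> usize {
    haystack
        .chunks(64)
        .map(|chunk| eq_mask_64(chunk, needle).count_ones() as usize)
        .sum()
}
```

The `trailing_zeros` plus clear-lowest-bit loop visits only the set bits, so sparse matches cost little per block.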

### 1B. `property_mask.rs` — AVX-512 VPTERNLOGQ + VPOPCNTDQ
- Add `test_section_avx512` processing 8 u64s/iter with `_mm512_ternarylogic_epi64`
- Add `count_section_avx512` with `_mm512_popcnt_epi64` (VPOPCNTDQ)
- **~80 new lines**
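
The ternary-logic immediate is easy to get wrong, so a bit-exact scalar model is a useful parity reference. The sketch below models one 64-bit lane of a three-input ternary-logic operation and derives `imm8` from the canonical truth-table columns. The predicate `(word & mask) ^ expected == 0` is only an assumed example of what a section test might compute, and `all_match_model` and `count_masked_bits_model` are hypothetical names.

```rust
// Truth-table columns for the three inputs; any boolean function of
// (a, b, c) evaluated on these constants yields its imm8.
const COL_A: u8 = 0xF0;
const COL_B: u8 = 0xCC;
const COL_C: u8 = 0xAA;

/// imm8 for (a & b) ^ c.
pub const IMM_AND_XOR: u8 = (COL_A & COL_B) ^ COL_C;

/// Scalar model of ternary logic on one 64-bit lane: for each bit position the
/// three input bits form a 3-bit index (a is the high bit) into `imm8`.
pub fn ternary_logic_u64(a: u64, b: u64, c: u64, imm8: u8) -> u64 {
    let mut out = 0u64;
    for bit in 0..64 {
        let idx = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1);
        out |= (((imm8 as u64) >> idx) & 1) << bit;
    }
    out
}

/// Assumed predicate: every word, masked, equals `expected`.
pub fn all_match_model(words: &[u64], mask: u64, expected: u64) -> bool {
    words
        .iter()
        .all(|&w| ternary_logic_u64(w, mask, expected, IMM_AND_XOR) == 0)
}

/// Assumed counter: number of set bits under `mask` across all words.
pub fn count_masked_bits_model(words: &[u64], mask: u64) -> u32 {
    words.iter().map(|&w| (w & mask).count_ones()).sum()
}
```

Fusing the AND and the XOR into one ternary-logic operation is the point of the instruction: the vector version needs a single op per 8 words instead of two.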

### 1C. `palette_codec.rs` — AVX-512 Unpack All Bit Widths + Pack
- Add `unpack_generic_avx512` using `_mm512_srlv_epi32` (VPSRLVD) with shift table
- Add `pack_generic_avx512` using `_mm512_sllv_epi32` (VPSLLVD) + `_mm512_or_epi32`
- Start with power-of-2 widths (1,2,4,8), then add 3,5,6,7
- **~150 new lines**
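
A scalar pack/unpack pair makes a convenient round-trip oracle for every bit width. The sketch below assumes a layout this plan does not specify: entries packed LSB-first into `u64` words, with no entry straddling a word boundary. Under that assumption the per-slot shift is `slot * bits`, which is the scalar analogue of a shift table. `pack_generic_scalar` and `unpack_generic_scalar` are hypothetical names.

```rust
/// Pack `indices` (each < 2^bits) into u64 words, LSB-first.
/// Assumed layout: entries never straddle a word boundary.
pub fn pack_generic_scalar(indices: &[u8], bits: u32) -> Vec<u64> {
    assert!((1..=8).contains(&bits), "bit width must be 1..=8");
    let per_word = (64 / bits) as usize;
    let mask = (1u64 << bits) - 1;
    let mut words = vec![0u64; (indices.len() + per_word - 1) / per_word];
    for (i, &idx) in indices.iter().enumerate() {
        let shift = (i % per_word) as u32 * bits;
        words[i / per_word] |= (idx as u64 & mask) << shift;
    }
    words
}

/// Inverse of `pack_generic_scalar`: extract `count` entries.
pub fn unpack_generic_scalar(words: &[u64], bits: u32, count: usize) -> Vec<u8> {
    assert!((1..=8).contains(&bits), "bit width must be 1..=8");
    let per_word = (64 / bits) as usize;
    let mask = (1u64 << bits) - 1;
    (0..count)
        .map(|i| {
            // Per-slot shift: the scalar analogue of a SIMD shift table.
            let shift = (i % per_word) as u32 * bits;
            ((words[i / per_word] >> shift) & mask) as u8
        })
        .collect()
}
```

Looping the round trip over widths 1 through 8 covers both the power-of-2 widths and the 3, 5, 6, 7 cases that leave unused high bits in each word.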

---

## Phase 2: Nibble Module Expansion

### 2A. `nibble_unpack_avx2` — 32 nibbles/iter
- Load 16 bytes → AND low, shift+AND high → interleave → store 32 u8s
- **~50 new lines**
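
The load/mask/shift/interleave sequence for 2A can be tried out with 128-bit SSE2 intrinsics, which are part of the x86_64 baseline and need no runtime detection. The sketch below is a stand-in for the planned AVX2 kernel, not that kernel itself. It assumes the low nibble of each byte comes first in the output, and the function names are hypothetical.

```rust
/// Scalar reference: byte i yields out[2i] = low nibble, out[2i+1] = high nibble
/// (nibble order is an assumption).
pub fn nibble_unpack_scalar(packed: &[u8; 16]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (i, &b) in packed.iter().enumerate() {
        out[2 * i] = b & 0x0F;
        out[2 * i + 1] = b >> 4;
    }
    out
}

/// SSE2 stand-in for the planned kernel: 16 bytes in, 32 nibbles out.
#[cfg(target_arch = "x86_64")]
pub fn nibble_unpack_sse2(packed: &[u8; 16]) -> [u8; 32] {
    use std::arch::x86_64::*;
    let mut out = [0u8; 32];
    // SAFETY: SSE2 is part of the x86_64 baseline; `packed` provides 16 readable
    // bytes, `out` provides 32 writable bytes, and loadu/storeu accept any alignment.
    unsafe {
        let v = _mm_loadu_si128(packed.as_ptr() as *const __m128i);
        let nib = _mm_set1_epi8(0x0F);
        let lo = _mm_and_si128(v, nib);
        // The 16-bit shift drags bits across byte lanes; the AND removes them.
        let hi = _mm_and_si128(_mm_srli_epi16::<4>(v), nib);
        // Interleave lo/hi bytes so each source byte becomes two adjacent outputs.
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, _mm_unpacklo_epi8(lo, hi));
        _mm_storeu_si128(out.as_mut_ptr().add(16) as *mut __m128i, _mm_unpackhi_epi8(lo, hi));
    }
    out
}

#[cfg(target_arch = "x86_64")]
pub fn nibble_unpack(packed: &[u8; 16]) -> [u8; 32] {
    nibble_unpack_sse2(packed)
}

#[cfg(not(target_arch = "x86_64"))]
pub fn nibble_unpack(packed: &[u8; 16]) -> [u8; 32] {
    nibble_unpack_scalar(packed)
}
```

There is no 8-bit SIMD shift on x86, which is why the high-nibble path shifts 16-bit lanes and then masks.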

### 2B. `nibble_above_threshold_avx2` — SIMD threshold scan
- Split lo/hi nibbles, cmpgt threshold, extract bitmask, emit indices
- **~60 new lines**
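
The split, compare, movemask, emit sequence for 2B can be prototyped the same way. The sketch below again uses 128-bit SSE2 as a stand-in for the planned AVX2 kernel, with a scalar reference for parity. It assumes the nibble index `2 * byte` refers to the low nibble, and all names are hypothetical.

```rust
/// Scalar reference: indices of nibbles strictly greater than `threshold`.
pub fn nibble_above_threshold_scalar(packed: &[u8], threshold: u8) -> Vec<usize> {
    let mut out = Vec::new();
    for (i, &b) in packed.iter().enumerate() {
        if (b & 0x0F) > threshold {
            out.push(2 * i);
        }
        if (b >> 4) > threshold {
            out.push(2 * i + 1);
        }
    }
    out
}

/// SSE2 stand-in for the planned kernel: 16 bytes (32 nibbles) per iteration.
#[cfg(target_arch = "x86_64")]
pub fn nibble_above_threshold_sse2(packed: &[u8], threshold: u8) -> Vec<usize> {
    use std::arch::x86_64::*;
    let mut out = Vec::new();
    let mut chunks = packed.chunks_exact(16);
    let mut base = 0usize; // nibble index of the first entry in the current chunk
    for chunk in &mut chunks {
        // SAFETY: SSE2 is part of the x86_64 baseline and `chunk` is exactly
        // 16 readable bytes; loadu accepts any alignment.
        let (lo_mask, hi_mask) = unsafe {
            let v = _mm_loadu_si128(chunk.as_ptr() as *const __m128i);
            let nib = _mm_set1_epi8(0x0F);
            // Nibbles never exceed 15, so clamping keeps the signed compare valid.
            let t = _mm_set1_epi8(threshold.min(15) as i8);
            let lo = _mm_and_si128(v, nib);
            let hi = _mm_and_si128(_mm_srli_epi16::<4>(v), nib);
            (
                _mm_movemask_epi8(_mm_cmpgt_epi8(lo, t)) as u32,
                _mm_movemask_epi8(_mm_cmpgt_epi8(hi, t)) as u32,
            )
        };
        for byte in 0..16 {
            if ((lo_mask >> byte) & 1) != 0 {
                out.push(base + 2 * byte);
            }
            if ((hi_mask >> byte) & 1) != 0 {
                out.push(base + 2 * byte + 1);
            }
        }
        base += 32;
    }
    // The tail shorter than one vector goes through the scalar path.
    for idx in nibble_above_threshold_scalar(chunks.remainder(), threshold) {
        out.push(base + idx);
    }
    out
}

#[cfg(target_arch = "x86_64")]
pub fn nibble_above_threshold(packed: &[u8], threshold: u8) -> Vec<usize> {
    nibble_above_threshold_sse2(packed, threshold)
}

#[cfg(not(target_arch = "x86_64"))]
pub fn nibble_above_threshold(packed: &[u8], threshold: u8) -> Vec<usize> {
    nibble_above_threshold_scalar(packed, threshold)
}
```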

### 2C. `nibble_propagate_bfs` — Compose existing kernels
- `nibble_sub_clamp(packed, delta)` + `nibble_above_threshold(packed, 0)` → frontier
- **~20 new lines**
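
The composition in 2C is short enough to show end to end with scalar kernels. The signatures below are assumptions: the plan names the functions but not their types, so the in-place `&mut [u8]` form and the `Vec<usize>` frontier are illustrative only.

```rust
/// Subtract `delta` from every nibble in place, saturating at 0.
pub fn nibble_sub_clamp(packed: &mut [u8], delta: u8) {
    let d = delta.min(15);
    for b in packed.iter_mut() {
        let lo = (*b & 0x0F).saturating_sub(d);
        let hi = (*b >> 4).saturating_sub(d);
        *b = (hi << 4) | lo;
    }
}

/// Indices of nibbles strictly greater than `threshold`
/// (low nibble of byte i has index 2i, high nibble 2i+1; an assumed order).
pub fn nibble_above_threshold(packed: &[u8], threshold: u8) -> Vec<usize> {
    let mut out = Vec::new();
    for (i, &b) in packed.iter().enumerate() {
        if (b & 0x0F) > threshold {
            out.push(2 * i);
        }
        if (b >> 4) > threshold {
            out.push(2 * i + 1);
        }
    }
    out
}

/// One propagation step: attenuate every value by `delta`, then return the
/// frontier of entries that are still non-zero.
pub fn nibble_propagate_bfs(packed: &mut [u8], delta: u8) -> Vec<usize> {
    nibble_sub_clamp(packed, delta);
    nibble_above_threshold(packed, 0)
}
```

For example, starting from `[0xF1, 0x02, 0x30]` with `delta = 1`, the buffer becomes `[0xE0, 0x01, 0x20]` and the frontier is `[1, 2, 5]`.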

### 2D. `nibble_sub_clamp_avx512` — 64 bytes/iter (128 nibbles)
- `_mm512_subs_epu8` for saturating subtract
- **~35 new lines**

---

## Phase 3: AABB Module

### 3A. AVX-512 Batch Intersect — 16 candidates/iter
- Broadcast query, gather candidate coords, `_mm512_cmp_ps_mask`, AND 6 kmasks
- **~80 new lines**
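
The six-mask structure of 3A can be modelled in scalar code: build one 16-bit mask per comparison, then AND them together. The sketch below is a hypothetical reference for parity tests. The `Aabb` field layout and the inclusive overlap test (touching boxes count as intersecting) are assumptions.

```rust
/// Axis-aligned box; the field layout is an assumption for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Aabb {
    pub min: [f32; 3],
    pub max: [f32; 3],
}

/// Pairwise reference test (inclusive on the boundary).
pub fn intersects(a: &Aabb, b: &Aabb) -> bool {
    (0..3).all(|k| a.min[k] <= b.max[k] && a.max[k] >= b.min[k])
}

/// Scalar model of a 16-lane batch: bit i of the result is set when
/// candidate i overlaps `query`.
pub fn batch_intersect_mask16(query: &Aabb, candidates: &[Aabb]) -> u16 {
    assert!(candidates.len() <= 16, "one batch holds at most 16 candidates");
    // Start with one bit per valid lane, then AND in six comparison masks.
    let mut mask: u16 = if candidates.len() == 16 {
        u16::MAX
    } else {
        (1u16 << candidates.len()) - 1
    };
    for axis in 0..3 {
        let mut min_le_max = 0u16;
        let mut max_ge_min = 0u16;
        for (lane, c) in candidates.iter().enumerate() {
            min_le_max |= ((query.min[axis] <= c.max[axis]) as u16) << lane;
            max_ge_min |= ((query.max[axis] >= c.min[axis]) as u16) << lane;
        }
        mask &= min_le_max & max_ge_min;
    }
    mask
}
```

Starting from a valid-lane mask keeps a partially filled final batch from reporting hits in unused lanes.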

### 3B. Ray-AABB Slab Test — Projectile collision
- New `Ray` struct, slab method (t_enter/t_exit), scalar + AVX-512
- **~120 new lines**
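
The scalar slab method can be sketched as below. The fields of `Ray` are an assumption (the plan only says a `Ray` struct is new); storing the reciprocal direction is a common choice because it turns the per-axis divisions into multiplications.

```rust
/// Hypothetical ray layout: origin plus precomputed reciprocal direction.
#[derive(Clone, Copy, Debug)]
pub struct Ray {
    pub origin: [f32; 3],
    pub inv_dir: [f32; 3],
}

impl Ray {
    pub fn new(origin: [f32; 3], dir: [f32; 3]) -> Self {
        // A zero component becomes +/- infinity, which the slab test tolerates
        // as long as the origin is not exactly on that slab's boundary.
        Ray {
            origin,
            inv_dir: [1.0 / dir[0], 1.0 / dir[1], 1.0 / dir[2]],
        }
    }
}

/// Slab test: returns the entry parameter t_enter (clamped to >= 0) on a hit.
pub fn ray_aabb_slab(ray: &Ray, min: [f32; 3], max: [f32; 3]) -> Option<f32> {
    let mut t_enter = 0.0f32;
    let mut t_exit = f32::INFINITY;
    for k in 0..3 {
        let t0 = (min[k] - ray.origin[k]) * ray.inv_dir[k];
        let t1 = (max[k] - ray.origin[k]) * ray.inv_dir[k];
        // The nearer plane raises the entry time, the farther one lowers the exit time.
        t_enter = t_enter.max(t0.min(t1));
        t_exit = t_exit.min(t0.max(t1));
    }
    if t_enter <= t_exit {
        Some(t_enter)
    } else {
        None
    }
}
```

A ray from `(-1, 0.5, 0.5)` along `+x` enters the unit box at `t = 1`. The same ray started at `y = 2` misses, because the y slab interval is empty.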

---

## Phase 4: Spatial Hash SIMD Distance

- `batch_sq_dist_avx2` helper for inner loop
- New `query_radius_simd` method
- **~100 new lines**
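
A scalar reference for the distance helper and the radius filter could look like the sketch below. The structure-of-arrays inputs (`xs`, `ys`, `zs`) and the inclusive `<=` radius test are assumptions, and `batch_sq_dist_scalar` and `query_radius_scalar` are hypothetical names.

```rust
/// Squared distance from `center` to each point, written into `out`.
/// Points are given as separate coordinate slices (assumed SoA layout).
pub fn batch_sq_dist_scalar(
    center: [f32; 3],
    xs: &[f32],
    ys: &[f32],
    zs: &[f32],
    out: &mut [f32],
) {
    assert!(xs.len() == ys.len() && ys.len() == zs.len() && zs.len() == out.len());
    for i in 0..out.len() {
        let (dx, dy, dz) = (xs[i] - center[0], ys[i] - center[1], zs[i] - center[2]);
        out[i] = dx * dx + dy * dy + dz * dz;
    }
}

/// Indices of points within `radius` of `center` (boundary included).
pub fn query_radius_scalar(
    center: [f32; 3],
    xs: &[f32],
    ys: &[f32],
    zs: &[f32],
    radius: f32,
) -> Vec<usize> {
    let mut d2 = vec![0.0f32; xs.len()];
    batch_sq_dist_scalar(center, xs, ys, zs, &mut d2);
    // Compare squared values so no square root is needed.
    let r2 = radius * radius;
    d2.iter()
        .enumerate()
        .filter(|&(_, &d)| d <= r2)
        .map(|(i, _)| i)
        .collect()
}
```

Separate coordinate slices let a SIMD version load 8 `x` values with one instruction, where an array of `[f32; 3]` points would need a shuffle or gather first.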

---

## Phase 5: Jitson Templates

### 5A. TerrainFillParams — Baked biome params for JIT fill loop
### 5B. CompiledNoiseConfig — Flattened octave params for JIT compilation
- **~140 new lines combined**

---

## Phase 6: Wiring
- Re-export new types from `jitson/mod.rs`
- **~5 lines**

---

## Total: ~900 new lines across 8 files

## Dependency Graph

```
Phase 1 (parallel): byte_scan ────┐
                    prop_mask ────┼── Phase 2 (nibble) ── Phase 3 (aabb) ── Phase 4 (spatial)
                    palette_codec ┘                                                 │
                                                                            Phase 5 (jitson)
                                                                                    │
                                                                            Phase 6 (wire)
```
