| 1 | +# 3DGS SIMD Forward Renderer Plan — ndarray |
| 2 | + |
| 3 | +## Goal |
| 4 | + |
| 5 | +Productionize `ndarray::hpc::splat3d` as the CPU-SIMD forward renderer for 3D Gaussian Splatting. |
| 6 | + |
| 7 | +The renderer should be usable by higher layers as a deterministic kernel package: |
| 8 | + |
| 9 | +```text |
| 10 | +camera + splat block + render budget |
| 11 | + -> |
| 12 | +projected splat footprints + optional framebuffer / visibility report |
| 13 | +``` |
| 14 | + |
| 15 | +## Non-goals |
| 16 | + |
| 17 | +- Do not implement 3D Tiles parsing here. |
| 18 | +- Do not implement ArcGIS or Cesium service APIs here. |
| 19 | +- Do not own tile graph traversal policy here. |
| 20 | +- Do not require WGPU or GPU availability. |
| 21 | + |
| 22 | +## Core modules |
| 23 | + |
| 24 | +Target namespace: |
| 25 | + |
| 26 | +```text |
| 27 | +src/hpc/splat3d/ |
| 28 | + mod.rs |
| 29 | + types.rs |
| 30 | + camera.rs |
| 31 | + covariance.rs |
| 32 | + projection.rs |
| 33 | + ewa.rs |
| 34 | + raster.rs |
| 35 | + visibility.rs |
| 36 | + simd.rs |
| 37 | + report.rs |
| 38 | +``` |
| 39 | + |
| 40 | +## Required public DTOs |
| 41 | + |
| 42 | +```rust |
| 43 | +pub struct Splat3dCamera { |
| 44 | + pub view: [[f32; 4]; 4], |
| 45 | + pub proj: [[f32; 4]; 4], |
| 46 | + pub viewport_width: u32, |
| 47 | + pub viewport_height: u32, |
| 48 | + pub near: f32, |
| 49 | + pub far: f32, |
| 50 | +} |
| 51 | + |
| 52 | +pub struct Splat3dBlockView<'a> { |
| 53 | + pub pos_x: &'a [f32], |
| 54 | + pub pos_y: &'a [f32], |
| 55 | + pub pos_z: &'a [f32], |
| 56 | + pub scale_x: &'a [f32], |
| 57 | + pub scale_y: &'a [f32], |
| 58 | + pub scale_z: &'a [f32], |
| 59 | + pub quat_w: &'a [f32], |
| 60 | + pub quat_x: &'a [f32], |
| 61 | + pub quat_y: &'a [f32], |
| 62 | + pub quat_z: &'a [f32], |
| 63 | + pub opacity: &'a [f32], |
| 64 | + pub rgba: Option<&'a [[u8; 4]]>, |
| 65 | +} |
| 66 | + |
| 67 | +pub struct ProjectedSplat { |
| 68 | + pub screen_x: f32, |
| 69 | + pub screen_y: f32, |
| 70 | + pub depth: f32, |
| 71 | + pub radius_px: f32, |
| 72 | + pub covariance_2d: [f32; 3], |
| 73 | + pub opacity: f32, |
| 74 | + pub valid: bool, |
| 75 | +} |
| 76 | +``` |
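Because `Splat3dBlockView` is a set of independent column slices, a batch entry point probably wants to confirm that every column has the same length before it touches the hot path. The following is a minimal sketch of that check. The struct is repeated only so the block compiles on its own, and `checked_len` is a hypothetical helper name, not part of the planned API.

```rust
/// Copy of the DTO from the plan, repeated so this sketch is self-contained.
pub struct Splat3dBlockView<'a> {
    pub pos_x: &'a [f32],
    pub pos_y: &'a [f32],
    pub pos_z: &'a [f32],
    pub scale_x: &'a [f32],
    pub scale_y: &'a [f32],
    pub scale_z: &'a [f32],
    pub quat_w: &'a [f32],
    pub quat_x: &'a [f32],
    pub quat_y: &'a [f32],
    pub quat_z: &'a [f32],
    pub opacity: &'a [f32],
    pub rgba: Option<&'a [[u8; 4]]>,
}

impl<'a> Splat3dBlockView<'a> {
    /// Hypothetical helper: returns the shared column length, or `None`
    /// if any column (including the optional `rgba`) disagrees with `pos_x`.
    pub fn checked_len(&self) -> Option<usize> {
        let n = self.pos_x.len();
        let cols: [&[f32]; 10] = [
            self.pos_y, self.pos_z,
            self.scale_x, self.scale_y, self.scale_z,
            self.quat_w, self.quat_x, self.quat_y, self.quat_z,
            self.opacity,
        ];
        if cols.iter().any(|c| c.len() != n) {
            return None;
        }
        if let Some(rgba) = self.rgba {
            if rgba.len() != n {
                return None;
            }
        }
        Some(n)
    }
}
```

Returning `Option` instead of panicking keeps the check compatible with the no-panic rule for batch paths; a caller could map `None` to an error or an empty report.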
| 77 | + |
| 78 | +## Hot path |
| 79 | + |
| 80 | +1. Load SoA splat columns. |
| 81 | +2. SIMD transform world coordinates to view space. |
| 82 | +3. Reject splats that are behind the camera, outside the near/far range, or that contain NaN values, an invalid scale, or an invalid quaternion. |
| 83 | +4. Construct 3D SPD covariance from scale/quaternion. |
| 84 | +5. Project the covariance to screen space with the EWA sandwich (`J W Σ Wᵀ Jᵀ`). |
| 85 | +6. Compute projected screen footprint. |
| 86 | +7. Emit projected splat data or render into framebuffer. |
| 87 | +8. Return counters and failure reasons. |
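Steps 4 to 6 of the hot path can be sketched as a scalar reference for a single splat. This is an illustration under several assumptions, not the planned implementation: the function names are hypothetical, the covariance is taken to be already expressed in view space (so the `W` rotation is omitted), the camera looks down +z, `fx` and `fy` are focal lengths in pixels, `covariance_2d` is ordered `[xx, xy, yy]`, and the footprint radius is three standard deviations along the major axis.

```rust
type Mat3 = [[f32; 3]; 3];

/// Rotation matrix from a (w, x, y, z) quaternion; normalizes by the squared norm.
fn quat_to_rot(q: [f32; 4]) -> Option<Mat3> {
    let [w, x, y, z] = q;
    let n2 = w * w + x * x + y * y + z * z;
    if !n2.is_finite() || n2 < 1e-12 {
        return None; // zero-length or non-finite quaternion
    }
    let s = 2.0 / n2;
    Some([
        [1.0 - s * (y * y + z * z), s * (x * y - w * z), s * (x * z + w * y)],
        [s * (x * y + w * z), 1.0 - s * (x * x + z * z), s * (y * z - w * x)],
        [s * (x * z - w * y), s * (y * z + w * x), 1.0 - s * (x * x + y * y)],
    ])
}

/// Step 4: cov = R * diag(scale^2) * R^T, rejected on invalid scale or quaternion.
pub fn covariance_3d(scale: [f32; 3], quat: [f32; 4]) -> Option<Mat3> {
    if scale.iter().any(|s| !s.is_finite() || *s <= 0.0) {
        return None;
    }
    let r = quat_to_rot(quat)?;
    let mut cov = [[0.0f32; 3]; 3];
    for i in 0..3 {
        for j in 0..3 {
            for k in 0..3 {
                cov[i][j] += r[i][k] * scale[k] * scale[k] * r[j][k];
            }
        }
    }
    Some(cov)
}

fn dot3(a: [f32; 3], b: [f32; 3]) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// Steps 5 and 6: J * cov * J^T for view-space mean `t`, plus a 3-sigma radius.
/// Returns ([xx, xy, yy], radius_px), or None if the result is not positive definite.
pub fn ewa_project(cov: &Mat3, t: [f32; 3], fx: f32, fy: f32) -> Option<([f32; 3], f32)> {
    let [tx, ty, tz] = t;
    if !(tz > 0.0) {
        return None; // behind the camera (also catches NaN depth)
    }
    // Jacobian of the perspective projection at the splat center.
    let j = [
        [fx / tz, 0.0, -fx * tx / (tz * tz)],
        [0.0, fy / tz, -fy * ty / (tz * tz)],
    ];
    let mut m = [[0.0f32; 3]; 2]; // m = J * cov
    for i in 0..2 {
        for c in 0..3 {
            for k in 0..3 {
                m[i][c] += j[i][k] * cov[k][c];
            }
        }
    }
    let (a, b, c) = (dot3(m[0], j[0]), dot3(m[0], j[1]), dot3(m[1], j[1]));
    let det = a * c - b * b;
    if !(a > 0.0 && c > 0.0 && det > 0.0) {
        return None; // degenerate or non-finite 2D covariance
    }
    let mid = 0.5 * (a + c);
    let lambda_max = mid + (mid * mid - det).max(0.0).sqrt();
    Some(([a, b, c], 3.0 * lambda_max.sqrt()))
}
```

The sketch rejects anything that is not strictly positive definite, which is slightly stricter than the "PSD or rejected" criterion; a real kernel might instead add a small low-pass term to the diagonal before testing.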
| 88 | + |
| 89 | +## SIMD tiers |
| 90 | + |
| 91 | +Use the existing ndarray SIMD dispatch style: |
| 92 | + |
| 93 | +- scalar baseline |
| 94 | +- AVX2/FMA |
| 95 | +- AVX-512 |
| 96 | +- NEON |
| 97 | +- optional runtime-dispatch path |
| 98 | + |
| 99 | +Every SIMD kernel must have a scalar reference implementation and deterministic tests. |
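To make the scalar-reference rule concrete, here is a small sketch of one kernel with a scalar baseline, an AVX2/FMA variant, and a runtime dispatcher. It uses only `std::arch` and `is_x86_feature_detected!`, not ndarray's actual dispatch machinery, which this plan does not show. The function names are hypothetical, and the kernel assumes `row` is the third row of a row-major view matrix, so the output is the view-space depth of each splat.

```rust
/// Scalar reference: out[i] = row . (x[i], y[i], z[i], 1).
pub fn view_depth_scalar(row: [f32; 4], x: &[f32], y: &[f32], z: &[f32], out: &mut [f32]) {
    for i in 0..out.len() {
        out[i] = row[0] * x[i] + row[1] * y[i] + row[2] * z[i] + row[3];
    }
}

/// AVX2/FMA variant: 8 lanes per iteration, scalar tail for the remainder.
/// Safety: the caller must ensure AVX2 and FMA are available and that
/// x, y, z are at least as long as out.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn view_depth_avx2(row: [f32; 4], x: &[f32], y: &[f32], z: &[f32], out: &mut [f32]) {
    use std::arch::x86_64::*;
    let n = out.len();
    let r0 = _mm256_set1_ps(row[0]);
    let r1 = _mm256_set1_ps(row[1]);
    let r2 = _mm256_set1_ps(row[2]);
    let r3 = _mm256_set1_ps(row[3]);
    let mut i = 0;
    while i + 8 <= n {
        let vx = _mm256_loadu_ps(x.as_ptr().add(i));
        let vy = _mm256_loadu_ps(y.as_ptr().add(i));
        let vz = _mm256_loadu_ps(z.as_ptr().add(i));
        // r0*x + (r1*y + (r2*z + r3)), each step a fused multiply-add.
        let acc = _mm256_fmadd_ps(r0, vx, _mm256_fmadd_ps(r1, vy, _mm256_fmadd_ps(r2, vz, r3)));
        _mm256_storeu_ps(out.as_mut_ptr().add(i), acc);
        i += 8;
    }
    view_depth_scalar(row, &x[i..], &y[i..], &z[i..], &mut out[i..]);
}

/// Dispatcher: picks the widest available tier at runtime, else the scalar baseline.
pub fn view_depth(row: [f32; 4], x: &[f32], y: &[f32], z: &[f32], out: &mut [f32]) {
    assert!(x.len() == out.len() && y.len() == out.len() && z.len() == out.len());
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            // Safe to call: features were just detected and lengths were checked.
            unsafe { view_depth_avx2(row, x, y, z, out) };
            return;
        }
    }
    view_depth_scalar(row, x, y, z, out);
}
```

Fused multiply-add rounds once per step while the scalar loop rounds after each multiply and add, so the two paths can differ in the last bits. That is why the comparison tests need an explicit tolerance instead of exact equality.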
| 100 | + |
| 101 | +## Error handling |
| 102 | + |
| 103 | +Do not panic on malformed splats in batch paths. Return counters: |
| 104 | + |
| 105 | +```rust |
| 106 | +pub struct Splat3dRenderReport { |
| 107 | + pub input_count: usize, |
| 108 | + pub projected_count: usize, |
| 109 | + pub rejected_behind_camera: usize, |
| 110 | + pub rejected_invalid_covariance: usize, |
| 111 | + pub rejected_too_small: usize, |
| 112 | + pub rejected_outside_view: usize, |
| 113 | + pub max_radius_px: f32, |
| 114 | + pub min_depth: f32, |
| 115 | + pub max_depth: f32, |
| 116 | +} |
| 117 | +``` |
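One way to fill these counters without panicking is to have each splat produce either a projected result or a rejection reason, and to fold that into the report. The sketch below is illustrative only: the `Reject` enum, `new`, `record`, and `is_consistent` are hypothetical names, and it assumes NaN inputs, invalid scales, and invalid quaternions are all counted under `rejected_invalid_covariance`, since the report has no dedicated counters for them.

```rust
/// Hypothetical per-splat rejection reason, one variant per counter.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Reject {
    BehindCamera,
    InvalidCovariance,
    TooSmall,
    OutsideView,
}

/// Copy of the report struct from the plan, repeated so the sketch compiles alone.
#[derive(Debug)]
pub struct Splat3dRenderReport {
    pub input_count: usize,
    pub projected_count: usize,
    pub rejected_behind_camera: usize,
    pub rejected_invalid_covariance: usize,
    pub rejected_too_small: usize,
    pub rejected_outside_view: usize,
    pub max_radius_px: f32,
    pub min_depth: f32,
    pub max_depth: f32,
}

impl Splat3dRenderReport {
    /// Empty report; depth bounds start at +/- infinity so the first
    /// projected splat always replaces them.
    pub fn new() -> Self {
        Self {
            input_count: 0,
            projected_count: 0,
            rejected_behind_camera: 0,
            rejected_invalid_covariance: 0,
            rejected_too_small: 0,
            rejected_outside_view: 0,
            max_radius_px: 0.0,
            min_depth: f32::INFINITY,
            max_depth: f32::NEG_INFINITY,
        }
    }

    /// Folds one splat outcome into the counters.
    /// `Ok` carries (depth, radius_px) of a successfully projected splat.
    pub fn record(&mut self, outcome: Result<(f32, f32), Reject>) {
        self.input_count += 1;
        match outcome {
            Ok((depth, radius_px)) => {
                self.projected_count += 1;
                self.max_radius_px = self.max_radius_px.max(radius_px);
                self.min_depth = self.min_depth.min(depth);
                self.max_depth = self.max_depth.max(depth);
            }
            Err(Reject::BehindCamera) => self.rejected_behind_camera += 1,
            Err(Reject::InvalidCovariance) => self.rejected_invalid_covariance += 1,
            Err(Reject::TooSmall) => self.rejected_too_small += 1,
            Err(Reject::OutsideView) => self.rejected_outside_view += 1,
        }
    }

    /// Invariant worth testing: every input is either projected or rejected once.
    pub fn is_consistent(&self) -> bool {
        self.input_count
            == self.projected_count
                + self.rejected_behind_camera
                + self.rejected_invalid_covariance
                + self.rejected_too_small
                + self.rejected_outside_view
    }
}
```

With this shape, an empty batch reports `min_depth = +inf` and `max_depth = -inf`, so callers should check `projected_count` before reading the depth range.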
| 118 | + |
| 119 | +## Acceptance criteria |
| 120 | + |
| 121 | +- `cargo test -p ndarray --features std,linalg,splat3d` passes. |
| 122 | +- Scalar and SIMD paths match within explicit tolerances. |
| 123 | +- Invalid splats are counted, not fatal. |
| 124 | +- EWA covariance remains PSD or is rejected with reason. |
| 125 | +- Renderer can process a columnar block without heap allocation in the inner loop. |
| 126 | +- Benchmarks include 1k, 10k, 100k, and 1M splat projection-only workloads. |
| 127 | + |
| 128 | +## Follow-up hooks |
| 129 | + |
| 130 | +- Connect to `src/hpc/pillar/ewa_sandwich_3d.rs` for certification. |
| 131 | +- Expose projection-only mode for `lance-graph` tile preflight. |
| 132 | +- Expose framebuffer mode for CPU preview / headless validation. |