sigma_packed: &[f32], // length 6*N (upper-triangle Σ per Gaussian)
+
cholesky_scratch: &mut [f32], // length 6*N (caller-provided; holds packed L per Gaussian)
+
out_dist_sq: &mut [f32], // length M*N (row-major)
);
```
-
**Implementation note:** internally calls `batched_cholesky_3x3` once on `sigma_packed`, caches L (heap-free via stack or caller-provided scratch), then triangular-solve + squared norm per (m, n) pair.
+
**Implementation note:** internally calls `batched_cholesky_3x3` on `sigma_packed` once per call, writing packed L into `cholesky_scratch` (caller-provided; zero-allocation contract). The caller sizes the buffer as `6 * N` `f32` elements (`6 * N * size_of::<f32>()` bytes); for `N = 1_000_000` Gaussians this is **24 MB (about 22.9 MiB)**, which is not stack-feasible, so callers must allocate it once at engine init and reuse it across frames (matches the `splat-fit` / registration loop pattern). For small `N` (e.g. `N ≤ 8192`, which needs 192 KiB) callers MAY pass a stack-resident buffer. The function MUST NOT allocate internally.
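
The Cholesky-then-triangular-solve path described in this note can be illustrated with a scalar sketch for a single Gaussian. This is not the `batched_cholesky_3x3` kernel itself: the function names `cholesky_3x3_packed` and `mahalanobis_sq`, and both packing orders (Σ as `[s00, s01, s02, s11, s12, s22]`, L as `[l00, l10, l11, l20, l21, l22]`), are assumptions made for illustration only.

```rust
/// Hypothetical scalar reference: factor one packed symmetric 3x3 matrix
/// Σ = L Lᵀ. Assumed input packing (upper triangle, row-major):
/// [s00, s01, s02, s11, s12, s22]. Assumed output packing (lower triangle,
/// row-major): [l00, l10, l11, l20, l21, l22].
/// Returns None if Σ is not positive definite (or contains NaN).
fn cholesky_3x3_packed(s: &[f32; 6]) -> Option<[f32; 6]> {
    let [s00, s01, s02, s11, s12, s22] = *s;
    // `!(x > 0.0)` also rejects NaN, unlike `x <= 0.0`.
    if !(s00 > 0.0) {
        return None;
    }
    let l00 = s00.sqrt();
    let l10 = s01 / l00;
    let l20 = s02 / l00;
    let d11 = s11 - l10 * l10;
    if !(d11 > 0.0) {
        return None;
    }
    let l11 = d11.sqrt();
    let l21 = (s12 - l20 * l10) / l11;
    let d22 = s22 - l20 * l20 - l21 * l21;
    if !(d22 > 0.0) {
        return None;
    }
    Some([l00, l10, l11, l20, l21, d22.sqrt()])
}

/// Squared Mahalanobis distance dᵀ Σ⁻¹ d computed as ‖L⁻¹ d‖²:
/// solve L y = d by forward substitution, then take the squared norm of y.
/// `d` is the difference between a query point and the Gaussian mean.
fn mahalanobis_sq(l: &[f32; 6], d: [f32; 3]) -> f32 {
    let [l00, l10, l11, l20, l21, l22] = *l;
    let y0 = d[0] / l00;
    let y1 = (d[1] - l10 * y0) / l11;
    let y2 = (d[2] - l20 * y0 - l21 * y1) / l22;
    y0 * y0 + y1 * y1 + y2 * y2
}
```

Neither function allocates, which is consistent with the zero-allocation contract; a batched version would presumably write each factor into the caller's `cholesky_scratch` at offset `6 * n` rather than returning an array.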

**Tests:**
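- Reference comparison against SciPy's `scipy.spatial.distance.mahalanobis` on random points + Σ (SciPy takes Σ⁻¹ and returns the unsquared distance, so square its result before comparing with `out_dist_sq`).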
sorted_amplitudes: &[f32], // flat; contains all rays' samples concatenated
+
ray_offsets: &[u32], // length = n_rays + 1 (CSR-style); ray r's range is [ray_offsets[r]..ray_offsets[r+1])
+
opacity_lut: &[u8; 256],
+
out_alpha: &mut [u8], // length = n_rays
);
```

+
**Per-ray segmentation contract.** A renderer composites `n_rays` independent view rays per frame; each ray has its own front-to-back-sorted Gaussian sequence. `ray_offsets` is a CSR-style prefix-sum (length `n_rays + 1`), so ray `r`'s amplitudes are `sorted_amplitudes[ray_offsets[r] as usize..ray_offsets[r+1] as usize]` and `out_alpha[r]` is its composited alpha. Constraints:
+
- `ray_offsets[0] == 0` and `ray_offsets[n_rays] == sorted_amplitudes.len() as u32` (assert-on-debug).
+
- A ray with `ray_offsets[r] == ray_offsets[r+1]` (empty) yields `out_alpha[r] = 0`.
+
- Per-frame amplitude quantization (the 256-bucket LUT input) is computed by the caller from the per-frame max amplitude; `opacity_lut` is a frame-global constant for that pass.
+

+
**Implementation note:** the SIMD inner loop processes one ray's range as a contiguous front-to-back sweep; rays are independent (no cross-ray data dependence), so the outer ray loop is trivially parallelizable.
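
The per-ray segmentation contract can be sketched as a scalar reference loop over the CSR-style offsets. Everything beyond the parameter names is an assumption for illustration: the function name `composite_rays_scalar` is hypothetical, amplitudes are assumed to be already normalized to `[0, 1]` by the caller, the bucket is assumed to be `round(a * 255)`, and the blend is assumed to be standard front-to-back accumulation of transmittance. The actual kernel may quantize and blend differently.

```rust
/// Hypothetical scalar reference for per-ray compositing.
/// Assumptions (not taken from the spec): amplitudes are pre-normalized to
/// [0, 1]; the LUT bucket is round(a * 255); opacity is lut[bucket] / 255;
/// transmittance is multiplied by (1 - opacity) for each sample in order.
fn composite_rays_scalar(
    sorted_amplitudes: &[f32],
    ray_offsets: &[u32],
    opacity_lut: &[u8; 256],
    out_alpha: &mut [u8],
) {
    let n_rays = out_alpha.len();
    assert_eq!(ray_offsets.len(), n_rays + 1);
    // Invariants from the contract, checked in debug builds only.
    debug_assert_eq!(ray_offsets[0], 0);
    debug_assert_eq!(ray_offsets[n_rays] as usize, sorted_amplitudes.len());

    for (r, out) in out_alpha.iter_mut().enumerate() {
        let start = ray_offsets[r] as usize;
        let end = ray_offsets[r + 1] as usize;
        // An empty range leaves transmittance at 1.0, so alpha stays 0.
        let mut transmittance = 1.0f32;
        for &a in &sorted_amplitudes[start..end] {
            let bucket = (a.clamp(0.0, 1.0) * 255.0).round() as usize;
            let opacity = opacity_lut[bucket] as f32 / 255.0;
            transmittance *= 1.0 - opacity;
        }
        *out = ((1.0 - transmittance) * 255.0).round() as u8;
    }
}
```

Because each iteration of the outer loop reads only its own `start..end` range and writes only its own output element, splitting the ray range across threads requires no synchronization under these assumptions.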
+

**Tests:**
-
- Reference comparison against scalar reference for known sequences.
+
- Reference comparison against scalar reference for known sequences (single-ray + multi-ray).
- Saturation at full opacity (sequence of high-amplitude Gaussians → α = 255).
-
- Empty sequence → α = 0.
+
- Empty ray (`ray_offsets[r] == ray_offsets[r+1]`) → α = 0.
+
- Multi-ray independence (concatenated rays produce same per-ray output as separate single-ray calls).
+
- `ray_offsets` invariant violations (debug assert on `ray_offsets[0] != 0` or `ray_offsets[n_rays] != sorted_amplitudes.len()`).