Commit 0c1fb05
docs(splat-native): address review feedback on #212 (2 fixes)
Follow-up to PR #212 (merged). Addresses two codex review findings.
## Fix 1 — `batched_opacity_blend` needs ray segmentation (codex P1)
The original signature took a single flat `sorted_amplitudes` slice
and wrote `out_alpha[ray]`, but carried no per-ray boundaries. A
renderer composites N independent view rays per frame, each with its
own front-to-back-sorted Gaussian sequence. Without those boundaries
the implementation cannot know which Gaussians belong to which output
pixel, so it would either composite the same global sequence for
every ray or guess boundaries outside the API.
Adds a CSR-style `ray_offsets: &[u32]` prefix-sum (length `n_rays + 1`)
that segments the flat amplitude buffer into per-ray ranges.
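
As a rough illustration of the CSR layout, the sketch below builds
`ray_offsets` from per-ray Gaussian counts with a running prefix sum and
recovers one ray's index range. The helper names `build_ray_offsets` and
`ray_range` and the `counts` input are assumptions for illustration; the
plan-spec only fixes the `ray_offsets: &[u32]` shape.

```rust
/// Hypothetical helper (not part of the plan-spec): turns per-ray Gaussian
/// counts into a CSR-style prefix sum of length `counts.len() + 1`.
fn build_ray_offsets(counts: &[u32]) -> Vec<u32> {
    let mut offsets = Vec::with_capacity(counts.len() + 1);
    let mut running = 0u32;
    offsets.push(running); // ray_offsets[0] == 0
    for &c in counts {
        running += c;
        offsets.push(running); // last entry ends up equal to the total count
    }
    offsets
}

/// Hypothetical helper: half-open index range of ray `r` inside the flat
/// amplitude buffer. Equal neighbouring offsets give an empty range.
fn ray_range(ray_offsets: &[u32], r: usize) -> std::ops::Range<usize> {
    ray_offsets[r] as usize..ray_offsets[r + 1] as usize
}
```

For counts `[3, 0, 2]` this yields offsets `[0, 3, 3, 5]`: ray 1 is
empty and ray 2 covers indices `3..5` of the flat buffer.
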
Documented contract:
- `ray_offsets[0] == 0` and `ray_offsets[n_rays] == sorted_amplitudes.len()`
- Empty ray (`ray_offsets[r] == ray_offsets[r+1]`) yields `out_alpha[r] = 0`
- Rays are independent (no cross-ray data dependence) — outer loop
is trivially parallelizable
- Per-frame amplitude quantization is caller-side; `opacity_lut` is
a frame-global constant for that pass
Adds three new tests:
- Multi-ray independence (concatenated rays match per-ray calls)
- Empty-ray boundary case (→ α = 0)
- `ray_offsets` invariant debug-asserts
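
A minimal sketch of how the segmented blend could look under the
contract above. Two things here are assumptions, not taken from the
plan-spec: amplitudes are modelled as `u8` indices into a 256-entry
`opacity_lut` of `f32` opacities, and the blend is standard
front-to-back accumulation (`alpha += (1 - alpha) * a`). The real
element types and blend formula may differ.

```rust
/// Sketch only. Assumed (not from the plan-spec): amplitudes are `u8`
/// indices into a 256-entry opacity LUT, and blending is standard
/// front-to-back alpha accumulation.
fn batched_opacity_blend(
    sorted_amplitudes: &[u8], // flat buffer, front-to-back within each ray
    ray_offsets: &[u32],      // length n_rays + 1
    opacity_lut: &[f32; 256], // frame-global constant for this pass
    out_alpha: &mut [f32],    // length n_rays
) {
    let n_rays = out_alpha.len();
    debug_assert_eq!(ray_offsets.len(), n_rays + 1);
    debug_assert_eq!(ray_offsets[0], 0);
    debug_assert_eq!(ray_offsets[n_rays] as usize, sorted_amplitudes.len());
    // Non-decreasing offsets are implied by the prefix-sum construction.
    debug_assert!(ray_offsets.windows(2).all(|w| w[0] <= w[1]));

    // Each iteration touches only its own ray's range and output slot,
    // so this outer loop has no cross-ray dependence.
    for (r, alpha_out) in out_alpha.iter_mut().enumerate() {
        let range = ray_offsets[r] as usize..ray_offsets[r + 1] as usize;
        let mut alpha = 0.0f32; // an empty ray stays at 0
        for &q in &sorted_amplitudes[range] {
            alpha += (1.0 - alpha) * opacity_lut[q as usize];
        }
        *alpha_out = alpha;
    }
}
```

Calling it once on concatenated rays and once per ray with offsets
`[0, len]` should produce the same `out_alpha` values, which is the
property the multi-ray independence test checks.
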
## Fix 2 — `batched_mahalanobis` needs scratch buffer for Cholesky cache (codex P2)
The original implementation note said the Cholesky factor L was
"heap-free via stack or caller-provided scratch", but the public
signature had no scratch parameter. At the documented `N = 1_000_000`
bench size, the Cholesky cache is `6 * N * size_of::<f32>()` =
24,000,000 bytes (24 MB, about 22.9 MiB), which is not
stack-feasible. The function would either have to allocate
internally (breaking the zero-allocation contract) or recompute
factors per query (breaking the throughput contract).
Adds explicit `cholesky_scratch: &mut [f32]` parameter (length `6*N`)
with documented sizing guidance:
- `N ≤ 8192` MAY use a stack-resident buffer
- `N > 8192` MUST allocate once at engine init and re-use across frames
- The function MUST NOT allocate internally
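
The sizing rule can be checked with a few lines of arithmetic. The
constant and function names below are hypothetical; only the `6*N`
length and the `f32` element type come from the text above.

```rust
use std::mem::size_of;

/// Floats cached per Gaussian (the `6` in the `6*N` scratch length).
const FACTORS_PER_GAUSSIAN: usize = 6;

/// Hypothetical helper: required `cholesky_scratch` length in elements.
const fn cholesky_scratch_len(n: usize) -> usize {
    FACTORS_PER_GAUSSIAN * n
}

/// Hypothetical helper: the same requirement expressed in bytes.
const fn cholesky_scratch_bytes(n: usize) -> usize {
    cholesky_scratch_len(n) * size_of::<f32>()
}
```

At the `N = 8192` threshold this comes to 196,608 bytes (192 KiB); at
`N = 1_000_000` it comes to 24,000,000 bytes.
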
Matches the `splat-fit` engine and registration-loop pattern where
the scratch is allocated once per `SplatFitActor` mailbox at boot.
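
A rough sketch of how a caller-provided scratch might be used. Several
details are assumptions rather than the plan-spec: 3-D Gaussians, each
covariance packed as its lower triangle `[s00, s10, s11, s20, s21, s22]`
(which would account for the six floats per Gaussian), a single query
point, and the parameter names other than `cholesky_scratch`. Inputs are
assumed positive-definite; degenerate covariances are not handled here.

```rust
/// Sketch only; see the assumptions listed in the lead-in.
fn batched_mahalanobis(
    means: &[[f32; 3]],           // N means
    covs_packed: &[[f32; 6]],     // N packed lower-triangle covariances
    query: [f32; 3],              // point to measure against every Gaussian
    cholesky_scratch: &mut [f32], // length 6 * N, owned by the caller
    out_d2: &mut [f32],           // N squared Mahalanobis distances
) {
    let n = means.len();
    debug_assert_eq!(covs_packed.len(), n);
    debug_assert_eq!(out_d2.len(), n);
    debug_assert_eq!(cholesky_scratch.len(), 6 * n);

    // Pass 1: factor each covariance as L * L^T into the caller's scratch.
    // No allocation happens inside this function.
    for (s, l) in covs_packed.iter().zip(cholesky_scratch.chunks_exact_mut(6)) {
        let l00 = s[0].sqrt();
        let l10 = s[1] / l00;
        let l11 = (s[2] - l10 * l10).sqrt();
        let l20 = s[3] / l00;
        let l21 = (s[4] - l20 * l10) / l11;
        let l22 = (s[5] - l20 * l20 - l21 * l21).sqrt();
        l.copy_from_slice(&[l00, l10, l11, l20, l21, l22]);
    }

    // Pass 2: solve L y = (query - mean) by forward substitution;
    // the squared distance is |y|^2.
    let factors = cholesky_scratch.chunks_exact(6);
    for ((m, l), d2) in means.iter().zip(factors).zip(out_d2.iter_mut()) {
        let d = [query[0] - m[0], query[1] - m[1], query[2] - m[2]];
        let y0 = d[0] / l[0];
        let y1 = (d[1] - l[1] * y0) / l[2];
        let y2 = (d[2] - l[3] * y0 - l[4] * y1) / l[5];
        *d2 = y0 * y0 + y1 * y1 + y2 * y2;
    }
}
```

With a single query the cache buys nothing; the point of keeping the
factors in `cholesky_scratch` is that a multi-query variant could run
pass 1 once and repeat only pass 2.
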
## What's NOT in this PR
- Source code: still none. Plan-spec only.
- The W1c primitive-addition contract (all three backends mandatory,
parity tests gate, VPABSB-correction-style degenerate-input
documentation) is unchanged — the fix updates the two signatures
but not the testing or backend invariants.
## Test plan
- [x] Codex P1 (ray segmentation) — added per-ray offset + 3 new tests
to the contract.
- [x] Codex P2 (Mahalanobis scratch) — added `cholesky_scratch`
parameter + sizing note + zero-allocation contract.
- [x] Signatures reformatted (one argument per line, each with a
size comment) for readability.
- [ ] Codex re-review on this PR.

1 parent 481205a commit 0c1fb05
1 file changed
Lines changed: 21 additions & 5 deletions