| 1 | +# 3DGS Validation and Benchmark Plan — ndarray |
| 2 | + |
| 3 | +## Goal |
| 4 | + |
| 5 | +Create the reproducibility, correctness, and performance harness for the ndarray-side 3DGS work. |
| 6 | + |
| 7 | +This plan covers tests and benchmarks for: |
| 8 | + |
| 9 | +- splat covariance construction |
| 10 | +- EWA projection |
| 11 | +- CPU-SIMD projection/raster kernels |
| 12 | +- pillar certificates |
| 13 | +- quantized codecs |
| 14 | +- HHTL cascade decisions |
| 15 | + |
| 16 | +## Test tiers |
| 17 | + |
| 18 | +### Tier 0: compile gates |
| 19 | + |
| 20 | +Required gates: |
| 21 | + |
```bash
cargo check -p ndarray --features std,linalg,splat3d
cargo check -p ndarray --features std,linalg,pillar,splat3d
cargo test -p ndarray --features std,linalg,pillar,splat3d
```
| 27 | + |
| 28 | +### Tier 1: scalar reference tests |
| 29 | + |
| 30 | +Every SIMD path must have a scalar reference. |
| 31 | + |
| 32 | +Required tests: |
| 33 | + |
| 34 | +- scalar covariance construction |
| 35 | +- scalar EWA projection |
| 36 | +- scalar frustum rejection |
| 37 | +- scalar HHTL decision |
| 38 | +- scalar codec roundtrip |
| 39 | + |
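As an illustration of what a scalar reference could look like, here is a minimal sketch of scalar covariance construction using the standard 3DGS form Σ = R·diag(s)²·Rᵀ. The names `Mat3`, `quat_to_mat3`, and `covariance_scalar` are hypothetical and not taken from the ndarray crate; the quaternion order (w, x, y, z) is also an assumption.

```rust
/// Row-major 3x3 matrix (hypothetical alias, not an ndarray type).
type Mat3 = [[f32; 3]; 3];

/// Rotation matrix from a quaternion given as (w, x, y, z).
/// The quaternion is normalized first so non-unit input still yields a rotation.
fn quat_to_mat3(q: [f32; 4]) -> Mat3 {
    let n = (q[0] * q[0] + q[1] * q[1] + q[2] * q[2] + q[3] * q[3]).sqrt();
    let (w, x, y, z) = (q[0] / n, q[1] / n, q[2] / n, q[3] / n);
    [
        [1.0 - 2.0 * (y * y + z * z), 2.0 * (x * y - w * z), 2.0 * (x * z + w * y)],
        [2.0 * (x * y + w * z), 1.0 - 2.0 * (x * x + z * z), 2.0 * (y * z - w * x)],
        [2.0 * (x * z - w * y), 2.0 * (y * z + w * x), 1.0 - 2.0 * (x * x + y * y)],
    ]
}

/// Scalar reference: Sigma[i][j] = sum_k R[i][k] * s[k]^2 * R[j][k].
/// Written as plain loops so it is easy to audit against a SIMD path.
fn covariance_scalar(quat: [f32; 4], scale: [f32; 3]) -> Mat3 {
    let r = quat_to_mat3(quat);
    let mut sigma = [[0.0f32; 3]; 3];
    for i in 0..3 {
        for j in 0..3 {
            for k in 0..3 {
                sigma[i][j] += r[i][k] * scale[k] * scale[k] * r[j][k];
            }
        }
    }
    sigma
}
```

With the identity quaternion the result is simply the diagonal of squared scales, which makes a convenient first test case.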
| 40 | +### Tier 2: SIMD equivalence tests |
| 41 | + |
| 42 | +For each supported SIMD tier: |
| 43 | + |
| 44 | +- compare projected coordinates within tolerance |
| 45 | +- compare projected covariance within tolerance |
| 46 | +- compare rejection masks exactly where possible |
| 47 | +- compare aggregate counters exactly |
| 48 | + |
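The comparisons above could be collected in one helper. The sketch below is an assumption about how such a check might be shaped; `max_abs_diff`, `Equivalence`, and `check_equivalence` are hypothetical names, and the tolerance is supplied by the caller rather than taken from the plan.

```rust
/// Largest absolute element-wise difference between two equally sized slices.
/// A NaN difference is reported as infinity so it can never pass a tolerance check.
fn max_abs_diff(reference: &[f32], candidate: &[f32]) -> f32 {
    assert_eq!(reference.len(), candidate.len(), "length mismatch");
    let mut worst = 0.0f32;
    for (a, b) in reference.iter().zip(candidate) {
        let d = (a - b).abs();
        if d.is_nan() {
            return f32::INFINITY;
        }
        worst = worst.max(d);
    }
    worst
}

/// Outcome of comparing one SIMD tier against the scalar reference.
#[derive(Debug)]
struct Equivalence {
    worst_error: f32,
    within_tolerance: bool,
    masks_equal: bool,
}

/// Floating-point outputs are compared within `tol`; rejection masks are compared exactly.
fn check_equivalence(
    ref_values: &[f32],
    simd_values: &[f32],
    ref_mask: &[bool],
    simd_mask: &[bool],
    tol: f32,
) -> Equivalence {
    let worst_error = max_abs_diff(ref_values, simd_values);
    Equivalence {
        worst_error,
        within_tolerance: worst_error <= tol,
        masks_equal: ref_mask == simd_mask,
    }
}
```

Keeping `worst_error` in the result lets a failing test print how far the SIMD path drifted, not only that it drifted.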
| 49 | +### Tier 3: deterministic pillar probes |
| 50 | + |
| 51 | +Run the active pillar probes twice in the same test and assert that the two reports agree on the properties below. |
| 52 | + |
| 53 | +Required properties: |
| 54 | + |
| 55 | +- seed preserved |
| 56 | +- sample count preserved |
| 57 | +- pass/fail preserved |
| 58 | +- PSD rate stable |
| 59 | +- concentration values stable within tolerance |
| 60 | + |
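A run-twice determinism check might look like the following sketch. Everything here is a stand-in: `SplitMix64`, `ProbeReport`, and `run_probe` are hypothetical, the probe samples random 2x2 covariances built as L·Lᵀ, and the small negative slack in the PSD test is an assumed allowance for rounding, not a value from the plan.

```rust
/// SplitMix64: a small PRNG whose output depends only on the seed.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform sample in [-1, 1).
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64 * 2.0 - 1.0
    }
}

/// Hypothetical report shape; the seed and sample count travel with the result.
#[derive(Debug, Clone, PartialEq)]
struct ProbeReport {
    seed: u64,
    samples: u32,
    psd_rate: f64,
    passed: bool,
}

/// PSD test for the symmetric matrix [[a, b], [b, c]], with slack for rounding.
fn is_psd_2x2(a: f64, b: f64, c: f64) -> bool {
    a >= 0.0 && c >= 0.0 && a * c - b * b >= -1e-12
}

fn run_probe(seed: u64, samples: u32) -> ProbeReport {
    let mut rng = SplitMix64(seed);
    let mut psd_count = 0u32;
    for _ in 0..samples {
        // Covariance L * L^T from a random lower-triangular L.
        let (l00, l10, l11) = (rng.next_f64(), rng.next_f64(), rng.next_f64());
        if is_psd_2x2(l00 * l00, l00 * l10, l10 * l10 + l11 * l11) {
            psd_count += 1;
        }
    }
    // max(1) avoids a NaN rate (which would break equality) when samples == 0.
    let psd_rate = psd_count as f64 / samples.max(1) as f64;
    ProbeReport { seed, samples, psd_rate, passed: psd_rate >= 0.99 }
}
```

Because the report derives `PartialEq`, a test can call `run_probe` twice with the same seed and compare the two reports with a single `assert_eq!`.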
| 61 | +### Tier 4: golden vectors |
| 62 | + |
| 63 | +Add small committed fixtures: |
| 64 | + |
```text
tests/fixtures/3dgs/
  tiny_splats_f32.json
  tiny_camera.json
  projected_reference.json
  quantized_reference.json
```
| 72 | + |
| 73 | +Fixtures should be small enough for code review. |
| 74 | + |
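To show the shape of a golden-vector check without pulling in a JSON parser, the sketch below embeds a tiny fixture as constants; in the real harness the values would presumably be loaded from the files listed above. The i16 codec (`quantize_i16`, `dequantize_i16`, scale 32767 over the range [-1, 1]) is an assumed example and not the codec defined by this work.

```rust
/// Assumed codec: map [-1, 1] onto the symmetric i16 range.
const I16_SCALE: f32 = 32767.0;

fn quantize_i16(x: f32) -> i16 {
    (x.clamp(-1.0, 1.0) * I16_SCALE).round() as i16
}

fn dequantize_i16(q: i16) -> f32 {
    q as f32 / I16_SCALE
}

/// Stand-in for an input fixture: small enough to verify by hand.
const GOLDEN_INPUT: [f32; 5] = [0.0, 1.0, -1.0, 0.25, -0.75];
/// Stand-in for the committed quantized reference.
const GOLDEN_QUANTIZED: [i16; 5] = [0, 32767, -32767, 8192, -24575];

/// Checks both the exact quantized values and the declared roundtrip bound
/// (half a quantization step for a round-to-nearest codec).
fn check_golden() -> Result<(), String> {
    let bound = 0.5 / I16_SCALE;
    for (i, (&x, &want)) in GOLDEN_INPUT.iter().zip(GOLDEN_QUANTIZED.iter()).enumerate() {
        let got = quantize_i16(x);
        if got != want {
            return Err(format!("entry {i}: quantized {got}, expected {want}"));
        }
        let err = (dequantize_i16(got) - x).abs();
        if err > bound {
            return Err(format!("entry {i}: roundtrip error {err} exceeds {bound}"));
        }
    }
    Ok(())
}
```

Returning a message that names the failing entry keeps the fixture useful in review: a reader can open the file, find the entry, and recompute it by hand.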
| 75 | +## Benchmark groups |
| 76 | + |
| 77 | +Use Criterion where already available. |
| 78 | + |
| 79 | +Suggested benchmarks: |
| 80 | + |
```text
splat3d_project_1k
splat3d_project_10k
splat3d_project_100k
splat3d_project_1m
splat3d_codec_f32_to_f16
splat3d_codec_tile_i16
hhtl_heel_frustum_100k
hhtl_full_cascade_100k
pillar_ewa_3d_probe
```
| 92 | + |
| 93 | +## Metrics to record |
| 94 | + |
| 95 | +- splats projected per second |
| 96 | +- rejected splats per second |
| 97 | +- blocks classified per second |
| 98 | +- bytes read per projected splat |
| 99 | +- allocations per call |
| 100 | +- max covariance error |
| 101 | +- max projected coordinate error |
| 102 | +- certificate generation time |
| 103 | + |
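The per-second metrics reduce to an item count divided by elapsed time. The std-only sketch below illustrates that calculation; it is not a replacement for Criterion, and `throughput_per_sec` and `time_kernel` are hypothetical helper names.

```rust
use std::time::{Duration, Instant};

/// Items processed per second. A zero-length duration yields infinity,
/// so callers should run enough iterations for the timer to register.
fn throughput_per_sec(items: u64, elapsed: Duration) -> f64 {
    items as f64 / elapsed.as_secs_f64()
}

/// Runs `kernel` `iterations` times and reports items per second,
/// assuming each call processes `items_per_call` items (for example, splats).
fn time_kernel<F: FnMut()>(items_per_call: u64, iterations: u32, mut kernel: F) -> f64 {
    let start = Instant::now();
    for _ in 0..iterations {
        kernel();
    }
    throughput_per_sec(items_per_call * u64::from(iterations), start.elapsed())
}
```

For example, 1000 splats projected in 500 ms gives `throughput_per_sec(1000, Duration::from_millis(500))`, which is 2000 splats per second. When timing a real kernel, wrapping its result in `std::hint::black_box` helps keep the optimizer from removing the work.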
| 104 | +## Regression policy |
| 105 | + |
| 106 | +A benchmark regression is not automatically a correctness failure. |
| 107 | + |
| 108 | +Correctness failures: |
| 109 | + |
| 110 | +- non-deterministic certificate output |
| 111 | +- non-PSD (invalid) covariance accepted |
| 112 | +- scalar/SIMD divergence above tolerance |
| 113 | +- codec roundtrip error above the declared bound |
| 114 | +- HHTL action mismatch between the scalar reference and a SIMD path |
| 115 | + |
| 116 | +Performance warnings: |
| 117 | + |
| 118 | +- additional allocations in a hot path |
| 119 | +- throughput regression above configured threshold |
| 120 | +- unexpected branch-heavy behavior in SIMD path |
| 121 | + |
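One way to encode this policy is a pair of enums that map each finding to a severity, so only correctness failures block a build. The sketch below is illustrative; `Finding`, `Severity`, and `should_fail_build` are hypothetical names.

```rust
/// Findings the harness can raise, mirroring the two lists in this section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Finding {
    NonDeterministicCertificate,
    NonPsdCovarianceAccepted,
    ScalarSimdDivergence,
    CodecRoundtripError,
    HhtlActionMismatch,
    HotPathAllocations,
    ThroughputRegression,
    BranchHeavySimdPath,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Severity {
    CorrectnessFailure,
    PerformanceWarning,
}

impl Finding {
    /// The match is exhaustive, so adding a variant forces a severity decision.
    fn severity(self) -> Severity {
        match self {
            Finding::NonDeterministicCertificate
            | Finding::NonPsdCovarianceAccepted
            | Finding::ScalarSimdDivergence
            | Finding::CodecRoundtripError
            | Finding::HhtlActionMismatch => Severity::CorrectnessFailure,
            Finding::HotPathAllocations
            | Finding::ThroughputRegression
            | Finding::BranchHeavySimdPath => Severity::PerformanceWarning,
        }
    }
}

/// Only correctness failures fail the build; performance warnings are reported.
fn should_fail_build(findings: &[Finding]) -> bool {
    findings.iter().any(|f| f.severity() == Severity::CorrectnessFailure)
}
```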
| 122 | +## CI recommendations |
| 123 | + |
| 124 | +Use two profiles: |
| 125 | + |
| 126 | +1. correctness profile |
| 127 | + - fast |
| 128 | + - runs on every PR |
| 129 | + - scalar + default SIMD feature checks |
| 130 | + |
| 131 | +2. benchmark profile |
| 132 | + - manual or scheduled |
| 133 | + - records hardware and CPU features |
| 134 | + - writes results to a stable report artifact |
| 135 | + |
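For the benchmark profile, hardware and CPU features can be captured with the standard library's runtime detection macros. The sketch below is an assumption about what a report header might contain; the function names, the list of probed features, and the header layout are all hypothetical.

```rust
/// Names of the probed CPU features that are available at runtime.
/// The probed set is illustrative; extend it to match the supported SIMD tiers.
#[allow(unused_mut)]
fn detected_cpu_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("sse4.1") {
            features.push("sse4.1");
        }
        if std::arch::is_x86_feature_detected!("avx2") {
            features.push("avx2");
        }
        if std::arch::is_x86_feature_detected!("fma") {
            features.push("fma");
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            features.push("neon");
        }
    }
    features
}

/// One-line header intended to sit at the top of a benchmark report artifact.
fn report_header() -> String {
    format!(
        "arch={} os={} features={}",
        std::env::consts::ARCH,
        std::env::consts::OS,
        detected_cpu_features().join(",")
    )
}
```

Writing this header into the artifact makes it possible to tell whether two benchmark runs are comparable before reading the numbers.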
| 136 | +## Acceptance criteria |
| 137 | + |
| 138 | +- A developer can run one command and verify 3DGS correctness. |
| 139 | +- A developer can run one command and benchmark 3DGS hot paths. |
| 140 | +- Every certificate includes enough metadata to reproduce the run. |
| 141 | +- All fixture files are small and human-inspectable. |
| 142 | +- No benchmark depends on network access. |