| 1 | +# libsurvive Launch-Readiness Triage |
| 2 | + |
| 3 | +**Created**: 2026-04-17 |
| 4 | +**Context**: Stagehand virtual-production shoots. Robustness + stability over all else. Fail closed and loudly — grey failures are the worst failure mode. Cheaper to stop a shoot than fix bad data in post. |
| 5 | +**Source docs**: `specs/*`, `arrows/*`, `reflection-rejection.md` |
| 6 | + |
| 7 | +## Summary |
| 8 | + |
| 9 | +26 open items in the libsurvive docs, bucketed by shoot-readiness impact. |
| 10 | + |
| 11 | +- 6 **launch blockers** — silent corruption or grey failures |
| 12 | +- 5 **should fix before launch** — diagnosability gaps |
| 13 | +- 7 **nice to have** — robustness / dev hygiene |
| 14 | +- 8 **save for later** — irrelevant to stagehand (third-party bindings) |
| 15 | + |
| 16 | +--- |
| 17 | + |
| 18 | +## Launch Blockers |
| 19 | + |
| 20 | +Each item either (a) lets bad data through without visibility, or (b) makes on-set incident response harder because docs/specs don't match reality. |
| 21 | + |
| 22 | +- [ ] **LB-1 — Close spec drift on back-face filter** (`LPI-PROC-060` through `LPI-PROC-066`) |
| 23 | + - Category: **coherence / operational safety** |
| 24 | + - Feature deployed to both Pis with `--filter-normal-facingness 0.1`. Seven specs in [lighthouse-protocol-intelligence-specs.md](specs/lighthouse-protocol-intelligence-specs.md) still marked `[ ]`. |
| 25 | + - Fix: mark `[x]`, update [lighthouse-protocol-intelligence.md](arrows/lighthouse-protocol-intelligence.md) arrow-doc coverage table. |
| 26 | + - Risk if skipped: on-set grep "is the back-face filter in?" returns "not implemented" during a 3am incident. |
| 27 | + |
| 28 | +- [ ] **LB-2 — Verify GSS solve-failure rejection** (`DDS-BE-054`) |
| 29 | + - Category: **fail-closed on calibration corruption** |
| 30 | + - Spec: if global scene solver error exceeds threshold, reject result and retain previous calibration. |
| 31 | + - `--max-cal-error 0.01` clamps the initial solve, but mid-session re-solve rejection may not be wired. |
| 32 | + - Fix: audit `driver_global_scene_solver.c` for rejection path; implement if missing. |
| 33 | + - Risk if skipped: bad GSS solve silently overwrites good calibration mid-shoot. |
| 34 | + |
| 35 | +- [ ] **LB-3 — Recording file open/write failure = loud error** (`LI-BE-063`) |
| 36 | + - Category: **fail-loud (directly matches stagehand principle)** |
| 37 | + - Currently libsurvive can silently drop recording events if the file cannot be written. |
| 38 | + - For shoots using `.rec.gz` as authoritative capture, silent drop = unrecoverable grey failure. |
| 39 | + - Fix: emit error and disable recording instead of silent drop. |
| 40 | + |
| 41 | +- [ ] **LB-4 — Remove debug `fprintf` from Gen2 LFSR decoder** (`LPI-PROC-032`) |
| 42 | + - Category: **operational safety** |
| 43 | + - Uncontrolled stdout in hot path — pollutes logs, can distort timing. |
| 44 | + - Flagged in [lighthouse-protocol-intelligence.md](arrows/lighthouse-protocol-intelligence.md) "Must Fix #1". Cheap. |
| 45 | + - Fix: wrap in `SV_VERBOSE` or delete. |
| 46 | + |
| 47 | +- [ ] **LB-5 — Resolve duplicate spec IDs `TE-PROC-040/041`** |
| 48 | + - Category: **coherence** |
| 49 | + - Defined twice in [tracking-engine-specs.md](specs/tracking-engine-specs.md): once for `lc-angular-rate-max` (lines 43–44), once for Kalman lighthouse tracker (lines 49–50). |
| 50 | + - Breaks grep-based traceability during incident response. |
| 51 | + - Fix: renumber the Kalman-lighthouse-tracker pair to unused IDs. |
| 52 | + |
| 53 | +- [ ] **LB-6 — Measure + enable pose-emission gates** (`kalman-max-pose-angular-rate`, `--light-max-error`) |
| 54 | + - Category: **operational safety / fail-closed on output** |
| 55 | + - From [reflection-rejection.md](../reflection-rejection.md) bottom checklist. Orthogonal to existing input-side defenses. |
| 56 | + - Without these, a reflection that slips past `light-outlier-threshold` + `filter-normal-facingness` reaches the output stream. |
| 57 | + - Fix: |
| 58 | + - [ ] Run `reflect_test.cap` with `--survive-verbose 105`, record per-pose angular rates during reflection bursts and clean tracking. |
| 59 | + - [ ] Set `--kalman-max-pose-angular-rate` from the data (expected 5–10 rad/s). |
| 60 | + - [ ] Enable `--light-max-error 0.01` in the agent config. |
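The measurement step in LB-6 comes down to computing an angular rate between consecutive poses. Below is a minimal sketch, assuming the poses have already been parsed into `(timestamp_s, (w, x, y, z))` tuples with unit quaternions. The function names are hypothetical, the data is synthetic, and nothing here assumes a particular `--survive-verbose 105` log format.

```python
import math

def angular_rate(q0, q1, dt):
    """Angular rate in rad/s between two unit quaternions (w, x, y, z)."""
    if dt <= 0:
        raise ValueError("timestamps must be strictly increasing")
    # abs(): q and -q encode the same rotation, so take the shorter arc.
    dot = abs(sum(a * b for a, b in zip(q0, q1)))
    dot = min(1.0, dot)  # guard acos against rounding just above 1
    return 2.0 * math.acos(dot) / dt

def pose_rates(poses):
    """poses: list of (timestamp_s, quaternion); returns per-step rates in rad/s."""
    return [
        angular_rate(q0, q1, t1 - t0)
        for (t0, q0), (t1, q1) in zip(poses, poses[1:])
    ]

def rotation_about_z(angle):
    """Unit quaternion for a rotation of `angle` radians about the z axis."""
    return (math.cos(angle / 2), 0.0, 0.0, math.sin(angle / 2))

# Synthetic data: steady 1 rad/s rotation, then a 90-degree jump within 10 ms.
poses = [
    (0.00, rotation_about_z(0.00)),
    (0.01, rotation_about_z(0.01)),
    (0.02, rotation_about_z(0.02)),
    (0.03, rotation_about_z(0.02 + math.pi / 2)),
]
rates = pose_rates(poses)
flagged = [rate > 10.0 for rate in rates]  # 10 rad/s: upper end of the expected range
```

On this synthetic input the two steady steps come out near 1 rad/s and the jump near 157 rad/s, so only the last step is flagged. On real captures, comparing the maximum rate during clean tracking against the minimum during reflection bursts should show whether a threshold in the expected 5–10 rad/s range separates the two.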
| 61 | + |
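Duplicate IDs like the ones in LB-5 can be caught mechanically. The sketch below assumes spec definitions are checkbox lines that begin with the ID (for example `- [ ] **TE-PROC-040**: ...`). That line format is an assumption, not something confirmed from the spec files, and the sample text is made up for illustration.

```python
import re
from collections import defaultdict

# Assumed definition format: "- [ ] **TE-PROC-040** ..." or "- [x] TE-PROC-040 ..."
SPEC_DEF = re.compile(r"^\s*[-*]\s*\[[ xX]\]\s*\**([A-Z]+(?:-[A-Z]+)+-\d+)")

def find_duplicate_spec_ids(text):
    """Return {spec_id: [line numbers]} for IDs defined on more than one line."""
    seen = defaultdict(list)
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = SPEC_DEF.match(line)
        if match:
            seen[match.group(1)].append(lineno)
    return {sid: lines for sid, lines in seen.items() if len(lines) > 1}

# Made-up sample mirroring the LB-5 situation.
sample = """\
- [ ] **TE-PROC-039**: unrelated spec
- [ ] **TE-PROC-040**: first definition
- [ ] **TE-PROC-041**: first definition
Prose that mentions TE-PROC-040 is not a definition and is ignored.
- [ ] **TE-PROC-040**: second definition
- [x] **TE-PROC-041**: second definition
"""
duplicates = find_duplicate_spec_ids(sample)
```

Run over each file in `specs/` and made to fail the build on a non-empty result, a check like this would keep grep-based traceability from regressing after the renumbering.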
| 62 | +## Should Fix Before Launch |
| 63 | + |
| 64 | +Diagnosability gaps. These won't corrupt data, but they make grey failures hard to spot. |
| 65 | + |
| 66 | +- [ ] **SF-1 — Warn on high OOTX sync bit error rate** (`LPI-PROC-026`) |
| 67 | + - Category: **fail-loud on degraded calibration** |
| 68 | + - Silent OOTX degradation is the classic grey failure. |
| 69 | + |
| 70 | +- [ ] **SF-2 — Surface IEKF divergence as a warning** (tracking-engine arrow, Should Fix #2) |
| 71 | + - Category: **fail-loud** |
| 72 | + - IEKF hitting max-iter without convergence = Kalman is guessing. Currently invisible. |
| 73 | + - Fix: emit warning via `printf` hook on max-iter termination. |
| 74 | + |
| 75 | +- [ ] **SF-3 — Warn on conflicting Gen1/Gen2 generation classification** (`LPI-PROC-004`) |
| 76 | + - Category: **fail-loud** |
| 77 | + - Mid-session generation flips mean calibration decoded under the wrong model. |
| 78 | + |
| 79 | +- [ ] **SF-4 — Implement angular-rate gates** (`TE-PROC-040`, `TE-PROC-041`, `TE-PROC-042`) |
| 80 | + - Category: **operational safety** |
| 81 | +  - Overlaps with LB-6. These specs cover both the Kalman-input gate and the pose-emission gate. Resolve LB-5 first: until then, `TE-PROC-040/041` each refer to two different specs. |
| 82 | + - Fix: implement the three specs end-to-end. |
| 83 | + |
| 84 | +- [ ] **SF-5 — Fix stagehand developer-guide `REFLECTION_JUMP_DEG` doc** |
| 85 | + - Category: **fail-loud during incident response** |
| 86 | + - `stagehand/docs/developer-guide.md:493` says default is 15°; code value is 25°. |
| 87 | +  - A 3am on-set incident is exactly when wrong numbers in the docs hurt most. |
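The SF-2 pattern, warning when the iteration budget runs out, can be illustrated with a scalar iterated EKF measurement update. This is a logic sketch in Python only. The real change would live in libsurvive's C Kalman code, and the function name, the `warn` callback, and the default `max_iter` and `tol` values are all hypothetical.

```python
def iekf_update(x_prior, p_prior, z, r, h, h_jac, warn, max_iter=20, tol=1e-6):
    """Scalar iterated EKF measurement update.

    Returns (estimate, converged). Calls warn(message) if the iteration
    budget runs out before successive iterates agree to within tol.
    """
    x = x_prior
    for _ in range(max_iter):
        jac = h_jac(x)  # measurement Jacobian at the current iterate
        gain = p_prior * jac / (jac * p_prior * jac + r)
        # Standard IEKF step: relinearize around x, but correct from the prior.
        x_next = x_prior + gain * (z - h(x) - jac * (x_prior - x))
        if abs(x_next - x) < tol:
            return x_next, True
        x = x_next
    # Fail loudly: the caller learns the estimate is unconverged.
    warn(f"IEKF hit max_iter={max_iter} without converging (tol={tol})")
    return x, False

# Measurement model z = x^2 with prior x = 1 and measurement z = 4.
warnings = []
estimate, converged = iekf_update(
    1.0, 1.0, 4.0, 0.01, lambda x: x * x, lambda x: 2 * x, warnings.append
)
# Same problem with a starved iteration budget triggers the warning.
starved, starved_ok = iekf_update(
    1.0, 1.0, 4.0, 0.01, lambda x: x * x, lambda x: 2 * x, warnings.append, max_iter=2
)
```

Returning the convergence flag alongside the estimate lets the caller decide whether to emit the pose at all, rather than relying on someone reading the warning.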
| 88 | + |
| 89 | +## Nice to Have |
| 90 | + |
| 91 | +Robustness + dev hygiene. Won't block a shoot. |
| 92 | + |
| 93 | +- [ ] **NH-1 — `survive_close()` cross-thread warning instead of deadlock** (`LI-API-005`) |
| 94 | +- [ ] **NH-2 — Simple API event queue overflow counter** (`LI-API-053`) |
| 95 | +- [ ] **NH-3 — Plugin load failure symbol diagnostics** (`LI-BE-033`) |
| 96 | +- [ ] **NH-4 — CI guard: `src/generated/` vs Python sources** (`TE-DATA-064`) |
| 97 | +- [ ] **NH-5 — Debug-mode broken hook-chain detection** (`LI-BE-013`) |
| 98 | +- [ ] **NH-6 — Arrow-doc documentation items** |
| 99 | + - `survive_close()` threading constraint |
| 100 | + - `driver_usbmon.c` format + usage |
| 101 | + - Off-by-two config length quirk |
| 102 | + - Evaluate `driver_simulator.c` (complete or mark unfinished) |
| 103 | + - Evaluate `poser_epnp.c` (when vs BaryCentricSVD) |
| 104 | + - IMU bias model update cadence + interaction with main Kalman |
| 105 | +- [ ] **NH-7 — Submit `quatdist` fix upstream to collabora/libsurvive** |
| 106 | + - Reduces future rebase pain. From [reflection-rejection.md](../reflection-rejection.md). |
| 107 | + |
| 108 | +## Save for Later |
| 109 | + |
| 110 | +Irrelevant to stagehand's C-embedded + custom Python receiver architecture. Valuable for third-party libsurvive users. |
| 111 | + |
| 112 | +- [ ] **SL-1** — `LB-API-005` — Python binding FLT precision check |
| 113 | +- [ ] **SL-2** — `LB-API-006` — CI regen for `pysurvive_generated.py` |
| 114 | +- [ ] **SL-3** — `LB-API-024` — C# StructLayout size test |
| 115 | +- [ ] **SL-4** — `LB-API-031` — Unity handedness bug fix |
| 116 | +- [ ] **SL-5** — `LB-API-043` — ROS 32-bit timecode wraparound |
| 117 | +- [ ] **SL-6** — `LB-API-044` — ROS REP-103 coordinate frame transform |
| 118 | +- [ ] **SL-7** — `LB-API-052` — OpenVR mid-session coord transform recompute |
| 119 | +- [ ] **SL-8** — `LB-API-062` — WebSocket backend port configurability |
| 120 | + |
| 121 | +## Recommended Sequence |
| 122 | + |
| 123 | +1. **Zero-risk first**: LB-1, LB-4, LB-5, SF-5. Doc and cleanup fixes only; LB-4 touches code, but only to remove or gate a debug print. |
| 124 | +2. **Investigation next**: LB-2 (GSS rejection audit), LB-6 step 1 (measure angular rates). |
| 125 | +3. **Implementation**: LB-3, LB-6 steps 2 and 3 (set and enable gates), SF-1 through SF-4. |
| 126 | +4. **Post-launch**: NH-1 through NH-7, then SL-*. |
| 127 | + |
| 128 | +## References |
| 129 | + |
| 130 | +- [specs/](specs/) — all spec files |
| 131 | +- [arrows/](arrows/) — arrow status + Work Required lists |
| 132 | +- [arrows/index.yaml](arrows/index.yaml) — arrow dependency graph |
| 133 | +- [reflection-rejection.md](reflection-rejection.md) — reflection defense layering + bottom checklist |
| 134 | +- [back-face-filter-hld.md](back-face-filter-hld.md) — back-face filter design rationale |