Skip to content

Commit 75ccb41

Browse files
committed
launch readiness triage created
1 parent e46a967 commit 75ccb41

1 file changed

Lines changed: 134 additions & 0 deletions

File tree

Lines changed: 134 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,134 @@
1+
# libsurvive Launch-Readiness Triage
2+
3+
**Created**: 2026-04-17
4+
**Context**: Stagehand virtual-production shoots. Robustness + stability over all else. Fail closed and loudly — grey failures are the worst failure mode. Cheaper to stop a shoot than fix bad data in post.
5+
**Source docs**: `specs/*`, `arrows/*`, `reflection-rejection.md`
6+
7+
## Summary
8+
9+
33 open items in the libsurvive docs, bucketed by shoot-readiness impact.
10+
11+
- 6 **launch blockers** — silent corruption or grey failures
12+
- 5 **should fix before launch** — diagnosability gaps
13+
- 7 **nice to have** — robustness / dev hygiene
14+
- 8 **save for later** — irrelevant to stagehand (third-party bindings)
15+
16+
---
17+
18+
## Launch Blockers
19+
20+
Each item either (a) lets bad data through without visibility, or (b) makes on-set incident response harder because docs/specs don't match reality.
21+
22+
- [ ] **LB-1 — Close spec drift on back-face filter** (`LPI-PROC-060` through `LPI-PROC-066`)
23+
- Category: **coherence / operational safety**
24+
- Feature deployed to both Pis with `--filter-normal-facingness 0.1`. Seven specs in [lighthouse-protocol-intelligence-specs.md](specs/lighthouse-protocol-intelligence-specs.md) still marked `[ ]`.
25+
- Fix: mark `[x]`, update [lighthouse-protocol-intelligence.md](arrows/lighthouse-protocol-intelligence.md) arrow-doc coverage table.
26+
- Risk if skipped: on-set grep "is the back-face filter in?" returns "not implemented" during a 3am incident.
27+
28+
- [ ] **LB-2 — Verify GSS solve-failure rejection** (`DDS-BE-054`)
29+
- Category: **fail-closed on calibration corruption**
30+
- Spec: if global scene solver error exceeds threshold, reject result and retain previous calibration.
31+
- `--max-cal-error 0.01` clamps the initial solve, but mid-session re-solve rejection may not be wired.
32+
- Fix: audit `driver_global_scene_solver.c` for rejection path; implement if missing.
33+
- Risk if skipped: bad GSS solve silently overwrites good calibration mid-shoot.
34+
35+
- [ ] **LB-3 — Recording file open/write failure = loud error** (`LI-BE-063`)
36+
- Category: **fail-loud (directly matches stagehand principle)**
37+
- Currently libsurvive can silently drop recording events if the file cannot be written.
38+
- For shoots using `.rec.gz` as authoritative capture, silent drop = unrecoverable grey failure.
39+
- Fix: emit error and disable recording instead of silent drop.
40+
41+
- [ ] **LB-4 — Remove debug `fprintf` from Gen2 LFSR decoder** (`LPI-PROC-032`)
42+
- Category: **operational safety**
43+
- Uncontrolled stdout in hot path — pollutes logs, can distort timing.
44+
- Flagged in [lighthouse-protocol-intelligence.md](arrows/lighthouse-protocol-intelligence.md) "Must Fix #1". Cheap.
45+
- Fix: wrap in `SV_VERBOSE` or delete.
46+
47+
- [ ] **LB-5 — Resolve duplicate spec IDs `TE-PROC-040/041`**
48+
- Category: **coherence**
49+
- Defined twice in [tracking-engine-specs.md](specs/tracking-engine-specs.md): once for `lc-angular-rate-max` (lines 43–44), once for Kalman lighthouse tracker (lines 49–50).
50+
- Breaks grep-based traceability during incident response.
51+
- Fix: renumber the Kalman-lighthouse-tracker pair to unused IDs.
52+
53+
- [ ] **LB-6 — Measure + enable pose-emission gates** (`kalman-max-pose-angular-rate`, `--light-max-error`)
54+
- Category: **operational safety / fail-closed on output**
55+
- From [reflection-rejection.md](../reflection-rejection.md) bottom checklist. Orthogonal to existing input-side defenses.
56+
- Without these, a reflection that slips past `light-outlier-threshold` + `filter-normal-facingness` reaches the output stream.
57+
- Fix:
58+
- [ ] Run `reflect_test.cap` with `--survive-verbose 105`, record per-pose angular rates during reflection bursts and clean tracking.
59+
- [ ] Set `--kalman-max-pose-angular-rate` from the data (expected 5–10 rad/s).
60+
- [ ] Enable `--light-max-error 0.01` in the agent config.
61+
62+
## Should Fix Before Launch
63+
64+
Diagnosability. Won't corrupt data but make grey failures hard to spot.
65+
66+
- [ ] **SF-1 — Warn on high OOTX sync bit error rate** (`LPI-PROC-026`)
67+
- Category: **fail-loud on degraded calibration**
68+
- Silent OOTX degradation is the classic grey failure.
69+
70+
- [ ] **SF-2 — Surface IEKF divergence as a warning** (tracking-engine arrow, Should Fix #2)
71+
- Category: **fail-loud**
72+
- IEKF hitting max-iter without convergence = Kalman is guessing. Currently invisible.
73+
- Fix: emit warning via `printf` hook on max-iter termination.
74+
75+
- [ ] **SF-3 — Warn on conflicting Gen1/Gen2 generation classification** (`LPI-PROC-004`)
76+
- Category: **fail-loud**
77+
- Mid-session generation flips mean calibration decoded under the wrong model.
78+
79+
- [ ] **SF-4 — Implement angular-rate gates** (`TE-PROC-040`, `TE-PROC-041`, `TE-PROC-042`)
80+
- Category: **operational safety**
81+
- Overlaps with LB-6. These specs cover both the Kalman-input gate and the pose-emission gate.
82+
- Fix: implement the three specs end-to-end.
83+
84+
- [ ] **SF-5 — Fix stagehand developer-guide `REFLECTION_JUMP_DEG` doc**
85+
- Category: **fail-loud during incident response**
86+
- `stagehand/docs/developer-guide.md:493` says default is 15°; code value is 25°.
87+
- Misleading docs at 3am on-set is exactly when wrong numbers hurt.
88+
89+
## Nice to Have
90+
91+
Robustness + dev hygiene. Won't block a shoot.
92+
93+
- [ ] **NH-1 — `survive_close()` cross-thread warning instead of deadlock** (`LI-API-005`)
94+
- [ ] **NH-2 — Simple API event queue overflow counter** (`LI-API-053`)
95+
- [ ] **NH-3 — Plugin load failure symbol diagnostics** (`LI-BE-033`)
96+
- [ ] **NH-4 — CI guard: `src/generated/` vs Python sources** (`TE-DATA-064`)
97+
- [ ] **NH-5 — Debug-mode broken hook-chain detection** (`LI-BE-013`)
98+
- [ ] **NH-6 — Arrow-doc documentation items**
99+
- `survive_close()` threading constraint
100+
- `driver_usbmon.c` format + usage
101+
- Off-by-two config length quirk
102+
- Evaluate `driver_simulator.c` (complete or mark unfinished)
103+
- Evaluate `poser_epnp.c` (when vs BaryCentricSVD)
104+
- IMU bias model update cadence + interaction with main Kalman
105+
- [ ] **NH-7 — Submit `quatdist` fix upstream to collabora/libsurvive**
106+
- Reduces future rebase pain. From [reflection-rejection.md](../reflection-rejection.md).
107+
108+
## Save for Later
109+
110+
Irrelevant to stagehand's C-embedded + custom Python receiver architecture. Valuable for third-party libsurvive users.
111+
112+
- [ ] **SL-1**`LB-API-005` — Python binding FLT precision check
113+
- [ ] **SL-2**`LB-API-006` — CI regen for `pysurvive_generated.py`
114+
- [ ] **SL-3**`LB-API-024` — C# StructLayout size test
115+
- [ ] **SL-4**`LB-API-031` — Unity handedness bug fix
116+
- [ ] **SL-5**`LB-API-043` — ROS 32-bit timecode wraparound
117+
- [ ] **SL-6**`LB-API-044` — ROS REP-103 coordinate frame transform
118+
- [ ] **SL-7**`LB-API-052` — OpenVR mid-session coord transform recompute
119+
- [ ] **SL-8**`LB-API-062` — WebSocket backend port configurability
120+
121+
## Recommended Sequence
122+
123+
1. **Zero-risk first**: LB-1, LB-4, LB-5, SF-5 — pure doc/cleanup fixes, no code risk.
124+
2. **Investigation next**: LB-2 (GSS rejection audit), LB-6 Phase 1 (measure angular rates).
125+
3. **Implementation**: LB-3, LB-6 Phase 2 (enable gates), SF-1 → SF-4.
126+
4. **Post-launch**: NH-1 through NH-7, then SL-*.
127+
128+
## References
129+
130+
- [specs/](specs/) — all spec files
131+
- [arrows/](arrows/) — arrow status + Work Required lists
132+
- [arrows/index.yaml](arrows/index.yaml) — arrow dependency graph
133+
- [reflection-rejection.md](reflection-rejection.md) — reflection defense layering + bottom checklist
134+
- [back-face-filter-hld.md](back-face-filter-hld.md) — back-face filter design rationale

0 commit comments

Comments
 (0)