
Commit 092fdc9

Merge pull request #57 from AdaWorldAPI/claude/torso-shading
FMA torso anatomy: filled smooth triangle-mesh cockpit
2 parents e73511e + d1be8af commit 092fdc9

59 files changed

Lines changed: 1829 additions & 267 deletions


Dockerfile

Lines changed: 22 additions & 0 deletions
```diff
@@ -73,6 +73,28 @@ RUN git clone https://github.com/AdaWorldAPI/lance-graph.git \
 && git clone --depth 1 https://github.com/AdaWorldAPI/ndarray.git \
 && git clone --depth 1 https://github.com/AdaWorldAPI/neo4j-rs.git
 
+# CPU baseline: x86-64-v4 (the 4th microarch level: AVX-512F/BW/CD/DQ/VL on top
+# of v3's AVX2+FMA). This is the compile FLOOR; it flips on `target_feature =
+# "avx512f"`, so q2-ndarray's `simd.rs` dispatch selects its native `simd_avx512`
+# backend (`__m512`/`__m512d`/`__m512i`) instead of the v3 AVX2 default.
+#
+# BF16 + AMX 16x16 tile GEMM are NOT gated by this flag; they ride q2-ndarray's
+# CPU-AGNOSTIC runtime autodetect polyfill (`simd_caps()` + the AMX `arch_prctl`
+# XTILEDATA enable + CPU-model detect). The polyfill opportunistically lights them
+# up only when the *runtime* host actually has them, and always keeps the AVX2 /
+# scalar paths it compiled in as fallback. So: AVX-512 = compile baseline here;
+# BF16/AMX = runtime-detected; everything below v4 = polyfill fallback.
+#
+# ⚠ REQUIREMENT: a v4 floor makes the binary REQUIRE AVX-512 at run time: it
+# SIGILLs on the first AVX-512 instruction it executes on a host without it (the
+# compiler may emit them anywhere, not only in `__m512` code; the PR #170 failure
+# mode, one level up). The Railway *build* machine needs no AVX-512 (compiling !=
+# running), but the *deploy* host does. AMX additionally needs a Sapphire/Emerald/
+# Granite Rapids Xeon at run time; on anything older the autodetect simply skips
+# AMX (that is the agnostic polyfill working as intended, not an error). If a
+# deploy target may lack AVX-512, drop this to `x86-64-v3` and rely on runtime
+# dispatch for the AVX-512/AMX paths: one portable binary, same hot paths when
+# the silicon allows.
+ENV CARGO_BUILD_RUSTFLAGS="-C target-cpu=x86-64-v4"
+
 # Build the q2 binary with embedded frontend
 WORKDIR /build/q2
 RUN cargo build --release -p cockpit-server --features embed-cockpit,planner \
```

claude-notes/plans/2026-06-24-fma-torso-bodyparts3d-splat.md

Lines changed: 109 additions & 0 deletions
@@ -108,3 +108,112 @@ Validates the design before wiring it into the render. Next increments:
  (node_row-bounded + normal-oriented = crisp colours in the render)
- [ ] animation: deform node anchors -> motion-skinned gaussians follow
  (Motion-Blender GS; the partonomy is the rig)

## Best shading + lazylock + adaptive-FPS + SPL4 (branch claude/torso-shading)

User: "best possible shading and lazylock buffering to mitigate batching", then
"adaptive framerate prediction + SIMD batching + v4", then the key insight: "the
Motion is fixed Rotation ... so it could easily prebuffer 270 frames for 90 FPS".
Scoping answers: framerate = BOTH (render-loop throttle now + codec P-frames as
the SPL4 motion track); PR scope = all of the above, including SPL4, in one push.

### Infra fact ("GitHub uses Cargo not Dockerfile?")
q2 CI is pure Cargo + npm (`cargo fmt` / `xtask lint` / `clippy -D warnings` / `nextest`,
wasm-pack/npm). The only `docker` in CI is `docker image prune` (to free runner disk space).
The root `/Dockerfile` is for the Railway deploy ONLY (`q2-cockpit` embeds the Vite cockpit
and clones lance-graph/ndarray/neo4j for the graph hot path). The splat feature itself
does not touch the Dockerfile; the one Dockerfile change in this PR is a separate user ask:
- [x] **Dockerfile CPU baseline -> x86-64-v4** (user ask):
  `ENV CARGO_BUILD_RUSTFLAGS="-C target-cpu=x86-64-v4"` before the cockpit-server
  build. Flips `target_feature="avx512f"` so q2-ndarray's `simd.rs` picks the
  native `simd_avx512` backend. BF16 + AMX tile GEMM ride ndarray's runtime
  autodetect polyfill (`simd_caps()` + AMX arch_prctl/model-detect): not gated
  by the flag, lit only when the host has them, with the AVX2/scalar fallback always
  compiled. ⚠ v4 = AVX-512 REQUIRED at run time (SIGILL otherwise, the PR #170
  mode one level up); AMX needs Sapphire/Emerald/Granite Rapids at run time
  (otherwise autodetect skips it, i.e. the agnostic polyfill working as intended).
  Documented the `x86-64-v3` fallback in the Dockerfile for non-AVX-512 deploy targets.
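
Because a v4 floor turns a missing AVX-512 unit into a SIGILL, a preflight check on the deploy host is cheap insurance. The sketch below is a hypothetical helper (not part of this commit or of q2-ndarray): it maps Linux `/proc/cpuinfo` flag names onto the x86-64 psABI levels and reports whether a `target-cpu=x86-64-vN` build can run there. The flag-to-level table is an approximation written for illustration.

```python
"""Hypothetical deploy-host preflight: can a `-C target-cpu=x86-64-vN` binary
run here? Uses Linux /proc/cpuinfo flag names (an approximation of the psABI
level definitions)."""
from pathlib import Path

# Flags each level adds on top of the previous one.
# Note: LZCNT is reported as "abm" in /proc/cpuinfo.
LEVEL_FLAGS = {
    2: {"cx16", "lahf_lm", "popcnt", "sse4_1", "sse4_2", "ssse3"},
    3: {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"},
    4: {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"},
}

# Runtime-detected extras: never part of the compile floor, only lit if present.
RUNTIME_EXTRAS = {
    "bf16": {"avx512_bf16"},
    "amx": {"amx_tile", "amx_bf16"},
}


def host_level(flags: set) -> int:
    """Highest level N such that levels 2..N are all fully supported."""
    level = 1
    for n in (2, 3, 4):
        if not LEVEL_FLAGS[n] <= flags:
            break
        level = n
    return level


def runtime_extras(flags: set) -> set:
    """Names of the optional accelerators the host actually exposes."""
    return {name for name, need in RUNTIME_EXTRAS.items() if need <= flags}


def can_run(build_level: int, flags: set) -> bool:
    """False means the binary may SIGILL once it executes an unsupported instruction."""
    return host_level(flags) >= build_level


def read_cpuinfo_flags(path: str = "/proc/cpuinfo") -> set:
    """Flag set of the first CPU listed; empty if the file is missing (non-Linux)."""
    p = Path(path)
    if not p.exists():
        return set()
    for line in p.read_text().splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()
```

Run `can_run(4, read_cpuinfo_flags())` on the target host before shipping the v4 image; if it returns `False`, that host needs the `x86-64-v3` build.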

### Shading (the lit look): DONE
- [x] Render driver (scratchpad, ndarray 1.95, OUT of the q2 workspace): shade AT
  RECONSTRUCTION from the per-vertex normal already in SPL2, as hemisphere ambient
  (sky/ground) + key diffuse (n·L, with L fixed in WORLD space so the camera orbits a
  still light = a consistent turntable) + a soft fill. Shading MULTIPLIES the flat
  palette colour, so the codec-free per-structure colour story stays intact. A 20-frame
  shaded turntable was rendered (9 s/frame) → JPEG (67 KB/frame) →
  cockpit/public/torso-frames/. Verified in the cockpit: volumetric depth, colours
  preserved, no Warhol blob.
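
The lighting model above can be sketched per vertex as follows. This is a minimal pure-Python illustration, not the render driver's code: the light directions, the sky/ground/key/fill intensities and the function names are all assumptions. It shows why the palette survives: the lights only produce a scalar factor that multiplies the flat RGB.

```python
import math

# Assumed constants (illustrative only). All directions are in WORLD space, so
# an orbiting camera sees a light that stays still: a consistent turntable.
UP = (0.0, 1.0, 0.0)
SKY, GROUND = 0.45, 0.15    # hemisphere ambient for normals facing up / down
KEY, FILL = 0.75, 0.20      # key diffuse and soft fill strengths


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


KEY_L = normalize((0.5, 0.8, 0.6))
FILL_L = normalize((-0.6, 0.2, -0.4))


def shade_factor(normal):
    """Scalar light level for one per-vertex normal."""
    n = normalize(normal)
    t = 0.5 * (dot(n, UP) + 1.0)              # 0 = faces ground, 1 = faces sky
    ambient = GROUND + (SKY - GROUND) * t     # hemisphere ambient (sky/ground lerp)
    key = KEY * max(0.0, dot(n, KEY_L))       # Lambert n·L; faces turned away get none
    fill = FILL * max(0.0, dot(n, FILL_L))    # soft fill from the other side
    return ambient + key + fill


def shade_rgb(rgb, normal):
    """Shading MULTIPLIES the palette colour, so channel ratios are kept
    (until a channel clips at 255)."""
    s = shade_factor(normal)
    return tuple(min(255, round(c * s)) for c in rgb)
```

A WebGL port would compute the same `shade_factor` in the fragment shader with the same world-space directions, which is what keeps the CPU frames and the live views visually consistent.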

### Prebuffer = the answer to BOTH (A) the render-loop framerate and (B) the SPL4 motion track [the user's insight]
The demo motion is a FIXED, periodic, deterministic camera rotation. So you neither
ADAPT the framerate nor PREDICT motion frame-by-frame: you PRECOMPUTE the closed
loop once and replay it. Every frame is then already rendered, so sustaining 90 fps
depends only on decode/display throughput, not on render cost. This mirrors the
x265 GOP idea: a periodic camera path is a closed Group-of-Pictures; prebuffer the
GOP, replay forever. It is ALSO the honest SPL4 (B) motion source: the orbit is a
real, known, closed trajectory, so the 270 rotation steps ARE its P-frames, and NO
synthetic breathing deformation is needed (drop that demo).
- [ ] /torso turntable: bump FRAME_COUNT 20 → 270 (the loop count) over an exact 360°
  (frame i at 360°·i/N for i = 0..N−1, so the would-be frame N coincides with frame 0
  and the loop is seamless with no duplicated frame), 90 fps playback (one revolution
  = 3 s). Re-bake at the higher count in the background (270 × 9 s ≈ 40 min).
  Ship-size lever: 67 KB/frame × 270 ≈ 18 MB of JPEG → offer a WebM encode (~3 MB)
  as the compaction. Prebuffering is mandatory here because the CPU EWA splat takes
  9 s/frame, so live rendering is impossible; prebuffer is THE technique, not an
  optimisation.
- note: the live WebGL points view is already real-time; prebuffering full
  framebuffers there is VRAM-prohibitive (270 frames × 810×1080 px × 4 B RGBA
  ≈ 945 MB), so the live-view win is lazylock + adaptive-FPS, and image
  prebuffering stays on /torso.
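
The turntable numbers above can be checked with a few lines of arithmetic. A minimal sketch: the constants come from this plan (90 fps, 270 frames, 67 KB/frame, 9 s/frame, 810×1080 RGBA), while the function names are hypothetical.

```python
FPS = 90
FRAME_COUNT = 270                  # frames in one closed 360° loop
JPEG_KB_PER_FRAME = 67             # measured on the 20-frame bake
CPU_SECONDS_PER_FRAME = 9          # CPU EWA splat cost per frame
WIDTH, HEIGHT, RGBA_BYTES = 810, 1080, 4


def frame_angles(n):
    """Yaw of frame i is 360·i/n for i = 0..n-1. The would-be frame n (360°)
    is never rendered because it equals frame 0, so the loop has no duplicate."""
    return [360.0 * i / n for i in range(n)]


def frame_at(t_seconds, n=FRAME_COUNT, fps=FPS):
    """Prebuffered playback: wall-clock time maps straight to a frame index."""
    return int(t_seconds * fps) % n


angles = frame_angles(FRAME_COUNT)
step_deg = 360.0 / FRAME_COUNT                             # about 1.33° per frame
loop_seconds = FRAME_COUNT / FPS                           # 3.0 s per revolution
jpeg_mb = JPEG_KB_PER_FRAME * FRAME_COUNT / 1000           # about 18 MB
bake_minutes = CPU_SECONDS_PER_FRAME * FRAME_COUNT / 60    # about 40 min
vram_mb = FRAME_COUNT * WIDTH * HEIGHT * RGBA_BYTES / 1e6  # about 945 MB
```

Because `frame_at` wraps with `% n`, playback needs no special case at the loop seam: t = 3 s lands back on frame 0.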

### Live views light up + lazylock + adaptive-FPS
- [ ] /torso-live (TorsoSplat) + /torso-map (TorsoMap): decode SPL2 `normal 3i8`
  into an aNormal attribute (both skip it today); port hemisphere + diffuse + fill
  into the fragment shader. Using the same world-space L makes the CPU frames and the
  live WebGL views agree.
- [ ] LazyLock build-once buffer: build the geometry (pos + aColor + aNormal + aRow)
  ONCE; mutate only via uniforms + the draw RANGE, never rebuild.
- [ ] Adaptive-FPS: keep an EMA of the rAF delta; when over budget, shrink the draw
  range over the Morton-ordered buffer and drop pixelRatio; recover when frames are
  cheap again; log the active fraction (no silent decimation). Caveat: a plain
  Morton-order prefix is one contiguous stretch of the Z-curve (a spatial block), not
  a uniform subsample; for a prefix to be a uniform spatial subsample the buffer must
  be stored in a progressive order (e.g. bit-reversed Morton rank).
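
A minimal sketch of the adaptive-FPS controller, written in Python for brevity even though the live views are TypeScript. The class name, the 90 fps target, the EMA weight and the shrink/grow factors are assumptions, and the pixelRatio drop is omitted. It also shows one way to make every draw-range prefix a uniform subsample: a plain Morton-sorted prefix covers one contiguous stretch of the Z-curve, whereas storing the Morton-sorted gaussians in bit-reversed rank order makes each prefix stride evenly along the whole curve.

```python
import logging

log = logging.getLogger("adaptive_fps")


def bit_reversed_ranks(n):
    """Permutation of range(n) whose every prefix is spread evenly over [0, n).
    Apply it to a Morton-sorted buffer so a draw-range prefix samples the whole
    body instead of one corner of the Z-curve."""
    bits = max(1, (n - 1).bit_length())
    order = (int(format(i, f"0{bits}b")[::-1], 2) for i in range(1 << bits))
    return [r for r in order if r < n]


class AdaptiveDrawRange:
    """EMA of the frame delta drives how much of the prebuilt buffer is drawn."""

    def __init__(self, total, target_fps=90.0, alpha=0.1,
                 shrink=0.9, headroom=0.8, min_fraction=0.1):
        self.total = total
        self.budget_ms = 1000.0 / target_fps
        self.alpha = alpha              # EMA weight of the newest delta
        self.shrink = shrink            # multiplicative step per frame
        self.headroom = headroom        # grow only when well under budget
        self.min_fraction = min_fraction
        self.ema_ms = self.budget_ms
        self.fraction = 1.0

    def update(self, delta_ms):
        """Feed one rAF delta; returns the vertex count for the next draw."""
        self.ema_ms += self.alpha * (delta_ms - self.ema_ms)
        before = self.fraction
        if self.ema_ms > self.budget_ms:
            self.fraction = max(self.min_fraction, self.fraction * self.shrink)
        elif self.ema_ms < self.headroom * self.budget_ms:
            self.fraction = min(1.0, self.fraction / self.shrink)
        if self.fraction != before:     # no silent decimation
            log.info("active fraction %.2f (ema %.1f ms)", self.fraction, self.ema_ms)
        return self.draw_count()

    def draw_count(self):
        return max(1, int(self.total * self.fraction))
```

The gap between the budget and `headroom * budget` is a hysteresis band: frames that land inside it change nothing, which keeps the draw range from oscillating every frame.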

### SPL4: ship the codec (real static I-frame, motion track reserved)
- [ ] `spl_codec.py`: WRITE a real `.spl4` (helix-Morton order, per-node anchor
  I-frame, motion-from-anchor + zig-zag residual, anchor-predicted palette colour
  (0 per-gaussian bytes), normals). A header `motion_track_count` (0 = static)
  reserves the P-frame slot without a format bump (RESERVE-DON'T-RECLAIM).
- [ ] TS `decodeSpl4`: the inverse; reconstruct pos/normal/rgb/row at load; all 3
  views switch to SPL4.
- [ ] Fold in the deferred #55 nits: move `import math` to module top; fix the
  "round-trips it" docstring; TorsoMap `ray.params.Points` mutate-not-replace.
- [ ] (B) motion track = orbit-as-motion P-frames (above); ship the FORMAT slot +
  decode contract; the camera trajectory is the demonstrator (honest, not faked).
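
The SPL4 items above can be sketched as follows. Everything here is an assumption for illustration: the 16-byte header layout and field order are invented, and a plain 3-D Morton (Z-order) code stands in for the plan's "helix-Morton" order, whose exact definition is not given here. It shows the two mechanisms the plan names: a `motion_track_count` header field that is 0 for a static file (so P-frames can be added later without a format bump), and per-node anchor residuals mapped through zig-zag so small signed offsets become small unsigned integers.

```python
import struct

# Hypothetical SPL4 header: magic, version, gaussian count, node count and the
# reserved motion_track_count (0 = static). Little-endian, no padding: 16 bytes.
HEADER = struct.Struct("<4sHIIH")


def pack_header(gaussians, nodes, motion_track_count=0, version=1):
    return HEADER.pack(b"SPL4", version, gaussians, nodes, motion_track_count)


def unpack_header(buf):
    magic, version, gaussians, nodes, tracks = HEADER.unpack_from(buf)
    if magic != b"SPL4":
        raise ValueError("not an SPL4 stream")
    return {"version": version, "gaussians": gaussians, "nodes": nodes,
            "motion_track_count": tracks}


def _spread3(v):
    """Insert two zero bits between each of the low 10 bits of v."""
    v &= 0x3FF
    v = (v | (v << 16)) & 0xFF0000FF
    v = (v | (v << 8)) & 0x0300F00F
    v = (v | (v << 4)) & 0x030C30C3
    v = (v | (v << 2)) & 0x09249249
    return v


def morton3(x, y, z):
    """Plain 30-bit 3-D Morton (Z-order) code of 10-bit quantised coordinates."""
    return _spread3(x) | (_spread3(y) << 1) | (_spread3(z) << 2)


def zigzag(n):
    """Signed residual -> unsigned: 0, -1, 1, -2, 2 ... -> 0, 1, 2, 3, 4 ..."""
    return n << 1 if n >= 0 else ((-n) << 1) - 1


def unzigzag(z):
    return (z >> 1) ^ -(z & 1)


def encode_residuals(points, anchor):
    """Quantised positions relative to their node anchor, zig-zag mapped so small
    offsets of either sign become small unsigned ints (cheap to varint-code)."""
    return [tuple(zigzag(p - a) for p, a in zip(pt, anchor)) for pt in points]


def decode_residuals(codes, anchor):
    return [tuple(unzigzag(c) + a for c, a in zip(code, anchor)) for code in codes]
```

Sorting gaussians by `morton3` before residual coding keeps spatial neighbours adjacent in the stream, which is what makes the residuals small and compressible.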

### Verify + ship
- [ ] `cd cockpit && npm run build` (tsc clean); inspect the shaded turntable + live
  view; confirm the codec round-trip RMSE is unchanged. Commit incrementally on
  claude/torso-shading; ASK before push (GIT PUSH POLICY).
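
A minimal sketch of the round-trip RMSE gate, assuming uniform 16-bit quantisation of positions within the model's bounding range (the real codec's quantisation is not specified here, and these function names are hypothetical). With rounding to the nearest level, each coordinate's error is at most half a quantisation step, so the RMSE is bounded by step/2 and a codec regression shows up as a jump above that bound.

```python
import math


def quantize(v, lo, hi, bits=16):
    levels = (1 << bits) - 1
    return round((v - lo) / (hi - lo) * levels)


def dequantize(q, lo, hi, bits=16):
    levels = (1 << bits) - 1
    return lo + q / levels * (hi - lo)


def rmse(a, b):
    """Root-mean-square error over all coordinates of two equal-length point lists."""
    diffs = [(x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q)]
    return math.sqrt(sum(diffs) / len(diffs))


def round_trip_rmse(points, lo, hi, bits=16):
    decoded = [tuple(dequantize(quantize(c, lo, hi, bits), lo, hi, bits) for c in p)
               for p in points]
    return rmse(points, decoded)
```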

## v4: is_a-PRIMARY whole-body anatomical atlas (major pivot, 2026-06-24)

Operator-driven pivot, correcting several of my assumptions:
1. **Use is_a, not part-of, for classification + names.** part-of is REGIONAL
   (walking up from a muscle goes -> chest wall -> thorax, never "muscular system")
   and its names aren't canonical. is_a is the TYPE tree: every structure resolves
   up to its canonical type (`pectoralis minor` -> ... -> `muscle organ`); is_a ships
   canonical names; and is_a's mesh set is a SUPERSET of part-of's (2234 vs 1258 FJ,
   +976) with finer organ segmentation (no single "aorta" or "heart" mesh; e.g. the
   aorta is split into ascending/arch/descending/abdominal, each its own mesh).
   Downloaded the 142 MB is_a obj package + the small is_a relation/name txts.
2. **container:identity / DN->GUID addressing.** tissue = the first type keyword
   reached walking up the is_a TYPE tree (cached, so O(1) after the first lookup);
   the walked path is the DistinguishedName path, which MATERIALISES to a numeric
   container:identity GUID (container = tissue class). Stored per node: `tissue`,
   `is_a` (DN path, upper ontology stripped), `container`, `identity`, `guid`.
3. **Whole body is the goal: NO spatial torso filter.** Region focus (torso, an
   organ) is a future SELECT -> CAMERA-ZOOM feature on the full-body splat, driven
   O(1) by each node's centroid+bbox in the SoA, not a bake-time clip.
4. **Performance is the point.** Whole body = 602,341 gaussians / 1658 is_a
   structures / 12.6 MB (414 arteries, 382 muscles, 221 veins, 203 bones, 126
   nerves, full viscera). This is the deliberate load that motivates lazylock +
   adaptive-FPS (live views) and the prebuffered turntable (CPU EWA).
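
Points 1 and 2 can be sketched together: walk a child → parent is_a map until the first tissue keyword, memoise the result, and materialise a container:identity GUID. Everything below is hypothetical: the miniature tree stands in for the downloaded is_a relation file, the container ids are invented, and packing the GUID as `container << 32 | identity` is just one possible encoding. The first lookup of a name costs O(depth); `lru_cache` makes repeats O(1).

```python
from functools import lru_cache

# Hypothetical miniature is_a tree (child -> parent); the real one is loaded
# from the is_a relation/name txts.
IS_A = {
    "pectoralis minor": "skeletal muscle organ",
    "skeletal muscle organ": "muscle organ",
    "muscle organ": "organ",
    "arch of aorta": "aorta segment",
    "ascending aorta": "aorta segment",
    "aorta segment": "artery",
    "artery": "blood vessel",
    "blood vessel": "organ",
    "organ": "anatomical entity",
}
# Type keywords that end the walk, each with a (hypothetical) container id.
CONTAINERS = {"muscle organ": 1, "artery": 2, "vein": 3, "bone organ": 4, "nerve": 5}
UPPER_ONTOLOGY = {"organ", "anatomical entity"}


@lru_cache(maxsize=None)
def dn_path(name):
    """(tissue, leaf-first DN path) for one structure; (None, path) if the walk
    reaches the upper ontology (or a cycle) without meeting a tissue keyword."""
    path = [name]
    node = name
    while node not in CONTAINERS:
        node = IS_A.get(node)
        if node is None or node in UPPER_ONTOLOGY or node in path:
            return None, tuple(path)
        path.append(node)
    return node, tuple(path)


def assign_guids(names):
    """Per-node record with a container:identity GUID (identity = rank in container)."""
    next_identity = {}
    records = {}
    for name in sorted(names):
        tissue, path = dn_path(name)
        if tissue is None:
            continue
        container = CONTAINERS[tissue]
        identity = next_identity.get(container, 0)
        next_identity[container] = identity + 1
        records[name] = {
            "tissue": tissue,
            "is_a": path,
            "container": container,
            "identity": identity,
            "guid": (container << 32) | identity,
        }
    return records
```

Sorting names before assigning identities makes the GUIDs deterministic across bakes, as long as the input structure set does not change.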

- bake = `bake_torso_splat.py` v4 (is_a-primary). Tissue atlas palette + depth-peel
  opacity. Driver orientation fixed (+90° about X; the head was landing down).
- [ ] re-render the upright whole-body turntable -> /torso; the live views already
  decode the unchanged SPL2 (extra nodes.json fields are ignored), so light them and
  add lazylock + adaptive-FPS to show + mitigate the 602K load.
- research: `claude-notes/research/2026-06-24-torso-anatomy-coverage-gap.md`.
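
The SELECT -> CAMERA-ZOOM idea from point 3 of the v4 pivot needs only each node's stored bbox: take the box centre as the look-at target and back the camera off until the box's bounding sphere fits the narrower of the two view angles. A minimal sketch; the function name, the bounding-sphere fit, the 5% margin and the 810×1080 default aspect are assumptions, not the cockpit's code.

```python
import math


def zoom_to_bbox(bb_min, bb_max, vfov_deg=45.0, aspect=810 / 1080, margin=1.05):
    """Return (target, distance) so the bbox's bounding sphere fits the view.

    A sphere of radius r is tangent to a view cone of half-angle a when the
    camera sits r / sin(a) from its centre; use the narrower of the vertical
    and horizontal half-angles so it fits both ways.
    """
    target = tuple((lo + hi) / 2 for lo, hi in zip(bb_min, bb_max))
    radius = math.dist(bb_min, bb_max) / 2
    half_v = math.radians(vfov_deg) / 2
    half_h = math.atan(math.tan(half_v) * aspect)
    return target, margin * radius / math.sin(min(half_v, half_h))
```

Since the bbox is read straight from the per-node SoA, selecting a region is constant-time; only the camera tween toward `(target, distance)` runs per frame.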
