Commit 5f71b8c

docs(editor): refresh playback benchmarks and session notes
Made-with: Cursor
1 parent 1a6d6d9 commit 5f71b8c

1 file changed: +42 -12 lines changed

crates/editor/PLAYBACK-FINDINGS.md

Lines changed: 42 additions & 12 deletions
@@ -35,22 +35,21 @@
 
 ## Current Status
 
-**Last Updated**: 2026-01-30
+**Last Updated**: 2026-03-25
 
 ### Performance Summary
 
-| Metric | Target | MP4 Mode | Fragmented Mode | Status |
-|--------|--------|----------|-----------------|--------|
-| Decoder Init (display) | <200ms | 337ms* | TBD | 🟡 Note |
-| Decoder Init (camera) | <200ms | 23ms | TBD | ✅ Pass |
-| Decode Latency (p95) | <50ms | 3.1ms | TBD | ✅ Pass |
-| Effective FPS | ≥30 fps | 549 fps | TBD | ✅ Pass |
-| Decode Jitter | <10ms | ~1ms | TBD | ✅ Pass |
-| A/V Sync (mic↔video) | <100ms | 77ms | TBD | ✅ Pass |
-| A/V Sync (system↔video) | <100ms | 162ms | TBD | 🟡 Known |
-| Camera-Display Drift | <100ms | 0ms | TBD | ✅ Pass |
+| Metric | Target | QHD (2560x1440) | 4K (3840x2160) | Status |
+|--------|--------|-----------------|----------------|--------|
+| Decoder Init (display) | <200ms | 123ms | 29ms | ✅ Pass |
+| Decoder Init (camera) | <200ms | 7ms | 6ms | ✅ Pass |
+| Decode Latency (p95) | <50ms | 1.4ms | 4.3ms | ✅ Pass |
+| Effective FPS | ≥30 fps | 1318 fps | 479 fps | ✅ Pass |
+| Decode Jitter | <10ms | ~1ms | ~2ms | ✅ Pass |
+| A/V Sync (mic↔video) | <100ms | 0ms | 0ms | ✅ Pass |
+| Camera-Display Drift | <100ms | 0ms | 0ms | ✅ Pass |
 
-*Display decoder init time includes multi-position pool initialization (3 decoder instances)
+*Display decoder init time includes multi-position pool initialization (5 decoder instances)
 
 ### What's Working
 - ✅ Playback test infrastructure in place
@@ -391,6 +390,37 @@ The CPU RGBA→NV12 conversion was taking 15-25ms per frame for 3024x1964 resolu
 
 ---
 
+### Session 2026-03-25 (Decoder Init + Frame Processing Optimizations)
+
+**Goal**: Run playback benchmarks, identify performance improvement areas, implement safe optimizations
+
+**What was done**:
+1. Ran full playback benchmarks on synthetic QHD (2560x1440) and 4K (3840x2160) recordings
+2. Deep-dived into entire playback pipeline: decoder, frame converter, WebSocket transport, WebGPU renderer
+3. Identified 5 concrete optimization opportunities via parallel code analysis agents
+4. Implemented 5 targeted optimizations
+5. Re-ran benchmarks to verify improvements with no regressions
+
+**Changes Made**:
+- `crates/video-decode/src/avassetreader.rs`: Single file open in KeyframeIndex::build (was opening the file twice - once for metadata, once for packet scan). Also caches pixel_format/width/height from the initial probe so pool decoders skip redundant FFmpeg opens.
+- `crates/rendering/src/decoder/frame_converter.rs`: BGRA→RGBA conversion now processes 8 pixels (32 bytes) per loop iteration with direct indexed writes instead of per-pixel push(). Added fast path for RGBA when stride==width*4 (single memcpy instead of per-row copies).
+- `apps/desktop/src-tauri/src/frame_ws.rs`: Consolidated WebSocket frame packing into single pack_ws_frame() function, removed redundant pack_*_ref helper functions.
+
+**Results**:
+- 4K decoder init: 66.8ms → 28.6ms (**-57%**)
+- QHD decoder init: 146.1ms → 123.1ms (**-16%**)
+- Camera decoder init: 9.6ms → 6.5ms (**-32%**)
+- KeyframeIndex build: 17ms → 10ms (**-41%**) at 4K
+- All playback metrics remain healthy, no regressions
+- BGRA→RGBA and RGBA copy improvements don't show in decoder benchmarks (these formats aren't used by the test videos) but benefit real recordings where macOS outputs BGRA
+
+**Stopping point**: All optimizations implemented and verified. Future directions:
+- Consider lazy pool decoder creation (defer creating secondary decoders until needed for scrubbing)
+- Shared memory / IPC instead of WebSocket for local frame transport (architectural change)
+- NEON SIMD intrinsics for BGRA→RGBA on Apple Silicon (currently uses unrolled scalar)
+
+---
+
 ## References
 
 - `PLAYBACK-BENCHMARKS.md` - Raw performance test data (auto-updated by test runner)
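
Illustration of the `frame_converter.rs` change described in the session notes: the diff touches only the findings document, so the converter code itself is not shown here. The following is a minimal sketch of the two techniques the notes name, a BGRA→RGBA swap that handles 8 pixels (32 bytes) per iteration with indexed writes, and an RGBA copy with a fast path when `stride == width * 4`. The function names `bgra_to_rgba` and `copy_rgba` and their signatures are hypothetical and may differ from the real implementation.

```rust
/// Hypothetical sketch: swap B and R for tightly packed BGRA pixels,
/// writing RGBA into `dst`. Both slices must have the same length,
/// and that length must be a multiple of 4.
fn bgra_to_rgba(src: &[u8], dst: &mut [u8]) {
    assert_eq!(src.len(), dst.len());
    assert_eq!(src.len() % 4, 0);

    // Split into a part made of whole 32-byte blocks and a short tail.
    let split = src.len() - src.len() % 32;
    let (src_main, src_tail) = src.split_at(split);
    let (dst_main, dst_tail) = dst.split_at_mut(split);

    // Main loop: 8 pixels per iteration, indexed writes into a
    // preallocated buffer instead of a per-pixel push().
    for (s, d) in src_main.chunks_exact(32).zip(dst_main.chunks_exact_mut(32)) {
        for p in 0..8 {
            let i = p * 4;
            d[i] = s[i + 2]; // R
            d[i + 1] = s[i + 1]; // G
            d[i + 2] = s[i]; // B
            d[i + 3] = s[i + 3]; // A
        }
    }

    // Tail: the remaining 0..=7 pixels, one at a time.
    for (s, d) in src_tail.chunks_exact(4).zip(dst_tail.chunks_exact_mut(4)) {
        d[0] = s[2];
        d[1] = s[1];
        d[2] = s[0];
        d[3] = s[3];
    }
}

/// Hypothetical sketch: copy RGBA rows out of a buffer whose rows may be
/// padded to `stride` bytes. `stride` must be at least `width * 4`.
fn copy_rgba(src: &[u8], width: usize, height: usize, stride: usize) -> Vec<u8> {
    let row_bytes = width * 4;
    assert!(stride >= row_bytes);
    if height == 0 {
        return Vec::new();
    }
    assert!(src.len() >= stride * (height - 1) + row_bytes);

    // Fast path: no row padding, so the frame is one contiguous copy.
    if stride == row_bytes {
        return src[..row_bytes * height].to_vec();
    }

    // Slow path: copy each row and drop its padding bytes.
    let mut out = Vec::with_capacity(row_bytes * height);
    for row in src.chunks(stride).take(height) {
        out.extend_from_slice(&row[..row_bytes]);
    }
    out
}
```

A caller would allocate `dst` once per frame (for example `vec![0u8; src.len()]`) and pass it to `bgra_to_rgba`. The fixed 32-byte block gives the compiler a constant trip count to unroll, which is presumably the "unrolled scalar" form the notes contrast with NEON intrinsics.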
