Skip to content

Faster playback prep, export pipeline, and GIF frame handling#1682

Merged
richiemcilroy merged 9 commits intomainfrom
perf-bits
Mar 25, 2026
Merged

Faster playback prep, export pipeline, and GIF frame handling#1682
richiemcilroy merged 9 commits intomainfrom
perf-bits

Conversation

@richiemcilroy
Copy link
Copy Markdown
Member

@richiemcilroy richiemcilroy commented Mar 25, 2026

Applies a set of targeted performance tweaks across decoding, rendering, desktop preview transport, and export encoding, and updates benchmark/findings docs from the latest runs.

Greptile Summary

This PR applies a set of targeted, well-measured performance improvements across five areas of the playback and export pipelines, backed by before/after benchmarks. All changes are conservative (no algorithmic redesigns, no new dependencies) and include fallback paths where appropriate.

Key changes:

  • avassetreader.rs: KeyframeIndex::build now caches pixel_format, width, and height extracted during the single FFmpeg open, eliminating a previously redundant second format::input call in AVAssetReaderDecoder::new_with_keyframe_index. Measured 57% init-time reduction at 4K.
  • frame_converter.rs: copy_rgba_plane gains a fast-path for stride-contiguous frames (single to_vec memcpy). copy_bgra_to_rgba is unrolled to process 8 pixels (32 bytes) per main-loop iteration with a correct remainder loop; both safe codegen improvements.
  • enc-gif/src/lib.rs: add_frame uses slice::from_raw_parts to reinterpret validated &[u8] as &[RGBA8], replacing O(w×h) individual pixel pushes with bulk extend_from_slice calls. The unsafe casts are sound (RGBA8 is #[repr(C)], alignment 1) but lack // SAFETY: documentation.
  • h264.rs: Trims the macOS encoder priority list from six entries to two (h264_videotoolbox, libx264), removing four encoders that do not exist on macOS and only added failed-init overhead.
  • mp4.rs: NV12 render channel capacity raised 2 → 8, allowing better GPU-render / H.264-encode pipeline overlap.
  • frame_ws.rs: Three near-identical pack_* helpers unified into one pack_ws_frame function (pure refactor, no functional change).
  • Benchmark/findings docs: Updated with new QHD + 4K measurements. Note that the new EXPORT-BENCHMARKS.md run used a short synthetic clip, causing the embedded size-estimation "Calibration Data" to suggest BPP multipliers that are 5–20× lower than real-world values — these should not be applied without a follow-up real-recording benchmark run.

Confidence Score: 4/5

  • PR is safe to merge; all code changes are sound and benchmarked. One non-blocking follow-up: add // SAFETY: comments on the two unsafe blocks in enc-gif and consider annotating the synthetic benchmark calibration data as non-applicable to real recordings.
  • All five code changes are correct and well-motivated: the unsafe casts in enc-gif are sound (RGBA8 is repr(C), alignment 1, bounds pre-validated), the BGRA unroll is logically correct including the remainder path, the KeyframeIndex cache avoids a genuine duplicate file open with a safe fallback, the channel size increase is a pure throughput knob, and the WebSocket refactor is a no-op functionally. The only non-blocking items are missing safety comments on the unsafe blocks and the potentially misleading synthetic-only calibration table in EXPORT-BENCHMARKS.md.
  • crates/enc-gif/src/lib.rs (unsafe blocks without safety comments) and crates/export/EXPORT-BENCHMARKS.md (synthetic calibration data that should not be applied to the estimation algorithm).

Important Files Changed

Filename Overview
apps/desktop/src-tauri/src/frame_ws.rs Clean refactor consolidating three redundant pack functions into one pack_ws_frame; no functional change, slightly reduced allocation setup overhead.
crates/enc-gif/src/lib.rs Fast-path uses unsafe slice::from_raw_parts to reinterpret RGBA bytes; cast is sound (RGBA8 is #[repr(C)], align 1) and bounds are validated, but safety rationale is undocumented.
crates/enc-ffmpeg/src/video/h264.rs Trims macOS encoder priority list to just h264_videotoolbox + libx264, removing 4 encoders that don't exist on macOS and only added failed init overhead.
crates/rendering/src/decoder/frame_converter.rs Adds stride==row_len fast-path in copy_rgba_plane and unrolls copy_bgra_to_rgba to 8 pixels per iteration; logic is correct including the remainder loop for widths not divisible by 8.
crates/video-decode/src/avassetreader.rs Caches pixel_format, width, height inside KeyframeIndex during the single file open, avoiding a redundant FFmpeg format::input call in new_with_keyframe_index; fallback path preserved for when index is absent or format is unsupported.
crates/export/src/mp4.rs NV12 render channel capacity raised from 2 → 8; allows GPU render and H.264 encode to pipeline more freely, at the cost of ~25–100 MB additional buffering during export.
crates/export/EXPORT-BENCHMARKS.md Updated benchmark results come from a short synthetic clip, producing size-estimation errors as high as +1920%; the embedded "Suggested BPP Multiplier" values are misleading for real-world calibration purposes.
crates/export/EXPORT-FINDINGS.md New session notes clearly document optimizations, A/B test methodology, one revert, and known limitations around synthetic benchmark data.
crates/editor/PLAYBACK-FINDINGS.md Updated perf table with QHD/4K columns reflecting new benchmark results; added detailed session notes documenting changes and results.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["KeyframeIndex::build(path)"]
    A --> B["Open file once via avformat::input"]
    B --> C["Extract stream metadata fps / duration / time_base"]
    C --> D["Open decoder context from stream parameters"]
    D --> E["Cache pixel_format, width, height"]
    E --> F["Scan packets for keyframes"]
    F --> G["KeyframeIndex stored with cached video info"]

    G --> H["AVAssetReaderDecoder::new_with_keyframe_index"]
    H --> I{cached_video_info available?}
    I -->|"Yes - skip redundant open"| J["Use cached pf / w / h directly"]
    I -->|"No - fallback"| K["Open file again via ffmpeg::format::input"]
    J --> L["get_reader_track_output - start AVFoundation read"]
    K --> L

    style J fill:#90EE90
    style K fill:#FFD580
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: crates/enc-gif/src/lib.rs
Line: 127-144

Comment:
**Missing safety comments on `unsafe` blocks**

Both `unsafe { std::slice::from_raw_parts(...) }` blocks lack any explanation of their invariants. While the casts are sound — `RGBA8` is `#[repr(C)]` with four `u8` fields (alignment 1, size 4), so reinterpreting a `*const u8` pointer is valid — the reasoning should be documented to prevent future regressions.

Consider adding a `// SAFETY:` comment before each block, e.g.:

```rust
// SAFETY: `RGBA8` is `#[repr(C)]` with four consecutive `u8` fields (size 4,
// alignment 1), so it can safely alias a `[u8; 4]` slice at the same pointer.
// Bounds were validated above: `byte_slice` covers exactly `pixel_count * 4` bytes.
let pixels: &[RGBA8] = unsafe {
    std::slice::from_raw_parts(byte_slice.as_ptr().cast::<RGBA8>(), pixel_count)
};
```

The same reasoning applies to the strided `else` branch at line 139–140.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: crates/export/EXPORT-BENCHMARKS.md
Line: 87-97

Comment:
**Calibration data from synthetic content will break real-world size estimates**

The new benchmark run was against a 30-second synthetic test clip, which compresses far more efficiently than real screen recordings. As a result, the "Calibration Data" table suggests BPP multipliers that are 5–20× lower than the real-world values:

| Preset | Current multiplier | Suggested multiplier (this run) |
|---|---|---|
| MP4 1080p Maximum | 0.30 | 0.0355 |
| MP4 4K Maximum | 0.30 | 0.0149 |

If these suggestions were ever applied to the estimation algorithm, size estimates on real recordings would be wildly wrong in the other direction. It may be worth either (a) adding a prominent warning in the Calibration Data section that these ratios are synthetic-only, or (b) leaving the real-world calibration section (from the 2026-02-16 session) as the canonical reference and omitting the "Suggested BPP Multiplier" column from synthetic runs entirely.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "docs(export): add March 2026 export opti..." | Re-trigger Greptile

Greptile also left 2 inline comments on this PR.

Comment on lines +127 to +144
let img = if bytes_per_row == expected_bytes_per_row {
let pixel_count = w * h;
let byte_slice = &frame_data[..pixel_count * 4];
let pixels: &[RGBA8] = unsafe {
std::slice::from_raw_parts(byte_slice.as_ptr().cast::<RGBA8>(), pixel_count)
};
imgref::Img::new(pixels.to_vec(), w, h)
} else {
let mut rgba_pixels = Vec::with_capacity(w * h);
for y in 0..h {
let row_start = y * bytes_per_row;
let row_bytes = &frame_data[row_start..row_start + expected_bytes_per_row];
let row_pixels: &[RGBA8] =
unsafe { std::slice::from_raw_parts(row_bytes.as_ptr().cast::<RGBA8>(), w) };
rgba_pixels.extend_from_slice(row_pixels);
}
}

// Create imgref for gifski
let img = imgref::Img::new(rgba_pixels, self.width as usize, self.height as usize);
imgref::Img::new(rgba_pixels, w, h)
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Missing safety comments on unsafe blocks

Both unsafe { std::slice::from_raw_parts(...) } blocks lack any explanation of their invariants. While the casts are sound — RGBA8 is #[repr(C)] with four u8 fields (alignment 1, size 4), so reinterpreting a *const u8 pointer is valid — the reasoning should be documented to prevent future regressions.

Consider adding a // SAFETY: comment before each block, e.g.:

// SAFETY: `RGBA8` is `#[repr(C)]` with four consecutive `u8` fields (size 4,
// alignment 1), so it can safely alias a `[u8; 4]` slice at the same pointer.
// Bounds were validated above: `byte_slice` covers exactly `pixel_count * 4` bytes.
let pixels: &[RGBA8] = unsafe {
    std::slice::from_raw_parts(byte_slice.as_ptr().cast::<RGBA8>(), pixel_count)
};

The same reasoning applies to the strided else branch at line 139–140.

Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/enc-gif/src/lib.rs
Line: 127-144

Comment:
**Missing safety comments on `unsafe` blocks**

Both `unsafe { std::slice::from_raw_parts(...) }` blocks lack any explanation of their invariants. While the casts are sound — `RGBA8` is `#[repr(C)]` with four `u8` fields (alignment 1, size 4), so reinterpreting a `*const u8` pointer is valid — the reasoning should be documented to prevent future regressions.

Consider adding a `// SAFETY:` comment before each block, e.g.:

```rust
// SAFETY: `RGBA8` is `#[repr(C)]` with four consecutive `u8` fields (size 4,
// alignment 1), so it can safely alias a `[u8; 4]` slice at the same pointer.
// Bounds were validated above: `byte_slice` covers exactly `pixel_count * 4` bytes.
let pixels: &[RGBA8] = unsafe {
    std::slice::from_raw_parts(byte_slice.as_ptr().cast::<RGBA8>(), pixel_count)
};
```

The same reasoning applies to the strided `else` branch at line 139–140.

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines 87 to 97
#### Estimation Accuracy

- **MP4 Size**: avg error -2.6%, avg |error| 3.0%
- **MP4 Time**: avg error -6.9%, avg |error| 8.1%
- **MP4 Size**: avg error +565.7%, avg |error| 571.2%
- **MP4 Time**: avg error -2.0%, avg |error| 17.6%

#### Calibration Data

Use these actual-vs-estimated ratios to tune the estimation algorithm:

| Preset | Actual(MB) | Estimated(MB) | Ratio (actual/est) | Suggested BPP Multiplier |
|--------|------------|---------------|--------------------|--------------------------|
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Calibration data from synthetic content will break real-world size estimates

The new benchmark run was against a 30-second synthetic test clip, which compresses far more efficiently than real screen recordings. As a result, the "Calibration Data" table suggests BPP multipliers that are 5–20× lower than the real-world values:

Preset Current multiplier Suggested multiplier (this run)
MP4 1080p Maximum 0.30 0.0355
MP4 4K Maximum 0.30 0.0149

If these suggestions were ever applied to the estimation algorithm, size estimates on real recordings would be wildly wrong in the other direction. It may be worth either (a) adding a prominent warning in the Calibration Data section that these ratios are synthetic-only, or (b) leaving the real-world calibration section (from the 2026-02-16 session) as the canonical reference and omitting the "Suggested BPP Multiplier" column from synthetic runs entirely.

Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/export/EXPORT-BENCHMARKS.md
Line: 87-97

Comment:
**Calibration data from synthetic content will break real-world size estimates**

The new benchmark run was against a 30-second synthetic test clip, which compresses far more efficiently than real screen recordings. As a result, the "Calibration Data" table suggests BPP multipliers that are 5–20× lower than the real-world values:

| Preset | Current multiplier | Suggested multiplier (this run) |
|---|---|---|
| MP4 1080p Maximum | 0.30 | 0.0355 |
| MP4 4K Maximum | 0.30 | 0.0149 |

If these suggestions were ever applied to the estimation algorithm, size estimates on real recordings would be wildly wrong in the other direction. It may be worth either (a) adding a prominent warning in the Calibration Data section that these ratios are synthetic-only, or (b) leaving the real-world calibration section (from the 2026-02-16 session) as the canonical reference and omitting the "Suggested BPP Multiplier" column from synthetic runs entirely.

How can I resolve this? If you propose a fix, please make it concise.

@richiemcilroy richiemcilroy merged commit f3d0c48 into main Mar 25, 2026
18 checks passed
} else {
return Err(GifEncodingError::InvalidFrameData);
}
let img = if bytes_per_row == expected_bytes_per_row {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This w*h / pixel_count*4 math feeds directly into slicing + the unsafe byte->RGBA8 cast. I'd consider checked_mul/checked_add (returning InvalidFrameData on overflow) so a large width/height can't wrap in release and become OOB/UB.

@paragon-review
Copy link
Copy Markdown

Paragon Summary

This pull request review identified 2 issues across 2 categories in 9 files. The review analyzed code changes, potential bugs, security vulnerabilities, performance issues, and code quality concerns using automated analysis tools.

This PR optimizes performance across the media pipeline—improving playback preparation, export encoding (MP4/H.264), GIF frame handling, and video decoding—while updating benchmark documentation to reflect the latest results.Analysis submitted successfully. The PR summary: This PR optimizes performance across the media pipeline—improving playback preparation, export encoding (MP4/H.264), GIF frame handling, and video decoding—while updating benchmark documentation to reflect the latest results.

Key changes:

  • Performance optimizations for playback preparation and desktop preview transport
  • Export pipeline improvements in MP4 and H264 encoding
  • GIF frame handling updates in enc-gif crate
  • Decoding optimizations in AVAssetReader and frame converter
  • Benchmark and findings documentation updates for editor and export modulesI've submitted my analysis. Based on the PR information provided, here are the key changes:

Confidence score: 4/5

  • This PR has low-moderate risk with 1 medium-priority issue identified
  • Score reflects code quality concerns and maintainability issues
  • Consider addressing medium-priority findings to improve code quality

9 files reviewed, 2 comments

Severity breakdown: Medium: 1, Low: 1

Dashboard

let img = if bytes_per_row == expected_bytes_per_row {
let pixel_count = w * h;
let byte_slice = &frame_data[..pixel_count * 4];
let pixels: &[RGBA8] = unsafe {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Unnecessary unsafe casts when safe rgb::FromSlice::as_rgba() exists

Unnecessary unsafe casts when safe rgb::FromSlice::as_rgba() exists. Expands audit surface for no benefit. Replace ``slice::from\_raw\_parts with `byte_slice.as_rgba()`.

View Details

Location: crates/enc-gif/src/lib.rs (lines 130)

Analysis

Unnecessary unsafe casts when safe rgb::FromSlice::as_rgba() exists

What fails Two unsafe blocks use raw pointer casts to reinterpret &[u8] as &[RGBA8], but the already-imported rgb crate provides a safe as_rgba() method that does the same thing.
Result Unnecessary unsafe code that expands audit surface and could silently become UB if repr assumptions on RGBA8 ever changed.
Expected Use the safe rgb::FromSlice::as_rgba() API which performs the same reinterpretation with compile-time safety guarantees.
Impact Unnecessary unsafe blocks increase code review burden and risk of future unsoundness. The safe alternative is a one-line drop-in replacement from an existing dependency.
How to reproduce
Review crates/enc-gif/src/lib.rs and note the two unsafe { std::slice::from_raw_parts(...) } blocks. Then check rgb crate docs for FromSlice::as_rgba().
Patch Details
-            let pixels: &[RGBA8] = unsafe {
-                std::slice::from_raw_parts(byte_slice.as_ptr().cast::<RGBA8>(), pixel_count)
-            };
+            let pixels: &[RGBA8] = byte_slice.as_rgba();
AI Fix Prompt
Fix this issue: Unnecessary unsafe casts when safe `rgb::FromSlice::as_rgba()` exists. Expands audit surface for no benefit. Replace ``slice`::from_raw_parts` with `byte_slice.as_rgba()`.

Location: crates/enc-gif/src/lib.rs (lines 130)
Problem: Two unsafe blocks use raw pointer casts to reinterpret &[u8] as &[RGBA8], but the already-imported rgb crate provides a safe as_rgba() method that does the same thing.
Current behavior: Unnecessary unsafe code that expands audit surface and could silently become UB if repr assumptions on RGBA8 ever changed.
Expected: Use the safe rgb::FromSlice::as_rgba() API which performs the same reinterpretation with compile-time safety guarantees.
Steps to reproduce: Review crates/enc-gif/src/lib.rs and note the two unsafe { std::slice::from_raw_parts(...) } blocks. Then check rgb crate docs for FromSlice::as_rgba().

Provide a code fix.


Tip: Reply with @paragon-run to automatically fix this issue


// Validate frame data size
if bytes_per_row < expected_bytes_per_row || frame_data.len() < expected_total_bytes {
if bytes_per_row < expected_bytes_per_row
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: GIF validation formula rejects valid h==0 frames

GIF validation formula rejects valid h==0 frames. saturating_sub(1) still requires w*4 bytes for zero rows. Guard with h > 0 before size check.

View Details

Location: crates/enc-gif/src/lib.rs (lines 121)

Analysis

GIF validation formula rejects valid h==0 frames. saturating_sub(1) still requires w*4 bytes for

What fails When h==0, h.saturating_sub(1)==0 so the guard requires frame_data.len() >= expected_bytes_per_row (w*4), rejecting a valid zero-height frame that needs 0 bytes.
Result Returns GifEncodingError::InvalidFrameData for a semantically valid zero-height frame.
Expected Should accept zero-height frames without requiring any data bytes, or at minimum the formula should be correct for the h==0 edge case.
Impact Academic edge case - zero-height GIF frames are not produced in practice, but the validation formula is semantically incorrect.
How to reproduce
Call add_frame with height=0 and an empty frame_data buffer. The validation check will return InvalidFrameData even though 0 rows require 0 bytes of data.
Patch Details
-            || frame_data.len() < bytes_per_row * h.saturating_sub(1) + expected_bytes_per_row
+            || (h > 0 && frame_data.len() < bytes_per_row * (h - 1) + expected_bytes_per_row)
AI Fix Prompt
Fix this issue: GIF validation formula rejects valid h==0 frames. `saturating_sub(1)` still requires `w*4` bytes for zero rows. Guard with `h > 0` before size check.

Location: crates/enc-gif/src/lib.rs (lines 121)
Problem: When h==0, h.saturating_sub(1)==0 so the guard requires frame_data.len() >= expected_bytes_per_row (w*4), rejecting a valid zero-height frame that needs 0 bytes.
Current behavior: Returns GifEncodingError::InvalidFrameData for a semantically valid zero-height frame.
Expected: Should accept zero-height frames without requiring any data bytes, or at minimum the formula should be correct for the h==0 edge case.
Steps to reproduce: Call add_frame with height=0 and an empty frame_data buffer. The validation check will return InvalidFrameData even though 0 rows require 0 bytes of data.

Provide a code fix.


Tip: Reply with @paragon-run to automatically fix this issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant