perf: optimize ffmpeg video saving #1198

Open
starrkk wants to merge 3 commits into ModelTC:main from starrkk:codex/ffmpeg-video-save-perf

Conversation

starrkk commented Jun 30, 2026

Summary

  • add optional LIGHTX2V_FFMPEG_PRESET support for ffmpeg/x264 video output
  • bulk-write contiguous frames to ffmpeg when no padding is required

Why

Video encoding can be a visible part of end-to-end generation latency. This keeps default behavior unchanged while allowing deployments to choose a faster x264 preset such as veryfast.

Validation

  • branch rebuilt on latest ModelTC/LightX2V:main (89dfa833)
  • git diff --check passed for the PR branch
  • local save_to_video smoke test passed in the Hygon runtime container

(cherry picked from commit 673a80dc1c566109f90e9b8068eba864b42c9aa9)

gemini-code-assist (bot) left a comment

Contributor

Code Review

This pull request introduces support for configuring the FFmpeg preset via the LIGHTX2V_FFMPEG_PRESET environment variable and optimizes frame writing to FFmpeg's stdin by writing the entire contiguous array at once when dimensions match. The review feedback suggests further optimizing memory usage and CPU overhead by writing the numpy arrays directly to process.stdin instead of calling .tobytes(), which creates unnecessary in-memory copies.

Comment thread lightx2v/utils/utils.py Outdated
Comment on lines +328 to +329
    if frames.shape[1] == height and frames.shape[2] == width:
        process.stdin.write(np.ascontiguousarray(frames).tobytes())

medium

Calling .tobytes() on a numpy array creates a full copy of the array data as a Python bytes object in memory. For large video tensors, this can cause a significant memory spike and unnecessary CPU overhead.

Since frames is already a contiguous numpy array (due to the .copy() on line 252), and process.stdin.write() accepts any object implementing the buffer protocol (including contiguous numpy arrays), you can write frames directly to the pipe without any copying.

Suggested change
-    if frames.shape[1] == height and frames.shape[2] == width:
-        process.stdin.write(np.ascontiguousarray(frames).tobytes())
+    if frames.shape[1] == height and frames.shape[2] == width:
+        process.stdin.write(frames)
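
The buffer-protocol point above can be checked in isolation. What follows is a minimal sketch, not the PR's code: `write_frames_bulk` is a hypothetical helper, an in-memory `io.BytesIO` stands in for `process.stdin` so the example runs without ffmpeg, and the input is assumed to be a `(T, H, W, 3)` `uint8` array.

```python
import io

import numpy as np


def write_frames_bulk(sink, frames: np.ndarray) -> int:
    """Write every frame to a binary sink with a single write call.

    Hypothetical helper for illustration; `sink` is any binary file-like
    object, such as the stdin pipe of an ffmpeg subprocess.
    """
    if frames.dtype != np.uint8:
        raise TypeError("expected uint8 frames for rawvideo rgb24 input")
    # Returns the same array when it is already C-contiguous and copies
    # only when it is not (for example, a strided or flipped view).
    frames = np.ascontiguousarray(frames)
    # Binary writers accept bytes-like objects, so the array is passed
    # directly instead of materializing a bytes copy with .tobytes().
    sink.write(frames)
    return frames.nbytes


# Usage: an in-memory buffer stands in for process.stdin.
frames = np.arange(2 * 4 * 6 * 3, dtype=np.uint8).reshape(2, 4, 6, 3)
sink = io.BytesIO()
written = write_frames_bulk(sink, frames)
```

The bytes that reach the sink are the same ones `.tobytes()` would have produced; the difference is that no intermediate `bytes` object is built first. This assumes the pipe is opened in binary mode and is buffered, which is the `subprocess.Popen` default.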

Comment thread lightx2v/utils/utils.py Outdated
Comment on lines +331 to +336
    for frame in frames:
        if frame.shape[0] < height or frame.shape[1] < width:
            padded = np.zeros((height, width, 3), dtype=np.uint8)
            padded[: frame.shape[0], : frame.shape[1]] = frame
            frame = padded
        process.stdin.write(frame.tobytes())

medium

Similarly, calling frame.tobytes() here creates a temporary bytes object copy for every single frame. Since frame is already contiguous (either as a slice of the contiguous frames array or as a newly allocated padded array), you can write frame directly to process.stdin to avoid these per-frame allocations and copies.

Suggested change
-    for frame in frames:
-        if frame.shape[0] < height or frame.shape[1] < width:
-            padded = np.zeros((height, width, 3), dtype=np.uint8)
-            padded[: frame.shape[0], : frame.shape[1]] = frame
-            frame = padded
-        process.stdin.write(frame.tobytes())
+    for frame in frames:
+        if frame.shape[0] < height or frame.shape[1] < width:
+            padded = np.zeros((height, width, 3), dtype=np.uint8)
+            padded[: frame.shape[0], : frame.shape[1]] = frame
+            frame = padded
+        process.stdin.write(frame)
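
The per-frame path can be sketched the same way. This is an illustrative example rather than the code in `lightx2v/utils/utils.py`: `write_frames_padded` is a hypothetical name, `io.BytesIO` again stands in for `process.stdin`, and rejecting frames larger than the target is an assumption added here that the quoted loop does not make.

```python
import io

import numpy as np


def write_frames_padded(sink, frames: np.ndarray, height: int, width: int) -> int:
    """Zero-pad undersized frames to (height, width) and write each one.

    Hypothetical helper mirroring the loop quoted above; frames larger
    than the target are not handled here and raise ValueError.
    """
    total = 0
    for frame in frames:
        h, w = frame.shape[:2]
        if h > height or w > width:
            raise ValueError("frame exceeds target size")
        if h < height or w < width:
            padded = np.zeros((height, width, 3), dtype=np.uint8)
            padded[:h, :w] = frame  # content stays in the top-left corner
            frame = padded
        # A frame sliced from a C-contiguous (T, H, W, 3) array is itself
        # contiguous; ascontiguousarray guards the case where it is not.
        frame = np.ascontiguousarray(frame)
        sink.write(frame)  # no per-frame .tobytes() copy
        total += frame.nbytes
    return total


# Usage: two 2x3 frames padded to 4x4 and read back for inspection.
frames = np.full((2, 2, 3, 3), 7, dtype=np.uint8)
sink = io.BytesIO()
total = write_frames_padded(sink, frames, height=4, width=4)
out = np.frombuffer(sink.getvalue(), dtype=np.uint8).reshape(2, 4, 4, 3)
```

Reading the sink back as a `(2, 4, 4, 3)` array shows the original pixels in the top-left region and zeros in the padded rows and columns.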

@starrkk starrkk marked this pull request as ready for review June 30, 2026 09:24
@helloyongyang

Contributor

We will check it.

@helloyongyang

Contributor

Could you explain the purpose of these changes:
  • -an
  • ffmpeg_preset
  • padded

starrkk commented Jul 1, 2026

Author

@helloyongyang

Both -an and the padding logic are existing behavior; this PR preserves them.

This PR mainly introduces two changes:

  • When the frame size already matches the target size, it writes the contiguous numpy frame buffer directly to ffmpeg stdin, avoiding per-frame .tobytes() copies and repeated Python write calls.
  • It adds LIGHTX2V_FFMPEG_PRESET to pass an x264 preset, which controls the tradeoff between encoding speed and compression ratio. For example, ultrafast is the fastest but usually produces larger files, medium is the x264 default, and slow/slower can improve compression but take longer to encode. The value is now constrained to the standard preset enum, and invalid values are ignored.
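
As a rough illustration of the preset handling described above, the sketch below validates the environment variable against the standard x264 preset names and appends `-preset` only when the value is valid. The helper names `resolve_ffmpeg_preset` and `build_ffmpeg_cmd`, the whitespace and case normalization, and the exact command-line layout are assumptions for this example and may differ from what the PR does.

```python
import os

# The standard libx264 preset names, fastest to slowest.
X264_PRESETS = (
    "ultrafast", "superfast", "veryfast", "faster", "fast",
    "medium", "slow", "slower", "veryslow", "placebo",
)


def resolve_ffmpeg_preset(environ=None):
    """Return a valid x264 preset from LIGHTX2V_FFMPEG_PRESET, else None."""
    environ = os.environ if environ is None else environ
    # Normalizing whitespace and case is an assumption of this sketch.
    value = environ.get("LIGHTX2V_FFMPEG_PRESET", "").strip().lower()
    # Unset or invalid values are ignored, so the encoder keeps its own
    # default preset (medium) and default behavior is unchanged.
    return value if value in X264_PRESETS else None


def build_ffmpeg_cmd(output_path, width, height, fps, environ=None):
    """Build a hypothetical rawvideo-to-H.264 ffmpeg command line."""
    cmd = [
        "ffmpeg", "-y",
        "-f", "rawvideo", "-pix_fmt", "rgb24",
        "-s", f"{width}x{height}", "-r", str(fps),
        "-i", "-",   # read raw RGB frames from stdin
        "-an",       # no audio stream in the output
        "-c:v", "libx264",
    ]
    preset = resolve_ffmpeg_preset(environ)
    if preset is not None:
        cmd += ["-preset", preset]
    cmd += ["-pix_fmt", "yuv420p", output_path]
    return cmd


# Usage: a dict stands in for os.environ.
fast_cmd = build_ffmpeg_cmd("out.mp4", 832, 480, 16,
                            environ={"LIGHTX2V_FFMPEG_PRESET": "veryfast"})
default_cmd = build_ffmpeg_cmd("out.mp4", 832, 480, 16, environ={})
```

Keeping the validation separate from command construction makes the "invalid values are ignored" rule easy to test without launching ffmpeg.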

2 participants