perf: optimize ffmpeg video saving #1198
(cherry picked from commit 673a80dc1c566109f90e9b8068eba864b42c9aa9)
Code Review
This pull request introduces support for configuring the FFmpeg preset via the LIGHTX2V_FFMPEG_PRESET environment variable and optimizes frame writing to FFmpeg's stdin by writing the entire contiguous array at once when dimensions match. The review feedback suggests further optimizing memory usage and CPU overhead by writing the numpy arrays directly to process.stdin instead of calling .tobytes(), which creates unnecessary in-memory copies.
    if frames.shape[1] == height and frames.shape[2] == width:
        process.stdin.write(np.ascontiguousarray(frames).tobytes())
Calling .tobytes() on a numpy array creates a full copy of the array data as a Python bytes object in memory. For large video tensors, this can cause a significant memory spike and unnecessary CPU overhead.
Since frames is already a contiguous numpy array (due to the .copy() on line 252), and process.stdin.write() accepts any object implementing the buffer protocol (including contiguous numpy arrays), you can write frames directly to the pipe without any copying.
Suggested change:

    - if frames.shape[1] == height and frames.shape[2] == width:
    -     process.stdin.write(np.ascontiguousarray(frames).tobytes())
    + if frames.shape[1] == height and frames.shape[2] == width:
    +     process.stdin.write(frames)
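To illustrate the suggestion above, here is a minimal sketch showing that a binary stream accepts a C-contiguous numpy array directly and receives the same bytes that `.tobytes()` would produce. The helper name `write_array` is hypothetical and not part of the PR, and `io.BytesIO` stands in for `process.stdin` so the snippet runs without ffmpeg.

```python
import io

import numpy as np


def write_array(stream, frames):
    """Write a uint8 frame array to a binary stream without calling .tobytes().

    `write_array` is a hypothetical helper used only for illustration.
    """
    # For an ndarray that is already C-contiguous uint8 this is a no-op;
    # otherwise it makes one contiguous copy so the buffer protocol export works.
    frames = np.ascontiguousarray(frames, dtype=np.uint8)
    # Binary file objects accept any contiguous bytes-like object.
    return stream.write(frames)


# Stand-in for process.stdin: an in-memory binary stream.
demo_frames = np.arange(2 * 4 * 6 * 3, dtype=np.uint8).reshape(2, 4, 6, 3)
demo_stream = io.BytesIO()
written = write_array(demo_stream, demo_frames)
```

The `np.ascontiguousarray` call is kept as a guard: a non-contiguous view cannot be exported as a flat buffer, so writing it directly would fail rather than silently reorder data.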
    for frame in frames:
        if frame.shape[0] < height or frame.shape[1] < width:
            padded = np.zeros((height, width, 3), dtype=np.uint8)
            padded[: frame.shape[0], : frame.shape[1]] = frame
            frame = padded
        process.stdin.write(frame.tobytes())
Similarly, calling frame.tobytes() here creates a temporary bytes object copy for every single frame. Since frame is already contiguous (either as a slice of the contiguous frames array or as a newly allocated padded array), you can write frame directly to process.stdin to avoid these per-frame allocations and copies.
Suggested change:

    - for frame in frames:
    -     if frame.shape[0] < height or frame.shape[1] < width:
    -         padded = np.zeros((height, width, 3), dtype=np.uint8)
    -         padded[: frame.shape[0], : frame.shape[1]] = frame
    -         frame = padded
    -     process.stdin.write(frame.tobytes())
    + for frame in frames:
    +     if frame.shape[0] < height or frame.shape[1] < width:
    +         padded = np.zeros((height, width, 3), dtype=np.uint8)
    +         padded[: frame.shape[0], : frame.shape[1]] = frame
    +         frame = padded
    +     process.stdin.write(frame)
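The per-frame path can be sketched in isolation as follows. This is only an illustration under stated assumptions: `pad_frame` and `write_padded` are hypothetical names, the input is assumed to be a C-contiguous uint8 array of shape (N, H, W, 3), and `io.BytesIO` replaces the ffmpeg pipe.

```python
import io

import numpy as np


def pad_frame(frame, height, width):
    """Zero-pad a frame on the bottom/right up to (height, width); hypothetical helper."""
    if frame.shape[0] < height or frame.shape[1] < width:
        padded = np.zeros((height, width, 3), dtype=np.uint8)
        padded[: frame.shape[0], : frame.shape[1]] = frame
        return padded
    return frame


def write_padded(stream, frames, height, width):
    """Write each frame directly, with no per-frame .tobytes() copy."""
    for frame in frames:
        # A slice along axis 0 of a C-contiguous array is itself contiguous,
        # and a freshly allocated padded array is contiguous as well.
        stream.write(pad_frame(frame, height, width))


# Two 3x5 frames written into a 4x6 target size.
small = np.full((2, 3, 5, 3), 7, dtype=np.uint8)
out_stream = io.BytesIO()
write_padded(out_stream, small, height=4, width=6)
```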
We will check it.

Could you explain the purpose of these changes:

Both -an and the padding logic were existing behaviors; this PR preserves them. This PR mainly introduces two changes:

- When the frame size already matches the target size, it writes the contiguous numpy frame buffer directly to ffmpeg stdin, avoiding per-frame .tobytes() copies and repeated Python write calls.
- It adds LIGHTX2V_FFMPEG_PRESET to pass an x264 preset, which controls the tradeoff between encoding speed and compression ratio. For example, ultrafast is the fastest but usually produces larger files, medium is the x264 default, and slow/slower can improve compression but take longer to encode. The value is now constrained to the standard preset enum, and invalid values are ignored.
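A rough sketch of how an environment-driven preset could be validated is shown below. The PR's actual implementation is not visible in this conversation, so the function names `resolve_preset` and `preset_args` are hypothetical, and the trimming and lower-casing of the value is an assumption. The preset names are the standard x264 ones, and `-preset` is the ffmpeg option that selects them.

```python
import os

# Standard x264 preset names, fastest to slowest.
X264_PRESETS = (
    "ultrafast", "superfast", "veryfast", "faster", "fast",
    "medium", "slow", "slower", "veryslow", "placebo",
)


def resolve_preset(env=None):
    """Return a valid preset from LIGHTX2V_FFMPEG_PRESET, or None (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: surrounding whitespace and letter case are normalized.
    value = env.get("LIGHTX2V_FFMPEG_PRESET", "").strip().lower()
    # Unset or unrecognized values are ignored, leaving the default behavior.
    return value if value in X264_PRESETS else None


def preset_args(env=None):
    """Build the extra ffmpeg arguments for the chosen preset, if any."""
    preset = resolve_preset(env)
    return ["-preset", preset] if preset else []
```

With this shape, leaving the variable unset yields an empty argument list, so the encoder command is identical to the one used before the option existed.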
Summary
- LIGHTX2V_FFMPEG_PRESET support for ffmpeg/x264 video output

Why
Video encoding can be a visible part of end-to-end generation latency. This keeps default behavior unchanged while allowing deployments to choose a faster x264 preset such as veryfast.

Validation
- ModelTC/LightX2V:main (89dfa833)
- git diff --check passed for the PR branch
- save_to_video smoke test passed in the Hygon runtime container