Commit e7cddaa

Copilot and anxiangsir committed
Fix video tensor permutation order for correct [B, C, T, H, W] shape
Co-authored-by: anxiangsir <31175974+anxiangsir@users.noreply.github.com>
1 parent b94ff54 commit e7cddaa

1 file changed

Lines changed: 1 addition & 1 deletion

README.md
@@ -209,7 +209,7 @@ num_frames, frame_tokens, target_frames = 16, 256, 64
 frames = [Image.open(f"path/to/frame_{i}.jpg") for i in range(num_frames)]
 video_pixel_values = preprocessor(images=frames, return_tensors="pt")["pixel_values"]
 # Reshape from [T, C, H, W] to [B, C, T, H, W]
-video = video_pixel_values.permute(1, 0, 2, 3).unsqueeze(0).to("cuda")
+video = video_pixel_values.unsqueeze(0).permute(0, 2, 1, 3, 4).to("cuda")
 
 # Build visible_indices for temporal sampling
 frame_pos = torch.linspace(0, target_frames - 1, num_frames).long().cuda()
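
A quick way to sanity-check the reshape in this hunk is to run both expressions on a dummy CPU tensor. The sketch below assumes the preprocessor returns a 4-D `[T, C, H, W]` tensor, as the README comment states; the sizes and the `previous` variable name are illustrative and not taken from the repository.

```python
import torch

# Dummy stand-in for the preprocessor output; these sizes are assumptions.
T, C, H, W = 16, 3, 8, 8
video_pixel_values = torch.arange(T * C * H * W, dtype=torch.float32).reshape(T, C, H, W)

# Expression added by this commit: add the batch axis, then swap T and C.
video = video_pixel_values.unsqueeze(0).permute(0, 2, 1, 3, 4)

# Expression removed by this commit: swap T and C, then add the batch axis.
previous = video_pixel_values.permute(1, 0, 2, 3).unsqueeze(0)

print(tuple(video.shape))            # (1, 3, 16, 8, 8), i.e. [B, C, T, H, W]
print(torch.equal(video, previous))  # True for a 4-D [T, C, H, W] input
```

Under the 4-D assumption the two expressions yield the same tensor, so this is mainly a shape check worth running before moving `video` to the GPU. The new ordering adds the batch axis first, which keeps the permutation indices aligned with the `[B, C, T, H, W]` layout named in the comment.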
