Commit b94ff54

Committed by Copilot and anxiangsir
Add clarifying comments for image paths and tensor transformations
Co-authored-by: anxiangsir <31175974+anxiangsir@users.noreply.github.com>
Parent: 0a658e9

2 files changed (5 additions, 4 deletions)

README.md (4 additions, 3 deletions)
@@ -196,7 +196,7 @@ preprocessor = AutoImageProcessor.from_pretrained(
 )

 # Image inference: [B, C, H, W]
-image = Image.open("path/to/image.jpg")
+image = Image.open("path/to/your/image.jpg")  # Replace with your image path
 pixel_values = preprocessor(images=image, return_tensors="pt")["pixel_values"].to("cuda")
 with torch.no_grad():
     outputs = model(pixel_values)
@@ -205,10 +205,11 @@ with torch.no_grad():

 # Video inference: [B, C, T, H, W] with visible_indices
 num_frames, frame_tokens, target_frames = 16, 256, 64
-# Load video frames and preprocess each frame
+# Load video frames and preprocess each frame (replace with your video frame paths)
 frames = [Image.open(f"path/to/frame_{i}.jpg") for i in range(num_frames)]
 video_pixel_values = preprocessor(images=frames, return_tensors="pt")["pixel_values"]
-video = video_pixel_values.permute(1, 0, 2, 3).unsqueeze(0).to("cuda")  # [B, C, T, H, W]
+# Reshape from [T, C, H, W] to [B, C, T, H, W]
+video = video_pixel_values.permute(1, 0, 2, 3).unsqueeze(0).to("cuda")

 # Build visible_indices for temporal sampling
 frame_pos = torch.linspace(0, target_frames - 1, num_frames).long().cuda()
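As a sanity check on the tensor bookkeeping this hunk documents, here is a dependency-free sketch in pure Python. Shapes are tracked as tuples rather than real tensors, and `permute_shape`, `unsqueeze_shape`, and `linspace_long` are hypothetical helpers that mimic the shape effect of `torch.Tensor.permute`, `torch.Tensor.unsqueeze`, and `torch.linspace(...).long()`; they are illustrations, not part of the model's API.

```python
def permute_shape(shape, dims):
    """Resulting shape of tensor.permute(*dims): reorder the axes."""
    return tuple(shape[d] for d in dims)

def unsqueeze_shape(shape, dim):
    """Resulting shape of tensor.unsqueeze(dim): insert a length-1 axis."""
    return shape[:dim] + (1,) + shape[dim:]

def linspace_long(start, stop, steps):
    """Integer positions mimicking torch.linspace(start, stop, steps).long()."""
    if steps == 1:
        return [int(start)]
    step = (stop - start) / (steps - 1)
    return [int(start + i * step) for i in range(steps)]

num_frames, target_frames = 16, 64
tchw = (num_frames, 3, 224, 224)           # preprocessor output: [T, C, H, W]
cthw = permute_shape(tchw, (1, 0, 2, 3))   # [C, T, H, W]
bcthw = unsqueeze_shape(cthw, 0)           # [B, C, T, H, W] with B=1
print(bcthw)                               # → (1, 3, 16, 224, 224)

# Map 16 sampled frames onto 64 temporal positions, endpoints included.
frame_pos = linspace_long(0, target_frames - 1, num_frames)
print(frame_pos[0], frame_pos[-1])         # → 0 63
```

The permute swaps only the first two axes (T and C); the batch axis is then prepended, which is why the comment in the diff reads `[T, C, H, W]` to `[B, C, T, H, W]`.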

onevision_encoder/modeling_onevision_encoder.py (1 addition, 1 deletion)
@@ -533,7 +533,7 @@ def forward(

 >>> model = AutoModel.from_pretrained("lmms-lab/onevision-encoder-large", trust_remote_code=True)
 >>> preprocessor = AutoImageProcessor.from_pretrained("lmms-lab/onevision-encoder-large", trust_remote_code=True)
->>> image = Image.open("path/to/image.jpg")
+>>> image = Image.open("path/to/your/image.jpg")  # Replace with your image path
 >>> pixel_values = preprocessor(images=image, return_tensors="pt")["pixel_values"]
 >>> outputs = model(pixel_values)
 >>> last_hidden_states = outputs.last_hidden_state
