Commit 5cdb841

fix: insert correct number of image tokens for multi-image in Cambrians model (#1075)
When a benchmark (e.g. MindCube) provides multiple images with a plain string prompt, the Cambrians model wrapper only inserted a single <image> token into the prompt regardless of the actual image count. This caused the downstream cambrian library to take the single-image code path while the image features tensor was shaped for multiple images, resulting in a dimension mismatch RuntimeError. Use len(visual_sizes) to insert one <image> token per image so the cambrian library correctly routes to its multi-image concatenation path.
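The rule in this commit message is to emit one image placeholder per entry in `visual_sizes`, in either token style. It can be sketched as a standalone function. This is a minimal sketch, not the lmms-eval code. `prepend_image_tokens` is a hypothetical name. The literal values of the three token constants are assumptions (LLaVA-style defaults), because the commit shows only their names. The sizes and prompt in the usage lines are made-up illustrative values.

```python
# Hypothetical stand-ins for the constants used in cambrians.py;
# the literal strings are assumptions (LLaVA-style defaults).
DEFAULT_IMAGE_TOKEN = "<image>"
DEFAULT_IM_START_TOKEN = "<im_start>"
DEFAULT_IM_END_TOKEN = "<im_end>"


def prepend_image_tokens(qs: str, num_images: int, use_im_start_end: bool) -> str:
    """Prefix a plain-string prompt with one image placeholder per image (sketch)."""
    if use_im_start_end:
        # One start/image/end triple per image, each followed by a newline.
        image_tokens = (DEFAULT_IM_START_TOKEN + DEFAULT_IMAGE_TOKEN + DEFAULT_IM_END_TOKEN + "\n") * num_images
        return image_tokens + qs
    # Bare placeholders back to back, then a single newline before the question.
    return DEFAULT_IMAGE_TOKEN * num_images + "\n" + qs


# Illustrative input: three images, so three placeholders are expected.
visual_sizes = [(640, 480), (640, 480), (800, 600)]
prompt = prepend_image_tokens("Which view shows the red cube?", len(visual_sizes), use_im_start_end=False)
print(prompt)
# <image><image><image>
# Which view shows the red cube?
```

Deriving the count from `visual_sizes` rather than hard-coding one placeholder keeps the prompt and the image features in lockstep. This holds whichever of the two token styles the model config selects.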
1 parent 79e9d3f commit 5cdb841

1 file changed

Lines changed: 4 additions & 3 deletions

lmms_eval/models/simple/cambrians.py

```diff
@@ -350,11 +350,12 @@ def __getitem__(self, idx):
                 real_qs += DEFAULT_IMAGE_TOKEN + "\n"
             qs = real_qs
         elif isinstance(qs, str):
+            num_images = len(visual_sizes)
             if self.model_config.mm_use_im_start_end:
-                qs = DEFAULT_IM_START_TOKEN + DEFAULT_IMAGE_TOKEN + DEFAULT_IM_END_TOKEN + "\n" + qs
+                image_tokens = (DEFAULT_IM_START_TOKEN + DEFAULT_IMAGE_TOKEN + DEFAULT_IM_END_TOKEN + "\n") * num_images
+                qs = image_tokens + qs
             else:
-                assert len(visual_tensors) == 1, "This should not happen."
-                qs = DEFAULT_IMAGE_TOKEN * len(visual_tensors) + "\n" + qs
+                qs = DEFAULT_IMAGE_TOKEN * num_images + "\n" + qs
         else:
             raise NotImplementedError
```
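Before the fix, the mismatch surfaced only as a RuntimeError deep inside the cambrian library. A cheap guard that compares the placeholder count with the number of images could catch the same class of bug while the prompt is being built. The sketch below makes two assumptions. It assumes the `<image>` literal for `DEFAULT_IMAGE_TOKEN`. It also treats `check_image_token_count` as a hypothetical helper that is not part of this commit. Counting only `<image>` also works for the start/end style, because each wrapped triple contains exactly one `<image>`.

```python
# Assumed literal for the placeholder constant (LLaVA-style default).
DEFAULT_IMAGE_TOKEN = "<image>"


def check_image_token_count(qs: str, visual_sizes: list) -> None:
    """Raise early if the prompt and the image list disagree (hypothetical helper)."""
    num_tokens = qs.count(DEFAULT_IMAGE_TOKEN)
    if num_tokens != len(visual_sizes):
        raise ValueError(
            f"prompt has {num_tokens} {DEFAULT_IMAGE_TOKEN} token(s) "
            f"but {len(visual_sizes)} image(s) were provided"
        )


sizes = [(640, 480), (640, 480)]  # illustrative: two images
check_image_token_count("<image><image>\nDescribe both views.", sizes)  # post-fix shape: passes
try:
    check_image_token_count("<image>\nDescribe both views.", sizes)  # pre-fix shape: one token
except ValueError as err:
    print(err)  # prompt has 1 <image> token(s) but 2 image(s) were provided
```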
