Commit 5cdb841
fix: insert correct number of image tokens for multi-image in Cambrians model (#1075)

When a benchmark (e.g. MindCube) provides multiple images with a plain
string prompt, the Cambrians model wrapper only inserted a single
<image> token into the prompt regardless of the actual image count.
This caused the downstream cambrian library to take the single-image
code path while the image features tensor was shaped for multiple
images, resulting in a dimension mismatch RuntimeError.

Use len(visual_sizes) to insert one <image> token per image so the
cambrian library correctly routes to its multi-image concatenation
path.

1 parent 79e9d3f commit 5cdb841
1 file changed
Lines changed: 4 additions & 3 deletions
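
The fix described in the commit message can be sketched roughly as follows. This is an illustrative sketch, not the code from the commit: the helper name `insert_image_tokens`, the newline separator, and the assumption that `visual_sizes` is a list with one entry per image are all hypothetical. Only the `<image>` token text and the use of `len(visual_sizes)` come from the message itself.

```python
IMAGE_TOKEN = "<image>"  # token text as quoted in the commit message


def insert_image_tokens(prompt: str, visual_sizes: list) -> str:
    """Hypothetical helper: prepend one image token per image.

    Assumes visual_sizes holds one entry per image, e.g. (width, height).
    """
    num_images = len(visual_sizes)
    if num_images == 0:
        # No images: leave the plain-string prompt untouched.
        return prompt
    # One token per image, so the token count matches the number of
    # image feature blocks passed downstream.
    tokens = "\n".join([IMAGE_TOKEN] * num_images)
    return f"{tokens}\n{prompt}"


# Usage: three images with a plain string prompt.
sizes = [(640, 480), (640, 480), (640, 480)]
prompt = insert_image_tokens("Which object is left of the chair?", sizes)
print(prompt.count(IMAGE_TOKEN))  # 3
```

Before the fix, the equivalent of this helper would have emitted a single token no matter how long `visual_sizes` was, which is the mismatch the message describes.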
Diff hunk covering original lines 350-360 (new lines 350-361); line contents not preserved.
Added: new lines 353, 355, 356, 358. Removed: original lines 354, 356, 357.