Enable batch inference for image slices #1298
Thanks a lot for your work! I can't get it working. Could you help look into this issue? Failed tests: Errors I got (on both yolo11 and yolo26 models):
JiwaniZakir
left a comment
In `ultralytics.py`, the `else` branch of `perform_per_image_batch_inference` (the standard detection path, no mask/OBB) has a clear bug: it iterates over `prediction_result` (the full batch) instead of `image_result` (the single image's result from the outer loop). As a result, for every image in the batch, `processed_result` contains boxes from all images, which inflates `temp_shift_idxs` and duplicates predictions across every slice. It should be `[result.boxes.data for result in image_result]`, matching the pattern used in the `has_mask` branch.
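To make the difference concrete, here is a minimal sketch of the two comprehensions using plain Python stand-ins. `FakeBoxes`, `FakeResult`, `collect_buggy`, and `collect_fixed` are hypothetical names invented for illustration; they are not part of the PR or of Ultralytics, and the sketch assumes that iterating one image's result yields objects exposing `.boxes.data`.

```python
from dataclasses import dataclass


@dataclass
class FakeBoxes:
    data: list  # stand-in for the tensor of box rows


@dataclass
class FakeResult:
    boxes: FakeBoxes

    def __iter__(self):
        # Assumption: iterating one image's result yields objects with .boxes.data.
        yield self


def collect_buggy(prediction_result):
    per_image = []
    for image_result in prediction_result:
        # Bug: iterates over the whole batch and ignores image_result.
        per_image.append([result.boxes.data for result in prediction_result])
    return per_image


def collect_fixed(prediction_result):
    per_image = []
    for image_result in prediction_result:
        # Fix: only collect boxes that belong to the current image.
        per_image.append([result.boxes.data for result in image_result])
    return per_image


# Two hypothetical slices with one box each.
batch = [
    FakeResult(FakeBoxes([[0, 0, 10, 10]])),
    FakeResult(FakeBoxes([[5, 5, 20, 20]])),
]
buggy = collect_buggy(batch)
fixed = collect_fixed(batch)
```

With two slices in the batch, every entry of `buggy` holds the boxes of both slices, while every entry of `fixed` holds only the boxes of its own slice.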
Additionally, `self._original_shape = image_list[0].shape` assumes all slices share the same dimensions, which breaks for boundary slices that are typically smaller than interior ones; this could silently corrupt coordinate transformations downstream.
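A rough sketch of why a single shared shape is risky, assuming slices of different widths end up in the same batch. The slice sizes, the `to_pixel_x` helper, and the normalized-coordinate mapping are all hypothetical and only illustrate the idea of keeping one shape per image; they do not reproduce the PR's actual transformation code.

```python
from types import SimpleNamespace

# Hypothetical slices: an interior 512x512 tile and a narrower 512x300 tile.
image_list = [
    SimpleNamespace(shape=(512, 512, 3)),
    SimpleNamespace(shape=(512, 300, 3)),
]

# Pattern from the review: one shape taken from the first slice only.
shared_shape = image_list[0].shape

# Alternative: remember the shape of every slice in the batch.
per_image_shapes = [image.shape for image in image_list]


def to_pixel_x(x_normalized, shape):
    """Map a normalized x coordinate (0..1) to pixels for an (H, W, C) shape."""
    return x_normalized * shape[1]


# The centre of the second (narrower) slice, computed both ways.
wrong_x = to_pixel_x(0.5, shared_shape)         # uses the first slice's width
right_x = to_pixel_x(0.5, per_image_shapes[1])  # uses the second slice's own width
```

Here `wrong_x` is 256.0 while `right_x` is 150.0, so any transformation that relies on the shared shape would be off for the second slice.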
The type annotation `temp_results_list: torch.Tensor | np.ndarray[...] = []` is also misleading: the variable is always a plain list, yet the union type implies it could be a tensor or an array, which it never is.
Override perform_batch_inference to pass image lists directly to the YOLO model for true batch processing. Extract the _extract_predictions helper to avoid duplication. Store per-image shapes for correct mask resizing in batch mode.

Closes #1113, closes #1298, closes #1318

Signed-off-by: Onuralp SEZER <thunderbirdtr@gmail.com>
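The idea of passing the whole image list in one call can be sketched as follows. `FakeModel`, `infer_one_by_one`, and `infer_batched` are hypothetical stand-ins, not the code from this commit; the sketch only assumes a model callable that accepts a list of images and returns one result per image.

```python
class FakeModel:
    """Hypothetical model stand-in that counts how often it is invoked."""

    def __init__(self):
        self.calls = 0

    def __call__(self, images):
        self.calls += 1
        # One result per input image, in the same order.
        return [f"result-for-{image}" for image in images]


def infer_one_by_one(model, image_list):
    # One forward call per slice.
    return [model([image])[0] for image in image_list]


def infer_batched(model, image_list):
    # A single forward call for all slices.
    return model(image_list)


slices = ["slice-0", "slice-1", "slice-2"]

loop_model = FakeModel()
loop_results = infer_one_by_one(loop_model, slices)

batch_model = FakeModel()
batch_results = infer_batched(batch_model, slices)
```

Both paths return the same results in the same order, but the batched path invokes the model once instead of once per slice.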
This PR is my attempt at implementing batch inference over all slices of an image. I decided to work on this because I tried the existing pull requests addressing this feature, but unfortunately they did not work in my case.
My current implementation is still somewhat hacky and it is limited to Ultralytics models, but I made sure to include tests demonstrating that it works as intended. I plan to continue refining it as time permits, and I hope others in the community find it useful.