Problem
Loading COCO-format instance segmentation datasets with polygon annotations through the experimental API is ~7.7x slower than the equivalent Ultralytics dataloader.
| Pipeline | Time per sample (ms) | Relative |
|---|---|---|
| Ultralytics (baseline) | 24.5 | 1.0x |
| Datumaro experimental API | 189.6 | 7.7x |
Bottleneck
cProfile breakdown per sample:
| Function | Time (ms) | % |
|---|---|---|
| `mask_converters.convert()` | 45 | 24% |
| `polygons_to_instance_masks()` | 33 | 17% |
| `numpy.stack()` | 15 | 8% |
| Polars `collect()` / `with_columns()` | 45 | 24% |
Location: datumaro/experimental/converters/mask_converters.py
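For anyone who wants to reproduce a per-function breakdown like the one above, here is a minimal sketch using only the standard library's `cProfile` and `pstats`. The helper name `profile_calls` and its parameters are hypothetical, and this is not necessarily how the numbers in the table were collected.

```python
import cProfile
import io
import pstats


def profile_calls(fn, n_calls=100, top=10):
    """Run fn(i) for i in range(n_calls) under cProfile and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    for i in range(n_calls):
        fn(i)
    profiler.disable()

    # Sort by cumulative time so wrapper functions and their callees are both visible.
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return buf.getvalue()
```

With a dataset `ds` loaded as in the Reproduction section, something like `print(profile_calls(lambda i: ds[i]["instance_mask"]))` should list the hottest functions. Dividing the reported cumulative times by `n_calls` gives per-sample figures.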
Root cause
Ultralytics converts polygons to masks using a tight C loop via cv2.fillPoly (~1-2ms for 10 instances). Datumaro uses Polars DataFrame operations with Python-level iteration and repeated numpy.stack calls.
Reproduction
```python
from datumaro.experimental import Dataset

ds = Dataset.from_coco("path/to/coco", subset="train")
ds = ds.select(["image", "instance_mask"])

for i in range(100):
    sample = ds[i]
    _ = sample["instance_mask"]  # triggers conversion
```
Dataset: any COCO instance segmentation with polygon annotations (e.g., WGISD, COCO val2017).
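To turn the loop above into a "time per sample" number, a small wall-clock helper is enough. This is a sketch under assumptions: the name `ms_per_sample` is hypothetical, and the figures in the Problem table may have been measured differently.

```python
import time


def ms_per_sample(get_sample, n_samples=100):
    """Return mean wall-clock milliseconds per call of get_sample(i)."""
    start = time.perf_counter()
    for i in range(n_samples):
        get_sample(i)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / n_samples
```

For example, `ms_per_sample(lambda i: ds[i]["instance_mask"])` would time the conversion path on the first 100 samples. Running the same helper against the other dataloader keeps the comparison consistent.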
Expected
Polygon-to-mask conversion performance should match or exceed the cv2.fillPoly baseline.
Environment
- datumaro 1.11.0
- Python 3.11/3.13
- Linux x86_64