System Info
- `transformers` version: 5.9.0
- Platform: macOS (Apple Silicon, MPS backend)
- Python version: 3.13
- PyTorch version: (MPS-enabled build)
- `docling` version: 2.95.0
- `docling-ibm-models` version: 3.13.2
Reproduction
Run RTDetrV2ForObjectDetection inference on any Apple Silicon Mac (MPS device). Triggered in practice via docling → docling-ibm-models → transformers, but reproducible with:
import torch
from transformers import RTDetrV2ForObjectDetection, RTDetrImageProcessor
from PIL import Image
import requests
device = torch.device("mps")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_v2_r18vd")
model = RTDetrV2ForObjectDetection.from_pretrained("PekingU/rtdetr_v2_r18vd").to(device)
inputs = processor(images=image, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}
with torch.no_grad():
    outputs = model(**inputs)  # crashes here
Error:
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
Traceback points to modeling_rt_detr_v2.py line 988, inside build_2d_sinusoidal_position_embedding:
omega = torch.arange(pos_dim, dtype=torch.float64, device=device) / pos_dim
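The failing allocation can be isolated from the model. The following is a minimal sketch, not part of the original report: `supports_float64` is a hypothetical helper name, and it assumes that a float64 `torch.arange` on an unsupported device raises `TypeError` (as in the message above) or `RuntimeError`.

```python
import torch


def supports_float64(device):
    """Hypothetical helper: True if a float64 arange can be allocated on `device`."""
    try:
        # Same kind of allocation as the line the traceback points to.
        torch.arange(4, dtype=torch.float64, device=device)
        return True
    except (TypeError, RuntimeError):
        return False


print("cpu:", supports_float64("cpu"))
if torch.backends.mps.is_available():
    # Given the error above, this is expected to report False on MPS.
    print("mps:", supports_float64("mps"))
```

If the helper reports that float64 is unsupported on MPS, any code path that hardcodes `torch.float64` on that device will fail in the same way, whatever dtype the caller asked for.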
Expected behavior
Inference should succeed on MPS devices.
Root cause: build_2d_sinusoidal_position_embedding already accepts a dtype parameter (defaulting to torch.float32), and the caller at line 1077 correctly passes dtype=hidden_states.dtype. However, all internal tensor allocations hardcode torch.float64, ignoring the parameter entirely.
Note: the same function body is the canonical source in modeling_vit_mae.py and gets inlined into the generated modeling_rt_detr.py and modeling_rt_detr_v2.py, so all three files (or the source + a regeneration) would need updating.
Verified locally on Apple Silicon (MPS, transformers 5.9.0): the reproduction above crashes without the fix and prints `logits shape: torch.Size([1, 300, 80])` with it.
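One possible shape for such a fix is sketched below. This is an illustrative standalone function, not the actual body in `transformers`: the name `build_2d_sincos_pos_embed` is hypothetical, and the choice to compute in float32 and cast to the requested `dtype` at the end is an assumption made here so that no float64 tensor is ever created on the device.

```python
import torch


def build_2d_sincos_pos_embed(width, height, embed_dim=256, temperature=10000.0,
                              device="cpu", dtype=torch.float32):
    """Illustrative 2D sinusoidal position embedding that never allocates float64."""
    if embed_dim % 4 != 0:
        raise ValueError("embed_dim must be divisible by 4")

    # Assumption: float32 is precise enough here and is supported on MPS.
    compute_dtype = torch.float32

    grid_w = torch.arange(width, dtype=compute_dtype, device=device)
    grid_h = torch.arange(height, dtype=compute_dtype, device=device)
    grid_w, grid_h = torch.meshgrid(grid_w, grid_h, indexing="ij")

    pos_dim = embed_dim // 4
    omega = torch.arange(pos_dim, dtype=compute_dtype, device=device) / pos_dim
    omega = 1.0 / (temperature ** omega)

    # Outer product of flattened grid coordinates and frequencies.
    out_w = grid_w.flatten()[:, None] * omega[None, :]
    out_h = grid_h.flatten()[:, None] * omega[None, :]

    pos = torch.cat([out_w.sin(), out_w.cos(), out_h.sin(), out_h.cos()], dim=1)
    # Honor the caller's dtype (e.g. hidden_states.dtype) only at the very end.
    return pos[None, :, :].to(dtype)


pos = build_2d_sincos_pos_embed(4, 3, embed_dim=16)
print(pos.shape, pos.dtype)
```

Computing in float32 and casting once keeps the intermediate values reasonably precise even when the caller requests a half-precision dtype. An alternative is to pass `dtype` straight into every `torch.arange` call.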