[RT-DETRv2] MPS crash: build_2d_sinusoidal_position_embedding hardcodes torch.float64, breaking Apple Silicon / MPS inference #46159

@shubhammr21

System Info

- `transformers` version: 5.9.0
- Platform: macOS (Apple Silicon, MPS backend)
- Python version: 3.13
- PyTorch version: (MPS-enabled build)
- `docling` version: 2.95.0
- `docling-ibm-models` version: 3.13.2

Reproduction

Run RTDetrV2ForObjectDetection inference on any Apple Silicon Mac (MPS device). Triggered in practice via docling → docling-ibm-models → transformers, but reproducible with:

import torch
from transformers import RTDetrV2ForObjectDetection, RTDetrImageProcessor
from PIL import Image
import requests

device = torch.device("mps")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_v2_r18vd")
model = RTDetrV2ForObjectDetection.from_pretrained("PekingU/rtdetr_v2_r18vd").to(device)

inputs = processor(images=image, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)  # crashes here

Error:

TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.

Traceback points to modeling_rt_detr_v2.py line 988, inside build_2d_sinusoidal_position_embedding:

omega = torch.arange(pos_dim, dtype=torch.float64, device=device) / pos_dim
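The failure can be isolated from the model entirely. The sketch below assumes only that PyTorch is installed; it runs the same kind of float64 allocation on a chosen device and reports whether the backend rejects it. `try_float64_arange` is a hypothetical helper name for illustration, not part of transformers.

```python
import torch


def try_float64_arange(device: str, pos_dim: int = 64):
    """Hypothetical helper: attempt the float64 allocation from the traceback.

    Returns None on success, or the error message if the backend rejects it.
    """
    try:
        torch.arange(pos_dim, dtype=torch.float64, device=device) / pos_dim
    except (TypeError, RuntimeError) as exc:
        return str(exc)
    return None


# float64 is supported on CPU, so this reports no error.
print(try_float64_arange("cpu"))

# Only probe MPS when the backend is actually available.
if torch.backends.mps.is_available():
    print(try_float64_arange("mps"))
```

On a machine without MPS only the CPU probe runs, so the snippet is safe to execute anywhere.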

Expected behavior

Inference should succeed on MPS devices.

Root cause: build_2d_sinusoidal_position_embedding already accepts a dtype parameter (defaulting to torch.float32), and the caller at line 1077 correctly passes dtype=hidden_states.dtype. However, all internal tensor allocations hardcode torch.float64, ignoring the parameter entirely.
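A minimal sketch of what honoring the parameter could look like, written as a standalone function rather than the actual transformers source. The name `sincos_2d_position_embedding` and the `width`, `height`, `embed_dim`, and `temperature` arguments are assumptions for illustration; only `dtype`, `device`, and `pos_dim` come from the report above.

```python
import torch


def sincos_2d_position_embedding(
    width, height, embed_dim=256, temperature=10000.0, device="cpu", dtype=torch.float32
):
    """Illustrative stand-in, not the transformers source: every tensor uses `dtype`."""
    if embed_dim % 4 != 0:
        raise ValueError("embed_dim must be divisible by 4 for a 2D sin-cos embedding")

    # Coordinate grids are allocated in the caller's dtype, never a hardcoded float64.
    grid_w = torch.arange(int(width), dtype=dtype, device=device)
    grid_h = torch.arange(int(height), dtype=dtype, device=device)
    grid_w, grid_h = torch.meshgrid(grid_w, grid_h, indexing="ij")

    # One quarter of the channels each for sin(w), cos(w), sin(h), cos(h).
    pos_dim = embed_dim // 4
    omega = torch.arange(pos_dim, dtype=dtype, device=device) / pos_dim
    omega = 1.0 / (temperature**omega)

    # Outer product of flattened positions and frequencies.
    out_w = grid_w.flatten()[:, None] * omega[None, :]
    out_h = grid_h.flatten()[:, None] * omega[None, :]

    # Shape: (1, width * height, embed_dim)
    return torch.cat([out_w.sin(), out_w.cos(), out_h.sin(), out_h.cos()], dim=1)[None]


emb = sincos_2d_position_embedding(20, 20, embed_dim=256)
print(emb.shape, emb.dtype)  # torch.Size([1, 400, 256]) torch.float32
```

If the float64 was chosen deliberately for precision, an alternative would be to compute in float64 only on devices that support it and cast to `dtype` at the end. Which approach the maintainers prefer is not stated here.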

Note: the same function body is the canonical source in modeling_vit_mae.py and gets inlined into the generated modeling_rt_detr.py and modeling_rt_detr_v2.py, so all three files (or the source + a regeneration) would need updating.

Verified locally on Apple Silicon (MPS, transformers 5.9.0): inference crashes without the fix; with it, inference succeeds and the logits have shape torch.Size([1, 300, 80]).
