[RT-DETRv2] MPS crash: build_2d_sinusoidal_position_embedding hardcodes torch.float64, breaking Apple Silicon / MPS inference

### System Info

```
- `transformers` version: 5.9.0
- Platform: macOS (Apple Silicon, MPS backend)
- Python version: 3.13
- PyTorch version: (MPS-enabled build)
- `docling` version: 2.95.0
- `docling-ibm-models` version: 3.13.2
```

### Reproduction

Run `RTDetrV2ForObjectDetection` inference on any Apple Silicon Mac with the model on the `mps` device. I hit this in practice via `docling` → `docling-ibm-models` → `transformers`, but it reproduces without those packages:

```python
import torch
from transformers import RTDetrV2ForObjectDetection, RTDetrImageProcessor
from PIL import Image
import requests

# Requires an MPS-enabled PyTorch build on Apple Silicon.
device = torch.device("mps")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_v2_r18vd")
model = RTDetrV2ForObjectDetection.from_pretrained("PekingU/rtdetr_v2_r18vd").to(device)

# Preprocess on CPU, then move every input tensor to the MPS device.
inputs = processor(images=image, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)  # crashes here
```

**Error:**
```
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
```

Traceback points to `modeling_rt_detr_v2.py` line 988, inside `build_2d_sinusoidal_position_embedding`:
```python
omega = torch.arange(pos_dim, dtype=torch.float64, device=device) / pos_dim
```
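The failing allocation can be isolated without loading the model. The snippet below is a minimal sketch: `mps_supports_float64` is a hypothetical helper name, not part of `torch` or `transformers`. It tries the same kind of `torch.arange(..., dtype=torch.float64)` call directly on the `mps` device.

```python
import torch


def mps_supports_float64():
    """Hypothetical helper: probe whether a float64 tensor can be allocated on MPS.

    Returns None when no MPS device is available, otherwise True or False.
    """
    if not torch.backends.mps.is_available():
        return None
    try:
        # Same kind of allocation as the failing line, minus the model.
        torch.arange(4, dtype=torch.float64, device="mps")
    except (TypeError, RuntimeError):
        # The report above shows a TypeError; RuntimeError is caught as a precaution.
        return False
    return True


print("float64 on MPS:", mps_supports_float64())
```

On the setup described in this report, the expectation is that the probe returns `False`, which matches the `TypeError` above.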

### Expected behavior

Inference should succeed on MPS devices.

**Root cause:** `build_2d_sinusoidal_position_embedding` already accepts a `dtype` parameter (defaulting to `torch.float32`), and the caller at line 1077 correctly passes `dtype=hidden_states.dtype`. However, all internal tensor allocations hardcode `torch.float64`, ignoring the parameter entirely.

Note: the canonical source of this function body is `modeling_vit_mae.py`, and it gets inlined into the generated `modeling_rt_detr.py` and `modeling_rt_detr_v2.py`. All three files need updating, either by editing each one or by fixing the source and regenerating.

Verified locally on Apple Silicon (MPS, transformers 5.9.0): crash without the fix, `logits shape: torch.Size([1, 300, 80])` with it.
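One way the function could avoid `float64` is sketched below as a standalone function. This is not the actual `transformers` code and not necessarily the patch that was tested: the name `build_2d_sincos_pos_embed_sketch`, the choice to compute in `float32`, and the final cast to `dtype` are all assumptions made for illustration.

```python
import torch


def build_2d_sincos_pos_embed_sketch(
    width, height, embed_dim=256, temperature=10000.0, device="cpu", dtype=torch.float32
):
    """Hypothetical stand-in for the library function; illustrative only."""
    if embed_dim % 4 != 0:
        raise ValueError("embed_dim must be divisible by 4 for 2D sin-cos embeddings")

    # Assumption: compute in float32, which MPS supports, instead of float64.
    compute_dtype = torch.float32
    grid_w = torch.arange(width, dtype=compute_dtype, device=device)
    grid_h = torch.arange(height, dtype=compute_dtype, device=device)
    grid_w, grid_h = torch.meshgrid(grid_w, grid_h, indexing="ij")

    # One quarter of the channels each for sin(w), cos(w), sin(h), cos(h).
    pos_dim = embed_dim // 4
    omega = torch.arange(pos_dim, dtype=compute_dtype, device=device) / pos_dim
    omega = 1.0 / (temperature ** omega)

    out_w = grid_w.flatten()[:, None] * omega[None, :]
    out_h = grid_h.flatten()[:, None] * omega[None, :]
    pos = torch.cat([out_w.sin(), out_w.cos(), out_h.sin(), out_h.cos()], dim=1)

    # Shape (1, width * height, embed_dim), cast to the dtype the caller asked for.
    return pos[None].to(dtype)


pos = build_2d_sincos_pos_embed_sketch(20, 20, embed_dim=256, dtype=torch.float16)
print(pos.shape, pos.dtype)
```

Computing in `float32` and casting at the end keeps the intermediate math away from lower-precision types such as `float16`, at the cost of small numerical differences from a `float64` reference. Whether that trade-off is acceptable upstream is for the maintainers to decide.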

