[Kernel] feat: add Metal support for Apple Silicon GPU#1415

Draft
AlpinDale wants to merge 9 commits into main from metal-backend

AlpinDale (Collaborator) commented Aug 12, 2025

This PR adds native support for Apple M-series GPUs by implementing the kernels in the Metal Shading Language. Currently, attention is implemented through PyTorch SDPA's MPS backend together with custom paged attention Metal kernels.

To test, first make sure Xcode is installed:

$ xcode-select --install

Then configure the path:

$ xcode-select --switch $(xcode-select --print-path)

Then build:

$ APHRODITE_TARGET_DEVICE=mps pip install -e .
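
Before or after building, it can help to confirm that the installed PyTorch actually sees the Metal device. The following is a minimal sketch, not part of this PR: `mps_status` is a hypothetical helper name, and it only relies on `torch.backends.mps.is_built()` and `torch.backends.mps.is_available()`.

```python
import importlib.util


def mps_status() -> str:
    """Report whether PyTorch can see the Metal (MPS) device.

    Hypothetical helper for illustration; not part of Aphrodite.
    """
    # Avoid an ImportError if PyTorch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return "torch-not-installed"

    import torch

    # is_built(): this PyTorch binary was compiled with MPS support.
    if not torch.backends.mps.is_built():
        return "torch-built-without-mps"
    # is_available(): an MPS device is usable on this machine right now.
    if not torch.backends.mps.is_available():
        return "mps-unavailable"
    return "mps-available"


print(mps_status())
```

If this prints anything other than `mps-available`, the MPS build is unlikely to be usable on that machine.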

You can run the API server:

$ aphrodite run Qwen/Qwen3-0.6B --no-enable-chunked-prefill --no-enable-prefix-caching
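
Once the server is up, a request can be sent to it from Python. This is a sketch under assumptions that this PR does not state: that the server exposes an OpenAI-compatible `/v1/completions` route and listens on `http://localhost:2242`. Adjust both to match your setup. `build_completion_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=32):
    """Build (but do not send) a completion request.

    Assumes an OpenAI-compatible /v1/completions route; this is an
    assumption, not something confirmed by the PR description.
    """
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The host and port below are assumptions; the model name is from the command above.
req = build_completion_request("http://localhost:2242", "Qwen/Qwen3-0.6B", "Hello")

# To actually send it (requires the server to be running):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```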

Benchmarks

Qwen3-0.6B BF16, Apple M4 Pro (MacBook), Batch Size 1:

CPU Backend:

Prefill: 110.1 tokens/s, Decode: 22.1 tokens/s

MPS Backend:

Prefill: 2415.7 tokens/s, Decode: 14.8 tokens/s

MLX (LM Studio):

Prefill: 1318.4 tokens/s, Decode: 138.53 tokens/s

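Currently, prefill on the MPS backend is roughly 22x faster than on the CPU backend (2415.7 vs. 110.1 tokens/s), but decode is about a third slower (14.8 vs. 22.1 tokens/s) and far behind MLX. This needs more work. The MPS backend also has accuracy issues: outputs are incorrect, although they are not NaN or nonsense.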
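
To compare all three backends on one scale, the figures above can be normalized against the CPU baseline. The sketch below only does arithmetic on the numbers reported in this description; `relative_to` is a hypothetical helper name.

```python
# Throughput figures (tokens/s) copied from the benchmark results above.
BENCHMARKS = {
    "CPU": {"prefill": 110.1, "decode": 22.1},
    "MPS": {"prefill": 2415.7, "decode": 14.8},
    "MLX": {"prefill": 1318.4, "decode": 138.53},
}


def relative_to(baseline, results=BENCHMARKS):
    """Return each backend's throughput as a multiple of the baseline's."""
    base = results[baseline]
    return {
        name: {phase: value / base[phase] for phase, value in phases.items()}
        for name, phases in results.items()
    }


ratios = relative_to("CPU")
for name, phases in ratios.items():
    print(f"{name:<4} prefill x{phases['prefill']:.2f}  decode x{phases['decode']:.2f}")
```

A ratio below 1.0 in the decode column means that backend decodes more slowly than the CPU backend.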

Tests

$ pytest tests/kernels/attention/test_attention.py
$ pytest tests/kernels/core/test_activation.py
$ pytest tests/kernels/core/test_layernorm.py
$ pytest tests/kernels/core/test_pos_encoding.py
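
Because the incorrect outputs are neither NaN nor nonsense, a plain NaN check will likely not catch them, so a kernel's output has to be compared against a reference within a tolerance. The following is an illustrative plain-Python sketch of that idea, not the code used in the test files above: `reference_attention` and `is_close` are hypothetical names, and the tolerances are placeholder values.

```python
import math


def reference_attention(q, keys, values):
    """Single-query scaled dot-product attention in plain Python."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def is_close(candidate, reference, atol=1e-3, rtol=1e-3):
    """True if every element matches within atol + rtol * |reference|."""
    if len(candidate) != len(reference):
        return False
    for c, r in zip(candidate, reference):
        if math.isnan(c) or abs(c - r) > atol + rtol * abs(r):
            return False
    return True


expected = reference_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
# A slightly wrong but plausible-looking output, like the failure mode described above.
drifted = [x + 0.05 for x in expected]
print(is_close(expected, expected), is_close(drifted, expected))
```

Running the same comparison kernel by kernel is one way to narrow down which operation introduces the drift.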

TODO

  • Implement Paged Attention (based on mistral-rs's kernels)
  • Implement cache kernels
  • Implement Flash Attention
  • Implement LayerNorm kernels
  • Implement Activation kernels
  • Implement Rotary Embedding kernels
  • Implement MLX quantization (if possible)
  • Fix incorrect outputs
  • torch.compile (see "torch.compile on MPS progress tracker", pytorch/pytorch#150121)
  • Implement Chunked Prefill
  • Implement Prefix Caching
