
results: m4-max-128gb-40gpu#6

Open
snagnever wants to merge 4 commits into famstack-dev:main from snagnever:results/m4-max-128gb-40gpu
@snagnever

Benchmark results from m4-max-128gb-40gpu using lmstudio.

  • Hardware: Apple M4 Max / 128GB / 40 GPU cores
  • Backend: lmstudio
  • Model: qwen3.6-27b-dense-mlx-6bit
  • Scenarios: creative-writing, doc-summary, ops-agent, prefill-test

🤖 Generated with Claude Code

Vitor de Araujo and others added 4 commits May 17, 2026 13:54
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Effective-tokens/sec sweep across the four Gemma 4 variants on the
Mac Studio M4 Max 128 GB rig. Each model was run on 4 scenarios
(ops-agent, doc-summary, prefill-test, creative-writing), and timings
capture total wall-clock time including prefill.
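A minimal sketch of the difference between decode-only and effective throughput, assuming "effective tokens/sec" means generated tokens divided by total wall-clock (prefill plus generation), as the sentence above suggests. The function names and the run numbers are hypothetical illustrations, not values or code from this benchmark harness.

```python
def generation_tps(gen_tokens: int, gen_seconds: float) -> float:
    """Decode-only throughput: tokens generated per second of generation time."""
    return gen_tokens / gen_seconds


def effective_tps(gen_tokens: int, prefill_seconds: float, gen_seconds: float) -> float:
    """Effective throughput (assumed definition): tokens generated per second of
    total wall-clock, so time spent processing the prompt counts against the rate."""
    return gen_tokens / (prefill_seconds + gen_seconds)


# Hypothetical run: 1000 output tokens, 2 s of prefill, 10 s of decode.
print(generation_tps(1000, 10.0))                # 100.0
print(round(effective_tps(1000, 2.0, 10.0), 2))  # 83.33
```

Under this definition, long-prompt scenarios such as prefill-test pull effective t/s well below the raw generation rate even when decode speed is unchanged.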

Headlines:
  - 26B-A4B @4bit: 100.3 gen t/s on ops-agent; fastest model on the rig
    (beats every Phase 1 model)
  - 26B-A4B @6bit: 80.8 gen t/s; ~4x faster than 27b dense
  - 31B dense:     13.7 gen t/s; ~6x slower than 26B-A4B @6bit with no
                    quality win; demoted from rotation
  - e4b (4B):      70.9 gen t/s; useful only for the FIM / quick-call slot

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
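The "~6x slower" claim in the headlines can be checked directly from the quoted gen t/s figures. This sketch assumes the four numbers come from comparable runs; only the @4bit figure is explicitly labeled ops-agent. The "~4x faster than 27b dense" ratio cannot be checked this way because no 27b dense figure is quoted. The `headline_tps` dict and `slowdown` helper are hypothetical names for illustration.

```python
# gen t/s figures as quoted in the commit message headlines
headline_tps = {
    "26B-A4B @4bit": 100.3,
    "26B-A4B @6bit": 80.8,
    "e4b (4B)": 70.9,
    "31B dense": 13.7,
}


def slowdown(baseline: str, other: str) -> float:
    """How many times slower `other` is than `baseline`, measured in gen t/s."""
    return headline_tps[baseline] / headline_tps[other]


print(f"{slowdown('26B-A4B @6bit', '31B dense'):.1f}x")  # 5.9x
```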