
feat(server): add Anthropic Messages API endpoint (/v1/messages) #778

Open

carlushuang wants to merge 3 commits into main from carhuang/enable_anthropic_endp

Conversation

@carlushuang
Contributor

Summary

Add /v1/messages endpoint to ATOM's OpenAI server, enabling Claude Code and other Anthropic-compatible tools to use ATOM as a backend.

Depends on PR #775 (MiniMax M2.7 reasoning parser fix).

What it does

Translates between Anthropic Messages API format and ATOM's internal OpenAI format:

Claude Code CLI → /v1/messages (Anthropic format)
       ↓
serving_anthropic.py (format translation)
       ↓
ATOM engine (any model, e.g., MiniMax M2.7)
       ↓
GPU inference
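
For illustration, a minimal sketch of the request-side translation, with a hypothetical helper name (the real converters live in serving_anthropic.py and also handle tool_use/tool_result blocks):

def anthropic_to_openai_messages(req: dict) -> list[dict]:
    """Flatten an Anthropic Messages request into OpenAI chat messages."""
    messages = []
    # Anthropic carries the system prompt as a top-level field, either a
    # string or a list of text content blocks; OpenAI expects a system message.
    system = req.get("system")
    if isinstance(system, list):
        system = "".join(b["text"] for b in system if b.get("type") == "text")
    if system:
        messages.append({"role": "system", "content": system})
    for msg in req.get("messages", []):
        content = msg["content"]
        if isinstance(content, list):
            # This sketch keeps only text blocks; tool_use/tool_result
            # blocks need their own translation.
            content = "".join(b["text"] for b in content if b.get("type") == "text")
        messages.append({"role": msg["role"], "content": content})
    return messages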

Features

  • Non-streaming and streaming responses
  • Anthropic SSE event format: message_start, content_block_start, content_block_delta, content_block_stop, message_delta, message_stop (an example event stream is shown after this list)
  • Thinking/reasoning separation: <think> blocks → thinking content blocks (via ReasoningFilter)
  • System messages: string or content-block array
  • Tool definitions: Anthropic → OpenAI format translation
  • Tool use/result messages: bidirectional translation
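
A streaming response with a thinking block roughly follows the event order below (JSON payloads elided or abbreviated; exact bodies depend on the request and model output):

event: message_start
data: {"type": "message_start", "message": {...}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "thinking", "thinking": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "..."}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: content_block_start
data: {"type": "content_block_start", "index": 1, "content_block": {"type": "text", "text": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 1, "delta": {"type": "text_delta", "text": "Hello"}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 1}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {...}}

event: message_stop
data: {"type": "message_stop"}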

New files

  • atom/entrypoints/openai/serving_anthropic.py — request/response schemas, format converters, SSE helpers
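
As one example of the converters, a sketch of the tool-definition translation (function name is hypothetical): Anthropic keeps the JSON schema under input_schema, while OpenAI nests it under function.parameters.

def anthropic_tool_to_openai(tool: dict) -> dict:
    """Translate one Anthropic tool definition to OpenAI function format."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["input_schema"],
        },
    }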

Usage with Claude Code

# 1. Start ATOM
python -m atom.entrypoints.openai_server --model MiniMaxAI/MiniMax-M2.7 \
  --trust-remote-code --kv_cache_dtype fp8 -tp 2 --server-port 8000

# 2. Configure Claude Code (~/.claude/settings.json)
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:8000",
    "ANTHROPIC_AUTH_TOKEN": "dummy",
    "ANTHROPIC_MODEL": "MiniMax-M2.7",
    "DISABLE_PROMPT_CACHING": "1"
  }
}

# 3. Use Claude Code
claude
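
To smoke-test the endpoint without Claude Code, a direct request along these lines should work (values are illustrative; depending on the server's validation, headers like anthropic-version may not be required):

curl -s http://localhost:8000/v1/messages \
  -H "content-type: application/json" \
  -d '{
        "model": "MiniMax-M2.7",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Say hello world"}]
      }'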

Verified on

  • MiniMax M2.7 on MI355X (gfx950), TP=2, FP8 KV
  • Claude Code --print "Say hello world" → Hello, world!
  • Claude Code --print "Write is_prime function" → correct Python code
  • Streaming and non-streaming both work
  • Thinking content properly separated into thinking blocks

Test plan

  • Non-streaming /v1/messages returns correct Anthropic format
  • Streaming /v1/messages returns correct SSE events
  • Thinking/reasoning separated into thinking content blocks
  • Claude Code end-to-end: hello world, code generation, math (an SDK-based check is sketched after this list)
  • Tool calling (needs model with tool-call support)
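
The same checks can be scripted with the official Anthropic Python SDK pointed at the local server (model name and port follow the usage example above):

from anthropic import Anthropic

client = Anthropic(base_url="http://localhost:8000", api_key="dummy")

# Non-streaming: response.content is a list of content blocks
# (thinking and/or text).
response = client.messages.create(
    model="MiniMax-M2.7",
    max_tokens=256,
    messages=[{"role": "user", "content": "Say hello world"}],
)
print([block.type for block in response.content])

# Streaming: text_stream yields text deltas as they arrive.
with client.messages.stream(
    model="MiniMax-M2.7",
    max_tokens=256,
    messages=[{"role": "user", "content": "Say hello world"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)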

MiniMax M2.7's chat template injects <think> as part of the prompt, so
the model output contains only </think> (no <think> start tag). The
reasoning parser now splits at </think> even without a preceding <think>
in both non-streaming (separate_reasoning) and streaming (ReasoningFilter)
paths.
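
A minimal sketch of that split rule (simplified signature; the real logic lives in separate_reasoning and ReasoningFilter):

def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, content).

    Handles both "<think>...</think>answer" and the MiniMax M2.7 case,
    where the chat template already injected <think> so the output
    contains only the closing tag.
    """
    end = text.find("</think>")
    if end == -1:
        return "", text  # no reasoning to separate
    reasoning = text[:end]
    if reasoning.startswith("<think>"):
        reasoning = reasoning[len("<think>"):]
    return reasoning.strip(), text[end + len("</think>"):].lstrip()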
Replace MiniMax-M2.5 → M2.7 and M2.5-MXFP4 → M2.7-MXFP4 across all
benchmark and accuracy configs. Same architecture (MiniMaxM2ForCausalLM),
M2.7 has better-trained weights.

Updated accuracy baselines from M2.7 HF card: gsm8k=0.9181 (BF16),
MXFP4=0.9189. MXFP4 model: amd/MiniMax-M2.7-MXFP4 (Quark quantized).

Local perf verified on MI355X: M2.7 BF16 TP=2 matches M2.5 dashboard
numbers within noise (817 vs 808 tok/s at c=4, 4745 vs 4685 at c=64).
@carlushuang force-pushed the carhuang/enable_anthropic_endp branch 3 times, most recently from 581f897 to 4c104f9 on May 13, 2026 at 22:49
Enables Claude Code and other Anthropic-compatible tools to use ATOM
as a backend. Translates between Anthropic Messages format and ATOM's
internal OpenAI format.

Supports:
- Non-streaming and streaming responses
- System messages, multi-turn conversations
- Thinking/reasoning content separation (via ReasoningFilter)
- Anthropic SSE event format (message_start, content_block_delta, etc.)
- Tool definitions translation (Anthropic → OpenAI format)

Usage with Claude Code:
  ANTHROPIC_BASE_URL=http://localhost:8000 \
  ANTHROPIC_AUTH_TOKEN=dummy \
  ANTHROPIC_MODEL=MiniMax-M2.7 \
  claude
@carlushuang force-pushed the carhuang/enable_anthropic_endp branch from 4c104f9 to 298a7a8 on May 14, 2026 at 02:38
