feat(server): add Anthropic Messages API endpoint (/v1/messages)#778
Open
carlushuang wants to merge 3 commits into
Open
feat(server): add Anthropic Messages API endpoint (/v1/messages)#778carlushuang wants to merge 3 commits into
carlushuang wants to merge 3 commits into
Conversation
MiniMax M2.7's chat template injects <think> as part of the prompt, so the model output contains only </think> (no <think> start tag). The reasoning parser now splits at </think> even without a preceding <think> in both non-streaming (separate_reasoning) and streaming (ReasoningFilter) paths.
Replace MiniMax-M2.5 → M2.7 and M2.5-MXFP4 → M2.7-MXFP4 across all benchmark and accuracy configs. Same architecture (MiniMaxM2ForCausalLM), M2.7 has better-trained weights. Updated accuracy baselines from M2.7 HF card: gsm8k=0.9181 (BF16), MXFP4=0.9189. MXFP4 model: amd/MiniMax-M2.7-MXFP4 (Quark quantized). Local perf verified on MI355X: M2.7 BF16 TP=2 matches M2.5 dashboard numbers within noise (817 vs 808 tok/s at c=4, 4745 vs 4685 at c=64).
581f897 to
4c104f9
Compare
Enables Claude Code and other Anthropic-compatible tools to use ATOM as a backend. Translates between Anthropic Messages format and ATOM's internal OpenAI format. Supports: - Non-streaming and streaming responses - System messages, multi-turn conversations - Thinking/reasoning content separation (via ReasoningFilter) - Anthropic SSE event format (message_start, content_block_delta, etc.) - Tool definitions translation (Anthropic → OpenAI format) Usage with Claude Code: ANTHROPIC_BASE_URL=http://localhost:8000 \ ANTHROPIC_AUTH_TOKEN=dummy \ ANTHROPIC_MODEL=MiniMax-M2.7 \ claude
4c104f9 to
298a7a8
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Add
/v1/messagesendpoint to ATOM's OpenAI server, enabling Claude Code and other Anthropic-compatible tools to use ATOM as a backend.Depends on PR #775 (MiniMax M2.7 reasoning parser fix).
What it does
Translates between Anthropic Messages API format and ATOM's internal OpenAI format:
Features
message_start,content_block_start,content_block_delta,content_block_stop,message_delta,message_stop<think>blocks →thinkingcontent blocks (via ReasoningFilter)New files
atom/entrypoints/openai/serving_anthropic.py— request/response schemas, format converters, SSE helpersUsage with Claude Code
Verified on
--print "Say hello world"→Hello, world!--print "Write is_prime function"→ correct Python codethinkingblocksTest plan
/v1/messagesreturns correct Anthropic format/v1/messagesreturns correct SSE eventsthinkingcontent blocks