feat: add OpenAI /v1/completions adapter for vLLM gpt-oss-120b accuracy #308
Draft
feat: add OpenAI /v1/completions adapter for vLLM gpt-oss-120b accuracy

Adds APIType.OPENAI_COMPLETIONS routing to /v1/completions, which accepts pre-tokenized token-ID arrays and bypasses vLLM's chat template; this is required for gpt-oss-120b, where the Harmony format must be applied client-side.

- Add APIType.OPENAI_COMPLETIONS with default_route "/v1/completions"
- Add TextCompletionRequest/Response/SSE msgspec types
- Add OpenAITextCompletionsAdapter (mirrors the SGLang adapter; reuses OpenAISSEAccumulator)
- Register the adapter and accumulator in endpoint_client/config.py
- Rename the gptoss presets to gptoss_sglang; add gptoss_vllm presets across aime25/gpqa/livecodebench
- Update sglang_gptoss_120b_example.yaml to use the gptoss_sglang presets
- Update vllm_gptoss_120b_example.yaml to use openai_completions + the gptoss_vllm presets
- Add 18 unit tests covering the adapter, SSE decoding, preset existence, and APIType integration

fix: move lazy test imports to module level; fix decode_sse_message return type

- Move all inline imports in test_completions_adapter.py to file level
- Add a test for the empty-text SSE choice path
- Fix the HttpRequestAdapter.decode_sse_message abstract annotation from str to Any (the SGLang and completions adapters both return SSEDelta structs, not str)

examples/04_GPTOSS120B_Example/Readme.md:

- Replace the stale chat-completions note with an accurate openai_completions description
- Update the performance-only vLLM api_type reference from "openai" to "openai_completions"
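For illustration, a minimal sketch of the kind of request body this routing enables. The field names follow the public OpenAI /v1/completions schema; the helper name and the token IDs are placeholders, not the PR's actual code:

```python
import json

def build_completions_payload(token_ids, model="gpt-oss-120b", max_tokens=256):
    """Sketch of a pre-tokenized /v1/completions payload (hypothetical helper).

    Passing a list of token IDs as "prompt" skips vLLM's server-side chat
    template, so the Harmony format can be applied client-side before
    tokenization.
    """
    return {
        "model": model,
        "prompt": token_ids,   # pre-tokenized input: list[int], no templating
        "max_tokens": max_tokens,
        "stream": True,        # responses arrive as SSE chunks
    }

# Placeholder token IDs standing in for a Harmony-encoded prompt.
payload = build_completions_payload([200006, 17360, 200008])
print(json.dumps(payload))
```

By contrast, a /v1/chat/completions request carries a `messages` array of role/content pairs that the server renders through its chat template, which is exactly the step this endpoint avoids.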
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Code Review
This pull request introduces a new openai_completions API type and adapter to support the OpenAI /v1/completions endpoint, enabling the use of pre-tokenized input with vLLM. This change allows users to bypass server-side chat templates, ensuring parity with SGLang results for specific models like gpt-oss-120b. The implementation includes the OpenAITextCompletionsAdapter, updated configuration templates, documentation, and new unit tests. I have no feedback to provide.
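To make the streaming side concrete, here is a minimal sketch of decoding one OpenAI-style SSE message from a streamed /v1/completions response. It assumes the standard `data:` framing and `[DONE]` sentinel; the adapter's real decode_sse_message returns an SSEDelta struct, not a plain dict, and this standalone function is only illustrative:

```python
import json

def decode_sse_message(raw: str):
    """Decode one SSE line into text + finish_reason, or None (sketch only)."""
    line = raw.strip()
    if not line.startswith("data:"):
        return None                        # comments, blank keep-alives, etc.
    data = line[len("data:"):].strip()
    if data == "[DONE]":
        return None                        # stream-terminator sentinel
    chunk = json.loads(data)
    choice = chunk["choices"][0]
    return {
        "text": choice.get("text", ""),    # may be empty (the empty-text path)
        "finish_reason": choice.get("finish_reason"),
    }

msg = 'data: {"choices":[{"text":"Hello","finish_reason":null}]}'
print(decode_sse_message(msg))  # → {'text': 'Hello', 'finish_reason': None}
```

Returning a structured value rather than a bare string is why the PR widens the abstract `decode_sse_message` annotation from `str` to `Any`.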