
feat: add OpenAI /v1/completions adapter for vLLM gpt-oss-120b accuracy#308

Draft
arekay-nv wants to merge 1 commit into main from arekay/openai-completions-adapter

Conversation

@arekay-nv
Collaborator

Adds APIType.OPENAI_COMPLETIONS routing to /v1/completions, which accepts pre-tokenized token ID arrays and bypasses vLLM's chat template — required for gpt-oss-120b where the Harmony format must be applied client-side.

  • Add APIType.OPENAI_COMPLETIONS with default_route "/v1/completions"
  • Add TextCompletionRequest/Response/SSE msgspec types
  • Add OpenAITextCompletionsAdapter (mirrors SGLang adapter, reuses OpenAISSEAccumulator)
  • Register adapter and accumulator in endpoint_client/config.py
  • Rename gptoss → gptoss_sglang presets; add gptoss_vllm across aime25/gpqa/livecodebench
  • Update sglang_gptoss_120b_example.yaml to use gptoss_sglang presets
  • Update vllm_gptoss_120b_example.yaml to use openai_completions + gptoss_vllm presets
  • Add 18 unit tests covering adapter, SSE, preset existence, and APIType integration
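
To make the pre-tokenized flow concrete, here is a minimal sketch of the request body the new adapter would send to /v1/completions. The helper name and defaults are illustrative assumptions (the PR's actual types are msgspec structs); the key point is that `prompt` carries token IDs rather than text, so vLLM never applies its chat template and the Harmony format can be rendered client-side.

```python
import json

def build_completions_request(model: str, token_ids: list[int],
                              max_tokens: int = 16, stream: bool = False) -> bytes:
    """Hypothetical sketch of a /v1/completions payload with pre-tokenized input.

    The OpenAI completions schema accepts `prompt` as a string, a list of
    strings, or a list of token IDs; vLLM honors the token-ID form.
    """
    body = {
        "model": model,
        "prompt": token_ids,   # token IDs, not text: bypasses the server-side chat template
        "max_tokens": max_tokens,
        "stream": stream,
    }
    return json.dumps(body).encode("utf-8")

payload = build_completions_request("gpt-oss-120b", [200006, 17360, 200008], stream=True)
```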

fix: move lazy test imports to module level; fix decode_sse_message return type

  • Move all inline imports in test_completions_adapter.py to file-level
  • Add test for empty-text SSE choice path
  • Fix HttpRequestAdapter.decode_sse_message abstract annotation from str -> Any (SGLang and completions adapters both return SSEDelta structs, not str)
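
A rough sketch of the annotation fix, under the assumption that concrete adapters decode SSE chunks into a structured delta. The class and method names follow the PR description; the bodies and the `SSEDelta` shape are illustrative, not copied from the code.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any

@dataclass
class SSEDelta:
    """Illustrative structured delta; the real struct lives in the adapter code."""
    text: str

class HttpRequestAdapter(ABC):
    @abstractmethod
    def decode_sse_message(self, raw: bytes) -> Any:  # was annotated `-> str`
        """Concrete adapters may return structured deltas, not plain strings."""

class CompletionsAdapter(HttpRequestAdapter):
    def decode_sse_message(self, raw: bytes) -> SSEDelta:
        payload = json.loads(raw.removeprefix(b"data: "))
        # /v1/completions SSE chunks carry text under choices[0]["text"];
        # an empty or missing text field yields an empty delta.
        choices = payload.get("choices") or [{}]
        return SSEDelta(text=choices[0].get("text", ""))

delta = CompletionsAdapter().decode_sse_message(b'data: {"choices": [{"text": "hi"}]}')
```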

examples/04_GPTOSS120B_Example/Readme.md:

  • Replace stale chat-completions note with accurate openai_completions description
  • Update performance-only vLLM api_type reference from "openai" to "openai_completions"
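
The vLLM example config change described above might look roughly like the fragment below. The key names and layout are assumptions modeled on the bullet points, not copied from the PR's actual vllm_gptoss_120b_example.yaml:

```yaml
# vllm_gptoss_120b_example.yaml (hypothetical fragment)
endpoint:
  api_type: openai_completions   # routes requests to /v1/completions
benchmarks:
  - name: aime25
    preset: gptoss_vllm          # vLLM-specific preset, split from the old shared `gptoss`
```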

What does this PR do?

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

github-actions bot requested a review from nvzhihanj · May 9, 2026 11:38

github-actions Bot commented May 9, 2026

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@arekay-nv arekay-nv requested review from nv-alicheng and viraatc May 9, 2026 11:38

gemini-code-assist bot left a comment


Code Review

This pull request introduces a new openai_completions API type and adapter to support the OpenAI /v1/completions endpoint, enabling the use of pre-tokenized input with vLLM. This change allows users to bypass server-side chat templates, ensuring parity with SGLang results for specific models like gpt-oss-120b. The implementation includes the OpenAITextCompletionsAdapter, updated configuration templates, documentation, and new unit tests. I have no feedback to provide.

@arekay-nv arekay-nv requested a review from tianmu-li May 9, 2026 11:40
