fix: update context size precedence to prioritize backend configuration over model configuration #847
Merged
ilopezluna merged 4 commits into main on Apr 8, 2026
Conversation
Contributor
Code Review
This pull request updates context size precedence across backends so that runtime (backend) configuration takes priority over packaging (model) defaults. The logic is correctly implemented in most areas, but the MLX backend remains incomplete: GetMaxTokens still returns nil. Additionally, the vLLM implementation needs a stricter validation check to ensure the model's context size is positive before it is used as a fallback. The test suites have been updated accordingly and provide good coverage of the new precedence rules.
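The stricter vLLM check the review asks for can be sketched as follows. This is a hypothetical illustration, not the PR's actual code: `effectiveMaxModelLen` and its signature are assumptions, but it captures the rule that a positive backend value wins and a model value is only trusted when strictly positive.

```go
package main

import "fmt"

// effectiveMaxModelLen is a hypothetical sketch of the precedence rule under
// review: a positive backend (runtime) value always wins; otherwise fall back
// to the model's value only when it is strictly positive. The boolean reports
// whether any usable value was found.
func effectiveMaxModelLen(backend, model int64) (int64, bool) {
	if backend > 0 {
		return backend, true
	}
	if model > 0 { // stricter check: reject zero or negative model values
		return model, true
	}
	return 0, false // no usable context size configured
}

func main() {
	v, ok := effectiveMaxModelLen(0, 4096)
	fmt.Println(v, ok) // falls back to the model value
}
```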
Contributor
Hey - I've left some high-level feedback:
- The compose CLI tests for context-size duplicate the logic from RunE (including the int32 range check), which may get out of sync over time; consider extracting that validation into a small helper function used by both the command and the tests.
- In the compose up code, BackendConfiguration is always created even when the context-size flag is not set; if you want to clearly distinguish 'no runtime override' cases, you might consider only instantiating or sending a non-empty BackendConfiguration when at least one override (e.g., ContextSize or Speculative) is present.
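The extraction suggested in the first comment could look like the following sketch. The helper name `validateContextSize` and its signature are assumptions for illustration; the point is that the positivity and int32 range checks live in one function that both RunE and the tests can call.

```go
package main

import (
	"fmt"
	"math"
)

// validateContextSize is a hypothetical shared helper of the kind the review
// suggests: it centralizes the positivity and int32 range checks so the
// compose command's RunE and its tests cannot drift apart.
func validateContextSize(v int64) (int32, error) {
	if v <= 0 {
		return 0, fmt.Errorf("context size must be positive, got %d", v)
	}
	if v > math.MaxInt32 {
		return 0, fmt.Errorf("context size %d exceeds int32 range", v)
	}
	return int32(v), nil
}

func main() {
	n, err := validateContextSize(4096)
	fmt.Println(n, err)
}
```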
krissetto
reviewed
Apr 8, 2026
…ze over model configuration in mlx; also added goleak detector in mlx package
krissetto
approved these changes
Apr 8, 2026
doringeman
approved these changes
Apr 8, 2026
This pull request updates the logic to ensure that the backend configuration for context size (or equivalent parameter) always takes precedence over the model configuration. This change standardizes runtime configuration behavior and updates both the implementation and associated tests to reflect this new precedence order.
Backend context size precedence updates:
Changed the logic in the llamacpp, sglang, vllm, and mlx backend config files so that the backend config value for context size/max tokens is used if present, falling back to the model config only if the backend config is unset. This affects functions such as GetContextSize, GetContextLength, and GetMaxModelLen. Updated function documentation in the above files to clarify that backend config takes precedence over model config for context size parameters.
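The precedence described above can be sketched like this. The struct shapes and field names here are assumptions for illustration; only the getter name `GetContextSize` comes from the PR description.

```go
package main

import "fmt"

// Hypothetical, simplified config shapes; the PR's real types differ, but the
// precedence rule is the same.
type BackendConfig struct{ ContextSize *uint64 }
type ModelConfig struct{ ContextSize *uint64 }

// GetContextSize applies the new precedence: a set backend value wins, the
// model value is the fallback, and nil means "let the backend decide".
func GetContextSize(backend *BackendConfig, model *ModelConfig) *uint64 {
	if backend != nil && backend.ContextSize != nil {
		return backend.ContextSize
	}
	if model != nil && model.ContextSize != nil {
		return model.ContextSize
	}
	return nil
}

func main() {
	n := uint64(8192)
	got := GetContextSize(&BackendConfig{ContextSize: &n}, &ModelConfig{})
	fmt.Println(*got) // prints 8192
}
```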
Test updates for new precedence:
Updated llamacpp_config_test.go, sglang_config_test.go, and vllm_config_test.go to verify that backend config takes precedence over model config. Added new test cases to ensure fallback to model config when backend config is not set.
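The test cases described above amount to a small table of precedence scenarios. A hedged sketch, where `pick` is a hypothetical stand-in for the backend-specific getters under test:

```go
package main

import "fmt"

// pick is a hypothetical stand-in for getters like GetContextSize: backend
// value wins when set, model value is the fallback, nil means unset.
func pick(backend, model *int64) *int64 {
	if backend != nil {
		return backend
	}
	return model
}

func main() {
	i64 := func(v int64) *int64 { return &v }
	cases := []struct {
		name           string
		backend, model *int64
		want           *int64
	}{
		{"backend overrides model", i64(8192), i64(4096), i64(8192)},
		{"fallback to model", nil, i64(4096), i64(4096)},
		{"neither set", nil, nil, nil},
	}
	for _, c := range cases {
		got := pick(c.backend, c.model)
		ok := (got == nil && c.want == nil) ||
			(got != nil && c.want != nil && *got == *c.want)
		fmt.Println(c.name, ok)
	}
}
```

In the real suites these cases would live in table-driven tests run via t.Run, one table per backend getter.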