fix: update context size precedence to prioritize backend configuration over model configuration #847
Merged
ilopezluna merged 4 commits into main on Apr 8, 2026
Conversation
Contributor
Code Review
This pull request updates context size precedence across backends so that runtime (backend) configuration takes priority over packaging (model) defaults. The logic is correctly implemented in most areas, but the MLX backend remains incomplete: GetMaxTokens still returns nil. Additionally, the vLLM implementation needs a stricter validation check to ensure the model's context size is positive before it is used as a fallback. The test suites have been updated accordingly and provide good coverage of the new precedence rules.
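The stricter vLLM check the review asks for can be sketched as follows. This is a hypothetical illustration, not the PR's actual code: `effectiveMaxModelLen` and its signature are assumptions, but it captures the rule that a positive backend value wins and a model value is only trusted when strictly positive.

```go
package main

import "fmt"

// effectiveMaxModelLen is a hypothetical sketch of the precedence rule under
// review: a positive backend (runtime) value always wins; otherwise fall back
// to the model's value only when it is strictly positive. The boolean reports
// whether any usable value was found.
func effectiveMaxModelLen(backend, model int64) (int64, bool) {
	if backend > 0 {
		return backend, true
	}
	if model > 0 { // stricter check: reject zero or negative model values
		return model, true
	}
	return 0, false // no usable context size configured
}

func main() {
	v, ok := effectiveMaxModelLen(0, 4096)
	fmt.Println(v, ok) // falls back to the model value
}
```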
Contributor
Hey - I've left some high-level feedback:
- The compose CLI tests for context-size duplicate the logic from RunE (including the int32 range check), which may get out of sync over time; consider extracting that validation into a small helper function used by both the command and the tests.
- In the compose up code, BackendConfiguration is always created even when the context-size flag is not set; if you want to clearly distinguish 'no runtime override' cases, you might consider only instantiating or sending a non-empty BackendConfiguration when at least one override (e.g., ContextSize or Speculative) is present.
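The extraction suggested in the first comment could look like the following sketch. The helper name `validateContextSize` and its signature are assumptions for illustration; the point is that the positivity and int32 range checks live in one function that both RunE and the tests can call.

```go
package main

import (
	"fmt"
	"math"
)

// validateContextSize is a hypothetical shared helper of the kind the review
// suggests: it centralizes the positivity and int32 range checks so the
// compose command's RunE and its tests cannot drift apart.
func validateContextSize(v int64) (int32, error) {
	if v <= 0 {
		return 0, fmt.Errorf("context size must be positive, got %d", v)
	}
	if v > math.MaxInt32 {
		return 0, fmt.Errorf("context size %d exceeds int32 range", v)
	}
	return int32(v), nil
}

func main() {
	n, err := validateContextSize(4096)
	fmt.Println(n, err)
}
```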
krissetto
reviewed
Apr 8, 2026
…ze over model configuration in mlx; also added goleak detector in mlx package
krissetto
approved these changes
Apr 8, 2026
doringeman
approved these changes
Apr 8, 2026
This pull request updates the logic to ensure that the backend configuration for context size (or equivalent parameter) always takes precedence over the model configuration. This change standardizes runtime configuration behavior and updates both the implementation and associated tests to reflect this new precedence order.
Backend context size precedence updates:
Changed the logic in the llamacpp, sglang, vllm, and mlx backend config files so that the backend config value for context size/max tokens is used if present, falling back to the model config only if the backend config is unset. This affects functions such as GetContextSize, GetContextLength, and GetMaxModelLen. Updated function documentation in the above files to clarify that backend config takes precedence over model config for context size parameters.
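The precedence described above can be sketched like this. The struct shapes and field names here are assumptions for illustration; only the getter name `GetContextSize` comes from the PR description.

```go
package main

import "fmt"

// Hypothetical, simplified config shapes; the PR's real types differ, but the
// precedence rule is the same.
type BackendConfig struct{ ContextSize *uint64 }
type ModelConfig struct{ ContextSize *uint64 }

// GetContextSize applies the new precedence: a set backend value wins, the
// model value is the fallback, and nil means "let the backend decide".
func GetContextSize(backend *BackendConfig, model *ModelConfig) *uint64 {
	if backend != nil && backend.ContextSize != nil {
		return backend.ContextSize
	}
	if model != nil && model.ContextSize != nil {
		return model.ContextSize
	}
	return nil
}

func main() {
	n := uint64(8192)
	got := GetContextSize(&BackendConfig{ContextSize: &n}, &ModelConfig{})
	fmt.Println(*got) // prints 8192
}
```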
Test updates for new precedence:
Updated llamacpp_config_test.go, sglang_config_test.go, and vllm_config_test.go to verify that backend config takes precedence over model config. Added new test cases to ensure fallback to model config when backend config is not set.
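The test cases described above amount to a small table of precedence scenarios. A hedged sketch, where `pick` is a hypothetical stand-in for the backend-specific getters under test:

```go
package main

import "fmt"

// pick is a hypothetical stand-in for getters like GetContextSize: backend
// value wins when set, model value is the fallback, nil means unset.
func pick(backend, model *int64) *int64 {
	if backend != nil {
		return backend
	}
	return model
}

func main() {
	i64 := func(v int64) *int64 { return &v }
	cases := []struct {
		name           string
		backend, model *int64
		want           *int64
	}{
		{"backend overrides model", i64(8192), i64(4096), i64(8192)},
		{"fallback to model", nil, i64(4096), i64(4096)},
		{"neither set", nil, nil, nil},
	}
	for _, c := range cases {
		got := pick(c.backend, c.model)
		ok := (got == nil && c.want == nil) ||
			(got != nil && c.want != nil && *got == *c.want)
		fmt.Println(c.name, ok)
	}
}
```

In the real suites these cases would live in table-driven tests run via t.Run, one table per backend getter.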