
fix: update context size precedence to prioritize backend configuration over model configuration#847

Merged
ilopezluna merged 4 commits into main from fix/context-size-priority on Apr 8, 2026

Conversation

@ilopezluna (Contributor)

This pull request updates the logic to ensure that the backend configuration for context size (or equivalent parameter) always takes precedence over the model configuration. This change standardizes runtime configuration behavior and updates both the implementation and associated tests to reflect this new precedence order.

Backend context size precedence updates:

  • Changed the logic in llamacpp, sglang, vllm, and mlx backend config files so that the backend config value for context size/max tokens is used if present, falling back to the model config only if the backend config is unset. This affects functions such as GetContextSize, GetContextLength, and GetMaxModelLen.

  • Updated function documentation in the above files to clarify that backend config takes precedence over model config for context size parameters.
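The precedence rule described above can be sketched roughly as follows. This is a minimal illustration with hypothetical `BackendConfig`/`ModelConfig` types, not the repository's actual structs; each backend (llamacpp, sglang, vllm, mlx) has its own getter name, but the fallback logic is the same:

```go
package main

import "fmt"

// Hypothetical config types; the real backends each define their own,
// but the precedence rule is identical.
type BackendConfig struct {
	ContextSize *int64 // runtime override; nil when unset
}

type ModelConfig struct {
	ContextSize *int64 // value packaged with the model; nil when unset
}

// GetContextSize returns the effective context size: the backend config
// takes precedence over the model config, which is used only as a fallback.
func GetContextSize(backend BackendConfig, model ModelConfig) *int64 {
	if backend.ContextSize != nil {
		return backend.ContextSize
	}
	return model.ContextSize
}

func main() {
	b, m := int64(8192), int64(4096)
	// Backend value wins when both are set.
	fmt.Println(*GetContextSize(BackendConfig{ContextSize: &b}, ModelConfig{ContextSize: &m})) // 8192
	// Falls back to the model config when the backend value is unset.
	fmt.Println(*GetContextSize(BackendConfig{}, ModelConfig{ContextSize: &m})) // 4096
}
```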

Test updates for new precedence:

  • Modified and added tests in llamacpp_config_test.go, sglang_config_test.go, and vllm_config_test.go to verify that backend config takes precedence over model config. Added new test cases to ensure fallback to model config when backend config is not set.

@gemini-code-assist (bot) left a comment


Code Review

This pull request updates context size precedence across backends to prioritize runtime configuration over packaging defaults. While the logic is correctly implemented in most areas, the MLX backend remains incomplete as GetMaxTokens still returns nil. Additionally, the vLLM implementation requires a stricter validation check to ensure the model's context size is positive. The test suites have been appropriately updated to reflect these changes, providing good verification for the new precedence rules.
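The stricter validation the review asks for might look like the following sketch. The types and field names here are hypothetical (the actual vLLM getter is `GetMaxModelLen`); the point is that a model-config fallback value should only be honored when it is actually positive:

```go
package main

import "fmt"

// Hypothetical config types illustrating the review's suggestion.
type BackendConfig struct{ MaxModelLen *int64 }
type ModelConfig struct{ ContextSize *int64 }

// GetMaxModelLen prefers the backend override; the model value is used
// only when it is set AND positive, so a zero or negative packaged value
// never silently becomes the runtime limit.
func GetMaxModelLen(backend BackendConfig, model ModelConfig) *int64 {
	if backend.MaxModelLen != nil {
		return backend.MaxModelLen
	}
	if model.ContextSize != nil && *model.ContextSize > 0 {
		return model.ContextSize
	}
	return nil
}

func main() {
	zero := int64(0)
	// A zero model value is rejected rather than propagated.
	fmt.Println(GetMaxModelLen(BackendConfig{}, ModelConfig{ContextSize: &zero})) // <nil>
}
```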

@ilopezluna ilopezluna requested a review from a team April 8, 2026 10:56
@ilopezluna ilopezluna marked this pull request as ready for review April 8, 2026 11:12
@sourcery-ai (bot) left a comment


Hey - I've left some high-level feedback:

  • The compose CLI tests for context-size duplicate logic from RunE (including the int32 range check), which may get out of sync over time; consider extracting that validation into a small helper function used by both the command and the tests.
  • In the compose up code, BackendConfiguration is always created even when the context-size flag is not set; if you want to clearly distinguish 'no runtime override' cases, you might consider only instantiating or sending a non-empty BackendConfiguration when at least one override (e.g., ContextSize or Speculative) is present.


…ze over model configuration in mlx also added goleak detector in mlx package
@ilopezluna ilopezluna requested a review from krissetto April 8, 2026 11:25
@ilopezluna ilopezluna merged commit b48efb3 into main Apr 8, 2026
16 of 17 checks passed
@ilopezluna ilopezluna deleted the fix/context-size-priority branch April 8, 2026 13:14