fix(BA-5967): pre-validate slot types in sokovan validator chain#11515
Open
fix(BA-5967): pre-validate slot types in sokovan validator chain#11515
Conversation
fregataa
added a commit
that referenced
this pull request
May 8, 2026
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR hardens the Sokovan SessionSpec validator chain by pre-validating slot keys against the cluster’s configured/registered known_slot_types, preventing a 4xx validation failure from being converted into a 500 when downstream rules attempt to humanize unknown slot types.
Changes:
- Added
ImageSlotTypeRuleandRequestedSlotTypeRuleto reject unknown slot keys early (or no-op whenknown_slot_typesis empty). - Wired the new rules into the scheduling controller’s validator chain ahead of
ResourceLimitRule. - Added unit tests covering pass/reject/skip behavior for both rules.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
src/ai/backend/manager/sokovan/scheduling_controller/validators/image_slot_type_rule.py |
New rule validating image-declared slot keys against known_slot_types. |
src/ai/backend/manager/sokovan/scheduling_controller/validators/requested_slot_type_rule.py |
New rule validating user-requested slot keys against known_slot_types. |
src/ai/backend/manager/sokovan/scheduling_controller/validators/__init__.py |
Re-exported new validator rules. |
src/ai/backend/manager/sokovan/scheduling_controller/scheduling_controller.py |
Inserted the new rules into the validator chain before ResourceLimitRule. |
tests/unit/manager/sokovan/scheduling_controller/validators/test_session_spec_rules.py |
Added test coverage for the two new validator rules. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
c14d730 to
78122d4
Compare
78122d4 to
0ae6197
Compare
jopemachine
reviewed
May 8, 2026
seedspirit
reviewed
May 8, 2026
410ebf8 to
4d766c6
Compare
Add two SessionSpec validator rules so slot keys absent from the target resource group's agent inventory are caught with a clear 4xx before they reach ResourceLimitRule, where the downstream humanizer would otherwise escalate the missing key into a 500. ImageSlotTypeRule rejects images that declare a resource_spec slot the target resource group has no agent for. RequestedSlotTypeRule rejects caller-supplied resource entries with the same property. Both rules also reject when the RG has no non-terminated agents at all. The per-RG slot inventory is sourced from the same ScalingGroupRow fetch via selectinload over agents -> agent_resource_rows, with TERMINATED agents filtered out in Python. The new bundle dataclass _ScalingGroupWithAgentResources keeps the SG row and its filtered agent list together so the inventory snapshot stays consistent with the same readonly transaction. The fetch helper raises ScalingGroupNotFound on missing RG, so the caller no longer needs a separate None check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4d766c6 to
9810ac4
Compare
b8aee32 to
9810ac4
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ImageSlotTypeRuleandRequestedSlotTypeRuleto the Sokovan SessionSpec validator chain so slot keys absent from the cluster'sknown_slot_typesare caught with a clear 4xx before reachingResourceLimitRule.ResourceLimitRulecallsResourceSlot.to_humanized()to format an error message, and a missing slot key turns the original 4xx into a 500 (the bug reported in BA-5967).known_slot_typesis empty, matching the existing raw-fallback path thatResourceLimitRule._humanize_slotsuses on freshly bootstrapped clusters.Test plan
pants test tests/unit/manager/sokovan/scheduling_controller/validators/test_session_spec_rules.py— newTestImageSlotTypeRuleandTestRequestedSlotTypeRulecover known-slot pass, mismatched-slot rejection, and empty-known-slot-types skip.pants fmt/fix/lint/checkon changed files.🤖 Generated with Claude Code