docs: revise BEP-1000/1016 with PEP-817-style variant properties #11404
Open
Extend both BEPs to incorporate PEP 817 wheel variant concepts for accelerator device capability matching in the scheduler:

BEP-1000:
- Add VariantProperty 3-tuple (namespace, feature_name, feature_value)
- Add VariantFeatureDescriptor with match modes (EXACT, MINIMUM, COMPATIBLE)
- Add provider-independent matching algorithm with code examples
- Add DB schema for agent_device_variant_properties and namespace descriptors
- Clarify integration with BEP-1047 resource slot normalization tables

BEP-1016:
- Add get_variant_namespace_descriptors() to plugin API
- Add Agent→Manager variant property reporting via heartbeat
- Add variant-aware agent selection pass in Sokovan scheduler
- Add variant mismatch error message design with user-friendly examples
- Add resource group variant-filtered availability API (REST + GraphQL)
- Add session creation with variant_requirements
- Add variant property discovery API
…in BEP-1016

Add a new section to BEP-1016 proposing consolidation of the redundant module-level PREFIX constant and class-level key attribute into a single variant_namespace() abstract classmethod on AbstractComputePlugin.

- Survey all 13+ existing plugins showing PREFIX/key duplication
- Define variant_namespace() as single source of truth for plugin identity
- Derive key property from variant_namespace() for backward compat
- Handle nvidia→cuda namespace-to-device-name mapping
- Show migration examples for CUDA, mock (dynamic), and transition shim
- Add corresponding problem statement in BEP-1000
Pull request overview
This PR updates the BEP documentation for accelerator metadata (BEP-1000) and the accelerator plugin interface (BEP-1016) by adopting PEP-817-style “variant properties” and describing how agents, the manager DB, and the Sokovan scheduler would use them for capability-aware placement.
Changes:
- Add PEP-817-style `(namespace, feature_name, feature_value)` variant properties and namespace descriptors to BEP-1000, including a provider-independent matching algorithm and DB schema.
- Extend BEP-1016 with a `variant_namespace()` concept, agent→manager heartbeat reporting of per-device variant properties/descriptors, and a variant-aware selection pass in Sokovan.
- Add cross-links (“Related BEPs”) and additional API surface sketches (REST/GraphQL) around variant discovery and availability.
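
To make the matching idea concrete, here is a minimal sketch of how a provider-independent check over `(namespace, feature_name, feature_value)` tuples might look. It is not the code from BEP-1000: the `MatchMode` enum, the `device_satisfies` helper, the `modes` lookup table, and the feature names `compute_capability` and `cuda_version` are all assumptions made for illustration, and the semantics given to each match mode are one plausible reading of EXACT, MINIMUM, and COMPATIBLE.

```python
from collections import defaultdict
from enum import Enum
from typing import NamedTuple


class MatchMode(Enum):
    EXACT = "exact"            # required value must appear verbatim
    MINIMUM = "minimum"        # device value must be >= required value
    COMPATIBLE = "compatible"  # required value is a dotted prefix of the device value


class VariantProperty(NamedTuple):
    namespace: str
    feature_name: str
    feature_value: str


def _version_key(value: str) -> tuple[int, ...]:
    # Assumes purely numeric dotted values such as "8.9".
    return tuple(int(part) for part in value.split("."))


def _value_matches(mode: MatchMode, required: str, device: str) -> bool:
    if mode is MatchMode.EXACT:
        return required == device
    if mode is MatchMode.MINIMUM:
        return _version_key(device) >= _version_key(required)
    # COMPATIBLE: compare whole dotted components, not raw characters.
    req_parts, dev_parts = required.split("."), device.split(".")
    return dev_parts[: len(req_parts)] == req_parts


def device_satisfies(
    required: list[VariantProperty],
    device_props: list[VariantProperty],
    modes: dict[tuple[str, str], MatchMode],
) -> bool:
    """Return True if every required property is met by some device value."""
    values: dict[tuple[str, str], set[str]] = defaultdict(set)
    for prop in device_props:
        values[(prop.namespace, prop.feature_name)].add(prop.feature_value)
    for req in required:
        key = (req.namespace, req.feature_name)
        mode = modes.get(key, MatchMode.EXACT)  # unknown features fall back to EXACT
        candidates = values.get(key, ())
        if not any(_value_matches(mode, req.feature_value, dv) for dv in candidates):
            return False
    return True


# Hypothetical feature names, used only to exercise the sketch.
modes = {
    ("nvidia", "compute_capability"): MatchMode.MINIMUM,
    ("nvidia", "cuda_version"): MatchMode.COMPATIBLE,
}
device = [
    VariantProperty("nvidia", "compute_capability", "8.9"),
    VariantProperty("nvidia", "cuda_version", "12.4"),
]
```

The matcher itself never inspects the namespace beyond using it as a lookup key, which is what keeps it provider-independent: a new vendor only contributes entries to `modes` and to the device property list.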
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| proposals/BEP-1016-accelerator-interface-v2.md | Documents new plugin identity mechanism (variant_namespace), reporting pipeline (heartbeat → DB), scheduler pass, and API sketches for variant-aware scheduling and discovery. |
| proposals/BEP-1000-redefining-accelerator-metadata.md | Defines variant-property types, example descriptors, matching algorithm, and storage schema to enable scheduler-side provider-independent capability matching. |
Comment on lines +448 to +450

    # Prefix-based compatibility (e.g., "12" matches "12.4", "12.8")
    if not any(
        any(dv.startswith(rv) or rv.startswith(dv) for dv in device_values)
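
The prefix rule quoted above compares raw characters, so a required value of "1" would also be accepted for a device value of "12.4". The sketch below contrasts that behaviour with a component-wise comparison. `raw_prefix_match` restates the quoted condition for a single pair of values; `is_prefix_compatible` is a hypothetical alternative that is not taken from the BEP and assumes dot-separated values.

```python
def raw_prefix_match(required: str, device: str) -> bool:
    # Character-level rule from the quoted excerpt, for one pair of values.
    return device.startswith(required) or required.startswith(device)


def is_prefix_compatible(required: str, device: str) -> bool:
    # Hypothetical alternative: one value must be a prefix of the other
    # once both are split into dotted components.
    req_parts = required.split(".")
    dev_parts = device.split(".")
    shared = min(len(req_parts), len(dev_parts))
    return req_parts[:shared] == dev_parts[:shared]


print(raw_prefix_match("1", "12.4"))       # character prefix matches
print(is_prefix_compatible("1", "12.4"))   # component "1" != "12"
print(is_prefix_compatible("12", "12.4"))  # component prefix matches
```

The component-wise version keeps the symmetry of the original rule (either side may be the shorter one) while refusing matches that only line up at the character level.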
Comment on lines +461 to +463

    # Single-value: the required value must match
    if not req_values & device_values:
        return False
Comment on lines +468 to +473

    def _version_key(v: str) -> tuple[int, ...]:
        """Parse a version-like string into a comparable tuple."""
        try:
            return tuple(int(x) for x in v.split("."))
        except ValueError:
            return (0,)
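
Two properties of the quoted helper are worth seeing side by side: integer tuples order versions numerically where plain string comparison does not, and any value with a non-numeric component silently collapses to `(0,)`. The sketch below reproduces the helper under the name `version_key` and adds `parse_version_key`, a hypothetical stricter variant (not from the BEP) that reports unparseable input to the caller instead.

```python
from typing import Optional


def version_key(v: str) -> tuple[int, ...]:
    # Same behaviour as the quoted helper: non-numeric input becomes (0,).
    try:
        return tuple(int(x) for x in v.split("."))
    except ValueError:
        return (0,)


def parse_version_key(v: str) -> Optional[tuple[int, ...]]:
    # Hypothetical stricter variant: return None so the caller can decide
    # how to treat values that are not purely numeric.
    try:
        return tuple(int(x) for x in v.split("."))
    except ValueError:
        return None


# Numeric ordering versus lexicographic ordering of the same two values.
print(version_key("12.10") > version_key("12.4"))
print("12.10" > "12.4")
```

With the `(0,)` fallback, a device reporting a value such as "12.4a" would compare below every numeric requirement, so a MINIMUM check against it fails without any indication that parsing was the cause.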
Comment on lines +217 to +228

backward-compat shim can be provided during the transition period:

```python
# ai.backend.accelerator.cuda_open.plugin (transition period)
# -----------------------------------------------------------
class CUDAPlugin(AbstractComputePlugin):
    @classmethod
    def variant_namespace(cls) -> str:
        return "nvidia"


# Backward compat: external code that does `from ... import PREFIX`
PREFIX = CUDAPlugin.variant_namespace()
```
Comment on lines +125 to +129

    @property
    def key(self) -> DeviceName:
        """DeviceName key derived from variant_namespace (backward compatible)."""
        return DeviceName(self.variant_namespace())
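
As quoted, `key` returns the namespace verbatim, so a plugin whose namespace is "nvidia" would expose the key "nvidia", while the PR description also calls for an nvidia→cuda namespace-to-device-name mapping. One way the two could be reconciled is sketched below. This is an assumption, not the design in BEP-1016: `PluginBase` stands in for `AbstractComputePlugin`, `DeviceName` is modelled as a plain `NewType`, and the `device_name()` hook is hypothetical.

```python
from abc import ABC, abstractmethod
from typing import NewType

DeviceName = NewType("DeviceName", str)  # stand-in for the real type


class PluginBase(ABC):  # hypothetical stand-in for AbstractComputePlugin
    @classmethod
    @abstractmethod
    def variant_namespace(cls) -> str:
        """Single source of truth for the plugin's identity."""

    @classmethod
    def device_name(cls) -> str:
        # Hypothetical hook: defaults to the namespace; override it when the
        # legacy device name differs from the variant namespace.
        return cls.variant_namespace()

    @property
    def key(self) -> DeviceName:
        return DeviceName(self.device_name())


class CUDAPlugin(PluginBase):
    @classmethod
    def variant_namespace(cls) -> str:
        return "nvidia"

    @classmethod
    def device_name(cls) -> str:
        return "cuda"  # keeps slot names such as "cuda.device" stable


plugin = CUDAPlugin()
print(plugin.key, CUDAPlugin.variant_namespace())
```

Plugins whose namespace and device name already agree would not need to override `device_name()`, so the mapping stays an opt-in exception.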
Comment on lines +895 to +905

    @strawberry.type
    class VariantMatchedDevices:
        total_devices: int
        matched_devices: int


    @strawberry.type
    class VariantAwareAvailabilityNode:
        available_slots: JSONString
        variant_matched_devices: list[VariantMatchedDevices]

    # ai.backend.common.accelerator
    # ------------------------------
    from collections import defaultdict
    from packaging.version import Version
Comment on lines +119 to +120

    - The resource slot prefix (e.g., "cuda" → "cuda.device", "cuda.shares")

    )

    if not compatible:
        raise NoCompatibleAgentError(
Comment on lines +605 to +613

    for device_id, props in devices.items():
        if check_variant_compatibility(
            required=[
                VariantProperty(r.namespace, r.feature_name, r.feature_value)
                for r in variant_reqs
            ],
            device_props=props,
            namespace_descriptors=agent_ns_descs,
        ):
Summary
- `(namespace, feature_name, feature_value)` format to enable provider-independent matching logic

BEP-1000 Changes
- `VariantProperty`, `VariantFeatureDescriptor`, `VariantNamespaceDescriptor` types with code examples
- `agent_device_variant_properties`, `agent_variant_namespace_descriptors`
- `resource_slot_types` / `agent_resources`

BEP-1016 Changes
- `variant_namespace()` abstract classmethod: consolidate the redundant module-level `PREFIX` constant and class-level `key` attribute into a single source of truth
- Derive `key` property from `variant_namespace()` for backward compatibility
- Namespace-to-device-name mapping (`nvidia`→`cuda`)
- Add `get_variant_namespace_descriptors()` to plugin API
- `variant_requirements` parameter

Test plan