docs: revise BEP-1000/1016 with PEP-817-style variant properties #11404

Open
achimnol wants to merge 2 commits into main from docs/bep-1000-1016-revision
@achimnol achimnol commented Apr 29, 2026

Summary

  • Apply PEP 817 (Wheel Variant Support) concepts to BEP-1000 and BEP-1016, designing a variant-property-based matching system for accelerator devices
  • Adopt PEP 817's 3-tuple (namespace, feature_name, feature_value) format to enable provider-independent matching logic
  • Define integration points with BEP-1047 (Resource Slot DB Normalization)

BEP-1000 Changes

  • Define VariantProperty, VariantFeatureDescriptor, VariantNamespaceDescriptor types with code examples
  • Three match modes: EXACT, MINIMUM (minimum version), COMPATIBLE (prefix-based)
  • Concrete variant property declaration examples for NVIDIA CUDA and AMD ROCm plugins
  • Provider-independent matching algorithm (conjunctive AND + disjunctive OR)
  • DB schema: agent_device_variant_properties, agent_variant_namespace_descriptors
  • Clarify complementary relationship with BEP-1047's resource_slot_types/agent_resources
  • Document the PREFIX/key duplication problem in the Problems section
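The three match modes and the AND/OR rule listed above could be sketched roughly as follows. This is a minimal illustration, not the code from BEP-1000: `MatchMode`, `value_matches`, `device_matches`, and the `modes` lookup table are hypothetical names, the feature names in the usage note are made up, and the grouping rule (OR across the values requested for one feature, AND across features) is only one reading of "conjunctive AND + disjunctive OR".

```python
from collections import defaultdict
from enum import Enum
from typing import NamedTuple


class VariantProperty(NamedTuple):
    """PEP-817-style 3-tuple (field names taken from the PR summary)."""
    namespace: str
    feature_name: str
    feature_value: str


class MatchMode(Enum):  # hypothetical enum for the three modes
    EXACT = "exact"
    MINIMUM = "minimum"
    COMPATIBLE = "compatible"


FeatureKey = tuple[str, str]  # (namespace, feature_name)


def _numeric(value: str) -> tuple[int, ...]:
    # Assumes purely numeric dotted versions such as "12.4".
    return tuple(int(part) for part in value.split("."))


def value_matches(mode: MatchMode, required: str, offered: str) -> bool:
    if mode is MatchMode.EXACT:
        return required == offered
    if mode is MatchMode.MINIMUM:
        # Assumption: the device's value must be at least the required one.
        return _numeric(offered) >= _numeric(required)
    # COMPATIBLE: compare whole dotted segments, so "12" matches "12.4"
    # while "1" does not.
    req_parts = required.split(".")
    return offered.split(".")[: len(req_parts)] == req_parts


def _group(props) -> dict[FeatureKey, set[str]]:
    grouped: dict[FeatureKey, set[str]] = defaultdict(set)
    for prop in props:
        grouped[(prop.namespace, prop.feature_name)].add(prop.feature_value)
    return grouped


def device_matches(required, device_props, modes) -> bool:
    """AND across required features, OR across the values of each feature."""
    offered = _group(device_props)
    for feature, wanted in _group(required).items():
        mode = modes.get(feature, MatchMode.EXACT)
        if not any(
            value_matches(mode, rv, ov)
            for rv in wanted
            for ov in offered.get(feature, ())
        ):
            return False
    return True
```

With a device reporting `("nvidia", "cuda_version", "12.4")` and `MINIMUM` mode configured for that feature, a requirement of `"12.0"` would match and `"12.10"` would not. Nothing in the function refers to a specific vendor, which is the sense in which such matching is provider-independent.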

BEP-1016 Changes

  • Introduce variant_namespace() abstract classmethod: consolidate the redundant module-level PREFIX constant and class-level key attribute into a single source of truth
    • Survey table covering 13+ plugins with their current PREFIX/key values
    • Derive key property from variant_namespace() for backward compatibility
    • Handle namespace-to-device-name mapping (e.g., nvidia → cuda)
    • Migration examples for CUDA, mock (dynamic key), and transition shim
  • Add get_variant_namespace_descriptors() to plugin API
  • Agent→Manager variant property reporting via heartbeat extension
  • Insert variant compatibility pass into Sokovan scheduler (between existing architecture and resource passes)
  • User-friendly error messages when resource slots are available but no device matches variant requirements
  • Resource group variant-filtered availability API (REST + GraphQL)
  • Session creation with variant_requirements parameter
  • Variant property discovery API for clients
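The `variant_namespace()` consolidation described in the first bullet could look roughly like the sketch below. It is an assumption-laden illustration rather than the BEP-1016 text: `PluginBase` is a simplified stand-in for `AbstractComputePlugin`, `DeviceName` is re-declared locally, and the `device_name_overrides` table is a hypothetical way to express the nvidia → cuda mapping.

```python
from abc import ABC, abstractmethod
from typing import ClassVar, NewType

DeviceName = NewType("DeviceName", str)  # local stand-in for the real type


class PluginBase(ABC):
    """Simplified stand-in for AbstractComputePlugin (illustration only)."""

    # Hypothetical table: namespaces whose legacy device name differs.
    device_name_overrides: ClassVar[dict[str, str]] = {"nvidia": "cuda"}

    @classmethod
    @abstractmethod
    def variant_namespace(cls) -> str:
        """Single source of truth for the plugin's identity."""

    @property
    def key(self) -> DeviceName:
        # Derived instead of declared, so it cannot drift from the namespace.
        namespace = self.variant_namespace()
        return DeviceName(self.device_name_overrides.get(namespace, namespace))


class CUDAPlugin(PluginBase):
    @classmethod
    def variant_namespace(cls) -> str:
        return "nvidia"


# Module-level shim for code that still does `from ... import PREFIX`.
PREFIX = CUDAPlugin().key
```

Because `variant_namespace()` is abstract, a plugin class that forgets to define it cannot be instantiated, which is what would let the separate `PREFIX` constant and `key` attribute be retired.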

Test plan

  • Review BEP document content (code example consistency, type coherence)
  • Verify alignment with PEP 817 specification
  • Review BEP-1047 integration points
  • Validate backward compatibility of variant_namespace() migration path
  • Review Sokovan scheduler extension design against existing codebase

Extend both BEPs to incorporate PEP 817 wheel variant concepts for
accelerator device capability matching in the scheduler:

BEP-1000:
- Add VariantProperty 3-tuple (namespace, feature_name, feature_value)
- Add VariantFeatureDescriptor with match modes (EXACT, MINIMUM, COMPATIBLE)
- Add provider-independent matching algorithm with code examples
- Add DB schema for agent_device_variant_properties and namespace descriptors
- Clarify integration with BEP-1047 resource slot normalization tables

BEP-1016:
- Add get_variant_namespace_descriptors() to plugin API
- Add Agent→Manager variant property reporting via heartbeat
- Add variant-aware agent selection pass in Sokovan scheduler
- Add variant mismatch error message design with user-friendly examples
- Add resource group variant-filtered availability API (REST + GraphQL)
- Add session creation with variant_requirements
- Add variant property discovery API
@github-actions github-actions Bot added the size:XL 500~ LoC label Apr 29, 2026
…in BEP-1016

Add a new section to BEP-1016 proposing consolidation of the redundant
module-level PREFIX constant and class-level key attribute into a single
variant_namespace() abstract classmethod on AbstractComputePlugin.

- Survey all 13+ existing plugins showing PREFIX/key duplication
- Define variant_namespace() as single source of truth for plugin identity
- Derive key property from variant_namespace() for backward compat
- Handle nvidia→cuda namespace-to-device-name mapping
- Show migration examples for CUDA, mock (dynamic), and transition shim
- Add corresponding problem statement in BEP-1000
@achimnol achimnol added the skip:changelog Make the action workflow to skip towncrier check label Apr 29, 2026
@achimnol achimnol marked this pull request as ready for review May 2, 2026 07:27
Copilot AI review requested due to automatic review settings May 2, 2026 07:27
Contributor

Copilot AI left a comment

Pull request overview

This PR updates the BEP documentation for accelerator metadata (BEP-1000) and the accelerator plugin interface (BEP-1016) by adopting PEP-817-style “variant properties” and describing how agents, the manager DB, and the Sokovan scheduler would use them for capability-aware placement.

Changes:

  • Add PEP-817-style (namespace, feature_name, feature_value) variant properties and namespace descriptors to BEP-1000, including a provider-independent matching algorithm and DB schema.
  • Extend BEP-1016 with a variant_namespace() concept, agent→manager heartbeat reporting of per-device variant properties/descriptors, and a variant-aware selection pass in Sokovan.
  • Add cross-links (“Related BEPs”) and additional API surface sketches (REST/GraphQL) around variant discovery and availability.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| proposals/BEP-1016-accelerator-interface-v2.md | Documents new plugin identity mechanism (variant_namespace), reporting pipeline (heartbeat → DB), scheduler pass, and API sketches for variant-aware scheduling and discovery. |
| proposals/BEP-1000-redefining-accelerator-metadata.md | Defines variant-property types, example descriptors, matching algorithm, and storage schema to enable scheduler-side provider-independent capability matching. |

Comment on lines +448 to +450
# Prefix-based compatibility (e.g., "12" matches "12.4", "12.8")
if not any(
    any(dv.startswith(rv) or rv.startswith(dv) for dv in device_values)
Comment on lines +461 to +463
# Single-value: the required value must match
if not req_values & device_values:
    return False
Comment on lines +468 to +473
def _version_key(v: str) -> tuple[int, ...]:
    """Parse a version-like string into a comparable tuple."""
    try:
        return tuple(int(x) for x in v.split("."))
    except ValueError:
        return (0,)
Comment on lines +217 to +228
backward-compat shim can be provided during the transition period:

```python
# ai.backend.accelerator.cuda_open.plugin (transition period)
# -----------------------------------------------------------
class CUDAPlugin(AbstractComputePlugin):
    @classmethod
    def variant_namespace(cls) -> str:
        return "nvidia"

# Backward compat: external code that does `from ... import PREFIX`
PREFIX = CUDAPlugin.variant_namespace()
```
Comment on lines +125 to +129
@property
def key(self) -> DeviceName:
    """DeviceName key derived from variant_namespace (backward compatible)."""
    return DeviceName(self.variant_namespace())

Comment on lines +895 to +905
@strawberry.type
class VariantMatchedDevices:
    total_devices: int
    matched_devices: int


@strawberry.type
class VariantAwareAvailabilityNode:
    available_slots: JSONString
    variant_matched_devices: list[VariantMatchedDevices]

# ai.backend.common.accelerator
# ------------------------------
from collections import defaultdict
from packaging.version import Version
Comment on lines +119 to +120
- The resource slot prefix (e.g., "cuda" → "cuda.device", "cuda.shares")

)

if not compatible:
    raise NoCompatibleAgentError(
Comment on lines +605 to +613
for device_id, props in devices.items():
    if check_variant_compatibility(
        required=[
            VariantProperty(r.namespace, r.feature_name, r.feature_value)
            for r in variant_reqs
        ],
        device_props=props,
        namespace_descriptors=agent_ns_descs,
    ):