KEP-666: Gang Scheduling in LWS #844

Open
yankay wants to merge 2 commits into kubernetes-sigs:main from yankay:kep/666-gang-scheduling-in-lws

Member

@yankay yankay commented May 3, 2026

What type of PR is this?

/kind feature
/kind documentation

What this PR does / why we need it

Initial draft of KEP-666: Gang Scheduling in LWS, integrating the upstream Workload and PodGroup APIs (alpha, kubernetes/enhancements#5558, kubernetes/enhancements#5832) as a gang-scheduling provider for LeaderWorkerSet, alongside the existing third-party path in KEP-407.

Key design points
  • One PodGroup per replica. An LWS with replicas: N, size: M produces N PodGroups, each with MinCount: M. A single shared PodGroup with Replicas=N, MinCount=M cannot express "every replica is complete" under the alpha API, so it would not actually prevent partial-replica scheduling.
  • Lifecycle invariants (full details in the KEP): PodGroup <lws-name>-<i> is created before any pod with group-index=i (covering first creation, scale-up, and maxSurge bursts); LWS owns the Workload and steady-state PodGroups; the Workload podGroupTemplate is created once and never mutated; PodGroups are keyed by group-index and reused across revisions, with minCount updated in place when Size changes.
  • LWS-managed lifecycle only for alpha. A user-managed mode (gang.podGroupNamePrefix) was considered and dropped — see Alternatives. It can be revisited once a concrete external consumer needs it.

See also "Limitations of the alpha PodGroup API" in the KEP for why a single MinCount scalar cannot express the per-role availability needed by KEP-766 DisaggregatedSet.
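
The one-PodGroup-per-replica layout from the key design points can be sketched as follows. This is a minimal illustration, not the controller code: `podGroupSpec` and `desiredPodGroups` are hypothetical names, and the struct only stands in for the two fields of the upstream PodGroup that matter here (name and `MinCount`).

```go
package main

import "fmt"

// podGroupSpec is a hypothetical, simplified stand-in for the fields of a
// scheduling.k8s.io/v1alpha2 PodGroup that matter here; it is not the real type.
type podGroupSpec struct {
	Name     string
	MinCount int32
}

// desiredPodGroups returns one PodGroup per replica, named <lws-name>-<i>,
// each requiring all `size` pods of that replica to be co-scheduled.
func desiredPodGroups(lwsName string, replicas, size int32) []podGroupSpec {
	groups := make([]podGroupSpec, 0, replicas)
	for i := int32(0); i < replicas; i++ {
		groups = append(groups, podGroupSpec{
			Name:     fmt.Sprintf("%s-%d", lwsName, i),
			MinCount: size,
		})
	}
	return groups
}

func main() {
	// An LWS with replicas: 3, size: 4 yields three PodGroups with MinCount 4.
	for _, pg := range desiredPodGroups("leaderworkerset-sample", 3, 4) {
		fmt.Printf("%s minCount=%d\n", pg.Name, pg.MinCount)
	}
}
```

Because the name depends only on the group index, the same PodGroup can be reused across revisions, which matches the lifecycle invariant described in the list.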

Related work

Parallel efforts on the same upstream APIs (KEP-4671 + KEP-5832) in sibling projects, listed for reviewer context:

  • kubernetes-sigs/jobset#1068 — JobSet's GangConfig KEP (currently on the older alpha1 API; under design discussion).
  • kubernetes/enhancements#5548 — Job-side Workload integration KEP (deferred until upstream Workload API ships in v1.36+).
  • kubernetes-sigs/kueue — no direct KEP-4671 integration PR yet; the Escape Hatch (pre-set pod.spec.schedulingGroup) leaves room for Kueue or any future external owner to drive Workload lifecycle with zero new LWS-side API surface.

LWS targets the newer scheduling.k8s.io/v1alpha2 decoupled API; reviewer questions raised on JobSet #1068 (owner refs, Workload lifecycle, defaulting/validation, feature-gate posture) are addressed in the corresponding KEP-666 sections.

Which issue(s) this PR fixes

Fixes #666

Special notes for your reviewer

  • Design-only PR (KEP markdown); implementation will follow once the design is agreed.
  • Co-authored with @Edwinhr716.

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. kind/documentation Categorizes issue or PR as related to documentation. labels May 3, 2026
@netlify

netlify Bot commented May 3, 2026

Deploy Preview for kubernetes-sigs-lws canceled.

Name Link
🔨 Latest commit 531a481
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-lws/deploys/6a0abb679c11040008527120

@k8s-ci-robot k8s-ci-robot requested review from ardaguclu and kerthcet May 3, 2026 13:47
@linux-foundation-easycla

linux-foundation-easycla Bot commented May 3, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: yankay / name: Kay Yan (4c6d883)

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 3, 2026
@k8s-ci-robot k8s-ci-robot requested a review from Edwinhr716 May 3, 2026 13:48
@yankay yankay force-pushed the kep/666-gang-scheduling-in-lws branch from fd3f47d to 558b085 Compare May 3, 2026 13:50
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: yankay
Once this PR has been reviewed and has the lgtm label, please ask for approval from edwinhr716. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yankay yankay force-pushed the kep/666-gang-scheduling-in-lws branch from 558b085 to 0b97fb7 Compare May 3, 2026 13:50
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels May 3, 2026
@yankay yankay force-pushed the kep/666-gang-scheduling-in-lws branch 10 times, most recently from 3cf2eae to dabbcd9 Compare May 3, 2026 17:00
@Edwinhr716
Contributor

I was working on a prototype https://github.com/Edwinhr716/lws/tree/was-poc, happy to collaborate here

kind: PodGroup
metadata:
  name: leaderworkerset-sample-pg-0
  ownerReferences:
Contributor

Thoughts on having the leader pod own the PodGroup instead? That way it gets cleaned up when LWS is autoscaled. It also simplifies the logic of managing the life cycle during maxSurge and rolling updates

Member Author

@yankay yankay May 6, 2026

Thanks @Edwinhr716. My one concern is forward compatibility: KEP-5832 §Risks and Mitigations plans a validating admission controller for Workload → PodGroup → Pod creation order, with UnschedulableAndUnresolvable kept only as a "last line of defense". If that lands and is on by default, pod-owned would break at admission instead of degrading.

That said, the cleanup simplification is real and you're closer to the implementation — happy to go with pod-owned, just want to note this as a known follow-up.

Contributor

Thanks for pointing that out, I wasn't aware that there was going to be a validating admission order.

Mmm that makes it trickier for the LWS controller to manage the lifecycle of PodGroups, and is something we need to think about in the design.

@yankay yankay force-pushed the kep/666-gang-scheduling-in-lws branch from dabbcd9 to 9fcb988 Compare May 6, 2026 07:33
@yankay
Member Author

yankay commented May 6, 2026

Pushed an update aligning the KEP with the original PoC Google Doc:

  • Naming. Workload <lws-name>, PodGroup <lws-name>-<group-index> (= leader pod name), inner template <lws-name>-pg-template — matches the PoC and KEP-407, and follows LWS's existing per-replica naming.
  • Escape hatch. Pre-set pod.spec.schedulingGroup opts out of LWS-managed lifecycle (Job controller pattern); replaces the rejected explicit podGroupNamePrefix knob.
  • SubGroup / minCount < size. Documented as future work via hierarchical PodGroups (KEP-6012), per the PoC.
  • Admission. Trimmed to the four functionally required rules (annotation immutability, Size immutability in LWS-managed mode, gang + LeaderReady, gang + exclusive-topology) plus a UX rejection when the v1alpha2 API resources are not registered.
  • Editorial. Removed the redundant Interaction with StartupPolicy section, merged the prerequisites risks, and tightened a few wordy passages.

@Edwinhr716
Contributor

Something we need to discuss further here is whether it makes sense to integrate with the PodGroup and Workload APIs now, or whether it makes more sense to wait for kubernetes/enhancements#6017 to address the limitations that I flagged here https://docs.google.com/document/d/1VqfNB1u8cmrRhMe0DKycX-bfaLgDm5cgHGWdWu-94cM/edit?tab=t.0#bookmark=kix.e0bqf1kap91e.

If the former, we also need to think about the migration from simple PodGroups to the CompositePodGroup APIs.

@yankay yankay force-pushed the kep/666-gang-scheduling-in-lws branch from 551722e to 49cf8f8 Compare May 7, 2026 02:38

## Proposal

When the LWS object carries `leaderworkerset.sigs.k8s.io/gang-scheduling: "true"`, LWS creates and owns one `scheduling.k8s.io/v1alpha2` Workload (holding a gang PodGroup template) plus one standalone `PodGroup` per replica; each PodGroup's `MinCount` defaults to `LeaderWorkerTemplate.Size`, so all pods of a replica co-schedule by default. The pod webhook sets each pod's `spec.schedulingGroup.podGroupName` from its `leaderworkerset.sigs.k8s.io/group-index` label.
Contributor

Will there be any API discovery on k8s clusters for this feature?

I.e., check whether this API is available, in addition to checking the annotation.

Member Author

@yankay yankay May 9, 2026

Yes. At admission the webhook resolves the v1alpha2 Workload and PodGroup kinds via a cached RESTMapper; if either is missing, it rejects the request with an error naming the missing GVK. See the new API Discovery and Prerequisites section. Caveat: the upstream GenericWorkload gate itself can't be discovered, so it stays an install-time prerequisite.


## Proposal

When the LWS object carries `leaderworkerset.sigs.k8s.io/gang-scheduling: "true"`, LWS creates and owns one `scheduling.k8s.io/v1alpha2` Workload (holding a gang PodGroup template) plus one standalone `PodGroup` per replica; each PodGroup's `MinCount` defaults to `LeaderWorkerTemplate.Size`, so all pods of a replica co-schedule by default. The pod webhook sets each pod's `spec.schedulingGroup.podGroupName` from its `leaderworkerset.sigs.k8s.io/group-index` label.
Contributor

Should we go with an API for alpha with a feature gate instead?

Annotations are a bit hacky and difficult to deprecate. Plus I'm not sure how you plan to support ResourceClaims or TopologyAwareScheduling without going with an API.

Contributor

I'm fine with creating an API field. I did an annotation because it was an easy way to enable the prototype, but if we want to have an actual integration I agree we should add an API field

Member Author

> Should we go with an API for alpha with a feature gate instead?
>
> Annotations are a bit hacky and difficult to deprecate. Plus I'm not sure how you plan to support ResourceClaims or TopologyAwareScheduling without going with an API.

Hi @kannon92, quick check: did "feature gate" mean an LWS-side gate, or the upstream GenericWorkload gate as the de-facto guard? The latest push takes the latter: a typed alpha spec.gangScheduling field with no LWS gate (matching SubGroupPolicy / RolloutStrategy.MaxSurge). The LWS-side feature-gate scaffold question is tracked separately in #850. Happy to flip back if you meant the LWS-gate reading.

Member Author

> I'm fine with creating an API field. I did an annotation because it was an easy way to enable the prototype, but if we want to have an actual integration I agree we should add an API field

Done — typed alpha spec.gangScheduling field replaces the annotation. Empty struct in alpha (presence = opt-in); future TAS / DRA / RC knobs added additively. No LWS-side feature gate — webhook API discovery against upstream GenericWorkload is the guard. Scaffold question tracked separately in #850.
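
To illustrate the "presence = opt-in" semantics of an empty-struct pointer field, here is a small sketch using only `encoding/json`. `GangSchedulingPolicy` and the `gangScheduling` JSON tag come from this PR; `lwsSpec` and `gangEnabled` are hypothetical names, and the spec is trimmed to the one field under discussion.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GangSchedulingPolicy is intentionally empty in alpha; the presence of the
// field is the opt-in, and future knobs can be added without breaking it.
type GangSchedulingPolicy struct{}

// lwsSpec is a trimmed, hypothetical stand-in for the LeaderWorkerSet spec.
type lwsSpec struct {
	GangScheduling *GangSchedulingPolicy `json:"gangScheduling,omitempty"`
}

// gangEnabled reports whether the decoded spec opts into gang scheduling,
// i.e. whether the pointer is non-nil after decoding.
func gangEnabled(raw string) (bool, error) {
	var spec lwsSpec
	if err := json.Unmarshal([]byte(raw), &spec); err != nil {
		return false, err
	}
	return spec.GangScheduling != nil, nil
}

func main() {
	for _, raw := range []string{`{}`, `{"gangScheduling":{}}`} {
		on, err := gangEnabled(raw)
		fmt.Println(raw, on, err)
	}
}
```

A pointer is what makes the empty struct useful here: an absent field and an explicit `null` both decode to a nil pointer, while `{}` decodes to a non-nil pointer to an empty struct.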

- Validate [KEP-4671][kep4671] (Workload / PodGroup APIs) for multi-host inference use cases.
- Support autoscaling at the replica level.

### Non-Goals
Contributor

You should maybe call out all the other WAS features as Non-Goals:

TAS, Workload disruption, PodGroupResourceClaims.

Member Author

Added under Non-Goals: TAS (KEP-5732), workload-aware preemption / disruption (KEP-5710), PodGroup-shared ResourceClaims (KEP-5729). Escape Hatch is the alpha workaround.


[kep6012]: https://github.com/kubernetes/enhancements/issues/6012

### Future Work: Hierarchical Gang via CompositePodGroup
Contributor

Something else to call out here: a composite gang that places the leader and the workers in separate PodGroups.

CompositePodGroup serving-root
├─ PodGroup       leader         minCount = 1 (parentRef=serving-root)
└─ PodGroup       workers        minCount = lws.size - 1 (parentRef=serving-root)

Contributor

This covers the use case where the leader requests different resources from the worker, and the use case where we want to give priority to the leader when it comes to preemption

Member Author

Done in 15ed9ce — Limitations §Per-role gang policy + Future Work §Single-LWS per-role split (per-replica CompositePodGroup tree).

Open question: any concrete LWS workload where splitting leader and workers into separate PodGroups actually helps? In vLLM head and workers all sit inside the same TP group — same GPU shape, all-or-nothing — so two leaf PodGroups give the same scheduling outcome as a single MinCount = size PodGroup. Heterogeneous resources / leader preemption priority make sense in the abstract but I can't map them to a real LWS workload yet.

Contributor

> any concrete LWS workload where splitting leader and workers into separate PodGroups actually helps?

Yes, take a look at axlearn for example https://github.com/apple/axlearn/blob/main/axlearn/cloud/gcp/pathways_utils.py#L1611. Their leader only requests CPU resources, while the workers request TPUs.

To guarantee that the workers all fall into the same TPU slice, while also being able to run the leader in a separate CPU nodepool, they use LeaderOnly subgroup policy + subgroup-exclusive-topology. That use case would be covered by having separate PodGroups for leader and workers

Member Author

> > any concrete LWS workload where splitting leader and workers into separate PodGroups actually helps?
>
> Yes, take a look at axlearn for example https://github.com/apple/axlearn/blob/main/axlearn/cloud/gcp/pathways_utils.py#L1611. Their leader only requests CPU resources, while the workers request TPUs.
>
> To guarantee that the workers all fall into the same TPU slice, while also being able to run the leader in a separate CPU nodepool, they use LeaderOnly subgroup policy + subgroup-exclusive-topology. That use case would be covered by having separate PodGroups for leader and workers

Thanks — pulled into the KEP as the heterogeneous-role motivating shape (CPU leader + single-accelerator-slice workers). Concrete reference: axlearn pathways_utils.py#L1611.

yankay added a commit to yankay/lws that referenced this pull request May 9, 2026
Per kubernetes-sigs#844 review (kannon92, Edwinhr716):

- Replace the gang-scheduling annotation with a typed
  spec.gangScheduling *GangSchedulingPolicy field, gated by the
  GangScheduling feature gate (off by default; empty struct = opt-in).
- Promote API discovery to a first-class "API Discovery and
  Prerequisites" design subsection.
- Split the project-wide pkg/features scaffold to kubernetes-sigs#850 as a
  prerequisite; KEP-666 only adds the GangScheduling constant.

Signed-off-by: Kay Yan <kay.yan@daocloud.io>
yankay added a commit to yankay/lws that referenced this pull request May 9, 2026
Expand the single `minCount < size` Limitations bullet into a
`Per-role gang policy` umbrella with three sub-cases (leader-first
gang, leader preemption priority, heterogeneous role minimums) and
add a matching `Single-LWS per-role split` Future Work subsection
with the per-replica CompositePodGroup tree.

Addresses Edwinhr716's review on PR kubernetes-sigs#844.

Signed-off-by: Kay Yan <kay.yan@daocloud.io>
yankay added a commit to yankay/lws that referenced this pull request May 9, 2026
Per kubernetes-sigs#844 review (kannon92, Edwinhr716):

- Replace the gang-scheduling annotation with a typed alpha
  spec.gangScheduling *GangSchedulingPolicy field (presence = opt-in;
  TAS / DRA / ResourceClaims / hierarchical knobs added additively as
  upstream stabilizes them).
- Promote API discovery to a first-class "API Discovery and
  Prerequisites" subsection.
- Drop the LWS-side feature gate; upstream GenericWorkload is the
  de-facto kill switch, propagated to admission via webhook discovery.
  pkg/features scaffold tracked separately in kubernetes-sigs#850.
- Split the single `minCount < size` Limitations bullet into a
  Per-role gang policy umbrella (leader-first, leader preemption
  priority, heterogeneous role minimums) with a matching Single-LWS
  per-role split Future Work subsection.
- Add other WAS features to Non-Goals: TAS (KEP-5732), workload-aware
  preemption (KEP-5710), PodGroup-shared ResourceClaims (KEP-5729);
  Escape Hatch is the alpha workaround.
- Trim Implementation History to milestone entries.

Signed-off-by: Kay Yan <kay.yan@daocloud.io>
@yankay yankay force-pushed the kep/666-gang-scheduling-in-lws branch 2 times, most recently from 1c59655 to e81f521 Compare May 9, 2026 07:23
Document the upstream Workload and PodGroup API integration for LWS as
a parallel path to KEP-407.

Design highlights:

- Opt-in via a typed alpha spec.gangScheduling *GangSchedulingPolicy
  field (presence = opt-in; TAS / DRA / ResourceClaims / hierarchical
  knobs added additively as upstream stabilizes them).
- spec.gangScheduling is the umbrella opt-in for both LWS-managed mode
  and the escape-hatch sub-mode; admission rejects a pre-set
  pod.spec.schedulingGroup without it, keeping spec.gangScheduling as
  the single source of truth for "this LWS uses gang scheduling".
- Lifecycle: lws_controller creates one Workload <lws-name>;
  pod_controller creates one PodGroup per replica named
  <lws-name>-<group-index> (= leader pod name).
- Escape Hatch: spec.gangScheduling set together with a pre-set
  pod.spec.schedulingGroup opts out of LWS-managed lifecycle (Job
  controller pattern), enabling external owners (e.g. Kueue,
  DisaggregatedSet) without LWS-side API surface.
- Admission rules: reject mutation of spec.gangScheduling, gang +
  LeaderReady, gang + exclusive-topology, Size mutation in LWS-managed
  mode, gang when v1alpha2 API resources are not registered, and a
  pre-set pod.spec.schedulingGroup without spec.gangScheduling.
- API discovery against upstream GenericWorkload at admission, with
  cached RESTMapper invalidation on NoMatchError so installing the API
  takes effect on the next admission without an LWS restart.
- No LWS-side feature gate; upstream GenericWorkload is the de-facto
  kill switch (project-wide pkg/features scaffold tracked separately
  in kubernetes-sigs#850).
- Per-role gang policy split (leader-first, leader preemption
  priority, heterogeneous role minimums) documented as Limitations of
  alpha and Single-LWS per-role split Future Work via KEP-6012.
- Forward-compat sketch for KEP-6012 (CompositePodGroup): cross-LWS
  gangs (e.g. DisaggregatedSet) layer on the escape hatch via an
  external Workload owner; concrete tree shape owned by KEP-766.
- Other WAS features as Non-Goals: TAS (KEP-5732), workload-aware
  preemption (KEP-5710), PodGroup-shared ResourceClaims (KEP-5729);
  Escape Hatch is the alpha workaround.

Signed-off-by: Kay Yan <kay.yan@daocloud.io>
type GangSchedulingPolicy struct{}
```

No LWS-side feature gate: the upstream `GenericWorkload` gate already controls whether `kube-apiserver` preserves `pod.spec.schedulingGroup`, and the webhook's [API discovery](#api-discovery-and-prerequisites) propagates that into LWS admission. Matches how LWS handles other typed alpha fields (`SubGroupPolicy`, `RolloutStrategy.MaxSurge`); a project-wide `pkg/features` scaffold, if ever needed, is tracked in [#850][lws-feature-gate-issue].
Contributor

So once the GenericWorkload feature gate is promoted to stable in Kubernetes, will this be GA'ed here too?

Member Author

Yes — LWS GangScheduling tracks the upstream GenericWorkload lifecycle: alpha → beta (default-on, aligned with upstream beta) → removed at GA. See updated §Graduation Criteria.

// upstream Workload / PodGroup APIs (one PodGroup per replica,
// MinCount = LeaderWorkerTemplate.Size). Alpha; subject to change.
// +optional
GangScheduling *GangSchedulingPolicy `json:"gangScheduling,omitempty"`
Contributor

This field cannot be set if the feature gate is not enabled on the cluster?

Contributor

What would happen if LWS (including this feature) is deployed onto an old Kubernetes cluster? As far as I understand, this is backwards compatible, right?

Member Author

> What would happen if LWS (including this feature) is deployed onto an old Kubernetes cluster? As far as I understand, this is backwards compatible, right?

Yes. I added an explicit Backwards Compatibility section covering the three cases:

  • Field unset → zero behavior change.
  • Field set, v1alpha2 APIs missing → admission rejects with the missing GVK named; no half-created objects.
  • APIs registered but GenericWorkload gate off → install-time prerequisite, not a runtime failure.

Member Author

@yankay yankay May 18, 2026

> This field cannot be set if the feature gate is not enabled on the cluster?

Yes — spec.gangScheduling is an LWS field, but the LWS validating webhook rejects it at admission if the upstream v1alpha2 Workload/PodGroup GVKs aren't registered (see API Discovery).

Rejected: `MinCount` only requires M co-scheduled pods, with no notion of which replica they belong to — the scheduler may legally pick M pods from different replicas, none complete, and the model still cannot start. Per-replica PodGroups make each replica an independent all-or-nothing unit.
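
A toy model of why the shared-PodGroup alternative was rejected. It is a deliberate simplification: `pod`, `sharedGangSatisfied`, and `completeReplicas` are hypothetical names, and the count-only check is an assumption about how a single `MinCount` scalar behaves, not the scheduler's actual algorithm.

```go
package main

import (
	"fmt"
	"sort"
)

// pod is a hypothetical minimal view of a schedulable pod: only the replica
// (group index) it belongs to.
type pod struct{ GroupIndex int }

// sharedGangSatisfied models a single shared PodGroup: only the total
// number of schedulable pods is compared against minCount.
func sharedGangSatisfied(pods []pod, minCount int) bool {
	return len(pods) >= minCount
}

// completeReplicas models per-replica PodGroups: it returns the group
// indexes that have at least `size` schedulable pods, in ascending order.
func completeReplicas(pods []pod, size int) []int {
	counts := map[int]int{}
	for _, p := range pods {
		counts[p.GroupIndex]++
	}
	done := []int{}
	for idx, n := range counts {
		if n >= size {
			done = append(done, idx)
		}
	}
	sort.Ints(done)
	return done
}

func main() {
	// Two replicas of size 4, but only two pods of each can be placed.
	pods := []pod{{0}, {0}, {1}, {1}}
	fmt.Println(sharedGangSatisfied(pods, 4)) // the shared gang is satisfied
	fmt.Println(completeReplicas(pods, 4))    // yet no replica is complete
}
```

With four schedulable pods split across two replicas, the shared check passes while neither replica can start, which is the partial-replica outcome that per-replica PodGroups rule out.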

**Rely on [KEP-407][kep407] only**.
KEP-407 targets third-party schedulers (Volcano / coscheduling / YuniKorn) via their own PodGroup CRDs; this KEP targets the upstream-native `scheduling.k8s.io/v1alpha2` Workload and PodGroup APIs. The two evolve independently — different prerequisites, different API surfaces, no shared data path — and a single LWS object opts into at most one.
Contributor

As described above, we have already added Volcano-based gang scheduling. Are users expected to use just one of them (Kubernetes gang scheduling or Volcano)?

Member Author

Good point — pushed an update so they're not mutually exclusive. New Unified Provider Model makes spec.gangScheduling the single opt-in across all gang backends (this KEP + KEP-407's Volcano; KAI / coscheduling possible later), following Grove's Backend framework.

- Unified Provider Model: spec.gangScheduling as the single opt-in
  across all gang backends; upstream v1alpha2 schema is the reference,
  third-party backends honor a subset. Backend chosen via KEP-407's
  existing --gang-scheduler-provider flag.
- Backwards Compatibility section: covers field-unset, gate-off, and
  v1alpha2-APIs-missing cases.
- Adopt LWS GangScheduling feature gate (alpha=false -> beta=true ->
  removed at GA) tracking upstream GenericWorkload lifecycle, after the
  pkg/features scaffold landed. Reverses the earlier no-LWS-gate
  decision.

Signed-off-by: Kay Yan <kay.yan@daocloud.io>
Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/documentation Categorizes issue or PR as related to documentation. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

[RFC] Support kubernetes gang scheduling as a pod group provider

5 participants