Commit 04a4728

fix: use pipelines for CR scheduling requests that dont write history (#881)
1 parent eb30d85 commit 04a4728

3 files changed

Lines changed: 301 additions & 6 deletions

helm/bundles/cortex-nova/templates/pipelines_kvm.yaml

Lines changed: 297 additions & 0 deletions
@@ -632,4 +632,301 @@ spec:
       description: |
         Excludes hosts that are not ready or are disabled.
   weighers: []
+---
+apiVersion: cortex.cloud/v1alpha1
+kind: Pipeline
+metadata:
+  name: kvm-general-purpose-load-balancing-no-history
+spec:
+  schedulingDomain: nova
+  description: |
+    Variant of kvm-general-purpose-load-balancing used for committed-resource
+    reservation scheduling and capacity probes. Identical filter/weigher chain
+    but does not write placement history, to avoid polluting history with
+    internal CR controller calls.
+  type: filter-weigher
+  ignorePreselection: true
+  createHistory: false
+  filters:
+    - name: filter_correct_az
+      description: |
+        This step will filter out hosts whose aggregate information indicates they
+        are not placed in the requested availability zone.
+    - name: filter_host_instructions
+      description: |
+        This step will consider the `ignore_hosts` and `force_hosts` instructions
+        from the nova scheduler request spec to filter out or exclusively allow
+        certain hosts.
+    - name: filter_status_conditions
+      description: |
+        This step will filter out hosts for which the hypervisor status conditions
+        do not meet the expected values, for example, that the hypervisor is ready
+        and not disabled.
+    - name: filter_capabilities
+      description: |
+        This step will filter out hosts that do not meet the compute capabilities
+        requested by the nova flavor extra specs, like `{"arch": "x86_64",
+        "maxphysaddr:bits": 46, ...}`.
+
+        Note: currently, advanced boolean/numeric operators for the capabilities
+        like `>`, `!`, ... are not supported because they are not used by any of our
+        flavors in production.
+    - name: filter_has_requested_traits
+      description: |
+        This step filters hosts that do not have the requested traits given by the
+        nova flavor extra spec: "trait:<trait>": "forbidden" means the host must
+        not have the specified trait. "trait:<trait>": "required" means the host
+        must have the specified trait.
+    - name: filter_external_customer
+      description: |
+        This step prefix-matches the domain name for external customer domains and
+        filters out hosts that are not intended for external customers. It considers
+        the `CUSTOM_EXTERNAL_CUSTOMER_EXCLUSIVE` trait on hosts as well as the
+        `domain_name` scheduler hint from the nova request spec.
+      params:
+        - {key: domainNamePrefixes, stringListValue: ["iaas-"]}
+    - name: filter_has_accelerators
+      description: |
+        This step will filter out hosts without the trait `COMPUTE_ACCELERATORS` if
+        the nova flavor extra specs request accelerators via "accel:device_profile".
+    - name: filter_instance_group_affinity
+      description: |
+        This step selects hosts in the instance group specified in the nova
+        scheduler request spec.
+    - name: filter_instance_group_anti_affinity
+      description: |
+        This step selects hosts not in the instance group specified in the nova
+        scheduler request spec, but only until the max_server_per_host limit is
+        reached (default = 1).
+    - name: filter_has_enough_capacity
+      description: |
+        This step will filter out hosts that do not have enough available capacity
+        to host the requested flavor. If enabled, this step will subtract the
+        current reservations residing on this host from the available capacity.
+      params:
+        - {key: lockReserved, boolValue: false}
+    - name: filter_allowed_projects
+      description: |
+        This step filters hosts based on allowed projects defined in the
+        hypervisor resource. Note that hosts allowing all projects are still
+        accessible and will not be filtered out. In this way some hypervisors
+        are made accessible to some projects only.
+    - name: filter_aggregate_metadata
+      description: |
+        This step filters hosts based on metadata defined in their aggregates. For
+        example, if an aggregate has the metadata "filter_tenant_id": "<project_id>",
+        only hosts in that aggregate that match the project ID in the nova request
+        will pass this filter.
+    - name: filter_live_migratable
+      description: |
+        This step ensures that the target host of a live migration can accept
+        the migrating VM, by checking cpu architecture, cpu features, emulated
+        devices, and cpu modes.
+    - name: filter_requested_destination
+      description: |
+        This step filters hosts based on the `requested_destination` instruction
+        from the nova scheduler request spec. It supports filtering by host and
+        by aggregates. Aggregates use AND logic between list elements, with
+        comma-separated UUIDs within an element using OR logic.
+    - name: filter_quota_enforcement
+      description: |
+        This step enforces project quota by checking whether the request has
+        headroom under the project's committed resources or pay-as-you-go quota.
+        If a matching CommittedResource has unused capacity, the request is accepted.
+        Otherwise, PAYG headroom is checked for ram, cores, and instances.
+        Rejects all hosts if neither tier has headroom.
+        When dryRun is true the filter runs in shadow mode: it logs and emits
+        the cortex_nova_filter_quota_enforcement_decisions_total metric for
+        would-be rejects but never actually removes hosts.
+      params:
+        - {key: dryRun, boolValue: true}
+  weighers:
+    - name: kvm_prefer_smaller_hosts
+      params:
+        - {key: resourceWeights, floatMapValue: {"memory": 1.0}}
+      description: |
+        This step pulls virtual machines onto smaller hosts (by capacity). This
+        ensures that larger hosts are not overly fragmented with small VMs,
+        and can still accommodate larger VMs when they need to be scheduled.
+    - name: kvm_instance_group_soft_affinity
+      description: |
+        This weigher implements the "soft affinity" and "soft anti-affinity" policy
+        for instance groups in nova.
+
+        It assigns a weight to each host based on how many instances of the same
+        instance group are already running on that host. The more instances of the
+        same group on a host, the lower (for soft-anti-affinity) or higher
+        (for soft-affinity) the weight, which makes it less likely or more likely,
+        respectively, for the scheduler to choose that host for new instances of
+        the same group.
+    - name: kvm_binpack
+      multiplier: -1.0 # inverted = balancing
+      params:
+        - {key: resourceWeights, floatMapValue: {"memory": 1.0}}
+      description: |
+        This step implements a balancing weigher for workloads on kvm hypervisors,
+        which is the opposite of binpacking. Instead of pulling the requested vm
+        into the smallest gaps possible, it spreads the load to ensure
+        workloads are balanced across hosts. In this pipeline, the balancing will
+        focus on general purpose virtual machines.
+    - name: kvm_failover_evacuation
+      description: |
+        This weigher prefers hosts with active failover reservations during
+        evacuation requests. Hosts matching a failover reservation where the
+        VM is allocated get a higher weight, encouraging placement on
+        pre-reserved failover capacity. For non-evacuation requests, this
+        weigher has no effect.
+    - name: kvm_committed_resource_reservation
+      description: |
+        This weigher boosts hosts that have a ready CommittedResourceReservation
+        matching the request's project, resource group, and availability zone,
+        with enough free memory capacity for the requested VM. Hosts without a
+        matching reservation or without enough free capacity receive a lower weight.
+---
+apiVersion: cortex.cloud/v1alpha1
+kind: Pipeline
+metadata:
+  name: kvm-hana-bin-packing-no-history
+spec:
+  schedulingDomain: nova
+  description: |
+    Variant of kvm-hana-bin-packing used for committed-resource reservation
+    scheduling and capacity probes. Identical filter/weigher chain but does not
+    write placement history, to avoid polluting history with internal CR
+    controller calls.
+  type: filter-weigher
+  ignorePreselection: true
+  createHistory: false
+  filters:
+    - name: filter_correct_az
+      description: |
+        This step will filter out hosts whose aggregate information indicates they
+        are not placed in the requested availability zone.
+    - name: filter_host_instructions
+      description: |
+        This step will consider the `ignore_hosts` and `force_hosts` instructions
+        from the nova scheduler request spec to filter out or exclusively allow
+        certain hosts.
+    - name: filter_status_conditions
+      description: |
+        This step will filter out hosts for which the hypervisor status conditions
+        do not meet the expected values, for example, that the hypervisor is ready
+        and not disabled.
+    - name: filter_capabilities
+      description: |
+        This step will filter out hosts that do not meet the compute capabilities
+        requested by the nova flavor extra specs, like `{"arch": "x86_64",
+        "maxphysaddr:bits": 46, ...}`.
+
+        Note: currently, advanced boolean/numeric operators for the capabilities
+        like `>`, `!`, ... are not supported because they are not used by any of our
+        flavors in production.
+    - name: filter_has_requested_traits
+      description: |
+        This step filters hosts that do not have the requested traits given by the
+        nova flavor extra spec: "trait:<trait>": "forbidden" means the host must
+        not have the specified trait. "trait:<trait>": "required" means the host
+        must have the specified trait.
+    - name: filter_external_customer
+      description: |
+        This step prefix-matches the domain name for external customer domains and
+        filters out hosts that are not intended for external customers. It considers
+        the `CUSTOM_EXTERNAL_CUSTOMER_EXCLUSIVE` trait on hosts as well as the
+        `domain_name` scheduler hint from the nova request spec.
+      params:
+        - {key: domainNamePrefixes, stringListValue: ["iaas-"]}
+    - name: filter_has_accelerators
+      description: |
+        This step will filter out hosts without the trait `COMPUTE_ACCELERATORS` if
+        the nova flavor extra specs request accelerators via "accel:device_profile".
+    - name: filter_instance_group_affinity
+      description: |
+        This step selects hosts in the instance group specified in the nova
+        scheduler request spec.
+    - name: filter_instance_group_anti_affinity
+      description: |
+        This step selects hosts not in the instance group specified in the nova
+        scheduler request spec, but only until the max_server_per_host limit is
+        reached (default = 1).
+    - name: filter_has_enough_capacity
+      description: |
+        This step will filter out hosts that do not have enough available capacity
+        to host the requested flavor. If enabled, this step will subtract the
+        current reservations residing on this host from the available capacity.
+      params:
+        - {key: lockReserved, boolValue: false}
+    - name: filter_allowed_projects
+      description: |
+        This step filters hosts based on allowed projects defined in the
+        hypervisor resource. Note that hosts allowing all projects are still
+        accessible and will not be filtered out. In this way some hypervisors
+        are made accessible to some projects only.
+    - name: filter_aggregate_metadata
+      description: |
+        This step filters hosts based on metadata defined in their aggregates. For
+        example, if an aggregate has the metadata "filter_tenant_id": "<project_id>",
+        only hosts in that aggregate that match the project ID in the nova request
+        will pass this filter.
+    - name: filter_live_migratable
+      description: |
+        This step ensures that the target host of a live migration can accept
+        the migrating VM, by checking cpu architecture, cpu features, emulated
+        devices, and cpu modes.
+    - name: filter_requested_destination
+      description: |
+        This step filters hosts based on the `requested_destination` instruction
+        from the nova scheduler request spec. It supports filtering by host and
+        by aggregates. Aggregates use AND logic between list elements, with
+        comma-separated UUIDs within an element using OR logic.
+    - name: filter_quota_enforcement
+      description: |
+        This step enforces project quota by checking whether the request has
+        headroom under the project's committed resources or pay-as-you-go quota.
+        If a matching CommittedResource has unused capacity, the request is accepted.
+        Otherwise, PAYG headroom is checked for ram, cores, and instances.
+        Rejects all hosts if neither tier has headroom.
+        When dryRun is true the filter runs in shadow mode: it logs and emits
+        the cortex_nova_filter_quota_enforcement_decisions_total metric for
+        would-be rejects but never actually removes hosts.
+      params:
+        - {key: dryRun, boolValue: true}
+  weighers:
+    - name: kvm_prefer_smaller_hosts
+      params:
+        - {key: resourceWeights, floatMapValue: {"memory": 1.0}}
+      description: |
+        This step pulls virtual machines onto smaller hosts (by capacity). This
+        ensures that larger hosts are not overly fragmented with small VMs,
+        and can still accommodate larger VMs when they need to be scheduled.
+    - name: kvm_instance_group_soft_affinity
+      description: |
+        This weigher implements the "soft affinity" and "soft anti-affinity" policy
+        for instance groups in nova.
+        It assigns a weight to each host based on how many instances of the same
+        instance group are already running on that host. The more instances of the
+        same group on a host, the lower (for soft-anti-affinity) or higher
+        (for soft-affinity) the weight, which makes it less likely or more likely,
+        respectively, for the scheduler to choose that host for new instances of
+        the same group.
+    - name: kvm_binpack
+      params:
+        - {key: resourceWeights, floatMapValue: {"memory": 1.0}}
+      description: |
+        This step implements a binpacking weigher for workloads on kvm hypervisors.
+        It pulls the requested vm into the smallest gaps possible, to ensure
+        other hosts with less allocation stay free for bigger vms.
+        In this pipeline, the binpacking will focus on hana virtual machines.
+    - name: kvm_failover_evacuation
+      description: |
+        This weigher prefers hosts with active failover reservations during
+        evacuation requests. Hosts matching a failover reservation where the
+        VM is allocated get a higher weight, encouraging placement on
+        pre-reserved failover capacity. For non-evacuation requests, this
+        weigher has no effect.
+    - name: kvm_committed_resource_reservation
+      description: |
+        This weigher boosts hosts that have a ready CommittedResourceReservation
+        matching the request's project, resource group, and availability zone,
+        with enough free memory capacity for the requested VM. Hosts without a
+        matching reservation or without enough free capacity receive a lower weight.
 {{- end }}
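
The aggregate rule stated in the `filter_requested_destination` description (AND between list elements, OR between comma-separated UUIDs inside one element) can be sketched in Go as follows. This is only an illustration under stated assumptions, not the filter's actual implementation: the helper name `matchesRequestedAggregates`, its signature, and the choice to treat an empty request as "no constraint" are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesRequestedAggregates is a hypothetical helper, not taken from the
// cortex code base. hostAggregates holds the UUIDs of the aggregates a host
// belongs to. Each element of requested is a comma-separated list of UUIDs.
// The host matches if, for every element (AND), it belongs to at least one
// of the aggregates listed in that element (OR).
// Assumption: an empty requested list places no constraint on the host.
func matchesRequestedAggregates(hostAggregates map[string]bool, requested []string) bool {
	for _, element := range requested {
		matched := false
		for _, uuid := range strings.Split(element, ",") {
			if hostAggregates[strings.TrimSpace(uuid)] {
				matched = true
				break
			}
		}
		if !matched {
			// One element without any matching aggregate fails the AND.
			return false
		}
	}
	return true
}

func main() {
	host := map[string]bool{"agg-a": true, "agg-c": true}

	// (agg-a OR agg-b) AND agg-c: the host is in agg-a and agg-c.
	fmt.Println(matchesRequestedAggregates(host, []string{"agg-a,agg-b", "agg-c"}))

	// agg-a AND agg-b: the host is not in agg-b.
	fmt.Println(matchesRequestedAggregates(host, []string{"agg-a", "agg-b"}))
}
```

The placeholder strings `agg-a`, `agg-b`, and `agg-c` stand in for aggregate UUIDs.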

helm/bundles/cortex-nova/values.yaml

Lines changed: 3 additions & 5 deletions
@@ -139,7 +139,7 @@ cortex-scheduling-controllers:
     # Pipeline used for the empty-state capacity probe (ignores allocations and reservations).
     capacityTotalPipeline: "kvm-report-capacity"
     # Pipeline used for the current-state capacity probe (considers current VM allocations).
-    capacityPlaceablePipeline: "kvm-general-purpose-load-balancing"
+    capacityPlaceablePipeline: "kvm-general-purpose-load-balancing-no-history"
     # How often the capacity controller re-runs its scheduler probes.
     capacityReconcileInterval: 5m
     enabledTasks:
@@ -154,11 +154,9 @@ cortex-scheduling-controllers:
     committedResourceReservationController:
       # Maps flavor group IDs to pipeline names; "*" acts as catch-all fallback
       flavorGroupPipelines:
-        "2152": "kvm-hana-bin-packing" # HANA flavor group
-        "2101": "kvm-general-purpose-load-balancing" # General Purpose flavor group
-        "*": "kvm-general-purpose-load-balancing" # Catch-all fallback
+        "*": "kvm-general-purpose-load-balancing-no-history" # Catch-all fallback
       # Fallback pipeline when no flavorGroupPipelines entry matches
-      pipelineDefault: "kvm-general-purpose-load-balancing"
+      pipelineDefault: "kvm-general-purpose-load-balancing-no-history"
       # How often to re-verify active Reservation CRDs (healthy state)
       requeueIntervalActive: "5m"
       # Back-off interval when knowledge is unavailable
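
The comments on `flavorGroupPipelines` and `pipelineDefault` suggest a three-step lookup: exact flavor group ID, then the `"*"` catch-all, then the default. A minimal Go sketch of that order follows. The function `resolvePipeline` and its signature are hypothetical and the lookup order is inferred from the comments only; the controller's real code may differ.

```go
package main

import "fmt"

// resolvePipeline is a hypothetical helper, not taken from the cortex code
// base. It picks a pipeline name for a flavor group ID in this assumed order:
//  1. an exact entry in flavorGroupPipelines,
//  2. the "*" catch-all entry,
//  3. pipelineDefault.
func resolvePipeline(flavorGroupPipelines map[string]string, pipelineDefault, flavorGroupID string) string {
	if name, ok := flavorGroupPipelines[flavorGroupID]; ok {
		return name
	}
	if name, ok := flavorGroupPipelines["*"]; ok {
		return name
	}
	return pipelineDefault
}

func main() {
	// Values as they stand after this commit: only the catch-all remains.
	pipelines := map[string]string{
		"*": "kvm-general-purpose-load-balancing-no-history",
	}
	fallback := "kvm-general-purpose-load-balancing-no-history"

	fmt.Println(resolvePipeline(pipelines, fallback, "2152"))
}
```

Under this assumed order, flavor group `2152`, which had its own `kvm-hana-bin-packing` entry before the change, would resolve through the catch-all entry.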

internal/scheduling/reservations/capacity/config.go

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ func DefaultConfig() Config {
 	return Config{
 		ReconcileInterval: metav1.Duration{Duration: 5 * time.Minute},
 		TotalPipeline: "kvm-report-capacity",
-		PlaceablePipeline: "kvm-general-purpose-load-balancing",
+		PlaceablePipeline: "kvm-general-purpose-load-balancing-no-history",
 		SchedulerURL: "http://localhost:8080/scheduler/nova/external",
 	}
 }
