
Commit 693ca00

docs(reservations): document InFlightReservation, domain hints, and CRS metrics (#963)
## Summary

- Document InFlightReservation CRD type (#954): temporary capacity blocks for VMs being scheduled that prevent double-booking across candidate hosts, including lifecycle, spec fields, and interaction with CR reservation blocking
- Document domain name resolution via Keystone (#955): placement requests include domain_name scheduler hints resolved from DomainID to enforce host restrictions for external customer domains via filter_external_customer
- Document CRS evaluation subsystem (#847): post-placement classification of outcomes (no_cr, cr_exhausted, slot_exhausted, slot_blocked, slot_missed, slot_used) and new Prometheus metrics cortex_nova_no_host_found_total and cortex_nova_placement_total

## Test plan

- [ ] Verify markdown renders correctly on GitHub
- [ ] Confirm internal links to #inflightreservation anchor resolve correctly
- [ ] Review metric names and classification categories match source code

Assisted-by: Claude Code:claude-sonnet-4-20250514 [Bash] [Read]
Co-authored-by: cortex-ai-agents[bot] <279748396+cortex-ai-agents[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
1 parent d50018e commit 693ca00

1 file changed

Lines changed: 54 additions & 2 deletions

docs/reservations/committed-resource-reservations.md

@@ -12,13 +12,15 @@ Cortex reserves hypervisor capacity for customers who pre-commit resources (comm
1212
- [Reservation Lifecycle](#reservation-lifecycle)
1313
- [VM Lifecycle](#vm-lifecycle)
1414
- [Capacity Blocking](#capacity-blocking)
15+
- [InFlightReservation](#inflightreservation)
1516
- [Reservation Controller](#reservation-controller)
1617
- [Info API](#info-api)
1718
- [Change-Commitments API](#change-commitments-api)
1819
- [Quota API](#quota-api)
1920
- [Report-Usage API](#report-usage-api)
2021
- [Report-Capacity API](#report-capacity-api)
2122
- [Syncer Task](#syncer-task)
23+
- [Placement Observability (CRS Evaluation)](#placement-observability-crs-evaluation)
2224

2325
The CR reservation implementation is located in `internal/scheduling/reservations/commitments/`. Key components include:
2426
- `CommittedResource` controller — acceptance, rejection, child Reservation CRUD (memory) or arithmetic headroom check (cores)
@@ -247,7 +249,7 @@ block = max(remaining, spec_only_unblocked)
247249

248250
When a VM is in flight (Nova choosing between candidates), a pessimistic blocking reservation exists on each candidate host. For any SpecOnly VM that has such a reservation on the same host, the pessimistic blocking reservation is the authority — the CR reservation must not double-count it. The `spec_only_unblocked` term excludes those VMs.
249251

250-
See the pessimistic blocking reservations documentation for the full interaction semantics.
252+
See the [InFlightReservation](#inflightreservation) section below for how these reservations are managed.
251253

252254
**Migration state (`Spec.TargetHost != Status.Host`):**
253255

@@ -261,11 +263,27 @@ When a reservation is being migrated to a new host, block the full `max(Spec.Res
261263

262264
- **VM live migration within a reservation** (VM moves away from the reservation's host): handled implicitly by `hv.Status.Allocation`. Libvirt reports resource consumption on both source and target during live migration, so both hosts' `hv.Status.Allocation` already reflects the in-flight state. No special filter logic needed. The reservation controller will eventually remove the VM from the reservation once it's confirmed on the wrong host past the grace period.
263265

266+
#### InFlightReservation
267+
268+
An `InFlightReservation` is a short-lived Reservation CRD (type `InFlightReservation`) that pessimistically blocks capacity on each candidate host while a VM is being scheduled. It prevents double-booking when multiple scheduling decisions are in flight concurrently.
269+
270+
**Lifecycle:**
271+
- **Created** by the scheduling pipeline at the end of a successful placement run, one per candidate host returned to Nova. Creation is skipped when the `SkipInflight` pipeline option is set (used by reservation scheduling, capacity checks, and failover — any non-VM-placement run).
272+
- **Deleted** once the VM has been confirmed on a host (the in-flight reservation is no longer needed) or after a timeout if the VM never lands.
273+
274+
**Spec fields** (`InFlightReservationSpec`):
275+
- `VMID` — Nova server UUID of the VM being scheduled
276+
- `UserID` — owner of the VM
277+
- `ProjectID` — project/tenant of the VM
278+
- `Intent` — lifecycle operation that triggered the placement (e.g., create, migrate, resize)
279+
280+
**Interaction with CR reservations:** When computing how much capacity a CR reservation must block, SpecOnly VMs that already have an InFlightReservation on the same host are excluded from the CR reservation's block calculation (the `spec_only_unblocked` term). This avoids double-counting resources that are already blocked by the pessimistic InFlightReservation.
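
The exclusion described above can be sketched as follows. This is only an illustration of the `block = max(remaining, spec_only_unblocked)` rule under simplifying assumptions: the `VM` type, the `specOnlyUnblocked` and `blockMiB` helpers, and the use of MiB units are hypothetical and are not taken from the Cortex source. `remaining` is treated as an opaque input.

```go
package main

import "fmt"

// VM is a hypothetical, simplified view of a VM listed on a CR reservation.
type VM struct {
	ID        string
	MemoryMiB int64
	SpecOnly  bool // listed in the reservation spec, not yet confirmed on the host
}

// specOnlyUnblocked sums the memory of SpecOnly VMs that are NOT already
// covered by an InFlightReservation on the same host. inFlight holds the
// VM IDs that have an InFlightReservation on this host.
func specOnlyUnblocked(vms []VM, inFlight map[string]bool) int64 {
	var total int64
	for _, vm := range vms {
		if !vm.SpecOnly || inFlight[vm.ID] {
			continue // confirmed VMs and in-flight VMs are accounted for elsewhere
		}
		total += vm.MemoryMiB
	}
	return total
}

// blockMiB applies block = max(remaining, spec_only_unblocked).
func blockMiB(remaining int64, vms []VM, inFlight map[string]bool) int64 {
	unblocked := specOnlyUnblocked(vms, inFlight)
	if unblocked > remaining {
		return unblocked
	}
	return remaining
}

func main() {
	vms := []VM{
		{ID: "vm-a", MemoryMiB: 4096, SpecOnly: true},
		{ID: "vm-b", MemoryMiB: 8192, SpecOnly: true}, // has an InFlightReservation
		{ID: "vm-c", MemoryMiB: 2048, SpecOnly: false},
	}
	inFlight := map[string]bool{"vm-b": true}
	fmt.Println(blockMiB(1024, vms, inFlight))
}
```

In this sketch only `vm-a` contributes to `spec_only_unblocked`, because `vm-b` is already blocked by its in-flight reservation and `vm-c` is not SpecOnly.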
281+
264282
#### Reservation Controller
265283

266284
The `Reservation` controller watches `Reservation` CRDs and `Hypervisor` CRDs. `MaxConcurrentReconciles=1` prevents overbooking during concurrent placements.
267285

268-
**Placement** — finds hosts for new reservations (calls scheduler API)
286+
**Placement** — finds hosts for new reservations (calls scheduler API). Placement requests include a `domain_name` scheduler hint resolved from the reservation's `DomainID` via Keystone. This allows the `filter_external_customer` pipeline filter to enforce host restrictions for external customer domains. Domain name resolution uses an in-process cache that stores names indefinitely (domain names are immutable in OpenStack). If the Keystone integration is not configured (`keystoneSecretRef` absent), the hint is omitted and domain-based host restrictions are not enforced.
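
A minimal sketch of the caching behaviour described above, assuming a lookup function in place of the real Keystone client. The names `DomainNameCache`, `Resolver`, and `SchedulerHints` are hypothetical and chosen for illustration; only the `domain_name` hint key and the "cache forever, omit the hint when Keystone is not configured" behaviour come from the text.

```go
package main

import (
	"fmt"
	"sync"
)

// Resolver is a hypothetical stand-in for the Keystone domain lookup.
type Resolver func(domainID string) (string, error)

// DomainNameCache stores resolved names without expiry, following the
// document's statement that names are cached indefinitely.
type DomainNameCache struct {
	mu      sync.RWMutex
	names   map[string]string
	resolve Resolver
}

func NewDomainNameCache(resolve Resolver) *DomainNameCache {
	return &DomainNameCache{names: make(map[string]string), resolve: resolve}
}

// Name returns the cached name, resolving and storing it on first use.
func (c *DomainNameCache) Name(domainID string) (string, error) {
	c.mu.RLock()
	name, ok := c.names[domainID]
	c.mu.RUnlock()
	if ok {
		return name, nil
	}
	name, err := c.resolve(domainID)
	if err != nil {
		return "", err
	}
	c.mu.Lock()
	c.names[domainID] = name
	c.mu.Unlock()
	return name, nil
}

// SchedulerHints builds the hints for a placement request. A nil cache
// models an unconfigured Keystone integration: the hint is omitted.
func SchedulerHints(cache *DomainNameCache, domainID string) (map[string]string, error) {
	hints := map[string]string{}
	if cache == nil {
		return hints, nil
	}
	name, err := cache.Name(domainID)
	if err != nil {
		return nil, err
	}
	hints["domain_name"] = name
	return hints, nil
}

func main() {
	lookups := 0
	cache := NewDomainNameCache(func(id string) (string, error) {
		lookups++
		return "name-of-" + id, nil
	})
	for i := 0; i < 2; i++ {
		hints, err := SchedulerHints(cache, "d1")
		fmt.Println(hints, err, "lookups:", lookups)
	}
}
```

Two concurrent first-time requests for the same domain may both call the resolver in this sketch; that is harmless here because both store the same value.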
269287

270288
**Allocation Verification** — tracks VM lifecycle on reservations. The controller uses the Hypervisor CRD as the sole source of truth, with two triggers:
271289
- New VMs (within `committedResourceAllocationGracePeriod`, default: 15 min): verification deferred — VM may still be spawning; requeued every `committedResourceRequeueIntervalGracePeriod` (default: 1 min)
@@ -322,3 +340,37 @@ For each VM, the API reports whether it accounts to a specific commitment or PAY
322340
### Syncer Task
323341

324342
The syncer task runs periodically and syncs local `CommittedResource` CRD state to match Limes' view of commitments, correcting drift from missed API calls or restarts. It writes `CommittedResource` CRDs only — capacity management is the controller's responsibility.
343+
344+
### Placement Observability (CRS Evaluation)
345+
346+
The `internal/scheduling/nova/crs/` package provides post-placement classification and Prometheus metrics for committed resource slot utilization. It answers the question: "For each VM placement (or no-host-found failure), what was the CR slot situation?"
347+
348+
**Prometheus metrics:**
349+
350+
| Metric | Labels | Description |
351+
|--------|--------|-------------|
352+
| `cortex_nova_no_host_found_total` | `cr_slot`, `flavor_group`, `intent` | No-host-found results classified by CR coverage |
353+
| `cortex_nova_placement_total` | `flavor_group`, `intent`, `cr_slot` | Successful placements classified by CR slot outcome |
354+
355+
PAYG placements (flavor not in any configured group) are not counted by either metric.
356+
357+
**No-host-found classification (`cr_slot` label on `cortex_nova_no_host_found_total`):**
358+
359+
| Category | Meaning |
360+
|----------|---------|
361+
| `no_cr` | Project has no active CommittedResources for the flavor group |
362+
| `cr_exhausted` | CommittedResources exist but are fully occupied (used >= capacity) |
363+
| `slot_exhausted` | CR has remaining capacity but no input host has a usable reservation slot |
364+
| `slot_blocked` | A usable slot exists on an input host but scheduling constraints excluded all such hosts |
365+
366+
**Placement classification (`cr_slot` label on `cortex_nova_placement_total`):**
367+
368+
| Category | Meaning |
369+
|----------|---------|
370+
| `no_cr` | No active CR or CR capacity fully exhausted |
371+
| `slot_missed` | CR has remaining capacity but no candidate host has a slot with remaining memory > 0 |
372+
| `slot_used` | CR has remaining capacity and at least one candidate host has a usable slot |
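
The two classification tables above can be read as ordered decision chains. The sketch below encodes them that way. The `CRState` type and both function names are hypothetical, and the slot checks are collapsed into a single boolean supplied by the caller, which is a simplification of whatever the `crs` package actually computes.

```go
package main

import "fmt"

// CRState is a hypothetical summary of a project's committed resources
// for one flavor group.
type CRState struct {
	HasActiveCR bool
	Used        int64
	Capacity    int64
}

// exhausted reports whether the commitment is fully occupied (used >= capacity).
func (s CRState) exhausted() bool { return s.Used >= s.Capacity }

// classifyNoHostFound returns the cr_slot label for a no-host-found result.
// usableSlotOnInputHost says whether any input host had a usable slot.
func classifyNoHostFound(cr CRState, usableSlotOnInputHost bool) string {
	switch {
	case !cr.HasActiveCR:
		return "no_cr"
	case cr.exhausted():
		return "cr_exhausted"
	case !usableSlotOnInputHost:
		return "slot_exhausted"
	default:
		// A slot existed, but no host survived scheduling constraints.
		return "slot_blocked"
	}
}

// classifyPlacement returns the cr_slot label for a successful placement.
// usableSlotOnCandidate says whether any candidate host had a usable slot.
func classifyPlacement(cr CRState, usableSlotOnCandidate bool) string {
	switch {
	case !cr.HasActiveCR || cr.exhausted():
		return "no_cr"
	case !usableSlotOnCandidate:
		return "slot_missed"
	default:
		return "slot_used"
	}
}

func main() {
	cr := CRState{HasActiveCR: true, Used: 2, Capacity: 4}
	fmt.Println(classifyNoHostFound(cr, true))
	fmt.Println(classifyPlacement(cr, false))
}
```

Note that the placement path folds "no active CR" and "CR exhausted" into `no_cr`, while the no-host-found path keeps them as separate categories, as the tables specify.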
373+
374+
**Slot evaluator:** The `SlotEvaluator` is built once per scheduling request from Hypervisor and Reservation CRDs (no further K8s reads during classification). It computes per-host free memory and indexes ready CR reservation slots by host. `HasUsableSlot` checks whether a host has a slot that can accommodate the VM under the overfill model: `slot.remaining + host.base_free >= vmMemBytes`.
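
The overfill check can be illustrated with a small in-memory evaluator. This is a sketch under assumptions: the field names, the `Slot` type, and the method signature are hypothetical; only the inequality `slot.remaining + host.base_free >= vmMemBytes` is taken from the text.

```go
package main

import "fmt"

// Slot is a hypothetical view of a ready CR reservation slot on a host.
type Slot struct {
	Reservation    string
	RemainingBytes int64
}

// SlotEvaluator holds data that, per the description, is gathered once per
// scheduling request. Both maps are keyed by host name.
type SlotEvaluator struct {
	baseFree    map[string]int64 // free memory on the host outside reservations
	slotsByHost map[string][]Slot
}

// HasUsableSlot applies the overfill model: a slot is usable if its
// remaining memory plus the host's base free memory fits the VM.
func (e *SlotEvaluator) HasUsableSlot(host string, vmMemBytes int64) bool {
	for _, s := range e.slotsByHost[host] {
		if s.RemainingBytes+e.baseFree[host] >= vmMemBytes {
			return true
		}
	}
	return false
}

func main() {
	const gib = int64(1) << 30
	e := &SlotEvaluator{
		baseFree: map[string]int64{"host-1": 2 * gib},
		slotsByHost: map[string][]Slot{
			"host-1": {{Reservation: "res-a", RemainingBytes: 6 * gib}},
		},
	}
	fmt.Println(e.HasUsableSlot("host-1", 8*gib)) // 6 GiB slot + 2 GiB free
	fmt.Println(e.HasUsableSlot("host-1", 9*gib))
	fmt.Println(e.HasUsableSlot("host-2", 1*gib)) // no slots indexed for host-2
}
```

A host with no indexed slots never has a usable slot in this sketch, regardless of how much base free memory it has.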
375+
376+
**Recorder:** The `Recorder` is called after each placement decision. On success (`slot_used`), it writes the VM UUID into the best-fit reservation slot (`PickSlot` selects the slot that maximises coverage with tightest-fit tiebreaking). On no-host-found, it classifies the failure and increments the counter.
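
One possible reading of "maximises coverage with tightest-fit tiebreaking" is sketched below. It assumes that coverage means the portion of the VM's memory a slot can absorb, `min(slot.remaining, vmMemBytes)`, and that ties go to the slot with the least remaining memory. Both definitions, along with the `Slot` type and the `pickSlot` signature, are assumptions for illustration and may differ from the actual `PickSlot`.

```go
package main

import "fmt"

// Slot is a hypothetical view of a reservation slot on the chosen host.
type Slot struct {
	Reservation    string
	RemainingBytes int64
}

// coverage is the part of the VM's memory the slot can absorb (assumed definition).
func coverage(s Slot, vmMemBytes int64) int64 {
	if s.RemainingBytes < vmMemBytes {
		return s.RemainingBytes
	}
	return vmMemBytes
}

// pickSlot returns the slot with the highest coverage; among equal coverage
// it prefers the slot with the least remaining memory (tightest fit).
// The boolean is false when there are no slots to choose from.
func pickSlot(slots []Slot, vmMemBytes int64) (Slot, bool) {
	var best Slot
	found := false
	var bestCov int64
	for _, s := range slots {
		cov := coverage(s, vmMemBytes)
		better := !found ||
			cov > bestCov ||
			(cov == bestCov && s.RemainingBytes < best.RemainingBytes)
		if better {
			best, bestCov, found = s, cov, true
		}
	}
	return best, found
}

func main() {
	slots := []Slot{
		{Reservation: "res-small", RemainingBytes: 2},
		{Reservation: "res-exact", RemainingBytes: 8},
		{Reservation: "res-large", RemainingBytes: 16},
	}
	s, ok := pickSlot(slots, 8)
	fmt.Println(s.Reservation, ok)
}
```

With an 8-unit VM, `res-exact` and `res-large` both cover the VM fully, and the tiebreak selects `res-exact` because it leaves the larger slot free for bigger VMs.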
