converged-computing · vsoch · Jun 6, 2026 · Jun 6, 2026
diff --git a/Dockerfile b/Dockerfile
@@ -1,11 +1,7 @@
+# Mr. Fluence!
 # Multi-stage build for the fluence scheduler.
-#
 # The scheduler binary cgo-links flux-sched (Fluxion) for resource matching.
-# It does NOT depend on QRMI — quantum job submission is a separate workload
-# (github.com/converged-computing/qrmi-sampler). So this image needs only
-# flux-sched, no Rust/QRMI. Mirrors the .devcontainer build.
 
-# ---------- builder ----------
 FROM fluxrm/flux-core:noble AS builder
 
 USER root
@@ -37,7 +33,9 @@ COPY . .
 RUN CGO_ENABLED=1 \
     CGO_CFLAGS="-I/opt/flux-sched" \
     CGO_LDFLAGS="-L/opt/flux-sched/resource -L/opt/flux-sched/resource/libjobspec -L/opt/flux-sched/resource/reapi/bindings -lresource -ljobspec_conv -lreapi_cli -lflux-idset -lstdc++ -lczmq -ljansson -lhwloc -lboost_system -lflux-hostlist -lboost_graph -lyaml-cpp" \
-    go build -ldflags '-w' -o /bin/fluence ./cmd/fluence
+    go build -ldflags '-w' -o /bin/fluence ./cmd/fluence && \
+    CGO_ENABLED=0 go build -ldflags '-w' -o /bin/fluence-deviceplugin ./cmd/deviceplugin && \
+    CGO_ENABLED=0 go build -ldflags '-w' -o /bin/fluence-webhook ./cmd/webhook
 
 FROM fluxrm/flux-core:noble AS runtime
 
@@ -55,4 +53,6 @@ COPY --from=builder /usr/lib/libjobspec_conv.so* /usr/lib/
 RUN ldconfig
 
 COPY --from=builder /bin/fluence /bin/fluence
-ENTRYPOINT ["/bin/fluence"]
+COPY --from=builder /bin/fluence-deviceplugin /bin/fluence-deviceplugin
+COPY --from=builder /bin/fluence-webhook /bin/fluence-webhook
+ENTRYPOINT ["/bin/fluence"]
diff --git a/Makefile b/Makefile
@@ -18,13 +18,16 @@ CGO_LDFLAGS = -L$(FLUX_SCHED_ROOT)/resource \
               -lflux-hostlist -lboost_graph -lyaml-cpp
 
 .PHONY: build
-build: ## Build the fluence scheduler binary (needs flux-sched)
+build: ## Build all binaries (scheduler needs flux-sched; helpers are pure Go)
 	CGO_ENABLED=1 CGO_CFLAGS="$(CGO_CFLAGS)" CGO_LDFLAGS="$(CGO_LDFLAGS)" \
 	  go build -o bin/fluence ./cmd/fluence
+	CGO_ENABLED=0 go build -o bin/fluence-deviceplugin ./cmd/deviceplugin
+	CGO_ENABLED=0 go build -o bin/fluence-webhook ./cmd/webhook
 
 .PHONY: test
 test: ## Pure-Go unit tests (no flux, no k8s scheduler libs, no cluster)
-	go test ./pkg/jgf/... ./pkg/cluster/... ./pkg/jobspec/... ./pkg/placement/... ./pkg/quantum/...
+	go test ./pkg/jgf/... ./pkg/cluster/... ./pkg/jobspec/... ./pkg/placement/... \
+	  ./pkg/quantum/... ./pkg/webhook/... ./pkg/deviceplugin/...
 
 .PHONY: test-graph
 test-graph: ## Matcher tests (needs flux-sched)

diff --git a/README.md b/README.md
@@ -4,119 +4,222 @@
 
 A Kubernetes scheduler plugin that places **pod groups** (and individual pods)
 by matching them against a [Fluxion](https://github.com/flux-framework/flux-sched)
-(flux-sched) resource graph built from the live cluster. 
+(flux-sched) resource graph built from the live cluster.
 
 This is an update from [flux-k8s](https://github.com/flux-framework/flux-k8s)
 that uses the native PodGroup and optionally allows for scheduling
-against **quantum resources** modeled in the same graph. I am also improving
-the design by not requiring a sidecar for fluence - the plugin is built as one
-container. 
+against arbitrary resources such as **quantum resources** modeled in the same graph.
+I am also improving the design by not requiring a sidecar for fluence, and not
+requiring the `kubernetes-sigs/scheduler-plugins` dependency. We use the native gang
+scheduling provided by Kubernetes.
 
 For quantum resource modeling, we start from the prototype proven out in
-[fluxion-quantum](https://github.com/converged-computing/fluxion-quantum). 
-This design is an improvement upon the initial fluence because we drop
-the `kubernetes-sigs/scheduler-plugins` dependency and use Kubernetes
-**native gang scheduling** (the `PodGroup` API, `scheduling.k8s.io/v1alpha2`,
-alpha in 1.35/1.36).
+[fluxion-quantum](https://github.com/converged-computing/fluxion-quantum).
 
 ## How it works
 
+### Gang Scheduling
+
 Gang semantics (all-or-nothing) come from the native `PodGroup` API. Fluence is
 responsible only for **placement**:
 
 1. **Discover** — on startup fluence lists cluster nodes and turns their
    cpu/memory/gpu capacity into a Fluxion JGF resource graph
-   (`pkg/cluster` + `pkg/jgf`). Quantum backends from a config file are injected
-   as `qpu` vertices under a `qgateway` (`AddQuantum`).
+   (`pkg/cluster` + `pkg/jgf`). If a resources config is provided (via
+   `FLUENCE_RESOURCES`), its entries (e.g. quantum backends) are injected as
+   `qpu`/`qubit` vertices. With no config the graph is classical-only.
 2. **Match** — when the first pod of a group hits `PreFilter`, fluence builds a
-   Fluxion jobspec for the whole gang (`pkg/fluence.JobspecForGroup`), asks the
+   Fluxion jobspec for the whole gang (`pkg/placement.JobspecForGroup`), asks the
    matcher to allocate (`pkg/graph.FluxionGraph.MatchAllocateSpec`), and parses
-   the allocation into node names (`PlacementFromAllocation`).
-3. **Place** — `Filter` then permits each pod only on its allocated node.
-
-For a **quantum** pod (one that requests `quantum.flux-framework.org/qpu`), the
-match allocates a `qpu` vertex instead of cores; the allocated backend name
-(e.g. `ibm_fez`) is what the workload submits to via
-[qrmi-go](https://github.com/converged-computing/qrmi-go) (job mode on the IBM
-open plan — see fluxion-quantum for that story).
-
-```
-nodes (kubectl get nodes) ─┐
-                           ├─► JGF resource graph ─► Fluxion match ─► node + backend placement
-quantum-backends.yaml ─────┘
+   the allocation into node and backend names (`PlacementFromAllocation`).
+3. **Place** — `Filter` permits each pod only on its allocated node. (A
+   quantum-only pod allocates a `qpu` but no node — the backend is a remote API
+   any node can reach — so fluence imposes no node constraint in that case.)
+4. **Hand off** — for a quantum pod, `PreBind` records the allocated backend on
+   the pod as the `fluence.flux-framework.org/backend` annotation. The mutating
+   webhook (installed with the base) injects a downward-API env so the container
+   reads it as `QRMI_BACKEND` with no boilerplate in the manifest.
+
+### Design Choices
+
+While quantum resources are the first target, fluence should be able to support
+any arbitrary resource in the graph. I decided that a pod can request a graph resource generically,
+e.g., `fluxion.flux-framework.org/<type>` (like `.../qpu: "1"`), and that becomes a jobspec count
+of `<type>`. To support this, we deploy a **device plugin** that advertises these virtual
+types on every node. We need it because the in-tree `NodeResourcesFit` plugin checks pod
+requests against node allocatable, and without the device plugin no node would pass that check.
+Note that the device plugin advertises every resource type it sees added to the Fluxion
+resource graph, but is not actually involved in scheduling. Fluxion does the real matching.
+
+```console
+nodes (kubectl get nodes) ──┐
+                            ├─► JGF resource graph ─► Fluxion match ─► node + backend placement
+fluence-resources ConfigMap ┘
 ```
 
+I am also choosing to keep credentials and qrmi interactions at the application level.
+I am not comfortable with the design of an operator holding any kind of credential or being
+responsible for managing calls with qrmi in a multi-tenant environment. Finally, since
+there are (and will continue to be) a lot of environment variables that I do not want
+to ask the user to define, we have a webhook to handle this. The webhook injects an environment
+variable that reads a pod annotation, and a `PreBind` call sets that annotation after matching.
+
 ## Build
 
-The scheduler binary links flux-sched (the matcher) and, for quantum, QRMI:
+The scheduler binary links flux-sched (the matcher). It does **not** link QRMI —
+quantum job submission lives in a separate workload container
+([qrmi-sampler](https://github.com/converged-computing/qrmi-sampler)), not here.
 
 ```bash
-# If you want to debug inside the .devcontainer, use this one
-make build      # needs flux-sched at /opt/flux-sched and QRMI at /usr/local
+# Inside the .devcontainer (flux-sched at /opt/flux-sched):
+# builds bin/fluence (cgo+flux) + bin/fluence-deviceplugin + bin/fluence-webhook
+make build
+make test
 
-# If you want to test outside (and build the docker image, this one)
+# Or build the container image (all three binaries):
 make image
 ```
 
-Pure-Go pieces (graph builder, discovery, jobspec, placement) need neither and
-are covered by:
+## Deploy
+
+Create a development cluster on a Kubernetes release that supports native gang
+scheduling, with the feature gates enabled:
 
 ```bash
-make test
+kind create cluster --image kindest/node:v1.36.1 --config deploy/kind-config.yaml
 ```
 
-## Deploy
+(See [installing kind](https://kind.sigs.k8s.io/docs/user/quick-start#installing-from-release-binaries).)
+The kind config turns on the `GangScheduling` and `GenericWorkload` feature gates
+and the `scheduling.k8s.io/v1alpha2` API group on the apiserver and scheduler. In
+the future these will likely be enabled by default.
 
-Here is how I am creating a development cluster with a release of Kubernetes that will support
-what we need:
+Load the image (built above) into the cluster:
 
 ```bash
-kind create cluster --image kindest/node:v1.36.1 --config deploy/kind-config.yaml
+kind load docker-image ghcr.io/converged-computing/fluence:latest
 ```
 
-And if you [need to install kind](https://kind.sigs.k8s.io/docs/user/quick-start#installing-from-release-binaries).
+### 1. Gang Scheduling
 
+Install the **base** scheduler (this is all you need for classical scheduling —
+no device plugin, no quantum):
 
 ```bash
-# This creates the quantum backends yaml graph
-kubectl create configmap fluence-quantum-backends --from-file=quantum-backends.yaml=config/quantum-backends.yaml -n kube-system
+kubectl apply -f deploy/fluence.yaml
+```
 
-# load docker image
-kind load docker-image ghcr.io/converged-computing/fluence
+This installs the scheduler, its RBAC, and the mutating webhook. Pods opt in with
+`schedulerName: fluence`; a multi-pod gang adds a `scheduling.k8s.io/pod-group`
+label (a single pod is treated as a group of one and needs no label).
 
-kubectl apply -f deploy/fluence.yaml          # RBAC + scheduler in kube-system
-kubectl apply -f examples/podgroup.yaml       # a gang scheduled by fluence
-```
+## Testing
+
+### 1. Classical (a pod group)
 
-This works by enabling the native gang feature on the cluster (kube-scheduler / API server), meaning
-the `GangScheduling` and `GenericWorkload` feature gates and the `scheduling.k8s.io/v1alpha2` API group.
-In the future these will likely be enabled by default.
+The base install is enough. Schedule a gang:
 
-Pods opt in with `schedulerName: fluence` and a `scheduling.k8s.io/pod-group` label; group size can be set explicitly with
-`fluence.flux-framework.org/group-size`.
+```bash
+kubectl apply -f examples/podgroup.yaml
+kubectl get pods -o wide
+kubectl get events --field-selector reason=Scheduled
+kubectl get podgroups.scheduling.k8s.io
+```
+```console
+NAME       POLICY   WORKLOAD   STATUS      AGE
+training   Gang     <none>     Scheduled   15s
+```
 
-Note that when you are developing / debugging a group deletion can hang because of finalizers. I do:
+Then clean up:
 
 ```bash
 kubectl patch podgroup training -n default --type=merge -p '{"metadata":{"finalizers":null}}'
+kubectl delete -f examples/podgroup.yaml
 ```
 
-## Quantum
+### 2. Quantum
 
-We can bing fluence up with quantum resources by pointing `FLUENCE_QUANTUM_CONFIG` at a backends file (see `config/quantum-backends.yaml`). 
-Those backends become schedulable `qpu` vertices; a pod requesting `quantum.flux-framework.org/qpu` will be matched to one, and the allocated backend is handed to the workload.
+Quantum needs the resources add-on, which supplies the `fluence-resources`
+ConfigMap (the single source of truth for which backends exist) **and** the
+device plugin that advertises them:
+
+```bash
+kubectl apply -f deploy/fluence-resources.yaml
+# The scheduler reads its resources config at startup, so restart it to pick up
+# the quantum vertices:
+kubectl rollout restart deployment/fluence -n kube-system
+```
+
+Confirm the device plugin advertised the resources on the nodes:
+
+```bash
+kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable}{"\n"}{end}' \
+  | grep fluxion.flux-framework.org
+```
+```console
+kind-control-plane	{"cpu":"16","ephemeral-storage":"982292956Ki","fluxion.flux-framework.org/qpu":"1k","fluxion.flux-framework.org/qubit":"1k","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"61400748Ki","pods":"110"}
+kind-worker	{"cpu":"16","ephemeral-storage":"982292956Ki","fluxion.flux-framework.org/qpu":"1k","fluxion.flux-framework.org/qubit":"1k","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"61400748Ki","pods":"110"}
+kind-worker2	{"cpu":"16","ephemeral-storage":"982292956Ki","fluxion.flux-framework.org/qpu":"1k","fluxion.flux-framework.org/qubit":"1k","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"61400748Ki","pods":"110"}
+```
+
+Create the IBM credentials the **workload** uses to submit (in the namespace
+where the workload runs — the scheduler itself never needs them):
+
+```bash
+# Install the ibmcloud CLI if you don't have it yet
+curl -fsSL https://clis.cloud.ibm.com/install/linux | sudo sh
+ibmcloud login --apikey <key>
+# When prompted for a region, 12 selects us-east
+```
+```bash
+export IBM_CLOUD_TOKEN=<key>
+export IBM_CLOUD_CRN=$(ibmcloud resource service-instances --service-name quantum-computing --output json | jq -r '.[].crn')
+```
+
+```bash
+kubectl create secret generic ibm-quantum -n default --from-literal=token="$IBM_CLOUD_TOKEN" --from-literal=crn="$IBM_CLOUD_CRN"
+```
+
+Run a single quantum pod. It just requests `fluxion.flux-framework.org/qpu` — no
+group, and no hard-coded backend (the webhook + PreBind supply `QRMI_BACKEND`):
+
+```bash
+kubectl apply -f examples/quantum-pod.yaml
+kubectl get pod sampler -o wide
+
+# fluence's chosen backend, recorded as a pod annotation (the container sees it as QRMI_BACKEND):
+kubectl get pod sampler -o jsonpath='{.metadata.annotations.fluence\.flux-framework\.org/backend}{"\n"}'
+kubectl logs sampler
+```
+```console
+kubectl logs sampler -f
+2026/06/06 19:04:38 submitting sampler job to ibm_marrakesh
+{"results": [{"data": {"c": {"samples": ["0x0", "0x1", "0x0", "0x0", "0x1", "0x1", "0x1", "0x1", "0x1", "0x0", "0x0", "0x0", "0x0", "0x1", "0x1", "0x1", "0x1", "0x1", "0x1", "0x1", "0x0", "0x1", "0x0", "0x0", "0x0", "0x1", "0x0", "0x1", "0x0", "0x1", "0x1", "0x0", "0x0", "0x0", "0x0", "0x1", "0x1", "0x1", "0x1", "0x1", "0x1", "0x1", "0x0", "0x0", "0x0", "0x1", "0x0", "0x1", "0x0", "0x0", "0x1", "0x1", "0x1", "0x0", "0x0", "0x0", "0x1", "0x1", "0x1", "0x0", "0x1", "0x1", "0x0", "0x1", "0x0", "0x1", "0x1", "0x1", "0x1", "0x0", "0x0", "0x1", "0x1", "0x1", "0x1", "0x1", "0x1", "0x1", "0x0", "0x0", "0x1", "0x0", "0x1", "0x1", "0x0", "0x1", "0x0", "0x1", "0x1", "0x1", "0x0", "0x0", "0x0", "0x1", "0x1", "0x1", "0x1", "0x0", "0x0", "0x1", "0x0", "0x0", "0x0", "0x0", "0x0", "0x0", "0x1", "0x1", "0x0", "0x1", "0x0", "0x0", "0x1", "0x1", "0x1", "0x1", "0x1", "0x0", "0x1", "0x0", "0x0", "0x0", "0x1", "0x1", "0x1", "0x0", "0x1", "0x1", "0x0", "0x0", "0x1", "0x0", "0x1", "0x0", "0x0", "0x1", "0x0", "0x0", "0x1", "0x1", "0x1", "0x0", "0x1", "0x1", "0x0", "0x0", "0x0", "0x1", "0x1", "0x1", "0x0", "0x1", "0x1", "0x0", "0x0", "0x1", "0x0", "0x1", "0x0", "0x0", "0x0", "0x0", "0x1", "0x1", "0x1", "0x1", "0x1", "0x1", "0x1", "0x0", "0x1", "0x1", "0x1", "0x0", "0x1", "0x1", "0x0", "0x0", "0x0", "0x0", "0x1", "0x0", "0x0", "0x0", "0x0", "0x1", "0x0", "0x0", "0x1", "0x1", "0x1", "0x0", "0x1", "0x0", "0x1", "0x0", "0x1", "0x1", "0x1", "0x1", "0x0", "0x0", "0x1", "0x1", "0x0", "0x1", "0x0", "0x0", "0x0", "0x1", "0x1", "0x1", "0x0", "0x0", "0x0", "0x1", "0x0", "0x0", "0x0", "0x1", "0x0", "0x1", "0x0", "0x0", "0x0", "0x1", "0x1", "0x0", "0x1", "0x0", "0x0", "0x0", "0x1", "0x0", "0x0", "0x1", "0x1", "0x0", "0x0", "0x0", "0x0", "0x0", "0x1", "0x1", "0x1", "0x0", "0x1", "0x1", "0x1", "0x1", "0x1", "0x1", "0x0", "0x0", "0x0", "0x0"], "num_bits": 1}}, "metadata": {"circuit_metadata": {}}}], "metadata": {"execution": {"execution_spans": [[{"date": "2026-06-06T19:04:43.221657"}, {"date": 
"2026-06-06T19:04:44.372421"}, {"0": [[256], [0, 1], [0, 256]]}]]}, "version": 2}}
+2026/06/06 19:04:50 done: 2070 bytes from ibm_marrakesh
+```
+Boum!
+
+### A note on deletion
+
+When developing/debugging, a PodGroup (or its pods) can hang on delete because of
+finalizers (the workload controller may not be running). Clear them with:
+
+```bash
+kubectl patch podgroup training -n default --type=merge -p '{"metadata":{"finalizers":null}}'
+```
 
-**under development** I am still thinking about how to make this request. -V
+Importantly, submission is **not** done by the scheduler — the workload container holds the
+user's credentials and submits via qrmi-go (job mode on the IBM open plan; see
+fluxion-quantum for that story). Fluence only schedules and hands off the backend.
+When we actually have control of local quantum devices this will be different.
 
 ## License
 
 HPCIC DevTools is distributed under the terms of the MIT license.
 All new contributions must be made under this license.
 
-See [LICENSE](https://github.com/converged-computing/cloud-select/blob/main/LICENSE),
-[COPYRIGHT](https://github.com/converged-computing/cloud-select/blob/main/COPYRIGHT), and
-[NOTICE](https://github.com/converged-computing/cloud-select/blob/main/NOTICE) for details.
+See [LICENSE](LICENSE), [COPYRIGHT](COPYRIGHT), and [NOTICE](NOTICE) for details.
 
-SPDX-License-Identifier: (MIT)
+SPDX-License-Identifier: MIT
 
-LLNL-CODE- 842614
+LLNL-CODE-842614