-t kustomize deploys an upstream llm-d guide by running the commands
parsed from that guide's README.md, instead of rendering the
modelservice/standalone templates. It is implemented in
llmdbenchmark/standup/steps/step_06_kustomize_deploy.py and
llmdbenchmark/kustomize/.
Under kustomize the deployment is defined entirely by the guide's own
manifests. The normal scenario/CLI/experiment merge chain does not reach
it. Specifically ignored: -m/--models, model.*, decode/prefill replicas,
parallelism, resources, gateway, and all other scenario/CLI tuning. DoE
experiment setup sweeps also do not alter a kustomize deploy (only
run/workload treatments still apply). model.name is only recorded in the
standup-parameters ConfigMap as metadata.
The only way to change a kustomize deployment is the kustomize.* keys
below.
llmdbenchmark --spec guides/optimized-baseline standup -t kustomize -p NS-t kustomize sets kustomize.enabled: true (and disables the other
methods). Equivalently, set kustomize.enabled: true in the scenario. If
kustomize.enabled is false the step is a no-op.
| Key | Default | Effect |
|---|---|---|
enabled |
false |
Must be true (or -t kustomize) for the step to deploy. |
guideName |
— (required) | Guide dir under guides/<name> in the llm-d repo. |
repoPath |
"" |
Local llm-d clone to use. Falls back to --llmd-repo-path; empty ⇒ clone https://github.com/llm-d/llm-d.git into workspace/llm-d. |
repoRef |
"main" |
Git ref used when cloning. |
acceleratorBackend |
"gpu/vllm" |
Swaps modelserver/gpu/vllm → modelserver/<backend> in guide paths. |
gaieVersion |
README GAIE_VERSION or v1.5.0 |
GAIE version substituted into README commands. |
monitoring |
false |
Also applies guides/recipes/modelserver/components/monitoring. |
deployTimeout |
900 |
Pod-readiness wait (seconds). |
patches |
[] |
Inline strategic-merge patches (modelserver). See below. |
overlayPath |
"" |
Directory overlay (modelserver). See below. |
extraHelmValues |
[] |
-f <file> appended to the router/GAIE helm command. |
extraHelmSets |
{} |
--set k=v appended to the router/GAIE helm command. |
guideVariableOverrides |
{} |
Override/fill the guide README's ${VAR} values (cannot add new variables). |
- Modelserver (the workload pods):
patches,overlayPath. - Router / GAIE (the helm release the guide README installs):
extraHelmValues,extraHelmSets. - README placeholders:
guideVariableOverrides.
patches/overlayPath only take effect when patches is non-empty or
overlayPath is an existing directory; the generated wrapper
kustomization.yaml (base = the guide's modelserver dir) is written to
workspace/setup/kustomize-overlay/ and applied with kubectl apply -k.
kustomize:
enabled: true
guideName: "optimized-baseline"
# patches → modelserver. Strategic-merge, matched by
# apiVersion + kind + metadata.name against the guide's base.
patches:
- patch: | # override the guide's replica count
apiVersion: apps/v1
kind: Deployment
metadata: { name: decode }
spec: { replicas: 4 }
- patch: | # gated models: inject HF token env
apiVersion: apps/v1
kind: Deployment
metadata: { name: decode }
spec:
template:
spec:
containers:
- name: modelserver
env:
- name: HF_TOKEN
valueFrom:
secretKeyRef:
name: llm-d-hf-token
key: HF_TOKEN
# overlayPath → modelserver. If the dir contains kustomization.yaml it
# is added as a kustomize component; otherwise every *.yaml in it is
# applied as a patch file. Combinable with `patches`.
overlayPath: "/abs/path/my-overlay"
# extraHelmValues / extraHelmSets → router/GAIE helm release ONLY.
extraHelmValues: ["/abs/path/gaie-values.yaml"]
extraHelmSets:
inferenceExtension.replicas: "2"
# guideVariableOverrides → override/fill ${VAR} tokens the guide
# README already uses (cannot introduce new variables).
guideVariableOverrides:
SOME_README_VAR: "value"- Valid patch targets (
metadata.name, e.g.decode,prefill) and validextraHelmValues/extraHelmSetskeys depend on the upstream guide — readguides/<guideName>/modelserver/<backend>/and the helm step inguides/<guideName>/README.mdin the llm-d repo. They are not arbitrary. guideVariableOverridesonly affects${VAR}tokens that literally appear in that guide's README; it cannot introduce new variables. Precedence (llmdbenchmark/kustomize/variable_resolver.py): README-declared defaults <guideVariableOverrides< forcedGUIDE_NAME/NAMESPACE/GAIE_VERSION(those three cannot be overridden).- The deployed model is whatever the guide pins; change it via a
patchesentry against the resource that carries it, not via-m/model.name. - Multi-model (multi-stack) is NOT supported with kustomize. The
deployment is keyed entirely on
guideName(resources + the{guideName}-eppendpoint) with no per-stack/per-model uniquification (unlike modelservice's per-stack identity resolution), so multiple stacks would collide on the same guide resources. Use themodelservicemethod (e.g. themulti-model-wvascenario) for multi-model; keep kustomize scenarios single-stack.