Add tolerations and affinity support for inferenceExtension (EPP) pod#1436

Open
yardenmaymon-td wants to merge 1 commit into
llm-d:main from
yardenmaymon-td:ymaymon/epp-tolerations

@yardenmaymon-td
Contributor

What

Propagate optional scheduling fields for the EndpointPicker (EPP) pod into the inferencepool chart values (config/templates/jinja/12_gaie-values.yaml.j2):

  • inferenceExtension.tolerations — emitted only when a scenario sets a non-empty list, leaving the chart's tolerations: [] default untouched otherwise.
  • affinity — reuses the scenario-wide affinity convenience wrapper already consumed by decode/prefill/standalone: a flat affinity.nodeSelector map is expanded into a required nodeAffinity term, with podAffinity/podAntiAffinity passed through. Gated on affinity.enabled.

This lets the EPP pod be scheduled onto tainted nodes (e.g. GPU nodes) and steered toward specific nodes.
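
The gating and expansion described above can be sketched in plain Python. This is an illustration of the intended behaviour, not the Jinja template itself: the helper names `expand_affinity` and `build_epp_values` are hypothetical, and both the `matchExpressions`/`In` form of the node affinity term and the placement of the result under `inferenceExtension.affinity` are assumptions based on the standard Kubernetes affinity schema.

```python
def expand_affinity(affinity):
    """Expand the scenario-wide affinity wrapper into a Kubernetes-style affinity dict.

    Hypothetical helper: the real logic lives in the Jinja template.
    """
    # Gated on affinity.enabled: a disabled or missing wrapper emits nothing.
    if not affinity or not affinity.get("enabled"):
        return {}

    out = {}
    node_selector = affinity.get("nodeSelector") or {}
    if node_selector:
        # Assumption: each key/value pair becomes one "In" match expression
        # inside a single required nodeSelectorTerm.
        out["nodeAffinity"] = {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {"key": k, "operator": "In", "values": [str(v)]}
                            for k, v in sorted(node_selector.items())
                        ]
                    }
                ]
            }
        }

    # podAffinity / podAntiAffinity are passed through unchanged when set.
    for key in ("podAffinity", "podAntiAffinity"):
        if affinity.get(key):
            out[key] = affinity[key]
    return out


def build_epp_values(scenario):
    """Build the EPP scheduling overrides for the chart values (hypothetical)."""
    ext = {}

    # Emit tolerations only for a non-empty list, so the chart default
    # (tolerations: []) stays untouched otherwise.
    tolerations = (scenario.get("inferenceExtension") or {}).get("tolerations") or []
    if tolerations:
        ext["tolerations"] = tolerations

    affinity = expand_affinity(scenario.get("affinity"))
    if affinity:
        ext["affinity"] = affinity

    return {"inferenceExtension": ext} if ext else {}
```

With an empty scenario, or one whose `tolerations` list is empty and whose `affinity.enabled` is false, `build_epp_values` returns an empty dict, which mirrors the "leave the chart default untouched" behaviour.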

Why these fields

The change is scoped to what the inferencepool/epplib chart (v1.4.0 and v1.5.0) actually consumes in its EPP pod spec. The chart honours tolerations and affinity but has no nodeSelector field, so propagating a top-level inferenceExtension.nodeSelector would be a silent no-op.

Note: the podAffinity/podAntiAffinity passthrough uses indent(_, true) so that the first rendered line is indented as well. The indent(6) call without the first-line flag, as used by the modelservice template, would leave the first line at column 0; the problem is only hidden there because those keys default to {}.
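
The first-line behaviour in the note can be reproduced without the template. The sketch below is a pure-Python imitation of Jinja's `indent` filter (`jinja_like_indent` is a hypothetical name, not a Jinja API), written only to show why the `first` flag matters when a multi-line YAML fragment is placed on its own line under a key.

```python
def jinja_like_indent(text, width=4, first=False):
    """Imitate Jinja's indent filter: pad every non-empty line by `width`
    spaces, skipping the first line unless `first` is true."""
    pad = " " * width
    lines = text.splitlines() or [""]
    head, rest = lines[0], lines[1:]
    out = [pad + head if first else head]
    # Blank lines are left unpadded, as in Jinja's default (blank=False).
    out += [pad + line if line else line for line in rest]
    return "\n".join(out)


# A small pod-anti-affinity fragment, as it might be dumped to YAML.
fragment = (
    "requiredDuringSchedulingIgnoredDuringExecution:\n"
    "- topologyKey: kubernetes.io/hostname"
)

without_first = jinja_like_indent(fragment, 6)
with_first = jinja_like_indent(fragment, 6, first=True)

# Without the flag, the first line stays at column 0 while the second is
# indented, which breaks the nesting when the fragment starts a new line.
print(without_first)
print(with_first)
```

An empty mapping such as `{}` fits on a single line after its key, which is consistent with the note that the column-0 first line goes unnoticed while those keys keep their `{}` default.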

Testing

  • New tests/test_gaie_values_render.py — renders the template with and without each field, including a pod-anti-affinity case that locks the first-line indentation. Values are based on defaults.yaml so the test stays resilient to unrelated template churn.
  • Full suite: pytest tests/ → 279 passed, 4 skipped.
  • cicd/kind canary renders cleanly (29 files, 0 errors).

🤖 Generated with Claude Code

Propagate `inferenceExtension.tolerations` and reuse the scenario-wide
`affinity` wrapper (as decode/prefill/standalone do) into the gaie values
so the EPP pod can be scheduled onto tainted/specific nodes.

Scoped to fields the inferencepool chart (v1.4.0/v1.5.0) actually
consumes; it has no `nodeSelector`, so propagating one would be a no-op.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Yarden Maymon <yarden.maymon@twodelta.com>