Add private OCI OKE cookbook for Llama Nemotron Nano 8B #117
fede-kamel wants to merge 13 commits into
Conversation
@chrisalexiuk-nvidia @anushapant @shashank3959 — Friendly follow-up! This PR adds an OCI OKE deployment cookbook for Llama Nemotron Nano 8B. Would love to get a review when you get a chance. Thanks!
Hey team 👋 — just checking in on this one. Happy to address any feedback or make adjustments to scope if that helps move things along. Let me know if there's anything I can do on my end! @chrisalexiuk-nvidia @anushapant @shashank3959
✅ Cross-validated with NeMo Agent Toolkit OCI integration

This OKE deployment is now serving as the live inference backend for NVIDIA/NeMo-Agent-Toolkit#1804, which adds first-class OCI Generative AI support to the Agent Toolkit. The full Agent Toolkit OCI test suite — 11/11 tests pass — was validated against the
@chrisalexiuk-nvidia @anushapant @shashank3959 — Quick update — we just used this exact OKE deployment to validate the full OCI integration for NeMo Agent Toolkit! 🚀

Really exciting to see both pieces come together: this PR provides the reproducible OCI deployment, and the Toolkit PR builds on top of it with a first-class integration. Two repos, one Nemotron story on OCI. Looking forward to your review!
Hey @chrisalexiuk-nvidia @anushapant @shashank3959 @chtruong814 — wanted to check in and see if any of you have had a chance to take a look at this one. This PR is part of the broader NVIDIA × OCI partnership effort — it provides a reproducible OKE deployment for Nemotron Nano 8B on Oracle Cloud, and has already been validated as the live backend for the OCI integration in NeMo Agent Toolkit. Would love to get it across the finish line when bandwidth allows. No pressure — just want to make sure it's on your radar. Happy to address any feedback!
Hey - I wasn't getting pings on these, I've fixed this up. This LGTM - would love to merge it in!
Nice! Are you able to help move this one forward? Thank you.
Yessir! Will get the review going and let's go from there!
/claude review. |
```yaml
modelSpec:
  - name: "llama31-nemotron-nano-8b"
    repository: "vllm/vllm-openai"
    tag: "latest"
```
This cookbook is documented as a "validated" / "known-good" configuration, but using tag: "latest" means anyone deploying later will get a different vLLM version than what was actually validated. This could silently break the deployment (e.g., tool-calling behavior, memory profile, chat template paths).
Consider pinning to the specific vLLM tag that was validated.
Good catch — fixed in d863095. Pinned to v0.14.0, which is the vLLM release that was current when this cookbook was validated on the live OKE cluster (Jan 22 2026).
Correction on the version — confirmed by exec'ing into the running Nemotron pod on the live private OKE cluster: vllm.__version__ reports 0.17.1. Updated the pin to v0.17.1 in 0d8eeeb.
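For reference, a minimal sketch of that check (the workload name is a placeholder for however the chart names the engine deployment):

```bash
# Print the vLLM version actually running inside the engine pod.
# "deploy/llama31-nemotron-nano-8b" is a placeholder, not the PR's real name.
kubectl exec deploy/llama31-nemotron-nano-8b -- \
  python3 -c 'import vllm; print(vllm.__version__)'
```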
Final update — upgraded the live OKE cluster to vLLM v0.19.0 (latest) and re-validated. Pinned to v0.19.0 in c984916. All endpoints (health, models, chat completion) confirmed working through both engine and router.
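A sketch of those endpoint checks for anyone reproducing the validation (the service name and local port are placeholders; the routes are vLLM's standard OpenAI-compatible paths):

```bash
# Forward the engine service locally, then hit the three validated endpoints.
# "vllm-engine-service" and port 80 are placeholders for the chart's actual service.
kubectl port-forward svc/vllm-engine-service 8000:80 &

curl -s localhost:8000/health        # liveness
curl -s localhost:8000/v1/models     # model listing
curl -s localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "llama31-nemotron-nano-8b", "messages": [{"role": "user", "content": "Say hello"}]}'
```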
fede-kamel force-pushed from d876121 to 7c59d19
@chrisalexiuk-nvidia — all commits are signed off and the PR is ready for review. Thanks!
Updated in

Validated end-to-end against

Ready for review.
fede-kamel force-pushed from 88ee6ab to 8a22372
@chrisalexiuk-nvidia DCO is green and I just ran the updated cookbook end-to-end in one uninterrupted pass on a fresh private OKE cluster in

Kubelet capacity after Step 7b: 198 GiB on the GPU node (VM.GPU.A10.1, 200 GB boot volume), 89 GiB on the CPU node (100 GB boot volume). The expansion took effect; no pod evictions for

Two signed-off commits on this branch:

Ready for review.
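For anyone reproducing that check, a minimal sketch (node names are whatever OKE assigned):

```bash
# Show what kubelet currently reports as ephemeral-storage capacity per node.
# These values only update after the Step 7b soft-reset, not after the resize alone.
kubectl get nodes -o custom-columns=NAME:.metadata.name,EPHEMERAL:.status.capacity.ephemeral-storage
```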
Hey @chrisalexiuk-nvidia @anushapant @shashank3959 @chtruong814 — quick update: the companion PR NVIDIA/NeMo-Agent-Toolkit#1804 ("Add OCI LangChain support for hosted Nemotron workflows") was merged on April 14 ✅ That PR's OCI integration was validated end-to-end against the exact private OKE deployment documented in this cookbook (11/11 tests pass on

On this PR's side everything is green — DCO passing, signed-off, end-to-end re-validated on a fresh cluster after the OpenSSH 10.x / Step 7b updates (

Thanks!
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
Replace tag: "latest" with the specific vLLM release that was running when the cookbook was validated on the private OKE cluster (Jan 22 2026). This ensures reproducible deployments.
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

Confirmed by exec'ing into the running Nemotron pod on the validated private OKE cluster — vllm.__version__ reports 0.17.1.
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

Upgraded the nemotron-vllm-phx cluster from v0.17.1 to v0.19.0 and validated chat completion, health, and model listing through both the engine service and the router.
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>

Document that the cookbook was validated against vLLM v0.19.0.
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
Rewrite the README as a complete step-by-step guide based on a clean end-to-end deployment validated on 2026-04-15 against OKE v1.31.10 in us-phoenix-1 with vllm-stack chart 0.1.10.

Key fixes from live deployment validation:
- Move chatTemplate from chart field to vllmConfig.extraArgs (chart 0.1.10 prepends /templates/ to the path, breaking absolute container paths)
- Document vllm-templates-pvc prerequisite (new in chart 0.1.10, engine pod stays Pending without it)
- Add boot volume filesystem expansion steps with iSCSI rescan for online-resized volumes
- Document CPU node boot volume must be >= 100 GB (router image is ~10.5 GB, default 47 GB causes eviction)
- Add CoreDNS and kube-dns-autoscaler GPU toleration patches
- Add StorageClass creation step
- Add Bastion tunnel setup with kubeconfig CA cert workaround
- Add troubleshooting for all issues encountered during deployment

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
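A minimal sketch of the chatTemplate workaround described above; only the vllmConfig.extraArgs placement comes from the commit, while the template filename, mount path, and values-file layout are assumptions:

```bash
# Overlay values file that passes the chat template straight to vLLM, so chart
# 0.1.10 cannot prepend /templates/ to an absolute container path.
# The .jinja path is a placeholder for wherever vllm-templates-pvc mounts it.
cat <<'EOF' > values-chat-template.yaml
modelSpec:
  - name: "llama31-nemotron-nano-8b"
    vllmConfig:
      extraArgs:
        - "--chat-template=/templates/llama31-nemotron.jinja"
EOF
```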
Switch from the OKE Terraform module to manual OCI CLI commands based on the Oracle vLLM Production Stack guide. The Terraform module's NSG configuration blocks OCI Bastion port-forwarding to the private control plane endpoint. The CLI approach creates a dedicated bastion subnet with proper security list rules, which provides reliable tunnel access to the private OKE cluster.
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
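As a rough illustration of the CLI approach (the OCIDs and CIDR below are placeholders, not values from this PR):

```bash
# Create the dedicated bastion subnet in the existing VCN; its security list
# must allow SSH ingress and egress to the private control-plane subnet.
oci network subnet create \
  --compartment-id "$COMPARTMENT_OCID" \
  --vcn-id "$VCN_OCID" \
  --cidr-block 10.0.2.0/24 \
  --display-name bastion-subnet \
  --security-list-ids "[\"$BASTION_SECLIST_OCID\"]"
```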
- Add explicit wait gates between steps (cluster ACTIVE before node pools, nodes ACTIVE before connecting)
- Use unique pod names per node for boot volume expansion
- Add NVIDIA device plugin note (pre-installed on enhanced OKE)
- Add reference to checked-in values file
- Add cleanup section with teardown commands
- Add note about Terraform directory as alternative reference
- Clarify AD selection for GPU capacity

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
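The first of those wait gates might look something like this sketch (the cluster OCID is a placeholder):

```bash
# Block until the cluster reports ACTIVE before creating node pools.
CLUSTER_ID="ocid1.cluster.oc1.phx.aaaa..."   # placeholder OCID
until [ "$(oci ce cluster get --cluster-id "$CLUSTER_ID" \
           --query 'data."lifecycle-state"' --raw-output)" = "ACTIVE" ]; do
  echo "waiting for cluster to become ACTIVE..."; sleep 30
done
```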
- Add ASCII architecture diagram showing VCN layout
- Change default profile to DEFAULT (not user-specific)
- Add image ID verification step with fallback query
- Explain why bastion subnet is public
- Improve cleanup section with pool listing command and subnet deletion loop
- Add commented fallback for manual image selection

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
- Add Appendix A: jump-host VM alternative to OCI Bastion. OpenSSH 10.x clients (macOS 15+, recent Linux) cannot sustain OCI Bastion port-forwarding sessions; the server closes the connection immediately after publickey auth. Appendix A replaces Step 4 and the bastion-session block inside Step 6 with a public-IP jump-host in the bastion subnet and a plain SSH -L tunnel.
- Add Step 7b: soft-reset each node after filesystem expansion so kubelet re-reads ephemeral-storage capacity. The in-place systemctl restart kubelet in Step 7 never worked (SIGKILL from containerd bounce; kubelet also caches capacity at startup), causing the engine pod to be evicted mid image pull with disk-pressure despite df showing the expanded filesystem. Step 7b includes a capacity verification loop that gates on the expected Ki values.
- Replace Step 5's hardcoded data[1] AD picker with a capacity-aware loop that queries gpu-a10-count availability per AD and selects the first AD with non-zero capacity.
- Appendix A.1 waits for port 22 with nc -z before the first SSH, to avoid the cloud-init race between VM RUNNING and authorized_keys being installed.
- New troubleshooting entries for stale kubelet capacity and SSH tunnel closure on OpenSSH 10.x.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
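A condensed sketch of the Appendix A flow (IPs and user are placeholders, not values from this PR):

```bash
# Jump-host tunnel replacing the OCI Bastion session (Appendix A).
JUMP_IP=203.0.113.10     # public IP of the jump-host VM (placeholder)
K8S_API=10.0.0.5         # private OKE control-plane endpoint (placeholder)

# A.1: wait for sshd so the first SSH does not race cloud-init's
# installation of authorized_keys.
until nc -z "$JUMP_IP" 22; do sleep 5; done

# Plain SSH local forward; point the kubeconfig at https://127.0.0.1:6443.
ssh -N -L 6443:"$K8S_API":6443 opc@"$JUMP_IP"
```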
The previous snippet used `oci ce node-pool list` with a query over the
nested `nodes[]` array, but the `list` API does not populate `nodes` —
it always returns `null` — so the lookup produced an empty
`INSTANCE_ID` and the subsequent `compute instance action` looped with
`Parameter --instance-id cannot be whitespace or empty string`.
OKE sets each node's `.spec.providerID` to `oci://<instance-ocid>`, so
`kubectl get node <ip> -o jsonpath='{.spec.providerID}'` gives a
reliable one-line lookup.
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
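In sketch form (the node name is a placeholder; OKE names nodes by their private IP):

```bash
# Resolve the instance OCID from the node's providerID, then soft-reset it.
NODE=10.0.10.5   # placeholder node name
INSTANCE_ID=$(kubectl get node "$NODE" -o jsonpath='{.spec.providerID}' | sed 's|^oci://||')
oci compute instance action --instance-id "$INSTANCE_ID" --action SOFTRESET
```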
fede-kamel force-pushed from 4798f74 to 69bbedd
Hey @chrisalexiuk-nvidia — friendly nudge on this one. Last word from you was an LGTM on Apr 9; since then the cookbook's been re-validated end-to-end on a fresh OKE cluster, the branch is rebased clean on

Anything blocking on your side, or is it just an Approve click away? Happy to address anything else if needed.
Summary

- Deploys nvidia/Llama-3.1-Nemotron-Nano-8B-v1 in us-phoenix-1 with no public control-plane endpoint, no public worker IPs, and no public inference endpoint
- vLLM values file for a single VM.GPU.A10.1 node

Validation

Validated against a live Phoenix OKE deployment of nvidia/Llama-3.1-Nemotron-Nano-8B-v1 using a private cluster plus OCI Bastion/tunnel access:

- terraform plan / terraform apply
- PHX-AD-2
- /health and /v1/models

Notes