
Add private OCI OKE cookbook for Llama Nemotron Nano 8B#117

Open
fede-kamel wants to merge 13 commits into NVIDIA-NeMo:main from fede-kamel:fk/oci-phoenix-private-nemotron

Conversation

@fede-kamel

@fede-kamel fede-kamel commented Mar 16, 2026

Summary

  • add a Nemotron-specific OCI cookbook for nvidia/Llama-3.1-Nemotron-Nano-8B-v1
  • document a validated private-only OKE deployment in us-phoenix-1 with no public control-plane endpoint, no public worker IPs, and no public inference endpoint
  • add a checked-in Terraform wrapper for the private Phoenix OKE infrastructure using Oracle's official OKE module plus OCI Bastion service
  • include a known-good vLLM values file for a single VM.GPU.A10.1 node
  • surface the new OCI cookbook in the root README and cookbook index

Validation

Validated against a live Phoenix OKE deployment of nvidia/Llama-3.1-Nemotron-Nano-8B-v1 using a private cluster plus OCI Bastion/tunnel access:

  • terraform plan
  • terraform apply
  • private OKE cluster active
  • OCI Bastion service active
  • CPU node pool active
  • GPU node pool active in PHX-AD-2
  • /health
  • /v1/models
  • chat completion
  • tool calling
  • streaming
  • async concurrent requests

Notes

  • this contribution is intentionally Nemotron-specific and OCI-specific
  • the deployment guidance is private-only and does not use public IPs for the Kubernetes API or inference endpoint
  • the OCI path is documented as a reproducible option comparable to common AWS GPU/Kubernetes deployment patterns, without claiming that AWS Terraform already exists in this repo
  • the Bastion resource is the OCI Bastion service, not a public bastion VM

@fede-kamel
Author

@chrisalexiuk-nvidia @anushapant @shashank3959 — Friendly follow-up! This PR adds an OCI OKE deployment cookbook for Llama Nemotron Nano 8B. Would love to get a review when you get a chance. Thanks!

@fede-kamel
Author

fede-kamel commented Mar 27, 2026

Hey team 👋 — just checking in on this one. Happy to address any feedback or make adjustments to scope if that helps move things along. Let me know if there's anything I can do on my end! @chrisalexiuk-nvidia @anushapant @shashank3959

@fede-kamel
Author

✅ Cross-validated with NeMo Agent Toolkit OCI integration

This OKE deployment is now serving as the live inference backend for NVIDIA/NeMo-Agent-Toolkit#1804, which adds first-class OCI Generative AI support to the Agent Toolkit.

The full Agent Toolkit OCI test suite — 11/11 tests passing — was validated against the nvidia/Llama-3.1-Nemotron-Nano-8B-v1 endpoint running on this exact private OKE infrastructure in us-phoenix-1. Both PRs together deliver a complete story: a reproducible OCI deployment (this PR) powering a production-ready LLM provider and LangChain integration (the Toolkit PR).

@fede-kamel
Author

@chrisalexiuk-nvidia @anushapant @shashank3959 — Quick update — we just used this exact OKE deployment to validate the full OCI integration for NeMo Agent Toolkit! 🚀

The nvidia/Llama-3.1-Nemotron-Nano-8B-v1 model running on the private Phoenix cluster documented in this cookbook passed 11/11 tests as the live inference backend for NVIDIA/NeMo-Agent-Toolkit#1804 — covering the OCI provider, LangChain wrapper, and an end-to-end agent workflow.

Really exciting to see both pieces come together: this PR provides the reproducible OCI deployment, and the Toolkit PR builds on top of it with a first-class integration. Two repos, one Nemotron story on OCI. Looking forward to your review!

@fede-kamel
Author

Hey @chrisalexiuk-nvidia @anushapant @shashank3959 @chtruong814 — wanted to check in and see if any of you have had a chance to take a look at this one.

This PR is part of the broader NVIDIA × OCI partnership effort — it provides a reproducible OKE deployment for Nemotron Nano 8B on Oracle Cloud, and has already been validated as the live backend for the OCI integration in NeMo Agent Toolkit. Would love to get it across the finish line when bandwidth allows.

No pressure — just want to make sure it's on your radar. Happy to address any feedback!

@chrisalexiuk-nvidia
Contributor

Hey - I wasn't getting pings on these; I've fixed that up.

This LGTM - would love to merge it in!

@fede-kamel
Author

Nice! Are you able to help move this one forward? Thank you.

@chrisalexiuk-nvidia
Contributor

Yessir! Will get the review going and let's go from there!

@chrisalexiuk-nvidia
Contributor

/claude review.

modelSpec:
  - name: "llama31-nemotron-nano-8b"
    repository: "vllm/vllm-openai"
    tag: "latest"

This cookbook is documented as a "validated" / "known-good" configuration, but using tag: "latest" means anyone deploying later will get a different vLLM version than what was actually validated. This could silently break the deployment (e.g., tool-calling behavior, memory profile, chat template paths).

Consider pinning to the specific vLLM tag that was validated.

Author


Good catch — fixed in d863095. Pinned to v0.14.0, which is the vLLM release that was current when this cookbook was validated on the live OKE cluster (Jan 22 2026).

Author


Correction on the version — confirmed by exec'ing into the running Nemotron pod on the live private OKE cluster: vllm.__version__ reports 0.17.1. Updated the pin to v0.17.1 in 0d8eeeb.

Author


Final update — upgraded the live OKE cluster to vLLM v0.19.0 (latest) and re-validated. Pinned to v0.19.0 in c984916. All endpoints (health, models, chat completion) confirmed working through both engine and router.

@fede-kamel fede-kamel force-pushed the fk/oci-phoenix-private-nemotron branch 2 times, most recently from d876121 to 7c59d19 on April 15, 2026 at 12:44
@fede-kamel
Author

fede-kamel commented Apr 15, 2026

@chrisalexiuk-nvidia — all commits are signed off and the PR is ready for review. Thanks!

@fede-kamel
Author

Updated in a3363ae:

  • Appendix A — jump-host VM alternative to OCI Bastion. OCI Bastion port-forwarding sessions close immediately after publickey auth when the client is OpenSSH 10.x (shipped with macOS 15+ and recent Linux distros). Appendix A replaces Step 4 and the bastion-session block inside Step 6 with a public-IP jump host and a plain ssh -L tunnel (see the first sketch after this list).
  • Step 7b — soft-reset each node after the filesystem expansion in Step 7. The in-place systemctl restart kubelet in Step 7 does not refresh Node.Capacity.ephemeral-storage (kubelet caches it at startup, and the restart is also killed by SIGKILL when containerd bounces mid-exec). Without Step 7b the engine pod is repeatedly evicted mid image-pull for disk pressure, despite df showing the expanded filesystem. Step 7b drains, soft-resets, waits for Ready, uncordons, and gates on a capacity-verification snippet before Step 8.
  • Step 5 AD picker — replaces the hardcoded data[1].name with a loop that queries gpu-a10-count availability per AD and selects the first AD with non-zero capacity (see the second sketch after this list).
  • Appendix A.1 — nc -z wait on port 22 before the first SSH, to avoid the cloud-init race between the VM reaching RUNNING and authorized_keys being installed.
  • Two new troubleshooting entries covering the symptoms above.
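For reference, a minimal sketch of the Appendix A flow; the jump-host IP, the private API endpoint address, and the opc login user are hypothetical placeholders, not values from the cookbook:

  # Hypothetical placeholders: substitute the jump host's public IP and the
  # cluster's private Kubernetes API endpoint from the OKE cluster details.
  JUMP_IP=203.0.113.10
  K8S_PRIVATE_ENDPOINT=10.0.0.10

  # A.1 guard: wait for sshd before the first connection, to avoid racing
  # cloud-init while it is still installing authorized_keys on the fresh VM.
  until nc -z "$JUMP_IP" 22; do sleep 5; done

  # Plain local port-forward through the jump host to the private API
  # endpoint, standing in for the OCI Bastion port-forwarding session;
  # the kubeconfig is then pointed at 127.0.0.1:6443 as in Step 6.
  ssh -N -L 6443:"$K8S_PRIVATE_ENDPOINT":6443 opc@"$JUMP_IP"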
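And a minimal sketch of the capacity-aware AD picker, assuming COMPARTMENT_ID is exported and jq is installed; the gpu-a10-count limit name comes from the bullet above, everything else is illustrative:

  # Iterate over the ADs and pick the first one with available
  # VM.GPU.A10 capacity, instead of hardcoding data[1].name.
  for AD in $(oci iam availability-domain list \
      --compartment-id "$COMPARTMENT_ID" | jq -r '.data[].name'); do
    AVAILABLE=$(oci limits resource-availability get \
        --compartment-id "$COMPARTMENT_ID" \
        --service-name compute \
        --limit-name gpu-a10-count \
        --availability-domain "$AD" | jq -r '.data.available')
    if [ "$AVAILABLE" != "null" ] && [ "${AVAILABLE:-0}" -gt 0 ]; then
      GPU_AD="$AD"
      echo "Selected $GPU_AD (gpu-a10-count available: $AVAILABLE)"
      break
    fi
  done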

Validated end-to-end against nvidia/Llama-3.1-Nemotron-Nano-8B-v1 on a fresh private OKE cluster: /health returns 200, /v1/models lists the served model, chat completion returns the expected sentinel, tool calling yields finish_reason: tool_calls with the correct function selected.

Ready for review.

@fede-kamel fede-kamel force-pushed the fk/oci-phoenix-private-nemotron branch from 88ee6ab to 8a22372 on April 23, 2026 at 18:46
@fede-kamel
Author

@chrisalexiuk-nvidia DCO is green and I just ran the updated cookbook end-to-end in one uninterrupted pass on a fresh private OKE cluster in us-phoenix-1. Step 12 results:

  • /health → {"status":"healthy"}
  • /v1/models → nvidia/Llama-3.1-Nemotron-Nano-8B-v1
  • chat completion → {"content":"NEMOTRON_OK","finish_reason":"stop"}
  • tool call → {"finish_reason":"tool_calls","tool_calls":[{"name":"get_utc_time"}]}
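For anyone reproducing the Step 12 checks above, a hedged sketch of the smoke tests, assuming the serving endpoint has already been port-forwarded to localhost:8000 (the local port and forwarding path are assumptions, not cookbook values):

  # Health and model listing against the vLLM OpenAI-compatible server.
  curl -s http://localhost:8000/health
  curl -s http://localhost:8000/v1/models | jq -r '.data[].id'

  # Chat completion; the prompt mirrors the NEMOTRON_OK sentinel check.
  curl -s http://localhost:8000/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "nvidia/Llama-3.1-Nemotron-Nano-8B-v1",
         "messages": [{"role": "user", "content": "Reply with NEMOTRON_OK"}]}' \
    | jq '.choices[0].message.content, .choices[0].finish_reason'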

Kubelet capacity after Step 7b: 198 GiB on the GPU node (VM.GPU.A10.1, 200 GB boot volume) and 89 GiB on the CPU node (100 GB boot volume). The expansion took effect; no pods were evicted for disk pressure during the Helm deploy.

Two signed-off commits on this branch:

  • a7a1fab — OpenSSH 10.x workaround (Appendix A: jump-host VM alternative to OCI Bastion) + Step 7b (VM soft-reset so kubelet re-reads Node.Capacity.ephemeral-storage) + AD capacity-aware picker in Step 5 + nc -z cloud-init race guard in A.1.
  • 8a22372 — Step 7b instance-OCID lookup via .spec.providerID. The earlier oci ce node-pool list query returned empty because the list API does not populate the nested nodes array; providerID gives a reliable one-liner (see the sketch below).
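A sketch of the per-node Step 7b sequence, combining the drain/soft-reset flow with the providerID lookup from 8a22372; the node name is a hypothetical placeholder:

  # NODE is the node's private-IP name as reported by kubectl get nodes.
  NODE=10.0.10.5

  kubectl drain "$NODE" --ignore-daemonsets --delete-emptydir-data

  # OKE sets .spec.providerID to oci://<instance-ocid>; strip the scheme.
  INSTANCE_ID=$(kubectl get node "$NODE" \
      -o jsonpath='{.spec.providerID}' | sed 's|^oci://||')

  # Soft-reset so kubelet re-reads ephemeral-storage capacity at startup.
  oci compute instance action --instance-id "$INSTANCE_ID" --action SOFTRESET

  # Wait for the node to rejoin Ready, then return it to service.
  kubectl wait node "$NODE" --for=condition=Ready --timeout=10m
  kubectl uncordon "$NODE"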

Ready for review.

@fede-kamel
Author

Hey @chrisalexiuk-nvidia @anushapant @shashank3959 @chtruong814 — quick update: the companion PR NVIDIA/NeMo-Agent-Toolkit#1804 ("Add OCI LangChain support for hosted Nemotron workflows") was merged on April 14 ✅

That PR's OCI integration was validated end-to-end against the exact private OKE deployment documented in this cookbook (11/11 tests pass on nvidia/Llama-3.1-Nemotron-Nano-8B-v1 running in us-phoenix-1), so the two pieces are tightly coupled: the Toolkit ships the provider, this PR ships the reproducible deployment that backs it.

On this PR's side everything is green — DCO passing, signed-off, end-to-end re-validated on a fresh cluster after the OpenSSH 10.x / Step 7b updates (8a22372). Would love to get this across the line so the OCI + Nemotron story can be announced as a single coordinated drop alongside the Toolkit release. What's needed from my end to land it?

Thanks!

fede-kamel added 12 commits May 6, 2026 11:56
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
Replace tag: "latest" with the specific vLLM release that was
running when the cookbook was validated on the private OKE cluster
(Jan 22 2026). This ensures reproducible deployments.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
Confirmed by exec'ing into the running Nemotron pod on the
validated private OKE cluster — vllm.__version__ reports 0.17.1.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
Upgraded the nemotron-vllm-phx cluster from v0.17.1 to v0.19.0
and validated chat completion, health, and model listing through
both the engine service and the router.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
Document that the cookbook was validated against vLLM v0.19.0.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
Rewrite the README as a complete step-by-step guide based on a
clean end-to-end deployment validated on 2026-04-15 against OKE
v1.31.10 in us-phoenix-1 with vllm-stack chart 0.1.10.

Key fixes from live deployment validation:

- Move chatTemplate from chart field to vllmConfig.extraArgs
  (chart 0.1.10 prepends /templates/ to the path, breaking
  absolute container paths)
- Document vllm-templates-pvc prerequisite (new in chart 0.1.10,
  engine pod stays Pending without it)
- Add boot volume filesystem expansion steps with iSCSI rescan
  for online-resized volumes
- Document CPU node boot volume must be >= 100 GB (router image
  is ~10.5 GB, default 47 GB causes eviction)
- Add CoreDNS and kube-dns-autoscaler GPU toleration patches
- Add StorageClass creation step
- Add Bastion tunnel setup with kubeconfig CA cert workaround
- Add troubleshooting for all issues encountered during
  deployment

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
Switch from the OKE Terraform module to manual OCI CLI commands
based on the Oracle vLLM Production Stack guide. The Terraform
module's NSG configuration blocks OCI Bastion port-forwarding
to the private control plane endpoint.

The CLI approach creates a dedicated bastion subnet with proper
security list rules, which provides reliable tunnel access to
the private OKE cluster.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
- Add explicit wait gates between steps (cluster ACTIVE before
  node pools, nodes ACTIVE before connecting)
- Use unique pod names per node for boot volume expansion
- Add NVIDIA device plugin note (pre-installed on enhanced OKE)
- Add reference to checked-in values file
- Add cleanup section with teardown commands
- Add note about Terraform directory as alternative reference
- Clarify AD selection for GPU capacity

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
- Add ASCII architecture diagram showing VCN layout
- Change default profile to DEFAULT (not user-specific)
- Add image ID verification step with fallback query
- Explain why bastion subnet is public
- Improve cleanup section with pool listing command
  and subnet deletion loop
- Add commented fallback for manual image selection

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
- Add Appendix A: jump-host VM alternative to OCI Bastion. OpenSSH 10.x
  clients (macOS 15+, recent Linux) cannot sustain OCI Bastion
  port-forwarding sessions; the server closes the connection immediately
  after publickey auth. Appendix A replaces Step 4 and the
  bastion-session block inside Step 6 with a public-IP jump-host in
  the bastion subnet and a plain SSH -L tunnel.
- Add Step 7b: soft-reset each node after filesystem expansion so
  kubelet re-reads ephemeral-storage capacity. The in-place
  systemctl restart kubelet in Step 7 never worked (SIGKILL from
  containerd bounce; kubelet also caches capacity at startup), causing
  the engine pod to be evicted mid image pull with disk-pressure despite
  df showing the expanded filesystem. Step 7b includes a capacity
  verification loop that gates on the expected Ki values.
- Replace Step 5's hardcoded data[1] AD picker with a capacity-aware
  loop that queries gpu-a10-count availability per AD and selects the
  first AD with non-zero capacity.
- Appendix A.1 waits for port 22 with nc -z before the first SSH, to
  avoid the cloud-init race between VM RUNNING and authorized_keys
  being installed.
- New troubleshooting entries for stale kubelet capacity and SSH tunnel
  closure on OpenSSH 10.x.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
The previous snippet used `oci ce node-pool list` with a query over the
nested `nodes[]` array, but the `list` API does not populate `nodes` —
it always returns `null` — so the lookup produced an empty
`INSTANCE_ID` and the subsequent `compute instance action` looped with
`Parameter --instance-id cannot be whitespace or empty string`.

OKE sets each node's `.spec.providerID` to `oci://<instance-ocid>`, so
`kubectl get node <ip> -o jsonpath='{.spec.providerID}'` gives a
reliable one-line lookup.

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
@fede-kamel fede-kamel force-pushed the fk/oci-phoenix-private-nemotron branch from 4798f74 to 69bbedd on May 6, 2026 at 16:11
@fede-kamel
Author

Hey @chrisalexiuk-nvidia — friendly nudge on this one. Last word from you was an LGTM on Apr 9; since then the cookbook's been re-validated end-to-end on a fresh OKE cluster, the branch is rebased clean on main (13 ahead, 0 behind), DCO is green, and the companion NeMo-Agent-Toolkit PR (#1804) merged on Apr 14.

Anything blocking on your side, or is it just an Approve click away? Happy to address anything else if needed.
