1 change: 1 addition & 0 deletions README.md
@@ -206,6 +206,7 @@ Practical deployment and model usage guides for Nemotron models.
|-------|----------|--------------|-----------|
| [**Nemotron 3 Super 120B A12B**](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16) | Production deployments needing strong reasoning | 1M context, in NVFP4 single B200, RAG & tool calling | [Cookbooks](./usage-cookbook/Nemotron-3-Super) |
| [**Nemotron 3 Nano 30B A3B**](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16) | Resource-constrained environments | 1M context, sparse MoE hybrid Mamba-2, controllable reasoning | [Cookbooks](./usage-cookbook/Nemotron-3-Nano) |
| [**Llama-3.1-Nemotron-Nano-8B-v1**](https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-8B-v1) | Small-footprint OCI deployments | Validated on private OKE in Phoenix with `vLLM`, OCI Bastion service, tool calling, and OpenAI-compatible `/v1` inference; provides a reproducible OCI path comparable to common AWS GPU/Kubernetes deployment patterns | [Cookbooks](./usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1) |
| [**NVIDIA-Nemotron-Nano-12B-v2-VL**](https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL) | Document intelligence and video understanding | 12B VLM, video reasoning, Efficient Video Sampling | [Cookbooks](./usage-cookbook/Nemotron-Nano2-VL/) |
| [**Llama-3.1-Nemotron-Safety-Guard-8B-v3**](https://huggingface.co/nvidia/Llama-3.1-Nemotron-Safety-Guard-8B-v3) | Multilingual content moderation | 9 languages, 23 safety categories | [Cookbooks](./usage-cookbook/Llama-3.1-Nemotron-Safety-Guard-V3/) |
| **Nemotron-Parse** | Document parsing for RAG and AI agents | Table extraction, semantic segmentation | [Cookbooks](./usage-cookbook/Nemotron-Parse-v1.1/) |
874 changes: 874 additions & 0 deletions usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/README.md

Large diffs are not rendered by default.

@@ -0,0 +1,5 @@
.terraform/
terraform.tfvars
terraform.tfstate
terraform.tfstate.*
tfplan

97 changes: 97 additions & 0 deletions usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/README.md
@@ -0,0 +1,97 @@
# Terraform: Private OCI OKE for Llama-3.1-Nemotron-Nano-8B-v1

This Terraform example provisions the **private-only** OCI infrastructure for
the validated Phoenix deployment described in the parent cookbook.

It gives Nemotron users a reproducible way to serve NVIDIA models on OCI,
built from three pieces: a private OKE cluster, managed Bastion access, and
infrastructure as code for the GPU-backed node pool.

It creates:

- a VCN
- a **private** OKE cluster
- a private CPU node pool
- a private GPU node pool targeting `VM.GPU.A10.1`
- an **OCI Bastion service** resource for private access

It does **not** create:

- a public Kubernetes API endpoint
- public worker-node IPs
- a public bastion host
- a public inference endpoint

## Bastion note

This sample provisions the **OCI Bastion service** so that private-cluster
access is reproducible from Terraform.

That is intentionally different from creating a public bastion VM:

- no public bastion compute instance is created
- no worker node receives a public IP
- the Kubernetes API remains private

If your environment already manages private-cluster access through a separate
operator workflow, you can remove the `oci_bastion_bastion` resource and keep
the rest of the sample unchanged.
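
If deleting the resource outright is too coarse for your workflow, one option is to gate it behind a flag instead. The following is a minimal sketch, not part of this sample: `create_bastion_service` is a hypothetical variable, while the resource arguments are copied from `main.tf`.

```hcl
# Hypothetical toggle; not defined in this sample's variables.tf.
variable "create_bastion_service" {
  description = "Create the OCI Bastion service for private-cluster access."
  type        = bool
  default     = true
}

resource "oci_bastion_bastion" "oci_bastion" {
  # A count of 0 skips the resource when access is managed elsewhere.
  count = var.create_bastion_service ? 1 : 0

  compartment_id               = var.compartment_ocid
  bastion_type                 = "STANDARD"
  target_subnet_id             = module.oke.worker_subnet_id
  client_cidr_block_allow_list = var.bastion_client_cidrs
  max_session_ttl_in_seconds   = 10800
  name                         = "${var.cluster_name}-bastion"
  freeform_tags                = local.common_tags
}
```

With `count` in place, any expression that reads the bastion has to tolerate its absence, for example `one(oci_bastion_bastion.oci_bastion[*].id)`, which yields `null` when the resource is skipped.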

## Module choice

This wrapper intentionally uses Oracle's official OKE Terraform module:

- `oracle-terraform-modules/oke/oci` (pinned to version `5.4.1` in `main.tf`)

The Nemotron-specific layer in this directory adds:

- the Phoenix defaults
- the no-public-IP constraints
- the A10-focused worker pool defaults
- the OCI Bastion service resource required for private access

## Files

- [`main.tf`](./main.tf) - private OKE cluster, worker pools, OCI Bastion
- [`variables.tf`](./variables.tf) - deployment inputs
- [`outputs.tf`](./outputs.tf) - useful IDs and private endpoint information
- [`terraform.tfvars.example`](./terraform.tfvars.example) - starting point

## Usage

```bash
cp terraform.tfvars.example terraform.tfvars
terraform init
terraform plan
terraform apply
```
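
For orientation, a `terraform.tfvars` for this wrapper might look roughly like the sketch below. The variable names are the ones `main.tf` references; every value is a placeholder or an assumption (in particular the type of `gpu_placement_ads`), so treat `terraform.tfvars.example` and `variables.tf` as authoritative.

```hcl
# Illustrative values only; replace every placeholder before running plan.
config_file_profile = "DEFAULT"
tenancy_ocid        = "ocid1.tenancy.oc1..<your-tenancy>"
compartment_ocid    = "ocid1.compartment.oc1..<your-compartment>"
region              = "us-phoenix-1"

cluster_name        = "<cluster-name>"
ssh_public_key_path = "<path-to-openssh-public-key>.pub"

# The validated run placed the A10 pool in Phoenix AD-2.
gpu_shape         = "VM.GPU.A10.1"
gpu_placement_ads = [2] # assumption: a list of availability domain numbers

# CIDR blocks allowed to open Bastion sessions.
bastion_client_cidrs = ["<your-egress-ip>/32"]
```

Inputs not listed here (CIDRs for the VCN, pods, and services, the CPU pool sizing, the Kubernetes version) are assumed to have defaults in `variables.tf`; if they do not, `terraform plan` will prompt for them.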

The validated live run completed successfully in `us-phoenix-1`, including:

- private OKE cluster creation
- OCI Bastion service creation
- CPU node pool creation
- GPU node pool creation on `VM.GPU.A10.1` in `PHX-AD-2`

After the infrastructure is ready:

1. create an OCI Bastion session to reach the private cluster
2. deploy the model with:
   - [`../vllm_oke_phoenix_private_values.yaml`](../vllm_oke_phoenix_private_values.yaml)
3. validate:
   - `/health`
   - `/v1/models`
   - chat completion
   - tool calling
   - streaming
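
Step 1 is usually done from the OCI Console or CLI, but a port-forwarding session can also be sketched in Terraform next to the bastion. The block below is an illustration under several assumptions: `api_private_ip` and `session_public_key_path` are hypothetical variables that this sample does not define, port `6443` is assumed to be the private Kubernetes API port, and your network security rules are assumed to allow the bastion to reach it.

```hcl
# Hypothetical inputs; neither variable exists in this sample.
variable "api_private_ip" {
  description = "Private IP address of the Kubernetes API endpoint."
  type        = string
}

variable "session_public_key_path" {
  description = "Public key that will authenticate the SSH tunnel."
  type        = string
}

resource "oci_bastion_session" "kube_api" {
  bastion_id   = oci_bastion_bastion.oci_bastion.id
  display_name = "${var.cluster_name}-kube-api"

  key_details {
    public_key_content = file(var.session_public_key_path)
  }

  target_resource_details {
    session_type                       = "PORT_FORWARDING"
    target_resource_private_ip_address = var.api_private_ip
    target_resource_port               = 6443
  }

  # Must not exceed the bastion's max_session_ttl_in_seconds (10800 here).
  session_ttl_in_seconds = 3600
}
```

Sessions expire when their TTL runs out, so a Terraform-managed session has to be recreated for each working window; creating sessions on demand outside Terraform may be the simpler choice for day-to-day access.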

## Notes

- The validated live deployment used `us-phoenix-1`.
- The validated GPU pool used Phoenix `AD-2`, exposed as `gpu_placement_ads`.
- The Bastion resource here is the OCI managed Bastion service, not a public
  bastion VM.
- `ssh_public_key_path` must point to an actual OpenSSH public key file; the
  wrapper reads the file contents with Terraform's `file()` function before
  passing it to OKE.
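
The `ssh_public_key_path` requirement can be checked before any OCI call is made. The sketch below shows one way to do that with a variable validation block; it is an assumption about how the input could be declared, not a copy of the sample's `variables.tf`, and it would replace the existing declaration rather than sit alongside it.

```hcl
# Sketch only: the real declaration lives in variables.tf and may differ.
variable "ssh_public_key_path" {
  description = "Path to an OpenSSH public key file."
  type        = string

  validation {
    # OpenSSH public keys start with a key type such as ssh-rsa or
    # ecdsa-sha2-nistp256. regex() raises an error when nothing matches,
    # and can() converts that error into false.
    condition     = can(regex("^(ssh-|ecdsa-)", file(var.ssh_public_key_path)))
    error_message = "The ssh_public_key_path value must point to an OpenSSH public key (.pub) file, not a private key."
  }
}
```

A private key file begins with `-----BEGIN`, so pointing the variable at it by mistake fails at `terraform plan` with the message above.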
112 changes: 112 additions & 0 deletions usage-cookbook/Llama-3.1-Nemotron-Nano-8B-v1/terraform/main.tf
@@ -0,0 +1,112 @@
provider "oci" {
  config_file_profile = var.config_file_profile
  tenancy_ocid        = var.tenancy_ocid
  region              = var.region
}

locals {
  common_tags = merge(var.freeform_tags, {
    model      = "nvidia/Llama-3.1-Nemotron-Nano-8B-v1"
    deployment = "private-oke"
    region     = var.region
  })
}

module "oke" {
  source  = "oracle-terraform-modules/oke/oci"
  version = "5.4.1"

  providers = {
    oci.home = oci
  }

  tenancy_id     = var.tenancy_ocid
  compartment_id = var.compartment_ocid
  region         = var.region

  cluster_name                      = var.cluster_name
  kubernetes_version                = var.kubernetes_version
  cluster_type                      = "enhanced"
  cni_type                          = "flannel"
  pods_cidr                         = var.pods_cidr
  services_cidr                     = var.services_cidr
  vcn_cidrs                         = var.vcn_cidrs
  ssh_public_key                    = file(var.ssh_public_key_path)
  output_detail                     = true
  create_vcn                        = true
  create_bastion                    = false
  create_operator                   = false
  control_plane_is_public           = false
  assign_public_ip_to_control_plane = false
  worker_is_public                  = false
  allow_worker_internet_access      = true
  allow_pod_internet_access         = true
  allow_worker_ssh_access           = false
  preferred_load_balancer           = "internal"
  load_balancers                    = "internal"
  freeform_tags                     = { all = local.common_tags }

  subnets = {
    cp = {
      create  = "always"
      newbits = 13
      netnum  = 2
    }
    workers = {
      create  = "always"
      newbits = 2
      netnum  = 1
    }
    pods = {
      create  = "always"
      newbits = 2
      netnum  = 2
    }
    int_lb = {
      create  = "always"
      newbits = 11
      netnum  = 16
    }
    pub_lb = {
      create = "never"
    }
    bastion = {
      create = "never"
    }
    operator = {
      create = "never"
    }
  }

  worker_pool_mode = "node-pool"
  worker_pool_size = 1
  worker_pools = {
    cpu = {
      size             = var.cpu_pool_size
      shape            = var.cpu_shape
      ocpus            = var.cpu_ocpus
      memory           = var.cpu_memory_gbs
      boot_volume_size = 100
      assign_public_ip = false
      create           = true
    }
    gpu = {
      size             = var.gpu_pool_size
      shape            = var.gpu_shape
      boot_volume_size = var.gpu_boot_volume_size
      assign_public_ip = false
      create           = true
      placement_ads    = var.gpu_placement_ads
    }
  }
}

resource "oci_bastion_bastion" "oci_bastion" {
  compartment_id               = var.compartment_ocid
  bastion_type                 = "STANDARD"
  target_subnet_id             = module.oke.worker_subnet_id
  client_cidr_block_allow_list = var.bastion_client_cidrs
  max_session_ttl_in_seconds   = 10800
  name                         = "${var.cluster_name}-bastion"
  freeform_tags                = local.common_tags
}
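
The `newbits` and `netnum` pairs in the `subnets` map are easier to review once expanded into concrete ranges. The sketch below assumes the OKE module derives each subnet with Terraform's `cidrsubnet()` from the first VCN CIDR, and it assumes that CIDR is `10.0.0.0/16`; the actual default for `vcn_cidrs` lives in `variables.tf`, which is not shown here. The local and output names are hypothetical.

```hcl
locals {
  # Assumption: stand-in for var.vcn_cidrs[0].
  example_vcn_cidr = "10.0.0.0/16"

  # cidrsubnet(prefix, newbits, netnum) lengthens the prefix by newbits
  # and selects the netnum-th block of that size.
  example_subnets = {
    cp      = cidrsubnet(local.example_vcn_cidr, 13, 2)  # 10.0.0.16/29
    workers = cidrsubnet(local.example_vcn_cidr, 2, 1)   # 10.0.64.0/18
    pods    = cidrsubnet(local.example_vcn_cidr, 2, 2)   # 10.0.128.0/18
    int_lb  = cidrsubnet(local.example_vcn_cidr, 11, 16) # 10.0.2.0/27
  }
}

output "example_subnets" {
  value = local.example_subnets
}
```

Under that assumption the four ranges do not overlap: the control-plane and internal load-balancer subnets sit in the first /18 of the VCN, while the worker and pod subnets take the second and third /18 blocks.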