Add Karpenter integration docs

carlory · carlory · commit dfd9eb6804b2 · 2025-06-10T19:06:35.000+08:00
Signed-off-by: carlory <baofa.fan@daocloud.io>
diff --git a/site/content/en/docs/getting-started/installation.md b/site/content/en/docs/getting-started/installation.md
@@ -13,6 +13,17 @@ description: >
 helm install llmaz oci://registry-1.docker.io/inftyai/llmaz --namespace llmaz-system --create-namespace --version 0.0.9
 ```
 
+To enable the InftyAI scheduler, save the following values to a file and pass it to the command above with `-f <values-file>`:
+
+```yaml
+kube-scheduler:
+  enabled: true
+
+globalConfig:
+  configData: |-
+    scheduler-name: inftyai-scheduler
+```
+
 ### Uninstall
 
 ```cmd
@@ -38,7 +49,6 @@ If you want to change the default configurations, please change the values in [v
 
 **Do not change** the values in _values.yaml_ because it's auto-generated and will be overwritten.
 
-
 ### Install
 
 ```cmd
@@ -70,4 +80,4 @@ Once you changed your code, run the command to upgrade the controller:
 
 ```cmd
 IMG=<image-registry>:<tag> make helm-upgrade
-```
+```
diff --git a/site/content/en/docs/integrations/karpenter.md b/site/content/en/docs/integrations/karpenter.md
@@ -0,0 +1,242 @@
+---
+title: Karpenter
+weight: 2
+---
+
+[Karpenter](https://github.com/kubernetes-sigs/karpenter) automatically launches just the right compute resources to handle your cluster's applications. However, it is built to follow the scheduling decisions of kube-scheduler, so it may make incorrect provisioning decisions when the InftyAI scheduler is in use.
+
+We forked the Karpenter project and recompiled the Karpenter image for cloud providers such as AWS; the details are in [this proposal](https://github.com/InftyAI/llmaz/blob/main/docs/proposals/106-spot-instance-karpenter/README.md). This document provides the steps to install and configure the customized Karpenter in an EKS cluster.
+
+## How to use
+
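+Run all of the following commands in the same terminal session, because they rely on the environment variables `CLUSTER_NAME`, `KARPENTER_NAMESPACE`, `KARPENTER_VERSION`, and `ALIAS_VERSION`.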
+
+### Create a cluster and add Karpenter
+
+Please refer to [Getting Started with Karpenter](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html) to create a cluster and add Karpenter.
+
+### Install the GPU Operator
+
+```shell
+helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
+    && helm repo update
+helm install --wait --generate-name \
+    -n gpu-operator --create-namespace \
+    nvidia/gpu-operator \
+    --version=v25.3.0
+```
+
+### Install llmaz with InftyAI scheduler enabled
+
+Please refer to [installation](../getting-started/installation.md).
+
+### Configure Karpenter with the customized image
+
+We need to bind the `karpenter-core-llmaz` cluster role to the `karpenter` service account and switch the Karpenter image to the customized one.
+
+```shell
+cat <<EOF | envsubst | kubectl apply -f -
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: karpenter-core-llmaz
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: ClusterRole
+  name: karpenter-core-llmaz
+subjects:
+- kind: ServiceAccount
+  name: karpenter
+  namespace: ${KARPENTER_NAMESPACE}
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: karpenter-core-llmaz
+rules:
+- apiGroups: ["llmaz.io"]
+  resources: ["openmodels"]
+  verbs: ["get", "list", "watch"]
+EOF
+
+helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter --version "${KARPENTER_VERSION}" --namespace "${KARPENTER_NAMESPACE}" --create-namespace \
+  --set "settings.clusterName=${CLUSTER_NAME}" \
+  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
+  --set controller.resources.requests.cpu=1 \
+  --set controller.resources.requests.memory=1Gi \
+  --set controller.resources.limits.cpu=1 \
+  --set controller.resources.limits.memory=1Gi \
+  --wait \
+  --set controller.image.repository=inftyai/aws-karpenter \
+  --set "controller.image.tag=${KARPENTER_VERSION}" \
+  --set controller.image.digest=""
+```
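+
+If you prefer a values file over a long list of `--set` flags, the same settings can be sketched as follows. The file name `karpenter-values.yaml` is hypothetical, and the `<...>` placeholders must be filled in by hand because Helm does not expand environment variables inside values files. Passing it with `-f karpenter-values.yaml` would replace the `--set` flags above.
+
+```yaml
+# karpenter-values.yaml (hypothetical file name)
+# Mirrors the --set flags of the helm upgrade command above.
+settings:
+  clusterName: <cluster-name>          # value of ${CLUSTER_NAME}
+  interruptionQueue: <cluster-name>    # value of ${CLUSTER_NAME}
+controller:
+  image:
+    repository: inftyai/aws-karpenter  # customized Karpenter image
+    tag: <karpenter-version>           # value of ${KARPENTER_VERSION}
+    digest: ""                         # cleared so the tag above is used
+  resources:
+    requests:
+      cpu: 1
+      memory: 1Gi
+    limits:
+      cpu: 1
+      memory: 1Gi
+```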
+
+## Basic Example
+
+1. Create a GPU node pool
+
+```shell
+cat <<EOF | envsubst | kubectl apply -f -
+apiVersion: karpenter.k8s.aws/v1
+kind: EC2NodeClass
+metadata:
+  name: llmaz-demo            # you can choose a more meaningful name; keep it in sync with the node pool's nodeClassRef.
+spec:
+  amiSelectorTerms:
+  - alias: al2023@${ALIAS_VERSION}
+  blockDeviceMappings:
+  # The default volume size of the selected AMI is 20Gi, which is not enough for the kubelet to pull
+  # the images and run the workloads, so we map a larger volume to the root device.
+  # Increase the volume size further if your workloads need it.
+  - deviceName: /dev/xvda
+    ebs:
+      deleteOnTermination: true
+      volumeSize: 50Gi
+      volumeType: gp3
+  role: KarpenterNodeRole-${CLUSTER_NAME}          # replace with your cluster name
+  securityGroupSelectorTerms:
+  - tags:
+      karpenter.sh/discovery: ${CLUSTER_NAME}      # replace with your cluster name
+  subnetSelectorTerms:
+  - tags:
+      karpenter.sh/discovery: ${CLUSTER_NAME}      # replace with your cluster name
+---
+apiVersion: karpenter.sh/v1
+kind: NodePool
+metadata:
+  name: llmaz-demo-gpu-nodepool  # you can choose a more meaningful name.
+spec:
+  disruption:
+    budgets:
+    - nodes: 10%
+    consolidateAfter: 5m
+    consolidationPolicy: WhenEmptyOrUnderutilized
+  limits:  # You can change the limits to match your actual needs.
+    cpu: 1000
+  template:
+    spec:
+      expireAfter: 720h
+      nodeClassRef:
+        group: karpenter.k8s.aws
+        kind: EC2NodeClass
+        name: llmaz-demo
+      requirements:
+      - key: kubernetes.io/arch
+        operator: In
+        values:
+        - amd64
+      - key: kubernetes.io/os
+        operator: In
+        values:
+        - linux
+      - key: karpenter.sh/capacity-type
+        operator: In
+        values:
+        - spot
+      - key: karpenter.k8s.aws/instance-family
+        operator: In
+        values:                                # replace with the GPU instance families you want to use
+        - g4dn
+        - g5g                                  # g5g is arm64 (Graviton); it is only eligible if arm64 is added to kubernetes.io/arch above
+      taints:
+      - effect: NoSchedule
+        key: nvidia.com/gpu
+        value: "true"
+EOF
+```
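+
+The `nvidia.com/gpu` taint on this node pool keeps pods without a matching toleration off the GPU nodes. As a minimal sketch, a pod spec fragment that tolerates it could look like the following; these are the standard Kubernetes toleration fields, and where the fragment goes depends on your workload type.
+
+```yaml
+# Pod spec fragment (illustrative): tolerate the taint set by llmaz-demo-gpu-nodepool.
+spec:
+  tolerations:
+  - key: nvidia.com/gpu
+    operator: Exists      # matches any taint value, including "true"
+    effect: NoSchedule
+```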
+
+2. Deploy a model with flavors
+
+```shell
+cat <<EOF | kubectl apply -f -
+apiVersion: llmaz.io/v1alpha1
+kind: OpenModel
+metadata:
+  name: qwen2-0--5b
+spec:
+  familyName: qwen2
+  source:
+    modelHub:
+      modelID: Qwen/Qwen2-0.5B-Instruct
+  inferenceConfig:
+    flavors:
+      # The g5g instance family on AWS provides the t4g GPU type.
+      # The instance family is declared in the node pool, e.g. llmaz-demo-gpu-nodepool.
+      - name: t4g
+        limits:
+          nvidia.com/gpu: 1
+        # The flavor name is not recognized by Karpenter, so we specify the
+        # instance-gpu-name via nodeSelector to match the t4g GPU type when the node is
+        # provisioned by Karpenter from multiple node pools.
+        #
+        # When you only have a single node pool to provision the GPU instance and the node pool
+        # only has one GPU type, it is okay to not specify the nodeSelector. But in practice,
+        # it is better to specify the nodeSelector to make the provisioned node more predictable.
+        #
+        # The node labels available for selecting the target GPU device are listed below:
+        # karpenter.k8s.aws/instance-gpu-count
+        # karpenter.k8s.aws/instance-gpu-manufacturer
+        # karpenter.k8s.aws/instance-gpu-memory
+        # karpenter.k8s.aws/instance-gpu-name
+        nodeSelector:
+          karpenter.k8s.aws/instance-gpu-name: t4g
+      # The g4dn instance family on AWS provides the t4 GPU type.
+      # The instance family is declared in the node pool, e.g. llmaz-demo-gpu-nodepool.
+      - name: t4
+        limits:
+          nvidia.com/gpu: 1
+        # The flavor name is not recognized by Karpenter, so we specify the
+        # instance-gpu-name via nodeSelector to match the t4 GPU type when the node is
+        # provisioned by Karpenter from multiple node pools.
+        #
+        # When you only have a single node pool to provision the GPU instance and the node pool
+        # only has one GPU type, it is okay to not specify the nodeSelector. But in practice,
+        # it is better to specify the nodeSelector to make the provisioned node more predictable.
+        #
+        # The node labels available for selecting the target GPU device are listed below:
+        # karpenter.k8s.aws/instance-gpu-count
+        # karpenter.k8s.aws/instance-gpu-manufacturer
+        # karpenter.k8s.aws/instance-gpu-memory
+        # karpenter.k8s.aws/instance-gpu-name
+        nodeSelector:
+          karpenter.k8s.aws/instance-gpu-name: t4
+---
+# Currently, the Playground resource type does not support configuring tolerations
+# for the generated pods. Fortunately, when a pod requesting the `nvidia.com/gpu` resource
+# is created on the EKS cluster, it is automatically given the following
+# tolerations:
+#   - effect: NoExecute
+#     key: node.kubernetes.io/not-ready
+#     operator: Exists
+#     tolerationSeconds: 300
+#   - effect: NoExecute
+#     key: node.kubernetes.io/unreachable
+#     operator: Exists
+#     tolerationSeconds: 300
+#   - effect: NoSchedule
+#     key: nvidia.com/gpu
+#     operator: Exists
+apiVersion: inference.llmaz.io/v1alpha1
+kind: Playground
+metadata:
+  labels:
+    llmaz.io/model-name: qwen2-0--5b
+  name: qwen2-0--5b
+spec:
+  backendRuntimeConfig:
+    backendName: tgi
+    # Due to the limits of our AWS account, we have to lower the resources to fit
+    # the available instance type, which is g4dn.xlarge. If your account has no such limit,
+    # you can remove the custom resource settings below.
+    resources:
+      limits:
+        cpu: "2"
+        memory: 4Gi
+      requests:
+        cpu: "2"
+        memory: 4Gi
+  modelClaim:
+    modelName: qwen2-0--5b
+  replicas: 1
+EOF
+```
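+
+The comments in the OpenModel above list several GPU-related node labels that a flavor can select on. As a sketch, a flavor could combine more than one of them to narrow the match further. The flavor name `t4-single` is hypothetical, and the label values shown are assumptions that should be checked against the labels on your own nodes (for example with `kubectl get nodes --show-labels`).
+
+```yaml
+# Fragment of spec.inferenceConfig in an OpenModel (illustrative only).
+flavors:
+  - name: t4-single            # hypothetical flavor name
+    limits:
+      nvidia.com/gpu: 1
+    nodeSelector:
+      karpenter.k8s.aws/instance-gpu-manufacturer: nvidia   # assumed label value
+      karpenter.k8s.aws/instance-gpu-name: t4
+      karpenter.k8s.aws/instance-gpu-count: "1"             # label values are strings
+```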
diff --git a/site/content/en/docs/integrations/open-webui.md b/site/content/en/docs/integrations/open-webui.md
@@ -1,6 +1,6 @@
 ---
 title: Open-WebUI
-weight: 2
+weight: 3
 ---
 
 [Open WebUI](https://github.com/open-webui/open-webui) is a user-friendly AI interface with OpenAI-compatible APIs, serving as the default chatbot for llmaz.
diff --git a/site/content/en/docs/integrations/prometheus-operator.md b/site/content/en/docs/integrations/prometheus-operator.md
@@ -1,6 +1,6 @@
 ---
 title: Prometheus Operator
-weight: 3
+weight: 4
 ---
 
 This document provides deployment steps to install and configure Prometheus Operator in a Kubernetes cluster.
diff --git a/site/content/en/docs/integrations/support-backends.md b/site/content/en/docs/integrations/support-backends.md
@@ -1,6 +1,6 @@
 ---
 title: Supported Inference Backends
-weight: 4
+weight: 5
 ---
 
 If you want to integrate more backends into llmaz, please refer to this [PR](https://github.com/InftyAI/llmaz/pull/182). It's always welcomed.