Commit dfd9eb6

Add Karpenter integration docs
Signed-off-by: carlory <baofa.fan@daocloud.io>
1 parent 9a3167e commit dfd9eb6

5 files changed

Lines changed: 257 additions & 5 deletions

site/content/en/docs/getting-started/installation.md

Lines changed: 12 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,17 @@ description: >
1313
helm install llmaz oci://registry-1.docker.io/inftyai/llmaz --namespace llmaz-system --create-namespace --version 0.0.9
1414
```
1515

16+
To enable the InftyAI scheduler, pass the following values file to the command above:
17+
18+
```yaml
kube-scheduler:
  enabled: true

globalConfig:
  configData: |-
    scheduler-name: inftyai-scheduler
```
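One way to apply these values is to save them to a file and pass it to `helm install` with Helm's `-f` flag. The sketch below is only an illustration: the file name `values-scheduler.yaml` and both function names are assumptions, while the chart reference and version are copied from the install command above. The `HELM` variable is a hypothetical hook so the command can be previewed with `HELM=echo` before running it for real.

```shell
# Write the scheduler overrides to a values file (the default file name is an assumption).
write_scheduler_values() {
  cat > "${VALUES_FILE:-values-scheduler.yaml}" <<'EOF'
kube-scheduler:
  enabled: true

globalConfig:
  configData: |-
    scheduler-name: inftyai-scheduler
EOF
}

# Install llmaz with the overrides; set HELM=echo to print the command instead of running it.
install_llmaz() {
  "${HELM:-helm}" install llmaz oci://registry-1.docker.io/inftyai/llmaz \
    --namespace llmaz-system --create-namespace --version 0.0.9 \
    -f "${VALUES_FILE:-values-scheduler.yaml}"
}
```

Calling `write_scheduler_values` followed by `install_llmaz` performs the installation with the InftyAI scheduler enabled.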
26+
1627
### Uninstall
1728
1829
```cmd
@@ -38,7 +49,6 @@ If you want to change the default configurations, please change the values in [v
3849

3950
**Do not change** the values in _values.yaml_ because it's auto-generated and will be overwritten.
4051

41-
4252
### Install
4353

4454
```cmd
@@ -70,4 +80,4 @@ Once you changed your code, run the command to upgrade the controller:
7080

7181
```cmd
7282
IMG=<image-registry>:<tag> make helm-upgrade
73-
```
83+
```
Lines changed: 242 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,242 @@
---
title: Karpenter
weight: 2
---

[Karpenter](https://github.com/kubernetes-sigs/karpenter) automatically launches just the right compute resources to handle your cluster's applications. However, it is built to follow the scheduling decisions of kube-scheduler, so it may make incorrect provisioning decisions when the InftyAI scheduler is in use.

We forked the Karpenter project and recompiled the Karpenter image for cloud providers such as AWS; you can find the details in [this proposal](https://github.com/InftyAI/llmaz/blob/main/docs/proposals/106-spot-instance-karpenter/README.md). This document provides the steps to install and configure the customized Karpenter in an EKS cluster.

## How to use

Please run the following commands in the same terminal.

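The commands in this guide rely on a few shell variables (`KARPENTER_NAMESPACE`, `KARPENTER_VERSION`, `CLUSTER_NAME`, and `ALIAS_VERSION`), which is why they need to run in the same terminal. As a minimal sketch, a small helper like the one below can confirm they are set before continuing. The `require_env` function is hypothetical and not part of llmaz or Karpenter.

```shell
# Hypothetical helper: report every listed variable that is unset or empty.
# Returns 0 when all variables are set, 1 otherwise.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup via eval so the sketch works in plain POSIX sh as well as bash.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_env KARPENTER_NAMESPACE KARPENTER_VERSION CLUSTER_NAME ALIAS_VERSION` prints the names that still need a value and returns a non-zero status if any are missing.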
### Create a cluster and add Karpenter

Please refer to [Getting Started with Karpenter](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html) to create a cluster and add Karpenter.

### Install the gpu operator

```shell
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
  && helm repo update
helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator \
  --version=v25.3.0
```

### Install llmaz with InftyAI scheduler enabled

Please refer to [installation](../getting-started/installation.md).

### Configure Karpenter with customized image

Bind the `karpenter-core-llmaz` cluster role, which grants read access to `openmodels`, to the `karpenter` service account, and update the Karpenter image to the customized one.

```shell
cat <<EOF | envsubst | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: karpenter-core-llmaz
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: karpenter-core-llmaz
subjects:
- kind: ServiceAccount
  name: karpenter
  namespace: ${KARPENTER_NAMESPACE}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: karpenter-core-llmaz
rules:
- apiGroups: ["llmaz.io"]
  resources: ["openmodels"]
  verbs: ["get", "list", "watch"]
EOF

helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter --version "${KARPENTER_VERSION}" --namespace "${KARPENTER_NAMESPACE}" --create-namespace \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait \
  --set controller.image.repository=inftyai/aws-karpenter \
  --set "controller.image.tag=${KARPENTER_VERSION}" \
  --set controller.image.digest=""
```
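To confirm that the role binding took effect, one option is `kubectl auth can-i` with impersonation of the Karpenter service account. This is a minimal sketch under a few assumptions: the function names are hypothetical, the `KUBECTL` variable is only an illustrative hook for previewing the commands with `KUBECTL=echo`, `KARPENTER_NAMESPACE` is still set in the current terminal, and your user is allowed to impersonate service accounts.

```shell
# Build the impersonation subject for a service account:
# system:serviceaccount:<namespace>:<name>
sa_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Ask the API server whether the karpenter service account may perform
# each verb granted by the karpenter-core-llmaz ClusterRole.
check_openmodel_access() {
  subject="$(sa_subject "${KARPENTER_NAMESPACE}" karpenter)"
  for verb in get list watch; do
    "${KUBECTL:-kubectl}" auth can-i "$verb" openmodels.llmaz.io --as="$subject" || return 1
  done
}
```

Running `check_openmodel_access` stops at the first verb that is denied and returns a non-zero status, which usually points to a missing binding or a wrong namespace.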

## Basic Example

1. Create a gpu node pool

```shell
cat <<EOF | envsubst | kubectl apply -f -
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: llmaz-demo # you can change the name to a more meaningful one; keep it aligned with the node pool's nodeClassRef.
spec:
  amiSelectorTerms:
  - alias: al2023@${ALIAS_VERSION}
  blockDeviceMappings:
  # The default volume size of the selected AMI is 20Gi, which is not enough for kubelet to pull
  # the images and run the workloads, so we map a larger volume to the root device.
  # You can increase the volume size according to your actual needs.
  - deviceName: /dev/xvda
    ebs:
      deleteOnTermination: true
      volumeSize: 50Gi
      volumeType: gp3
  role: KarpenterNodeRole-${CLUSTER_NAME} # replace with your cluster name
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: ${CLUSTER_NAME} # replace with your cluster name
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: ${CLUSTER_NAME} # replace with your cluster name
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: llmaz-demo-gpu-nodepool # you can change the name to a more meaningful one.
spec:
  disruption:
    budgets:
    - nodes: 10%
    consolidateAfter: 5m
    consolidationPolicy: WhenEmptyOrUnderutilized
  limits: # You can change the limits to match your actual needs.
    cpu: 1000
  template:
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: llmaz-demo
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: kubernetes.io/os
        operator: In
        values:
        - linux
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - spot
      - key: karpenter.k8s.aws/instance-family
        operator: In
        values: # replace with the GPU-capable instance families you want to use
        - g4dn
        - g5g # note: g5g instances are arm64 (Graviton); they are only eligible if arm64 is also allowed in kubernetes.io/arch above
      taints:
      - effect: NoSchedule
        key: nvidia.com/gpu
        value: "true"
EOF
```

2. Deploy a model with flavors

```shell
# The delimiter is quoted so that the backticks in the YAML comments are not run as commands.
cat <<'EOF' | kubectl apply -f -
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: qwen2-0--5b
spec:
  familyName: qwen2
  source:
    modelHub:
      modelID: Qwen/Qwen2-0.5B-Instruct
  inferenceConfig:
    flavors:
    # The g5g instance family on AWS provides the t4g GPU type.
    # The instance family is defined in a node pool such as llmaz-demo-gpu-nodepool.
    - name: t4g
      limits:
        nvidia.com/gpu: 1
      # Karpenter does not recognize the flavor name, so we specify the
      # instance-gpu-name via nodeSelector to match the t4g GPU type when the node is
      # provisioned by Karpenter from multiple node pools.
      #
      # When you only have a single node pool to provision the GPU instance and the node pool
      # only has one GPU type, it is okay to not specify the nodeSelector. But in practice,
      # it is better to specify the nodeSelector to make the provisioned node more predictable.
      #
      # The available node labels for selecting the target GPU device are listed below:
      #    karpenter.k8s.aws/instance-gpu-count
      #    karpenter.k8s.aws/instance-gpu-manufacturer
      #    karpenter.k8s.aws/instance-gpu-memory
      #    karpenter.k8s.aws/instance-gpu-name
      nodeSelector:
        karpenter.k8s.aws/instance-gpu-name: t4g
    # The g4dn instance family on AWS provides the t4 GPU type.
    # The instance family is defined in a node pool such as llmaz-demo-gpu-nodepool.
    - name: t4
      limits:
        nvidia.com/gpu: 1
      # Karpenter does not recognize the flavor name, so we specify the
      # instance-gpu-name via nodeSelector to match the t4 GPU type when the node is
      # provisioned by Karpenter from multiple node pools.
      #
      # When you only have a single node pool to provision the GPU instance and the node pool
      # only has one GPU type, it is okay to not specify the nodeSelector. But in practice,
      # it is better to specify the nodeSelector to make the provisioned node more predictable.
      #
      # The available node labels for selecting the target GPU device are listed below:
      #    karpenter.k8s.aws/instance-gpu-count
      #    karpenter.k8s.aws/instance-gpu-manufacturer
      #    karpenter.k8s.aws/instance-gpu-memory
      #    karpenter.k8s.aws/instance-gpu-name
      nodeSelector:
        karpenter.k8s.aws/instance-gpu-name: t4
---
# Currently, the Playground resource type does not support configuring tolerations
# for the generated pods. Fortunately, when a pod requesting the `nvidia.com/gpu` resource
# is created on the EKS cluster, the generated pod is given the following
# tolerations:
# - effect: NoExecute
#   key: node.kubernetes.io/not-ready
#   operator: Exists
#   tolerationSeconds: 300
# - effect: NoExecute
#   key: node.kubernetes.io/unreachable
#   operator: Exists
#   tolerationSeconds: 300
# - effect: NoSchedule
#   key: nvidia.com/gpu
#   operator: Exists
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  labels:
    llmaz.io/model-name: qwen2-0--5b
  name: qwen2-0--5b
spec:
  backendRuntimeConfig:
    backendName: tgi
    # Due to the limitations of our AWS account, we have to decrease the resources to match
    # the available instance type, which is g4dn.xlarge. If your account has no such limitation,
    # you can remove the custom resources settings below.
    resources:
      limits:
        cpu: "2"
        memory: 4Gi
      requests:
        cpu: "2"
        memory: 4Gi
  modelClaim:
    modelName: qwen2-0--5b
  replicas: 1
EOF
```
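After applying the manifests, it can help to watch what Karpenter provisions. The sketch below lists the Karpenter NodeClaims, shows each node's GPU name and capacity type as extra columns (using node labels mentioned in this guide), and prints the Playground from the example. The function name is hypothetical, and `KUBECTL` is an illustrative hook for previewing the commands with `KUBECTL=echo`.

```shell
# Hypothetical helper: show what Karpenter provisioned for the example workload.
inspect_provisioning() {
  k="${KUBECTL:-kubectl}"
  # NodeClaims that Karpenter created for pending pods.
  "$k" get nodeclaims || return 1
  # Nodes, with the GPU name and capacity type labels shown as extra columns.
  "$k" get nodes -L karpenter.k8s.aws/instance-gpu-name -L karpenter.sh/capacity-type || return 1
  # The Playground created in the example above.
  "$k" get playgrounds.inference.llmaz.io qwen2-0--5b
}
```

If no NodeClaim appears after a while, the Karpenter controller logs are usually the next place to look.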

site/content/en/docs/integrations/open-webui.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
---
22
title: Open-WebUI
3-
weight: 2
3+
weight: 3
44
---
55

66
[Open WebUI](https://github.com/open-webui/open-webui) is a user-friendly AI interface with OpenAI-compatible APIs, serving as the default chatbot for llmaz.

site/content/en/docs/integrations/prometheus-operator.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
---
22
title: Prometheus Operator
3-
weight: 3
3+
weight: 4
44
---
55

66
This document provides deployment steps to install and configure Prometheus Operator in a Kubernetes cluster.

site/content/en/docs/integrations/support-backends.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
---
22
title: Supported Inference Backends
3-
weight: 4
3+
weight: 5
44
---
55

66
If you want to integrate more backends into llmaz, please refer to this [PR](https://github.com/InftyAI/llmaz/pull/182). Contributions are always welcome.
