Commit 70ad650

Adapt to latest changes in llm microservice family (#696)
Signed-off-by: Lianhao Lu <lianhao.lu@intel.com>
1 parent be4e21a commit 70ad650

12 files changed

Lines changed: 222 additions & 97 deletions

helm-charts/common/llm-uservice/.helmignore

Lines changed: 2 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -21,3 +21,5 @@
2121
.idea/
2222
*.tmproj
2323
.vscode/
24+
# CI values
25+
ci*-values.yaml
Lines changed: 62 additions & 27 deletions
@@ -1,55 +1,90 @@
11
# llm-uservice
22

3-
Helm chart for deploying LLM microservice.
3+
Helm chart for deploying OPEA LLM microservices.
44

5-
llm-uservice depends on TGI, you should set TGI_LLM_ENDPOINT as tgi endpoint.
5+
## Installing the chart
66

7-
## (Option1): Installing the chart separately
7+
`llm-uservice` depends on one of the following inference backend services:
88

9-
First, you need to install the tgi chart, please refer to the [tgi](../tgi) chart for more information.
9+
- TGI: please refer to [tgi](../tgi) chart for more information
1010

11-
After you've deployted the tgi chart successfully, please run `kubectl get svc` to get the tgi service endpoint, i.e. `http://tgi`.
11+
- vLLM: please refer to [vllm](../vllm) chart for more information
1212

13-
To install the chart, run the following:
13+
First, you need to install one of the dependent charts, i.e. the `tgi` or `vllm` helm chart.
1414

15-
```console
16-
cd GenAIInfra/helm-charts/common/llm-uservice
17-
export HFTOKEN="insert-your-huggingface-token-here"
18-
export TGI_LLM_ENDPOINT="http://tgi"
19-
helm dependency update
20-
helm install llm-uservice . --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set TGI_LLM_ENDPOINT=${TGI_LLM_ENDPOINT} --wait
21-
```
15+
After you've deployed the dependent chart successfully, please run `kubectl get svc` to get the backend inference service endpoint, e.g. `http://tgi`, `http://vllm`.
2216

23-
## (Option2): Installing the chart with dependencies automatically
17+
To install the `llm-uservice` chart, run the following:
2418

2519
```console
2620
cd GenAIInfra/helm-charts/common/llm-uservice
27-
export HFTOKEN="insert-your-huggingface-token-here"
2821
helm dependency update
29-
helm install llm-uservice . --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set tgi.enabled=true --wait
22+
export HFTOKEN="insert-your-huggingface-token-here"
23+
# set backend inference service endpoint URL
24+
# for tgi
25+
export LLM_ENDPOINT="http://tgi"
26+
# for vllm
27+
# export LLM_ENDPOINT="http://vllm"
28+
29+
# set the same model used by the backend inference service
30+
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
31+
32+
# install llm-textgen with TGI backend
33+
helm install llm-uservice . --set TEXTGEN_BACKEND="TGI" --set LLM_ENDPOINT=${LLM_ENDPOINT} --set LLM_MODEL_ID=${LLM_MODEL_ID} --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --wait
34+
35+
# install llm-textgen with vLLM backend
36+
# helm install llm-uservice . --set TEXTGEN_BACKEND="vLLM" --set LLM_ENDPOINT=${LLM_ENDPOINT} --set LLM_MODEL_ID=${LLM_MODEL_ID} --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --wait
37+
38+
# install llm-docsum with TGI backend
39+
# helm install llm-uservice . --set image.repository="opea/llm-docsum" --set DOCSUM_BACKEND="TGI" --set LLM_ENDPOINT=${LLM_ENDPOINT} --set LLM_MODEL_ID=${LLM_MODEL_ID} --set MAX_INPUT_TOKENS=2048 --set MAX_TOTAL_TOKENS=4096 --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --wait
40+
41+
# install llm-docsum with vLLM backend
42+
# helm install llm-uservice . --set image.repository="opea/llm-docsum" --set DOCSUM_BACKEND="vLLM" --set LLM_ENDPOINT=${LLM_ENDPOINT} --set LLM_MODEL_ID=${LLM_MODEL_ID} --set MAX_INPUT_TOKENS=2048 --set MAX_TOTAL_TOKENS=4096 --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --wait
43+
44+
# install llm-faqgen with TGI backend
45+
# helm install llm-uservice . --set image.repository="opea/llm-faqgen" --set FAQGEN_BACKEND="TGI" --set LLM_ENDPOINT=${LLM_ENDPOINT} --set LLM_MODEL_ID=${LLM_MODEL_ID} --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --wait
46+
47+
# install llm-faqgen with vLLM backend
48+
# helm install llm-uservice . --set image.repository="opea/llm-faqgen" --set FAQGEN_BACKEND="vLLM" --set LLM_ENDPOINT=${LLM_ENDPOINT} --set LLM_MODEL_ID=${LLM_MODEL_ID} --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --wait
3049
```
3150

3251
## Verify
3352

3453
To verify the installation, run the command `kubectl get pod` to make sure all pods are running.
3554

36-
Then run the command `kubectl port-forward svc/llm-uservice 9000:9000` to expose the llm-uservice service for access.
55+
Then run the command `kubectl port-forward svc/llm-uservice 9000:9000` to expose the service for access.
3756

3857
Open another terminal and run the following command to verify that the service is working:
3958

4059
```console
60+
# for llm-textgen service
4161
curl http://localhost:9000/v1/chat/completions \
42-
-X POST \
43-
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
44-
-H 'Content-Type: application/json'
62+
-X POST \
63+
-d '{"model": "'"${LLM_MODEL_ID}"'", "messages": "What is Deep Learning?", "max_tokens":17}' \
64+
-H 'Content-Type: application/json'
65+
66+
# for llm-docsum service
67+
curl http://localhost:9000/v1/docsum \
68+
-X POST \
69+
-d '{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.", "max_tokens":32, "language":"en"}' \
70+
-H 'Content-Type: application/json'
71+
72+
# for llm-faqgen service
73+
curl http://localhost:9000/v1/faqgen \
74+
-X POST \
75+
-d '{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.","max_tokens": 128}' \
76+
-H 'Content-Type: application/json'
4577
```
4678

4779
## Values
4880

49-
| Key | Type | Default | Description |
50-
| ------------------------------- | ------ | ---------------- | ------------------------------- |
51-
| global.HUGGINGFACEHUB_API_TOKEN | string | `""` | Your own Hugging Face API token |
52-
| image.repository | string | `"opea/llm-tgi"` | |
53-
| service.port | string | `"9000"` | |
54-
| TGI_LLM_ENDPOINT | string | `""` | LLM endpoint |
55-
| global.monitoring | bool | `false` | Service usage metrics |
81+
| Key | Type | Default | Description |
82+
| ------------------------------- | ------ | ----------------------------- | -------------------------------------------------------------------------------- |
83+
| global.HUGGINGFACEHUB_API_TOKEN | string | `""` | Your own Hugging Face API token |
84+
| image.repository | string | `"opea/llm-textgen"` | one of "opea/llm-textgen", "opea/llm-docsum", "opea/llm-faqgen" |
85+
| LLM_ENDPOINT | string | `""` | backend inference service endpoint |
86+
| LLM_MODEL_ID | string | `"Intel/neural-chat-7b-v3-3"` | model used by the inference backend |
87+
| TEXTGEN_BACKEND | string | `"tgi"` | backend inference engine, only valid for llm-textgen image, one of "TGI", "vLLM" |
88+
| DOCSUM_BACKEND | string | `"tgi"` | backend inference engine, only valid for llm-docsum image, one of "TGI", "vLLM" |
89+
| FAQGEN_BACKEND | string | `"tgi"` | backend inference engine, only valid for llm-faqgen image, one of "TGI", "vLLM" |
90+
| global.monitoring | bool | `false` | Service usage metrics |

helm-charts/common/llm-uservice/ci-docsum-values.yaml

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,13 +2,15 @@
22
# SPDX-License-Identifier: Apache-2.0
33

44
image:
5-
repository: opea/llm-docsum-tgi
5+
repository: opea/llm-docsum
66
tag: "latest"
77

8+
LLM_MODEL_ID: "Intel/neural-chat-7b-v3-3"
89
MAX_INPUT_TOKENS: 2048
910
MAX_TOTAL_TOKENS: 4096
1011

1112
tgi:
13+
LLM_MODEL_ID: "Intel/neural-chat-7b-v3-3"
1214
enabled: true
1315
MAX_INPUT_LENGTH: 2048
1416
MAX_TOTAL_TOKENS: 4096

helm-charts/common/llm-uservice/ci-faqgen-values.yaml

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,9 +2,11 @@
22
# SPDX-License-Identifier: Apache-2.0
33

44
image:
5-
repository: opea/llm-faqgen-tgi
5+
repository: opea/llm-faqgen
66
tag: "latest"
77

8+
LLM_MODEL_ID: meta-llama/Meta-Llama-3-8B-Instruct
9+
810
tgi:
911
enabled: true
1012
LLM_MODEL_ID: meta-llama/Meta-Llama-3-8B-Instruct
Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,26 @@
1+
# Copyright (C) 2024 Intel Corporation
2+
# SPDX-License-Identifier: Apache-2.0
3+
4+
image:
5+
repository: opea/llm-docsum
6+
tag: "latest"
7+
8+
DOCSUM_BACKEND: "vLLM"
9+
LLM_MODEL_ID: "Intel/neural-chat-7b-v3-3"
10+
MAX_INPUT_TOKENS: 2048
11+
MAX_TOTAL_TOKENS: 4096
12+
13+
14+
tgi:
15+
enabled: false
16+
vllm:
17+
enabled: true
18+
image:
19+
repository: opea/vllm-gaudi
20+
tag: "latest"
21+
LLM_MODEL_ID: Intel/neural-chat-7b-v3-3
22+
OMPI_MCA_btl_vader_single_copy_mechanism: none
23+
extraCmdArgs: ["--tensor-parallel-size","1","--block-size","128","--max-num-seqs","256","--max-seq_len-to-capture","2048"]
24+
resources:
25+
limits:
26+
habana.ai/gaudi: 1

helm-charts/common/llm-uservice/ci-vllm-gaudi-values.yaml

Lines changed: 1 addition & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -18,8 +18,5 @@ vllm:
1818
limits:
1919
habana.ai/gaudi: 1
2020

21-
vLLM_ENDPOINT: ""
21+
TEXTGEN_BACKEND: "vLLM"
2222
LLM_MODEL_ID: Intel/neural-chat-7b-v3-3
23-
image:
24-
repository: opea/llm-vllm
25-
tag: "latest"

helm-charts/common/llm-uservice/templates/configmap.yaml

Lines changed: 45 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -8,30 +8,59 @@ metadata:
88
labels:
99
{{- include "llm-uservice.labels" . | nindent 4 }}
1010
data:
11-
{{- if .Values.TGI_LLM_ENDPOINT }}
12-
TGI_LLM_ENDPOINT: {{ .Values.TGI_LLM_ENDPOINT | quote }}
11+
{{- if hasSuffix "llm-textgen" .Values.image.repository }}
12+
{{- if eq "TGI" .Values.TEXTGEN_BACKEND }}
13+
LLM_COMPONENT_NAME: "OPEA_LLM"
14+
{{- if not .Values.LLM_ENDPOINT }}
15+
LLM_ENDPOINT: "http://{{ .Release.Name }}-tgi"
16+
{{- end }}
17+
{{- else if eq "vLLM" .Values.TEXTGEN_BACKEND }}
18+
LLM_COMPONENT_NAME: "OPEA_LLM"
19+
{{- if not .Values.LLM_ENDPOINT }}
20+
LLM_ENDPOINT: "http://{{ .Release.Name }}-vllm"
21+
{{- end }}
1322
{{- else }}
14-
TGI_LLM_ENDPOINT: "http://{{ .Release.Name }}-tgi"
23+
{{- cat "Invalid TEXTGEN_BACKEND:" .Values.TEXTGEN_BACKEND | fail }}
24+
{{- end }}
25+
{{- else if hasSuffix "llm-docsum" .Values.image.repository }}
26+
MAX_INPUT_TOKENS: {{ .Values.MAX_INPUT_TOKENS | default "" | quote }}
27+
MAX_TOTAL_TOKENS: {{ .Values.MAX_TOTAL_TOKENS | default "" | quote }}
28+
{{- if eq "TGI" .Values.DOCSUM_BACKEND }}
29+
DocSum_COMPONENT_NAME: "OPEADocSum_TGI"
30+
{{- if not .Values.LLM_ENDPOINT }}
31+
LLM_ENDPOINT: "http://{{ .Release.Name }}-tgi"
32+
{{- end }}
33+
{{- else if eq "vLLM" .Values.DOCSUM_BACKEND }}
34+
DocSum_COMPONENT_NAME: "OPEADocSum_vLLM"
35+
{{- if not .Values.LLM_ENDPOINT }}
36+
LLM_ENDPOINT: "http://{{ .Release.Name }}-vllm"
1537
{{- end }}
16-
{{- if .Values.vLLM_ENDPOINT }}
17-
vLLM_ENDPOINT: {{ .Values.vLLM_ENDPOINT | quote }}
1838
{{- else }}
19-
vLLM_ENDPOINT: "http://{{ .Release.Name }}-vllm"
39+
{{- cat "Invalid DOCSUM_BACKEND:" .Values.DOCSUM_BACKEND | fail }}
2040
{{- end }}
21-
{{- if .Values.LLM_MODEL_ID }}
22-
# NOTE:
23-
# delete LLM_MODEL once https://github.com/opea-project/GenAIComps/pull/1089 is merged
24-
LLM_MODEL: {{ .Values.LLM_MODEL_ID | quote }}
25-
LLM_MODEL_ID: {{ .Values.LLM_MODEL_ID | quote }}
41+
{{- else if hasSuffix "llm-faqgen" .Values.image.repository }}
42+
{{- if eq "TGI" .Values.FAQGEN_BACKEND }}
43+
FAQGen_COMPONENT_NAME: "OPEAFAQGen_TGI"
44+
{{- if not .Values.LLM_ENDPOINT }}
45+
LLM_ENDPOINT: "http://{{ .Release.Name }}-tgi"
2646
{{- end }}
27-
{{- if .Values.MAX_INPUT_TOKENS }}
28-
MAX_INPUT_TOKENS: {{ .Values.MAX_INPUT_TOKENS | quote }}
47+
{{- else if eq "vLLM" .Values.FAQGEN_BACKEND }}
48+
FAQGen_COMPONENT_NAME: "OPEAFAQGen_vLLM"
49+
{{- if not .Values.LLM_ENDPOINT }}
50+
LLM_ENDPOINT: "http://{{ .Release.Name }}-vllm"
2951
{{- end }}
30-
{{- if .Values.MAX_TOTAL_TOKENS }}
31-
MAX_TOTAL_TOKENS: {{ .Values.MAX_TOTAL_TOKENS | quote }}
52+
{{- else }}
53+
{{- cat "Invalid FAQGEN_BACKEND:" .Values.FAQGEN_BACKEND | fail }}
54+
{{- end }}
55+
{{- end }}
56+
{{- if .Values.LLM_ENDPOINT }}
57+
LLM_ENDPOINT: {{ tpl .Values.LLM_ENDPOINT . | quote }}
58+
{{- end }}
59+
{{- if .Values.LLM_MODEL_ID }}
60+
LLM_MODEL_ID: {{ .Values.LLM_MODEL_ID | quote }}
3261
{{- end }}
33-
HUGGINGFACEHUB_API_TOKEN: {{ .Values.global.HUGGINGFACEHUB_API_TOKEN | quote }}
3462
HF_HOME: "/tmp/.cache/huggingface"
63+
HF_TOKEN: {{ .Values.global.HUGGINGFACEHUB_API_TOKEN | quote }}
3564
{{- if .Values.global.HF_ENDPOINT }}
3665
HF_ENDPOINT: {{ .Values.global.HF_ENDPOINT | quote }}
3766
{{- end }}
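
The backend selection in this template can be read as a small decision procedure: validate the backend name, then use an explicit `LLM_ENDPOINT` if one is set, otherwise fall back to a release-scoped service name. The shell sketch below restates that logic outside of Helm for illustration only. The function name `resolve_endpoint` and its argument order are hypothetical and are not part of the chart.

```shell
#!/bin/sh
# Hypothetical helper (not part of the chart): mirrors the configmap logic.
# Usage: resolve_endpoint <release-name> <backend> <explicit-endpoint-or-empty>
resolve_endpoint() {
    release=$1
    backend=$2
    endpoint=$3

    # The template compares the backend string exactly, so "tgi" is rejected.
    case $backend in
        TGI)  suffix=tgi ;;
        vLLM) suffix=vllm ;;
        *)    echo "Invalid backend: $backend" >&2; return 1 ;;
    esac

    # An explicit endpoint wins; otherwise derive it from the release name.
    if [ -n "$endpoint" ]; then
        echo "$endpoint"
    else
        echo "http://${release}-${suffix}"
    fi
}

resolve_endpoint myrelease TGI ""
resolve_endpoint myrelease vLLM "http://custom-vllm:8080"
```

Because the comparison is case-sensitive, a backend value such as `tgi` takes the failure branch in this sketch, just as it would reach the `fail` call in the template.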

helm-charts/common/llm-uservice/templates/deployment.yaml

Lines changed: 31 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -28,8 +28,38 @@ spec:
2828
serviceAccountName: {{ include "llm-uservice.serviceAccountName" . }}
2929
securityContext:
3030
{{- toYaml .Values.podSecurityContext | nindent 8 }}
31+
initContainers:
32+
- name: wait-for-llm
33+
envFrom:
34+
- configMapRef:
35+
name: {{ include "llm-uservice.fullname" . }}-config
36+
{{- if .Values.global.extraEnvConfig }}
37+
- configMapRef:
38+
name: {{ .Values.global.extraEnvConfig }}
39+
optional: true
40+
{{- end }}
41+
securityContext:
42+
{{- toYaml .Values.securityContext | nindent 12 }}
43+
image: busybox:1.36
44+
command: ["sh", "-c"]
45+
args:
46+
- |
47+
proto=$(echo ${LLM_ENDPOINT} | sed -n 's/.*\(http[s]\?\):\/\/\([^ :]\+\):\?\([0-9]*\).*/\1/p');
48+
host=$(echo ${LLM_ENDPOINT} | sed -n 's/.*\(http[s]\?\):\/\/\([^ :]\+\):\?\([0-9]*\).*/\2/p');
49+
port=$(echo ${LLM_ENDPOINT} | sed -n 's/.*\(http[s]\?\):\/\/\([^ :]\+\):\?\([0-9]*\).*/\3/p');
50+
if [ -z "$port" ]; then
51+
port=80;
52+
[[ "$proto" = "https" ]] && port=443;
53+
fi;
54+
retry_count={{ .Values.retryCount | default 60 }};
55+
j=1;
56+
while ! nc -z ${host} ${port}; do
57+
[[ $j -ge ${retry_count} ]] && echo "ERROR: ${host}:${port} is NOT reachable in $j seconds!" && exit 1;
58+
j=$((j+1)); sleep 1;
59+
done;
60+
echo "${host}:${port} is reachable within $j seconds.";
3161
containers:
32-
- name: {{ .Release.Name }}
62+
- name: {{ .Chart.Name }}
3363
envFrom:
3464
- configMapRef:
3565
name: {{ include "llm-uservice.fullname" . }}-config

helm-charts/common/llm-uservice/templates/tests/test-pod.yaml

Lines changed: 12 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -17,26 +17,22 @@ spec:
1717
command: ['bash', '-c']
1818
args:
1919
- |
20+
{{- if contains "llm-docsum" .Values.image.repository }}
21+
url="http://{{ include "llm-uservice.fullname" . }}:{{ .Values.service.port }}/v1/docsum";
22+
body='{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.","max_tokens":17}';
23+
{{- else if contains "llm-faqgen" .Values.image.repository }}
24+
url="http://{{ include "llm-uservice.fullname" . }}:{{ .Values.service.port }}/v1/faqgen";
25+
body='{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.","max_tokens":17}';
26+
{{- else }}
27+
url="http://{{ include "llm-uservice.fullname" . }}:{{ .Values.service.port }}/v1/chat/completions";
28+
body='{"model": "{{ .Values.LLM_MODEL_ID }}", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}';
29+
{{- end }}
2030
max_retry=20;
2131
for ((i=1; i<=max_retry; i++)); do
22-
{{- if contains "llm-docsum-tgi" .Values.image.repository }}
23-
# Try with docsum endpoint
24-
curl http://{{ include "llm-uservice.fullname" . }}:{{ .Values.service.port }}/v1/chat/docsum -sS --fail-with-body \
25-
-X POST \
26-
-d '{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.","max_tokens":17}' \
27-
-H 'Content-Type: application/json' && break;
28-
{{- else if contains "llm-faqgen-tgi" .Values.image.repository }}
29-
# Try with faqgen endpoint
30-
curl http://{{ include "llm-uservice.fullname" . }}:{{ .Values.service.port }}/v1/faqgen -sS --fail-with-body \
32+
curl "$url" -sS --fail-with-body \
3133
-X POST \
32-
-d '{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.","max_tokens":17}' \
34+
-d "$body" \
3335
-H 'Content-Type: application/json' && break;
34-
{{- else }}
35-
curl http://{{ include "llm-uservice.fullname" . }}:{{ .Values.service.port }}/v1/chat/completions -sS --fail-with-body \
36-
-X POST \
37-
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
38-
-H 'Content-Type: application/json' && break;
39-
{{- end }}
4036
curlcode=$?
4137
if [[ $curlcode -eq 7 ]]; then sleep 10; else echo "curl failed with code $curlcode"; exit 1; fi;
4238
done;
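
The test pod retries only when `curl` exits with code 7 (failed to connect) and treats any other failure as fatal. The sketch below isolates that policy in a reusable function so it can be exercised without a running service. The name `retry_on_connect_failure` and the configurable delay are assumptions for illustration. Unlike the loop in the test pod, this sketch also returns a non-zero status when the retries are exhausted.

```shell
#!/bin/sh
# Hypothetical helper (not part of the chart): run a command, retrying only
# while it exits with code 7, the code curl uses for "failed to connect".
# Usage: retry_on_connect_failure <max-retry> <delay-seconds> <command...>
retry_on_connect_failure() {
    max_retry=$1
    delay=$2
    shift 2

    i=1
    while [ "$i" -le "$max_retry" ]; do
        "$@" && return 0
        code=$?               # exit status of the failed command
        if [ "$code" -ne 7 ]; then
            echo "command failed with code $code" >&2
            return 1
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "gave up after $max_retry attempts" >&2
    return 1
}
```

A call that matches the test pod's settings would look like `retry_on_connect_failure 20 10 curl "$url" -sS --fail-with-body -X POST -d "$body" -H 'Content-Type: application/json'`.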
