Commit e357dfc

✏️ Updated chatqna setup instructions for OpenVINO vLLM
1 parent 0d82e1c commit e357dfc

1 file changed

Lines changed: 30 additions & 9 deletions

helm-charts/chatqna/README.md

````diff
@@ -9,30 +9,50 @@ Helm chart for deploying ChatQnA service. ChatQnA depends on the following servi
 - [redis-vector-db](../common/redis-vector-db)
 - [reranking-usvc](../common/reranking-usvc)
 - [teirerank](../common/teirerank)
-- [llm-uservice](../common/llm-uservice)
-- [tgi](../common/tgi)
+
+Apart from above mentioned services, there are following conditional dependencies (out of which, one are required):
+
+1. If we want to use TGI as our inference service, following 2 services will be required:
+
+- [llm-uservice](../common/llm-uservice)
+- [tgi](../common/tgi)
+
+2. If we want to use OpenVINO vLLM inference service, following 2 services would be required:
+- [llm-vllm-uservice](../common/llm-vllm-uservice)
+- [vllm-openvino](../common/vllm-openvino)
+

 ## Installing the Chart

 To install the chart, run the following:

-```console
+```bash
 cd GenAIInfra/helm-charts/
 ./update_dependency.sh
 helm dependency update chatqna
 export HFTOKEN="insert-your-huggingface-token-here"
 export MODELDIR="/mnt/opea-models"
 export MODELNAME="Intel/neural-chat-7b-v3-3"
 helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set tgi.LLM_MODEL_ID=${MODELNAME}
+
 # To use Gaudi device
-#helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set tgi.LLM_MODEL_ID=${MODELNAME} -f chatqna/gaudi-values.yaml
+helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set tgi.LLM_MODEL_ID=${MODELNAME} -f chatqna/gaudi-values.yaml
+
 # To use Nvidia GPU
-#helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set tgi.LLM_MODEL_ID=${MODELNAME} -f chatqna/nv-values.yaml
+helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set tgi.LLM_MODEL_ID=${MODELNAME} -f chatqna/nv-values.yaml
+
+
+# To use OpenVINO vLLM inference engine on Xeon device
+
+helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set global.LLM_MODEL_ID=${MODELNAME} --set tags.tgi=false --set vllm-openvino.enabled=true
 ```

+
 ### IMPORTANT NOTE

-1. Make sure your `MODELDIR` exists on the node where your workload is schedueled so you can cache the downloaded model for next time use. Otherwise, set `global.modelUseHostPath` to 'null' if you don't want to cache the model.
+1. Make sure your `MODELDIR` exists on the node where your workload is scheduled so you can cache the downloaded model for next time use. Otherwise, set `global.modelUseHostPath` to 'null' if you don't want to cache the model.
+
+2. Please set `http_proxy`, `https_proxy` and `no_proxy` values while installing chart, if you are behind a proxy.

 ## Verify

````
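All four `helm install` commands in the install block use the same release name, `chatqna`, and Helm refuses to install a second release under a name that is still in use in the same namespace. Only one variant can therefore be installed at a time. The following is a minimal sketch of selecting exactly one backend and, when the proxy variables from the new note are set, forwarding them. The function name `build_install_cmd` and the backend labels are hypothetical, and the `global.http_proxy`, `global.https_proxy`, and `global.no_proxy` value keys are assumptions that should be checked against the chart's `values.yaml`. The function only prints the command, joined by spaces for display, so it can be reviewed before running.

```bash
#!/usr/bin/env bash
# Sketch: print (not run) a single `helm install` command for one backend.
# Flag sets are copied from the README change in this commit. The
# global.*_proxy keys are ASSUMED names, not confirmed by the commit.

build_install_cmd() {
  local backend="$1"
  local -a args=(helm install chatqna chatqna
    --set "global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN}"
    --set "global.modelUseHostPath=${MODELDIR}")

  case "$backend" in
    tgi)    args+=(--set "tgi.LLM_MODEL_ID=${MODELNAME}") ;;
    gaudi)  args+=(--set "tgi.LLM_MODEL_ID=${MODELNAME}" -f chatqna/gaudi-values.yaml) ;;
    nvidia) args+=(--set "tgi.LLM_MODEL_ID=${MODELNAME}" -f chatqna/nv-values.yaml) ;;
    vllm-openvino)
      args+=(--set "global.LLM_MODEL_ID=${MODELNAME}"
             --set tags.tgi=false --set vllm-openvino.enabled=true) ;;
    *) echo "unknown backend: ${backend}" >&2; return 1 ;;
  esac

  # Forward proxy settings only when they are set in the environment.
  local name value
  for name in http_proxy https_proxy no_proxy; do
    if [ -n "${!name:-}" ]; then
      value="${!name}"
      # Escape commas: helm's --set treats an unescaped comma as a separator.
      args+=(--set "global.${name}=${value//,/\\,}")
    fi
  done

  printf '%s\n' "${args[*]}"
}
```

With `HFTOKEN`, `MODELDIR`, and `MODELNAME` exported as in the README, `build_install_cmd vllm-openvino` prints the OpenVINO vLLM command, and `build_install_cmd gaudi` prints the Gaudi one.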
````diff
@@ -46,8 +66,9 @@ Run the command `kubectl port-forward svc/chatqna 8888:8888` to expose the servi

 Open another terminal and run the following command to verify the service if working:

-```console
+```bash
 curl http://localhost:8888/v1/chatqna \
+-X POST \
 -H "Content-Type: application/json" \
 -d '{"messages": "What is the revenue of Nike in 2023?"}'
 ```
````
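The first request after `kubectl port-forward` may fail if the inference backend is still starting, so a single `curl` call can give a misleading result. The sketch below retries the same POST request a few times before giving up. The function name `wait_for_chatqna` and the retry count, delay, and timeout defaults are illustrative assumptions, not part of the chart.

```bash
#!/usr/bin/env bash
# Sketch: retry the verification request from the README until it succeeds.
# Defaults (5 tries, 2 s delay, 60 s per-request timeout) are assumptions.

wait_for_chatqna() {
  local url="${1:-http://localhost:8888/v1/chatqna}"
  local tries="${2:-5}"
  local delay="${3:-2}"
  local i

  for ((i = 1; i <= tries; i++)); do
    # --fail makes curl return non-zero on HTTP errors (4xx/5xx).
    if curl --silent --fail --max-time 60 "$url" \
         -X POST \
         -H "Content-Type: application/json" \
         -d '{"messages": "What is the revenue of Nike in 2023?"}'; then
      return 0
    fi
    sleep "$delay"
  done

  echo "chatqna did not answer after ${tries} attempts" >&2
  return 1
}
```

Calling `wait_for_chatqna` with no arguments targets the port-forwarded endpoint used in the README; a non-zero exit status means every attempt failed.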
````diff
@@ -71,7 +92,6 @@ docker save -o ui.tar opea/chatqna-conversation-ui:latest
 sudo ctr -n k8s.io image import ui.tar

 # install UI using helm chart. Replace image tag if required
-cd
 cd GenAIInfra/helm-charts/
 helm install ui common/chatqna-ui --set BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna",DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep",image.tag="latest"

````
````diff
@@ -88,4 +108,5 @@ Access `http://localhost:5174` to play with the ChatQnA workload through UI.
 | image.repository | string | `"opea/chatqna"` | |
 | service.port | string | `"8888"` | |
 | tgi.LLM_MODEL_ID | string | `"Intel/neural-chat-7b-v3-3"` | Models id from https://huggingface.co/, or predownloaded model directory |
-| global.horizontalPodAutoscaler.enabled | bop; | false | HPA autoscaling for the TGI and TEI service deployments based on metrics they provide. See HPA section in ../README.md before enabling! |
+| vllm-openvino.LLM_MODEL_ID | string | `"Intel/neural-chat-7b-v3-3"` | Models id from https://huggingface.co/, or predownloaded model directory |
+| global.horizontalPodAutoscaler.enabled | bool | false | HPA autoscaling for the TGI and TEI service deployments based on metrics they provide. See HPA section in ../README.md before enabling! |
````
