RUN curl -sL 'https://keyserver.ubuntu.com/pks/lookup?fingerprint=on&op=get&search=0x0C0E6AF955CE463C03FC51574D098D70AFBE5E1F' | tee /etc/apt/trusted.gpg.d/driver.asc
EdgeCraftRAG/docker_compose/intel/gpu/arc/README.md
63 additions & 41 deletions
@@ -1,8 +1,10 @@
# Example Edge Craft Retrieval-Augmented Generation Deployment on Intel® Arc® Platform

-
This document outlines the deployment process for Edge Craft Retrieval-Augmented Generation service on Intel Arc server. This example includes the following sections:
+
[中文版](README_zh.md)

-
- [EdgeCraftRAG Quick Start Deployment](#edgecraftrag-quick-start-deployment): Demonstrates how to quickly deploy a Edge Craft Retrieval-Augmented Generation service/pipeline on Intel® Arc® platform.
+
This document outlines the deployment process for the Edge Craft Retrieval-Augmented Generation service on the Intel® Arc® Platform. This example includes the following sections:
+

+
- [EdgeCraftRAG Quick Start Deployment](#edgecraftrag-quick-start-deployment): Demonstrates how to quickly deploy an Edge Craft Retrieval-Augmented Generation service/pipeline on the Intel® Arc® Platform.
- [EdgeCraftRAG Docker Compose Files](#edgecraftrag-docker-compose-files): Describes some example deployments and their Docker Compose files.
- [EdgeCraftRAG Service Configuration](#edgecraftrag-service-configuration): Describes the service and possible configuration changes.
@@ -20,15 +22,22 @@ This section describes how to quickly deploy and test the EdgeCraftRAG service m

### 1. Prerequisites

-
EC-RAG supports vLLM deployment(default method) and local OpenVINO deployment for Intel Arc GPU. Prerequisites are shown as below:
-
Hardware: Intel Arc A770
-
OS: Ubuntu Server 22.04.1 or newer (at least 6.2 LTS kernel)
-
Driver & libraries: please to [Installing GPUs Drivers](https://dgpu-docs.intel.com/driver/installation-rolling.html#installing-gpu-drivers) for detailed driver & libraries setup
+
EC-RAG supports vLLM deployment (the default method) and local OpenVINO deployment on Intel Arc GPUs and the Core Ultra platform. The prerequisites are listed below:
+

+
#### Core Ultra
+
**OS**: Ubuntu 24.04 or newer
+
**Driver & libraries**: Please refer to [Installing Client GPUs on Ubuntu Desktop](https://dgpu-docs.intel.com/driver/client/overview.html#installing-client-gpus-on-ubuntu-desktop)
+
**Available Inferencing Framework**: OpenVINO

-
Hardware: Intel Arc B60
-
please to [Install Native Environment](https://github.com/intel/llm-scaler/tree/main/vllm#11-install-native-environment) for detailed setup
+
#### Intel Arc B60
+
**OS**: Ubuntu 25.04 Desktop (for Core Ultra and Xeon-W), Ubuntu 25.04 Server (for Xeon-SP).
+
**Driver & libraries**: Please refer to [Install Bare Metal Environment](https://github.com/intel/llm-scaler/tree/main/vllm#11-install-bare-metal-environment) for detailed setup
The steps below use **vLLM** as the inference engine. To use **OpenVINO** instead, please refer to [OpenVINO Local Inference](../../../../docs/Advanced_Setup.md#openvino-local-inference).
+
#### Intel Arc A770
+
**OS**: Ubuntu Server 22.04.1 or newer (at least 6.2 LTS kernel)
+
**Driver & libraries**: Please refer to [Installing GPUs Drivers](https://dgpu-docs.intel.com/driver/installation-rolling.html#installing-gpu-drivers) for detailed driver & libraries setup
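The A770 prerequisite above calls for at least a 6.2 kernel. A minimal sketch of checking that from a shell, assuming the release string reported by `uname -r` starts with `major.minor`; `kernel_at_least` is a hypothetical helper written for illustration and is not part of EC-RAG:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of EC-RAG): succeeds when the kernel release
# is at least the required major.minor version.
# Usage: kernel_at_least <required major.minor> [release string]
kernel_at_least() {
  local required="$1"
  local release="${2:-$(uname -r)}"   # e.g. 6.8.0-45-generic

  local req_major="${required%%.*}"
  local req_minor="${required#*.}"
  local cur_major="${release%%.*}"
  local rest="${release#*.}"
  local cur_minor="${rest%%[!0-9]*}"  # keep only the leading digits
  cur_minor="${cur_minor:-0}"

  if (( cur_major > req_major )); then
    return 0
  fi
  (( cur_major == req_major && cur_minor >= req_minor ))
}

if kernel_at_least 6.2; then
  echo "Kernel $(uname -r) meets the 6.2 minimum"
else
  echo "Kernel $(uname -r) is older than 6.2" >&2
fi
```

Passing a release string as the second argument, for example `kernel_at_least 6.2 5.15.0-91-generic`, checks a value other than the running kernel.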
For more advanced environment variables and configurations, please refer to [Prepare env variables for vLLM deployment](../../../../docs/Advanced_Setup.md#prepare-env-variables-for-vllm-deployment)

-
### 5. Deploy the Service on Intel GPU Using Docker Compose
-

-
set Milvus DB and chat history round for inference:
+
Set the Milvus DB option and the chat history round for inference:

```bash
# EC-RAG supports Milvus as a persistent database. Milvus is disabled by default; set MILVUS_ENABLED=1 to enable it
@@ -107,37 +137,29 @@ export MILVUS_ENABLED=0
# export CHAT_HISTORY_ROUND= # change to your preference
```
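Before launching, it can help to sanity-check the two variables above. A minimal sketch, assuming `MILVUS_ENABLED` only takes the values `0` or `1` (the two values shown above) and that `CHAT_HISTORY_ROUND`, when set, is a non-negative integer (an assumption, since the accepted range is not stated here); `check_ecrag_env` is a hypothetical helper, not part of EC-RAG:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of EC-RAG): validate the variables that the
# compose file is expected to read from the environment.
check_ecrag_env() {
  local milvus="${MILVUS_ENABLED:-0}"     # the README's default is 0 (disabled)
  local rounds="${CHAT_HISTORY_ROUND:-}"  # optional; empty means "not set"

  case "$milvus" in
    0|1) ;;
    *) echo "MILVUS_ENABLED must be 0 or 1, got '$milvus'" >&2; return 1 ;;
  esac

  # Assumption: a set CHAT_HISTORY_ROUND should contain digits only.
  case "$rounds" in
    *[!0-9]*) echo "CHAT_HISTORY_ROUND must be a non-negative integer, got '$rounds'" >&2; return 1 ;;
  esac

  echo "MILVUS_ENABLED=$milvus CHAT_HISTORY_ROUND=${rounds:-<unset>}"
}

check_ecrag_env || echo "Fix the variables above before running docker compose" >&2
```

The check only reads the environment, so it can be run repeatedly while adjusting the exports.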

-
#### option a. Deploy the Service on Arc A770 Using Docker Compose
+
### 5. Deploy the Service on Intel GPU Using Docker Compose

-
```bash
-
export VLLM_SERVICE_PORT_A770=8086 # You can set your own port for vllm service
+
#### Option a. Deploy OpenVINO LLM based EC-RAG for Core Ultra, Arc B60, Arc A770

-
# Launch EC-RAG service with compose
-
docker compose --profile a770 -f docker_compose/intel/gpu/arc/compose.yaml up -d
+
Make sure you have prepared [OpenVINO models](#openvino)
+
```bash
+
docker compose -f docker_compose/intel/gpu/arc/compose.yaml up -d
```

-
#### option b. Deploy the Service on Arc B60 Using Docker Compose
+
#### Option b.1. Deploy vLLM based EC-RAG for Arc B60
+
Make sure you have prepared [vLLM models](#vllm)

```bash
-
# Besides MILVUS_ENABLED and CHAT_HISTORY_ROUND, below environments are exposed for vLLM config, you can change them to your preference:
-
# export VLLM_SERVICE_PORT_B60=8086
-
# export DTYPE=float16
-
# export TP=1 # for multi GPU, you can change TP value
-
# export DP=1
-
# export ZE_AFFINITY_MASK=0 # for multi GPU, you can export ZE_AFFINITY_MASK=0,1,2...
-
# export ENFORCE_EAGER=1
-
# export TRUST_REMOTE_CODE=1
-
# export DISABLE_SLIDING_WINDOW=1
-
# export GPU_MEMORY_UTIL=0.8
-
# export NO_ENABLE_PREFIX_CACHING=1
-
# export MAX_NUM_BATCHED_TOKENS=8192
-
# export DISABLE_LOG_REQUESTS=1
-
# export MAX_MODEL_LEN=49152
-
# export BLOCK_SIZE=64
-
# export QUANTIZATION=fp8
docker compose --profile b60 -f docker_compose/intel/gpu/arc/compose.yaml up -d
```
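The deployment options in this step differ only in the `--profile` value, so follow-up commands such as a status check need the same flag and compose file. A rough sketch of a small wrapper that keeps them consistent; `ecrag_compose` and `DRY_RUN` are hypothetical names introduced here, while `ps` and `logs --tail` are standard `docker compose` subcommands:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of EC-RAG): run a docker compose subcommand
# against the EC-RAG compose file, optionally with a profile.
# Usage: ecrag_compose <profile or ""> <subcommand> [args...]
ecrag_compose() {
  local profile="$1"
  shift
  local -a cmd=(docker compose)
  if [ -n "$profile" ]; then
    cmd+=(--profile "$profile")
  fi
  cmd+=(-f docker_compose/intel/gpu/arc/compose.yaml "$@")

  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"   # print the command instead of running it
  else
    "${cmd[@]}"
  fi
}

# Preview the status command for the b60 profile without invoking Docker.
DRY_RUN=1 ecrag_compose b60 ps
```

Dropping `DRY_RUN=1` runs the command for real, for example `ecrag_compose b60 logs --tail 50` to view recent container output, or `ecrag_compose "" ps` for the deployment started without a profile.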

+
#### Option b.2. Deploy vLLM based EC-RAG for Arc A770
+
Make sure you have prepared [vLLM models](#vllm)
+

+
```bash
+
docker compose --profile a770 -f docker_compose/intel/gpu/arc/compose.yaml up -d