Commit 8202313

add new feature for EC-RAG
Signed-off-by: Yongbozzz <yongbo.zhu@intel.com>
1 parent a02931e commit 8202313

76 files changed

Lines changed: 3738 additions & 2593 deletions

EdgeCraftRAG/Dockerfile.server

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -6,7 +6,7 @@ RUN apt-get remove -y libze-intel-gpu1 libigc1 libigdfcl1 libze-dev || true; \
66
apt-get update; \
77
apt-get install -y curl
88
RUN curl -sL 'https://keyserver.ubuntu.com/pks/lookup?fingerprint=on&op=get&search=0x0C0E6AF955CE463C03FC51574D098D70AFBE5E1F' | tee /etc/apt/trusted.gpg.d/driver.asc
9-
RUN echo -e "Types: deb\nURIs: https://ppa.launchpadcontent.net/kobuk-team/intel-graphics/ubuntu/\nSuites: plucky\nComponents: main\nSigned-By: /etc/apt/trusted.gpg.d/driver.asc" > /etc/apt/sources.list.d/driver.sources
9+
RUN echo -e "Types: deb\nURIs: https://ppa.launchpadcontent.net/kobuk-team/intel-graphics/ubuntu/\nSuites: questing\nComponents: main\nSigned-By: /etc/apt/trusted.gpg.d/driver.asc" > /etc/apt/sources.list.d/driver.sources
1010
RUN apt-get update && apt-get install -y libze-intel-gpu1 libze1 intel-metrics-discovery intel-opencl-icd clinfo intel-gsc && apt-get install -y libze-intel-gpu1 libze1 intel-metrics-discovery intel-opencl-icd clinfo intel-gsc && apt-get install -y libze-dev intel-ocloc libze-intel-gpu-raytracing
1111

1212
RUN useradd -m -s /bin/bash user && \
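Review note on the `driver.sources` line in this hunk: Docker `RUN` executes through `/bin/sh` unless a `SHELL` instruction overrides it (whether this Dockerfile does so is not visible here), and some `/bin/sh` implementations, dash for example, print a literal `-e` for `echo -e`. The sketch below shows one portable way to emit the same deb822 stanza with `printf`. The helper name `write_intel_graphics_source` is hypothetical and not part of the commit.

```shell
# Hypothetical helper (not in the commit): emit the deb822 source stanza
# for a given Ubuntu suite using printf, which behaves the same in sh and bash.
write_intel_graphics_source() {
  suite="$1"
  printf '%s\n' \
    "Types: deb" \
    "URIs: https://ppa.launchpadcontent.net/kobuk-team/intel-graphics/ubuntu/" \
    "Suites: ${suite}" \
    "Components: main" \
    "Signed-By: /etc/apt/trusted.gpg.d/driver.asc"
}

# Print the stanza for the suite this commit switches to.
write_intel_graphics_source questing
```

In a Dockerfile the output would be redirected to `/etc/apt/sources.list.d/driver.sources`, as the original `RUN` line does.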

EdgeCraftRAG/docker_compose/intel/gpu/arc/README.md

Lines changed: 63 additions & 41 deletions
@@ -1,8 +1,10 @@
11
# Example Edge Craft Retrieval-Augmented Generation Deployment on Intel® Arc® Platform
22

3-
This document outlines the deployment process for Edge Craft Retrieval-Augmented Generation service on Intel Arc server. This example includes the following sections:
3+
[中文版](README_zh.md)
44

5-
- [EdgeCraftRAG Quick Start Deployment](#edgecraftrag-quick-start-deployment): Demonstrates how to quickly deploy a Edge Craft Retrieval-Augmented Generation service/pipeline on Intel® Arc® platform.
5+
This document outlines the deployment process for Edge Craft Retrieval-Augmented Generation service on Intel® Arc® Platform. This example includes the following sections:
6+
7+
- [EdgeCraftRAG Quick Start Deployment](#edgecraftrag-quick-start-deployment): Demonstrates how to quickly deploy an Edge Craft Retrieval-Augmented Generation service/pipeline on Intel® Arc® Platform.
68
- [EdgeCraftRAG Docker Compose Files](#edgecraftrag-docker-compose-files): Describes some example deployments and their docker compose files.
79
- [EdgeCraftRAG Service Configuration](#edgecraftrag-service-configuration): Describes the service and possible configuration changes.
810

@@ -20,15 +22,22 @@ This section describes how to quickly deploy and test the EdgeCraftRAG service m
2022

2123
### 1. Prerequisites
2224

23-
EC-RAG supports vLLM deployment(default method) and local OpenVINO deployment for Intel Arc GPU. Prerequisites are shown as below:
24-
Hardware: Intel Arc A770
25-
OS: Ubuntu Server 22.04.1 or newer (at least 6.2 LTS kernel)
26-
Driver & libraries: please to [Installing GPUs Drivers](https://dgpu-docs.intel.com/driver/installation-rolling.html#installing-gpu-drivers) for detailed driver & libraries setup
25+
EC-RAG supports vLLM deployment (the default method) and local OpenVINO deployment on Intel Arc GPU and Core Ultra platforms. The prerequisites are listed below:
26+
27+
#### Core Ultra
28+
**OS**: Ubuntu 24.04 or newer
29+
**Driver & libraries**: Please refer to [Installing Client GPUs on Ubuntu Desktop](https://dgpu-docs.intel.com/driver/client/overview.html#installing-client-gpus-on-ubuntu-desktop)
30+
**Available Inferencing Framework**: openVINO
2731

28-
Hardware: Intel Arc B60
29-
please to [Install Native Environment](https://github.com/intel/llm-scaler/tree/main/vllm#11-install-native-environment) for detailed setup
32+
#### Intel Arc B60
33+
**OS**: Ubuntu 25.04 Desktop (for Core Ultra and Xeon-W), Ubuntu 25.04 Server (for Xeon-SP).
34+
**Driver & libraries**: Please refer to [Install Bare Metal Environment](https://github.com/intel/llm-scaler/tree/main/vllm#11-install-bare-metal-environment) for detailed setup
35+
**Available Inferencing Framework**: openVINO, vLLM
3036

31-
Below steps are based on **vLLM** as inference engine, if you want to choose **OpenVINO**, please refer to [OpenVINO Local Inference](../../../../docs/Advanced_Setup.md#openvino-local-inference)
37+
#### Intel Arc A770
38+
**OS**: Ubuntu Server 22.04.1 or newer (at least 6.2 LTS kernel)
39+
**Driver & libraries**: Please refer to [Installing GPUs Drivers](https://dgpu-docs.intel.com/driver/installation-rolling.html#installing-gpu-drivers) for detailed driver & libraries setup
40+
**Available Inferencing Framework**: openVINO, vLLM
3241

3342
### 2. Access the Code
3443

@@ -39,23 +48,46 @@ git clone https://github.com/opea-project/GenAIExamples.git
3948
cd GenAIExamples/EdgeCraftRAG
4049
```
4150

42-
Checkout a released version, such as v1.5:
43-
44-
```
45-
git checkout v1.5
46-
```
51+
> **NOTE**: If you want to check out a released version, such as v1.5:
52+
>
53+
>```
54+
>git checkout v1.5
55+
>```
4756
4857
### 3. Prepare models
4958
59+
Three models need to be prepared: **Embedding**, **Reranking**, and **LLM**.
60+
You'll need to decide on the inferencing framework for these models.
61+
62+
#### Embedding and Reranking
63+
64+
Embedding and reranking are usually served by local OpenVINO inferencing. To prepare these 2 models:
65+
5066
```bash
5167
# Prepare models for embedding, reranking:
5268
export MODEL_PATH="${PWD}/models" # Your model path for embedding, reranking and LLM models
5369
mkdir -p $MODEL_PATH
5470
pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
5571
optimum-cli export openvino -m BAAI/bge-small-en-v1.5 ${MODEL_PATH}/BAAI/bge-small-en-v1.5 --task sentence-similarity
5672
optimum-cli export openvino -m BAAI/bge-reranker-large ${MODEL_PATH}/BAAI/bge-reranker-large --task text-classification
73+
```
74+
75+
#### LLM
76+
77+
##### openVINO
78+
If you only have a Core Ultra platform, please prepare openVINO models.
79+
You can also run openVINO models on a discrete GPU. Export the model as follows:
80+
81+
```bash
82+
# Prepare LLM model for openVINO
83+
optimum-cli export openvino --model Qwen/Qwen3-8B ${MODEL_PATH}/Qwen/Qwen3-8B/INT4_compressed_weights --task text-generation-with-past --weight-format int4 --group-size 128 --ratio 0.8
84+
```
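Review note: a quick sanity check after the exports can catch a failed or partial conversion before the containers start. The sketch below assumes each `optimum-cli export openvino` output directory contains `openvino_model.xml` and `openvino_model.bin`; that file layout is an assumption about the exporter, not something stated in this README. The helper name `has_openvino_ir` is hypothetical.

```shell
# Hypothetical helper: succeed only if DIR looks like an exported OpenVINO model.
# Assumption: the export writes openvino_model.xml and openvino_model.bin.
has_openvino_ir() {
  dir="$1"
  [ -f "${dir}/openvino_model.xml" ] && [ -f "${dir}/openvino_model.bin" ]
}

# Same default as the README uses when MODEL_PATH is not already exported.
MODEL_PATH="${MODEL_PATH:-${PWD}/models}"

# Check the three export targets named in the README.
for d in \
  "${MODEL_PATH}/BAAI/bge-small-en-v1.5" \
  "${MODEL_PATH}/BAAI/bge-reranker-large" \
  "${MODEL_PATH}/Qwen/Qwen3-8B/INT4_compressed_weights"
do
  if has_openvino_ir "$d"; then
    echo "ok:      $d"
  else
    echo "missing: $d"
  fi
done
```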
5785

58-
# Prepare LLM model
86+
##### vLLM
87+
Alternatively, if you have a discrete GPU and want to use vLLM, please prepare models for vLLM:
88+
89+
```bash
90+
# Prepare LLM model for vLLM
5991
export LLM_MODEL="Qwen/Qwen3-8B" # Your model id
6092
pip install modelscope
6193
modelscope download --model $LLM_MODEL --local_dir "${MODEL_PATH}/${LLM_MODEL}"
@@ -66,7 +98,7 @@ modelscope download --model $LLM_MODEL --local_dir "${MODEL_PATH}/${LLM_MODEL}"
6698

6799
### 4. Prepare env variables and configurations
68100

69-
#### Prepare env variables for vLLM deployment
101+
#### Prepare env variables for deployment
70102

71103
```bash
72104
ip_address=$(hostname -I | awk '{print $1}')
@@ -93,9 +125,7 @@ chown 1000:1000 -R $HOME/.cache
93125

94126
For more advanced env variables and configurations, please refer to [Prepare env variables for vLLM deployment](../../../../docs/Advanced_Setup.md#prepare-env-variables-for-vllm-deployment)
95127

96-
### 5. Deploy the Service on Intel GPU Using Docker Compose
97-
98-
set Milvus DB and chat history round for inference:
128+
Set Milvus DB and chat history round for inference:
99129

100130
```bash
101131
# EC-RAG supports Milvus as a persistent database. Milvus is disabled by default; set MILVUS_ENABLED=1 to enable it
@@ -107,37 +137,29 @@ export MILVUS_ENABLED=0
107137
# export CHAT_HISTORY_ROUND= # change to your preference
108138
```
109139

110-
#### option a. Deploy the Service on Arc A770 Using Docker Compose
140+
### 5. Deploy the Service on Intel GPU Using Docker Compose
111141

112-
```bash
113-
export VLLM_SERVICE_PORT_A770=8086 # You can set your own port for vllm service
142+
#### Option a. Deploy openVINO LLM based EC-RAG for Core Ultra, Arc B60, Arc A770
114143

115-
# Launch EC-RAG service with compose
116-
docker compose --profile a770 -f docker_compose/intel/gpu/arc/compose.yaml up -d
144+
Make sure you have prepared [openVINO models](#openvino)
145+
```bash
146+
docker compose -f docker_compose/intel/gpu/arc/compose.yaml up -d
117147
```
118148

119-
#### option b. Deploy the Service on Arc B60 Using Docker Compose
149+
#### Option b.1. Deploy vLLM based EC-RAG for Arc B60
150+
Make sure you have prepared [vLLM models](#vllm)
120151

121152
```bash
122-
# Besides MILVUS_ENABLED and CHAT_HISTORY_ROUND, below environments are exposed for vLLM config, you can change them to your preference:
123-
# export VLLM_SERVICE_PORT_B60=8086
124-
# export DTYPE=float16
125-
# export TP=1 # for multi GPU, you can change TP value
126-
# export DP=1
127-
# export ZE_AFFINITY_MASK=0 # for multi GPU, you can export ZE_AFFINITY_MASK=0,1,2...
128-
# export ENFORCE_EAGER=1
129-
# export TRUST_REMOTE_CODE=1
130-
# export DISABLE_SLIDING_WINDOW=1
131-
# export GPU_MEMORY_UTIL=0.8
132-
# export NO_ENABLE_PREFIX_CACHING=1
133-
# export MAX_NUM_BATCHED_TOKENS=8192
134-
# export DISABLE_LOG_REQUESTS=1
135-
# export MAX_MODEL_LEN=49152
136-
# export BLOCK_SIZE=64
137-
# export QUANTIZATION=fp8
138153
docker compose --profile b60 -f docker_compose/intel/gpu/arc/compose.yaml up -d
139154
```
140155

156+
#### Option b.2. Deploy vLLM based EC-RAG for Arc A770
157+
Make sure you have prepared [vLLM models](#vllm)
158+
159+
```bash
160+
docker compose --profile a770 -f docker_compose/intel/gpu/arc/compose.yaml up -d
161+
```
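Review note: the three deployment options in section 5 differ only in the `--profile` flag. The sketch below maps a framework/GPU pair to the matching command from the README. The helper name `ecrag_compose_cmd` and its argument names are hypothetical; only the printed commands come from the README.

```shell
# Hypothetical helper: print the compose command for a framework/GPU pair.
# Usage: ecrag_compose_cmd <openvino|vllm> [b60|a770]
ecrag_compose_cmd() {
  framework="$1"
  gpu="${2:-}"
  compose_file="docker_compose/intel/gpu/arc/compose.yaml"
  case "${framework}:${gpu}" in
    # Option a: openVINO needs no profile, regardless of the GPU.
    openvino:*) echo "docker compose -f ${compose_file} up -d" ;;
    # Option b.1: vLLM on Arc B60.
    vllm:b60)   echo "docker compose --profile b60 -f ${compose_file} up -d" ;;
    # Option b.2: vLLM on Arc A770.
    vllm:a770)  echo "docker compose --profile a770 -f ${compose_file} up -d" ;;
    *)
      echo "unsupported combination: ${framework} ${gpu}" >&2
      return 1
      ;;
  esac
}

# Print (not run) the command for vLLM on Arc B60.
ecrag_compose_cmd vllm b60
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without Docker.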
162+
141163
### 6. Access UI
142164

143165
Open your browser, access http://${HOST_IP}:8082
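Review note: the containers may need some time after `up -d` before the UI answers. The sketch below polls the UI address until it responds. It assumes the UI returns a successful HTTP status on its root path, which the README does not state; the helper name `wait_for_ui` and its defaults (30 attempts, 2 seconds apart) are hypothetical.

```shell
# Hypothetical helper: poll URL until it answers, or give up after N attempts.
# Usage: wait_for_ui <url> [attempts] [delay_seconds]
wait_for_ui() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failures; -sS: quiet but still show errors.
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      echo "UI is reachable at ${url}"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "UI did not respond at ${url} after ${attempts} attempts" >&2
  return 1
}

# Example (port 8082 is taken from the README):
# wait_for_ui "http://${HOST_IP}:8082"
```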
Lines changed: 209 additions & 0 deletions
@@ -0,0 +1,209 @@
1+
# Example Edge Craft Retrieval-Augmented Generation (EC-RAG) Deployment on Intel® Arc® Platform
2+
3+
[English](README.md)
4+
5+
This document outlines the deployment process for the Edge Craft Retrieval-Augmented Generation service on Intel® Arc® Platform. This example includes the following sections:
6+
7+
- [EdgeCraftRAG Quick Start Deployment](#edgecraftrag-quick-start-deployment): Demonstrates how to quickly deploy an Edge Craft Retrieval-Augmented Generation service/pipeline on Intel® Arc® Platform.
8+
- [EdgeCraftRAG Docker Compose Files](#edgecraftrag-docker-compose-files): Describes some example deployments and their docker compose files.
9+
- [EdgeCraftRAG Service Configuration](#edgecraftrag-service-configuration): Describes the service and possible configuration changes.
10+
11+
## EdgeCraftRAG Quick Start Deployment
12+
13+
This section describes how to quickly deploy and test the EdgeCraftRAG service manually on Intel® Arc® Platform. The basic steps are:
14+
15+
1. [Prerequisites](#1-prerequisites)
16+
2. [Access the Code](#2-access-the-code)
17+
3. [Prepare models](#3-prepare-models)
18+
4. [Prepare env variables and configurations](#4-prepare-env-variables-and-configurations)
19+
5. [Deploy the Service on Intel GPU Using Docker Compose](#5-deploy-the-service-on-intel-gpu-using-docker-compose)
20+
6. [Access UI](#6-access-ui)
21+
7. [Cleanup the Deployment](#7-cleanup-the-deployment)
22+
23+
### 1. Prerequisites
24+
25+
EC-RAG supports vLLM deployment (the default method) and local OpenVINO deployment on Intel Arc GPU and Core Ultra platforms. The prerequisites are listed below:
26+
27+
#### Core Ultra
28+
**OS**: Ubuntu 24.04 or newer
29+
**Driver & libraries**: Please refer to [Installing Client GPUs on Ubuntu Desktop](https://dgpu-docs.intel.com/driver/client/overview.html#installing-client-gpus-on-ubuntu-desktop)
30+
**Available Inferencing Framework**: openVINO
31+
32+
#### Intel Arc B60
33+
**OS**: Ubuntu 25.04 Desktop (for Core Ultra and Xeon-W), Ubuntu 25.04 Server (for Xeon-SP).
34+
**Driver & libraries**: Please refer to [Install Bare Metal Environment](https://github.com/intel/llm-scaler/tree/main/vllm#11-install-bare-metal-environment) for detailed setup
35+
**Available Inferencing Framework**: openVINO, vLLM
36+
37+
#### Intel Arc A770
38+
**OS**: Ubuntu Server 22.04.1 or newer (at least 6.2 LTS kernel)
39+
**Driver & libraries**: Please refer to [Installing GPUs Drivers](https://dgpu-docs.intel.com/driver/installation-rolling.html#installing-gpu-drivers) for detailed driver & libraries setup
40+
**Available Inferencing Framework**: openVINO, vLLM
41+
42+
### 2. Access the Code
43+
44+
Clone the GenAIExamples repository and change into the directory containing the EdgeCraftRAG Docker Compose files and supporting scripts for Intel® Arc® Platform:
45+
46+
```
47+
git clone https://github.com/opea-project/GenAIExamples.git
48+
cd GenAIExamples/EdgeCraftRAG
49+
```
50+
51+
> **NOTE**: If you want to check out a released version, such as v1.5:
52+
>
53+
>```
54+
>git checkout v1.5
55+
>```
56+
57+
### 3. Prepare models
58+
59+
Three models need to be prepared: **Embedding**, **Reranking**, and **LLM**.
60+
You'll need to decide on the inferencing framework for these models.
61+
62+
#### Embedding and Reranking
63+
64+
Embedding and reranking are usually served by local OpenVINO inferencing. To prepare these 2 models:
65+
66+
```bash
67+
# Prepare models for embedding, reranking:
68+
export MODEL_PATH="${PWD}/models" # Path for the embedding, reranking and LLM models
69+
mkdir -p $MODEL_PATH
70+
pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
71+
optimum-cli export openvino -m BAAI/bge-small-en-v1.5 ${MODEL_PATH}/BAAI/bge-small-en-v1.5 --task sentence-similarity
72+
optimum-cli export openvino -m BAAI/bge-reranker-large ${MODEL_PATH}/BAAI/bge-reranker-large --task text-classification
73+
```
74+
75+
#### LLM
76+
77+
##### openVINO
78+
If you only have a Core Ultra platform, please prepare openVINO models.
79+
You can also run openVINO models on a discrete GPU. Export the model as follows:
80+
81+
```bash
82+
# Prepare LLM model for openVINO
83+
optimum-cli export openvino --model Qwen/Qwen3-8B ${MODEL_PATH}/Qwen/Qwen3-8B/INT4_compressed_weights --task text-generation-with-past --weight-format int4 --group-size 128 --ratio 0.8
84+
```
85+
86+
##### vLLM
87+
Alternatively, if you have a discrete GPU, you can prepare models for vLLM:
88+
89+
```bash
90+
# Prepare LLM model for vLLM
91+
export LLM_MODEL="Qwen/Qwen3-8B" # Your model id
92+
pip install modelscope
93+
modelscope download --model $LLM_MODEL --local_dir "${MODEL_PATH}/${LLM_MODEL}"
94+
# Optional: you can also download the model with huggingface:
95+
# pip install -U huggingface_hub
96+
# huggingface-cli download $LLM_MODEL --local-dir "${MODEL_PATH}/${LLM_MODEL}"
97+
```
98+
99+
### 4. Prepare env variables and configurations
100+
101+
#### Prepare env variables for deployment
102+
103+
```bash
104+
ip_address=$(hostname -I | awk '{print $1}')
105+
# Use `ip a` to check your active ip
106+
export HOST_IP=$ip_address # Your host ip
107+
108+
# Check the group ids for video and render
109+
export VIDEOGROUPID=$(getent group video | cut -d: -f3)
110+
export RENDERGROUPID=$(getent group render | cut -d: -f3)
111+
112+
# If you have a proxy configured, execute the commands below
113+
export no_proxy=${no_proxy},${HOST_IP},edgecraftrag,edgecraftrag-server
114+
export NO_PROXY=${NO_PROXY},${HOST_IP},edgecraftrag,edgecraftrag-server
115+
# If you have an HF mirror configured, it will be passed into the container
116+
# export HF_ENDPOINT=https://hf-mirror.com # your HF mirror endpoint
117+
118+
# Make sure the following 3 folders are owned by 1000:1000; otherwise run:
119+
export DOC_PATH=${PWD}/tests
120+
export TMPFILE_PATH=${PWD}/tests
121+
chown 1000:1000 ${MODEL_PATH} ${DOC_PATH} ${TMPFILE_PATH}
122+
# Also make sure the .cache folder is owned by 1000:1000; otherwise run:
123+
chown 1000:1000 -R $HOME/.cache
124+
```
125+
126+
For more advanced env variables and configurations, please refer to [Prepare env variables for vLLM deployment](../../../../docs/Advanced_Setup.md#prepare-env-variables-for-vllm-deployment)
127+
128+
Set Milvus DB and chat history round for inference:
129+
130+
```bash
131+
# EC-RAG supports Milvus as a persistent database. It is disabled by default; set MILVUS_ENABLED=1 to enable it
132+
export MILVUS_ENABLED=0
133+
# If Milvus is enabled, the default storage path is PWD; uncomment the line below to change it:
134+
# export DOCKER_VOLUME_DIRECTORY= # change to your preference
135+
136+
# EC-RAG supports a chat history round setting. It is disabled by default; use CHAT_HISTORY_ROUND to control it
137+
# export CHAT_HISTORY_ROUND= # change to your preference
138+
```
139+
140+
### 5. Deploy the Service on Intel GPU Using Docker Compose
141+
142+
#### Option a. Deploy openVINO LLM based EC-RAG for Core Ultra, Arc B60, Arc A770
143+
144+
Make sure you have prepared [openVINO models](#openvino)
145+
146+
```bash
147+
docker compose -f docker_compose/intel/gpu/arc/compose.yaml up -d
148+
```
149+
150+
#### Option b.1. Deploy vLLM based EC-RAG for Arc B60
151+
152+
Make sure you have prepared [vLLM models](#vllm)
153+
154+
```bash
155+
docker compose --profile b60 -f docker_compose/intel/gpu/arc/compose.yaml up -d
156+
```
157+
158+
#### Option b.2. Deploy vLLM based EC-RAG for Arc A770
159+
160+
Make sure you have prepared [vLLM models](#vllm)
161+
162+
```bash
163+
docker compose --profile a770 -f docker_compose/intel/gpu/arc/compose.yaml up -d
164+
```
165+
166+
### 6. Access UI
167+
168+
Open your browser and access http://${HOST_IP}:8082
169+
170+
> Your browser should be running on the same host as your console; otherwise, you will need to access the UI with your host domain name instead of ${HOST_IP}.
171+
172+
Below is the UI front page. For detailed operations on the UI and EC-RAG settings, please refer to [Explore_Edge_Craft_RAG](../../../../docs/Explore_Edge_Craft_RAG.md)
173+
![front_page](../../../../assets/img/front_page.png)
174+
175+
### 7. Cleanup the Deployment
176+
177+
To stop the containers associated with the deployment, execute the following command:
178+
179+
```
180+
docker compose -f docker_compose/intel/gpu/arc/compose.yaml down
181+
```
182+
183+
After running the `down` command, all EdgeCraftRAG containers are stopped and removed.
184+
185+
## EdgeCraftRAG Docker Compose Files
186+
187+
`compose.yaml` is the default compose file, using tgi as the serving framework.
188+
189+
| Service Name        | Image Name                               |
190+
| ------------------- | ---------------------------------------- |
191+
| etcd                | quay.io/coreos/etcd:v3.5.5               |
192+
| minio               | minio/minio:RELEASE.2023-03-20T20-16-18Z |
193+
| milvus-standalone   | milvusdb/milvus:v2.4.6                   |
194+
| edgecraftrag-server | opea/edgecraftrag-server:latest          |
195+
| edgecraftrag-ui     | opea/edgecraftrag-ui:latest              |
196+
| ecrag               | opea/edgecraftrag:latest                 |
197+
198+
## EdgeCraftRAG Service Configuration
199+
200+
The table below provides a comprehensive overview of the EdgeCraftRAG services used across the deployments illustrated in the example Docker Compose files. Each row represents a distinct service, detailing the available images and a description of its function within the deployment architecture.
201+
202+
| Service Name        | Possible Image Names                     | Optional | Description                                                                                 |
203+
| ------------------- | ---------------------------------------- | -------- | ------------------------------------------------------------------------------------------- |
204+
| etcd                | quay.io/coreos/etcd:v3.5.5               |          | Provides distributed key-value storage for service discovery and configuration management. |
205+
| minio               | minio/minio:RELEASE.2023-03-20T20-16-18Z |          | Provides object storage services for storing documents and model files.                    |
206+
| milvus-standalone   | milvusdb/milvus:v2.4.6                   |          | Provides vector database capabilities for managing embeddings and similarity search.       |
207+
| edgecraftrag-server | opea/edgecraftrag-server:latest          |          | Serves as the backend for the EdgeCraftRAG service, with variations depending on the deployment. |
208+
| edgecraftrag-ui     | opea/edgecraftrag-ui:latest              |          | Provides the user interface for the EdgeCraftRAG service.                                   |
209+
| ecrag               | opea/edgecraftrag:latest                 |          | Acts as a reverse proxy, managing traffic between the UI and backend services.              |
