Merged
13 changes: 9 additions & 4 deletions EdgeCraftRAG/Dockerfile.server
@@ -6,7 +6,7 @@ RUN apt-get remove -y libze-intel-gpu1 libigc1 libigdfcl1 libze-dev || true; \
apt-get update; \
apt-get install -y curl
RUN curl -sL 'https://keyserver.ubuntu.com/pks/lookup?fingerprint=on&op=get&search=0x0C0E6AF955CE463C03FC51574D098D70AFBE5E1F' | tee /etc/apt/trusted.gpg.d/driver.asc
RUN echo -e "Types: deb\nURIs: https://ppa.launchpadcontent.net/kobuk-team/intel-graphics/ubuntu/\nSuites: plucky\nComponents: main\nSigned-By: /etc/apt/trusted.gpg.d/driver.asc" > /etc/apt/sources.list.d/driver.sources
RUN echo -e "Types: deb\nURIs: https://ppa.launchpadcontent.net/kobuk-team/intel-graphics/ubuntu/\nSuites: questing\nComponents: main\nSigned-By: /etc/apt/trusted.gpg.d/driver.asc" > /etc/apt/sources.list.d/driver.sources
RUN apt-get update && apt-get install -y libze-intel-gpu1 libze1 intel-metrics-discovery intel-opencl-icd clinfo intel-gsc && apt-get install -y libze-intel-gpu1 libze1 intel-metrics-discovery intel-opencl-icd clinfo intel-gsc && apt-get install -y libze-dev intel-ocloc libze-intel-gpu-raytracing

RUN useradd -m -s /bin/bash user && \
@@ -18,11 +18,13 @@ RUN mkdir /templates && \
COPY ./edgecraftrag/prompt_template/default_prompt.txt /templates/
RUN chown -R user /templates/default_prompt.txt

COPY ./edgecraftrag /home/user/edgecraftrag

RUN mkdir -p /home/user/ui_cache
RUN mkdir -p /home/user/ui_cache /home/user/edgecraftrag
ENV UI_UPLOAD_PATH=/home/user/ui_cache

# Copy requirements first so pip install is cached independently from source changes
COPY ./edgecraftrag/requirements.txt /home/user/edgecraftrag/requirements.txt
RUN chown -R user /home/user/edgecraftrag

USER user

WORKDIR /home/user/edgecraftrag
@@ -37,4 +39,7 @@ ENV PYTHONPATH="$PYTHONPATH:/home/user/genai/tools/llm_bench"

RUN python3 -m nltk.downloader -d /home/user/nltk_data punkt_tab averaged_perceptron_tagger_eng

# Copy the full source last — changes here no longer bust the pip cache layers above
COPY ./edgecraftrag /home/user/edgecraftrag

ENTRYPOINT ["python3", "-m", "edgecraftrag.server"]
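
The first hunk of this Dockerfile writes a deb822-style `driver.sources` entry with `echo -e`, and the only change there is the suite name (`plucky` to `questing`). Whether `echo -e` expands `\n` depends on which shell runs the `RUN` line, so a `printf`-based variant may be more predictable. The following is a minimal sketch, not the project's method; the function name `write_intel_sources` and the temporary output path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: emit a deb822 source stanza for the Intel graphics PPA.
# The suite name (for example "questing") is a parameter instead of being
# hard-coded, so a release bump only changes the call site.
write_intel_sources() {
    suite="$1"
    out="$2"
    # printf prints each argument on its own line, independent of echo flavor.
    printf '%s\n' \
        "Types: deb" \
        "URIs: https://ppa.launchpadcontent.net/kobuk-team/intel-graphics/ubuntu/" \
        "Suites: ${suite}" \
        "Components: main" \
        "Signed-By: /etc/apt/trusted.gpg.d/driver.asc" > "$out"
}

# Example: write to a temporary file and show the result.
tmp_sources="$(mktemp)"
write_intel_sources questing "$tmp_sources"
cat "$tmp_sources"
```

In a Dockerfile the same `printf` call could be placed directly in a `RUN` instruction with `/etc/apt/sources.list.d/driver.sources` as the target.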
7 changes: 3 additions & 4 deletions EdgeCraftRAG/README.md
@@ -7,10 +7,9 @@ quality and performance.

## What's New

1. Support Agent component and enable deep_search agent
2. Optimize pipeline execution performance with asynchronous api
3. Support session list display in UI
4. Support vllm-based embedding service
1. Support decoupled operation of pipeline and knowledge base
2. Improve the agentic workflow user experience
3. Enhance the User Guide

## Table of contents

Binary file removed EdgeCraftRAG/assets/img/kbadmin_index.png
Binary file not shown.
157 changes: 49 additions & 108 deletions EdgeCraftRAG/docker_compose/intel/gpu/arc/README.md
@@ -1,8 +1,10 @@
# Example Edge Craft Retrieval-Augmented Generation Deployment on Intel® Arc® Platform

This document outlines the deployment process for Edge Craft Retrieval-Augmented Generation service on Intel Arc server. This example includes the following sections:
[中文版](README_zh.md)

- [EdgeCraftRAG Quick Start Deployment](#edgecraftrag-quick-start-deployment): Demonstrates how to quickly deploy an Edge Craft Retrieval-Augmented Generation service/pipeline on Intel® Arc® platform.
This document outlines the deployment process for Edge Craft Retrieval-Augmented Generation service on Intel® Arc® Platform. This example includes the following sections:

- [EdgeCraftRAG Quick Start Deployment](#edgecraftrag-quick-start-deployment): Demonstrates how to quickly deploy an Edge Craft Retrieval-Augmented Generation service/pipeline on Intel® Arc® Platform.
- [EdgeCraftRAG Docker Compose Files](#edgecraftrag-docker-compose-files): Describes some example deployments and their docker compose files.
- [EdgeCraftRAG Service Configuration](#edgecraftrag-service-configuration): Describes the service and possible configuration changes.

@@ -12,23 +14,31 @@ This section describes how to quickly deploy and test the EdgeCraftRAG service m

1. [Prerequisites](#1-prerequisites)
2. [Access the Code](#2-access-the-code)
3. [Prepare models](#3-prepare-models)
4. [Prepare env variables and configurations](#4-prepare-env-variables-and-configurations)
5. [Deploy the Service on Arc GPU Using Docker Compose](#5-deploy-the-service-on-intel-gpu-using-docker-compose)
6. [Access UI](#6-access-ui)
7. [Cleanup the Deployment](#7-cleanup-the-deployment)
3. [Run quick_start.sh](#3-run-quick_startsh)
4. [Access UI](#4-access-ui)
5. [Cleanup the Deployment](#5-cleanup-the-deployment)

### 1. Prerequisites

EC-RAG supports vLLM deployment (default method) and local OpenVINO deployment for Intel Arc GPU. The prerequisites are listed below:
Hardware: Intel Arc A770
OS: Ubuntu Server 22.04.1 or newer (at least 6.2 LTS kernel)
Driver & libraries: please refer to [Installing GPUs Drivers](https://dgpu-docs.intel.com/driver/installation-rolling.html#installing-gpu-drivers) for detailed driver & libraries setup
EC-RAG supports vLLM deployment (default method) and local OpenVINO deployment for Intel Arc GPU and Core Ultra Platform. The prerequisites are listed below:

#### Core Ultra

**OS**: Ubuntu 24.04 or newer
**Driver & libraries**: Please refer to [Installing Client GPUs on Ubuntu Desktop](https://dgpu-docs.intel.com/driver/client/overview.html#installing-client-gpus-on-ubuntu-desktop)
**Available Inferencing Framework**: OpenVINO

#### Intel Arc B60

Hardware: Intel Arc B60
Please refer to [Install Native Environment](https://github.com/intel/llm-scaler/tree/main/vllm#11-install-native-environment) for detailed setup
**OS**: Ubuntu 25.04 Desktop (for Core Ultra and Xeon-W), Ubuntu 25.04 Server (for Xeon-SP).
**Driver & libraries**: Please refer to [Install Bare Metal Environment](https://github.com/intel/llm-scaler/tree/main/vllm#11-install-bare-metal-environment) for detailed setup
**Available Inferencing Framework**: OpenVINO, vLLM

The steps below use **vLLM** as the inference engine; if you want to use **OpenVINO** instead, please refer to [OpenVINO Local Inference](../../../../docs/Advanced_Setup.md#openvino-local-inference)
#### Intel Arc A770

**OS**: Ubuntu Server 22.04.1 or newer (at least 6.2 LTS kernel)
**Driver & libraries**: Please refer to [Installing GPUs Drivers](https://dgpu-docs.intel.com/driver/installation-rolling.html#installing-gpu-drivers) for detailed driver & libraries setup
**Available Inferencing Framework**: OpenVINO, vLLM
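
The Arc A770 prerequisites call for a kernel of at least 6.2. One way to check this before deploying is a small version comparison; the sketch below is illustrative only, assumes `sort -V` (GNU coreutils version sort) is available, and uses a hypothetical helper name `version_ge`.

```shell
#!/bin/sh
# version_ge A B: succeed when version A is greater than or equal to version B.
# Sorts the two versions and checks that B comes first (or they are equal).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Strip any suffix such as "-45-generic" from the running kernel release.
kernel_version="$(uname -r | cut -d- -f1)"

if version_ge "$kernel_version" "6.2"; then
    echo "kernel ${kernel_version}: meets the 6.2 minimum listed for Arc A770"
else
    echo "kernel ${kernel_version}: older than 6.2"
fi
```

The same helper could compare the Ubuntu release from `/etc/os-release` against the minimum listed for each platform.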

### 2. Access the Code

@@ -39,123 +49,54 @@ git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/EdgeCraftRAG
```

Checkout a released version, such as v1.5:

```
git checkout v1.5
```

### 3. Prepare models
> **NOTE**: If you want to checkout a released version, such as v1.5:
>
> ```
> git checkout v1.5
> ```

```bash
# Prepare models for embedding, reranking:
export MODEL_PATH="${PWD}/models" # Your model path for embedding, reranking and LLM models
mkdir -p $MODEL_PATH
pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
optimum-cli export openvino -m BAAI/bge-small-en-v1.5 ${MODEL_PATH}/BAAI/bge-small-en-v1.5 --task sentence-similarity
optimum-cli export openvino -m BAAI/bge-reranker-large ${MODEL_PATH}/BAAI/bge-reranker-large --task text-classification

# Prepare LLM model
export LLM_MODEL="Qwen/Qwen3-8B" # Your model id
pip install modelscope
modelscope download --model $LLM_MODEL --local_dir "${MODEL_PATH}/${LLM_MODEL}"
# Optionally, you can also download models with huggingface:
# pip install -U huggingface_hub
# huggingface-cli download $LLM_MODEL --local-dir "${MODEL_PATH}/${LLM_MODEL}"
```
### 3. Run quick_start.sh

### 4. Prepare env variables and configurations

#### Prepare env variables for vLLM deployment
Run quick start script from the `EdgeCraftRAG` root directory:

```bash
ip_address=$(hostname -I | awk '{print $1}')
# Use `ip a` to check your active ip
export HOST_IP=$ip_address # Your host ip

# Check group id of video and render
export VIDEOGROUPID=$(getent group video | cut -d: -f3)
export RENDERGROUPID=$(getent group render | cut -d: -f3)

# If you have a proxy configured, execute below line
export no_proxy=${no_proxy},${HOST_IP},edgecraftrag,edgecraftrag-server
export NO_PROXY=${NO_PROXY},${HOST_IP},edgecraftrag,edgecraftrag-server
# If you have a HF mirror configured, it will be imported to the container
# export HF_ENDPOINT=https://hf-mirror.com # your HF mirror endpoint"

# Make sure all 3 folders have 1000:1000 permission, otherwise
export DOC_PATH=${PWD}/tests
export TMPFILE_PATH=${PWD}/tests
chown 1000:1000 ${MODEL_PATH} ${DOC_PATH} ${TMPFILE_PATH}
# In addition, also make sure the .cache folder has 1000:1000 permission, otherwise
chown 1000:1000 -R $HOME/.cache
./tools/quick_start.sh
```

For more advanced env variables and configurations, please refer to [Prepare env variables for vLLM deployment](../../../../docs/Advanced_Setup.md#prepare-env-variables-for-vllm-deployment)
The script is located in the `tools` directory. For detailed usage of `quick_start.sh` and `build_images.sh`, please refer to [tools/README.md](../../../../tools/README.md).

### 5. Deploy the Service on Intel GPU Using Docker Compose
By default, this script starts local OpenVINO deployment when no environment variables are configured.

set Milvus DB and chat history round for inference:
If you prefer manual model preparation, env setup, and docker compose options, please refer to [Manual deployment details in Advanced Setup](../../../../docs/Advanced_Setup.md#manual-deployment-details-for-arc-platform).
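
For the manual path, the environment lines removed in this hunk resolve the `video` and `render` group IDs with `getent group ... | cut -d: -f3`. A slightly more defensive variant is sketched below. The helper names `gid_from_entry` and `lookup_gid` are hypothetical, and nothing here is claimed to match what `quick_start.sh` does internally.

```shell
#!/bin/sh
# gid_from_entry: extract the numeric group ID (third field) from a
# group-database line such as "video:x:44:alice".
gid_from_entry() {
    printf '%s\n' "$1" | cut -d: -f3
}

# lookup_gid: resolve a group name through getent; prints nothing if the
# group (or getent itself) is not available on this host.
lookup_gid() {
    entry="$(getent group "$1" 2>/dev/null || true)"
    [ -n "$entry" ] && gid_from_entry "$entry"
    return 0
}

VIDEOGROUPID="$(lookup_gid video)"
RENDERGROUPID="$(lookup_gid render)"
export VIDEOGROUPID RENDERGROUPID
echo "video=${VIDEOGROUPID:-<missing>} render=${RENDERGROUPID:-<missing>}"
```

An empty value signals that the group is missing, which is easier to diagnose up front than a container that starts without GPU device access.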

```bash
# EC-RAG supports Milvus as a persistent database. Milvus is disabled by default; set MILVUS_ENABLED=1 to enable it
export MILVUS_ENABLED=0
# If you enable Milvus, the default storage path is PWD, uncomment if you want to change:
# export DOCKER_VOLUME_DIRECTORY= # change to your preference

# EC-RAG supports a chat history round setting. Chat history is disabled by default; set CHAT_HISTORY_ROUND to control it
# export CHAT_HISTORY_ROUND= # change to your preference
```

#### option a. Deploy the Service on Arc A770 Using Docker Compose

```bash
export VLLM_SERVICE_PORT_A770=8086 # You can set your own port for vllm service

# Launch EC-RAG service with compose
docker compose --profile a770 -f docker_compose/intel/gpu/arc/compose.yaml up -d
```
Open your browser, access http://${HOST_IP}:8082

#### option b. Deploy the Service on Arc B60 Using Docker Compose
After startup completes, `quick_start.sh` will print:

```bash
# Besides MILVUS_ENABLED and CHAT_HISTORY_ROUND, below environments are exposed for vLLM config, you can change them to your preference:
# export VLLM_SERVICE_PORT_B60=8086
# export DTYPE=float16
# export TP=1 # for multi GPU, you can change TP value
# export DP=1
# export ZE_AFFINITY_MASK=0 # for multi GPU, you can export ZE_AFFINITY_MASK=0,1,2...
# export ENFORCE_EAGER=1
# export TRUST_REMOTE_CODE=1
# export DISABLE_SLIDING_WINDOW=1
# export GPU_MEMORY_UTIL=0.8
# export NO_ENABLE_PREFIX_CACHING=1
# export MAX_NUM_BATCHED_TOKENS=8192
# export DISABLE_LOG_REQUESTS=1
# export MAX_MODEL_LEN=49152
# export BLOCK_SIZE=64
# export QUANTIZATION=fp8
docker compose --profile b60 -f docker_compose/intel/gpu/arc/compose.yaml up -d
```text
Service launched successfully.
UI access URL: http://${HOST_IP}:8082
If you are accessing from another machine, replace ${HOST_IP} with your server's reachable IP or hostname.
```

### 6. Access UI

Open your browser, access http://${HOST_IP}:8082

> Your browser should be running on the same host as your console; otherwise, you will need to access the UI using your host's domain name instead of ${HOST_IP}.

Below is the UI front page, for detailed operations on UI and EC-RAG settings, please refer to [Explore_Edge_Craft_RAG](../../../../docs/Explore_Edge_Craft_RAG.md)
![front_page](../../../../assets/img/front_page.png)
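
Before opening the browser, it can help to wait until the UI port actually answers. The following retry helper is a generic sketch; the function name `retry` and the attempt and delay values are assumptions, and the `curl` call is left commented out because it needs a running deployment.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds, at most ATTEMPTS
# times, sleeping DELAY seconds between failed attempts.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Illustrative use (requires the services to be up and HOST_IP to be set):
# retry 30 2 curl -fsS -o /dev/null "http://${HOST_IP}:8082" && echo "UI is up"
```

The helper returns non-zero once the attempts are exhausted, so it can gate later steps in a script.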

### 7. Cleanup the Deployment
### 5. Cleanup the Deployment

To stop the containers associated with the deployment, execute the following command:
To stop the containers associated with the deployment, run the helper script:

```bash
./tools/quick_start.sh cleanup
```
docker compose -f docker_compose/intel/gpu/arc/compose.yaml down
```

All the EdgeCraftRAG containers will be stopped and then removed on completion of the "down" command.
All the EdgeCraftRAG containers will be stopped and then removed on completion.

If you prefer the manual docker compose cleanup command, please refer to [Manual cleanup details in Advanced Setup](../../../../docs/Advanced_Setup.md#6-cleanup-the-deployment-manual).

## EdgeCraftRAG Docker Compose Files

125 changes: 125 additions & 0 deletions EdgeCraftRAG/docker_compose/intel/gpu/arc/README_zh.md
@@ -0,0 +1,125 @@
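# Example Edge Craft Retrieval-Augmented Generation (EC-RAG) Deployment on Intel® Arc® Platform

[English](README.md)

This document outlines the deployment process for the Edge Craft Retrieval-Augmented Generation service on the Intel® Arc® Platform. This example includes the following sections:

- [EdgeCraftRAG Quick Start Deployment](#edgecraftrag-快速开始部署): Demonstrates how to quickly deploy an Edge Craft Retrieval-Augmented Generation service/pipeline on the Intel® Arc® Platform.
- [EdgeCraftRAG Docker Compose Files](#edgecraftrag-docker-compose-文件): Describes some example deployments and their docker compose files.
- [EdgeCraftRAG Service Configuration](#edgecraftrag-服务配置): Describes the service and possible configuration changes.

## EdgeCraftRAG Quick Start Deployment

This section describes how to quickly deploy and test the EdgeCraftRAG service manually on the Intel® Arc® Platform. The basic steps are:

1. [Prerequisites](#1-前置条件)
2. [Access the Code](#2-获取代码)
3. [Run quick_start.sh](#3-运行-quick_startsh)
4. [Access UI](#4-访问-ui)
5. [Cleanup the Deployment](#5-清理部署)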

### 1. Prerequisites

EC-RAG supports vLLM deployment (the default method) and local OpenVINO deployment for Intel Arc GPU and Core Ultra platforms. The prerequisites are as follows:

#### Core Ultra

**OS**: Ubuntu 24.04 or newer
**Driver & libraries**: Please refer to [Installing Client GPUs on Ubuntu Desktop](https://dgpu-docs.intel.com/driver/client/overview.html#installing-client-gpus-on-ubuntu-desktop)
**Available Inferencing Framework**: OpenVINO

#### Intel Arc B60

**OS**: Ubuntu 25.04 Desktop (for Core Ultra and Xeon-W), Ubuntu 25.04 Server (for Xeon-SP).
**Driver & libraries**: Please refer to [Install Bare Metal Environment](https://github.com/intel/llm-scaler/tree/main/vllm#11-install-bare-metal-environment) for detailed setup
**Available Inferencing Framework**: OpenVINO, vLLM

#### Intel Arc A770

**OS**: Ubuntu Server 22.04.1 or newer (at least 6.2 LTS kernel)
**Driver & libraries**: Please refer to [Installing GPUs Drivers](https://dgpu-docs.intel.com/driver/installation-rolling.html#installing-gpu-drivers) for detailed driver & libraries setup
**Available Inferencing Framework**: OpenVINO, vLLM

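### 2. Access the Code

Clone the GenAIExample repository and change into the directory containing the EdgeCraftRAG Docker Compose files and supporting scripts for the Intel® Arc® Platform:

```
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/EdgeCraftRAG
```

> **NOTE**: If you want to check out a released version, such as v1.5:
>
> ```
> git checkout v1.5
> ```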

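### 3. Run quick_start.sh

Run the quick start script from the `EdgeCraftRAG` root directory:

```bash
./tools/quick_start.sh
```

The script is located in the `tools` directory. For detailed usage of `quick_start.sh` and `build_images.sh`, please refer to [tools/README_zh.md](../../../../tools/README_zh.md).

When no environment variables are configured, the script starts the local OpenVINO deployment by default.

If you prefer the manual approach (model preparation, environment variable setup, Docker Compose startup), please refer to [Manual deployment details in Advanced Setup](../../../../docs/Advanced_Setup_zh.md#arc-平台手动部署详细说明).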

### 4. Access UI

Open your browser and go to http://${HOST_IP}:8082

After startup completes, `quick_start.sh` will print:

```text
Service launched successfully.
UI access URL: http://${HOST_IP}:8082
If you are accessing from another machine, replace ${HOST_IP} with your server's reachable IP or hostname.
```

> Your browser should be running on the same host as your console; otherwise, you will need to access the UI using your host's domain name instead of ${HOST_IP}.

Below is the UI front page. For detailed operations on the UI and EC-RAG settings, please refer to [Explore_Edge_Craft_RAG](../../../../docs/Explore_Edge_Craft_RAG_zh.md)
![front_page](../../../../assets/img/front_page.png)

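### 5. Cleanup the Deployment

To stop the containers associated with this deployment, run the helper script:

```bash
./tools/quick_start.sh cleanup
```

On completion, all the EdgeCraftRAG containers will be stopped and removed.

If you prefer the manual docker compose cleanup command, please refer to [Manual cleanup details in Advanced Setup](../../../../docs/Advanced_Setup_zh.md#6-清理部署手动).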

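## EdgeCraftRAG Docker Compose Files

`compose.yaml` is the default compose file and uses tgi as the serving framework.

| Service Name        | Image Name                               |
| ------------------- | ---------------------------------------- |
| etcd                | quay.io/coreos/etcd:v3.5.5               |
| minio               | minio/minio:RELEASE.2023-03-20T20-16-18Z |
| milvus-standalone   | milvusdb/milvus:v2.4.6                   |
| edgecraftrag-server | opea/edgecraftrag-server:latest          |
| edgecraftrag-ui     | opea/edgecraftrag-ui:latest              |
| ecrag               | opea/edgecraftrag:latest                 |

## EdgeCraftRAG Service Configuration

The table below provides a comprehensive overview of the EdgeCraftRAG services used across the various deployments in the example Docker Compose files. Each row represents a distinct service, detailing the available images and describing its function within the deployment architecture.

| Service Name        | Possible Image Names                     | Optional | Description                                                                                  |
| ------------------- | ---------------------------------------- | -------- | -------------------------------------------------------------------------------------------- |
| etcd                | quay.io/coreos/etcd:v3.5.5               | No       | Provides a distributed key-value store for service discovery and configuration management.   |
| minio               | minio/minio:RELEASE.2023-03-20T20-16-18Z | No       | Provides object storage services for storing documents and model files.                      |
| milvus-standalone   | milvusdb/milvus:v2.4.6                   | No       | Provides vector database capabilities for managing embeddings and similarity search.         |
| edgecraftrag-server | opea/edgecraftrag-server:latest          | No       | Serves as the backend of the EdgeCraftRAG service; its form varies with the deployment type. |
| edgecraftrag-ui     | opea/edgecraftrag-ui:latest              | No       | Provides the user interface for the EdgeCraftRAG service.                                    |
| ecrag               | opea/edgecraftrag:latest                 | No       | Acts as a reverse proxy, managing traffic between the UI and backend services.               |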
6 changes: 3 additions & 3 deletions EdgeCraftRAG/docker_compose/intel/gpu/arc/compose.yaml
@@ -207,14 +207,14 @@ services:
https_proxy: ${https_proxy}
MODEL_PATH: "/llm/models"
SERVED_MODEL_NAME: ${LLM_MODEL}
TENSOR_PARALLEL_SIZE: ${TENSOR_PARALLEL_SIZE:-1}
TENSOR_PARALLEL_SIZE: ${TP:-1}
MAX_NUM_SEQS: ${MAX_NUM_SEQS:-64}
MAX_NUM_BATCHED_TOKENS: ${MAX_NUM_BATCHED_TOKENS:-10240}
MAX_MODEL_LEN: ${MAX_MODEL_LEN:-10240}
LOAD_IN_LOW_BIT: ${LOAD_IN_LOW_BIT:-fp8}
LOAD_IN_LOW_BIT: ${QUANTIZATION:-fp8}
CCL_DG2_USM: ${CCL_DG2_USM:-""}
PORT: ${VLLM_SERVICE_PORT_A770:-8086}
ZE_AFFINITY_MASK: ${SELECTED_XPU_0:-0}
ZE_AFFINITY_MASK: ${ZE_AFFINITY_MASK:-0}
shm_size: '32g'
entrypoint: /bin/bash -c "\
cd /llm && \
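
The renamed entries in this hunk (`${TP:-1}`, `${QUANTIZATION:-fp8}`, `${ZE_AFFINITY_MASK:-0}`) all use the `:-` default form, which Compose interpolation shares with POSIX shell: the default applies when the variable is unset or empty. Read from the diff, a host that previously exported `TENSOR_PARALLEL_SIZE`, `LOAD_IN_LOW_BIT`, or `SELECTED_XPU_0` for this service would presumably need to export the new names instead. The behavior can be sketched in plain shell; the function name `resolve_tp` is hypothetical.

```shell
#!/bin/sh
# Demonstrates ${VAR:-default}: the default applies when VAR is unset OR empty.
resolve_tp() {
    # Mirrors the compose entry TENSOR_PARALLEL_SIZE: ${TP:-1}
    printf '%s\n' "${TP:-1}"
}

unset TP
echo "unset TP  -> $(resolve_tp)"

TP=""
echo "empty TP  -> $(resolve_tp)"

TP=2
echo "TP=2      -> $(resolve_tp)"
```

Running `docker compose config` with the chosen profile is one way to inspect the values Compose actually resolves.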