opea-project
diff --git a/EdgeCraftRAG/Dockerfile.server b/EdgeCraftRAG/Dockerfile.server
Lines changed: 1 addition & 1 deletion
diff --git a/EdgeCraftRAG/docker_compose/intel/gpu/arc/README.md b/EdgeCraftRAG/docker_compose/intel/gpu/arc/README.md
Lines changed: 63 additions & 41 deletions
diff --git a/EdgeCraftRAG/docker_compose/intel/gpu/arc/README_zh.md b/EdgeCraftRAG/docker_compose/intel/gpu/arc/README_zh.md
Lines changed: 209 additions & 0 deletions
@@ -6,7 +6,7 @@ RUN apt-get remove -y libze-intel-gpu1 libigc1 libigdfcl1 libze-dev || true; \
     apt-get update; \
     apt-get install -y curl
 RUN curl -sL 'https://keyserver.ubuntu.com/pks/lookup?fingerprint=on&op=get&search=0x0C0E6AF955CE463C03FC51574D098D70AFBE5E1F' | tee /etc/apt/trusted.gpg.d/driver.asc
-RUN echo -e "Types: deb\nURIs: https://ppa.launchpadcontent.net/kobuk-team/intel-graphics/ubuntu/\nSuites: plucky\nComponents: main\nSigned-By: /etc/apt/trusted.gpg.d/driver.asc" > /etc/apt/sources.list.d/driver.sources
+RUN echo -e "Types: deb\nURIs: https://ppa.launchpadcontent.net/kobuk-team/intel-graphics/ubuntu/\nSuites: questing\nComponents: main\nSigned-By: /etc/apt/trusted.gpg.d/driver.asc" > /etc/apt/sources.list.d/driver.sources
 RUN apt-get update && apt-get install -y libze-intel-gpu1 libze1 intel-metrics-discovery intel-opencl-icd clinfo intel-gsc && apt-get install -y libze-intel-gpu1 libze1 intel-metrics-discovery intel-opencl-icd clinfo intel-gsc && apt-get install -y libze-dev intel-ocloc libze-intel-gpu-raytracing
 
 RUN useradd -m -s /bin/bash user && \
 
@@ -1,8 +1,10 @@
 # Example Edge Craft Retrieval-Augmented Generation Deployment on Intel® Arc® Platform
 
-This document outlines the deployment process for Edge Craft Retrieval-Augmented Generation service on Intel Arc server. This example includes the following sections:
+[中文版](README_zh.md)
 
-- [EdgeCraftRAG Quick Start Deployment](#edgecraftrag-quick-start-deployment): Demonstrates how to quickly deploy a Edge Craft Retrieval-Augmented Generation service/pipeline on Intel® Arc® platform.
+This document outlines the deployment process for Edge Craft Retrieval-Augmented Generation service on Intel® Arc® Platform. This example includes the following sections:
+
+- [EdgeCraftRAG Quick Start Deployment](#edgecraftrag-quick-start-deployment): Demonstrates how to quickly deploy an Edge Craft Retrieval-Augmented Generation service/pipeline on Intel® Arc® Platform.
 - [EdgeCraftRAG Docker Compose Files](#edgecraftrag-docker-compose-files): Describes some example deployments and their docker compose files.
 - [EdgeCraftRAG Service Configuration](#edgecraftrag-service-configuration): Describes the service and possible configuration changes.
 
@@ -20,15 +22,22 @@ This section describes how to quickly deploy and test the EdgeCraftRAG service m
 
 ### 1. Prerequisites
 
-EC-RAG supports vLLM deployment(default method) and local OpenVINO deployment for Intel Arc GPU. Prerequisites are shown as below:  
-Hardware: Intel Arc A770  
-OS: Ubuntu Server 22.04.1 or newer (at least 6.2 LTS kernel)  
-Driver & libraries: please to [Installing GPUs Drivers](https://dgpu-docs.intel.com/driver/installation-rolling.html#installing-gpu-drivers) for detailed driver & libraries setup
+EC-RAG supports vLLM deployment (the default method) and local OpenVINO deployment on Intel Arc GPUs and the Core Ultra platform. Prerequisites for each platform are listed below:
+
+#### Core Ultra
+**OS**: Ubuntu 24.04 or newer  
+**Driver & libraries**: Please refer to [Installing Client GPUs on Ubuntu Desktop](https://dgpu-docs.intel.com/driver/client/overview.html#installing-client-gpus-on-ubuntu-desktop)  
+**Available Inference Frameworks**: OpenVINO
 
-Hardware: Intel Arc B60  
-please to [Install Native Environment](https://github.com/intel/llm-scaler/tree/main/vllm#11-install-native-environment) for detailed setup
+#### Intel Arc B60
+**OS**: Ubuntu 25.04 Desktop (for Core Ultra and Xeon-W), Ubuntu 25.04 Server (for Xeon-SP)  
+**Driver & libraries**: Please refer to [Install Bare Metal Environment](https://github.com/intel/llm-scaler/tree/main/vllm#11-install-bare-metal-environment) for detailed setup  
+**Available Inference Frameworks**: OpenVINO, vLLM
 
-Below steps are based on **vLLM** as inference engine, if you want to choose **OpenVINO**, please refer to [OpenVINO Local Inference](../../../../docs/Advanced_Setup.md#openvino-local-inference)
+#### Intel Arc A770
+**OS**: Ubuntu Server 22.04.1 or newer (at least 6.2 LTS kernel)  
+**Driver & libraries**: Please refer to [Installing GPUs Drivers](https://dgpu-docs.intel.com/driver/installation-rolling.html#installing-gpu-drivers) for detailed driver & libraries setup  
+**Available Inference Frameworks**: OpenVINO, vLLM
 
 ### 2. Access the Code
 
@@ -39,23 +48,46 @@ git clone https://github.com/opea-project/GenAIExamples.git
 cd GenAIExamples/EdgeCraftRAG
 ```
 
-Checkout a released version, such as v1.5:
-
-```
-git checkout v1.5
-```
+> **NOTE**: If you want to check out a released version, such as v1.5:
+>
+> ```
+> git checkout v1.5
+> ```
 
 ### 3. Prepare models
 
+Three models need to be prepared: **Embedding**, **Reranking**, and **LLM**.  
+You need to choose an inference framework for each of them.
+
+#### Embedding and Reranking
+
+Embedding and reranking models are usually served by local OpenVINO inference. To prepare these two models:
+
 ```bash
 # Prepare models for embedding, reranking:
 export MODEL_PATH="${PWD}/models" # Your model path for embedding, reranking and LLM models
 mkdir -p $MODEL_PATH
 pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
 optimum-cli export openvino -m BAAI/bge-small-en-v1.5 ${MODEL_PATH}/BAAI/bge-small-en-v1.5 --task sentence-similarity
 optimum-cli export openvino -m BAAI/bge-reranker-large ${MODEL_PATH}/BAAI/bge-reranker-large --task text-classification
+```
+
+#### LLM
+
+##### OpenVINO
+If you only have a Core Ultra platform, prepare OpenVINO models. OpenVINO models can also run on a discrete GPU.
+
+```bash
+# Prepare the LLM model for OpenVINO
+optimum-cli export openvino --model Qwen/Qwen3-8B ${MODEL_PATH}/Qwen/Qwen3-8B/INT4_compressed_weights --task text-generation-with-past --weight-format int4 --group-size 128 --ratio 0.8
+```
 
-# Prepare LLM model
+##### vLLM
+Alternatively, if you have a discrete GPU and want to use vLLM, prepare the model for vLLM:
+
+```bash
+# Prepare the LLM model for vLLM
 export LLM_MODEL="Qwen/Qwen3-8B" # Your model id
 pip install modelscope
 modelscope download --model $LLM_MODEL --local_dir "${MODEL_PATH}/${LLM_MODEL}"
@@ -66,7 +98,7 @@ modelscope download --model $LLM_MODEL --local_dir "${MODEL_PATH}/${LLM_MODEL}"
 
 ### 4. Prepare env variables and configurations
 
-#### Prepare env variables for vLLM deployment
+#### Prepare env variables for deployment
 
 ```bash
 ip_address=$(hostname -I | awk '{print $1}')
@@ -93,9 +125,7 @@ chown 1000:1000 -R $HOME/.cache
 
 For more advanced env variables and configurations, please refer to [Prepare env variables for vLLM deployment](../../../../docs/Advanced_Setup.md#prepare-env-variables-for-vllm-deployment)
 
-### 5. Deploy the Service on Intel GPU Using Docker Compose
-
-set Milvus DB and chat history round for inference:
+Set the Milvus DB and the number of chat history rounds for inference:
 
 ```bash
 # EC-RAG support Milvus as persistent database, by default milvus is disabled, you can choose to set MILVUS_ENABLED=1 to enable it
@@ -107,37 +137,29 @@ export MILVUS_ENABLED=0
 # export CHAT_HISTORY_ROUND= # change to your preference
 ```
 
-#### option a. Deploy the Service on Arc A770 Using Docker Compose
+### 5. Deploy the Service on Intel GPU Using Docker Compose
 
-```bash
-export VLLM_SERVICE_PORT_A770=8086 # You can set your own port for vllm service
+#### Option a. Deploy OpenVINO LLM-based EC-RAG for Core Ultra, Arc B60, and Arc A770
 
-# Launch EC-RAG service with compose
-docker compose --profile a770 -f docker_compose/intel/gpu/arc/compose.yaml up -d
+Make sure you have prepared the [OpenVINO models](#openvino)
+
+```bash
+docker compose -f docker_compose/intel/gpu/arc/compose.yaml up -d
 ```
 
-#### option b. Deploy the Service on Arc B60 Using Docker Compose
+#### Option b.1. Deploy vLLM-based EC-RAG for Arc B60
+
+Make sure you have prepared the [vLLM models](#vllm)
 
 ```bash
-# Besides MILVUS_ENABLED and CHAT_HISTORY_ROUND, below environments are exposed for vLLM config, you can change them to your preference:
-# export VLLM_SERVICE_PORT_B60=8086
-# export DTYPE=float16
-# export TP=1 # for multi GPU, you can change TP value
-# export DP=1
-# export ZE_AFFINITY_MASK=0 # for multi GPU, you can export ZE_AFFINITY_MASK=0,1,2...
-# export ENFORCE_EAGER=1
-# export TRUST_REMOTE_CODE=1
-# export DISABLE_SLIDING_WINDOW=1
-# export GPU_MEMORY_UTIL=0.8
-# export NO_ENABLE_PREFIX_CACHING=1
-# export MAX_NUM_BATCHED_TOKENS=8192
-# export DISABLE_LOG_REQUESTS=1
-# export MAX_MODEL_LEN=49152
-# export BLOCK_SIZE=64
-# export QUANTIZATION=fp8
 docker compose --profile b60 -f docker_compose/intel/gpu/arc/compose.yaml up -d
 ```
 
+#### Option b.2. Deploy vLLM-based EC-RAG for Arc A770
+
+Make sure you have prepared the [vLLM models](#vllm)
+
+```bash
+docker compose --profile a770 -f docker_compose/intel/gpu/arc/compose.yaml up -d
+```
+
 ### 6. Access UI
 
 Open your browser, access http://${HOST_IP}:8082
 
@@ -0,0 +1,209 @@
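+# Example Edge Craft Retrieval-Augmented Generation (EC-RAG) Deployment on Intel® Arc® Platform
+
+[English](README.md)
+
+This document outlines the deployment process for the Edge Craft Retrieval-Augmented Generation service on Intel® Arc® Platform. This example includes the following sections:
+
+- [EdgeCraftRAG Quick Start Deployment](#edgecraftrag-quick-start-deployment): Demonstrates how to quickly deploy an Edge Craft Retrieval-Augmented Generation service/pipeline on Intel® Arc® Platform.
+- [EdgeCraftRAG Docker Compose Files](#edgecraftrag-docker-compose-files): Describes some example deployments and their docker compose files.
+- [EdgeCraftRAG Service Configuration](#edgecraftrag-service-configuration): Describes the service and possible configuration changes.
+
+## EdgeCraftRAG Quick Start Deployment
+
+This section describes how to manually and quickly deploy and test the EdgeCraftRAG service on Intel® Arc® Platform. The basic steps are:
+
+1. [Prerequisites](#1-prerequisites)
+2. [Access the Code](#2-access-the-code)
+3. [Prepare models](#3-prepare-models)
+4. [Prepare env variables and configurations](#4-prepare-env-variables-and-configurations)
+5. [Deploy the Service on Intel GPU Using Docker Compose](#5-deploy-the-service-on-intel-gpu-using-docker-compose)
+6. [Access UI](#6-access-ui)
+7. [Clean Up the Deployment](#7-clean-up-the-deployment)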
+
+### 1. Prerequisites
+
+EC-RAG supports vLLM deployment (the default method) and local OpenVINO deployment on Intel Arc GPUs and the Core Ultra platform. Prerequisites for each platform are listed below:
+
+#### Core Ultra
+**OS**: Ubuntu 24.04 or newer  
+**Driver & libraries**: Please refer to [Installing Client GPUs on Ubuntu Desktop](https://dgpu-docs.intel.com/driver/client/overview.html#installing-client-gpus-on-ubuntu-desktop)  
+**Available Inference Frameworks**: OpenVINO
+
+#### Intel Arc B60
+**OS**: Ubuntu 25.04 Desktop (for Core Ultra and Xeon-W), Ubuntu 25.04 Server (for Xeon-SP)  
+**Driver & libraries**: Please refer to [Install Bare Metal Environment](https://github.com/intel/llm-scaler/tree/main/vllm#11-install-bare-metal-environment) for detailed setup  
+**Available Inference Frameworks**: OpenVINO, vLLM
+
+#### Intel Arc A770
+**OS**: Ubuntu Server 22.04.1 or newer (at least 6.2 LTS kernel)  
+**Driver & libraries**: Please refer to [Installing GPUs Drivers](https://dgpu-docs.intel.com/driver/installation-rolling.html#installing-gpu-drivers) for detailed driver & libraries setup  
+**Available Inference Frameworks**: OpenVINO, vLLM
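
Optionally, you can sanity-check the host before continuing. The sketch below is hypothetical (the helpers `has_render_node` and `in_group` are not part of EC-RAG). It assumes the usual Linux DRM layout, in which each GPU exposes a `/dev/dri/renderD*` render node once its driver is loaded, and it checks membership in the `render` and `video` groups whose IDs are exported in step 4.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not part of the EC-RAG repository.

# Succeeds if at least one DRM render node (renderD*) exists.
# The directory defaults to /dev/dri and can be overridden for testing.
has_render_node() {
  local dri_dir="${1:-/dev/dri}"
  local node
  for node in "$dri_dir"/renderD*; do
    # If the glob matches nothing, $node is the literal pattern and -e fails.
    if [ -e "$node" ]; then
      return 0
    fi
  done
  return 1
}

# Succeeds if the current user belongs to the named group.
in_group() {
  id -nG | tr ' ' '\n' | grep -qx -- "$1"
}

if has_render_node; then
  echo "GPU render node(s) found under /dev/dri"
else
  echo "No /dev/dri/renderD* node found; check the driver installation" >&2
fi

for grp in render video; do
  if ! in_group "$grp"; then
    echo "Note: current user is not in the '$grp' group" >&2
  fi
done
```

A missing render node usually points at the driver step rather than at EC-RAG itself, so it is worth resolving before pulling images or models.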
+
+### 2. Access the Code
+
+Clone the GenAIExamples repository and enter the EdgeCraftRAG directory, which contains the Docker Compose files and supporting scripts for Intel® Arc® Platform:
+
+```
+git clone https://github.com/opea-project/GenAIExamples.git
+cd GenAIExamples/EdgeCraftRAG
+```
+
+> **NOTE**: If you want to check out a released version, such as v1.5:
+>
+> ```
+> git checkout v1.5
+> ```
+
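+### 3. Prepare models
+
+Three models need to be prepared: **Embedding**, **Reranking**, and **LLM**.  
+You need to choose an inference framework for each of them.
+
+#### Embedding and Reranking
+
+Embedding and reranking models are usually served by local OpenVINO inference. To prepare these two models: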
+
+```bash
+# Prepare models for embedding and reranking:
+export MODEL_PATH="${PWD}/models" # Your model path for embedding, reranking and LLM models
+mkdir -p $MODEL_PATH
+pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
+optimum-cli export openvino -m BAAI/bge-small-en-v1.5 ${MODEL_PATH}/BAAI/bge-small-en-v1.5 --task sentence-similarity
+optimum-cli export openvino -m BAAI/bge-reranker-large ${MODEL_PATH}/BAAI/bge-reranker-large --task text-classification
+```
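
To confirm that the exports succeeded, you can check each output directory for the OpenVINO IR files. This is a minimal sketch with a hypothetical helper `check_ov_export` (not part of EC-RAG); it assumes optimum-intel's default file names, `openvino_model.xml` (model topology) and `openvino_model.bin` (weights).

```shell
#!/usr/bin/env bash
# Hypothetical check; assumes optimum-intel's default OpenVINO IR file names.
check_ov_export() {
  local dir="$1"
  local f
  for f in openvino_model.xml openvino_model.bin; do
    # -s: the file exists and is non-empty.
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      return 1
    fi
  done
}

for m in BAAI/bge-small-en-v1.5 BAAI/bge-reranker-large; do
  if check_ov_export "${MODEL_PATH:-$PWD/models}/$m"; then
    echo "OK: $m"
  fi
done
```

The same check can be pointed at `${MODEL_PATH}/Qwen/Qwen3-8B/INT4_compressed_weights` if you use the OpenVINO LLM export in the next subsection.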
+
+#### LLM
+
+##### OpenVINO
+If you only have a Core Ultra platform, prepare OpenVINO models. OpenVINO models can also run on a discrete GPU.
+
+```bash
+# Prepare the LLM model for OpenVINO
+optimum-cli export openvino --model Qwen/Qwen3-8B ${MODEL_PATH}/Qwen/Qwen3-8B/INT4_compressed_weights --task text-generation-with-past --weight-format int4 --group-size 128 --ratio 0.8
+```
+
+##### vLLM
+Alternatively, if you have a discrete GPU and want to use vLLM, prepare the model for vLLM:
+
+```bash
+# Prepare the LLM model for vLLM
+export LLM_MODEL="Qwen/Qwen3-8B" # Your model id
+pip install modelscope
+modelscope download --model $LLM_MODEL --local_dir "${MODEL_PATH}/${LLM_MODEL}"
+# Optional: you can also download the model from Hugging Face:
+# pip install -U huggingface_hub
+# huggingface-cli download $LLM_MODEL --local-dir "${MODEL_PATH}/${LLM_MODEL}"
+```
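
vLLM loads a Hugging Face-format checkpoint directory, which typically contains a `config.json` and one or more `*.safetensors` weight files. The hypothetical helper below (`check_hf_download`, not part of EC-RAG) assumes that layout and uses the bash builtin `compgen -G` to test whether the weight-file glob matches anything.

```shell
#!/usr/bin/env bash
# Hypothetical check; assumes a Hugging Face-style layout with safetensors weights.
check_hf_download() {
  local dir="$1"
  if [ ! -s "$dir/config.json" ]; then
    echo "no config.json in $dir" >&2
    return 1
  fi
  # compgen -G succeeds only if the glob matches at least one path.
  if ! compgen -G "$dir/*.safetensors" > /dev/null; then
    echo "no *.safetensors files in $dir" >&2
    return 1
  fi
}

LLM_DIR="${MODEL_PATH:-$PWD/models}/${LLM_MODEL:-Qwen/Qwen3-8B}"
if check_hf_download "$LLM_DIR"; then
  echo "Model files found in $LLM_DIR"
fi
```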
+
+### 4. Prepare env variables and configurations
+
+#### Prepare env variables for deployment
+
+```bash
+ip_address=$(hostname -I | awk '{print $1}')
+# Use `ip a` to check your active ip
+export HOST_IP=$ip_address # Your host ip
+
+# Check the group ids of video and render
+export VIDEOGROUPID=$(getent group video | cut -d: -f3)
+export RENDERGROUPID=$(getent group render | cut -d: -f3)
+
+# If you have a proxy configured, execute the lines below
+export no_proxy=${no_proxy},${HOST_IP},edgecraftrag,edgecraftrag-server
+export NO_PROXY=${NO_PROXY},${HOST_IP},edgecraftrag,edgecraftrag-server
+# If you have an HF mirror configured, it will be passed into the container
+# export HF_ENDPOINT=https://hf-mirror.com # your HF mirror endpoint
+
+# Make sure the following 3 folders are owned by 1000:1000
+export DOC_PATH=${PWD}/tests
+export TMPFILE_PATH=${PWD}/tests
+chown 1000:1000 ${MODEL_PATH} ${DOC_PATH} ${TMPFILE_PATH}
+# Also make sure the .cache folder is owned by 1000:1000
+chown 1000:1000 -R $HOME/.cache
+```
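
Because several of these values are computed (for example, `getent` prints nothing if a group does not exist), it can help to verify them before starting the containers. This is a minimal sketch with a hypothetical `check_env` helper (not part of EC-RAG); it uses `printenv`, so it only sees exported variables, and it assumes `MODEL_PATH` was exported in step 3.

```shell
#!/usr/bin/env bash
# Hypothetical check; not part of EC-RAG. Confirms that the variables exported
# in this step (plus MODEL_PATH from step 3) are set, and that group IDs are numeric.
check_env() {
  local name val status=0
  for name in HOST_IP VIDEOGROUPID RENDERGROUPID MODEL_PATH DOC_PATH TMPFILE_PATH; do
    # printenv only sees exported variables, matching the `export` lines above.
    val=$(printenv "$name" || true)
    if [ -z "$val" ]; then
      echo "not exported or empty: $name" >&2
      status=1
    fi
  done
  for name in VIDEOGROUPID RENDERGROUPID; do
    val=$(printenv "$name" || true)
    case "$val" in
      *[!0-9]*) echo "$name is not a numeric group id: '$val'" >&2; status=1 ;;
    esac
  done
  return "$status"
}

if check_env; then
  echo "Environment variables look complete"
fi
```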
+
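+For more advanced env variables and configurations, please refer to [Prepare env variables for vLLM deployment](../../../../docs/Advanced_Setup.md#prepare-env-variables-for-vllm-deployment)
+
+Set the Milvus DB and the number of chat history rounds for inference:
+
+```bash
+# EC-RAG supports Milvus as a persistent database. Milvus is disabled by default; set MILVUS_ENABLED=1 to enable it
+export MILVUS_ENABLED=0
+# If Milvus is enabled, the default storage path is PWD; uncomment to change it:
+# export DOCKER_VOLUME_DIRECTORY= # change to your preference
+
+# EC-RAG supports setting the number of chat history rounds; it is disabled by default and controlled by CHAT_HISTORY_ROUND
+# export CHAT_HISTORY_ROUND= # change to your preference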
+```
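
A similar hypothetical check (`check_rag_options`, not part of EC-RAG) can catch typos in these two options. The accepted values are assumptions drawn from the comments above: `MILVUS_ENABLED` is `0` or `1`, and `CHAT_HISTORY_ROUND`, if set, is assumed to be a whole number of rounds.

```shell
#!/usr/bin/env bash
# Hypothetical check; the accepted values are assumptions based on this README.
check_rag_options() {
  case "${MILVUS_ENABLED:-0}" in
    0|1) ;;
    *) echo "MILVUS_ENABLED must be 0 or 1, got '${MILVUS_ENABLED}'" >&2; return 1 ;;
  esac
  # CHAT_HISTORY_ROUND is optional; if set, assume it is a whole number of rounds.
  if [ -n "${CHAT_HISTORY_ROUND:-}" ]; then
    case "$CHAT_HISTORY_ROUND" in
      *[!0-9]*) echo "CHAT_HISTORY_ROUND should be a whole number, got '$CHAT_HISTORY_ROUND'" >&2; return 1 ;;
    esac
  fi
}

if check_rag_options; then
  echo "MILVUS_ENABLED=${MILVUS_ENABLED:-0} CHAT_HISTORY_ROUND=${CHAT_HISTORY_ROUND:-<unset>}"
fi
```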
+
+### 5. Deploy the Service on Intel GPU Using Docker Compose
+
+#### Option a. Deploy OpenVINO LLM-based EC-RAG for Core Ultra, Arc B60, and Arc A770
+
+Make sure you have prepared the [OpenVINO models](#openvino)
+
+```bash
+docker compose -f docker_compose/intel/gpu/arc/compose.yaml up -d
+```
+
+#### Option b.1. Deploy vLLM-based EC-RAG for Arc B60
+
+Make sure you have prepared the [vLLM models](#vllm)
+
+```bash
+docker compose --profile b60 -f docker_compose/intel/gpu/arc/compose.yaml up -d
+```
+
+#### Option b.2. Deploy vLLM-based EC-RAG for Arc A770
+
+Make sure you have prepared the [vLLM models](#vllm)
+
+```bash
+docker compose --profile a770 -f docker_compose/intel/gpu/arc/compose.yaml up -d
+```
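
The three options differ only in the compose profile passed to `docker compose`: no profile for OpenVINO, `b60` or `a770` for vLLM. The hypothetical helper below (`compose_cmd`, not part of EC-RAG) makes that mapping explicit by printing the matching command for a chosen target as a dry run.

```shell
#!/usr/bin/env bash
# Hypothetical helper mapping the deployment options above to the matching
# docker compose command. It only prints the command (dry run).
compose_cmd() {
  local file="docker_compose/intel/gpu/arc/compose.yaml"
  case "$1" in
    openvino) echo "docker compose -f $file up -d" ;;                 # Option a
    b60)      echo "docker compose --profile b60 -f $file up -d" ;;   # Option b.1
    a770)     echo "docker compose --profile a770 -f $file up -d" ;;  # Option b.2
    *) echo "usage: compose_cmd openvino|b60|a770" >&2; return 1 ;;
  esac
}

compose_cmd b60
# To launch for real, run the printed command, for example:
#   eval "$(compose_cmd b60)"
```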
+
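+### 6. Access UI
+
+Open your browser and access http://${HOST_IP}:8082
+
+> Your browser should be running on the same host as your console; otherwise you will need to access the UI with your host's domain name instead of ${HOST_IP}.
+
+Below is the UI front page. For detailed operations on the UI and EC-RAG settings, please refer to [Explore_Edge_Craft_RAG](../../../../docs/Explore_Edge_Craft_RAG.md)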
+![front_page](../../../../assets/img/front_page.png)
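
The containers may need some time to load models before the UI responds. As a sketch, a hypothetical `wait_for_ui` helper (not part of EC-RAG) can poll the URL with `curl`, where `-f` makes HTTP error statuses count as failures, until it answers or a retry budget runs out. The default attempt count and delay are arbitrary assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical readiness probe; not part of EC-RAG. Polls a URL until it
# answers with a non-error HTTP status or the attempts run out.
wait_for_ui() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i
  if ! command -v curl > /dev/null; then
    echo "curl not found" >&2
    return 2
  fi
  for ((i = 1; i <= attempts; i++)); do
    # -f treats HTTP errors (>= 400) as failures; -s keeps the output quiet.
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      echo "UI is reachable at $url"
      return 0
    fi
    # Skip the pause after the final attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  echo "UI did not respond at $url after $attempts attempt(s)" >&2
  return 1
}

# Usage, after `docker compose ... up -d`:
#   wait_for_ui "http://${HOST_IP}:8082"
```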
+
+### 7. Clean Up the Deployment
+
+To stop the containers associated with the deployment, execute the following command:
+
+```
+docker compose -f docker_compose/intel/gpu/arc/compose.yaml down
+```
+
+After running the `down` command, all EdgeCraftRAG containers are stopped and removed.
+
+## EdgeCraftRAG Docker Compose Files
+
+`compose.yaml` is the default compose file, using tgi as the serving framework.
+
+| Service Name        | Image Name                               |
+| ------------------- | ---------------------------------------- |
+| etcd                | quay.io/coreos/etcd:v3.5.5               |
+| minio               | minio/minio:RELEASE.2023-03-20T20-16-18Z |
+| milvus-standalone   | milvusdb/milvus:v2.4.6                   |
+| edgecraftrag-server | opea/edgecraftrag-server:latest          |
+| edgecraftrag-ui     | opea/edgecraftrag-ui:latest              |
+| ecrag               | opea/edgecraftrag:latest                 |
+
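+## EdgeCraftRAG Service Configuration
+
+The table below provides a comprehensive overview of the EdgeCraftRAG services used across the various deployments in the example Docker Compose files. Each row represents a distinct service, detailing its possible images and its function within the deployment architecture.
+
+| Service Name        | Possible Image Names                     | Optional | Description |
+| ------------------- | ---------------------------------------- | -------- | ----------- |
+| etcd                | quay.io/coreos/etcd:v3.5.5               | No       | Provides distributed key-value storage for service discovery and configuration management. |
+| minio               | minio/minio:RELEASE.2023-03-20T20-16-18Z | No       | Provides object storage services for storing documents and model files. |
+| milvus-standalone   | milvusdb/milvus:v2.4.6                   | No       | Provides vector database capabilities for managing embeddings and similarity searches. |
+| edgecraftrag-server | opea/edgecraftrag-server:latest          | No       | Serves as the backend for the EdgeCraftRAG service, with variations depending on the deployment. |
+| edgecraftrag-ui     | opea/edgecraftrag-ui:latest              | No       | Provides the user interface for the EdgeCraftRAG service. |
+| ecrag               | opea/edgecraftrag:latest                 | No       | Acts as a reverse proxy, managing traffic between the UI and backend services. |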