Commit 50f6b3e

joshuayao, pre-commit-ci[bot], and yinghu5 authored
CodeGen Docker Compose: Separate Files for Serving Frameworks (#2064)
Signed-off-by: Yi Yao <yi.a.yao@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ying Hu <ying.hu@intel.com>
1 parent c873fac commit 50f6b3e

8 files changed

Lines changed: 499 additions & 227 deletions

CodeGen/docker_compose/intel/cpu/xeon/README.md

Lines changed: 61 additions & 61 deletions
Original file line numberDiff line numberDiff line change
@@ -33,65 +33,67 @@ This guide focuses on running the pre-configured CodeGen service using Docker Co
3333

3434
## Quick Start Deployment
3535

36-
This uses the default vLLM-based deployment profile (`codegen-xeon-vllm`).
36+
This uses the default vLLM-based deployment using `compose.yaml`.
3737

3838
1. **Configure Environment:**
3939
Set required environment variables in your shell:
4040

41-
```bash
42-
# Replace with your host's external IP address (do not use localhost or 127.0.0.1)
43-
export HOST_IP="your_external_ip_address"
44-
# Replace with your Hugging Face Hub API token
45-
export HF_TOKEN="your_huggingface_token"
46-
47-
# Optional: Configure proxy if needed
48-
# export http_proxy="your_http_proxy"
49-
# export https_proxy="your_https_proxy"
50-
# export no_proxy="localhost,127.0.0.1,${HOST_IP}" # Add other hosts if necessary
51-
source intel/set_env.sh
52-
cd /intel/cpu/xeon
53-
```
41+
```bash
42+
# Replace with your host's external IP address (do not use localhost or 127.0.0.1)
43+
export HOST_IP="your_external_ip_address"
44+
# Replace with your Hugging Face Hub API token
45+
export HF_TOKEN="your_huggingface_token"
46+
47+
# Optional: Configure proxy if needed
48+
# export http_proxy="your_http_proxy"
49+
# export https_proxy="your_https_proxy"
50+
# export no_proxy="localhost,127.0.0.1,${HOST_IP}" # Add other hosts if necessary
51+
source intel/set_env.sh
52+
cd /intel/cpu/xeon
53+
```
5454

55-
_Note: The compose file might read additional variables from set_env.sh. Ensure all required variables like ports (`LLM_SERVICE_PORT`, `MEGA_SERVICE_PORT`, etc.) are set if not using defaults from the compose file._
55+
_Note: The compose file might read additional variables from set_env.sh. Ensure all required variables like ports (`LLM_SERVICE_PORT`, `MEGA_SERVICE_PORT`, etc.) are set if not using defaults from the compose file._
5656

5757
For instance, edit the set_env.sh to change the LLM model
5858

59-
```
60-
export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-7B-Instruct"
61-
```
62-
can be changed to other model if needed
63-
```
64-
export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-32B-Instruct"
65-
```
59+
```bash
60+
export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-7B-Instruct"
61+
```
6662

67-
2. **Start Services (vLLM Profile):**
63+
can be changed to other model if needed
6864

6965
```bash
70-
docker compose --profile codegen-xeon-vllm up -d
66+
export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-32B-Instruct"
67+
```
68+
69+
2. **Start Services (vLLM):**
70+
71+
```bash
72+
docker compose up -d
7173
```
7274

7375
3. **Validate:**
7476
Wait several minutes for models to download (especially the first time) and services to initialize. Check container logs (`docker compose logs -f <service_name>`) or proceed to the validation steps below.
7577

7678
### Available Deployment Options
7779

78-
The `compose.yaml` file uses Docker Compose profiles to select the LLM serving backend.
80+
Different Docker Compose files are available to select the LLM serving backend.
7981

80-
#### Default: vLLM-based Deployment (`--profile codegen-xeon-vllm`)
82+
#### Default: vLLM-based Deployment (`compose.yaml`)
8183

82-
- **Profile:** `codegen-xeon-vllm`
83-
- **Description:** Uses vLLM optimized for Intel CPUs as the LLM serving engine. This is the default profile used in the Quick Start.
84+
- **Compose File:** `compose.yaml`
85+
- **Description:** Uses vLLM optimized for Intel CPUs as the LLM serving engine. This is the default deployment option used in the Quick Start.
8486
- **Services Deployed:** `codegen-vllm-server`, `codegen-llm-server`, `codegen-tei-embedding-server`, `codegen-retriever-server`, `redis-vector-db`, `codegen-dataprep-server`, `codegen-backend-server`, `codegen-gradio-ui-server`.
8587

86-
#### TGI-based Deployment (`--profile codegen-xeon-tgi`)
88+
#### TGI-based Deployment (`compose_tgi.yaml`)
8789

88-
- **Profile:** `codegen-xeon-tgi`
90+
- **Compose File:** `compose_tgi.yaml`
8991
- **Description:** Uses Hugging Face Text Generation Inference (TGI) optimized for Intel CPUs as the LLM serving engine.
9092
- **Services Deployed:** `codegen-tgi-server`, `codegen-llm-server`, `codegen-tei-embedding-server`, `codegen-retriever-server`, `redis-vector-db`, `codegen-dataprep-server`, `codegen-backend-server`, `codegen-gradio-ui-server`.
9193
- **To Run:**
9294
```bash
9395
# Ensure environment variables (HOST_IP, HF_TOKEN) are set
94-
docker compose --profile codegen-xeon-tgi up -d
96+
docker compose -f compose_tgi.yaml up -d
9597
```
9698

9799
### Configuration Parameters
@@ -100,28 +102,28 @@ The `compose.yaml` file uses Docker Compose profiles to select the LLM serving b
100102

101103
Key parameters are configured via environment variables set before running `docker compose up`.
102104

103-
| Environment Variable | Description | Default (Set Externally) |
104-
| :-------------------------------------- | :------------------------------------------------------------------------------------------------------------------ | :--------------------------------------------- | ------------------------------------ |
105-
| `HOST_IP` | External IP address of the host machine. **Required.** | `your_external_ip_address` |
106-
| `HF_TOKEN` | Your Hugging Face Hub token for model access. **Required.** | `your_huggingface_token` |
107-
| `LLM_MODEL_ID` | Hugging Face model ID for the CodeGen LLM (used by TGI/vLLM service). Configured within `compose.yaml` environment. | `Qwen/Qwen2.5-Coder-7B-Instruct` |
108-
| `EMBEDDING_MODEL_ID` | Hugging Face model ID for the embedding model (used by TEI service). Configured within `compose.yaml` environment. | `BAAI/bge-base-en-v1.5` |
109-
| `LLM_ENDPOINT` | Internal URL for the LLM serving endpoint (used by `codegen-llm-server`). Configured in `compose.yaml`. | `http://codegen-vllm | tgi-server:9000/v1/chat/completions` |
110-
| `TEI_EMBEDDING_ENDPOINT` | Internal URL for the Embedding service. Configured in `compose.yaml`. | `http://codegen-tei-embedding-server:80/embed` |
111-
| `DATAPREP_ENDPOINT` | Internal URL for the Data Preparation service. Configured in `compose.yaml`. | `http://codegen-dataprep-server:80/dataprep` |
112-
| `BACKEND_SERVICE_ENDPOINT` | External URL for the CodeGen Gateway (MegaService). Derived from `HOST_IP` and port `7778`. | `http://${HOST_IP}:7778/v1/codegen` |
113-
| `*_PORT` (Internal) | Internal container ports (e.g., `80`, `6379`). Defined in `compose.yaml`. | N/A |
114-
| `http_proxy` / `https_proxy`/`no_proxy` | Network proxy settings (if required). | `""` |
105+
| Environment Variable | Description | Default (Set Externally) |
106+
| :-------------------------------------- | :----------------------------------------------------------------------------------------------------- | :---------------------------------------------------- |
107+
| `HOST_IP` | External IP address of the host machine. **Required.** | `your_external_ip_address` |
108+
| `HF_TOKEN` | Your Hugging Face Hub token for model access. **Required.** | `your_huggingface_token` |
109+
| `LLM_MODEL_ID` | Hugging Face model ID for the CodeGen LLM (used by TGI/vLLM service). Configured within composes files | `Qwen/Qwen2.5-Coder-7B-Instruct` |
110+
| `EMBEDDING_MODEL_ID` | Hugging Face model ID for the embedding model (used by TEI service). Configured within compose files. | `BAAI/bge-base-en-v1.5` |
111+
| `LLM_ENDPOINT` | Internal URL for the LLM serving endpoint (used by `codegen-llm-server`). Configured in compose files. | `http://codegen-vllm-server:9000/v1/chat/completions` |
112+
| `TEI_EMBEDDING_ENDPOINT` | Internal URL for the Embedding service. Configured in compose files. | `http://codegen-tei-embedding-server:80/embed` |
113+
| `DATAPREP_ENDPOINT` | Internal URL for the Data Preparation service. Configured in compose files. | `http://codegen-dataprep-server:80/dataprep` |
114+
| `BACKEND_SERVICE_ENDPOINT` | External URL for the CodeGen Gateway (MegaService). Derived from `HOST_IP` and port `7778`. | `http://${HOST_IP}:7778/v1/codegen` |
115+
| `*_PORT` (Internal) | Internal container ports (e.g., `80`, `6379`). Defined in compose files. | N/A |
116+
| `http_proxy` / `https_proxy`/`no_proxy` | Network proxy settings (if required). | `""` |
115117

116118
Most of these parameters are in `set_env.sh`, you can either modify this file or overwrite the env variables by setting them.
117119

118120
```shell
119121
source CodeGen/docker_compose/set_env.sh
120122
```
121123

122-
#### Compose Profiles
124+
#### Compose Files
123125

124-
Docker Compose profiles (`codegen-xeon-vllm`, `codegen-xeon-tgi`) control which LLM serving backend (vLLM or TGI) and its associated dependencies are started. Only one profile should typically be active.
126+
Different Docker Compose files (`compose.yaml`, `compose_tgi.yaml`) control which LLM serving backend (vLLM or TGI) and its associated dependencies are started. Choose the appropriate compose file based on your requirements.
125127

126128
## Building Custom Images (Optional)
127129

@@ -130,19 +132,20 @@ If you need to modify the microservices:
130132
1. Clone the [OPEA GenAIComps](https://github.com/opea-project/GenAIComps) repository.
131133
2. Follow build instructions in the respective component directories (e.g., `comps/llms/text-generation`, `comps/codegen`, `comps/ui/gradio`, etc.). Use the provided Dockerfiles (e.g., `CodeGen/Dockerfile`, `CodeGen/ui/docker/Dockerfile.gradio`).
132134
3. Tag your custom images appropriately (e.g., `my-custom-codegen:latest`).
133-
4. Update the `image:` fields in the `compose.yaml` file to use your custom image tags.
135+
4. Update the `image:` fields in the compose files (`compose.yaml` or `compose_tgi.yaml`) to use your custom image tags.
134136

135137
_Refer to the main [CodeGen README](../../../../README.md) for links to relevant GenAIComps components._
136138

137139
## Validate Services
138140

139141
### Check Container Status
140142

141-
Ensure all containers associated with the chosen profile are running:
143+
Ensure all containers associated with the chosen compose file are running:
142144

143145
```bash
144-
docker compose --profile <profile_name> ps
145-
# Example: docker compose --profile codegen-xeon-vllm ps
146+
docker compose -f <compose-file> ps
147+
# Example: docker compose ps # for vLLM (compose.yaml)
148+
# Example: docker compose -f compose_tgi.yaml ps # for TGI
146149
```
147150

148151
Check logs for specific services: `docker compose logs <service_name>`
@@ -173,7 +176,7 @@ Use `curl` commands to test the main service endpoints. Ensure `HOST_IP` is corr
173176

174177
## Accessing the User Interface (UI)
175178

176-
Multiple UI options can be configured via the `compose.yaml`.
179+
Multiple UI options can be configured via the compose files.
177180

178181
### Gradio UI (Default)
179182

@@ -186,16 +189,16 @@ _(Port `5173` is the default host mapping for `codegen-gradio-ui-server`)_
186189

187190
### Svelte UI (Optional)
188191

189-
1. Modify `compose.yaml`: Comment out the `codegen-gradio-ui-server` service and uncomment/add the `codegen-xeon-ui-server` (Svelte) service definition, ensuring the port mapping is correct (e.g., `"- 5173:5173"`).
190-
2. Restart Docker Compose: `docker compose --profile <profile_name> up -d`
192+
1. Modify the compose file (either `compose.yaml` or `compose_tgi.yaml`): Comment out the `codegen-gradio-ui-server` service and uncomment/add the `codegen-xeon-ui-server` (Svelte) service definition, ensuring the port mapping is correct (e.g., `"- 5173:5173"`).
193+
2. Restart Docker Compose: `docker compose up -d` or `docker compose -f compose_tgi.yaml up -d`
191194
3. Access: `http://{HOST_IP}:5173` (or the host port you mapped).
192195

193196
![Svelte UI Init](../../../../assets/img/codeGen_ui_init.jpg)
194197

195198
### React UI (Optional)
196199

197-
1. Modify `compose.yaml`: Comment out the default UI service and uncomment/add the `codegen-xeon-react-ui-server` definition, ensuring correct port mapping (e.g., `"- 5174:80"`).
198-
2. Restart Docker Compose: `docker compose --profile <profile_name> up -d`
200+
1. Modify the compose file (either `compose.yaml` or `compose_tgi.yaml`): Comment out the default UI service and uncomment/add the `codegen-xeon-react-ui-server` definition, ensuring correct port mapping (e.g., `"- 5174:80"`).
201+
2. Restart Docker Compose: `docker compose up -d` or `docker compose -f compose_tgi.yaml up -d`
199202
3. Access: `http://{HOST_IP}:5174` (or the host port you mapped).
200203

201204
![React UI](../../../../assets/img/codegen_react.png)
@@ -218,21 +221,18 @@ Users can interact with the backend service using the `Neural Copilot` VS Code e
218221

219222
- **Model Download Issues:** Check `HF_TOKEN`. Ensure internet connectivity or correct proxy settings. Check logs of `tgi-service`/`vllm-service` and `tei-embedding-server`. Gated models need prior Hugging Face access.
220223
- **Connection Errors:** Verify `HOST_IP` is correct and accessible. Check `docker ps` for port mappings. Ensure `no_proxy` includes `HOST_IP` if using a proxy. Check logs of the service failing to connect (e.g., `codegen-backend-server` logs if it can't reach `codegen-llm-server`).
221-
- **"Container name is in use"**: Stop existing containers (`docker compose down`) or change `container_name` in `compose.yaml`.
224+
- **"Container name is in use"**: Stop existing containers (`docker compose down`) or change `container_name` in the compose file.
222225
- **Resource Issues:** CodeGen models can be memory-intensive. Monitor host RAM usage. Increase Docker resources if needed.
223226
224227
## Stopping the Application
225228
226229
```bash
227-
docker compose --profile <profile_name> down
228-
# Example: docker compose --profile codegen-xeon-vllm down
230+
docker compose down # for vLLM (compose.yaml)
231+
# or
232+
docker compose -f compose_tgi.yaml down # for TGI
229233
```
230234
231235
## Next Steps
232236
233237
- Consult the [OPEA GenAIComps](https://github.com/opea-project/GenAIComps) repository for details on individual microservices.
234238
- Refer to the main [CodeGen README](../../../../README.md) for links to benchmarking and Kubernetes deployment options.
235-
236-
```
237-
238-
```

CodeGen/docker_compose/intel/cpu/xeon/compose.yaml

Lines changed: 0 additions & 37 deletions
Original file line numberDiff line numberDiff line change
@@ -3,33 +3,9 @@
33

44
services:
55

6-
tgi-service:
7-
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
8-
container_name: tgi-server
9-
profiles:
10-
- codegen-xeon-tgi
11-
ports:
12-
- "8028:80"
13-
volumes:
14-
- "${MODEL_CACHE:-./data}:/data"
15-
shm_size: 1g
16-
environment:
17-
no_proxy: ${no_proxy}
18-
http_proxy: ${http_proxy}
19-
https_proxy: ${https_proxy}
20-
HF_TOKEN: ${HF_TOKEN}
21-
host_ip: ${host_ip}
22-
healthcheck:
23-
test: ["CMD-SHELL", "curl -f http://localhost:80/health || exit 1"]
24-
interval: 10s
25-
timeout: 10s
26-
retries: 100
27-
command: --model-id ${LLM_MODEL_ID} --cuda-graphs 0
286
vllm-service:
297
image: ${REGISTRY:-opea}/vllm:${TAG:-latest}
308
container_name: vllm-server
31-
profiles:
32-
- codegen-xeon-vllm
339
ports:
3410
- "8028:80"
3511
volumes:
@@ -58,22 +34,9 @@ services:
5834
LLM_MODEL_ID: ${LLM_MODEL_ID}
5935
HUGGINGFACEHUB_API_TOKEN: ${HF_TOKEN}
6036
restart: unless-stopped
61-
llm-tgi-service:
62-
extends: llm-base
63-
container_name: llm-codegen-tgi-server
64-
profiles:
65-
- codegen-xeon-tgi
66-
ports:
67-
- "9000:9000"
68-
ipc: host
69-
depends_on:
70-
tgi-service:
71-
condition: service_healthy
7237
llm-vllm-service:
7338
extends: llm-base
7439
container_name: llm-codegen-vllm-server
75-
profiles:
76-
- codegen-xeon-vllm
7740
ports:
7841
- "9000:9000"
7942
ipc: host
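
The README changes above replace each `--profile` invocation with a compose file: `codegen-xeon-vllm` becomes the default `compose.yaml`, and `codegen-xeon-tgi` becomes `compose_tgi.yaml`. For scripts that still pass the old profile names, that mapping could be sketched as below. The function name `compose_file_for_profile` is hypothetical and not part of this commit; only the profile names and file names come from the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): translate a removed profile
# name into the compose file that replaces it, per the README diff.
compose_file_for_profile() {
  case "$1" in
    codegen-xeon-vllm) echo "compose.yaml" ;;      # vLLM remains the default file
    codegen-xeon-tgi)  echo "compose_tgi.yaml" ;;  # TGI moved to its own file
    *) echo "unknown profile: $1" >&2; return 1 ;; # fail loudly on anything else
  esac
}
```

Assuming it is run from the directory holding the compose files, a caller might use `docker compose -f "$(compose_file_for_profile codegen-xeon-tgi)" up -d`.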
