This document outlines the deployment process for a CodeGen application built on the GenAIComps microservice pipeline on Intel Xeon servers. The pipeline uses openGauss as the vector database (VectorDB) and includes microservices such as embedding, retriever, and LLM.
To set up environment variables for deploying CodeGen services, follow these steps:
- Set the required environment variables:

  ```bash
  # Example: host_ip="192.168.1.1"
  export host_ip="External_Public_IP"
  export HOST_IP=$host_ip
  export HF_TOKEN="Your_Huggingface_API_Token"
  export GS_USER="gaussdb"
  export GS_PASSWORD="openGauss@123"
  export GS_DB="postgres"
  export GS_CONNECTION_STRING="opengauss+psycopg2://${GS_USER}:${GS_PASSWORD}@${host_ip}:5432/${GS_DB}"
  ```
- If you are in a proxy environment, also set the proxy-related environment variables:

  ```bash
  export http_proxy="Your_HTTP_Proxy"
  export https_proxy="Your_HTTPS_Proxy"
  # Example: no_proxy="localhost,127.0.0.1,192.168.1.1"
  export no_proxy="Your_No_Proxy,codegen-xeon-ui-server,codegen-xeon-backend-server,dataprep-opengauss-server,tei-embedding-serving,retriever-opengauss-server,vllm-server"
  ```
- Set up the other environment variables:

  ```bash
  source ../set_env_opengauss.sh
  ```
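Before starting the containers, it can help to confirm that none of the variables above were left empty. The function below is a minimal sketch: `check_required_env` is a hypothetical helper, not part of the OPEA scripts, and it only checks that each named variable is set and non-empty, not that its value is correct.

```bash
# Hypothetical helper (not part of set_env_opengauss.sh): report every
# variable named in the arguments that is unset or empty.
check_required_env() {
  _missing=0
  for _name in "$@"; do
    # Indirect lookup via eval so the function also works in POSIX sh.
    eval "_value=\${${_name}:-}"
    if [ -z "${_value}" ]; then
      echo "Missing required environment variable: ${_name}" >&2
      _missing=1
    fi
  done
  # Non-zero exit status if at least one variable was missing.
  return "${_missing}"
}

# Example: check the variables exported in the first step.
# check_required_env host_ip HF_TOKEN GS_USER GS_PASSWORD GS_DB GS_CONNECTION_STRING
```

The function reports every missing name instead of stopping at the first one, so a single run lists everything that still needs to be set.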
Start the services with Docker Compose:

```bash
docker compose -f compose_opengauss.yaml up -d
```

It automatically downloads the required Docker images from Docker Hub, including:

```bash
docker pull opea/codegen:latest
docker pull opea/codegen-ui:latest
```

Note: You should build the Docker images from source yourself if:
- You are developing off the git main branch (the container ports in the repo may differ from those in the published Docker images).
- You can't download the Docker images.
- You want to use a specific version of a Docker image.

In that case, follow the image build steps below.
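To validate the deployment, send a request to the CodeGen service:

```bash
curl http://${host_ip}:7778/v1/codegen \
  -H "Content-Type: application/json" \
  -d '{
    "messages": "Write a Python function to calculate fibonacci numbers"
  }'
```

To build the Docker images locally, first clone GenAIComps and build the retriever and dataprep images: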
```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build --no-cache -t opea/retriever:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/retrievers/src/Dockerfile .
docker build --no-cache -t opea/dataprep:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/src/Dockerfile .
```
Then build the CodeGen image from GenAIExamples:

```bash
cd ..
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/CodeGen
docker build --no-cache -t opea/codegen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```

Build the UI image from the `ui` subdirectory (you are already in `GenAIExamples/CodeGen`):

```bash
cd ui
docker build --no-cache -t opea/codegen-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile .
```

Then run `docker images`; you should see the following Docker images:
- opea/dataprep:latest
- opea/retriever:latest
- opea/codegen:latest
- opea/codegen-ui:latest
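To check this list programmatically rather than by eye, you can compare the `docker images` output against the expected names. `missing_images` is a hypothetical helper sketched under the assumption that it is fed `repository:tag` lines, which is what `docker images --format '{{.Repository}}:{{.Tag}}'` prints; it outputs each expected image that is absent.

```bash
# Hypothetical helper: read "repository:tag" lines on stdin and print each
# image named in the arguments that does not appear among them.
missing_images() {
  _available="$(cat)"
  for _image in "$@"; do
    # -F: fixed string, -x: match the whole line, -q: no output
    printf '%s\n' "${_available}" | grep -Fxq -- "${_image}" || printf '%s\n' "${_image}"
  done
}

# Example (prints nothing when all four images are present):
# docker images --format '{{.Repository}}:{{.Tag}}' | missing_images \
#   opea/dataprep:latest opea/retriever:latest opea/codegen:latest opea/codegen-ui:latest
```

Whole-line matching matters here: without `-x`, `opea/codegen:latest` would not be confused with `opea/codegen-ui:latest` anyway, but partial tags such as `latest` versus `latest-dev` could be.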
By default, the deployment uses the following embedding and LLM models:
| Service | Model |
|---|---|
| Embedding | BAAI/bge-base-en-v1.5 |
| LLM | Qwen/Qwen2.5-Coder-7B-Instruct |
To use different models, change the corresponding xxx_MODEL_ID environment variable.
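A minimal sketch of such an override follows. The variable names `EMBEDDING_MODEL_ID` and `LLM_MODEL_ID` are assumptions based on the xxx_MODEL_ID pattern (the former is also mentioned in the retriever test); confirm the exact names in `set_env_opengauss.sh` and `compose_opengauss.yaml` before relying on them.

```bash
# Assumption: the compose file reads these two variables; verify the names in
# set_env_opengauss.sh and compose_opengauss.yaml first.
# Export them *after* sourcing set_env_opengauss.sh so its defaults do not
# overwrite your choice. The values shown are the defaults from the table;
# replace them with the Hugging Face model IDs you want.
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-7B-Instruct"
```

After changing the IDs, re-run `docker compose -f compose_opengauss.yaml up -d` so the containers start with the new values. A different embedding model may also produce vectors of a different size than 768.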
Note: When verifying the microservices with curl or an API client from a remote machine, make sure the microservice ports are open in the cloud node's firewall.
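One way to check reachability from the remote client is to attempt a TCP connection to each port. The sketch below relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command; `port_open` is a hypothetical helper, and a successful connection only shows that something is listening on the port, not that the service behind it is healthy. The port list is taken from the commands in this guide.

```bash
# Hypothetical helper: succeed if a TCP connection to HOST:PORT can be opened
# within TIMEOUT seconds (default 3). Uses bash's /dev/tcp pseudo-device, so
# bash must be installed even if this function is called from another shell.
port_open() {
  timeout "${3:-3}" bash -c ': > "/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example: ports used by the commands in this guide.
# for port in 7778 8090 7000 8028 6007 5173; do
#   if port_open "${host_ip}" "${port}"; then echo "${port}: open"; else echo "${port}: unreachable"; fi
# done
```

A port that times out (rather than being refused) usually points to a firewall or security-group rule dropping the traffic.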
Test the embedding service:

```bash
curl ${host_ip}:8090/embed \
  -X POST \
  -d '{"inputs":"What is Deep Learning?"}' \
  -H 'Content-Type: application/json'
```

To test the retriever microservice, generate a mock embedding vector with a Python script. The vector length is determined by the embedding model. This deployment uses EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5", whose vector size is 768.
```bash
export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
```
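Before sending the vector, you can confirm that its length matches the model's dimension. `embedding_length` is a hypothetical helper; it parses the printed Python list as JSON, which works here because the list contains only finite floats, and prints the number of elements.

```bash
# Hypothetical helper: print the number of elements in a JSON array string.
embedding_length() {
  python3 -c 'import json, sys; print(len(json.loads(sys.argv[1])))' "$1"
}

# Example: warn if the mock vector does not have the 768 dimensions that
# BAAI/bge-base-en-v1.5 produces.
# [ "$(embedding_length "${your_embedding}")" -eq 768 ] || echo "Unexpected embedding size" >&2
```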
Then send the mock vector to the retriever microservice:

```bash
curl http://${host_ip}:7000/v1/retrieval \
  -X POST \
  -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \
  -H 'Content-Type: application/json'
```

On first startup, the LLM service (vllm-server) takes longer to become ready because it must download, load, and warm up the model.
Run the following command to check whether the LLM service is ready:

```bash
docker logs vllm-server 2>&1 | grep complete
```

If the service is ready, you will see output like this:

```text
INFO: Application startup complete.
```
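Instead of re-running the check by hand, you can poll the logs until the message appears. `wait_for_log` is a hypothetical helper sketched for this purpose; the default timeout and polling interval are arbitrary, and matching a log line is only a readiness heuristic.

```bash
# Hypothetical helper: poll `docker logs CONTAINER` until PATTERN appears,
# giving up after TIMEOUT seconds (default 600), checking every INTERVAL
# seconds (default 10).
wait_for_log() {
  _container="$1"; _pattern="$2"; _timeout="${3:-600}"; _interval="${4:-10}"
  _elapsed=0
  while [ "${_elapsed}" -lt "${_timeout}" ]; do
    if docker logs "${_container}" 2>&1 | grep -q -- "${_pattern}"; then
      echo "${_container} is ready"
      return 0
    fi
    sleep "${_interval}"
    _elapsed=$((_elapsed + _interval))
  done
  echo "Timed out after ${_timeout}s waiting for '${_pattern}' in ${_container} logs" >&2
  return 1
}

# Example:
# wait_for_log vllm-server "Application startup complete" 900 10
```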
Then validate the LLM service with cURL:

```bash
curl http://${host_ip}:8028/v1/chat/completions \
  -X POST \
  -d '{"model": "Qwen/Qwen2.5-Coder-7B-Instruct", "messages": [{"role": "user", "content": "Write a hello world in Python"}], "max_tokens":50}' \
  -H 'Content-Type: application/json'
```

Finally, validate the CodeGen service end to end:

```bash
curl http://${host_ip}:7778/v1/codegen \
  -H "Content-Type: application/json" \
  -d '{
    "messages": "Write a Python function to sort a list"
  }'
```

To update the default knowledge base, use the following commands.
Upload a file:

```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \
  -H "Content-Type: multipart/form-data" \
  -F "files=@./your_code_file.py"
```

Add a knowledge base via HTTP links:
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \
  -H "Content-Type: multipart/form-data" \
  -F 'link_list=["https://example.com/code"]'
```

Delete uploaded files:
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete" \
  -d '{"file_path": "all"}' \
  -H "Content-Type: application/json"
```

To access the frontend, open the following URL in your browser: http://${host_ip}:5173
By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the compose_opengauss.yaml file as shown below:
```yaml
  codegen-xeon-ui-server:
    image: opea/codegen-ui:latest
    ...
    ports:
      - "80:5173"
```

With this mapping, the frontend is available at http://${host_ip}:80.