This document outlines the single-node deployment process for an AgentQnA application using GenAIComps microservices on an AMD GPU (ROCm) server. The steps cover pulling Docker images, deploying containers via Docker Compose, and running the agent microservices.
- AgentQnA Quick Start Deployment
- Configuration Parameters
- AgentQnA Docker Compose Files
- Validate Services
- Conclusion
This section describes how to quickly deploy and test the AgentQnA service manually on an AMD GPU (ROCm) server. The basic steps are:
- Access the Code
- Configure the Deployment Environment
- Deploy the Services Using Docker Compose
- Ingest Data into the Vector Database
- Cleanup the Deployment
Clone the GenAIExamples repository and access the AgentQnA Docker Compose files and supporting scripts for AMD GPU (ROCm) servers:
```bash
export WORKDIR=<your-work-directory>
cd $WORKDIR
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/AgentQnA
```

Then check out a released version, such as v1.4:

```bash
git checkout v1.4
```

Set the required environment variables:

```bash
### Replace the string 'server_address' with your local server IP address
export host_ip='server_address'
### Replace the string 'your_huggingfacehub_token' with your Hugging Face Hub access token.
export HF_TOKEN='your_huggingfacehub_token'
### Replace the string 'your_langchain_api_key' with your LangChain API key.
export LANGCHAIN_API_KEY='your_langchain_api_key'
export LANGCHAIN_TRACING_V2=""
```

Deploy the services using vLLM as the LLM serving framework:

```bash
cd GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
bash launch_agent_service_vllm_rocm.sh
```

Or deploy using TGI:

```bash
cd GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
bash launch_agent_service_tgi_rocm.sh
```

After launching the agent services, check that all containers launched via Docker Compose have started; a quick status check is sketched after the container lists below. For the vLLM deployment, the following containers should be up:
- dataprep-redis-server
- doc-index-retriever-server
- embedding-server
- rag-agent-endpoint
- react-agent-endpoint
- redis-vector-db
- reranking-tei-xeon-server
- retriever-redis-server
- sql-agent-endpoint
- tei-embedding-server
- tei-reranking-server
- vllm-service
For the TGI deployment, the following containers should be up:

- dataprep-redis-server
- doc-index-retriever-server
- embedding-server
- rag-agent-endpoint
- react-agent-endpoint
- redis-vector-db
- reranking-tei-xeon-server
- retriever-redis-server
- sql-agent-endpoint
- tei-embedding-server
- tei-reranking-server
- tgi-service
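To confirm everything is running, a `docker ps` check like the sketch below can be used (the `--filter` name is just an example; substitute any container from the lists above):

```bash
# Show all running containers with their status and exposed ports.
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'

# Check one specific container, e.g. the LLM serving endpoint
# (vllm-service or tgi-service, depending on the deployment chosen).
docker ps --filter "name=vllm-service" --format '{{.Names}}: {{.Status}}'
```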
To stop the containers associated with the deployment, run the stop script that matches the launch script you used.

For vLLM:

```bash
cd GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
bash stop_agent_service_vllm_rocm.sh
```

For TGI:

```bash
cd GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
bash stop_agent_service_tgi_rocm.sh
```

Key parameters are configured via environment variables set before running `docker compose up`.
| Environment Variable | Description | Default (Set Externally) |
|---|---|---|
| `ip_address` | External IP address of the host machine. Required. | `your_external_ip_address` |
| `HF_TOKEN` | Your Hugging Face Hub token for model access. Required. | `your_huggingface_token` |
| `VLLM_LLM_MODEL_ID` | Hugging Face model ID for the AgentQnA LLM. Configured within the compose.yaml environment. | `Intel/neural-chat-7b-v3-3` |
| `TOOLSET_PATH` | Local path to the tool YAML file. Configured in compose.yaml. | `${WORKPATH}/../../../tools/` |
| `CRAG_SERVER` | CRAG server URL. Derived from `ip_address` and port 8080. | `http://${ip_address}:8080` |
| `WORKER_AGENT_URL` | Worker agent URL. Derived from `ip_address` and port 9095. | `http://${ip_address}:9095/v1/chat/completions` |
| `SQL_AGENT_URL` | SQL agent URL. Derived from `ip_address` and port 9096. | `http://${ip_address}:9096/v1/chat/completions` |
| `http_proxy` / `https_proxy` / `no_proxy` | Network proxy settings (if required). | `""` |
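For example, the required variables can be exported in the shell before bringing the stack up. This is a minimal sketch; `hostname -I` is a Linux-specific shortcut, and the model override simply restates the documented default:

```bash
# Required: host IP and Hugging Face token (values are placeholders).
export ip_address=$(hostname -I | awk '{print $1}')
export HF_TOKEN='your_huggingface_token'

# Optional: override the served model (shown with its documented default).
export VLLM_LLM_MODEL_ID='Intel/neural-chat-7b-v3-3'
```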
When deploying an AgentQnA pipeline on an AMD GPU (ROCm) platform, you can pick and choose among different large language model serving frameworks. The table below outlines the configurations available as part of the application. These configurations can be used as templates and extended to other components available in GenAIComps.
| File | Description |
|---|---|
| compose.yaml | Default compose file using TGI as the serving framework |
| compose_vllm.yaml | The LLM serving framework is vLLM. All other configurations remain the same as the default |
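These files can also be passed to Docker Compose directly, as in this sketch (assuming the required environment variables from the table above are already exported):

```bash
cd GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm

# Default deployment (TGI serving framework):
docker compose -f compose.yaml up -d

# Alternative deployment (vLLM serving framework):
docker compose -f compose_vllm.yaml up -d
```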
DATA='{"model": "Intel/neural-chat-7b-v3-3t", '\
'"messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens": 256}'
curl http://${HOST_IP}:${VLLM_SERVICE_PORT}/v1/chat/completions \
-X POST \
-d "$DATA" \
-H 'Content-Type: application/json'Checking the response from the service. The response should be similar to JSON:
```json
{
  "id": "chatcmpl-142f34ef35b64a8db3deedd170fed951",
  "object": "chat.completion",
  "created": 1742270316,
  "model": "Intel/neural-chat-7b-v3-3",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null
    }
  ],
  "usage": { "prompt_tokens": 66, "total_tokens": 322, "completion_tokens": 256, "prompt_tokens_details": null },
  "prompt_logprobs": null
}
```

If the value of the "choices[0].message.content" key contains meaningful text, the vLLM service launched successfully.
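To extract just the generated text, the same request can be piped through `jq` (assuming `jq` is installed on the host):

```bash
# Print only the assistant's reply from the chat completion response.
curl -s http://${HOST_IP}:${VLLM_SERVICE_PORT}/v1/chat/completions \
  -X POST \
  -d "$DATA" \
  -H 'Content-Type: application/json' | jq -r '.choices[0].message.content'
```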
DATA='{"inputs":"What is Deep Learning?",'\
'"parameters":{"max_new_tokens":256,"do_sample": true}}'
curl http://${HOST_IP}:${TGI_SERVICE_PORT}/generate \
-X POST \
-d "$DATA" \
-H 'Content-Type: application/json'Checking the response from the service. The response should be similar to JSON:
```json
{
  "generated_text": " "
}
```

If the value of the "generated_text" key contains meaningful text, the TGI service launched successfully.
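The same `jq` pattern applies to the TGI response (again assuming `jq` is available):

```bash
# Print only the generated text from the TGI response.
curl -s http://${HOST_IP}:${TGI_SERVICE_PORT}/generate \
  -X POST \
  -d "$DATA" \
  -H 'Content-Type: application/json' | jq -r '.generated_text'
```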
To validate the RAG worker agent, run the provided test script:

```bash
export agent_port=${WORKER_RAG_AGENT_PORT}
prompt="Tell me about Michael Jackson song Thriller"
python3 ~/agentqna-install/GenAIExamples/AgentQnA/tests/test.py --prompt "$prompt" --agent_role "worker" --ext_port $agent_port
```

The response must contain meaningful text answering the request in the "prompt" variable.
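The worker agent can also be queried over HTTP directly. The payload below is a sketch that assumes the endpoint accepts OpenAI-style chat messages, as the `WORKER_AGENT_URL` in the configuration table suggests; check the test script for the exact request schema:

```bash
# Hypothetical direct request to the RAG worker agent (port 9095 per the
# configuration table); the message format is an assumption.
curl http://${host_ip}:9095/v1/chat/completions \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"messages": [{"role": "user", "content": "Tell me about Michael Jackson song Thriller"}]}'
```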
To validate the SQL worker agent:

```bash
export agent_port=${WORKER_SQL_AGENT_PORT}
prompt="How many employees are there in the company?"
python3 ~/agentqna-install/GenAIExamples/AgentQnA/tests/test.py --prompt "$prompt" --agent_role "worker" --ext_port $agent_port
```

The answer should make sense, e.g. "8 employees in the company".
To validate the supervisor agent (streaming mode):

```bash
export agent_port=${SUPERVISOR_REACT_AGENT_PORT}
python3 ~/agentqna-install/GenAIExamples/AgentQnA/tests/test.py --agent_role "supervisor" --ext_port $agent_port --stream
```

The response should contain "Iron Maiden".
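To watch the streamed output without the test script, curl's no-buffer flag can be used. This is a sketch under the assumption that the supervisor exposes the same chat-completions path and honors a `stream` flag; verify against the test script before relying on it:

```bash
# Hypothetical streaming request to the supervisor agent endpoint;
# the path, payload shape, and stream flag are assumptions.
curl -N http://${host_ip}:${SUPERVISOR_REACT_AGENT_PORT}/v1/chat/completions \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"messages": [{"role": "user", "content": "your question here"}], "stream": true}'
```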
This guide provides a complete workflow for deploying, configuring, and validating the AgentQnA system on an AMD GPU (ROCm) server, with the flexibility to choose between the vLLM and TGI serving frameworks.