This document outlines the single-node deployment process for an AgentQnA application using GenAIComps microservices on an AMD GPU (ROCm) server. The steps cover pulling Docker images, deploying containers via Docker Compose, and running the agent microservices.
- AgentQnA Quick Start Deployment
- Configuration Parameters
- AgentQnA Docker Compose Files
- Validate Services
- Conclusion
This section describes how to quickly deploy and test the AgentQnA service manually on an AMD GPU (ROCm) server. The basic steps are:
- Access the Code
- Configure the Deployment Environment
- Deploy the Services Using Docker Compose
- Ingest Data into the Vector Database
- Cleanup the Deployment
Clone the GenAIExamples repository and access the AgentQnA Docker Compose files and supporting scripts for AMD GPU (ROCm) servers:
```bash
export WORKDIR=<your-work-directory>
cd $WORKDIR
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/AgentQnA
```

Then check out a released version, such as v1.4:

```bash
git checkout v1.4
```

Set the required environment variables:

```bash
### Replace the string 'server_address' with your local server IP address
export host_ip='server_address'
### Replace the string 'your_huggingfacehub_token' with your Hugging Face Hub access token.
export HF_TOKEN='your_huggingfacehub_token'
### Replace the string 'your_langchain_api_key' with your LangChain API key.
export LANGCHAIN_API_KEY='your_langchain_api_key'
export LANGCHAIN_TRACING_V2=""
```

Deploy the services using vLLM as the LLM serving framework:

```bash
cd GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
bash launch_agent_service_vllm_rocm.sh
```

Or deploy using TGI:

```bash
cd GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
bash launch_agent_service_tgi_rocm.sh
```

After launching the agent services, check that all containers launched via Docker Compose have started; a quick status check is sketched after the container lists below. For the vLLM deployment, the following containers should be up:
- dataprep-redis-server
- doc-index-retriever-server
- embedding-server
- rag-agent-endpoint
- react-agent-endpoint
- redis-vector-db
- reranking-tei-xeon-server
- retriever-redis-server
- sql-agent-endpoint
- tei-embedding-server
- tei-reranking-server
- vllm-service
For the TGI deployment, the following containers should be up:

- dataprep-redis-server
- doc-index-retriever-server
- embedding-server
- rag-agent-endpoint
- react-agent-endpoint
- redis-vector-db
- reranking-tei-xeon-server
- retriever-redis-server
- sql-agent-endpoint
- tei-embedding-server
- tei-reranking-server
- tgi-service
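To confirm everything is running, a `docker ps` check like the sketch below can be used (the `--filter` name is just an example; substitute any container from the lists above):

```bash
# Show all running containers with their status and exposed ports.
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'

# Check one specific container, e.g. the LLM serving endpoint
# (vllm-service or tgi-service, depending on the deployment chosen).
docker ps --filter "name=vllm-service" --format '{{.Names}}: {{.Status}}'
```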
To stop the containers associated with the deployment, run the stop script that matches the launch script you used.

For vLLM:

```bash
cd GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
bash stop_agent_service_vllm_rocm.sh
```

For TGI:

```bash
cd GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
bash stop_agent_service_tgi_rocm.sh
```

Key parameters are configured via environment variables set before running `docker compose up`.
| Environment Variable | Description | Default (Set Externally) |
|---|---|---|
| `ip_address` | External IP address of the host machine. Required. | `your_external_ip_address` |
| `HF_TOKEN` | Your Hugging Face Hub token for model access. Required. | `your_huggingface_token` |
| `VLLM_LLM_MODEL_ID` | Hugging Face model ID for the AgentQnA LLM. Configured within the compose.yaml environment. | `Intel/neural-chat-7b-v3-3` |
| `TOOLSET_PATH` | Local path to the tool YAML file. Configured in compose.yaml. | `${WORKPATH}/../../../tools/` |
| `CRAG_SERVER` | CRAG server URL. Derived from `ip_address` and port 8080. | `http://${ip_address}:8080` |
| `WORKER_AGENT_URL` | Worker agent URL. Derived from `ip_address` and port 9095. | `http://${ip_address}:9095/v1/chat/completions` |
| `SQL_AGENT_URL` | SQL agent URL. Derived from `ip_address` and port 9096. | `http://${ip_address}:9096/v1/chat/completions` |
| `http_proxy` / `https_proxy` / `no_proxy` | Network proxy settings (if required). | `""` |
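For example, the required variables can be exported in the shell before bringing the stack up. This is a minimal sketch; `hostname -I` is a Linux-specific shortcut, and the model override simply restates the documented default:

```bash
# Required: host IP and Hugging Face token (values are placeholders).
export ip_address=$(hostname -I | awk '{print $1}')
export HF_TOKEN='your_huggingface_token'

# Optional: override the served model (shown with its documented default).
export VLLM_LLM_MODEL_ID='Intel/neural-chat-7b-v3-3'
```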
When deploying an AgentQnA pipeline on an AMD GPU (ROCm) platform, you can pick and choose among different large language model serving frameworks. The table below outlines the configurations available as part of the application. These configurations can be used as templates and extended to other components available in GenAIComps.
| File | Description |
|---|---|
| compose.yaml | Default compose file using TGI as the serving framework |
| compose_vllm.yaml | The LLM serving framework is vLLM. All other configurations remain the same as the default |
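These files can also be passed to Docker Compose directly, as in this sketch (assuming the required environment variables from the table above are already exported):

```bash
cd GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm

# Default deployment (TGI serving framework):
docker compose -f compose.yaml up -d

# Alternative deployment (vLLM serving framework):
docker compose -f compose_vllm.yaml up -d
```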
DATA='{"model": "Intel/neural-chat-7b-v3-3t", '\
'"messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens": 256}'
curl http://${HOST_IP}:${VLLM_SERVICE_PORT}/v1/chat/completions \
-X POST \
-d "$DATA" \
-H 'Content-Type: application/json'Checking the response from the service. The response should be similar to JSON:
```json
{
  "id": "chatcmpl-142f34ef35b64a8db3deedd170fed951",
  "object": "chat.completion",
  "created": 1742270316,
  "model": "Intel/neural-chat-7b-v3-3",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null
    }
  ],
  "usage": { "prompt_tokens": 66, "total_tokens": 322, "completion_tokens": 256, "prompt_tokens_details": null },
  "prompt_logprobs": null
}
```

If the value of the "choices[0].message.content" key contains meaningful text, the vLLM service launched successfully.
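To extract just the generated text, the same request can be piped through `jq` (assuming `jq` is installed on the host):

```bash
# Print only the assistant's reply from the chat completion response.
curl -s http://${HOST_IP}:${VLLM_SERVICE_PORT}/v1/chat/completions \
  -X POST \
  -d "$DATA" \
  -H 'Content-Type: application/json' | jq -r '.choices[0].message.content'
```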
DATA='{"inputs":"What is Deep Learning?",'\
'"parameters":{"max_new_tokens":256,"do_sample": true}}'
curl http://${HOST_IP}:${TGI_SERVICE_PORT}/generate \
-X POST \
-d "$DATA" \
-H 'Content-Type: application/json'Checking the response from the service. The response should be similar to JSON:
```json
{
  "generated_text": " "
}
```

If the value of the "generated_text" key contains meaningful text, the TGI service launched successfully.
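The same `jq` pattern applies to the TGI response (again assuming `jq` is available):

```bash
# Print only the generated text from the TGI response.
curl -s http://${HOST_IP}:${TGI_SERVICE_PORT}/generate \
  -X POST \
  -d "$DATA" \
  -H 'Content-Type: application/json' | jq -r '.generated_text'
```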
To validate the RAG worker agent, run the provided test script:

```bash
export agent_port=${WORKER_RAG_AGENT_PORT}
prompt="Tell me about Michael Jackson song Thriller"
python3 ~/agentqna-install/GenAIExamples/AgentQnA/tests/test.py --prompt "$prompt" --agent_role "worker" --ext_port $agent_port
```

The response must contain meaningful text answering the request in the "prompt" variable.
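The worker agent can also be queried over HTTP directly. The payload below is a sketch that assumes the endpoint accepts OpenAI-style chat messages, as the `WORKER_AGENT_URL` in the configuration table suggests; check the test script for the exact request schema:

```bash
# Hypothetical direct request to the RAG worker agent (port 9095 per the
# configuration table); the message format is an assumption.
curl http://${host_ip}:9095/v1/chat/completions \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"messages": [{"role": "user", "content": "Tell me about Michael Jackson song Thriller"}]}'
```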
To validate the SQL worker agent:

```bash
export agent_port=${WORKER_SQL_AGENT_PORT}
prompt="How many employees are there in the company?"
python3 ~/agentqna-install/GenAIExamples/AgentQnA/tests/test.py --prompt "$prompt" --agent_role "worker" --ext_port $agent_port
```

The answer should make sense, e.g. "8 employees in the company".
To validate the supervisor agent (streaming mode):

```bash
export agent_port=${SUPERVISOR_REACT_AGENT_PORT}
python3 ~/agentqna-install/GenAIExamples/AgentQnA/tests/test.py --agent_role "supervisor" --ext_port $agent_port --stream
```

The response should contain "Iron Maiden".
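To watch the streamed output without the test script, curl's no-buffer flag can be used. This is a sketch under the assumption that the supervisor exposes the same chat-completions path and honors a `stream` flag; verify against the test script before relying on it:

```bash
# Hypothetical streaming request to the supervisor agent endpoint;
# the path, payload shape, and stream flag are assumptions.
curl -N http://${host_ip}:${SUPERVISOR_REACT_AGENT_PORT}/v1/chat/completions \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"messages": [{"role": "user", "content": "your question here"}], "stream": true}'
```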
This guide provides a complete workflow for deploying, configuring, and validating the AgentQnA system on an AMD GPU (ROCm) server, with the flexibility to choose between the vLLM and TGI serving frameworks.