This document outlines the single node deployment process for a AgentQnA application utilizing the GenAIComps microservices on Intel Gaudi® server. The steps include pulling Docker images, container deployment via Docker Compose, and service execution using microservices agent.
- AgentQnA Quick Start Deployment
- Configuration Parameters
- AgentQnA Docker Compose Files
- Validate Services
- Interact with the agent system with UI
- Register other tools with the AI agent
- Conclusion
This section describes how to quickly deploy and test the AgentQnA service manually on an Intel® Gaudi® processor. The basic steps are:
- Access the Code
- Configure the Deployment Environment
- Deploy the Services Using Docker Compose
- Ingest Data into the Vector Database
- Cleanup the Deployment
Clone the GenAIExample repository and access the AgentQnA Intel® Gaudi® platform Docker Compose files and supporting scripts:
export WORKDIR=<your-work-directory>
cd $WORKDIR
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/AgentQnATo checkout a released version, such as v1.4:
git checkout v1.4To set up environment variables for deploying AgentQnA services, set up some parameters specific to the deployment environment and source the set_env.sh script in this directory:
export host_ip="External_Public_IP" # ip address of the node
export HF_TOKEN="Your_HuggingFace_API_Token" # the huggingface API token you applied
export http_proxy="Your_HTTP_Proxy" # http proxy if any
export https_proxy="Your_HTTPs_Proxy" # https proxy if any
export no_proxy=localhost,127.0.0.1,$host_ip # additional no proxies if neededTo use OpenAI models, generate a key following these instructions.
To use a remote server running Intel® AI for Enterprise Inference, contact the cloud service provider or owner of the on-prem machine for a key to access the desired model on the server.
Then set the environment variable OPENAI_API_KEY with the key contents:
export OPENAI_API_KEY=<your-openai-key>source $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/hpu/gaudi/set_env.shWe make it convenient to launch the whole system with docker compose, which includes microservices for LLM, agents, UI, retrieval tool, vector database, dataprep, and telemetry. There are 3 docker compose files, which make it easy for users to pick and choose. Users can choose a different retrieval tool other than the DocIndexRetriever example provided in our GenAIExamples repo. Users can choose not to launch the telemetry containers.
On Gaudi, meta-llama/Meta-Llama-3.3-70B-Instruct will be served using vllm. The command below will launch the multi-agent system with the DocIndexRetriever as the retrieval tool for the Worker RAG agent.
cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/hpu/gaudi/
docker compose -f $WORKDIR/GenAIExamples/DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml -f compose.yaml up -dNote: To enable the web search tool, skip this step and proceed to the "[Optional] Web Search Tool Support" section.
To enable Open Telemetry Tracing, compose.telemetry.yaml file need to be merged along with default compose.yaml file. Gaudi example with Open Telemetry feature:
cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/hpu/gaudi/
docker compose -f $WORKDIR/GenAIExamples/DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml -f compose.yaml -f compose.telemetry.yaml up -dInstructions
A web search tool is supported in this example and can be enabled by running docker compose with the `compose.webtool.yaml` file. The Google Search API is used. Follow the [instructions](https://python.langchain.com/docs/integrations/tools/google_search) to create an API key and enable the Custom Search API on a Google account. The environment variables `GOOGLE_CSE_ID` and `GOOGLE_API_KEY` need to be set.cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/hpu/gaudi/
export GOOGLE_CSE_ID="YOUR_ID"
export GOOGLE_API_KEY="YOUR_API_KEY"
docker compose -f $WORKDIR/GenAIExamples/DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml -f compose.yaml -f compose.webtool.yaml up -dPlease refer to the table below to build different microservices from source:
| Microservice | Deployment Guide |
|---|---|
| Agent | Agent build guide |
| UI | Basic UI build guide |
The run_ingest_data.sh script will use an example jsonl file to ingest example documents into a vector database. Other ways to ingest data and other types of documents supported can be found in the OPEA dataprep microservice located in the opea-project/GenAIComps repo.
cd $WORKDIR/GenAIExamples/AgentQnA/retrieval_tool/
bash run_ingest_data.shNote: This is a one-time operation.
To stop the containers associated with the deployment, execute the following command:
# for OpenAI Models
docker compose -f compose_openai.yaml down
# for Models on Remote Server
docker compose -f compose_remote.yaml downKey parameters are configured via environment variables set before running docker compose up.
| Environment Variable | Description | Default (Set Externally) |
|---|---|---|
ip_address |
External IP address of the host machine. Required. | your_external_ip_address |
OPENAI_API_KEY |
Your OpenAI API key for model access. Required. | your_openai_api_key |
model |
Hugging Face model ID for the AgentQnA LLM. Configured within compose.yaml environment. |
gpt-4o-mini-2024-07-18 |
TOOLSET_PATH |
Local path to the tool Yaml file. Configured in compose.yaml. |
$WORKDIR/GenAIExamples/AgentQnA/tools/ |
CRAG_SERVER |
CRAG server URL. Derived from ip_address and port 8080. |
http://${ip_address}:8080 |
WORKER_AGENT_URL |
Worker agent URL. Derived from ip_address and port 9095. |
http://${ip_address}:9095/v1/chat/completions |
SQL_AGENT_URL |
SQL agent URL. Derived from ip_address and port 9096. |
http://${ip_address}:9096/v1/chat/completions |
http_proxy / https_proxy/no_proxy |
Network proxy settings (if required). | "" |
In the context of deploying a AgentQnA pipeline on an Intel® Gaudi® platform, we can pick and choose different large language model serving frameworks. The table below outlines the various configurations that are available as part of the application. These configurations can be used as templates and can be extended to different components available in GenAIComps.
| File | Description |
|---|---|
| compose.yaml | Default compose file using vLLM as the serving framework |
| compose.webtool.yaml | This compose file is used to start a supervisor agent with webtools |
| compose.telemetry.yaml | This compose file is used to enable telemetry related services |
- First look at logs for each of the agent docker containers:
# worker RAG agent
docker logs rag-agent-endpoint
# worker SQL agent
docker logs sql-agent-endpoint
# supervisor agent
docker logs react-agent-endpointLook for the message "HTTP server setup successful" to confirm the agent docker container has started successfully.
- Use python to validate each agent is working properly:
# RAG worker agent
python $WORKDIR/GenAIExamples/AgentQnA/tests/test.py --prompt "Tell me about Michael Jackson song Thriller" --agent_role "worker" --ext_port 9095
# SQL agent
python $WORKDIR/GenAIExamples/AgentQnA/tests/test.py --prompt "How many employees in company" --agent_role "worker" --ext_port 9096
# supervisor agent: this will test a two-turn conversation
python $WORKDIR/GenAIExamples/AgentQnA/tests/test.py --agent_role "supervisor" --ext_port 9090The UI microservice is launched in the previous step with the other microservices.
To see the UI, open a web browser to http://${ip_address}:5173 to access the UI. Note the ip_address here is the host IP of the UI microservice.
- Click on the arrow above
Get started. Create an admin account with a name, email, and password. - Add an OpenAI-compatible API endpoint. In the upper right, click on the circle button with the user's initial, go to
Admin Settings->Connections. UnderManage OpenAI API Connections, click on the+to add a connection. Fill in these fields:
- URL:
http://${ip_address}:9090/v1, do not forget thev1 - Key: any value
- Model IDs: any name i.e.
opea-agent, then press+to add it
Click "Save".
- Test OPEA agent with UI. Return to
New Chatand ensure the model (i.e.opea-agent) is selected near the upper left. Enter in any prompt to interact with the agent.
The tools folder contains YAML and Python files for additional tools for the supervisor and worker agents. Refer to the "Provide your own tools" section in the instructions here to add tools and customize the AI agents.
This guide provides a comprehensive workflow for deploying, configuring, and validating the AgentQnA system on Intel® Gaudi® processors, enabling flexible integration with both OpenAI-compatible and remote LLM services.

