You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This example showcases a hierarchical multi-agent system for question-answering applications. The architecture diagram below shows a supervisor agent that interfaces with the user and dispatches tasks to two worker agents to gather information and come up with answers. The worker RAG agent uses the retrieval tool to retrieve relevant documents from a knowledge base - a vector database. The worker SQL agent retrieves relevant data from a SQL database. Although not included in this example by default, other tools such as a web search tool or a knowledge graph query tool can be used by the supervisor agent to gather information from additional sources.
The AgentQnA example is implemented using the component-level microservices defined in [GenAIComps](https://github.com/opea-project/GenAIComps). The flow chart below shows the information flow between different microservices for this example.
18
-
19
-
```mermaid
20
-
---
21
-
config:
22
-
flowchart:
23
-
nodeSpacing: 400
24
-
rankSpacing: 100
25
-
curve: linear
26
-
themeVariables:
27
-
fontSize: 50px
28
-
---
29
-
flowchart LR
30
-
%% Colors %%
31
-
classDef blue fill:#ADD8E6,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
### Why should AI Agents be used for question-answering?
3
+
AI agents bring unique advantages to question-answering tasks, as demonstrated in the AgentQnA application:
88
4
89
5
1.**Improve relevancy of retrieved context.**
90
6
RAG agents can rephrase user queries, decompose user queries, and iterate to get the most relevant context for answering a user's question. Compared to conventional RAG, RAG agents significantly improve the correctness and relevancy of the answer because of the iterations it goes through.
Expert worker agents, such as RAG agents and SQL agents, can provide high-quality output for different aspects of a complex query, and the supervisor agent can aggregate the information to provide a comprehensive answer. If only one agent is used and all tools are provided to this single agent, it can lead to large overhead or not use the best tool to provide accurate answers.
Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token).
122
-
123
-
Then set an environment variable with the token and another for a directory to download the models:
124
-
125
-
```bash
126
-
export HF_TOKEN=<your-HF-token>
127
-
export HF_CACHE_DIR=<directory-where-llms-are-downloaded># to avoid redownloading models
128
-
```
129
-
130
-
##### [Optional] OPENAI_API_KEY to use OpenAI models or Intel® AI for Enterprise Inference
131
-
132
-
To use OpenAI models, generate a key following these [instructions](https://platform.openai.com/api-keys).
133
-
134
-
To use a remote server running Intel® AI for Enterprise Inference, contact the cloud service provider or owner of the on-prem machine for a key to access the desired model on the server.
135
-
136
-
Then set the environment variable `OPENAI_API_KEY` with the key contents:
137
-
138
-
```bash
139
-
export OPENAI_API_KEY=<your-openai-key>
140
-
```
141
-
142
-
#### Third, set up environment variables for the selected hardware using the corresponding `set_env.sh`
We make it convenient to launch the whole system with docker compose, which includes microservices for LLM, agents, UI, retrieval tool, vector database, dataprep, and telemetry. There are 3 docker compose files, which make it easy for users to pick and choose. Users can choose a different retrieval tool other than the `DocIndexRetriever` example provided in our GenAIExamples repo. Users can choose not to launch the telemetry containers.
161
-
162
-
#### Launch on Gaudi
163
-
164
-
On Gaudi, `meta-llama/Meta-Llama-3.3-70B-Instruct` will be served using vllm. The command below will launch the multi-agent system with the `DocIndexRetriever` as the retrieval tool for the Worker RAG agent.
docker compose -f $WORKDIR/GenAIExamples/DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml -f compose.yaml -f compose.telemetry.yaml up -d
179
-
```
180
-
181
-
##### [Optional] Web Search Tool Support
182
-
183
-
<details>
184
-
<summary> Instructions </summary>
185
-
A web search tool is supported in this example and can be enabled by running docker compose with the `compose.webtool.yaml` file.
186
-
The Google Search API is used. Follow the [instructions](https://python.langchain.com/docs/integrations/tools/google_search) to create an API key and enable the Custom Search API on a Google account. The environment variables `GOOGLE_CSE_ID` and `GOOGLE_API_KEY` need to be set.
The command below will launch the multi-agent system with the `DocIndexRetriever` as the retrieval tool for the Worker RAG agent.
209
-
210
-
```bash
211
-
docker compose -f $WORKDIR/GenAIExamples/DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml -f compose_openai.yaml up -d
212
-
```
213
-
214
-
##### Models on Remote Server
215
-
216
-
When models are deployed on a remote server with Intel® AI for Enterprise Inference, a base URL and an API key are required to access them. To run the Agent microservice on Xeon while using models deployed on a remote server, add `compose_remote.yaml` to the `docker compose` command and set additional environment variables.
217
-
218
-
###### Notes
219
-
220
-
-`OPENAI_API_KEY` is already set in a previous step.
221
-
-`model` is used to overwrite the value set for this environment variable in `set_env.sh`.
222
-
-`LLM_ENDPOINT_URL` is the base URL given from the owner of the on-prem machine or cloud service provider. It will follow this format: "https://<DNS>". Here is an example: "https://api.inference.example.com".
docker compose -f $WORKDIR/GenAIExamples/DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml -f compose_openai.yaml -f compose_remote.yaml up -d
228
-
```
229
-
230
-
### 3. Ingest Data into the vector database
231
-
232
-
The `run_ingest_data.sh` script will use an example jsonl file to ingest example documents into a vector database. Other ways to ingest data and other types of documents supported can be found in the OPEA dataprep microservice located in the opea-project/GenAIComps repo.
233
-
234
-
```bash
235
-
cd$WORKDIR/GenAIExamples/AgentQnA/retrieval_tool/
236
-
bash run_ingest_data.sh
237
-
```
238
-
239
-
> **Note**: This is a one-time operation.
240
-
241
-
## How to interact with the agent system with UI
242
-
243
-
The UI microservice is launched in the previous step with the other microservices.
244
-
To see the UI, open a web browser to `http://${ip_address}:5173` to access the UI. Note the `ip_address` here is the host IP of the UI microservice.
245
-
246
-
1. Click on the arrow above `Get started`. Create an admin account with a name, email, and password.
247
-
2. Add an OpenAI-compatible API endpoint. In the upper right, click on the circle button with the user's initial, go to `Admin Settings`->`Connections`. Under `Manage OpenAI API Connections`, click on the `+` to add a connection. Fill in these fields:
248
-
249
-
-**URL**: `http://${ip_address}:9090/v1`, do not forget the `v1`
250
-
-**Key**: any value
251
-
-**Model IDs**: any name i.e. `opea-agent`, then press `+` to add it
3. Test OPEA agent with UI. Return to `New Chat` and ensure the model (i.e. `opea-agent`) is selected near the upper left. Enter in any prompt to interact with the agent.
4.[Monitoring and Tracing](./README_miscellaneous.md)
281
18
282
-
2. Use python to validate each agent is working properly:
19
+
## Architecture
283
20
284
-
```bash
285
-
# RAG worker agent
286
-
python $WORKDIR/GenAIExamples/AgentQnA/tests/test.py --prompt "Tell me about Michael Jackson song Thriller" --agent_role "worker" --ext_port 9095
21
+
This example showcases a hierarchical multi-agent system for question-answering applications. The architecture diagram below shows a supervisor agent that interfaces with the user and dispatches tasks to two worker agents to gather information and come up with answers. The worker RAG agent uses the retrieval tool to retrieve relevant documents from a knowledge base - a vector database. The worker SQL agent retrieves relevant data from a SQL database. Although not included in this example by default, other tools such as a web search tool or a knowledge graph query tool can be used by the supervisor agent to gather information from additional sources.
287
22
288
-
# SQL agent
289
-
python $WORKDIR/GenAIExamples/AgentQnA/tests/test.py --prompt "How many employees in company" --agent_role "worker" --ext_port 9096
23
+
The AgentQnA application is an end-to-end workflow that leverages the capability of agents and tools. The workflow falls into the following architecture:
290
24
291
-
# supervisor agent: this will test a two-turn conversation
The AgentQnA example is implemented using the component-level microservices defined in [GenAIComps](https://github.com/opea-project/GenAIComps).
296
28
297
-
The [tools](./tools) folder contains YAML and Python files for additional tools for the supervisor and worker agents. Refer to the "Provide your own tools" section in the instructions [here](https://github.com/opea-project/GenAIComps/tree/main/comps/agent/src/README.md) to add tools and customize the AI agents.
29
+
## Deployment Options
298
30
299
-
## Monitor and Tracing
31
+
The table below lists currently available deployment options. They outline in detail the implementation of this example on selected hardware.
300
32
301
-
Follow [OpenTelemetry OPEA Guide](https://opea-project.github.io/latest/tutorial/OpenTelemetry/OpenTelemetry_OPEA_Guide.html) to understand how to use OpenTelemetry tracing and metrics in OPEA.
302
-
For AgentQnA specific tracing and metrics monitoring, follow [OpenTelemetry on AgentQnA](https://opea-project.github.io/latest/tutorial/OpenTelemetry/deploy/AgentQnA.html) section.
0 commit comments