feat: add RAG toolkit for knowledge base queries #1003
Conversation
|
@Wendong-Fan @4pmtong please review the PR and let me know your feedback. thanks. |
|
cool! I am open to your idea :) |
|
hello @MkDev11 thanks for the PR! i think we already have some overlapping features in the main camel repo here -
perhaps, if there's a specific functionality missing in those modules, would you be open to raising a PR there? thanks! |
Thanks for the feedback @JINO-ROHIT! You're right - I've refactored the PR to use CAMEL's existing infrastructure instead of duplicating functionality.
Changes Made
The
Eigent-Specific Additions
These are the features that are specific to eigent and wouldn't belong in the main CAMEL repo:
Why Not Contribute Upstream?
The task isolation and
Let me know if you have any other feedback! |
|
forwarding:
But more decisively I believe Wendong will leave comments 👍 |
|
Thanks @MkDev11 and @a7m-1st! I want to share some architectural feedback.
The Problem
Looking at the code, the RAGToolkit currently mixes generic RAG functionality with eigent-specific concerns:
The toolkit itself shouldn't know about eigent's task isolation strategy. This makes the code:
Suggested Architecture
This way:
Would you be open to refactoring along these lines? Happy to discuss further! |
|
@Wendong-Fan @a7m-1st Just refactored the PR! please review again and let me know the result |
thanks @MkDev11 ! could @a7m-1st and @bytecraftii help reviewing this? |
|
@Wendong-Fan @a7m-1st any update for me? |
|
I could test it after #999 |
a74e8c2 to 6a3ed3d
Closes eigent-ai#410
- Add RAGToolkit with document ingestion and retrieval capabilities
- Use CAMEL's VectorRetriever with QdrantStorage for local vector storage
- Provide add_document, query_knowledge_base, list_knowledge_bases tools
- Register RAG toolkit in agent.py toolkits dictionary
- Add 15 unit tests covering toolkit functionality
…ient, unstructured)
Based on feedback from JINO-ROHIT, refactored RAGToolkit to use CAMEL's infrastructure via composition.
Features:
- Uses CAMEL's RetrievalToolkit for file/URL retrieval
- Uses CAMEL's VectorRetriever for raw text support
- Task-based collection isolation
- Eigent AbstractToolkit integration
Tools provided:
- add_document: Add raw text to knowledge base
- query_knowledge_base: Query added documents
- information_retrieval: Query files/URLs (CAMEL's method)
- list_knowledge_bases: List available KBs
…on layer
Per Wendong-Fan's architectural feedback:
- RAGToolkit now accepts collection_name and storage_path as params
- Removed hardcoded task_* patterns from toolkit
- Task isolation handled in get_toolkits() in agent.py
- Toolkit is now generic and portable for upstream contribution
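A minimal sketch of the split this commit describes, assuming a toolkit that only receives `collection_name` and `storage_path` while the integration layer decides how tasks map to collections. `GenericRAGToolkit`, `build_rag_toolkit`, the exact `task_<id>` format, and the default path are hypothetical names chosen for illustration, not the PR's actual code:

```python
from dataclasses import dataclass


@dataclass
class GenericRAGToolkit:
    """Hypothetical stand-in for the toolkit: it knows a collection and a path,
    and nothing about how eigent isolates tasks."""

    collection_name: str
    storage_path: str


def get_task_collection_name(task_id: str) -> str:
    """Integration-layer helper. The naming scheme here is an assumption."""
    return f"task_{task_id}"


def build_rag_toolkit(task_id: str, storage_path: str = "./rag_storage") -> GenericRAGToolkit:
    """Hypothetical factory living in the integration layer (e.g. next to get_toolkits())."""
    return GenericRAGToolkit(
        collection_name=get_task_collection_name(task_id),
        storage_path=storage_path,
    )
```

With this shape, the task-naming policy can change in one place without touching the toolkit, which is what would make the toolkit a candidate for upstream contribution.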
Changes:
- Add DEFAULT_RAG_STORAGE_PATH and DEFAULT_COLLECTION_NAME constants
- Add TODO comments for embedding model flexibility
- Add get_task_collection_name() helper function in agent.py
- Simplify query results format (numbered list, no scores)
- Remove list_knowledge_bases from exposed tools (not useful with task isolation)
- Update tests to expect 3 tools instead of 4
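The simplified results format (numbered list, no scores) could look roughly like the sketch below. It assumes each retrieved item is a dict with a `"text"` field and possibly a `"score"` field; those key names, the function name `format_results`, and the empty-result message are assumptions for illustration only:

```python
from typing import Dict, List


def format_results(results: List[Dict]) -> str:
    """Render retrieved chunks as a numbered list, dropping any score fields."""
    if not results:
        return "No relevant information found."
    # Only the text is kept; scores and other metadata are intentionally omitted.
    return "\n".join(
        f"{index}. {item['text']}" for index, item in enumerate(results, start=1)
    )
```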
- Add RAW_TEXT_SUBDIR constant for path cleaner
- Fix docstring format with types (str), (int), etc.
- Change logger.debug to logger.warning for missing API key
- Add validation to raise ValueError if collection_name is None
- Update tests to pass collection_name and add validation test
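A rough sketch of the two behavioural changes in this commit: failing fast on a missing `collection_name`, and warning (rather than logging at debug level) when an API key is absent. The function name `validate_rag_config` and the environment variable `EMBEDDING_API_KEY` are hypothetical; the commit does not say which key is checked:

```python
import logging
import os
from typing import Optional

logger = logging.getLogger(__name__)

# Hypothetical variable name; the actual key checked in the PR is not stated.
API_KEY_ENV = "EMBEDDING_API_KEY"


def validate_rag_config(collection_name: Optional[str]) -> str:
    """Reject a missing collection name and warn if no API key is configured."""
    if collection_name is None:
        raise ValueError("collection_name is required")
    if not os.environ.get(API_KEY_ENV):
        # A warning is visible at default log levels, unlike a debug message.
        logger.warning("%s is not set; embedding calls may fail", API_KEY_ENV)
    return collection_name
```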
6a3ed3d to e1dcca9
please review once more! thanks. |
|
Can you fix the tests before merging? |
Fixed! |
|
@Wendong-Fan Can you please review the PR and let me know your feedback? |
|
also need to register the toolkit in
from app.utils.toolkit.rag_toolkit import RAGToolkit
# ...
toolkits = {
    # ...
    "rag_toolkit": RAGToolkit,
    # ...
} |
@Wendong-Fan just fixed all comments you mentioned, please review once more 🙏 |
Closes #410
Description
This PR adds RAG (Retrieval-Augmented Generation) capability to eigent using CAMEL's built-in vector retrieval infrastructure.
What it does:
- RAGToolkit that lets agents store and query knowledge bases
Tools provided:
- add_document - Add text content to a knowledge base with optional metadata
- query_knowledge_base - Search for relevant information using semantic similarity
- list_knowledge_bases - Show available knowledge bases
How to use:
When creating an agent, select rag_toolkit from the tools list. The agent can then store and retrieve information from its knowledge base during task execution.
Live Test Results
Dependencies Added
- qdrant-client - Local vector database
- unstructured - Document parsing for CAMEL's VectorRetriever
What is the purpose of this pull request?