```diff
-# Project Aether: Event-Driven RAG Engine
+# Project Aether: RAG Pipeline with Event-Driven Workflows

-
-
-
+## Overview
+Project Aether is a Retrieval-Augmented Generation (RAG) system built with Python and LlamaIndex. It implements a document ingestion and retrieval pipeline using an event-driven architecture (Workflows) to handle complex tasks like query transformation, metadata enrichment, and semantic caching.

```
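The new Overview describes ingestion as a chain of event-driven steps. The sketch below illustrates that idea with plain `asyncio` and dataclasses. It is not LlamaIndex's Workflow API (LlamaIndex's `Workflow` class does this routing for you). The event classes, step functions, and the `STEPS` table are hypothetical, and each step is a placeholder. Each step consumes one event type and emits the next until a terminal event appears.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StartEvent:
    text: str


@dataclass
class MaskedEvent:
    text: str


@dataclass
class ChunkedEvent:
    chunks: list[str]


@dataclass
class StopEvent:
    result: list[str]


async def mask_step(ev: StartEvent) -> MaskedEvent:
    # Placeholder for PII masking: hides one hard-coded word.
    return MaskedEvent(ev.text.replace("secret", "[MASKED]"))


async def split_step(ev: MaskedEvent) -> ChunkedEvent:
    # Placeholder for semantic splitting: one chunk per sentence.
    return ChunkedEvent([s.strip() for s in ev.text.split(".") if s.strip()])


async def index_step(ev: ChunkedEvent) -> StopEvent:
    # Placeholder for indexing: hands the chunks back unchanged.
    return StopEvent(ev.chunks)


# Hypothetical routing table: event type -> the step that consumes it.
STEPS = {StartEvent: mask_step, MaskedEvent: split_step, ChunkedEvent: index_step}


async def run_workflow(text: str) -> list[str]:
    event = StartEvent(text)
    # Keep dispatching until some step emits the terminal event.
    while not isinstance(event, StopEvent):
        event = await STEPS[type(event)](event)
    return event.result


print(asyncio.run(run_workflow("Hello world. The secret is out.")))
```

Because dispatch is keyed on event type, a step could also return an event consumed by an earlier step. That is how a loop, such as a retrieval retry, can be expressed, which a strictly linear pipeline cannot do.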
```diff
-**Author:** Gabriel (Gabaoun) Penha
+The project is designed as a modular reference for building RAG applications that require more than simple linear processing, incorporating retries, asynchronous operations, and a clear separation of concerns.

```
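The added paragraph mentions retries and asynchronous operations, and the README names `tenacity` as the retry library. To make the behavior concrete, here is a stdlib-only sketch of exponential backoff around an async call. It is not tenacity's API. The helper `retry_async`, its parameters, and the simulated `flaky_llm_call` are hypothetical.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def retry_async(
    func: Callable[[], Awaitable[T]],
    *,
    attempts: int = 4,
    base_delay: float = 0.01,
    factor: float = 2.0,
) -> T:
    """Await func(), retrying on any exception with exponentially growing waits."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return await func()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(delay)
            delay *= factor  # waits: base, base*factor, base*factor**2, ...


calls = 0


async def flaky_llm_call() -> str:
    # Simulated dependency that fails twice, then succeeds.
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("transient failure")
    return "ok"


print(asyncio.run(retry_async(flaky_llm_call)), calls)
```

In tenacity itself, the equivalent is usually written with the `retry` decorator combined with `wait_exponential` and `stop_after_attempt`. That decorator also works on `async def` functions.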
```diff
-> *A highly resilient, event-driven Retrieval-Augmented Generation (RAG) engine optimizing semantic search latency by 80% while ensuring robust PII masking and enterprise-grade reliability.*
+## Features
+- **Event-Driven Ingestion:** Processes documents through a series of discrete steps (Loading -> PII Masking -> Semantic Splitting -> Enrichment -> Indexing).
+- **Advanced Retrieval:** Implements HyDE (Hypothetical Document Embeddings), query refinement loops, and relevance judgment (Chain-of-Thought) before generating answers.
+- **Semantic Caching:** Uses Redis to store and retrieve previously generated answers for identical or highly similar queries to reduce LLM latency and cost.
+- **PII Masking:** Basic regex-based masking of emails and phone numbers during the ingestion phase.
+- **Resiliency:** Uses `tenacity` for exponential backoff retries on LLM and database operations.
+- **Memory Efficiency:** Uses Python generators during document splitting to handle larger datasets without high memory consumption.

```
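The new Features list says PII masking is regex-based and covers emails and phone numbers. This diff does not show the project's actual patterns. The two below are illustrative assumptions, kept deliberately simple, as is the helper name `mask_pii`.

```python
import re

# Illustrative patterns only; not the project's actual regexes.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Loose phone pattern: optional "+", then at least 9 digits/separators, ending on a digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholder tags."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


print(mask_pii("Contact jane.doe@example.com or +1 (555) 123-4567."))
```

A pattern this loose will also mask some dates, such as `2024-01-15`. It will still miss names and addresses entirely, which is the gap the new Limitations section attributes to regex-based PII compared with an NER system.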
```diff
-Project Aether is a world-class reference implementation of a complex RAG system. By shifting from standard linear pipelines to LlamaIndex Workflows, it introduces cycles, streaming, and robust failure recovery natively into the ingestion and retrieval processes.
+## Tech Stack
+- **Language:** Python 3.11+
+- **Orchestration:** LlamaIndex (Workflows)
+- **Vector Database:** Qdrant
+- **Cache:** Redis
+- **LLM:** OpenAI (GPT-4o, GPT-4o-mini)
+- **Embeddings:** HuggingFace (BGE models)
+- **Configuration:** Pydantic Settings

```
```diff
-## 🌟 Key Features
+## Key Technical Points
+- **Modular Refactoring:** Logic is split into `core` (business logic), `services` (external integrations), `pipeline` (workflow orchestration), and `models` (data structures).
+- **Asynchronous Execution:** Heavy use of `asyncio` for non-blocking I/O, particularly in PII masking and LLM calls.
+- **Custom Splitter:** Implements a `SemanticDoubleMergingSplitter` which performs an initial semantic split and then merges small chunks that fall below a minimum size threshold.

```
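The custom splitter is described as a semantic split followed by merging chunks that fall below a minimum size, and the Features list says generators keep memory use low. The semantic split itself needs an embedding model, so this sketch assumes that step has already produced a stream of chunks and shows only the merge pass, as a generator. The function name `merge_small_chunks` and the character-based `min_size` threshold are assumptions.

```python
from collections.abc import Iterable, Iterator


def merge_small_chunks(chunks: Iterable[str], min_size: int) -> Iterator[str]:
    """Lazily join consecutive chunks until each yielded chunk has >= min_size characters.

    Only the current buffer is held in memory; a short trailing remainder is yielded as-is.
    """
    buffer = ""
    for chunk in chunks:
        buffer = f"{buffer} {chunk}" if buffer else chunk
        if len(buffer) >= min_size:
            yield buffer
            buffer = ""
    if buffer:
        yield buffer  # leftover that never reached the threshold


# Chunks as they might arrive from an upstream (semantic) splitter.
pieces = ["Short.", "Also short.", "This sentence is long enough on its own.", "Tail."]
print(list(merge_small_chunks(pieces, min_size=15)))
```

Yielding the short tail on its own is one possible choice. A splitter could instead fold it into the previous chunk, at the cost of buffering one extra chunk.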
```diff
-* **Event-Driven Workflows:** Employs LlamaIndex `Workflow` and `Event` classes to orchestrate query decomposition, HyDE, and Chain-of-Thought (CoT) relevance judgments with self-correction loops.
-* **Semantic Caching (Redis):** Caches query vectors via HNSW indices, intercepting recurrent queries to deliver sub-100ms response times and drastically reduce LLM API costs.
-* **Enterprise Governance:** Integrates an asynchronous Microsoft Presidio masking layer to strip Personally Identifiable Information (PII) before documents ever hit the vector database.
-* **Resilient Infrastructure:** Bulletproofed with `tenacity` for exponential backoff on all critical third-party I/O (LLMs, Qdrant).
-* **Memory-Optimized Ingestion:** Implements a custom `SemanticDoubleMergingSplitter` leveraging Python Generators to process massive document sets without memory bloat.
+## Design Decisions
+- **LlamaIndex Workflows over Pipelines:** Chosen to allow for non-linear logic, such as the query refinement loop in the retrieval workflow which can re-run if initial results are deemed irrelevant.
+- **BGE-Reranker:** Integrated to improve precision by re-evaluating the top retrieved nodes using a cross-encoder model.
+- **Strict Typing:** All major functions and classes use Python type hints for better maintainability and error detection.

```
```diff
-## 📈 Benchmarks
+## Limitations
+- **Regex-based PII:** The current PII masker uses basic regular expressions and is not a substitute for a production-grade NER (Named Entity Recognition) system.
+- **Simplified Semantic Cache:** The current implementation uses exact string matching in Redis for the cache keys rather than true vector-based similarity search.
+- **Single Collection:** Currently hardcoded to use a single Qdrant collection for all documents.

```
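The Limitations note that the cache keys on exact strings rather than on vector similarity. The sketch below shows that exact-match behavior, with a plain dict standing in for Redis. In Redis, the same key would typically be written with `SET` and read with `GET`. The class name, the `answer:` key prefix, and the case and whitespace normalization are assumptions. Even with normalization, a paraphrased question still misses.

```python
import hashlib
from collections.abc import Callable


class ExactMatchCache:
    """Answer cache keyed on the (normalized) query string; a dict stands in for Redis."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    @staticmethod
    def key(query: str) -> str:
        # Assumed normalization: lowercase and collapse whitespace, then hash.
        normalized = " ".join(query.lower().split())
        return "answer:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_or_compute(self, query: str, generate: Callable[[str], str]) -> tuple[str, bool]:
        """Return (answer, cache_hit); generate() runs only on a miss."""
        k = self.key(query)
        if k in self._store:
            return self._store[k], True
        answer = generate(query)  # the expensive LLM call in a real pipeline
        self._store[k] = answer
        return answer, False


def fake_llm(query: str) -> str:
    return f"answer to: {query}"


cache = ExactMatchCache()
print(cache.get_or_compute("What is HyDE?", fake_llm))      # miss
print(cache.get_or_compute("  what is  hyde? ", fake_llm))  # hit: same normalized key
print(cache.get_or_compute("Explain HyDE", fake_llm))       # miss: paraphrase, new key
```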
````diff
-| Metric | Basic RAG | Project Aether | Impact |
-|--------|-----------|----------------|--------|
-| **Faithfulness (Hallucination Rate)** | 62% | **88%** | ⬇️ HyDE & CoT Evaluation |
-| **Answer Relevance** | 70% | **92%** | ⬆️ BGE-Reranker & Reordering |
-| **Context Precision** | 55% | **85%** | ⬆️ Semantic Chunking Generators |
-| **Avg. Latency (P95)** | 5.2s | **0.8s** | ⚡ Semantic Cache (80% Hit Rate) |
-
-## 🛠 Architecture Decision Records (ADR)
-We maintain a robust architecture history. See the `docs/adr/` directory for detailed reasoning on our stack:
-- [ADR 001: Native Vector Search on Redis](docs/adr/ADR-001-Native-Vector-Search-Redis.md)
-- [ADR 002: LlamaIndex Workflows for Event-Driven RAG](docs/adr/002-LlamaIndex-Workflows-for-Event-Driven-RAG.md)
-- [ADR 003: Semantic Chunking Strategy](docs/adr/003-Semantic-Chunking-Strategy.md)
-
-## 🚀 Getting Started
+## Getting Started

 ### Prerequisites
-- Docker & Docker Compose
-- Python 3.11+ (Uses `async/await` heavily)
+- Docker and Docker Compose
+- Python 3.11+
 - OpenAI API Key

 ### Installation
-1. Clone the repository and navigate to the directory:
+1. Clone the repository:
 ```bash
- git clone https://github.com/gabaoun/Project-Aether.git
- cd Project-Aether
+ git clone https://github.com/your-username/Project-Aether.git
+ cd Project-Aether
 ```
 2. Install dependencies:
 ```bash
 pip install -r requirements.txt
 ```
-3. Environment Setup:
+3. Setup environment variables:
 ```bash
 cp .env.example .env
- # Add your OPENAI_API_KEY to .env
+ # Edit .env with your OpenAI API Key and other settings
 ```
-4. Start the infrastructure (Qdrant, Postgres, Redis):
+4. Start infrastructure:
 ```bash
 docker-compose up -d
 ```

 ### Usage
-Execute the main application to start ingestion (if `./data` is populated) and the interactive retrieval loop:
+Ensure you have documents in the `./data` directory (as specified in your `.env`), then run:
 ```bash
 python main.py
 ```

-### Testing
-Run the comprehensive test suite:
+## Testing
+Run the test suite using pytest:
 ```bash
 pytest tests/
 ```
-## ⚖️ License
-Distributed under the Apache 2.0 License. See `LICENSE` for more information.
+
+## Purpose
+This project was developed to demonstrate a technically sound approach to building RAG systems. It focuses on clean architecture, error handling, and implementing advanced RAG patterns in a way that is maintainable and extensible.
````