
Cache Augmentation Generation (CAG) System

Python 3.9+ FastAPI License: MIT

A document-based question-answering system that combines Retrieval-Augmented Generation (RAG) with a multi-layer response cache to return fast, context-grounded answers using Google's Gemini 2.5 Flash Lite model.

🎯 Project Overview

The CAG (Cache Augmentation Generation) system enhances traditional RAG architectures by implementing a multi-layered caching strategy that:

  • Reduces API calls by caching previous responses
  • Improves response time through semantic cache lookup
  • Enhances answer quality by leveraging similar past queries
  • Processes documents efficiently with chunking and vector embeddings
  • Provides context-aware answers using document retrieval and cached knowledge

Architecture Components

  1. Document Processor: Extracts and chunks text from PDF, TXT, and MD files
  2. Vector Store: Uses ChromaDB for semantic document search with embeddings
  3. Cache Manager: Implements LRU cache with semantic similarity search
  4. Gemini Client: Interfaces with Google's Gemini 2.5 Flash Lite API
  5. CAG Service: Orchestrates all components for intelligent query processing
  6. FastAPI Server: Provides RESTful API endpoints
  7. Gradio UI: Interactive web interface for easy document Q&A
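The Cache Manager's eviction policy (LRU eviction plus TTL expiry, as listed under the cache strategy in How CAG Works) can be illustrated with a small standard-library sketch. The class name LRUTTLCache, its get/set/stats methods, and the injectable clock are hypothetical and are not the API of src/cag/services/cache_manager.py. Semantic lookup is left out. The stats() keys mirror the cache block of the GET /stats response.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class LRUTTLCache:
    """Hypothetical LRU cache whose entries also expire after ttl_seconds."""

    def __init__(self, max_size: int = 1000, ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        # key -> (value, time stored); dict order tracks recency, oldest first
        self._data: OrderedDict = OrderedDict()

    def _expired(self, stored_at: float, now: float) -> bool:
        return now - stored_at > self.ttl_seconds

    def get(self, key: str) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        value, stored_at = item
        if self._expired(stored_at, self._clock()):
            del self._data[key]  # lazily drop the expired entry and report a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (value, self._clock())
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry

    def stats(self) -> dict:
        now = self._clock()
        valid = sum(1 for _, t in self._data.values() if not self._expired(t, now))
        return {"total_entries": len(self._data), "valid_entries": valid,
                "max_size": self.max_size, "ttl_seconds": self.ttl_seconds}


if __name__ == "__main__":
    fake_now = [0.0]
    cache = LRUTTLCache(max_size=2, ttl_seconds=10, clock=lambda: fake_now[0])
    cache.set("a", 1)
    cache.set("b", 2)
    cache.get("a")      # "a" becomes most recently used, so "b" is now the LRU entry
    cache.set("c", 3)   # exceeds max_size and evicts "b"
    fake_now[0] = 11.0  # both remaining entries are now older than the TTL
    print(cache.stats())
    # → {'total_entries': 2, 'valid_entries': 0, 'max_size': 2, 'ttl_seconds': 10}
```

In this sketch expired entries are only removed when they are next accessed, which is one way total_entries and valid_entries can differ in a stats report.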

✨ What's New (v1.0.0)

This project has been reorganized to follow professional Python packaging standards:

  • Src-layout structure for better code organization
  • Modular architecture with clear separation of concerns (API, Core, Services, UI)
  • Comprehensive documentation in docs/ directory
  • Professional Git repository structure
  • Enhanced testing with updated test suite
  • Docker support with multi-service deployment

See CHANGELOG.md for full details.

🚀 Getting Started

Prerequisites

  • Python 3.9 or higher
  • pip (Python package manager)
  • Virtual environment tool (venv, conda, etc.)
  • Google Gemini API key (available from Google AI Studio)

Installation

  1. Clone the repository (if you haven't already):

    git clone <repository-url>
    cd CAG-Code
  2. Create and activate a virtual environment:

    # Windows
    python -m venv venv
    .\venv\Scripts\activate
    
    # Linux/Mac
    python3 -m venv venv
    source venv/bin/activate
  3. Install dependencies:

    pip install --upgrade pip
    pip install -r requirements.txt

    For development with additional tools:

    pip install -e ".[dev]"
  4. Set up environment variables:

    Create a .env file in the project root:

    cp .env.example .env

    Edit .env and add your Gemini API key:

    GEMINI_API_KEY=your_actual_gemini_api_key_here
    CACHE_TYPE=memory
    CACHE_TTL=3600
    MAX_CACHE_SIZE=1000
    EMBEDDING_MODEL=all-MiniLM-L6-v2
    CHUNK_SIZE=1000
    CHUNK_OVERLAP=200

⚠️ Important: Never commit your .env file to version control. It's already in .gitignore.

Running the Application

  1. Start the FastAPI server:

    python main.py

    Or using uvicorn directly:

    uvicorn src.cag.api.main:app --reload --host 0.0.0.0 --port 8000
  2. Access the API:

    • API Base URL: http://localhost:8000
    • Interactive API Docs: http://localhost:8000/docs
    • Alternative Docs: http://localhost:8000/redoc

Running with Gradio Interface

For a user-friendly web interface, use the Gradio application:

python app_gradio.py

Access the Gradio UI:

  • Web Interface: http://localhost:7860
  • The interface provides:
    • 💬 Chat Tab: Interactive Q&A with your documents
    • 📤 Upload Tab: Easy document upload with drag-and-drop
    • 📊 Statistics Tab: Monitor cache performance and manage system

Gradio Features:

  • Beautiful, intuitive interface
  • Real-time chat with conversation history
  • Configurable query settings (cache, semantic search, top-k)
  • Document upload with instant feedback
  • System statistics dashboard
  • Cache and vector store management

📚 API Endpoints

Health Check

GET /
GET /health

Upload Document

POST /upload
Content-Type: multipart/form-data

Parameters:
- file: PDF, TXT, or MD file

Example using curl:

curl -X POST "http://localhost:8000/upload" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@Provider_FAQ_v2.pdf"

Response:

{
  "status": "success",
  "file_name": "Provider_FAQ_v2.pdf",
  "num_chunks": 45,
  "total_length": 12500,
  "document_ids": ["uuid1", "uuid2", ...]
}
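The num_chunks and total_length fields depend on CHUNK_SIZE and CHUNK_OVERLAP. As a rough illustration, a character-based sliding window is sketched below. The function names are hypothetical, and the project's DocumentProcessor may split on other boundaries (sentences, tokens, pages), so its chunk counts need not match this sketch or the example response above.

```python
import uuid
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into fixed-size character windows; neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def upload_summary(file_name: str, text: str) -> dict:
    """Build a response shaped like the /upload example (hypothetical helper)."""
    chunks = chunk_text(text)
    return {
        "status": "success",
        "file_name": file_name,
        "num_chunks": len(chunks),
        "total_length": len(text),
        "document_ids": [str(uuid.uuid4()) for _ in chunks],  # one ID per stored chunk
    }


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# → ['abcd', 'defg', 'ghij']
```

With the default settings each window advances 800 characters (1000 - 200), so under this sketch a 12,500-character text yields 16 windows.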

Query Document

POST /query
Content-Type: application/json

Request Body:

{
  "question": "What are the provider requirements?",
  "use_cache": true,
  "use_semantic_search": true,
  "top_k": 5
}

Response:

{
  "answer": "Based on the document...",
  "source": "generated",
  "cached": false,
  "relevant_chunks": [
    {
      "text": "Relevant section from document...",
      "metadata": {"chunk_id": 0, "start_pos": 0}
    }
  ],
  "similar_cached_queries": [
    {
      "query": "Similar previous question",
      "similarity": 0.85
    }
  ],
  "metadata": {
    "num_relevant_chunks": 3,
    "num_cached_similar": 2,
    "used_cache": false
  }
}
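The similar_cached_queries field pairs earlier questions with a similarity score. One common way to produce such a score is cosine similarity between query embeddings. The sketch below assumes that approach, uses tiny hand-written vectors in place of real EMBEDDING_MODEL output, and picks a hypothetical default threshold of 0.8; it is not necessarily how the project's cache manager scores matches.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_cached_queries(query_vec: Sequence[float],
                           cached: Dict[str, Sequence[float]],
                           threshold: float = 0.8,
                           limit: int = 3) -> List[dict]:
    """Return cached queries whose embeddings are close to query_vec, best first."""
    scored = [(q, cosine_similarity(query_vec, vec)) for q, vec in cached.items()]
    hits = sorted((s for s in scored if s[1] >= threshold), key=lambda s: s[1], reverse=True)
    # Round only for display, after the threshold comparison.
    return [{"query": q, "similarity": round(sim, 2)} for q, sim in hits[:limit]]


cache_embeddings = {
    "What are the provider requirements?": [0.9, 0.1, 0.0],
    "How do I reset my password?": [0.0, 0.2, 0.9],
}
print(similar_cached_queries([1.0, 0.0, 0.0], cache_embeddings))
# → [{'query': 'What are the provider requirements?', 'similarity': 0.99}]
```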

Get Statistics

GET /stats

Response:

{
  "cache": {
    "total_entries": 10,
    "valid_entries": 8,
    "max_size": 1000,
    "ttl_seconds": 3600
  },
  "vector_store": {
    "collection_name": "documents",
    "document_count": 45
  },
  "documents_loaded": true
}

Clear Cache

DELETE /cache

Reset Vector Store

DELETE /vector-store
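Neither DELETE endpoint appears in the usage examples below. A standard-library sketch for calling them follows. The helper names build_delete and send are hypothetical, and the response body of these endpoints is not documented here, so send() assumes the server replies with JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default port used throughout this README


def build_delete(path: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare a DELETE request such as DELETE /cache or DELETE /vector-store."""
    return urllib.request.Request(f"{base_url}{path}", method="DELETE")


def send(request: urllib.request.Request) -> dict:
    """Send the request; assumes the server replies with a JSON body."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, send(build_delete("/cache")) would clear the response cache and send(build_delete("/vector-store")) would reset the vector store.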

🔧 Configuration

Edit .env file to customize:

| Variable | Description | Default |
|---|---|---|
| GEMINI_API_KEY | Your Google Gemini API key | Required |
| CACHE_TYPE | Cache storage type | memory |
| CACHE_TTL | Cache time-to-live, in seconds | 3600 |
| MAX_CACHE_SIZE | Maximum number of cache entries | 1000 |
| EMBEDDING_MODEL | Sentence Transformers embedding model | all-MiniLM-L6-v2 |
| CHUNK_SIZE | Document chunk size | 1000 |
| CHUNK_OVERLAP | Overlap between consecutive chunks | 200 |
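A minimal sketch of how these variables might be read and validated with only the standard library. The Settings and load_settings names are hypothetical (the project's own src/cag/core/config.py may work differently), and this sketch reads only the process environment: loading the .env file itself would need an extra step, for example python-dotenv's load_dotenv().

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object mirroring the variables in the table above."""
    gemini_api_key: str
    cache_type: str = "memory"
    cache_ttl: int = 3600
    max_cache_size: int = 1000
    embedding_model: str = "all-MiniLM-L6-v2"
    chunk_size: int = 1000
    chunk_overlap: int = 200


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from env (defaults to os.environ), applying the table's defaults."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "").strip()  # tolerate stray spaces around the key
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is required")
    settings = Settings(
        gemini_api_key=api_key,
        cache_type=env.get("CACHE_TYPE", "memory"),
        cache_ttl=int(env.get("CACHE_TTL", "3600")),
        max_cache_size=int(env.get("MAX_CACHE_SIZE", "1000")),
        embedding_model=env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
    )
    if settings.chunk_overlap >= settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    return settings
```

Failing fast on a missing key or an overlap that is not smaller than the chunk size surfaces configuration mistakes at startup rather than on the first query.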

🧪 Usage Examples

Python Example

import requests

BASE_URL = "http://localhost:8000"

# Upload a document
with open("Provider_FAQ_v2.pdf", "rb") as f:
    files = {"file": f}
    response = requests.post(f"{BASE_URL}/upload", files=files)
    print(response.json())

# Query the document
query_data = {
    "question": "What are the eligibility criteria?",
    "use_cache": True,
    "use_semantic_search": True,
    "top_k": 5
}
response = requests.post(f"{BASE_URL}/query", json=query_data)
print(response.json()["answer"])

# Get statistics
response = requests.get(f"{BASE_URL}/stats")
print(response.json())

JavaScript Example

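// Upload document (assumes the page has an <input type="file" id="fileInput"> element)
const fileInput = document.getElementById('fileInput');
const formData = new FormData();
formData.append('file', fileInput.files[0]);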

const uploadResponse = await fetch('http://localhost:8000/upload', {
  method: 'POST',
  body: formData
});
const uploadData = await uploadResponse.json();

// Query document
const queryResponse = await fetch('http://localhost:8000/query', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    question: 'What are the provider requirements?',
    use_cache: true,
    use_semantic_search: true,
    top_k: 5
  })
});
const queryData = await queryResponse.json();
console.log(queryData.answer);

🏗️ Project Structure

CAG-Code/
├── src/cag/                    # Main application package
│   ├── api/                    # FastAPI application
│   │   ├── main.py            # API entry point
│   │   └── routes/            # API route modules
│   ├── core/                   # Core configuration and models
│   │   ├── config.py          # Settings management
│   │   └── models.py          # Pydantic models
│   ├── services/               # Business logic services
│   │   ├── cag_service.py     # Main orchestration
│   │   ├── cache_manager.py   # Intelligent caching
│   │   ├── document_processor.py  # Document processing
│   │   ├── vector_store.py    # Vector database
│   │   └── gemini_client.py   # AI model client
│   └── ui/                     # User interfaces
│       └── gradio_app.py      # Gradio web UI
├── tests/                      # Test suite
│   ├── test_cache_manager.py
│   ├── test_document_processor.py
│   ├── test_main.py
│   └── test_vector_store.py
├── scripts/                    # Utility scripts
│   ├── dev_helpers.py
│   ├── init_db.py
│   └── setup.sh
├── docs/                       # Documentation
│   ├── ARCHITECTURE.md        # System architecture
│   ├── SETUP.md              # Setup guide
│   └── API.md                # API reference
├── docker/                     # Docker configurations
├── config/                     # Configuration files
├── main.py                     # FastAPI entry point
├── app_gradio.py              # Gradio entry point
├── requirements.txt           # Python dependencies
├── pyproject.toml            # Project metadata
├── Dockerfile                # Docker image definition
├── docker-compose.yml        # Docker Compose config
├── CONTRIBUTING.md           # Contribution guidelines
├── CHANGELOG.md              # Version history
├── .env.example              # Environment template
└── uploads/                  # Document uploads

🔍 How CAG Works

  1. Document Upload:

    • User uploads a document (PDF/TXT/MD)
    • Document is processed and chunked
    • Chunks are embedded and stored in vector database
  2. Query Processing:

    • Check cache for exact or similar queries
    • If cached, return immediately
    • Otherwise, retrieve relevant document chunks
    • Find semantically similar cached responses
    • Send context to Gemini 2.5 Flash Lite
    • Cache the new response
    • Return answer with metadata
  3. Cache Strategy:

    • LRU (Least Recently Used) eviction
    • TTL (Time To Live) expiration
    • Semantic similarity search for related queries
    • Embedding-based cache lookup
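The query-processing steps above can be sketched as a cache-first function. This is a simplified illustration under stated assumptions: answer_query is a hypothetical name, the cache is a plain dict keyed on the normalised question, retrieve and generate are injected stand-ins for the vector store and the Gemini client, the "cache" value for source is assumed (only "generated" appears in the example /query response), and the semantic-similarity step is omitted.

```python
from typing import Callable, Dict, List


def answer_query(question: str,
                 cache: Dict[str, str],
                 retrieve: Callable[[str, int], List[str]],
                 generate: Callable[[str], str],
                 use_cache: bool = True,
                 top_k: int = 5) -> dict:
    """Cache-first query flow: return a cached answer, else retrieve, generate, and cache."""
    key = question.strip().lower()  # normalise so trivially different phrasings share a key
    if use_cache and key in cache:
        return {"answer": cache[key], "source": "cache", "cached": True,
                "relevant_chunks": [], "metadata": {"used_cache": True}}

    chunks = retrieve(question, top_k)  # stand-in for the vector-store search
    context = "\n\n".join(chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    answer = generate(prompt)  # stand-in for the Gemini call

    if use_cache:
        cache[key] = answer
    return {
        "answer": answer,
        "source": "generated",
        "cached": False,
        "relevant_chunks": [{"text": c} for c in chunks],
        "metadata": {"num_relevant_chunks": len(chunks), "used_cache": False},
    }


docs = ["first chunk of text", "second chunk of text"]
demo_cache: Dict[str, str] = {}
first = answer_query("Provider requirements?", demo_cache,
                     retrieve=lambda q, k: docs[:k],
                     generate=lambda prompt: "stub answer",
                     top_k=1)
second = answer_query("  provider requirements? ", demo_cache,
                      retrieve=lambda q, k: docs[:k],
                      generate=lambda prompt: "unused")
print(first["source"], second["source"])  # → generated cache
```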

🛠️ Troubleshooting

Common Issues

  1. Module Not Found Errors:

    # Install in development mode
    pip install -e .
  2. Import Error for google.generativeai:

    pip install --upgrade google-generativeai
  3. ChromaDB Issues:

    pip install --upgrade chromadb
    # Or clear the database (Linux/Mac)
    rm -rf chroma_db/
    # Or clear the database (Windows Command Prompt)
    rmdir /s chroma_db
  4. API Key Error:

    • Verify your .env file exists in the project root
    • Check that GEMINI_API_KEY is set correctly
    • Ensure no extra spaces around the key
    • Restart the application after changes
  5. Port Already in Use:

    # Change port when running
    uvicorn src.cag.api.main:app --port 8001

For more troubleshooting help, see docs/SETUP.md.

📊 Performance Optimization

  • Cache Hit Rate: Monitor /stats endpoint to track cache efficiency
  • Chunk Size: Adjust CHUNK_SIZE for better context retrieval
  • Top K: Tune top_k parameter for optimal relevance vs. speed
  • Embedding Model: Use larger models for better semantic understanding
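The example /stats response reports entry counts but no hit or miss counters, so one way to estimate a hit rate is to count the cached flag across /query responses on the client side. The helper below is a hypothetical sketch of that; in practice the list would hold the response.json() results of POST /query calls.

```python
from typing import Iterable, Mapping


def cache_hit_rate(responses: Iterable[Mapping]) -> float:
    """Fraction of /query responses that were served from cache (0.0 if none)."""
    results = list(responses)
    if not results:
        return 0.0
    hits = sum(1 for r in results if r.get("cached"))
    return hits / len(results)


print(cache_hit_rate([{"cached": True}, {"cached": False}, {"cached": False}, {"cached": True}]))
# → 0.5
```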

🔐 Security Considerations

  • Never commit .env file to version control
  • Use environment variables for sensitive data
  • Implement authentication for production deployment
  • Validate and sanitize file uploads
  • Set appropriate CORS policies

📖 Documentation

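Comprehensive documentation is available in the docs/ directory:

  • docs/ARCHITECTURE.md: System architecture
  • docs/SETUP.md: Setup guide
  • docs/API.md: API reference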

🧪 Testing

Run the test suite:

# Run all tests
pytest

# Run with coverage report
pytest --cov=src --cov-report=html

# Run specific test file
pytest tests/test_cache_manager.py

# Run with verbose output
pytest -v

🐳 Docker Deployment

Quick Start with Docker Compose

# Build and start all services
docker-compose up --build

# Run in detached mode
docker-compose up -d

# Stop all services
docker-compose down

Available Services

  • cag-app: FastAPI server (port 8000)
  • cag-gradio: Gradio UI (port 7860)
  • redis: Redis cache (port 6379)
  • prometheus: Monitoring (port 9090) - with --profile monitoring
  • grafana: Visualization (port 3000) - with --profile monitoring

With Monitoring

docker-compose --profile monitoring up

🔧 Development

Code Quality Tools

# Format code
black src/ tests/

# Sort imports
isort src/ tests/

# Lint code
flake8 src/ tests/

# Type checking
mypy src/

Pre-commit Hooks

# Install pre-commit hooks
pre-commit install

# Run manually on all files
pre-commit run --all-files

📝 License

This project is licensed under the MIT License and is provided as-is for educational and development purposes.

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines on:

  • Code style and standards
  • Testing requirements
  • Pull request process
  • Development workflow

📧 Support

  • Documentation: Check the docs/ directory
  • API Docs: http://localhost:8000/docs (when server is running)
  • Issues: Report bugs or request features via GitHub Issues
  • Questions: Start a discussion or check existing documentation

🙏 Acknowledgments

Built with:

  • FastAPI
  • Google Gemini
  • ChromaDB
  • Sentence Transformers
  • Gradio

📈 Project Status

  • Version: 1.0.0
  • Status: Active Development
  • Python: 3.9+
  • License: MIT

⭐ If you find this project useful, please consider giving it a star!