Cache Augmentation Generation (CAG) System

A sophisticated document-based question-answering system that combines Retrieval-Augmented Generation (RAG) with intelligent caching mechanisms to provide fast, accurate responses using Google's Gemini 2.5 Flash Lite model.

🎯 Project Overview

The CAG (Cache Augmentation Generation) system enhances traditional RAG architectures by implementing a multi-layered caching strategy that:

Reduces API calls by caching previous responses
Improves response time through semantic cache lookup
Enhances answer quality by leveraging similar past queries
Processes documents efficiently with chunking and vector embeddings
Provides context-aware answers using document retrieval and cached knowledge

Architecture Components

Document Processor: Extracts and chunks text from PDF, TXT, and MD files
Vector Store: Uses ChromaDB for semantic document search with embeddings
Cache Manager: Implements LRU cache with semantic similarity search
Gemini Client: Interfaces with Google's Gemini 2.5 Flash Lite API
CAG Service: Orchestrates all components for intelligent query processing
FastAPI Server: Provides RESTful API endpoints
Gradio UI: Interactive web interface for easy document Q&A
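
As a rough picture of how the CAG Service might tie these components together, the sketch below checks a cache first and only falls back to retrieval and generation on a miss. It is a minimal illustration, not the project's code: `answer_query`, the `retrieve` and `generate` callables, the plain-dict cache, and the `"cache"` source label are all hypothetical stand-ins for the Vector Store, Gemini Client, and Cache Manager.

```python
from typing import Callable, Dict, List


def answer_query(
    question: str,
    cache: Dict[str, str],
    retrieve: Callable[[str, int], List[str]],
    generate: Callable[[str, List[str]], str],
    use_cache: bool = True,
    top_k: int = 5,
) -> dict:
    """Answer a question, preferring a cached answer when one exists."""
    # 1. Exact-match cache lookup (a real cache would also try semantic matches).
    if use_cache and question in cache:
        return {"answer": cache[question], "source": "cache", "cached": True}

    # 2. Cache miss: fetch the top_k most relevant document chunks.
    chunks = retrieve(question, top_k)

    # 3. Ask the model to answer using the retrieved chunks as context.
    answer = generate(question, chunks)

    # 4. Store the new answer so the next identical question is served from cache.
    #    (Assumption: answers are stored even when use_cache is False.)
    cache[question] = answer
    return {
        "answer": answer,
        "source": "generated",
        "cached": False,
        "relevant_chunks": chunks,
    }
```

Passing the retriever and generator as callables keeps the flow testable without a vector database or an API key.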

✨ What's New (v1.0.0)

This project has been reorganized to follow professional Python packaging standards:

✅ Src-layout structure for better code organization
✅ Modular architecture with clear separation of concerns (API, Core, Services, UI)
✅ Comprehensive documentation in docs/ directory
✅ Professional Git repository structure
✅ Enhanced testing with updated test suite
✅ Docker support with multi-service deployment

See CHANGELOG.md for full details.

🚀 Getting Started

Prerequisites

Python 3.9 or higher
pip (Python package manager)
Virtual environment tool (venv, conda, etc.)
Google Gemini API key (available from Google AI Studio)

Installation

Clone the repository (if you haven't already):
```
git clone <repository-url>
cd CAG-Code
```

Create and activate virtual environment:

# Windows
python -m venv venv
.\venv\Scripts\activate

# Linux/Mac
python3 -m venv venv
source venv/bin/activate

Install dependencies:

pip install --upgrade pip
pip install -r requirements.txt

For development with additional tools:

pip install -e ".[dev]"

Set up environment variables:

Create a .env file in the project root:

cp .env.example .env

Edit .env and add your Gemini API key:

GEMINI_API_KEY=your_actual_gemini_api_key_here
CACHE_TYPE=memory
CACHE_TTL=3600
MAX_CACHE_SIZE=1000
EMBEDDING_MODEL=all-MiniLM-L6-v2
CHUNK_SIZE=1000
CHUNK_OVERLAP=200

⚠️ Important: Never commit your .env file to version control. It's already in .gitignore.

Running the Application

Start the FastAPI server:

python main.py

Or using uvicorn directly:

uvicorn src.cag.api.main:app --reload --host 0.0.0.0 --port 8000

Access the API:
- API Base URL: http://localhost:8000
- Interactive API Docs: http://localhost:8000/docs
- Alternative Docs: http://localhost:8000/redoc

Running with Gradio Interface

For a user-friendly web interface, use the Gradio application:

python app_gradio.py

Access the Gradio UI:

- Web Interface: http://localhost:7860

The interface provides:
- 💬 Chat Tab: Interactive Q&A with your documents
- 📤 Upload Tab: Easy document upload with drag-and-drop
- 📊 Statistics Tab: Monitor cache performance and manage system

Gradio Features:

Beautiful, intuitive interface
Real-time chat with conversation history
Configurable query settings (cache, semantic search, top-k)
Document upload with instant feedback
System statistics dashboard
Cache and vector store management

📚 API Endpoints

Health Check

GET /
GET /health

Upload Document

POST /upload
Content-Type: multipart/form-data

Parameters:
- file: PDF, TXT, or MD file

Example using curl:

curl -X POST "http://localhost:8000/upload" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@Provider_FAQ_v2.pdf"

Response:

{
  "status": "success",
  "file_name": "Provider_FAQ_v2.pdf",
  "num_chunks": 45,
  "total_length": 12500,
  "document_ids": ["uuid1", "uuid2", ...]
}

Query Document

POST /query
Content-Type: application/json

Request Body:

{
  "question": "What are the provider requirements?",
  "use_cache": true,
  "use_semantic_search": true,
  "top_k": 5
}

Response:

{
  "answer": "Based on the document...",
  "source": "generated",
  "cached": false,
  "relevant_chunks": [
    {
      "text": "Relevant section from document...",
      "metadata": {"chunk_id": 0, "start_pos": 0}
    }
  ],
  "similar_cached_queries": [
    {
      "query": "Similar previous question",
      "similarity": 0.85
    }
  ],
  "metadata": {
    "num_relevant_chunks": 3,
    "num_cached_similar": 2,
    "used_cache": false
  }
}

Get Statistics

GET /stats

Response:

{
  "cache": {
    "total_entries": 10,
    "valid_entries": 8,
    "max_size": 1000,
    "ttl_seconds": 3600
  },
  "vector_store": {
    "collection_name": "documents",
    "document_count": 45
  },
  "documents_loaded": true
}
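
The sample payload above can be turned into a few quick health indicators. The helper below is a small sketch that assumes `valid_entries` counts entries that have not yet expired; `summarize_cache_stats` is a hypothetical name, not part of the API. The sample response shows no hit or miss counters, so this derives fill and expiry ratios rather than a true hit rate.

```python
def summarize_cache_stats(stats: dict) -> dict:
    """Derive simple ratios from a /stats response body."""
    cache = stats["cache"]
    total = cache["total_entries"]
    valid = cache["valid_entries"]
    max_size = cache["max_size"]
    return {
        # Entries still stored but past their TTL (assumed meaning of the gap).
        "expired_entries": total - valid,
        # Share of stored entries that are still usable.
        "valid_ratio": valid / total if total else 0.0,
        # How close the cache is to its configured capacity.
        "fill_ratio": total / max_size if max_size else 0.0,
    }
```

A fill ratio that stays near 1.0 suggests entries are being evicted before they expire, which may be a reason to raise `MAX_CACHE_SIZE`.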

Clear Cache

DELETE /cache

Reset Vector Store

DELETE /vector-store

🔧 Configuration

Edit the .env file to customize:

| Variable | Description | Default |
|----------|-------------|---------|
| `GEMINI_API_KEY` | Your Google Gemini API key | Required |
| `CACHE_TYPE` | Cache storage type | `memory` |
| `CACHE_TTL` | Cache time-to-live in seconds | `3600` |
| `MAX_CACHE_SIZE` | Maximum cache entries | `1000` |
| `EMBEDDING_MODEL` | Sentence transformer model | `all-MiniLM-L6-v2` |
| `CHUNK_SIZE` | Document chunk size | `1000` |
| `CHUNK_OVERLAP` | Overlap between chunks | `200` |
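
To make the defaults above concrete, here is a minimal sketch of reading these variables into a typed settings object using only the standard library. The `Settings` class and `load_settings` function are illustrative assumptions; the project's own `config.py` may be structured differently. The overlap check is likewise an assumed sanity rule, not documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    cache_type: str = "memory"
    cache_ttl: int = 3600
    max_cache_size: int = 1000
    embedding_model: str = "all-MiniLM-L6-v2"
    chunk_size: int = 1000
    chunk_overlap: int = 200


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, applying the documented defaults."""
    env = os.environ if env is None else env

    # Strip whitespace so stray spaces around the key do not cause auth errors.
    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        raise ValueError("GEMINI_API_KEY is required")

    settings = Settings(
        gemini_api_key=api_key,
        cache_type=env.get("CACHE_TYPE", "memory"),
        cache_ttl=int(env.get("CACHE_TTL", "3600")),
        max_cache_size=int(env.get("MAX_CACHE_SIZE", "1000")),
        embedding_model=env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
    )

    # Assumed sanity check: an overlap as large as the chunk would never advance.
    if settings.chunk_overlap >= settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    return settings
```

Accepting an explicit mapping makes the loader easy to exercise in tests without touching the real environment.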

🧪 Usage Examples

Python Example

import requests

BASE_URL = "http://localhost:8000"

# Upload a document
with open("Provider_FAQ_v2.pdf", "rb") as f:
    files = {"file": f}
    response = requests.post(f"{BASE_URL}/upload", files=files)
    print(response.json())

# Query the document
query_data = {
    "question": "What are the eligibility criteria?",
    "use_cache": True,
    "use_semantic_search": True,
    "top_k": 5
}
response = requests.post(f"{BASE_URL}/query", json=query_data)
print(response.json()["answer"])

# Get statistics
response = requests.get(f"{BASE_URL}/stats")
print(response.json())

JavaScript Example

// Upload document
const formData = new FormData();
formData.append('file', fileInput.files[0]);

const uploadResponse = await fetch('http://localhost:8000/upload', {
  method: 'POST',
  body: formData
});
const uploadData = await uploadResponse.json();

// Query document
const queryResponse = await fetch('http://localhost:8000/query', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    question: 'What are the provider requirements?',
    use_cache: true,
    use_semantic_search: true,
    top_k: 5
  })
});
const queryData = await queryResponse.json();
console.log(queryData.answer);

🏗️ Project Structure

CAG-Code/
├── src/cag/                    # Main application package
│   ├── api/                    # FastAPI application
│   │   ├── main.py            # API entry point
│   │   └── routes/            # API route modules
│   ├── core/                   # Core configuration and models
│   │   ├── config.py          # Settings management
│   │   └── models.py          # Pydantic models
│   ├── services/               # Business logic services
│   │   ├── cag_service.py     # Main orchestration
│   │   ├── cache_manager.py   # Intelligent caching
│   │   ├── document_processor.py  # Document processing
│   │   ├── vector_store.py    # Vector database
│   │   └── gemini_client.py   # AI model client
│   └── ui/                     # User interfaces
│       └── gradio_app.py      # Gradio web UI
├── tests/                      # Test suite
│   ├── test_cache_manager.py
│   ├── test_document_processor.py
│   ├── test_main.py
│   └── test_vector_store.py
├── scripts/                    # Utility scripts
│   ├── dev_helpers.py
│   ├── init_db.py
│   └── setup.sh
├── docs/                       # Documentation
│   ├── ARCHITECTURE.md        # System architecture
│   ├── SETUP.md              # Setup guide
│   └── API.md                # API reference
├── docker/                     # Docker configurations
├── config/                     # Configuration files
├── main.py                     # FastAPI entry point
├── app_gradio.py              # Gradio entry point
├── requirements.txt           # Python dependencies
├── pyproject.toml            # Project metadata
├── Dockerfile                # Docker image definition
├── docker-compose.yml        # Docker Compose config
├── CONTRIBUTING.md           # Contribution guidelines
├── CHANGELOG.md              # Version history
├── .env.example              # Environment template
└── uploads/                  # Document uploads

🔍 How CAG Works

Document Upload:
- User uploads a document (PDF/TXT/MD)
- Document is processed and chunked
- Chunks are embedded and stored in vector database
Query Processing:
- Check cache for exact or similar queries
- If cached, return immediately
- Otherwise, retrieve relevant document chunks
- Find semantically similar cached responses
- Send context to Gemini 2.5 Flash Lite
- Cache the new response
- Return answer with metadata
Cache Strategy:
- LRU (Least Recently Used) eviction
- TTL (Time To Live) expiration
- Semantic similarity search for related queries
- Embedding-based cache lookup
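
The cache strategy above can be sketched in a few dozen lines. The class below combines LRU eviction, TTL expiration, and an embedding-based similarity fallback. It is an illustration under stated assumptions rather than the project's Cache Manager: the name `SemanticLRUCache`, the `0.85` similarity threshold, and the choice of cosine similarity are all hypothetical, and embeddings are passed in as plain lists of floats.

```python
import math
import time
from collections import OrderedDict
from typing import Callable, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticLRUCache:
    def __init__(self, max_size: int = 1000, ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_size = max_size
        self.ttl = ttl
        self.clock = clock
        # query -> (stored_at, embedding, answer); insertion order tracks recency.
        self._entries: "OrderedDict[str, tuple]" = OrderedDict()

    def put(self, query: str, embedding: Sequence[float], answer: str) -> None:
        self._entries[query] = (self.clock(), list(embedding), answer)
        self._entries.move_to_end(query)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict the least recently used

    def _purge_expired(self) -> None:
        now = self.clock()
        expired = [q for q, (t, _, _) in self._entries.items() if now - t > self.ttl]
        for q in expired:
            del self._entries[q]

    def get(self, query: str, embedding: Sequence[float],
            threshold: float = 0.85) -> Optional[str]:
        self._purge_expired()
        # Exact match first.
        if query in self._entries:
            self._entries.move_to_end(query)
            return self._entries[query][2]
        # Otherwise pick the most similar stored query at or above the threshold.
        best_query, best_score = None, threshold
        for q, (_, emb, _) in self._entries.items():
            score = cosine(embedding, emb)
            if score >= best_score:
                best_query, best_score = q, score
        if best_query is None:
            return None
        self._entries.move_to_end(best_query)
        return self._entries[best_query][2]
```

The linear scan over stored embeddings is fine for a cache of about a thousand entries; a larger cache would likely want a vector index instead.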

🛠️ Troubleshooting

Common Issues

Module Not Found Errors:

# Install in development mode
pip install -e .

Import Error for google.generativeai:

pip install --upgrade google-generativeai

ChromaDB Issues:

pip install --upgrade chromadb
# Or clear the database
rm -rf chroma_db/  # Linux/Mac
rmdir /s chroma_db  # Windows

API Key Error:
- Verify your .env file exists in the project root
- Check that GEMINI_API_KEY is set correctly
- Ensure no extra spaces around the key
- Restart the application after changes

Port Already in Use:

# Change port when running
uvicorn src.cag.api.main:app --port 8001

For more troubleshooting help, see docs/SETUP.md.

📊 Performance Optimization

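Cache Hit Rate: Monitor the /stats endpoint to track cache efficiency
Chunk Size: Adjust CHUNK_SIZE and CHUNK_OVERLAP; smaller chunks retrieve more precisely, larger chunks keep more surrounding context
Top K: Tune the top_k parameter to balance relevance against speed
Embedding Model: A larger sentence-transformer model may improve semantic matching at the cost of slower embedding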
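
To see how `CHUNK_SIZE` and `CHUNK_OVERLAP` interact when tuning, the sketch below splits text into fixed-size character windows that share an overlap. This is a simplified assumption about the chunking scheme; the project's Document Processor may split on sentence or paragraph boundaries instead, and `chunk_text` is a hypothetical name.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into character windows where neighbours share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the default values each window advances 800 characters, so a 2,500-character document yields three chunks. Raising the overlap increases the number of chunks and the storage and embedding cost along with it.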

🔐 Security Considerations

Never commit .env file to version control
Use environment variables for sensitive data
Implement authentication for production deployment
Validate and sanitize file uploads
Set appropriate CORS policies
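
As one possible starting point for upload validation, the sketch below strips directory components from a client-supplied file name and accepts only the three document types the system supports. The function name `safe_upload_name` and the 10 MB size limit are assumptions for illustration; a production check would likely also inspect file contents rather than trusting the extension.

```python
from pathlib import PureWindowsPath

ALLOWED_EXTENSIONS = {".pdf", ".txt", ".md"}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed limit, not taken from the project


def safe_upload_name(filename: str, size_bytes: int) -> str:
    """Return a bare file name that is safe to store, or raise ValueError."""
    # PureWindowsPath treats both "/" and "\\" as separators, so this drops
    # any directory part regardless of the client's operating system.
    name = PureWindowsPath(filename).name
    if not name or name.startswith("."):
        raise ValueError("invalid file name")

    suffix = PureWindowsPath(name).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")

    if size_bytes <= 0 or size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("file is empty or exceeds the size limit")
    return name
```

Returning only the base name means a request for `../../notes.md` is stored as `notes.md` inside the uploads directory rather than escaping it.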

📖 Documentation

Comprehensive documentation is available in the docs/ directory:

Architecture Guide: System design, components, and data flow
Setup Guide: Detailed installation and configuration
API Reference: Complete API endpoint documentation
Migration Guide: Guide for understanding the new structure
Project Structure: Detailed directory organization

🧪 Testing

Run the test suite:

# Run all tests
pytest

# Run with coverage report
pytest --cov=src --cov-report=html

# Run specific test file
pytest tests/test_cache_manager.py

# Run with verbose output
pytest -v

🐳 Docker Deployment

Quick Start with Docker Compose

# Build and start all services
docker-compose up --build

# Run in detached mode
docker-compose up -d

# Stop all services
docker-compose down

Available Services

cag-app: FastAPI server (port 8000)
cag-gradio: Gradio UI (port 7860)
redis: Redis cache (port 6379)
prometheus: Monitoring (port 9090) - with --profile monitoring
grafana: Visualization (port 3000) - with --profile monitoring

With Monitoring

docker-compose --profile monitoring up

🔧 Development

Code Quality Tools

# Format code
black src/ tests/

# Sort imports
isort src/ tests/

# Lint code
flake8 src/ tests/

# Type checking
mypy src/

Pre-commit Hooks

# Install pre-commit hooks
pre-commit install

# Run manually on all files
pre-commit run --all-files

📝 License

This project is provided as-is for educational and development purposes.

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines on:

Code style and standards
Testing requirements
Pull request process
Development workflow

📧 Support

Documentation: Check the docs/ directory
API Docs: http://localhost:8000/docs (when server is running)
Issues: Report bugs or request features via GitHub Issues
Questions: Start a discussion or check existing documentation

🙏 Acknowledgments

Built with:

FastAPI: Modern web framework
Google Gemini: AI model (Gemini 2.5 Flash Lite)
ChromaDB: Vector database
Sentence Transformers: Embeddings
Gradio: Interactive UI
Pydantic: Data validation

📈 Project Status

Version: 1.0.0
Status: Active Development
Python: 3.9+
License: MIT

⭐ If you find this project useful, please consider giving it a star!
