A production-ready Agentic RAG (Retrieval-Augmented Generation) starter kit for building AI-powered document Q&A systems.
This template demonstrates how to ingest documents, chunk content, classify and summarize text, generate vector embeddings, and query a knowledge base using an agentic retrieval pipeline with citations.
Built on Backblaze B2 cloud storage, LanceDB vector database, and LangChain. Works with OpenAI (one key for everything) or Anthropic Claude for chat.
What you get out of the box:
- Chat UI with RAG citations: ChatGPT/Claude-style interface with streaming responses, clickable source references, and live pipeline step visualization
- Document processing pipeline: automatic chunking (recursive or semantic), classification, summarization, contextual enrichment, and embedding on upload
- Agentic retrieval pipeline: intent routing, corpus-aware query rewriting, multi-query planning, hybrid BM25 + dense vector search, Reciprocal Rank Fusion, cross-encoder reranking, Corrective RAG (CRAG) grading, evidence validation with retry loops
- Cross-encoder reranking: lightweight 22M-param model (ms-marco-MiniLM-L-6-v2) replaces per-candidate LLM calls — ~100-200ms for 20 candidates vs ~10s
- Hybrid search: BM25 full-text + dense vector search with automatic RRF fusion via LanceDB
- RAGAS evaluation: automated faithfulness and context precision scoring (LLM-as-judge) runs asynchronously after each response
- Corrective RAG (CRAG): grades retrieval quality as Correct/Ambiguous/Wrong and takes corrective action — strips irrelevant evidence, adds caveats for partial matches
- Graceful degradation: every pipeline step has fallback handling — cross-encoder failure falls back to score-based ranking, CRAG failure keeps evidence, full pipeline failure still generates a response
- Session analytics dashboard: session-level drill-down with per-message RAGAS scores (faithfulness, context precision), retrieval metrics, and agent behavior analytics
- LanceDB vector store on S3/B2: no separate vector database infrastructure to manage
- LangChain orchestration: pluggable LLM providers (OpenAI default, Anthropic optional), one API key to start
- File management: drag-and-drop upload with progress tracking, file browser, and dashboard
- Full-stack: Next.js 16 + React 19 + Tailwind v4 + shadcn/ui frontend, FastAPI + Pydantic backend
- Strict layered architecture: enforced by structural tests, lints, and SDK containment rules
- Agent-optimized codebase: AGENTS.md, ARCHITECTURE.md, and feature docs let AI coding agents read the repo and start contributing immediately
This repo is optimized for coding agents. Use the template, point your agent at it, and start building.
The structure follows the principle that repository knowledge is the system of record. Anything an agent can't access in-context doesn't exist, so everything it needs to reason about the codebase is versioned, co-located, and discoverable from the repo itself.
AGENTS.md is the single source of truth for all coding agents. A ~100-line entry point gives agents the repository layout, architectural invariants, commands, conventions, and pointers to deeper docs. Agent-specific files (CLAUDE.md, .cursorrules, etc.) are thin pointers back to AGENTS.md.
Architecture is enforced mechanically, not by convention. Layering rules, import boundaries, file size limits, and SDK containment are verified by structural tests and lints that run on every change. When rules are enforceable by code, agents follow them reliably.
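As one illustration of mechanical enforcement, a structural test for SDK containment can be sketched roughly like this (the paths and marker strings here are assumptions for illustration; the repo's real tests live in its test suite):

```python
from pathlib import Path

# Import prefixes that must stay inside the repo/ layer (assumed markers).
CONTAINED = ("import boto3", "import lancedb", "from langchain")

def check_sdk_containment(root: str) -> list[str]:
    """Return source files outside repo/ that import a contained SDK."""
    violations = []
    for py_file in Path(root).rglob("*.py"):
        if "repo" in py_file.parts:
            continue  # the repo/ layer is allowed to use SDKs directly
        text = py_file.read_text()
        if any(marker in text for marker in CONTAINED):
            violations.append(str(py_file))
    return violations
```

In a pytest suite this becomes a one-line assertion, e.g. `assert check_sdk_containment("services/api") == []`, which fails the build the moment an SDK import leaks out of its layer.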
The knowledge base is structured for progressive disclosure:
```
AGENTS.md            Single source of truth: layout, invariants, commands, conventions
ARCHITECTURE.md      System layout, layering rules, data flows
docs/
  features/          Feature docs (inputs, outputs, flows, edge cases)
  app-workflows.md   User journeys
  dev-workflows.md   Engineering workflows and testing
  SECURITY.md        Security principles
  RELIABILITY.md     Reliability expectations
  exec-plans/        Execution plans and tech debt tracker
```
| Principle | Implementation |
|---|---|
| Give agents a single source of truth | AGENTS.md ~100 lines: layout, invariants, commands, conventions |
| Enforce invariants mechanically | Structural tests + ruff + ESLint verify boundaries |
| DRY documentation | Each fact lives in one place; no redundant files to drift |
| Strict layered architecture | types -> config -> repo -> service -> runtime, enforced by tests |
| Prefer boring, composable libraries | stdlib logging over frameworks, Pydantic over ad-hoc validation |
| Contain external SDKs | boto3, lancedb, langchain* only in repo/, verified by structural tests |
| Keep files agent-sized | 300-line limit per file, enforced by test |
| Docs updated with code | Same-PR requirement prevents documentation rot |
| Structured observability | JSON logging, /metrics endpoint, request tracing |
This approach draws from OpenAI's experience building with Codex: agents work best in environments with strict boundaries, predictable structure, and progressive context disclosure.
You need: Node.js >= 20, pnpm >= 9, Python >= 3.11, and a free Backblaze B2 account.
Option 1: GitHub Template (recommended)
Click the green "Use this template" button at the top of this repo, name your project, then:
```shell
git clone https://github.com/yourorg/my-cool-app.git
cd my-cool-app
```

Option 2: Clone and reinitialize
```shell
git clone https://github.com/backblaze-b2-samples/agentic-rag-vector-starter-kit.git my-rag-app
cd my-rag-app
rm -rf .git
git init
git add .
git commit -m "Initial commit from agentic-rag-vector-starter-kit"
```

Either way you get a clean project with no upstream history, ready to push to your own repo and point your agent at.
1. Install dependencies
```shell
pnpm install
```

2. Set up the backend

```shell
cd services/api
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cd ../..
```

3. Add your B2 credentials
Create a bucket and an application key in your B2 dashboard (the key needs readFiles, writeFiles, deleteFiles permissions), then:
```shell
cp .env.example .env
```

Fill in your .env:

```shell
# B2 storage (required)
B2_S3_ENDPOINT=https://s3.us-west-004.backblazeb2.com
B2_APPLICATION_KEY_ID=your-key-id
B2_APPLICATION_KEY=your-key
B2_BUCKET_NAME=your-bucket

# LLM + Embeddings: one API key for everything (default: OpenAI)
OPENAI_API_KEY=your-openai-key
```
That's the minimum. By default, OpenAI handles both chat (gpt-4o) and embeddings (text-embedding-3-small).
To use Anthropic Claude for chat instead, add:
```shell
ANTHROPIC_API_KEY=your-anthropic-key
LLM_PROVIDER=anthropic
LLM_MODEL=claude-sonnet-4-20250514
```
See .env.example for all options including LANCEDB_URI, CHUNK_SIZE, and MAX_CHUNKS_PER_DOC.
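For instance, those overrides might look like the following (variable names are from the doc; the values and the comments describing their semantics are illustrative assumptions, not documented defaults — check .env.example):

```shell
LANCEDB_URI=s3://your-bucket/lancedb   # where LanceDB stores its vector tables
CHUNK_SIZE=1000                        # target chunk size at ingest
MAX_CHUNKS_PER_DOC=200                 # cap on chunks generated per document
```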
4. Run it
```shell
pnpm dev
```

That's it. Frontend at localhost:3000, API at localhost:8000. Upload a document and ask it questions in the Chat page.
For production deployment, see Railway docs.
- Chat UI: streaming responses with citations and live pipeline step visualization
- Agentic Retrieval: multi-step RAG pipeline with intent routing, query rewriting, hybrid search, cross-encoder reranking, CRAG, and evidence validation
- Document Pipeline: recursive or semantic chunking, classification, summarization, contextual enrichment, embedding
- File Upload: drag-and-drop upload with real-time progress and RAG processing
- File Browser: list, preview, download, delete files
- Dashboard: session analytics with RAGAS evaluation scores, retrieval quality, agent behavior metrics
- Metadata Extraction: image dimensions, EXIF, PDF info, checksums
- Structural tests: verify layering rules, import boundaries, SDK containment, file size limits
- Structured JSON logging: every request traced with `request_id` and timing
- `/health` endpoint: B2 connectivity check
- `/metrics` endpoint: Prometheus-format counters (request count, latency, uploads)
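The graceful-degradation behavior described earlier (cross-encoder failure falling back to score-based ranking) follows a simple wrap-every-step pattern. A sketch with hypothetical function names, not the repo's code:

```python
from typing import Callable, Sequence

def step_with_fallback(step: Callable, fallback: Callable, *args):
    """Run a pipeline step; on any failure, degrade instead of raising."""
    try:
        return step(*args)
    except Exception:
        return fallback(*args)

def cross_encoder_rerank(chunks: Sequence[dict]) -> list[dict]:
    raise RuntimeError("model unavailable")  # simulate a reranker failure

def score_based_rank(chunks: Sequence[dict]) -> list[dict]:
    # Fallback: reuse the retriever's own relevance scores.
    return sorted(chunks, key=lambda c: c["score"], reverse=True)

chunks = [{"id": "a", "score": 0.2}, {"id": "b", "score": 0.9}]
ranked = step_with_fallback(cross_encoder_rerank, score_based_rank, chunks)
print([c["id"] for c in ranked])  # → ['b', 'a']
```

The same wrapper shape covers the other fallbacks listed (CRAG failure keeps evidence, full pipeline failure still generates a response): each step degrades locally so the user always gets an answer.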
- TypeScript, Next.js 16, React 19, Tailwind v4, shadcn/ui, Recharts
- Python 3.11+, FastAPI, Pydantic v2, Pillow, PyPDF2, sentence-transformers
- LanceDB (vector store on S3/B2 with hybrid BM25 + dense search), LangChain (LLM orchestration)
- Cross-encoder reranking (ms-marco-MiniLM-L-6-v2, CPU-friendly)
- OpenAI (default for chat + embeddings) or Anthropic Claude (optional for chat)
- Backblaze B2 (S3-compatible object storage for files + vectors)
- pnpm workspaces (monorepo)
| Command | What it does |
|---|---|
| `pnpm dev` | Start frontend + backend |
| `pnpm dev:web` | Frontend only |
| `pnpm dev:api` | Backend only |
| `pnpm build` | Build frontend |
| `pnpm lint` | Lint frontend |
| `pnpm lint:api` | Lint backend (ruff) |
| `pnpm test:api` | Run backend tests |
| `pnpm check:structure` | Verify layering rules |
| `pnpm test:e2e` | Playwright e2e tests |
| Doc | Purpose |
|---|---|
| AGENTS.md | Agent table of contents (start here) |
| ARCHITECTURE.md | System layout, layering, data flows |
| docs/features/ | Feature docs (chat, retrieval, pipeline, upload, browser, dashboard) |
| docs/app-workflows.md | User journeys |
| docs/dev-workflows.md | Engineering workflows and testing |
| docs/SECURITY.md | Security principles |
| docs/RELIABILITY.md | Reliability expectations |
| docs/exec-plans/ | Execution plans and tech debt tracker |
Start with AGENTS.md. It's the map, and everything else is discoverable from there.
MIT License - see LICENSE for details.
Manage Backblaze B2 from your terminal using natural language (list/search, audits, stale or large file detection, security checks, safe cleanup).
Repo: https://github.com/backblaze-b2-samples/claude-skill-b2-cloud-storage

