A production-ready Agentic RAG (Retrieval-Augmented Generation) starter kit for building AI-powered document Q&A systems.
This template demonstrates how to ingest documents, chunk content, classify and summarize text, generate vector embeddings, and query a knowledge base using an agentic retrieval pipeline with citations.
Built on Backblaze B2 cloud storage, LanceDB vector database, and LangChain. Works with OpenAI (one key for everything) or Anthropic Claude for chat.
What you get out of the box:
- Chat UI with RAG citations: ChatGPT/Claude-style interface with streaming responses, clickable source references, and live pipeline step visualization
- Document processing pipeline: automatic chunking (recursive or semantic), classification, summarization, contextual enrichment, and embedding on upload
- Agentic retrieval pipeline: intent routing, corpus-aware query rewriting, multi-query planning, hybrid BM25 + dense vector search, Reciprocal Rank Fusion, cross-encoder reranking, Corrective RAG (CRAG) grading, evidence validation with retry loops
- Cross-encoder reranking: lightweight 22M-param model (ms-marco-MiniLM-L-6-v2) replaces per-candidate LLM calls — ~100-200ms for 20 candidates vs ~10s
- Hybrid search: BM25 full-text + dense vector search with automatic RRF fusion via LanceDB
- RAGAS evaluation: automated faithfulness and context precision scoring (LLM-as-judge) runs asynchronously after each response
- Corrective RAG (CRAG): grades retrieval quality as Correct/Ambiguous/Wrong and takes corrective action — strips irrelevant evidence, adds caveats for partial matches
- Graceful degradation: every pipeline step has fallback handling — cross-encoder failure falls back to score-based ranking, CRAG failure keeps evidence, full pipeline failure still generates a response
- Session analytics dashboard: session-level drill-down with per-message RAGAS scores (faithfulness, context precision), retrieval metrics, and agent behavior analytics
- LanceDB vector store on S3/B2: no separate vector database infrastructure to manage
- LangChain orchestration: pluggable LLM providers (OpenAI default, Anthropic optional), one API key to start
- File management: drag-and-drop upload with progress tracking, file browser, and dashboard
- Full-stack: Next.js 16 + React 19 + Tailwind v4 + shadcn/ui frontend, FastAPI + Pydantic backend
- Strict layered architecture: enforced by structural tests, lints, and SDK containment rules
- Agent-optimized codebase: AGENTS.md, ARCHITECTURE.md, and feature docs let AI coding agents read the repo and start contributing immediately
This repo is optimized for coding agents. Use the template, point your agent at it, and start building.
The structure follows the principle that repository knowledge is the system of record. Anything an agent can't access in-context doesn't exist, so everything it needs to reason about the codebase is versioned, co-located, and discoverable from the repo itself.
AGENTS.md is the single source of truth for all coding agents. A ~100-line entry point gives agents the repository layout, architectural invariants, commands, conventions, and pointers to deeper docs. Agent-specific files (CLAUDE.md, .cursorrules, etc.) are thin pointers back to AGENTS.md.
Architecture is enforced mechanically, not by convention. Layering rules, import boundaries, file size limits, and SDK containment are verified by structural tests and lints that run on every change. When rules are enforceable by code, agents follow them reliably.
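As one illustration of mechanical enforcement, a structural test for SDK containment can be sketched roughly like this (the paths and marker strings here are assumptions for illustration; the repo's real tests live in its test suite):

```python
from pathlib import Path

# Import prefixes that must stay inside the repo/ layer (assumed markers).
CONTAINED = ("import boto3", "import lancedb", "from langchain")

def check_sdk_containment(root: str) -> list[str]:
    """Return source files outside repo/ that import a contained SDK."""
    violations = []
    for py_file in Path(root).rglob("*.py"):
        if "repo" in py_file.parts:
            continue  # the repo/ layer is allowed to use SDKs directly
        text = py_file.read_text()
        if any(marker in text for marker in CONTAINED):
            violations.append(str(py_file))
    return violations
```

In a pytest suite this becomes a one-line assertion, e.g. `assert check_sdk_containment("services/api") == []`, which fails the build the moment an SDK import leaks out of its layer.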
The knowledge base is structured for progressive disclosure:
```
AGENTS.md            Single source of truth: layout, invariants, commands, conventions
ARCHITECTURE.md      System layout, layering rules, data flows
docs/
  features/          Feature docs (inputs, outputs, flows, edge cases)
  app-workflows.md   User journeys
  dev-workflows.md   Engineering workflows and testing
  SECURITY.md        Security principles
  RELIABILITY.md     Reliability expectations
  exec-plans/        Execution plans and tech debt tracker
```
| Principle | Implementation |
|---|---|
| Give agents a single source of truth | AGENTS.md ~100 lines: layout, invariants, commands, conventions |
| Enforce invariants mechanically | Structural tests + ruff + ESLint verify boundaries |
| DRY documentation | Each fact lives in one place; no redundant files to drift |
| Strict layered architecture | types -> config -> repo -> service -> runtime, enforced by tests |
| Prefer boring, composable libraries | stdlib logging over frameworks, Pydantic over ad-hoc validation |
| Contain external SDKs | boto3, lancedb, langchain* only in repo/, verified by structural tests |
| Keep files agent-sized | 300-line limit per file, enforced by test |
| Docs updated with code | Same-PR requirement prevents documentation rot |
| Structured observability | JSON logging, /metrics endpoint, request tracing |
This approach draws from OpenAI's experience building with Codex: agents work best in environments with strict boundaries, predictable structure, and progressive context disclosure.
You need: Node.js >= 20, pnpm >= 9, Python >= 3.11, and a free Backblaze B2 account.
Option 1: GitHub Template (recommended)
Click the green "Use this template" button at the top of this repo, name your project, then:
```shell
git clone https://github.com/yourorg/my-cool-app.git
cd my-cool-app
```

Option 2: Clone and reinitialize
```shell
git clone https://github.com/backblaze-b2-samples/agentic-rag-vector-starter-kit.git my-rag-app
cd my-rag-app
rm -rf .git
git init
git add .
git commit -m "Initial commit from agentic-rag-vector-starter-kit"
```

Either way you get a clean project with no upstream history, ready to push to your own repo and point your agent at.
1. Install dependencies
```shell
pnpm install
```

2. Set up the backend

```shell
cd services/api
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cd ../..
```

3. Add your B2 credentials
Create a bucket and an application key in your B2 dashboard (the key needs readFiles, writeFiles, deleteFiles permissions), then:
```shell
cp .env.example .env
```

Fill in your .env:

```shell
# B2 storage (required)
B2_S3_ENDPOINT=https://s3.us-west-004.backblazeb2.com
B2_APPLICATION_KEY_ID=your-key-id
B2_APPLICATION_KEY=your-key
B2_BUCKET_NAME=your-bucket

# LLM + Embeddings: one API key for everything (default: OpenAI)
OPENAI_API_KEY=your-openai-key
```
That's the minimum. By default, OpenAI handles both chat (gpt-4o) and embeddings (text-embedding-3-small).
To use Anthropic Claude for chat instead, add:
```shell
ANTHROPIC_API_KEY=your-anthropic-key
LLM_PROVIDER=anthropic
LLM_MODEL=claude-sonnet-4-20250514
```
See .env.example for all options including LANCEDB_URI, CHUNK_SIZE, and MAX_CHUNKS_PER_DOC.
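For instance, those overrides might look like the following (variable names are from the doc; the values and the comments describing their semantics are illustrative assumptions, not documented defaults — check .env.example):

```shell
LANCEDB_URI=s3://your-bucket/lancedb   # where LanceDB stores its vector tables
CHUNK_SIZE=1000                        # target chunk size at ingest
MAX_CHUNKS_PER_DOC=200                 # cap on chunks generated per document
```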
4. Run it
```shell
pnpm dev
```

That's it. Frontend at localhost:3000, API at localhost:8000. Upload a document and ask it questions in the Chat page.
For production deployment, see Railway docs.
- Chat UI: streaming responses with citations and live pipeline step visualization
- Agentic Retrieval: multi-step RAG pipeline with intent routing, query rewriting, hybrid search, cross-encoder reranking, CRAG, and evidence validation
- Document Pipeline: recursive or semantic chunking, classification, summarization, contextual enrichment, embedding
- File Upload: drag-and-drop upload with real-time progress and RAG processing
- File Browser: list, preview, download, delete files
- Dashboard: session analytics with RAGAS evaluation scores, retrieval quality, agent behavior metrics
- Metadata Extraction: image dimensions, EXIF, PDF info, checksums
- Structural tests: verify layering rules, import boundaries, SDK containment, file size limits
- Structured JSON logging: every request traced with `request_id` and timing
- `/health` endpoint: B2 connectivity check
- `/metrics` endpoint: Prometheus-format counters (request count, latency, uploads)
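The graceful-degradation behavior described earlier (cross-encoder failure falling back to score-based ranking) follows a simple wrap-every-step pattern. A sketch with hypothetical function names, not the repo's code:

```python
from typing import Callable, Sequence

def step_with_fallback(step: Callable, fallback: Callable, *args):
    """Run a pipeline step; on any failure, degrade instead of raising."""
    try:
        return step(*args)
    except Exception:
        return fallback(*args)

def cross_encoder_rerank(chunks: Sequence[dict]) -> list[dict]:
    raise RuntimeError("model unavailable")  # simulate a reranker failure

def score_based_rank(chunks: Sequence[dict]) -> list[dict]:
    # Fallback: reuse the retriever's own relevance scores.
    return sorted(chunks, key=lambda c: c["score"], reverse=True)

chunks = [{"id": "a", "score": 0.2}, {"id": "b", "score": 0.9}]
ranked = step_with_fallback(cross_encoder_rerank, score_based_rank, chunks)
print([c["id"] for c in ranked])  # → ['b', 'a']
```

The same wrapper shape covers the other fallbacks listed (CRAG failure keeps evidence, full pipeline failure still generates a response): each step degrades locally so the user always gets an answer.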
- TypeScript, Next.js 16, React 19, Tailwind v4, shadcn/ui, Recharts
- Python 3.11+, FastAPI, Pydantic v2, Pillow, PyPDF2, sentence-transformers
- LanceDB (vector store on S3/B2 with hybrid BM25 + dense search), LangChain (LLM orchestration)
- Cross-encoder reranking (ms-marco-MiniLM-L-6-v2, CPU-friendly)
- OpenAI (default for chat + embeddings) or Anthropic Claude (optional for chat)
- Backblaze B2 (S3-compatible object storage for files + vectors)
- pnpm workspaces (monorepo)
| Command | What it does |
|---|---|
| `pnpm dev` | Start frontend + backend |
| `pnpm dev:web` | Frontend only |
| `pnpm dev:api` | Backend only |
| `pnpm build` | Build frontend |
| `pnpm lint` | Lint frontend |
| `pnpm lint:api` | Lint backend (ruff) |
| `pnpm test:api` | Run backend tests |
| `pnpm check:structure` | Verify layering rules |
| `pnpm test:e2e` | Playwright e2e tests |
| Doc | Purpose |
|---|---|
| AGENTS.md | Agent table of contents (start here) |
| ARCHITECTURE.md | System layout, layering, data flows |
| docs/features/ | Feature docs (chat, retrieval, pipeline, upload, browser, dashboard) |
| docs/app-workflows.md | User journeys |
| docs/dev-workflows.md | Engineering workflows and testing |
| docs/SECURITY.md | Security principles |
| docs/RELIABILITY.md | Reliability expectations |
| docs/exec-plans/ | Execution plans and tech debt tracker |
Start with AGENTS.md. It's the map, and everything else is discoverable from there.
MIT License - see LICENSE for details.
Manage Backblaze B2 from your terminal using natural language (list/search, audits, stale or large file detection, security checks, safe cleanup).
Repo: https://github.com/backblaze-b2-samples/claude-skill-b2-cloud-storage

