
Agentic RAG Vector Starter Kit

A production-ready Agentic RAG (Retrieval-Augmented Generation) starter kit for building AI-powered document Q&A systems.

This template demonstrates how to ingest documents, chunk content, classify and summarize text, generate vector embeddings, and query a knowledge base using an agentic retrieval pipeline with citations.

Screenshots

Agentic RAG Dashboard

Agentic RAG Chat Interface

Built on Backblaze B2 cloud storage, LanceDB vector database, and LangChain. Works with OpenAI (one key for everything) or Anthropic Claude for chat.

What you get out of the box:

  • Chat UI with RAG citations: ChatGPT/Claude-style interface with streaming responses, clickable source references, and live pipeline step visualization
  • Document processing pipeline: automatic chunking (recursive or semantic), classification, summarization, contextual enrichment, and embedding on upload
  • Agentic retrieval pipeline: intent routing, corpus-aware query rewriting, multi-query planning, hybrid BM25 + dense vector search, Reciprocal Rank Fusion, cross-encoder reranking, Corrective RAG (CRAG) grading, evidence validation with retry loops
  • Cross-encoder reranking: lightweight 22M-param model (ms-marco-MiniLM-L-6-v2) replaces per-candidate LLM calls — ~100-200ms for 20 candidates vs ~10s
  • Hybrid search: BM25 full-text + dense vector search with automatic RRF fusion via LanceDB
  • RAGAS evaluation: automated faithfulness and context precision scoring (LLM-as-judge) runs asynchronously after each response
  • Corrective RAG (CRAG): grades retrieval quality as Correct/Ambiguous/Wrong and takes corrective action — strips irrelevant evidence, adds caveats for partial matches
  • Graceful degradation: every pipeline step has fallback handling — cross-encoder failure falls back to score-based ranking, CRAG failure keeps evidence, full pipeline failure still generates a response
  • Session analytics dashboard: session-level drill-down with per-message RAGAS scores (faithfulness, context precision), retrieval metrics, and agent behavior analytics
  • LanceDB vector store on S3/B2: no separate vector database infrastructure to manage
  • LangChain orchestration: pluggable LLM providers (OpenAI default, Anthropic optional), one API key to start
  • File management: drag-and-drop upload with progress tracking, file browser, and dashboard
  • Full-stack: Next.js 16 + React 19 + Tailwind v4 + shadcn/ui frontend, FastAPI + Pydantic backend
  • Strict layered architecture: enforced by structural tests, lints, and SDK containment rules
  • Agent-optimized codebase: AGENTS.md, ARCHITECTURE.md, and feature docs let AI coding agents read the repo and start contributing immediately
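The Reciprocal Rank Fusion step in the retrieval pipeline above can be sketched in a few lines. This is an illustrative implementation only; the starter kit delegates fusion to LanceDB's built-in hybrid search rather than shipping its own:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    so items ranked highly by both BM25 and dense search rise to the top.
    k=60 is the constant from the original RRF paper.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical BM25 and dense-vector result lists:
bm25 = ["doc_a", "doc_b", "doc_c"]
dense = ["doc_a", "doc_d", "doc_b"]
fused = rrf_fuse([bm25, dense])
```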

Agent-First Architecture

This repo is optimized for coding agents. Use the template, point your agent at it, and start building.

The structure follows the principle that repository knowledge is the system of record. Anything an agent can't access in-context doesn't exist, so everything it needs to reason about the codebase is versioned, co-located, and discoverable from the repo itself.

How it works

AGENTS.md is the single source of truth for all coding agents. A ~100-line entry point gives agents the repository layout, architectural invariants, commands, conventions, and pointers to deeper docs. Agent-specific files (CLAUDE.md, .cursorrules, etc.) are thin pointers back to AGENTS.md.

Architecture is enforced mechanically, not by convention. Layering rules, import boundaries, file size limits, and SDK containment are verified by structural tests and lints that run on every change. When rules are enforceable by code, agents follow them reliably.
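As an illustration of mechanical enforcement, an SDK-containment check is only a short structural test. The sketch below is hypothetical (module names, layout, and the exact rule set are assumptions, not the repo's actual test):

```python
import ast
from pathlib import Path

# SDKs that may only be imported inside the repo/ layer (assumed rule).
CONTAINED_SDKS = {"boto3", "lancedb", "langchain"}

def offending_imports(service_root: Path) -> list[str]:
    """Return 'file: module' entries importing a contained SDK outside repo/."""
    offenders = []
    for py_file in service_root.rglob("*.py"):
        if "repo" in py_file.parts:  # the repo/ layer is allowed to touch SDKs
            continue
        tree = ast.parse(py_file.read_text())
        for node in ast.walk(tree):
            modules = []
            if isinstance(node, ast.Import):
                modules = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                modules = [node.module]
            for module in modules:
                root = module.split(".")[0]
                # langchain* covers langchain_openai, langchain_anthropic, etc.
                if any(root == sdk or root.startswith(f"{sdk}_") for sdk in CONTAINED_SDKS):
                    offenders.append(f"{py_file}: {module}")
    return offenders

def test_sdk_containment():
    assert offending_imports(Path("services/api")) == []
```

Because the rule runs as a test, an agent that leaks boto3 into a service module gets an immediate, unambiguous failure instead of a style nit in review.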

The knowledge base is structured for progressive disclosure:

AGENTS.md              Single source of truth: layout, invariants, commands, conventions
ARCHITECTURE.md        System layout, layering rules, data flows
docs/
  features/            Feature docs (inputs, outputs, flows, edge cases)
  app-workflows.md     User journeys
  dev-workflows.md     Engineering workflows and testing
  SECURITY.md          Security principles
  RELIABILITY.md       Reliability expectations
  exec-plans/          Execution plans and tech debt tracker

Key design decisions

  • Give agents a single source of truth: AGENTS.md, ~100 lines of layout, invariants, commands, conventions
  • Enforce invariants mechanically: structural tests + ruff + ESLint verify boundaries
  • DRY documentation: each fact lives in one place; no redundant files to drift
  • Strict layered architecture: types -> config -> repo -> service -> runtime, enforced by tests
  • Prefer boring, composable libraries: stdlib logging over frameworks, Pydantic over ad-hoc validation
  • Contain external SDKs: boto3, lancedb, langchain* only in repo/, verified by structural tests
  • Keep files agent-sized: 300-line limit per file, enforced by test
  • Docs updated with code: same-PR requirement prevents documentation rot
  • Structured observability: JSON logging, /metrics endpoint, request tracing
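The 300-line file limit is a good example of an invariant that is cheap to enforce by test. A hypothetical sketch (the actual structural test in the repo may differ):

```python
from pathlib import Path

MAX_LINES = 300  # agent-sized file limit (assumed to match the repo's rule)

def oversized_files(root: Path, pattern: str = "*.py") -> list[tuple[str, int]]:
    """Return (path, line_count) pairs for files exceeding the limit."""
    return [
        (str(path), count)
        for path in sorted(root.rglob(pattern))
        if (count := len(path.read_text().splitlines())) > MAX_LINES
    ]

def test_files_are_agent_sized():
    assert oversized_files(Path("services/api")) == []
```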

This approach draws from OpenAI's experience building with Codex: agents work best in environments with strict boundaries, predictable structure, and progressive context disclosure.

Quick Start

You need: Node.js >= 20, pnpm >= 9, Python >= 3.11, and a free Backblaze B2 account.

Start a new project

Option 1: GitHub Template (recommended)

Click the green "Use this template" button at the top of this repo, name your project, then:

git clone https://github.com/yourorg/my-cool-app.git
cd my-cool-app

Option 2: Clone and reinitialize

git clone https://github.com/backblaze-b2-samples/agentic-rag-vector-starter-kit.git my-rag-app
cd my-rag-app
rm -rf .git
git init
git add .
git commit -m "Initial commit from agentic-rag-vector-starter-kit"

Either way you get a clean project with no upstream history, ready to push to your own repo and point your agent at it.

Setup

1. Install dependencies

pnpm install

2. Set up the backend

cd services/api
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cd ../..

3. Add your B2 credentials

Create a bucket and an application key in your B2 dashboard (the key needs readFiles, writeFiles, deleteFiles permissions), then:

cp .env.example .env

Fill in your .env:

# B2 storage (required)
B2_S3_ENDPOINT=https://s3.us-west-004.backblazeb2.com
B2_APPLICATION_KEY_ID=your-key-id
B2_APPLICATION_KEY=your-key
B2_BUCKET_NAME=your-bucket

# LLM + Embeddings: one API key for everything (default: OpenAI)
OPENAI_API_KEY=your-openai-key

That's the minimum. By default, OpenAI handles both chat (gpt-4o) and embeddings (text-embedding-3-small).

To use Anthropic Claude for chat instead, add:

ANTHROPIC_API_KEY=your-anthropic-key
LLM_PROVIDER=anthropic
LLM_MODEL=claude-sonnet-4-20250514

See .env.example for all options including LANCEDB_URI, CHUNK_SIZE, and MAX_CHUNKS_PER_DOC.
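How these variables might resolve to a provider and model can be sketched as a pure function. This is illustrative only; the real selection logic lives in the backend's config layer and may differ:

```python
# Assumed defaults, mirroring the README: OpenAI with gpt-4o unless overridden.
DEFAULT_MODELS = {"openai": "gpt-4o", "anthropic": "claude-sonnet-4-20250514"}

def resolve_llm(env: dict[str, str]) -> tuple[str, str]:
    """Pick (provider, model) from environment-style settings.

    LLM_PROVIDER selects the provider (default "openai"); LLM_MODEL
    overrides that provider's default model when set.
    """
    provider = env.get("LLM_PROVIDER", "openai").lower()
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider}")
    model = env.get("LLM_MODEL") or DEFAULT_MODELS[provider]
    return provider, model
```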

4. Run it

pnpm dev

That's it. Frontend at localhost:3000, API at localhost:8000. Upload a document and ask questions on the Chat page.

For production deployment, see Railway docs.

Core Features

  • Chat UI: streaming responses with citations and live pipeline step visualization
  • Agentic Retrieval: multi-step RAG pipeline with intent routing, query rewriting, hybrid search, cross-encoder reranking, CRAG, and evidence validation
  • Document Pipeline: recursive or semantic chunking, classification, summarization, contextual enrichment, embedding
  • File Upload: drag-and-drop upload with real-time progress and RAG processing
  • File Browser: list, preview, download, delete files
  • Dashboard: session analytics with RAGAS evaluation scores, retrieval quality, agent behavior metrics
  • Metadata Extraction: image dimensions, EXIF, PDF info, checksums
  • Structural tests: verify layering rules, import boundaries, SDK containment, file size limits
  • Structured JSON logging: every request traced with request_id and timing
  • /health endpoint: B2 connectivity check
  • /metrics endpoint: Prometheus-format counters (request count, latency, uploads)
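The structured JSON logging above can be done with the stdlib alone, consistent with the "boring libraries" principle. Field names here are assumptions; check the backend's logging setup for the actual schema:

```python
import json
import logging
import time
import uuid

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON line with request tracing fields."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "message": record.getMessage(),
            # Populated via logging's `extra` mechanism; absent fields are null.
            "request_id": getattr(record, "request_id", None),
            "duration_ms": getattr(record, "duration_ms", None),
        })

logger = logging.getLogger("api")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("request complete", extra={"request_id": str(uuid.uuid4()), "duration_ms": 42})
```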

Tech Stack

  • TypeScript, Next.js 16, React 19, Tailwind v4, shadcn/ui, Recharts
  • Python 3.11+, FastAPI, Pydantic v2, Pillow, PyPDF2, sentence-transformers
  • LanceDB (vector store on S3/B2 with hybrid BM25 + dense search), LangChain (LLM orchestration)
  • Cross-encoder reranking (ms-marco-MiniLM-L-6-v2, CPU-friendly)
  • OpenAI (default for chat + embeddings) or Anthropic Claude (optional for chat)
  • Backblaze B2 (S3-compatible object storage for files + vectors)
  • pnpm workspaces (monorepo)

Commands

Command                What it does
pnpm dev               Start frontend + backend
pnpm dev:web           Frontend only
pnpm dev:api           Backend only
pnpm build             Build frontend
pnpm lint              Lint frontend
pnpm lint:api          Lint backend (ruff)
pnpm test:api          Run backend tests
pnpm check:structure   Verify layering rules
pnpm test:e2e          Playwright e2e tests

Documentation Map

Doc                    Purpose
AGENTS.md              Agent table of contents (start here)
ARCHITECTURE.md        System layout, layering, data flows
docs/features/         Feature docs (chat, retrieval, pipeline, upload, browser, dashboard)
docs/app-workflows.md  User journeys
docs/dev-workflows.md  Engineering workflows and testing
docs/SECURITY.md       Security principles
docs/RELIABILITY.md    Reliability expectations
docs/exec-plans/       Execution plans and tech debt tracker

Contributing

Start with AGENTS.md. It's the map, and everything else is discoverable from there.

License

MIT License - see LICENSE for details.

Claude Agent B2 Skill

Manage Backblaze B2 from your terminal using natural language (list/search, audits, stale or large file detection, security checks, safe cleanup).

Repo: https://github.com/backblaze-b2-samples/claude-skill-b2-cloud-storage
