Panepo/Fujinami

Fujinami RAG Service

A hybrid Retrieval-Augmented Generation (RAG) system that combines Microsoft GraphRAG, Semantic Kernel, and LanceDB to answer questions over your document collections using locally-hosted Ollama models.


Features

  • Hybrid search — blends dense vector search (LanceDB) with graph-based retrieval (GraphRAG knowledge graph) for richer answers
  • Three query modes — vector, hybrid, and global (community-level summaries)
  • Multi-collection — manage independent document collections via a REST API
  • Rich document ingestion — supports .txt, .md, .pdf, .docx, and .doc files; embedded images are described inline by a VLM before indexing
  • Streaming responses — optional token-by-token streaming on query endpoints
  • Built-in Web UI — zero-configuration browser interface served at /
  • Fully local — all LLM, embedding, and VLM calls go to Ollama; no cloud APIs required
  • RAGAS evaluation — score RAG responses against 10 built-in metrics (Faithfulness, Context Recall, Context Precision with and without a reference, Response Relevancy, Factual Correctness, Noise Sensitivity, Semantic Similarity, BLEU, ROUGE) using a locally-hosted LLM

Architecture Overview

Documents (.txt .md .pdf .docx .doc)
        │
        ▼
  DocumentLoader  ──▶  VLM image descriptions (llava:7b)
        │
        ▼
  ┌─────────────────────────────────┐
  │         Index Pipeline          │
  │                                 │
  │  GraphRAG CLI  ──▶  entities,   │
  │  (subprocess)       communities │
  │                     reports     │
  │                                 │
  │  SK Embeddings ──▶  LanceDB     │
  │  (bge-m3:567m)      chunks      │
  └─────────────────────────────────┘
        │
        ▼
  FastAPI server  ──▶  Web UI  /  REST API
        │
        ▼
  Query (vector | hybrid | global)
        │
        ▼
  llama3.2:3b  →  answer + source chunks

See docs/dataflow-ragService.md for full pipeline diagrams.


Requirements

| Requirement | Version |
| --- | --- |
| Python | 3.12 (3.13+ not supported by onnxruntime) |
| uv | latest |
| Ollama | running locally on port 11434 |

Required Ollama models

Pull these before first use:

# Chat and query-time embeddings (local)
ollama pull llama3.2:3b
ollama pull bge-m3:567m

# Index-time embeddings and VLM (can be on a remote GPU server)
ollama pull bge-m3:567m
ollama pull llava:7b

Setup

1. Create a .env file

# Remote Ollama server used during indexing (embeddings + VLM)
OLLAMA_INDEX_URL=

# Local Ollama server used at query time
OLLAMA_CHAT_URL=

# Model names
CHAT_MODEL=llama3.2:3b
EMBEDDING_MODEL=bge-m3:567m
VLM_MODEL=llava:7b

# Optional: VLM HTTP timeout in seconds (default 180)
VLM_TIMEOUT=180

# Model used for RAGAS evaluation (needs large context window, e.g. gemma4:e4b)
RAGAS_MODEL=gemma4:e4b

# Optional: Ollama request timeout for RAGAS evaluation in seconds (default 1800)
OLLAMA_TIMEOUT=1800

If you only have one Ollama instance, set both OLLAMA_INDEX_URL and OLLAMA_CHAT_URL to the same URL, for example http://localhost:11434 for a local instance on the port listed under Requirements.
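
As a rough illustration of how these variables might be consumed, the sketch below resolves a subset of them from any mapping such as `os.environ`. The `Settings` dataclass and `load_settings` helper are hypothetical names, not code from this repository. Falling back from one URL to the other, and finally to `http://localhost:11434`, is an assumption made for the example; the model and timeout defaults mirror the values shown above.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumption: a local Ollama instance on the port listed under Requirements.
DEFAULT_OLLAMA_URL = "http://localhost:11434"


@dataclass(frozen=True)
class Settings:
    index_url: str
    chat_url: str
    chat_model: str
    embedding_model: str
    vlm_model: str
    vlm_timeout: float


def load_settings(env: Mapping[str, str]) -> Settings:
    """Hypothetical helper: read the .env variables, treating empty values as unset."""
    chat_url = env.get("OLLAMA_CHAT_URL") or env.get("OLLAMA_INDEX_URL") or DEFAULT_OLLAMA_URL
    index_url = env.get("OLLAMA_INDEX_URL") or chat_url
    return Settings(
        index_url=index_url,
        chat_url=chat_url,
        chat_model=env.get("CHAT_MODEL") or "llama3.2:3b",
        embedding_model=env.get("EMBEDDING_MODEL") or "bge-m3:567m",
        vlm_model=env.get("VLM_MODEL") or "llava:7b",
        vlm_timeout=float(env.get("VLM_TIMEOUT") or 180),
    )


if __name__ == "__main__":
    print(load_settings(os.environ))
```

With a single Ollama instance, setting only one of the two URL variables is enough for this sketch, since the other falls back to it.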

2. Create the virtual environment and install dependencies

# Install uv (once)
pip install uv

# Create .venv and install dependencies
uv venv
uv pip install -r requirements.txt

3. Start the development server

uv run poe dev
# equivalent to: uvicorn api:app --reload

Open http://localhost:8000 in your browser.


Usage

Web UI

Navigate to http://localhost:8000 for the built-in interface. From there you can:

  • Create and manage collections
  • Upload documents
  • Trigger indexing (with optional entity type selection)
  • Run queries with vector, hybrid, or global mode

REST API

Interactive docs are available at http://localhost:8000/docs.

Collections

GET    /collections                    # list all collections
POST   /collections                    # create a collection  { "name": "my-docs" }
PATCH  /collections/{name}             # rename               { "new_name": "new-name" }
DELETE /collections/{name}             # delete collection and all its data
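
The collection endpoints above could be called from Python along the following lines. This is a minimal standard-library sketch: `collections_request` and `send` are hypothetical helper names, the base URL is the development server address from Setup, and the assumption that every response body is JSON is not confirmed by this README.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # development server address from Setup


def collections_request(method: str, name: Optional[str] = None,
                        payload: Optional[dict] = None) -> Request:
    """Build (but do not send) a request for one of the collection endpoints."""
    url = f"{BASE_URL}/collections"
    if name is not None:
        # Percent-encode the collection name so spaces and slashes stay in one path segment.
        url += "/" + quote(name, safe="")
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(url, data=data, headers=headers, method=method)


def send(request: Request):
    """Send a prepared request; assumes the server answers with JSON or an empty body."""
    with urlopen(request) as response:
        body = response.read()
        return json.loads(body) if body else None


if __name__ == "__main__":
    # Requires the server to be running.
    send(collections_request("POST", payload={"name": "my-docs"}))
    send(collections_request("PATCH", "my-docs", {"new_name": "new-name"}))
    print(send(collections_request("GET")))
```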

Documents

GET    /collections/{name}/documents              # list uploaded documents
POST   /collections/{name}/documents              # upload a file (multipart/form-data)
DELETE /collections/{name}/documents/{filename}   # delete a document
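
For clients without an HTTP library that handles uploads, the multipart/form-data body can be assembled by hand. The sketch below is one way to do it with the standard library. The form field name `file` is an assumption (check the interactive docs at /docs for the real one), and `encode_upload` is a hypothetical helper. The extension check mirrors the supported file types listed under Features.

```python
import mimetypes
import uuid
from pathlib import Path
from typing import Tuple

ALLOWED_EXTENSIONS = {".txt", ".md", ".pdf", ".docx", ".doc"}  # from the Features list


def encode_upload(filename: str, content: bytes, field: str = "file") -> Tuple[bytes, str]:
    """Return (body, content_type) for a single-file multipart/form-data upload.

    The default field name "file" is an assumption, not taken from the API.
    """
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file extension: {suffix or '(none)'}")
    boundary = uuid.uuid4().hex  # random separator that will not occur in the payload
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"
```

The returned body and content type can then be sent with a POST request to /collections/{name}/documents, passing the content type as the `Content-Type` header.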

Indexing

POST /collections/{name}/index          # trigger indexing (async, returns task_id)
                                        # body (optional): { "entity_types": ["person", "organization"] }
GET  /collections/{name}/index/{task_id} # poll indexing status
GET  /tasks                              # list all pending/running tasks
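
Because indexing is asynchronous, a client has to poll the status endpoint until the task finishes. A minimal polling loop might look like the sketch below. The `status` key and the exact status strings are assumptions: this README confirms only that tasks can be pending or running and that a failed task transitions to error. `wait_for_index` is a hypothetical helper that takes the HTTP call as a callable, so it can be exercised without a server.

```python
import time
from typing import Callable, Dict

# Assumption: illustrative status strings; verify them against the real API response.
IN_PROGRESS = {"pending", "running"}


def wait_for_index(fetch_status: Callable[[], Dict],
                   interval: float = 2.0,
                   timeout: float = 3600.0,
                   sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll until the task leaves the in-progress states or the timeout expires.

    fetch_status should perform GET /collections/{name}/index/{task_id}
    and return the decoded JSON body.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_status()
        if task.get("status") not in IN_PROGRESS:
            return task  # finished, failed, or an unknown state: let the caller decide
        if time.monotonic() >= deadline:
            raise TimeoutError("indexing did not finish before the timeout")
        sleep(interval)
```

The caller should still inspect the returned task, since leaving the in-progress states also covers the error case.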

Querying

POST /collections/{name}/query
{
  "query": "What are the main roles in the system?",
  "method": "hybrid",
  "top_k": 5,
  "stream": false
}

| Field | Values | Default |
| --- | --- | --- |
| method | vector, hybrid, or global | hybrid |
| top_k | integer | 5 |
| stream | true or false | false |

The response includes answer, sources (chunk excerpts with document references), and graphrag_context.
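
A small client-side helper can apply the documented defaults and reject an invalid method before the request is sent. The sketch below is one possible shape. `build_query` is a hypothetical name, and the checks for a non-empty query and `top_k >= 1` are assumptions, since the README states only the field types and defaults.

```python
from typing import Any, Dict

QUERY_METHODS = ("vector", "hybrid", "global")


def build_query(query: str, method: str = "hybrid", top_k: int = 5,
                stream: bool = False) -> Dict[str, Any]:
    """Return the JSON body for POST /collections/{name}/query."""
    if not query.strip():
        raise ValueError("query must not be empty")  # assumption: empty queries are useless
    if method not in QUERY_METHODS:
        raise ValueError(f"method must be one of {QUERY_METHODS}, got {method!r}")
    if top_k < 1:
        raise ValueError("top_k must be a positive integer")  # assumption
    return {"query": query, "method": method, "top_k": top_k, "stream": stream}
```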

RAGAS Evaluation

GET  /api/metrics                  # list available metrics and their required fields
POST /api/evaluate/single          # evaluate a single sample
POST /api/evaluate/batch           # evaluate a batch from a JSON or CSV file

Single evaluation (POST /api/evaluate/single):

{
  "user_input": "What are the main roles in the system?",
  "response": "The main roles are Master, User, and Viewer.",
  "retrieved_contexts": ["Masters can manage …", "Viewers can only read …"],
  "reference": "The system has three roles: Master, User, and Viewer.",
  "metrics": ["faithfulness", "llm_context_recall", "response_relevancy"]
}

Returns { "scores": { "faithfulness": 0.95, "llm_context_recall": 0.88, … } }.

Batch evaluation (POST /api/evaluate/batch):

Upload a .json (array of sample objects) or .csv file via multipart/form-data with a metrics form field (JSON-encoded list of metric IDs).

Available metric IDs:

| ID | Display Name | Required Fields | LLM | Embeddings |
| --- | --- | --- | --- | --- |
| faithfulness | Faithfulness | user_input, response, retrieved_contexts | | |
| llm_context_recall | LLM Context Recall | user_input, retrieved_contexts, reference | | |
| llm_context_precision | LLM Context Precision | user_input, retrieved_contexts, reference | | |
| context_precision_without_reference | Context Precision (No Ref) | user_input, response, retrieved_contexts | | |
| response_relevancy | Response Relevancy | user_input, response | | |
| factual_correctness | Factual Correctness | response, reference | | |
| noise_sensitivity | Noise Sensitivity | user_input, retrieved_contexts, response, reference | | |
| semantic_similarity | Semantic Similarity | response, reference | | |
| bleu_score | BLEU Score | response, reference | | |
| rouge_score | ROUGE Score | response, reference | | |
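
Before submitting a sample, it can help to check that it carries every field the requested metrics need. The sketch below encodes the required fields from the table above. `missing_fields` is a hypothetical client-side helper, and treating empty strings and empty lists as missing is an assumption. GET /api/metrics remains the authoritative source for the field requirements.

```python
from typing import Dict, Iterable, List, Mapping

# Required fields per metric ID, copied from the table above.
REQUIRED_FIELDS: Dict[str, tuple] = {
    "faithfulness": ("user_input", "response", "retrieved_contexts"),
    "llm_context_recall": ("user_input", "retrieved_contexts", "reference"),
    "llm_context_precision": ("user_input", "retrieved_contexts", "reference"),
    "context_precision_without_reference": ("user_input", "response", "retrieved_contexts"),
    "response_relevancy": ("user_input", "response"),
    "factual_correctness": ("response", "reference"),
    "noise_sensitivity": ("user_input", "retrieved_contexts", "response", "reference"),
    "semantic_similarity": ("response", "reference"),
    "bleu_score": ("response", "reference"),
    "rouge_score": ("response", "reference"),
}


def missing_fields(sample: Mapping[str, object],
                   metrics: Iterable[str]) -> Dict[str, List[str]]:
    """Map each requested metric to the required fields the sample lacks."""
    problems: Dict[str, List[str]] = {}
    for metric in metrics:
        if metric not in REQUIRED_FIELDS:
            raise ValueError(f"unknown metric ID: {metric!r}")
        # Assumption: None, "" and [] all count as "not provided".
        absent = [name for name in REQUIRED_FIELDS[metric] if not sample.get(name)]
        if absent:
            problems[metric] = absent
    return problems
```

An empty result means the sample satisfies every requested metric.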

Project Structure

Fujinami/
├── .env                        # environment variables (create this)
├── python/
│   ├── api.py                  # FastAPI application and all HTTP endpoints
│   ├── ragService.py           # RagService: indexing + search logic
│   ├── document_loader.py      # PDF/DOCX/DOC/TXT loader with VLM image descriptions
│   ├── ragas_runner.py         # RAGAS metric registry and async evaluation runner
│   ├── models.py               # Pydantic request/response schemas
│   ├── install_dependency.py   # Dependency installer script
│   ├── pyproject.toml          # Project metadata and poe tasks
│   ├── static/
│   │   └── index.html          # Single-page Web UI
│   ├── data/                   # Uploaded source documents (per collection)
│   └── ragdata/                # GraphRAG artifacts + LanceDB vector store (per collection)
└── docs/
    └── dataflow-ragService.md  # Detailed pipeline and data-flow documentation

Query Modes

| Mode | How it works | Best for |
| --- | --- | --- |
| vector | Dense cosine similarity over LanceDB chunk embeddings | Precise factual lookups |
| hybrid | Vector search + GraphRAG local search combined | General question answering |
| global | GraphRAG community-level summary search | Broad thematic / cross-document questions |
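
To make the vector mode concrete, the sketch below ranks chunks by cosine similarity in plain Python. It illustrates the ranking idea only: the service delegates this search to LanceDB, and the brute-force loop, function names, and toy two-dimensional vectors here are assumptions for the example, not the repository's implementation.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec: Sequence[float],
                 chunks: Sequence[Tuple[str, Sequence[float]]],
                 top_k: int = 5) -> List[Tuple[str, float]]:
    """Score every (text, embedding) pair against the query and keep the best top_k."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in chunks]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    toy_chunks = [("roles", [1.0, 0.0]), ("billing", [0.0, 1.0]), ("overview", [1.0, 1.0])]
    for text, score in top_k_chunks([1.0, 0.0], toy_chunks, top_k=2):
        print(f"{score:.3f}  {text}")
```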

Entity Types

When triggering indexing you can pass a list of entity types to tune the GraphRAG knowledge graph extraction:

organization  person  geo  event  concept  technology  product  process  system

Omitting entity_types uses the GraphRAG defaults.
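
The optional request body for the index endpoint could be built as in the sketch below, which leaves out `entity_types` entirely when nothing is selected so that the GraphRAG defaults apply. `build_index_body` is a hypothetical helper. Trimming, lowercasing, and de-duplicating the names are assumptions about what the API accepts.

```python
from typing import Dict, Iterable, List, Optional


def build_index_body(entity_types: Optional[Iterable[str]] = None) -> Dict[str, List[str]]:
    """Return the JSON body for POST /collections/{name}/index.

    An empty dict means "no entity_types given", so GraphRAG defaults are used.
    """
    if not entity_types:
        return {}
    cleaned: List[str] = []
    for name in entity_types:
        name = name.strip().lower()  # assumption: names are matched in lowercase
        if name and name not in cleaned:
            cleaned.append(name)  # keep first-seen order, drop duplicates
    return {"entity_types": cleaned} if cleaned else {}
```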


Error Handling

| Condition | Behaviour |
| --- | --- |
| VLM call fails or times out | Warning logged; image position left blank; indexing continues |
| .doc file on non-Windows | File skipped with warning |
| Unsupported file extension | File rejected at upload with HTTP 422 |
| graphrag index subprocess fails | Indexing task transitions to error; detail message returned |
| Ollama server unreachable | HTTP 500 propagated to API caller |
