Architecture

VisoRAG has two public surfaces:

  • VisoRAG_v8_Final (1).ipynb: the authoritative source notebook from which the package code originates.
  • src/visorag/: an import-safe Python package extracted for maintainers and contributors.

Flow

sequenceDiagram
    participant Client
    participant API as FastAPI /query
    participant Ingest as document_ingestion
    participant Retrieve as visual_retrieval
    participant Store as In-memory Qdrant
    participant VLM as Qwen2.5-VL

    Client->>API: multipart file + query + query_type + top_k
    API->>Ingest: sanitize, validate, render pages
    Ingest-->>API: page images
    API->>Retrieve: embed page images with ColQwen2
    Retrieve->>Store: create per-request multivector collection
    API->>Retrieve: embed user query
    Retrieve->>Store: retrieve top visual pages
    API->>VLM: selected page images + prompt
    VLM-->>API: answer text
    API-->>Client: JSON extraction or answer
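The retrieval and generation steps of this flow can be sketched in plain Python. This is a minimal illustration, not the package's code: `run_query`, `QueryResult`, and the three callables are hypothetical stand-ins for ColQwen2 embedding and Qwen2.5-VL generation, ingestion is assumed to have already produced the page images, and the ranking assumes the usual late-interaction (MaxSim) scoring over multivector embeddings rather than whatever comparator the real Qdrant collection is configured with.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = Sequence[float]
MultiVector = Sequence[Vector]  # one vector per image patch or query token


def maxsim(query: MultiVector, page: MultiVector) -> float:
    """Late-interaction score: for each query vector, take the best dot
    product against any page vector, then sum those maxima."""
    return sum(
        max(sum(q * p for q, p in zip(qv, pv)) for pv in page)
        for qv in query
    )


@dataclass
class QueryResult:
    pages: List[int]  # indices of the selected pages, best first
    answer: str


def run_query(
    page_images: Sequence[str],
    query: str,
    top_k: int,
    embed_page: Callable[[str], MultiVector],       # hypothetical ColQwen2 stand-in
    embed_query: Callable[[str], MultiVector],      # hypothetical ColQwen2 stand-in
    generate: Callable[[Sequence[str], str], str],  # hypothetical Qwen2.5-VL stand-in
) -> QueryResult:
    # Embed every rendered page, then the user query.
    page_vectors = [embed_page(image) for image in page_images]
    query_vectors = embed_query(query)
    # Rank pages by MaxSim and keep the top_k best.
    ranked = sorted(
        range(len(page_images)),
        key=lambda i: maxsim(query_vectors, page_vectors[i]),
        reverse=True,
    )[:top_k]
    # Hand only the selected pages to the generator.
    answer = generate([page_images[i] for i in ranked], query)
    return QueryResult(pages=ranked, answer=answer)


# Toy usage with hand-written 2-d "embeddings" instead of real models.
toy_pages = {"p0": [[1.0, 0.0]], "p1": [[0.0, 1.0]], "p2": [[0.7, 0.7]]}
result = run_query(
    page_images=list(toy_pages),
    query="what is on the second page?",
    top_k=2,
    embed_page=toy_pages.__getitem__,
    embed_query=lambda _query: [[0.0, 1.0]],
    generate=lambda images, _query: f"answered from {len(images)} page(s)",
)
```

Passing the models in as callables keeps the orchestration testable without a GPU, which matches the split between pipeline.py and the feature modules described under Boundaries.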

Boundaries

  • config.py owns environment-derived defaults.
  • pipeline.py owns the notebook-compatible request orchestration.
  • features/document_ingestion.py owns file conversion and page rendering.
  • features/visual_retrieval.py owns ColQwen2 model loading, embeddings, and Qdrant operations.
  • features/answer_generation.py owns local Qwen2.5-VL loading and generation.
  • api/app.py owns HTTP validation, auth, routes, and response mapping.
  • cli.py owns local operator commands.
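As an illustration of the first boundary, environment-derived defaults are often collected into one immutable settings object that the other modules import. The sketch below assumes that pattern; the `Settings` fields, their default values, and the `VISORAG_*` variable names are all hypothetical and are not taken from config.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    # Hypothetical fields and defaults, chosen only for illustration.
    top_k: int = 3
    max_upload_mb: int = 25
    api_key: Optional[str] = None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, falling back to defaults."""
    return Settings(
        top_k=int(env.get("VISORAG_TOP_K", Settings.top_k)),
        max_upload_mb=int(env.get("VISORAG_MAX_UPLOAD_MB", Settings.max_upload_mb)),
        # Treat an empty string the same as an unset variable.
        api_key=env.get("VISORAG_API_KEY") or None,
    )


# Passing a plain dict instead of os.environ keeps the function easy to test.
settings = load_settings({"VISORAG_TOP_K": "5"})
```

Keeping every environment lookup in one function means the API, CLI, and pipeline layers never read `os.environ` themselves.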

Runtime State

Each request gets its own temporary directory, named with a UUID, and its own in-memory Qdrant collection. Uploaded files, rendered pages, and vector data are not persisted by default.
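One way to get this per-request isolation is a context manager that creates the directory on entry and deletes it on exit. The sketch below uses only the standard library; `request_workspace`, the `visorag-` directory prefix, and the `pages-` collection prefix are hypothetical names, and the in-memory Qdrant client itself is left out (the qdrant-client package offers a `":memory:"` mode that would serve this purpose).

```python
import shutil
import tempfile
import uuid
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Optional, Tuple


@contextmanager
def request_workspace(base: Optional[Path] = None) -> Iterator[Tuple[Path, str]]:
    """Yield a fresh working directory and a matching collection name,
    then remove the directory when the request is finished."""
    request_id = uuid.uuid4().hex
    root = Path(base) if base is not None else Path(tempfile.gettempdir())
    workdir = root / f"visorag-{request_id}"  # hypothetical naming scheme
    workdir.mkdir(parents=True)
    collection_name = f"pages-{request_id}"   # hypothetical naming scheme
    try:
        yield workdir, collection_name
    finally:
        # Uploaded files and rendered pages disappear with the directory.
        shutil.rmtree(workdir, ignore_errors=True)


# Usage: everything written inside the block is gone afterwards.
with request_workspace() as (workdir, collection):
    (workdir / "page-0001.png").write_bytes(b"")
    existed_during_request = workdir.exists()
```

Because cleanup runs in the `finally` clause, the directory is removed even when ingestion or generation raises partway through a request.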

Non-Goals

  • No hosted multi-user auth system.
  • No persistent vector database by default.
  • No CPU fallback for real Qwen2.5-VL generation.
  • No Streamlit UI in the first clean open-source release.