Semantic Search Architecture

This document explains the architecture of the semantic search feature in the Nextcloud MCP Server, including background synchronization, vector search, and optional AI-generated answers via MCP sampling.

Important

Status: Experimental

Disabled by default (ENABLE_SEMANTIC_SEARCH=false)
Currently supports Notes, Files (PDFs), News items, and Deck cards
Requires additional infrastructure (Qdrant vector database + Ollama embedding service)
RAG answer generation requires MCP client sampling support

Overview

What is Semantic Search?

Semantic search finds information based on meaning rather than exact keyword matches. It uses vector embeddings to understand that "car" and "automobile" are similar, or that "bread recipe" matches "how to bake bread."

Traditional keyword search:

Query: "machine learning"
Matches: Only notes containing "machine learning" exactly
Misses: Notes with "neural networks", "AI models", "deep learning"

Semantic search:

Query: "machine learning"
Matches: Notes about machine learning, neural networks, AI, deep learning, etc.
Understanding: Semantic similarity via vector embeddings

Why It Matters

Semantic search enables:

Natural language queries - Ask questions in plain language
Conceptual discovery - Find related content even with different terminology
Cross-reference insights - Connect ideas across your knowledge base
AI-powered answers - Generate summaries with citations (optional, requires MCP sampling)

Current Support

Supported Apps: Notes, Files (PDFs with text extraction), News items, Deck cards
Planned Apps: Calendar events, Calendar tasks, Contacts
Architecture: Multi-app plugin system ready for additional apps

System Components

graph TB
    subgraph "MCP Client"
        Client[Claude Desktop, IDEs, etc.]
    end

    subgraph "Nextcloud MCP Server"
        MCP[MCP Server]
        Scanner[Background Scanner<br/>Hourly Change Detection]
        Queue[Document Queue]
        Processor[Embedding Processors<br/>Concurrent Workers]
    end

    subgraph "Infrastructure"
        Qdrant[(Qdrant<br/>Vector Database)]
        Ollama[Ollama<br/>Embedding Service]
        NC[Nextcloud<br/>Notes API, CalDAV, etc.]
    end

    Client <-->|MCP Protocol| MCP
    Scanner -->|Fetch Changes| NC
    Scanner -->|Enqueue Documents| Queue
    Queue -->|Process Batch| Processor
    Processor -->|Generate Embeddings| Ollama
    Processor -->|Store Vectors| Qdrant
    MCP -->|Search Queries| Qdrant
    MCP -->|Verify Access| NC

Component Roles:

MCP Server: Exposes semantic search tools (nc_semantic_search, nc_semantic_search_answer, nc_get_vector_sync_status)
Background Scanner: Discovers changed documents every hour using ETag-based change detection
Document Queue: Holds pending documents for embedding generation
Embedding Processors: Generate vector embeddings via Ollama (concurrent workers)
Qdrant Vector Database: Stores document vectors with metadata and user_id filtering
Ollama Embedding Service: Converts text to 768-dimensional vectors (default: nomic-embed-text model)
Nextcloud APIs: Source of truth for documents and access control verification

How It Works: Background Synchronization

Background synchronization runs automatically when ENABLE_SEMANTIC_SEARCH=true, discovering changes and indexing documents without user intervention.

sequenceDiagram
    participant Timer
    participant Scanner
    participant NC as Nextcloud API
    participant Queue
    participant Processor
    participant Ollama
    participant Qdrant

    Timer->>Scanner: Trigger (hourly)
    Scanner->>NC: Fetch all notes<br/>(Notes API)
    NC-->>Scanner: Notes with ETags
    Scanner->>Qdrant: Check indexed documents
    Qdrant-->>Scanner: Existing ETags
    Scanner->>Scanner: Identify changes<br/>(new/modified/deleted)
    Scanner->>Queue: Enqueue changed docs

    loop Continuous Processing
        Processor->>Queue: Fetch batch
        Queue-->>Processor: Documents
        Processor->>Ollama: Generate embeddings
        Ollama-->>Processor: 768-dim vectors
        Processor->>Qdrant: Upsert vectors<br/>(with user_id, doc_type)
    end

Scanner Behavior

Hourly Trigger:

Runs every hour (configurable)
Fetches all notes from Nextcloud Notes API
Compares ETags with Qdrant's indexed state
Enqueues new/modified documents

Change Detection:

New documents: No entry in Qdrant → enqueue for indexing
Modified documents: ETag mismatch → enqueue for re-indexing
Deleted documents: In Qdrant but not in Nextcloud → delete from Qdrant

Multi-App Plugin Architecture:

# Each app implements DocumentScanner interface
class NotesScanner(DocumentScanner):
    async def scan(self) -> list[Document]:
        # Fetch notes, detect changes, return documents

Currently only NotesScanner is implemented. Future: CalendarScanner, DeckScanner, FilesScanner, etc.

Queue Processing

Document Queue:

In-memory FIFO queue (not persistent across restarts)
Holds documents pending embedding generation
Batch processing for efficiency

Processor Pool:

Concurrent workers using anyio.TaskGroup
Process documents in parallel (default: 4 workers)
Each worker: fetch document → generate embedding → store in Qdrant

Backpressure Handling:

Queue size limits prevent memory exhaustion
Slow consumers (Ollama) naturally pace the system

Vector Storage

Qdrant Collection Schema:

{
  "id": "note_123",
  "vector": [768 dimensions],
  "payload": {
    "user_id": "alice",
    "doc_type": "note",
    "doc_id": "123",
    "title": "Machine Learning Notes",
    "content": "Neural networks are...",
    "etag": "abc123",
    "last_modified": "2025-01-15T10:30:00Z"
  }
}

Key Fields:

user_id: Multi-tenancy filtering (each user's vectors isolated)
doc_type: App identifier ("note", "event", "card", etc.)
etag: Change detection for incremental updates
chunk_index: Position of this chunk within the document (0-indexed)
total_chunks: Total number of chunks for this document
excerpt: First 200 characters of chunk (for display)

Document Chunking Strategy

Documents are chunked before embedding to handle content larger than the embedding model's context window and to improve search precision.

Configuration:

DOCUMENT_CHUNK_SIZE=512       # Words per chunk (default)
DOCUMENT_CHUNK_OVERLAP=50     # Overlapping words between chunks (default)

Chunking Process:

Text combination: Document title + content (e.g., "Note Title\n\nNote content...")
Word-based splitting: Simple whitespace tokenization
Sliding window: Create overlapping chunks
Individual embedding: Each chunk gets its own vector
Separate storage: Each chunk stored as distinct point in Qdrant

Example:

Document (1000 words):
→ Chunk 0: words 0-511
→ Chunk 1: words 462-973 (overlaps by 50 words)
→ Chunk 2: words 924-999 (last chunk, partial)

Each chunk stored as separate vector with metadata:
- chunk_index: 0, 1, 2
- total_chunks: 3
- excerpt: First 200 chars of each chunk

Search Behavior:

Vector search operates on chunks (not whole documents)
Deduplication collapses multiple matching chunks from same document
Best match returns highest-scoring chunk's excerpt
Access verification still performed at document level

Tuning Recommendations:

Small chunks (256-384 words): More precise, less context, more storage
Large chunks (768-1024 words): More context, less precise, less storage
Overlap (10-20% of chunk size): Preserves context across boundaries
Match to embedding model: Consider model's context window when sizing

Important: Changing chunk size requires re-embedding all documents. Use the collection naming strategy to manage different chunking configurations.

Collection Naming and Model Switching

Auto-generated collection names:

Format: {deployment-id}-{model-name}
Deployment ID: OTEL_SERVICE_NAME (if configured) or hostname (fallback)
Model name: OLLAMA_EMBEDDING_MODEL
Example: "my-mcp-server-nomic-embed-text", "mcp-container-all-minilm"

Why model-based naming:

Ensures each embedding model gets its own collection
Prevents dimension mismatches when switching models
Enables safe model experimentation (new model = new collection)
Supports multi-server deployments (different deployment IDs)

Switching embedding models:

Collections are mutually exclusive - vectors from one embedding model cannot be used with another. When you change the embedding model:

New collection is created with the new model's dimensions
Full re-embedding occurs - scanner processes all documents again
Old collection remains - can be deleted manually if no longer needed
Dimension validation - server fails fast if collection dimension doesn't match model

Example workflow:

# Start with nomic-embed-text (768 dimensions)
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
# Collection: "my-server-nomic-embed-text"
# → Scanner indexes 1000 notes → 1000 vectors in collection

# Switch to all-minilm (384 dimensions)
OLLAMA_EMBEDDING_MODEL=all-minilm
# Collection: "my-server-all-minilm"
# → Scanner detects 0 indexed documents → re-embeds 1000 notes
# → Old collection "my-server-nomic-embed-text" still exists in Qdrant

Re-embedding performance:

CPU-only: 1-5 notes/second
With GPU: 50-200 notes/second
1000 notes: 3-16 minutes (CPU) or 5-20 seconds (GPU)

Multi-server deployments:

Multiple MCP servers can share one Qdrant instance safely:

# Server 1 (Production)
OTEL_SERVICE_NAME=mcp-prod
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
# → Collection: "mcp-prod-nomic-embed-text"

# Server 2 (Staging with different model)
OTEL_SERVICE_NAME=mcp-staging
OLLAMA_EMBEDDING_MODEL=all-minilm
# → Collection: "mcp-staging-all-minilm"

Each deployment gets its own collection - no naming collisions or dimension conflicts.

How It Works: Semantic Search

Semantic search converts user queries into vectors and finds similar documents using cosine similarity.

sequenceDiagram
    participant User
    participant MCP as MCP Server
    participant Ollama
    participant Qdrant
    participant NC as Nextcloud API

    User->>MCP: nc_semantic_search("machine learning")
    MCP->>MCP: Check OAuth scope<br/>(semantic.read)
    MCP->>Ollama: Generate query embedding
    Ollama-->>MCP: Query vector (768-dim)
    MCP->>Qdrant: Search similar vectors<br/>(filter: user_id=alice)
    Qdrant-->>MCP: Top K results<br/>(with similarity scores)

    loop For each result
        MCP->>NC: Verify access<br/>(fetch note by ID)
        alt Access granted
            NC-->>MCP: Note metadata
        else Access denied (404/401)
            MCP->>MCP: Filter out result
        end
    end

    MCP-->>User: Search results<br/>(with scores, excerpts)

Dual-Phase Authorization

Phase 1: OAuth Scope Check

Verify user has semantic.read scope
Rejects unauthorized users immediately

Phase 2: Per-Document Verification

For each search result, fetch document via app API (Notes, Calendar, etc.)
If fetch succeeds (200 OK), user has access
If fetch fails (404 Not Found, 401 Unauthorized), filter out result
Security: Prevents information leakage from vector search alone

Rationale:

Vector database doesn't know about sharing, permissions changes, or deleted documents
App APIs are source of truth for access control
Verification ensures users only see documents they can access

Search Flow

Query Embedding: Convert user query to 768-dimensional vector via Ollama
Vector Search: Find top K similar vectors in Qdrant (cosine similarity)
User Filtering: Qdrant pre-filters by user_id (multi-tenancy)
Access Verification: Fetch each document via app API to verify current access
Result Ranking: Return results sorted by similarity score
Response: Include document excerpts, metadata, and similarity scores

Performance

Query latency: 50-200ms typical (embedding + vector search + verification)
Accuracy: Depends on embedding model quality (nomic-embed-text recommended)
Scalability: Qdrant handles millions of vectors efficiently

How It Works: RAG with MCP Sampling (Optional)

The nc_semantic_search_answer tool generates AI-powered answers with citations using MCP sampling - requesting the MCP client's LLM to generate text.

sequenceDiagram
    participant User
    participant MCP as MCP Server
    participant Client as MCP Client<br/>(Claude Desktop)
    participant LLM as Client's LLM<br/>(Claude, GPT, etc.)

    User->>MCP: nc_semantic_search_answer("What are my Q1 goals?")
    MCP->>MCP: Semantic search<br/>(find relevant notes)
    MCP->>MCP: Construct prompt<br/>(query + documents + instructions)
    MCP->>Client: Sampling request<br/>(MCP Protocol)
    Client->>User: Prompt for approval<br/>(optional, client-controlled)
    User-->>Client: Approve
    Client->>LLM: Generate answer<br/>(with context)
    LLM-->>Client: Answer with citations
    Client-->>MCP: Sampling response
    MCP-->>User: Generated answer<br/>(with source documents)

MCP Sampling Architecture

Why MCP Sampling?

No server-side LLM: MCP server has no API keys, doesn't call LLMs directly
Client controls everything: Which model, who pays, user approval prompts
Privacy: Documents stay with the client's LLM provider, not a third-party
Flexibility: Works with any MCP client that supports sampling (Claude Desktop, future clients)

Prompt Construction:

User Query: {query}

Relevant Documents:
1. Document: {title} (Note)
   Content: {excerpt}

2. Document: {title} (Note)
   Content: {excerpt}

Instructions:
- Provide a comprehensive answer to the user's query
- Use the documents above as context
- Include citations: "According to Document 1 (title)..."
- If documents don't contain enough information, say so

Graceful Fallback:

try:
    result = await ctx.session.create_message(...)
    return answer_with_citations
except Exception as e:
    # Fallback: Return documents without generated answer
    return SearchResponse(
        generated_answer=f"[Sampling unavailable: {e}]",
        sources=search_results
    )

Client Support:

Requires: MCP client with sampling capability
Known support: Claude Desktop (as of Claude 3.5+)
Graceful degradation: Returns raw documents if sampling unavailable

Authentication & Security

OAuth Scopes

semantic.read - Search permission

Allows using nc_semantic_search and nc_semantic_search_answer tools
Does NOT grant access to documents (verified via app APIs)
Required for any semantic search operation

semantic.write - Sync control permission

Allows enabling/disabling background sync (provision_vector_sync, deprovision_vector_sync)
Controls whether user's documents are indexed
Currently not implemented in OAuth mode (BasicAuth only)

Dual-Phase Authorization Pattern

Phase 1: Scope Check (semantic.read)

Verifies user authorized to search
Prevents unauthorized vector database access

Phase 2: Document Verification (app-specific APIs)

For each search result, fetch via Notes API, CalDAV, etc.
If user can fetch → include in results
If user cannot fetch (404/401) → filter out
Security: Vector search cannot leak documents user shouldn't see

Example Scenario:

Alice creates note "Secret Project X"
Background sync indexes note with user_id=alice
Bob searches for "project"
Vector search finds "Secret Project X" (vector similarity)
Qdrant filters by user_id=bob → no match (Alice's note excluded)
Even if Bob somehow got the doc_id, Phase 2 verification would fail (404 Not Found)

Offline Access for Background Sync

Why needed:

Background scanner runs hourly without user interaction
Requires valid access tokens to fetch documents from Nextcloud APIs
User's session token expires after hours/days

OAuth Mode (ADR-004 Flow 2):

User explicitly provisions offline access via provision_nextcloud_access tool
Server requests offline_access scope → receives refresh token
Refresh token stored securely (database, encrypted)
Background sync uses refresh tokens to obtain access tokens

BasicAuth Mode:

Username/password stored in environment variables
Always available for background operations
Simpler but less secure (credentials never expire)

Deployment Modes

Authentication Modes

Mode	Security	Offline Access	Background Sync	Best For
BasicAuth	Lower (credentials in env)	Always available	✅ Works immediately	Single-user, development, testing
OAuth	Higher (tokens, scopes)	User must provision	⚠️ Not yet implemented	Multi-user, production

BasicAuth:

Set NEXTCLOUD_USERNAME and NEXTCLOUD_PASSWORD
Background sync works immediately when ENABLE_SEMANTIC_SEARCH=true
Credentials stored in .env file (secure server access required)

OAuth:

Client authenticates with semantic.read scope
User must explicitly provision offline access (future: provision_vector_sync tool)
Background sync only works for users who provisioned access
More secure: tokens expire, user controls access

Qdrant Deployment Modes

Mode	Configuration	Persistence	Scalability	Best For
In-Memory (default)	`QDRANT_LOCATION=:memory:`	❌ Lost on restart	Single instance	Testing, development
Persistent Local	`QDRANT_LOCATION=/data/qdrant`	✅ Survives restarts	Single instance	Small deployments
Network	`QDRANT_URL=http://qdrant:6333`	✅ Dedicated service	✅ Horizontal scaling	Production

In-Memory Mode:

ENABLE_SEMANTIC_SEARCH=true
# QDRANT_LOCATION not set → defaults to :memory:

Fastest startup
No disk I/O
Warning: All vectors lost when server restarts (must re-index)

Persistent Local Mode:

ENABLE_SEMANTIC_SEARCH=true
QDRANT_LOCATION=/var/lib/qdrant

Vectors survive restarts
Single server only (no distributed setup)
Disk I/O for durability

Network Mode (Recommended for Production):

ENABLE_SEMANTIC_SEARCH=true
QDRANT_URL=http://qdrant:6333
QDRANT_API_KEY=secret  # optional

Dedicated Qdrant service (Docker, Kubernetes)
Horizontal scaling (multiple MCP servers → one Qdrant)
High availability options

Embedding Service Options

Service	Configuration	Cost	Performance	Best For
Ollama (recommended)	`OLLAMA_BASE_URL=http://ollama:11434`	Free (self-hosted)	Fast (local GPU)	Production, development
OpenAI (future)	`OPENAI_API_KEY=sk-...`	Paid (API)	Fast (cloud)	Cloud deployments
Fallback	No config	Free	Slow (random)	Testing only (not production)

Ollama Setup (Recommended):

# docker-compose.yml
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama-data:/root/.ollama
    ports:
      - "11434:11434"

# Pull embedding model
docker compose exec ollama ollama pull nomic-embed-text

Environment Configuration:

OLLAMA_BASE_URL=http://ollama:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text  # 768-dimensional vectors

Model Options:

nomic-embed-text (default): 768-dim, optimized for semantic search
all-minilm: Smaller, faster, slightly less accurate
mxbai-embed-large: Larger, more accurate, slower

Configuration Overview

Key Environment Variables

Enable Semantic Search:

ENABLE_SEMANTIC_SEARCH=true  # Default: false (opt-in)

Qdrant Vector Database:

# In-memory mode (default if ENABLE_SEMANTIC_SEARCH=true)
# QDRANT_LOCATION not set → uses :memory:

# Persistent local mode
QDRANT_LOCATION=/var/lib/qdrant

# Network mode (production)
QDRANT_URL=http://qdrant:6333
QDRANT_API_KEY=secret  # optional

Ollama Embedding Service:

OLLAMA_BASE_URL=http://ollama:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text  # Default

Scanner Configuration:

VECTOR_SYNC_INTERVAL=3600  # Scan interval in seconds (default: 1 hour)

Resource Requirements

Qdrant:

Memory: ~100-200 MB base + ~1 KB per vector (1M vectors ≈ 1 GB)
Disk: Persistent mode only, ~200 bytes per vector
CPU: Low (indexing) to moderate (search)

Ollama:

Memory: 2-4 GB for nomic-embed-text model
CPU: High during embedding generation, idle otherwise
GPU: Optional but recommended (10-100x faster)

MCP Server:

Memory: +50-100 MB for background sync workers
CPU: Moderate during scanning/processing, low otherwise

Trade-offs

Consideration	In-Memory Qdrant	Persistent Qdrant	Network Qdrant
Setup complexity	✅ Minimal	✅ Easy	⚠️ Requires separate service
Durability	❌ Lost on restart	✅ Survives restarts	✅ Survives restarts
Scalability	❌ Single instance	❌ Single instance	✅ Horizontal scaling
Performance	✅ Fastest	✅ Fast	⚠️ Network latency

Operational Behavior

What Happens When ENABLE_SEMANTIC_SEARCH=true

Immediate (Server Startup):

MCP server connects to Qdrant (creates collection if needed)
MCP server connects to Ollama (verifies embedding model available)
Background scanner starts (schedules hourly runs)
Document queue and processors initialize

First Scan (Within 1 hour):

Scanner fetches all notes from Nextcloud
Compares with Qdrant (likely empty on first run)
Enqueues all notes for indexing
Processors generate embeddings (may take minutes for large note collections)
Vectors stored in Qdrant with user_id filtering

Hourly Thereafter:

Scanner fetches all notes
Identifies new/modified/deleted notes (ETag comparison)
Enqueues changes only
Incremental updates processed

Performance Expectations

Embedding Generation:

Without GPU: 1-5 notes/second (CPU-bound)
With GPU: 50-200 notes/second (highly parallel)
Initial indexing: 100 notes ≈ 20-100 seconds (CPU), 1-2 seconds (GPU)

Search Query:

Embedding generation: 50-100ms
Vector search: 10-50ms (depends on collection size)
Access verification: 20-100ms per document (Nextcloud API calls)
Total latency: 100-300ms typical

Resource Usage:

Idle: Minimal (background scanner sleeps)
Scanning: Moderate CPU (ETag checks, API calls)
Processing: High CPU/GPU (embedding generation)
Searching: Low to moderate (depends on query frequency)

Background Sync Behavior

Scanner Triggers:

Hourly (configurable via VECTOR_SYNC_INTERVAL)
Manual trigger via nc_trigger_vector_sync (future)

Queue Processing:

Continuous (workers always running)
Batch processing (fetch 10 documents at a time)
Concurrent workers (4 by default)

Error Handling:

Individual document failures logged but don't stop scanning
Retries for transient errors (network timeouts, rate limits)
Failed documents skipped, re-attempted on next scan

What Gets Indexed:

Notes: All notes accessible to the authenticated user
Future: Calendar events, tasks, deck cards, files with text extraction, contacts

Monitoring & Observability

MCP Tools

nc_get_vector_sync_status - Check sync status

{
  "total_documents": 1234,
  "indexed_documents": 1200,
  "pending_documents": 34,
  "sync_enabled": true,
  "last_scan": "2025-01-15T14:30:00Z",
  "status": "syncing"  # idle | syncing | error
}

Interpreting Status:

idle: No pending work, last scan completed successfully
syncing: Currently processing documents
error: Last scan failed (check logs)

Logs to Check

Scanner Logs:

[INFO] Vector sync scanner started (interval: 3600s)
[INFO] Scanning notes: found 150 documents
[INFO] Changes detected: 5 new, 2 modified, 1 deleted
[INFO] Enqueued 7 documents for processing

Processor Logs:

[INFO] Processing document: note_123
[DEBUG] Generated embedding (768 dimensions)
[INFO] Stored vector in Qdrant: note_123

Error Logs:

[ERROR] Failed to generate embedding for note_123: Connection timeout
[WARN] Qdrant connection lost, retrying...
[ERROR] Ollama embedding failed: Model not found

Log Locations:

Docker: docker compose logs mcp
Local: stdout (redirect to file if needed)
Kubernetes: kubectl logs -f deployment/nextcloud-mcp-server

Metrics to Monitor

Indexing Progress:

Total documents vs indexed documents
Pending queue size
Processing rate (docs/second)

Search Performance:

Query latency (p50, p95, p99)
Results per query
Verification overhead (API calls per query)

Resource Usage:

Qdrant memory/disk usage
Ollama CPU/GPU usage
MCP server memory

For detailed observability setup, see docs/observability.md.

Troubleshooting from Architecture Perspective

Documents Not Appearing in Search

Diagnosis Flow:

Check sync status: nc_get_vector_sync_status
- sync_enabled: false → Enable with ENABLE_SEMANTIC_SEARCH=true
- status: error → Check scanner logs for failures
Check queue size:
- pending_documents > 0 → Processing in progress, wait
- pending_documents == 0 but indexed_documents low → Scan hasn't run yet (wait up to 1 hour)
Check Qdrant:
- Connection errors in logs → Verify QDRANT_URL or QDRANT_LOCATION
- Collection empty → First scan hasn't completed
Check Ollama:
- Embedding errors in logs → Verify OLLAMA_BASE_URL
- Model not found → Pull model: ollama pull nomic-embed-text

Common Causes:

Sync disabled (default): Enable ENABLE_SEMANTIC_SEARCH=true
Ollama not running: Start Ollama service
Qdrant not accessible: Check network/URL
First scan in progress: Wait up to 1 hour + processing time

Slow Search Performance

Diagnosis:

Query embedding slow (>500ms):
- Ollama overloaded or CPU-bound
- Solution: Use GPU, upgrade CPU, or reduce concurrent requests
Vector search slow (>200ms):
- Large collection (millions of vectors)
- Solution: Use network Qdrant with SSDs, add indexing
Verification slow (>500ms):
- Many results to verify (10+ documents)
- Nextcloud API slow or overloaded
- Solution: Reduce limit parameter, optimize Nextcloud

Performance Tuning:

Reduce search limit (default: 10 results)
Use network Qdrant for large collections
Enable Ollama GPU acceleration
Check Nextcloud API response times

Background Sync Stopped

Diagnosis:

Check logs for errors:
- Authentication failures (401/403) → Token expired (OAuth) or credentials invalid (BasicAuth)
- Connection timeouts → Network issues with Nextcloud/Qdrant/Ollama
- Rate limiting (429) → Reduce scan frequency
Check nc_get_vector_sync_status:
- status: error → See logs for details
- last_scan timestamp old (>2 hours) → Scanner may have crashed
Verify services:
- Qdrant accessible: curl http://qdrant:6333/
- Ollama accessible: curl http://ollama:11434/api/tags
- Nextcloud accessible: Check API health

OAuth Mode (Future):

Offline access token expired → Re-provision via provision_vector_sync
User deprovisioned access → Sync stops intentionally

Out of Memory

Diagnosis:

Check Qdrant mode:
- In-memory mode with large collection → Switch to persistent or network mode
Check embedding batch size:
- Too many documents processed simultaneously → Reduce worker count
Check Ollama memory:
- Large models loaded → Use smaller embedding model

Solutions:

Use persistent or network Qdrant (frees server memory)
Reduce concurrent processor workers
Use smaller embedding model (all-minilm instead of nomic-embed-text)
Increase server memory allocation

Limitations & Future Work

Current Limitations

Notes App Only
- Architecture supports multiple apps (plugin system ready)
- Only NotesScanner and NotesProcessor implemented
- Future: Calendar, Deck, Files, Contacts
MCP Sampling Support
- nc_semantic_search_answer requires client sampling capability
- Not all MCP clients support sampling yet
- Graceful fallback: Returns documents without generated answer
OAuth Background Sync
- User-controlled background jobs not yet implemented
- Currently works in BasicAuth mode only
- Future: Users opt-in via provision_vector_sync tool
No Incremental Updates
- Document changes trigger full re-embedding
- Cannot update just modified paragraphs
- Future: Paragraph-level chunking and incremental updates
No Query Caching
- Each search generates new query embedding
- Repeated queries re-search Qdrant
- Future: Cache recent query embeddings and results
Single Embedding Model
- Uses one model for all documents and queries
- Cannot customize per app or user
- Future: App-specific or user-selected models

Future Enhancements

Multi-App Support (In Progress):

Scanner plugins for Calendar, Deck, Files, Contacts
Unified vector search across all apps
App-specific metadata in vector payloads

User-Controlled Sync (OAuth Mode):

provision_vector_sync and deprovision_vector_sync tools
Per-user background job scheduling
User dashboard for sync status and controls

Advanced Search Features:

Hybrid search (vector + keyword combined)
Filtering by date range, app type, tags
Aggregations and faceted search
Search result explanations (why this matched)

Performance Optimizations:

Query caching for repeated searches
Incremental document updates (paragraph-level)
Batch query processing
Qdrant HNSW indexing tuning

Embedding Improvements:

Support for OpenAI embeddings (ada-002, text-embedding-3)
Multi-language embedding models
Fine-tuned models for Nextcloud content
Paragraph-level chunking for long documents

References

Architecture Decision Records (ADRs)

ADR-003: Vector Database Semantic Search - Qdrant selection rationale, embedding strategy, hybrid search (superseded by ADR-007 but technical decisions remain valid)
ADR-007: Background Vector Sync Job Management - Current implementation, Scanner-Queue-Processor architecture, plugin system
ADR-008: MCP Sampling for Semantic Search - RAG with MCP sampling, client-server separation, prompt construction
ADR-009: Semantic Search OAuth Scope - OAuth scope model, dual-phase authorization, security rationale

Configuration & Setup

Configuration Guide - Environment variables, Qdrant setup, Ollama setup, detailed configuration options
Installation Guide - Deployment options (Docker, Kubernetes, local)
Running the Server - Starting the server, transport options, testing

Monitoring & Troubleshooting

Observability Guide - Logging, metrics, tracing, debugging
Troubleshooting - General issues and solutions

FilesExpand file tree

semantic-search-architecture.md

Latest commit

History

semantic-search-architecture.md

File metadata and controls

Semantic Search Architecture

Overview

What is Semantic Search?

Why It Matters

Current Support

System Components

How It Works: Background Synchronization

Scanner Behavior

Queue Processing

Vector Storage

Document Chunking Strategy

Collection Naming and Model Switching

How It Works: Semantic Search

Dual-Phase Authorization

Search Flow

Performance

How It Works: RAG with MCP Sampling (Optional)

MCP Sampling Architecture

Authentication & Security

OAuth Scopes

Dual-Phase Authorization Pattern

Offline Access for Background Sync

Deployment Modes

Authentication Modes

Qdrant Deployment Modes

Embedding Service Options

Configuration Overview

Key Environment Variables

Resource Requirements

Trade-offs

Operational Behavior

What Happens When ENABLE_SEMANTIC_SEARCH=true

Performance Expectations

Background Sync Behavior

Monitoring & Observability

MCP Tools

Logs to Check

Metrics to Monitor

Troubleshooting from Architecture Perspective

Documents Not Appearing in Search

Slow Search Performance

Background Sync Stopped

Out of Memory

Limitations & Future Work

Current Limitations

Future Enhancements

References

Architecture Decision Records (ADRs)

Configuration & Setup

Monitoring & Troubleshooting

Related Documentation