This document explains the architecture of the semantic search feature in the Nextcloud MCP Server, including background synchronization, vector search, and optional AI-generated answers via MCP sampling.
Important
Status: Experimental
- Disabled by default (ENABLE_SEMANTIC_SEARCH=false)
- Currently supports Notes, Files (PDFs), News items, and Deck cards
- Requires additional infrastructure (Qdrant vector database + Ollama embedding service)
- RAG answer generation requires MCP client sampling support
Semantic search finds information based on meaning rather than exact keyword matches. It uses vector embeddings to understand that "car" and "automobile" are similar, or that "bread recipe" matches "how to bake bread."
Traditional keyword search:
Query: "machine learning"
Matches: Only notes containing "machine learning" exactly
Misses: Notes with "neural networks", "AI models", "deep learning"
Semantic search:
Query: "machine learning"
Matches: Notes about machine learning, neural networks, AI, deep learning, etc.
Understanding: Semantic similarity via vector embeddings
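
To make "semantic similarity via vector embeddings" concrete, here is a minimal sketch of cosine similarity, the measure this document says the search uses. The three-dimensional vectors are hand-made stand-ins for real 768-dimensional embeddings, not model output.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Illustrative toy vectors (assumption): related terms point in similar directions.
toy_embeddings = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.8, 0.2, 0.1],
    "bread recipe": [0.0, 0.2, 0.9],
}

car_vs_automobile = cosine_similarity(toy_embeddings["car"], toy_embeddings["automobile"])
car_vs_bread = cosine_similarity(toy_embeddings["car"], toy_embeddings["bread recipe"])
```

With these toy values, "car" scores far higher against "automobile" than against "bread recipe", which is the property that lets a query match documents that use different words.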
Semantic search enables:
- Natural language queries - Ask questions in plain language
- Conceptual discovery - Find related content even with different terminology
- Cross-reference insights - Connect ideas across your knowledge base
- AI-powered answers - Generate summaries with citations (optional, requires MCP sampling)
- Supported Apps: Notes, Files (PDFs with text extraction), News items, Deck cards
- Planned Apps: Calendar events, Calendar tasks, Contacts
- Architecture: Multi-app plugin system ready for additional apps
graph TB
subgraph "MCP Client"
Client[Claude Desktop, IDEs, etc.]
end
subgraph "Nextcloud MCP Server"
MCP[MCP Server]
Scanner[Background Scanner<br/>Hourly Change Detection]
Queue[Document Queue]
Processor[Embedding Processors<br/>Concurrent Workers]
end
subgraph "Infrastructure"
Qdrant[(Qdrant<br/>Vector Database)]
Ollama[Ollama<br/>Embedding Service]
NC[Nextcloud<br/>Notes API, CalDAV, etc.]
end
Client <-->|MCP Protocol| MCP
Scanner -->|Fetch Changes| NC
Scanner -->|Enqueue Documents| Queue
Queue -->|Process Batch| Processor
Processor -->|Generate Embeddings| Ollama
Processor -->|Store Vectors| Qdrant
MCP -->|Search Queries| Qdrant
MCP -->|Verify Access| NC
Component Roles:
- MCP Server: Exposes semantic search tools (nc_semantic_search, nc_semantic_search_answer, nc_get_vector_sync_status)
- Background Scanner: Discovers changed documents every hour using ETag-based change detection
- Document Queue: Holds pending documents for embedding generation
- Embedding Processors: Generate vector embeddings via Ollama (concurrent workers)
- Qdrant Vector Database: Stores document vectors with metadata and user_id filtering
- Ollama Embedding Service: Converts text to 768-dimensional vectors (default: nomic-embed-text model)
- Nextcloud APIs: Source of truth for documents and access control verification
Background synchronization runs automatically when ENABLE_SEMANTIC_SEARCH=true, discovering changes and indexing documents without user intervention.
sequenceDiagram
participant Timer
participant Scanner
participant NC as Nextcloud API
participant Queue
participant Processor
participant Ollama
participant Qdrant
Timer->>Scanner: Trigger (hourly)
Scanner->>NC: Fetch all notes<br/>(Notes API)
NC-->>Scanner: Notes with ETags
Scanner->>Qdrant: Check indexed documents
Qdrant-->>Scanner: Existing ETags
Scanner->>Scanner: Identify changes<br/>(new/modified/deleted)
Scanner->>Queue: Enqueue changed docs
loop Continuous Processing
Processor->>Queue: Fetch batch
Queue-->>Processor: Documents
Processor->>Ollama: Generate embeddings
Ollama-->>Processor: 768-dim vectors
Processor->>Qdrant: Upsert vectors<br/>(with user_id, doc_type)
end
Hourly Trigger:
- Runs every hour (configurable)
- Fetches all notes from Nextcloud Notes API
- Compares ETags with Qdrant's indexed state
- Enqueues new/modified documents
Change Detection:
- New documents: No entry in Qdrant → enqueue for indexing
- Modified documents: ETag mismatch → enqueue for re-indexing
- Deleted documents: In Qdrant but not in Nextcloud → delete from Qdrant
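
The three change-detection rules above can be sketched as a comparison of two ETag maps. This is an illustration only: `ChangeSet` and `detect_changes` are hypothetical names, and the real scanner may structure this differently.

```python
from dataclasses import dataclass, field

@dataclass
class ChangeSet:
    to_index: list[str] = field(default_factory=list)    # new documents
    to_reindex: list[str] = field(default_factory=list)  # ETag mismatch
    to_delete: list[str] = field(default_factory=list)   # gone from Nextcloud

def detect_changes(remote_etags: dict[str, str], indexed_etags: dict[str, str]) -> ChangeSet:
    """Compare ETags reported by Nextcloud with ETags stored in the vector index."""
    changes = ChangeSet()
    for doc_id, etag in remote_etags.items():
        if doc_id not in indexed_etags:
            changes.to_index.append(doc_id)
        elif indexed_etags[doc_id] != etag:
            changes.to_reindex.append(doc_id)
    # Anything indexed but no longer present remotely is treated as deleted.
    changes.to_delete = [doc_id for doc_id in indexed_etags if doc_id not in remote_etags]
    return changes
```

For example, `detect_changes({"1": "a", "2": "b2"}, {"2": "b1", "4": "d"})` marks document 1 as new, document 2 as modified, and document 4 as deleted.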
Multi-App Plugin Architecture:
# Each app implements the DocumentScanner interface
class NotesScanner(DocumentScanner):
    async def scan(self) -> list[Document]:
        # Fetch notes, detect changes, return documents
        ...

Currently only NotesScanner is implemented. Future: CalendarScanner, DeckScanner, FilesScanner, etc.
Document Queue:
- In-memory FIFO queue (not persistent across restarts)
- Holds documents pending embedding generation
- Batch processing for efficiency
Processor Pool:
- Concurrent workers using anyio.TaskGroup
- Process documents in parallel (default: 4 workers)
- Each worker: fetch document → generate embedding → store in Qdrant
Backpressure Handling:
- Queue size limits prevent memory exhaustion
- Slow consumers (Ollama) naturally pace the system
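
A minimal sketch of the queue, worker pool, and backpressure described above. It uses the standard library's asyncio in place of anyio so that it runs without extra dependencies; `run_pipeline`, `embed`, and the queue size are illustrative assumptions, not the server's actual code.

```python
import asyncio

async def run_pipeline(doc_ids: list[str], workers: int = 4, max_queue: int = 8) -> list[str]:
    """Push documents through a bounded queue consumed by concurrent workers."""
    queue: asyncio.Queue[str] = asyncio.Queue(maxsize=max_queue)
    stored: list[str] = []

    async def embed(doc_id: str) -> list[float]:
        await asyncio.sleep(0)  # placeholder for the call to the embedding service
        return [float(len(doc_id))]

    async def worker() -> None:
        while True:
            doc_id = await queue.get()
            try:
                await embed(doc_id)
                stored.append(doc_id)  # placeholder for the vector upsert
            finally:
                queue.task_done()

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    for doc_id in doc_ids:
        # put() waits while the queue is full, so a slow consumer paces the producer.
        await queue.put(doc_id)
    await queue.join()  # wait until every queued document has been processed
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return stored
```

The bounded queue is what provides backpressure: the producer cannot enqueue faster than the workers drain, so memory use stays capped at `max_queue` pending items.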
Qdrant Collection Schema:
{
"id": "note_123",
"vector": [768 dimensions],
"payload": {
"user_id": "alice",
"doc_type": "note",
"doc_id": "123",
"title": "Machine Learning Notes",
"content": "Neural networks are...",
"etag": "abc123",
"last_modified": "2025-01-15T10:30:00Z"
}
}
Key Fields:
- user_id: Multi-tenancy filtering (each user's vectors isolated)
- doc_type: App identifier ("note", "event", "card", etc.)
- etag: Change detection for incremental updates
- chunk_index: Position of this chunk within the document (0-indexed)
- total_chunks: Total number of chunks for this document
- excerpt: First 200 characters of chunk (for display)
Documents are chunked before embedding to handle content larger than the embedding model's context window and to improve search precision.
Configuration:
DOCUMENT_CHUNK_SIZE=512 # Words per chunk (default)
DOCUMENT_CHUNK_OVERLAP=50 # Overlapping words between chunks (default)

Chunking Process:
- Text combination: Document title + content (e.g., "Note Title\n\nNote content...")
- Word-based splitting: Simple whitespace tokenization
- Sliding window: Create overlapping chunks
- Individual embedding: Each chunk gets its own vector
- Separate storage: Each chunk stored as distinct point in Qdrant
Example:
Document (1000 words):
→ Chunk 0: words 0-511
→ Chunk 1: words 462-973 (overlaps by 50 words)
→ Chunk 2: words 924-999 (last chunk, partial)
Each chunk stored as separate vector with metadata:
- chunk_index: 0, 1, 2
- total_chunks: 3
- excerpt: First 200 chars of each chunk
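
The sliding-window behavior in the example above can be reproduced with a short sketch. `chunk_words` and its returned field names `start_word` and `end_word` are hypothetical; only `chunk_index`, `total_chunks`, and `excerpt` come from this document.

```python
def chunk_words(text: str, chunk_size: int = 512, overlap: int = 50) -> list[dict]:
    """Split text into overlapping word windows using whitespace tokenization."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # each window starts this many words after the previous one
    spans: list[tuple[int, int]] = []
    start = 0
    while start < len(words):
        end = min(start + chunk_size, len(words))
        spans.append((start, end))
        if end == len(words):
            break  # the last window reached the end of the document
        start += step
    return [
        {
            "chunk_index": index,
            "total_chunks": len(spans),
            "start_word": first,
            "end_word": last - 1,  # inclusive, matching the example above
            "excerpt": " ".join(words[first:last])[:200],
        }
        for index, (first, last) in enumerate(spans)
    ]
```

With the defaults, a 1000-word document yields windows starting at words 0, 462, and 924, because each window advances by 512 - 50 = 462 words.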
Search Behavior:
- Vector search operates on chunks (not whole documents)
- Deduplication collapses multiple matching chunks from same document
- Best match returns highest-scoring chunk's excerpt
- Access verification still performed at document level
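
Deduplication can be pictured as keeping only the best-scoring chunk per document. The sketch below assumes each hit is a dictionary with `doc_type`, `doc_id`, `score`, and `excerpt` keys; that shape and the name `dedupe_by_document` are illustrative assumptions.

```python
def dedupe_by_document(hits: list[dict]) -> list[dict]:
    """Collapse chunk-level hits to one hit per document, keeping the highest score."""
    best: dict[tuple[str, str], dict] = {}
    for hit in hits:
        key = (hit["doc_type"], hit["doc_id"])
        if key not in best or hit["score"] > best[key]["score"]:
            best[key] = hit
    # Return documents ordered from most to least similar.
    return sorted(best.values(), key=lambda hit: hit["score"], reverse=True)
```

Keying on both `doc_type` and `doc_id` avoids merging, for example, note 123 with Deck card 123.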
Tuning Recommendations:
- Small chunks (256-384 words): More precise, less context, more storage
- Large chunks (768-1024 words): More context, less precise, less storage
- Overlap (10-20% of chunk size): Preserves context across boundaries
- Match to embedding model: Consider model's context window when sizing
Important: Changing chunk size requires re-embedding all documents. Use the collection naming strategy to manage different chunking configurations.
Auto-generated collection names:
- Format: {deployment-id}-{model-name}
- Deployment ID: OTEL_SERVICE_NAME (if configured) or hostname (fallback)
- Model name: OLLAMA_EMBEDDING_MODEL
- Example: "my-mcp-server-nomic-embed-text", "mcp-container-all-minilm"
Why model-based naming:
- Ensures each embedding model gets its own collection
- Prevents dimension mismatches when switching models
- Enables safe model experimentation (new model = new collection)
- Supports multi-server deployments (different deployment IDs)
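
As a sketch of the naming rule, the function below derives a collection name from an environment mapping. The function name is hypothetical, and whether the server normalizes model names (for example, tags such as `:latest`) is not covered by this document, so no normalization is attempted here.

```python
import socket
from collections.abc import Mapping

def collection_name(env: Mapping[str, str]) -> str:
    """Build '{deployment-id}-{model-name}' from environment settings."""
    # OTEL_SERVICE_NAME wins when set; otherwise fall back to the machine hostname.
    deployment_id = env.get("OTEL_SERVICE_NAME") or socket.gethostname()
    # nomic-embed-text is the default model named in this document.
    model_name = env.get("OLLAMA_EMBEDDING_MODEL") or "nomic-embed-text"
    return f"{deployment_id}-{model_name}"
```

Passing `os.environ` gives the name for the current process, while passing a plain dictionary makes the rule easy to test.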
Switching embedding models:
Collections are mutually exclusive - vectors from one embedding model cannot be used with another. When you change the embedding model:
- New collection is created with the new model's dimensions
- Full re-embedding occurs - scanner processes all documents again
- Old collection remains - can be deleted manually if no longer needed
- Dimension validation - server fails fast if collection dimension doesn't match model
Example workflow:
# Start with nomic-embed-text (768 dimensions)
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
# Collection: "my-server-nomic-embed-text"
# → Scanner indexes 1000 notes → 1000 vectors in collection
# Switch to all-minilm (384 dimensions)
OLLAMA_EMBEDDING_MODEL=all-minilm
# Collection: "my-server-all-minilm"
# → Scanner detects 0 indexed documents → re-embeds 1000 notes
# → Old collection "my-server-nomic-embed-text" still exists in Qdrant

Re-embedding performance:
- CPU-only: 1-5 notes/second
- With GPU: 50-200 notes/second
- 1000 notes: 3-16 minutes (CPU) or 5-20 seconds (GPU)
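
The time ranges above follow directly from dividing the document count by the throughput. A small helper (hypothetical name) makes the arithmetic reusable for other collection sizes:

```python
def reembedding_seconds(documents: int, docs_per_second: float) -> float:
    """Estimated wall-clock seconds to re-embed a collection at a steady rate."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return documents / docs_per_second

cpu_slowest = reembedding_seconds(1000, 1)    # 1000 s, about 16.7 minutes
cpu_fastest = reembedding_seconds(1000, 5)    # 200 s, about 3.3 minutes
gpu_slowest = reembedding_seconds(1000, 50)   # 20 s
gpu_fastest = reembedding_seconds(1000, 200)  # 5 s
```

These are estimates from the quoted throughput ranges only; chunking multiplies the number of embeddings per note, so long notes take proportionally longer.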
Multi-server deployments:
Multiple MCP servers can share one Qdrant instance safely:
# Server 1 (Production)
OTEL_SERVICE_NAME=mcp-prod
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
# → Collection: "mcp-prod-nomic-embed-text"
# Server 2 (Staging with different model)
OTEL_SERVICE_NAME=mcp-staging
OLLAMA_EMBEDDING_MODEL=all-minilm
# → Collection: "mcp-staging-all-minilm"

Each deployment gets its own collection - no naming collisions or dimension conflicts.
Semantic search converts user queries into vectors and finds similar documents using cosine similarity.
sequenceDiagram
participant User
participant MCP as MCP Server
participant Ollama
participant Qdrant
participant NC as Nextcloud API
User->>MCP: nc_semantic_search("machine learning")
MCP->>MCP: Check OAuth scope<br/>(semantic.read)
MCP->>Ollama: Generate query embedding
Ollama-->>MCP: Query vector (768-dim)
MCP->>Qdrant: Search similar vectors<br/>(filter: user_id=alice)
Qdrant-->>MCP: Top K results<br/>(with similarity scores)
loop For each result
MCP->>NC: Verify access<br/>(fetch note by ID)
alt Access granted
NC-->>MCP: Note metadata
else Access denied (404/401)
MCP->>MCP: Filter out result
end
end
MCP-->>User: Search results<br/>(with scores, excerpts)
Phase 1: OAuth Scope Check
- Verify user has semantic.read scope
- Rejects unauthorized users immediately
Phase 2: Per-Document Verification
- For each search result, fetch document via app API (Notes, Calendar, etc.)
- If fetch succeeds (200 OK), user has access
- If fetch fails (404 Not Found, 401 Unauthorized), filter out result
- Security: Prevents information leakage from vector search alone
Rationale:
- Vector database doesn't know about sharing, permissions changes, or deleted documents
- App APIs are source of truth for access control
- Verification ensures users only see documents they can access
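
Phase 2 can be sketched as a filter that keeps a result only if fetching the document succeeds. `AccessDenied` and the `fetch` callback are hypothetical stand-ins for the app API call and its 404/401 responses.

```python
import asyncio
from typing import Awaitable, Callable

class AccessDenied(Exception):
    """Hypothetical error standing in for a 404 or 401 from the app API."""

async def filter_accessible(
    hits: list[dict],
    fetch: Callable[[str], Awaitable[object]],
) -> list[dict]:
    """Keep only hits whose document the current user can still fetch."""
    visible = []
    for hit in hits:
        try:
            await fetch(hit["doc_id"])  # success means the user has access right now
        except AccessDenied:
            continue  # drop the hit instead of leaking its title or excerpt
        visible.append(hit)
    return visible
```

Only the denial case is swallowed here; other failures (such as timeouts) propagate, which is one possible design choice rather than the documented behavior.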
- Query Embedding: Convert user query to 768-dimensional vector via Ollama
- Vector Search: Find top K similar vectors in Qdrant (cosine similarity)
- User Filtering: Qdrant pre-filters by user_id (multi-tenancy)
- Access Verification: Fetch each document via app API to verify current access
- Result Ranking: Return results sorted by similarity score
- Response: Include document excerpts, metadata, and similarity scores
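
The filtering and ranking steps can be imitated in memory to show the order of operations: restrict candidates to the user's own vectors first, then rank by cosine similarity. This is a brute-force illustration with hypothetical names, not how Qdrant executes a query internally.

```python
import math

def search(query_vector: list[float], points: list[dict], user_id: str, limit: int = 10) -> list[dict]:
    """Rank one user's points by cosine similarity to the query vector."""
    def cosine(a: list[float], b: list[float]) -> float:
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

    # Pre-filter: points belonging to other users are never scored.
    candidates = [p for p in points if p["payload"]["user_id"] == user_id]
    scored = [(cosine(query_vector, p["vector"]), p) for p in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [{"id": p["id"], "score": score} for score, p in scored[:limit]]
```

Because the filter runs before scoring, a query from one user cannot surface another user's vectors even when they are the closest match.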
- Query latency: 50-200ms typical (embedding + vector search + verification)
- Accuracy: Depends on embedding model quality (nomic-embed-text recommended)
- Scalability: Qdrant handles millions of vectors efficiently
The nc_semantic_search_answer tool generates AI-powered answers with citations using MCP sampling - requesting the MCP client's LLM to generate text.
sequenceDiagram
participant User
participant MCP as MCP Server
participant Client as MCP Client<br/>(Claude Desktop)
participant LLM as Client's LLM<br/>(Claude, GPT, etc.)
User->>MCP: nc_semantic_search_answer("What are my Q1 goals?")
MCP->>MCP: Semantic search<br/>(find relevant notes)
MCP->>MCP: Construct prompt<br/>(query + documents + instructions)
MCP->>Client: Sampling request<br/>(MCP Protocol)
Client->>User: Prompt for approval<br/>(optional, client-controlled)
User-->>Client: Approve
Client->>LLM: Generate answer<br/>(with context)
LLM-->>Client: Answer with citations
Client-->>MCP: Sampling response
MCP-->>User: Generated answer<br/>(with source documents)
Why MCP Sampling?
- No server-side LLM: MCP server has no API keys, doesn't call LLMs directly
- Client controls everything: Which model, who pays, user approval prompts
- Privacy: Documents stay with the client's LLM provider, not a third-party
- Flexibility: Works with any MCP client that supports sampling (Claude Desktop, future clients)
Prompt Construction:
User Query: {query}
Relevant Documents:
1. Document: {title} (Note)
Content: {excerpt}
2. Document: {title} (Note)
Content: {excerpt}
Instructions:
- Provide a comprehensive answer to the user's query
- Use the documents above as context
- Include citations: "According to Document 1 (title)..."
- If documents don't contain enough information, say so
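
A sketch of how the template above might be filled in. `build_prompt` and the keys of each document dictionary (`title`, `doc_type`, `excerpt`) are assumptions made for illustration; the instruction lines are copied from the template.

```python
def build_prompt(query: str, documents: list[dict]) -> str:
    """Render the query, numbered source documents, and answer instructions."""
    lines = [f"User Query: {query}", "", "Relevant Documents:"]
    for number, doc in enumerate(documents, start=1):
        lines.append(f"{number}. Document: {doc['title']} ({doc['doc_type']})")
        lines.append(f"   Content: {doc['excerpt']}")
    lines += [
        "",
        "Instructions:",
        "- Provide a comprehensive answer to the user's query",
        "- Use the documents above as context",
        '- Include citations: "According to Document 1 (title)..."',
        "- If documents don't contain enough information, say so",
    ]
    return "\n".join(lines)
```

Numbering the documents in the prompt is what lets the generated answer cite "Document 1" and have that map back to a specific search result.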
Graceful Fallback:
try:
    result = await ctx.session.create_message(...)
    return answer_with_citations
except Exception as e:
    # Fallback: Return documents without generated answer
    return SearchResponse(
        generated_answer=f"[Sampling unavailable: {e}]",
        sources=search_results
    )

Client Support:
- Requires: MCP client with sampling capability
- Known support: Claude Desktop (as of Claude 3.5+)
- Graceful degradation: Returns raw documents if sampling unavailable
semantic.read - Search permission
- Allows using nc_semantic_search and nc_semantic_search_answer tools
- Does NOT grant access to documents (verified via app APIs)
- Required for any semantic search operation
semantic.write - Sync control permission
- Allows enabling/disabling background sync (provision_vector_sync, deprovision_vector_sync)
- Controls whether user's documents are indexed
- Currently not implemented in OAuth mode (BasicAuth only)
Phase 1: Scope Check (semantic.read)
- Verifies user authorized to search
- Prevents unauthorized vector database access
Phase 2: Document Verification (app-specific APIs)
- For each search result, fetch via Notes API, CalDAV, etc.
- If user can fetch → include in results
- If user cannot fetch (404/401) → filter out
- Security: Vector search cannot leak documents user shouldn't see
Example Scenario:
- Alice creates note "Secret Project X"
- Background sync indexes note with user_id=alice
- Bob searches for "project"
- Vector search finds "Secret Project X" (vector similarity)
- Qdrant filters by user_id=bob → no match (Alice's note excluded)
- Even if Bob somehow got the doc_id, Phase 2 verification would fail (404 Not Found)
Why needed:
- Background scanner runs hourly without user interaction
- Requires valid access tokens to fetch documents from Nextcloud APIs
- User's session token expires after hours/days
OAuth Mode (ADR-004 Flow 2):
- User explicitly provisions offline access via provision_nextcloud_access tool
- Server requests offline_access scope → receives refresh token
- Refresh token stored securely (database, encrypted)
- Background sync uses refresh tokens to obtain access tokens
BasicAuth Mode:
- Username/password stored in environment variables
- Always available for background operations
- Simpler but less secure (credentials never expire)
| Mode | Security | Offline Access | Background Sync | Best For |
|---|---|---|---|---|
| BasicAuth | Lower (credentials in env) | Always available | ✅ Works immediately | Single-user, development, testing |
| OAuth | Higher (tokens, scopes) | User must provision | Only for users who provisioned access | Multi-user, production |
BasicAuth:
- Set NEXTCLOUD_USERNAME and NEXTCLOUD_PASSWORD
- Background sync works immediately when ENABLE_SEMANTIC_SEARCH=true
- Credentials stored in .env file (secure server access required)
OAuth:
- Client authenticates with semantic.read scope
- User must explicitly provision offline access (future: provision_vector_sync tool)
- Background sync only works for users who provisioned access
- More secure: tokens expire, user controls access
| Mode | Configuration | Persistence | Scalability | Best For |
|---|---|---|---|---|
| In-Memory (default) | QDRANT_LOCATION=:memory: | ❌ Lost on restart | Single instance | Testing, development |
| Persistent Local | QDRANT_LOCATION=/data/qdrant | ✅ Survives restarts | Single instance | Small deployments |
| Network | QDRANT_URL=http://qdrant:6333 | ✅ Dedicated service | ✅ Horizontal scaling | Production |
In-Memory Mode:
ENABLE_SEMANTIC_SEARCH=true
# QDRANT_LOCATION not set → defaults to :memory:

- Fastest startup
- No disk I/O
- Warning: All vectors lost when server restarts (must re-index)
Persistent Local Mode:
ENABLE_SEMANTIC_SEARCH=true
QDRANT_LOCATION=/var/lib/qdrant

- Vectors survive restarts
- Single server only (no distributed setup)
- Disk I/O for durability
Network Mode (Recommended for Production):
ENABLE_SEMANTIC_SEARCH=true
QDRANT_URL=http://qdrant:6333
QDRANT_API_KEY=secret # optional

- Dedicated Qdrant service (Docker, Kubernetes)
- Horizontal scaling (multiple MCP servers → one Qdrant)
- High availability options
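
The three modes can be summarized as a small resolution function over the environment. The precedence shown (QDRANT_URL before QDRANT_LOCATION) is an assumption for illustration; this document does not state what happens when both are set.

```python
from collections.abc import Mapping

def qdrant_mode(env: Mapping[str, str]) -> tuple[str, str]:
    """Return (mode, target) for the Qdrant settings found in env."""
    url = env.get("QDRANT_URL")
    if url:
        return ("network", url)  # assumed to take precedence when both are set
    location = env.get("QDRANT_LOCATION") or ":memory:"
    if location == ":memory:":
        return ("in-memory", location)
    return ("persistent-local", location)
```

For instance, an empty environment resolves to `("in-memory", ":memory:")`, matching the documented default.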
| Service | Configuration | Cost | Performance | Best For |
|---|---|---|---|---|
| Ollama (recommended) | OLLAMA_BASE_URL=http://ollama:11434 | Free (self-hosted) | Fast (local GPU) | Production, development |
| OpenAI (future) | OPENAI_API_KEY=sk-... | Paid (API) | Fast (cloud) | Cloud deployments |
| Fallback | No config | Free | Slow (random) | Testing only (not production) |
Ollama Setup (Recommended):
# docker-compose.yml
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama-data:/root/.ollama
    ports:
      - "11434:11434"
# Named volumes must be declared at the top level
volumes:
  ollama-data:

# Pull embedding model
docker compose exec ollama ollama pull nomic-embed-text

Environment Configuration:
OLLAMA_BASE_URL=http://ollama:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text # 768-dimensional vectors

Model Options:
- nomic-embed-text (default): 768-dim, optimized for semantic search
- all-minilm: Smaller, faster, slightly less accurate
- mxbai-embed-large: Larger, more accurate, slower
Enable Semantic Search:
ENABLE_SEMANTIC_SEARCH=true # Default: false (opt-in)

Qdrant Vector Database:
# In-memory mode (default if ENABLE_SEMANTIC_SEARCH=true)
# QDRANT_LOCATION not set → uses :memory:
# Persistent local mode
QDRANT_LOCATION=/var/lib/qdrant
# Network mode (production)
QDRANT_URL=http://qdrant:6333
QDRANT_API_KEY=secret # optional

Ollama Embedding Service:
OLLAMA_BASE_URL=http://ollama:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text # Default

Scanner Configuration:
VECTOR_SYNC_INTERVAL=3600 # Scan interval in seconds (default: 1 hour)

Qdrant:
- Memory: ~100-200 MB base + ~1 KB per vector (1M vectors ≈ 1 GB)
- Disk: Persistent mode only, ~200 bytes per vector
- CPU: Low (indexing) to moderate (search)
Ollama:
- Memory: 2-4 GB for nomic-embed-text model
- CPU: High during embedding generation, idle otherwise
- GPU: Optional but recommended (10-100x faster)
MCP Server:
- Memory: +50-100 MB for background sync workers
- CPU: Moderate during scanning/processing, low otherwise
| Consideration | In-Memory Qdrant | Persistent Qdrant | Network Qdrant |
|---|---|---|---|
| Setup complexity | ✅ Minimal | ✅ Easy | Requires a separate Qdrant service |
| Durability | ❌ Lost on restart | ✅ Survives restarts | ✅ Survives restarts |
| Scalability | ❌ Single instance | ❌ Single instance | ✅ Horizontal scaling |
| Performance | ✅ Fastest | ✅ Fast | |
Immediate (Server Startup):
- MCP server connects to Qdrant (creates collection if needed)
- MCP server connects to Ollama (verifies embedding model available)
- Background scanner starts (schedules hourly runs)
- Document queue and processors initialize
First Scan (Within 1 hour):
- Scanner fetches all notes from Nextcloud
- Compares with Qdrant (likely empty on first run)
- Enqueues all notes for indexing
- Processors generate embeddings (may take minutes for large note collections)
- Vectors stored in Qdrant with user_id filtering
Hourly Thereafter:
- Scanner fetches all notes
- Identifies new/modified/deleted notes (ETag comparison)
- Enqueues changes only
- Incremental updates processed
Embedding Generation:
- Without GPU: 1-5 notes/second (CPU-bound)
- With GPU: 50-200 notes/second (highly parallel)
- Initial indexing: 100 notes ≈ 20-100 seconds (CPU), 1-2 seconds (GPU)
Search Query:
- Embedding generation: 50-100ms
- Vector search: 10-50ms (depends on collection size)
- Access verification: 20-100ms per document (Nextcloud API calls)
- Total latency: 100-300ms typical
Resource Usage:
- Idle: Minimal (background scanner sleeps)
- Scanning: Moderate CPU (ETag checks, API calls)
- Processing: High CPU/GPU (embedding generation)
- Searching: Low to moderate (depends on query frequency)
Scanner Triggers:
- Hourly (configurable via VECTOR_SYNC_INTERVAL)
- Manual trigger via nc_trigger_vector_sync (future)
Queue Processing:
- Continuous (workers always running)
- Batch processing (fetch 10 documents at a time)
- Concurrent workers (4 by default)
Error Handling:
- Individual document failures logged but don't stop scanning
- Retries for transient errors (network timeouts, rate limits)
- Failed documents skipped, re-attempted on next scan
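
Retrying transient errors is commonly done with exponential backoff. The sketch below is one way to do it, not the server's confirmed policy: `TransientError`, `with_retries`, the attempt count, and the delays are all illustrative assumptions.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

class TransientError(Exception):
    """Hypothetical marker for retryable failures such as timeouts or rate limits."""

async def with_retries(
    operation: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Run operation, retrying transient failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except TransientError:
            if attempt == attempts:
                raise  # give up; the caller can log it and skip the document
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

A document that still fails after the last attempt would be logged and skipped, leaving it for the next scan to pick up again, in line with the behavior listed above.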
What Gets Indexed:
- Notes: All notes accessible to the authenticated user
- Future: Calendar events, tasks, deck cards, files with text extraction, contacts
nc_get_vector_sync_status - Check sync status
{
  "total_documents": 1234,
  "indexed_documents": 1200,
  "pending_documents": 34,
  "sync_enabled": true,
  "last_scan": "2025-01-15T14:30:00Z",
  "status": "syncing"
}

Interpreting Status:
- idle: No pending work, last scan completed successfully
- syncing: Currently processing documents
- error: Last scan failed (check logs)
Scanner Logs:
[INFO] Vector sync scanner started (interval: 3600s)
[INFO] Scanning notes: found 150 documents
[INFO] Changes detected: 5 new, 2 modified, 1 deleted
[INFO] Enqueued 7 documents for processing
Processor Logs:
[INFO] Processing document: note_123
[DEBUG] Generated embedding (768 dimensions)
[INFO] Stored vector in Qdrant: note_123
Error Logs:
[ERROR] Failed to generate embedding for note_123: Connection timeout
[WARN] Qdrant connection lost, retrying...
[ERROR] Ollama embedding failed: Model not found
Log Locations:
- Docker: docker compose logs mcp
- Local: stdout (redirect to file if needed)
- Kubernetes: kubectl logs -f deployment/nextcloud-mcp-server
Indexing Progress:
- Total documents vs indexed documents
- Pending queue size
- Processing rate (docs/second)
Search Performance:
- Query latency (p50, p95, p99)
- Results per query
- Verification overhead (API calls per query)
Resource Usage:
- Qdrant memory/disk usage
- Ollama CPU/GPU usage
- MCP server memory
For detailed observability setup, see docs/observability.md.
Diagnosis Flow:
- Check sync status: nc_get_vector_sync_status
  - sync_enabled: false → Enable with ENABLE_SEMANTIC_SEARCH=true
  - status: error → Check scanner logs for failures
- Check queue size:
  - pending_documents > 0 → Processing in progress, wait
  - pending_documents == 0 but indexed_documents low → Scan hasn't run yet (wait up to 1 hour)
- Check Qdrant:
  - Connection errors in logs → Verify QDRANT_URL or QDRANT_LOCATION
  - Collection empty → First scan hasn't completed
- Check Ollama:
  - Embedding errors in logs → Verify OLLAMA_BASE_URL
  - Model not found → Pull model: ollama pull nomic-embed-text
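
The first two checks in this flow depend only on the nc_get_vector_sync_status payload, so they can be sketched as a function. `diagnose` is a hypothetical helper, and treating "indexed_documents low" as "fewer than total_documents" is an assumption.

```python
def diagnose(status: dict) -> str:
    """Map a sync status payload to the next troubleshooting step."""
    if not status.get("sync_enabled"):
        return "Sync disabled: set ENABLE_SEMANTIC_SEARCH=true"
    if status.get("status") == "error":
        return "Last scan failed: check scanner logs"
    if status.get("pending_documents", 0) > 0:
        return "Processing in progress: wait for the queue to drain"
    if status.get("indexed_documents", 0) < status.get("total_documents", 0):
        return "Scan has not run yet: wait up to one scan interval"
    return "Index looks complete: check Qdrant and Ollama connectivity next"
```

The checks are ordered from cheapest to fix to most involved, mirroring the order of the list above.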
Common Causes:
- Sync disabled (default): Enable ENABLE_SEMANTIC_SEARCH=true
- Ollama not running: Start Ollama service
- Qdrant not accessible: Check network/URL
- First scan in progress: Wait up to 1 hour + processing time
Diagnosis:
- Query embedding slow (>500ms):
  - Ollama overloaded or CPU-bound
  - Solution: Use GPU, upgrade CPU, or reduce concurrent requests
- Vector search slow (>200ms):
  - Large collection (millions of vectors)
  - Solution: Use network Qdrant with SSDs, add indexing
- Verification slow (>500ms):
  - Many results to verify (10+ documents)
  - Nextcloud API slow or overloaded
  - Solution: Reduce limit parameter, optimize Nextcloud
Performance Tuning:
- Reduce search limit (default: 10 results)
- Use network Qdrant for large collections
- Enable Ollama GPU acceleration
- Check Nextcloud API response times
Diagnosis:
- Check logs for errors:
  - Authentication failures (401/403) → Token expired (OAuth) or credentials invalid (BasicAuth)
  - Connection timeouts → Network issues with Nextcloud/Qdrant/Ollama
  - Rate limiting (429) → Reduce scan frequency
- Check nc_get_vector_sync_status:
  - status: error → See logs for details
  - last_scan timestamp old (>2 hours) → Scanner may have crashed
- Verify services:
  - Qdrant accessible: curl http://qdrant:6333/
  - Ollama accessible: curl http://ollama:11434/api/tags
  - Nextcloud accessible: Check API health
OAuth Mode (Future):
- Offline access token expired → Re-provision via provision_vector_sync
- User deprovisioned access → Sync stops intentionally
Diagnosis:
- Check Qdrant mode:
  - In-memory mode with large collection → Switch to persistent or network mode
- Check embedding batch size:
  - Too many documents processed simultaneously → Reduce worker count
- Check Ollama memory:
  - Large models loaded → Use smaller embedding model
Solutions:
- Use persistent or network Qdrant (frees server memory)
- Reduce concurrent processor workers
- Use smaller embedding model (all-minilm instead of nomic-embed-text)
- Increase server memory allocation
- Notes App Only
  - Architecture supports multiple apps (plugin system ready)
  - Only NotesScanner and NotesProcessor implemented
  - Future: Calendar, Deck, Files, Contacts
- MCP Sampling Support
  - nc_semantic_search_answer requires client sampling capability
  - Not all MCP clients support sampling yet
  - Graceful fallback: Returns documents without generated answer
- OAuth Background Sync
  - User-controlled background jobs not yet implemented
  - Currently works in BasicAuth mode only
  - Future: Users opt-in via provision_vector_sync tool
- No Incremental Updates
  - Document changes trigger full re-embedding
  - Cannot update just modified paragraphs
  - Future: Paragraph-level chunking and incremental updates
- No Query Caching
  - Each search generates new query embedding
  - Repeated queries re-search Qdrant
  - Future: Cache recent query embeddings and results
- Single Embedding Model
  - Uses one model for all documents and queries
  - Cannot customize per app or user
  - Future: App-specific or user-selected models
Multi-App Support (In Progress):
- Scanner plugins for Calendar, Deck, Files, Contacts
- Unified vector search across all apps
- App-specific metadata in vector payloads
User-Controlled Sync (OAuth Mode):
- provision_vector_sync and deprovision_vector_sync tools
- Per-user background job scheduling
- User dashboard for sync status and controls
Advanced Search Features:
- Hybrid search (vector + keyword combined)
- Filtering by date range, app type, tags
- Aggregations and faceted search
- Search result explanations (why this matched)
Performance Optimizations:
- Query caching for repeated searches
- Incremental document updates (paragraph-level)
- Batch query processing
- Qdrant HNSW indexing tuning
Embedding Improvements:
- Support for OpenAI embeddings (ada-002, text-embedding-3)
- Multi-language embedding models
- Fine-tuned models for Nextcloud content
- Paragraph-level chunking for long documents
- ADR-003: Vector Database Semantic Search - Qdrant selection rationale, embedding strategy, hybrid search (superseded by ADR-007 but technical decisions remain valid)
- ADR-007: Background Vector Sync Job Management - Current implementation, Scanner-Queue-Processor architecture, plugin system
- ADR-008: MCP Sampling for Semantic Search - RAG with MCP sampling, client-server separation, prompt construction
- ADR-009: Semantic Search OAuth Scope - OAuth scope model, dual-phase authorization, security rationale
- Configuration Guide - Environment variables, Qdrant setup, Ollama setup, detailed configuration options
- Installation Guide - Deployment options (Docker, Kubernetes, local)
- Running the Server - Starting the server, transport options, testing
- Observability Guide - Logging, metrics, tracing, debugging
- Troubleshooting - General issues and solutions
- Login Flow v2 - OAuth flows, scopes, token management
- Comparison with Context Agent - When to use Nextcloud MCP Server vs Context Agent
Questions or Issues?