mcp-websearch: Rust MCP Server - Implementation Plan

Context

This plan describes a Rust-based MCP server for web search, built on a hexagonal architecture with multiple search providers, semantic caching with RAG integration, and AI-first observability. The server lets AI assistants (Claude, Cursor, etc.) run web searches with automatic provider fallback, persistent semantic caching, and self-diagnosing health monitoring.


Architecture Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Use Case | AI Assistant Integration | MCP-native, shapes API for AI consumption |
| Providers | DuckDuckGo, Brave, Synthetic Search | Free tiers + premium fallback path |
| Distribution | Weighted + Fallback | Smart load distribution with resilience |
| Auth | MCP Protocol Config | Native MCP integration, simple |
| Response Format | Normalized Schema | Consistent for AI clients |
| Caching | Persistent + RAG + Migrations | SQLite default, external DB optional |
| Embeddings | Ollama default, configurable providers | Local-first with cloud fallback |
| Error Handling | Circuit breaker + retry + cache fallback | Enterprise resilience |
| Observability | MCP tools (self-diagnosing) | AI clients can query health |
| Runtime | Tokio | Industry standard, rich ecosystem |
| Architecture | Hexagonal | Clean separation, testable |
| Rust Version | MSRV 1.80 | LazyLock, latest stable |
| Testing | Integration-focused with fixtures | Tests against real behavior |

Project Structure (Hexagonal Architecture)

mcp-websearch/
├── Cargo.toml
├── Cargo.lock
├── .cargo/
│   └── config.toml              # Build configurations
├── src/
│   ├── main.rs                  # Composition root (~100 lines)
│   ├── lib.rs                   # Public API exports (~50 lines)
│   │
│   ├── domain/                  # Core business logic - NO external deps
│   │   ├── mod.rs
│   │   ├── models.rs            # SearchQuery, SearchResult, CacheEntry (~300 lines)
│   │   ├── search_service.rs    # Core search orchestration (~400 lines)
│   │   ├── provider.rs          # SearchProvider trait (~150 lines)
│   │   ├── cache.rs             # Cache trait definitions (~200 lines)
│   │   ├── embeddings.rs        # EmbeddingProvider trait (~150 lines)
│   │   └── circuit_breaker.rs   # CircuitBreaker trait + state machine (~250 lines)
│   │
│   ├── application/             # Use case orchestration
│   │   ├── mod.rs
│   │   ├── use_cases/
│   │   │   ├── mod.rs
│   │   │   ├── search.rs        # Search use case (~350 lines)
│   │   │   ├── health.rs        # Health/observability use case (~200 lines)
│   │   │   ├── cache.rs         # Cache management use case (~250 lines)
│   │   │   └── config.rs        # Configuration use case (~200 lines)
│   │   └── dto.rs               # Data transfer objects (~150 lines)
│   │
│   ├── ports/                   # Port definitions (interfaces)
│   │   ├── mod.rs
│   │   ├── search_provider.rs   # SearchProvider port (~100 lines)
│   │   ├── cache_backend.rs     # CacheBackend port (~120 lines)
│   │   ├── embedding.rs         # EmbeddingProvider port (~100 lines)
│   │   └── mcp_transport.rs     # MCP transport abstraction (~150 lines)
│   │
│   ├── adapters/                # Outer hexagon - external integrations
│   │   ├── mod.rs
│   │   │
│   │   ├── mcp/                 # MCP protocol adapter
│   │   │   ├── mod.rs
│   │   │   ├── stdio.rs         # Stdio transport (~200 lines)
│   │   │   └── tools/
│   │   │       ├── mod.rs
│   │   │       ├── search.rs    # web_search tool (~300 lines)
│   │   │       ├── search_similar.rs # search_similar tool (RAG) (~250 lines)
│   │   │       ├── search_news.rs    # search_news tool (~250 lines)
│   │   │       ├── search_cached.rs  # search_cached tool (~200 lines)
│   │   │       ├── cache_stats.rs    # get_cache_stats tool (~150 lines)
│   │   │       ├── cache_clear.rs    # cache_clear tool (~150 lines)
│   │   │       ├── cache_migrate.rs  # cache_migrate tool (~200 lines)
│   │   │       ├── provider_status.rs # get_provider_status tool (~150 lines)
│   │   │       └── server_stats.rs   # get_server_stats tool (~200 lines)
│   │   │
│   │   ├── providers/           # Search provider adapters
│   │   │   ├── mod.rs
│   │   │   ├── duckduckgo.rs    # DuckDuckGo adapter (~350 lines)
│   │   │   ├── brave.rs         # Brave Search adapter (~350 lines)
│   │   │   ├── synthetic.rs     # Synthetic Search adapter (~400 lines)
│   │   │   ├── provider_pool.rs # Weighted provider pool (~300 lines)
│   │   │   └── normalized_result.rs # Result normalization (~250 lines)
│   │   │
│   │   ├── cache/               # Cache backend adapters
│   │   │   ├── mod.rs
│   │   │   ├── sqlite_backend.rs    # SQLite implementation (~450 lines)
│   │   │   ├── postgres_backend.rs  # PostgreSQL implementation (~450 lines)
│   │   │   ├── semantic_index.rs    # HNSW semantic index (~400 lines)
│   │   │   ├── cache_entry.rs       # Cache entry model (~150 lines)
│   │   │   └── migrations.rs        # Schema migrations (~300 lines)
│   │   │
│   │   ├── embeddings/          # Embedding provider adapters
│   │   │   ├── mod.rs
│   │   │   ├── ollama.rs        # Ollama embeddings (~300 lines)
│   │   │   ├── openai.rs        # OpenAI embeddings (~250 lines)
│   │   │   └── model_registry.rs # Available models (~150 lines)
│   │   │
│   │   └── circuit_breaker/     # Circuit breaker implementations
│   │       ├── mod.rs
│   │       └── in_memory.rs     # Lock-free circuit breaker (~300 lines)
│   │
│   └── infrastructure/          # Cross-cutting concerns
│       ├── mod.rs
│       ├── config.rs            # Configuration loading (~350 lines)
│       ├── error.rs             # Error types hierarchy (~300 lines)
│       ├── telemetry.rs         # Logging/tracing setup (~200 lines)
│       └── http_client.rs       # Shared HTTP client factory (~150 lines)
│
├── migrations/                  # Database migrations
│   ├── V1__initial_schema.sql
│   ├── V2__add_semantic_index.sql
│   └── V3__add_embedding_metadata.sql
│
├── tests/                       # Integration tests
│   ├── fixtures/
│   │   ├── duckduckgo/
│   │   ├── brave/
│   │   └── synthetic/
│   ├── integration/
│   │   ├── providers_test.rs
│   │   ├── cache_test.rs
│   │   └── embeddings_test.rs
│   └── e2e/
│       └── mcp_protocol_test.rs
│
├── config/
│   └── default.toml             # Default configuration
│
└── README.md

Domain Models

Core Types (domain/models.rs)

use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};

/// Normalized search result - consistent across all providers
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SearchResult {
    pub title: String,
    pub url: String,
    pub snippet: String,
    pub source_provider: ProviderName,
    pub published_date: Option<DateTime<Utc>>,
    pub relevance_score: Option<f32>,
}

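/// Provider identification (serialized in lowercase, e.g. "duckduckgo", to match the tool schema)
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum ProviderName {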
    DuckDuckGo,
    Brave,
    Synthetic,
    Cache, // For cached results
}

/// Search query with options
#[derive(Debug, Clone, Deserialize)]
pub struct SearchQuery {
    pub query: String,
    pub max_results: Option<u8>,
    pub providers: Option<Vec<ProviderName>>,
    pub use_cache: Option<bool>,
    pub freshness: Option<Freshness>,
}

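/// Result freshness filter (serialized in lowercase: "day", "week", "month", "year").
/// The derives are required because `SearchQuery` derives Debug, Clone, and Deserialize.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum Freshness {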
    Day,
    Week,
    Month,
    Year,
}

/// Cache entry with semantic indexing
#[derive(Debug, Clone)]
pub struct CacheEntry {
    pub id: i64,
    pub query_hash: [u8; 32],       // Blake3 hash
    pub query_text: String,
    pub embedding: Vec<f32>,         // Quantized to f16 for storage
    pub results: Vec<SearchResult>,
    pub provider_used: ProviderName,
    pub created_at: DateTime<Utc>,
    pub last_accessed: DateTime<Utc>,
    pub access_count: u32,
    pub embedding_model: String,     // Track for dimension changes
    pub embedding_dimension: u16,
}
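
The `freshness` option has to become a concrete cutoff at some point. A minimal sketch of one way to do that, assuming a hypothetical `max_age` helper that is not part of the plan, and assuming a month is treated as 30 days and a year as 365 days:

```rust
use std::time::Duration;

/// Mirrors the `Freshness` enum from `domain/models.rs`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Freshness {
    Day,
    Week,
    Month,
    Year,
}

impl Freshness {
    /// Hypothetical helper: the oldest result age this filter accepts.
    /// Month = 30 days and Year = 365 days are assumptions of this sketch.
    pub fn max_age(self) -> Duration {
        const SECONDS_PER_DAY: u64 = 24 * 60 * 60;
        let days = match self {
            Freshness::Day => 1,
            Freshness::Week => 7,
            Freshness::Month => 30,
            Freshness::Year => 365,
        };
        Duration::from_secs(days * SECONDS_PER_DAY)
    }

    /// True when a result (or cache entry) of the given age passes the filter.
    pub fn accepts(self, age: Duration) -> bool {
        age <= self.max_age()
    }
}
```

The same check could be applied to cached entries via `created_at`, so that a cache hit never returns results older than the caller asked for.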

Port Definitions

SearchProvider Port (ports/search_provider.rs)

#[async_trait]
pub trait SearchProvider: Send + Sync {
    /// Provider name for logging/routing
    fn name(&self) -> ProviderName;

    /// Weight for weighted selection (higher = more traffic)
    fn weight(&self) -> u32;

    /// Execute search
    async fn search(&self, query: &SearchQuery) -> Result<Vec<SearchResult>, ProviderError>;

    /// Health check for circuit breaker
    async fn health_check(&self) -> ProviderHealth;

    /// Provider capabilities for routing decisions
    fn capabilities(&self) -> ProviderCapabilities;
}

pub struct ProviderCapabilities {
    pub max_concurrent_requests: u32,
    pub typical_latency_ms: u32,
    pub supports_pagination: bool,
    pub supports_freshness: bool,
    pub rate_limit_per_minute: u32,
}

pub struct ProviderHealth {
    pub is_healthy: bool,
    pub latency_ms: Option<u32>,
    pub error_rate: f32,
    pub last_success: Option<Instant>,
}
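
The `weight()` method implies some form of weighted selection in the provider pool. The sketch below shows one possible approach under stated assumptions: `pick_weighted` and `attempt_order` are hypothetical names, the caller supplies a uniformly distributed `roll` (for example from an RNG), and the fallback order is simply the remaining providers by descending weight. The plan's `fallback_order` config could replace that last rule.

```rust
/// Pick an index with probability proportional to its weight.
/// `roll` is any uniformly distributed number; it is reduced modulo the total weight.
/// Returns None when there are no providers or all weights are zero.
pub fn pick_weighted(weights: &[u32], roll: u64) -> Option<usize> {
    let total: u64 = weights.iter().map(|&w| u64::from(w)).sum();
    if total == 0 {
        return None;
    }
    let mut target = roll % total;
    for (index, &weight) in weights.iter().enumerate() {
        let weight = u64::from(weight);
        if target < weight {
            return Some(index);
        }
        target -= weight;
    }
    None // not reachable because target < total; kept so the function is total
}

/// Order in which providers are tried: the weighted pick first, then the
/// remaining non-zero-weight providers by descending weight (an assumption).
pub fn attempt_order(weights: &[u32], roll: u64) -> Vec<usize> {
    let Some(first) = pick_weighted(weights, roll) else {
        return Vec::new();
    };
    let mut rest: Vec<usize> = (0..weights.len())
        .filter(|&i| i != first && weights[i] > 0)
        .collect();
    // Stable sort: providers with equal weight keep their original order.
    rest.sort_by(|&a, &b| weights[b].cmp(&weights[a]));
    let mut order = vec![first];
    order.extend(rest);
    order
}
```

Passing the roll in as a parameter keeps the selection deterministic in tests; production code would draw it from a random source.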

CacheBackend Port (ports/cache_backend.rs)

#[async_trait]
pub trait CacheBackend: Send + Sync {
    /// Exact key lookup
    async fn get_exact(&self, query_hash: &[u8; 32]) -> Result<Option<CacheEntry>, CacheError>;

    /// Semantic similarity search
    async fn get_semantic(
        &self,
        embedding: &[f32],
        threshold: f32,
        limit: usize,
    ) -> Result<Vec<SemanticMatch>, CacheError>;

    /// Store entry
    async fn store(&self, entry: CacheEntry) -> Result<(), CacheError>;

    /// Invalidate by query or age
    async fn invalidate(&self, query_hash: Option<&[u8; 32]>, older_than: Option<Duration>) -> Result<u64, CacheError>;

    /// Get statistics
    async fn stats(&self) -> Result<CacheStats, CacheError>;

    /// Run migrations
    async fn migrate(&self, target_version: u32) -> Result<(), CacheError>;
}

pub struct SemanticMatch {
    pub entry: CacheEntry,
    pub similarity: f32,
}
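
The plan does not say which metric `get_semantic` compares against `threshold`. A minimal sketch, assuming cosine similarity on full-precision embeddings; `cosine_similarity` and `is_semantic_hit` are hypothetical helper names, and the 0.85 threshold in the usage note is purely illustrative:

```rust
/// Cosine similarity between two embeddings.
/// Returns None for mismatched lengths, empty input, or a zero-length vector,
/// since the similarity is undefined in those cases.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let (mut dot, mut norm_a, mut norm_b) = (0.0f32, 0.0f32, 0.0f32);
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a.sqrt() * norm_b.sqrt()))
}

/// True when the cached embedding is similar enough to count as a cache hit.
pub fn is_semantic_hit(query: &[f32], cached: &[f32], threshold: f32) -> bool {
    cosine_similarity(query, cached).map_or(false, |s| s >= threshold)
}
```

For example, `is_semantic_hit(&query_vec, &cached_vec, 0.85)` would accept only near-duplicate queries. Embeddings produced by different models or dimensions fail the length check and are never matched, which is why `CacheEntry` tracks `embedding_model` and `embedding_dimension`.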

EmbeddingProvider Port (ports/embedding.rs)

#[async_trait]
pub trait EmbeddingProvider: Send + Sync {
    /// Generate embedding for text
    async fn embed(&self, text: &str) -> Result<Vec<f32>, EmbeddingError>;

    /// Model identifier
    fn model_name(&self) -> &str;

    /// Embedding dimension
    fn dimension(&self) -> u16;

    /// Health check
    async fn health_check(&self) -> bool;
}

Circuit Breaker Implementation

Lock-Free Circuit Breaker (adapters/circuit_breaker/in_memory.rs)

use std::future::Future;
use std::sync::atomic::{AtomicU8, AtomicU32, AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Duration;

#[repr(u8)]
pub enum CircuitState {
    Closed = 0,   // Normal operation
    Open = 1,     // Failing fast
    HalfOpen = 2, // Testing recovery
}

pub struct CircuitBreaker {
    state: AtomicU8,
    failure_count: AtomicU32,
    success_count: AtomicU32,
    last_failure: AtomicU64,
    config: CircuitConfig,
}

pub struct CircuitConfig {
    pub failure_threshold: u32,      // Open after N failures
    pub success_threshold: u32,      // Close after N successes in half-open
    pub timeout_ms: u64,             // Time before half-open attempt
    pub max_retries: u32,            // Attempts per call before a failure is recorded
    pub base_delay_ms: u64,          // Base retry delay
    pub max_delay_ms: u64,           // Max retry delay
    pub backoff_multiplier: f32,     // Exponential backoff factor
}

impl CircuitBreaker {
    /// Two-tier: returns the primary result, or falls back to the next
    /// provider when the circuit is open.
    /// `allow_request`, `record_success`, and `record_failure` are the
    /// state-machine helpers (not shown here).
    pub async fn call_with_fallback<F, Fut>(
        &self,
        query: &SearchQuery,
        f: F,
        fallback: Option<Arc<dyn SearchProvider>>,
    ) -> Result<Vec<SearchResult>, CircuitError>
    where
        F: Fn() -> Fut, // called once per attempt, so FnOnce is not sufficient
        Fut: Future<Output = Result<Vec<SearchResult>, ProviderError>>,
    {
        if !self.allow_request() {
            if let Some(fb) = fallback {
                return fb.search(query).await.map_err(CircuitError::from);
            }
            return Err(CircuitError::Open);
        }

        // Execute with retry and jitter. At least one attempt is always made,
        // which also avoids underflow when max_retries is configured as 0.
        let max_retries = self.config.max_retries.max(1);
        let mut delay = self.config.base_delay_ms;
        for attempt in 0..max_retries {
            match f().await {
                Ok(v) => {
                    self.record_success();
                    return Ok(v);
                }
                // Not the last attempt: back off (with up to 30% jitter) and retry
                Err(_) if attempt + 1 < max_retries => {
                    let jitter = rand::random::<f32>() * 0.3;
                    tokio::time::sleep(Duration::from_millis(
                        (delay as f32 * (1.0 + jitter)) as u64
                    )).await;
                    delay = (delay as f32 * self.config.backoff_multiplier) as u64;
                    delay = delay.min(self.config.max_delay_ms);
                }
                // Last attempt failed: count it against the circuit
                Err(e) => {
                    self.record_failure();
                    return Err(e.into());
                }
            }
        }
        unreachable!("the final attempt always returns")
    }
}
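
The listing above calls `allow_request`, `record_success`, and `record_failure` without showing them. The following is a minimal sketch of how those helpers could drive the Closed / Open / HalfOpen transitions with atomics only. It is an illustration, not the plan's implementation: `BreakerState` is a hypothetical standalone type, the current time is passed in as `now_ms` so the logic can be tested deterministically, and every caller is allowed through while half-open (a stricter design might admit a single probe).

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, AtomicU8, Ordering};

const CLOSED: u8 = 0;
const OPEN: u8 = 1;
const HALF_OPEN: u8 = 2;

/// Hypothetical standalone state machine; field names follow the plan.
pub struct BreakerState {
    state: AtomicU8,
    failure_count: AtomicU32,
    success_count: AtomicU32,
    last_failure_ms: AtomicU64,
    failure_threshold: u32,
    success_threshold: u32,
    timeout_ms: u64,
}

impl BreakerState {
    pub fn new(failure_threshold: u32, success_threshold: u32, timeout_ms: u64) -> Self {
        Self {
            state: AtomicU8::new(CLOSED),
            failure_count: AtomicU32::new(0),
            success_count: AtomicU32::new(0),
            last_failure_ms: AtomicU64::new(0),
            failure_threshold,
            success_threshold,
            timeout_ms,
        }
    }

    /// Closed and half-open admit requests; open admits them only after the timeout.
    pub fn allow_request(&self, now_ms: u64) -> bool {
        match self.state.load(Ordering::Acquire) {
            CLOSED | HALF_OPEN => true,
            _ => {
                let last = self.last_failure_ms.load(Ordering::Acquire);
                if now_ms.saturating_sub(last) < self.timeout_ms {
                    return false;
                }
                // Exactly one caller wins the Open -> HalfOpen transition and resets the counter.
                if self
                    .state
                    .compare_exchange(OPEN, HALF_OPEN, Ordering::AcqRel, Ordering::Acquire)
                    .is_ok()
                {
                    self.success_count.store(0, Ordering::Release);
                }
                true
            }
        }
    }

    /// In half-open, enough successes close the circuit; otherwise reset the failure streak.
    pub fn record_success(&self) {
        if self.state.load(Ordering::Acquire) == HALF_OPEN {
            let successes = self.success_count.fetch_add(1, Ordering::AcqRel) + 1;
            if successes >= self.success_threshold {
                self.failure_count.store(0, Ordering::Release);
                self.state.store(CLOSED, Ordering::Release);
            }
        } else {
            self.failure_count.store(0, Ordering::Release);
        }
    }

    /// Any failure while half-open, or reaching the threshold while closed, opens the circuit.
    pub fn record_failure(&self, now_ms: u64) {
        self.last_failure_ms.store(now_ms, Ordering::Release);
        let state = self.state.load(Ordering::Acquire);
        let failures = self.failure_count.fetch_add(1, Ordering::AcqRel) + 1;
        if state == HALF_OPEN || failures >= self.failure_threshold {
            self.state.store(OPEN, Ordering::Release);
        }
    }

    pub fn is_open(&self) -> bool {
        self.state.load(Ordering::Acquire) == OPEN
    }
}
```

The individual operations are lock-free, but the state and counters are updated in separate atomic steps, so concurrent callers can briefly observe an intermediate combination. That is usually acceptable for a circuit breaker, where an extra admitted or rejected request is harmless.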

MCP Tools

Tool Definitions (9 tools total)

| Tool | Purpose | File |
|---|---|---|
| web_search | Primary search with provider fallback | adapters/mcp/tools/search.rs |
| search_similar | RAG-style semantic cache lookup | adapters/mcp/tools/search_similar.rs |
| search_news | News-focused search with freshness | adapters/mcp/tools/search_news.rs |
| search_cached | Cache-only search (no external calls) | adapters/mcp/tools/search_cached.rs |
| get_cache_stats | Cache hit rate, size, health | adapters/mcp/tools/cache_stats.rs |
| cache_clear | Clear cache entries | adapters/mcp/tools/cache_clear.rs |
| cache_migrate | Run schema migrations | adapters/mcp/tools/cache_migrate.rs |
| get_provider_status | Provider health + circuit breaker state | adapters/mcp/tools/provider_status.rs |
| get_server_stats | Server metrics (latency, throughput) | adapters/mcp/tools/server_stats.rs |

web_search Tool Schema

{
  "name": "web_search",
  "description": "Search the web using multiple providers with automatic fallback and semantic caching",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Search query"
      },
      "max_results": {
        "type": "integer",
        "default": 10,
        "minimum": 1,
        "maximum": 50
      },
      "providers": {
        "type": "array",
        "items": { "enum": ["duckduckgo", "brave", "synthetic"] },
        "description": "Specific providers to use (default: all available)"
      },
      "use_cache": {
        "type": "boolean",
        "default": true,
        "description": "Use semantic cache for faster results"
      },
      "freshness": {
        "type": "string",
        "enum": ["day", "week", "month", "year"],
        "description": "Result freshness filter"
      }
    },
    "required": ["query"]
  }
}
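
The schema declares defaults and bounds, but `SearchQuery.max_results` is an `Option<u8>`, which on its own would accept values up to 255. A small sketch of how the tool handler might apply the schema's default and range server-side; `resolve_max_results` and `resolve_use_cache` are hypothetical helper names, and the error type is a plain `String` for brevity:

```rust
/// Values taken from the web_search input schema.
pub const DEFAULT_MAX_RESULTS: u8 = 10;
pub const MIN_MAX_RESULTS: i64 = 1;
pub const MAX_MAX_RESULTS: i64 = 50;

/// Apply the schema default and reject values outside the declared range.
/// The raw value is an i64 because JSON integers are not limited to u8.
pub fn resolve_max_results(raw: Option<i64>) -> Result<u8, String> {
    match raw {
        None => Ok(DEFAULT_MAX_RESULTS),
        Some(n) if (MIN_MAX_RESULTS..=MAX_MAX_RESULTS).contains(&n) => Ok(n as u8),
        Some(n) => Err(format!(
            "max_results must be between {MIN_MAX_RESULTS} and {MAX_MAX_RESULTS}, got {n}"
        )),
    }
}

/// `use_cache` defaults to true when the client omits it.
pub fn resolve_use_cache(raw: Option<bool>) -> bool {
    raw.unwrap_or(true)
}
```

Validating on the server matters because MCP clients are not guaranteed to enforce `minimum`, `maximum`, or `default` from the input schema before calling a tool.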

Semantic Cache Architecture

Multi-Tier Cache (adapters/cache/semantic_index.rs)

┌──────────────────────────────────────────┐
│ L1: Hot (in-memory HNSW, full precision) │ 1000 entries × 6KB = 6MB
│ TTL: 5 minutes                           │
├──────────────────────────────────────────┤
│ L2: Warm (in-memory HNSW, quantized)     │ 10000 entries × 192B = 2MB
│ TTL: 30 minutes                          │
├──────────────────────────────────────────┤
│ L3: Cold (SQLite, compressed)            │ Unlimited
│ TTL: 7 days                              │
└──────────────────────────────────────────┘
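
The per-entry sizes in the diagram are consistent with a 1536-dimension embedding: 1536 × 4 bytes is 6144 B (about 6KB) at full precision, and 1536 ÷ 8 is 192 B after binary quantization. The dimension is inferred from those numbers, not stated in the plan. A short sketch of the arithmetic, with hypothetical helper names:

```rust
/// Bytes needed for one full-precision (f32) embedding.
pub fn full_precision_bytes(dimension: usize) -> usize {
    dimension * std::mem::size_of::<f32>()
}

/// Bytes needed for one binary-quantized embedding (1 bit per dimension, rounded up).
pub fn binary_quantized_bytes(dimension: usize) -> usize {
    dimension.div_ceil(8)
}

/// Embedding storage for a whole tier, ignoring HNSW graph and result payload overhead.
pub fn tier_bytes(entries: usize, bytes_per_entry: usize) -> usize {
    entries * bytes_per_entry
}
```

With the default Ollama model mentioned in the schema comments (768 dimensions), the same formulas give 3072 B and 96 B per entry, so the tier budgets above would roughly halve. These figures cover embeddings only; the HNSW graph links and cached result payloads add to the real footprint.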

Embedding Quantization

/// Quantize f32 embedding to binary (96% size reduction)
pub fn quantize_to_binary(embedding: &[f32]) -> Vec<u8> {
    embedding.chunks(8)
        .map(|chunk| {
            chunk.iter().enumerate()
                .filter(|(_, v)| **v > 0.0)
                .map(|(i, _)| 1 << i)
                .fold(0u8, |acc, bit| acc | bit)
        })
        .collect()
}

/// Hamming distance for binary embeddings (fast XOR + popcount)
pub fn hamming_distance(a: &[u8], b: &[u8]) -> u32 {
    a.iter().zip(b.iter())
        .map(|(x, y)| (x ^ y).count_ones())
        .sum()
}
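
To compare binary embeddings against a similarity threshold, the Hamming distance has to be turned into a score. A minimal sketch, assuming the score is simply the fraction of matching bits; `hamming_similarity` is a hypothetical helper, and the two functions above are repeated (with an explicit `1u8` shift) so the example is self-contained:

```rust
/// Sign-bit quantization: bit i of each byte is set when the i-th value in the chunk is > 0.
pub fn quantize_to_binary(embedding: &[f32]) -> Vec<u8> {
    embedding
        .chunks(8)
        .map(|chunk| {
            chunk
                .iter()
                .enumerate()
                .filter(|(_, v)| **v > 0.0)
                .fold(0u8, |acc, (i, _)| acc | (1u8 << i))
        })
        .collect()
}

/// Number of differing bits (XOR + popcount).
pub fn hamming_distance(a: &[u8], b: &[u8]) -> u32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Fraction of matching bits in [0.0, 1.0].
/// Returns None for empty or mismatched inputs, because `zip` would otherwise
/// silently compare only the shorter prefix.
pub fn hamming_similarity(a: &[u8], b: &[u8]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let total_bits = (a.len() * 8) as f32;
    Some(1.0 - hamming_distance(a, b) as f32 / total_bits)
}
```

This score is only a rough proxy for cosine similarity, so a common pattern is to use it to shortlist candidates in the warm tier and re-rank the shortlist with full-precision embeddings. Note that when the dimension is not a multiple of 8, the unused high bits of the last byte are always zero in both vectors and count as matches.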

Database Schema

V1__initial_schema.sql

PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL;
PRAGMA cache_size = -64000;  -- 64MB cache
PRAGMA mmap_size = 268435456;  -- 256MB mmap

CREATE TABLE IF NOT EXISTS cache_entries (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    query_hash BLOB NOT NULL UNIQUE,      -- 32-byte Blake3 hash
    query_text TEXT NOT NULL,             -- For debugging/FTS
    embedding BLOB,                       -- Quantized f16/u8
    results_json BLOB NOT NULL,           -- zstd compressed
    provider_used TEXT NOT NULL,          -- "duckduckgo", "brave", "synthetic"
    created_at INTEGER NOT NULL,          -- Unix timestamp ms
    last_accessed INTEGER NOT NULL,
    access_count INTEGER DEFAULT 0,
    embedding_model TEXT NOT NULL,        -- "nomic-embed-text"
    embedding_dimension INTEGER NOT NULL  -- 768, 1536, etc.
);

-- Fast hash lookup: the UNIQUE constraint on query_hash already creates an
-- index automatically, so no separate CREATE INDEX is needed for it.

-- Time-based cleanup
CREATE INDEX idx_created_at ON cache_entries(created_at);

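-- Full-text search for pre-filtering.
-- This is an external-content FTS5 table: it does not update itself, so inserts,
-- updates, and deletes on cache_entries must be mirrored via triggers or explicit writes.
CREATE VIRTUAL TABLE cache_fts USING fts5(query_text, content='cache_entries', content_rowid='id');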

V2__add_semantic_index.sql

-- HNSW index metadata for tracking loaded entries
CREATE TABLE semantic_index_state (
    id INTEGER PRIMARY KEY,
    entry_id INTEGER NOT NULL,
    hnsw_node_id INTEGER NOT NULL,
    tier TEXT NOT NULL,  -- 'hot' or 'warm'
    FOREIGN KEY (entry_id) REFERENCES cache_entries(id) ON DELETE CASCADE
);

CREATE INDEX idx_semantic_tier ON semantic_index_state(tier);

Configuration Schema

MCP Client Configuration (config.json)

{
  "mcpServers": {
    "websearch": {
      "command": "/path/to/mcp-websearch",
      "args": [],
      "env": {
        "RUST_LOG": "info"
      }
    }
  }
}

Server Configuration (internal)

#[derive(Debug, Deserialize)]
pub struct Config {
    pub providers: ProvidersConfig,
    pub cache: CacheConfig,
    pub embeddings: EmbeddingsConfig,
    pub circuit_breaker: CircuitBreakerConfig,
}

#[derive(Debug, Deserialize)]
pub struct ProvidersConfig {
    pub duckduckgo: Option<DuckDuckGoConfig>,
    pub brave: Option<BraveConfig>,
    pub synthetic: Option<SyntheticConfig>,
    pub weights: HashMap<String, u32>,
    pub fallback_order: Vec<String>,
}

#[derive(Debug, Deserialize)]
pub struct BraveConfig {
    pub api_key: SecretString,
    pub rate_limit_per_minute: u32,
}

#[derive(Debug, Deserialize)]
pub struct SyntheticConfig {
    pub api_key: SecretString,
    pub endpoint: String,
}

#[derive(Debug, Deserialize)]
pub struct CacheConfig {
    pub backend: CacheBackend,
    pub path: PathBuf,
    pub max_size_mb: u64,
    pub default_ttl_hours: u64,
    pub semantic_threshold: f32,
}

#[derive(Debug, Deserialize)]
pub enum CacheBackend {
    Sqlite,
    Postgres { connection_string: SecretString },
}

#[derive(Debug, Deserialize)]
pub struct EmbeddingsConfig {
    pub provider: EmbeddingProvider,
    pub model: String,
    pub dimension: u16,
    pub budget_limit_usd: Option<f32>,
}

#[derive(Debug, Deserialize)]
pub enum EmbeddingProvider {
    Ollama { endpoint: String },
    OpenAI { api_key: SecretString },
}

Error Handling

Error Hierarchy (infrastructure/error.rs)

#[derive(Error, Debug)]
pub enum SearchError {
    // ProviderName derives Debug but not Display, so it is formatted with {:?}
    #[error("Provider error: {provider:?} - {message}")]
    Provider {
        provider: ProviderName,
        message: String,
        #[source]
        source: ProviderError,
    },

    #[error("All providers failed")]
    AllProvidersFailed {
        errors: Vec<(ProviderName, ProviderError)>,
    },

    #[error("Circuit breaker open for: {0:?}")]
    CircuitBreakerOpen(ProviderName),

    #[error("Cache error: {0}")]
    Cache(#[from] CacheError),

    #[error("Embedding generation failed: {0}")]
    Embedding(#[from] EmbeddingError),

    #[error("Query validation failed: {reasons:?}")]
    Validation { reasons: Vec<String> },
}

// Sanitized for MCP responses
impl From<SearchError> for McpError {
    fn from(err: SearchError) -> Self {
        match err {
            SearchError::AllProvidersFailed { .. } => {
                McpError::internal("All search providers are currently unavailable. Try again later.")
            }
            SearchError::CircuitBreakerOpen(provider) => {
                McpError::unavailable(&format!("Provider {:?} is temporarily unavailable", provider))
            }
            _ => McpError::internal("Search failed. Please try again."),
        }
    }
}

Security Measures

Critical Security Controls

| Area | Control | Implementation |
|---|---|---|
| API Keys | Memory protection | secrecy crate, zero-on-drop |
| API Keys | File permissions | Enforce 0o600 on config files |
| SSRF | URL allowlisting | Block private IP ranges, disable redirects |
| Prompt Injection | Input validation | Unicode NFC normalization, pattern detection |
| Denial-of-Wallet | Budget controls | Circuit breakers, mandatory caps for paid APIs |
| Cache Poisoning | Integrity | Blocklist validation, content hashing |
| Supply Chain | Audit | cargo audit in CI, pin critical versions |
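
The SSRF row ("block private IP ranges") can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the plan's implementation: `is_blocked_ip` is a hypothetical helper, the list of ranges is a reasonable starting point and not an exhaustive one, and the check is assumed to run on the resolved address of every outbound request.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// IPv4 ranges that should never be reachable from a provider or fetch request.
fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    let octets = ip.octets();
    ip.is_private()                 // 10/8, 172.16/12, 192.168/16
        || ip.is_loopback()         // 127/8
        || ip.is_link_local()       // 169.254/16 (includes cloud metadata endpoints)
        || ip.is_unspecified()      // 0.0.0.0
        || ip.is_broadcast()        // 255.255.255.255
        || octets[0] == 0           // 0.0.0.0/8
        || (octets[0] == 100 && (octets[1] & 0xC0) == 64) // 100.64/10 carrier-grade NAT
}

/// IPv6 equivalents, including IPv4-mapped addresses such as ::ffff:10.0.0.1.
fn is_blocked_v6(ip: Ipv6Addr) -> bool {
    if let Some(v4) = ip.to_ipv4_mapped() {
        return is_blocked_v4(v4);
    }
    let first = ip.segments()[0];
    ip.is_loopback()                    // ::1
        || ip.is_unspecified()          // ::
        || (first & 0xfe00) == 0xfc00   // fc00::/7 unique local
        || (first & 0xffc0) == 0xfe80   // fe80::/10 link-local
}

/// True when a resolved address must not be contacted.
pub fn is_blocked_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        IpAddr::V6(v6) => is_blocked_v6(v6),
    }
}
```

Checking the hostname string is not enough, because a public name can resolve to a private address. The check needs to run on the address actually connected to, which is also why the table pairs it with disabled redirects.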

Input Validation

use unicode_normalization::UnicodeNormalization;

pub fn validate_query(query: &str) -> Result<(), ValidationError> {
    // 1. Length check (in bytes, before normalization)
    if query.len() > 1000 {
        return Err(ValidationError::TooLong);
    }

    // 2. Unicode NFC normalization, so canonically equivalent sequences compare equal.
    //    NFC alone does not fold look-alike characters from other scripts.
    let normalized = query.nfc().collect::<String>();

    // 3. Control character filtering
    if normalized.chars().any(|c| c.is_control()) {
        return Err(ValidationError::InvalidCharacters);
    }

    // 4. Prompt injection patterns (basic).
    //    Patterns must be lowercase because they are matched against the
    //    lowercased query; an uppercase "[INST]" would never match.
    let injection_patterns = [
        "ignore previous",
        "ignore all previous",
        "disregard",
        "system:",
        "[inst]",
    ];
    let lower = normalized.to_lowercase();
    for pattern in injection_patterns {
        if lower.contains(pattern) {
            return Err(ValidationError::SuspiciousPattern);
        }
    }

    Ok(())
}

Performance Targets

Response Time Budget (Target: <500ms p99)

| Operation | Budget | Implementation Target |
|---|---|---|
| Cache lookup (HNSW hit) | 1ms | 0.5ms |
| Cache lookup (DB miss) | 50ms | 10-30ms |
| Embedding generation | 100ms | 50-200ms |
| Provider race timeout | 250ms | 250ms |
| Result normalization | 5ms | 1ms |
| Total | 406ms | 311-481ms |

Resource Budgets

| Resource | Limit |
|---|---|
| RAM (resident) | 256MB |
| RAM (with SQLite mmap) | 512MB |
| Disk I/O | <10MB/s sustained |
| File descriptors | <200 |
| CPU cores | 2+ |

Testing Strategy

Integration Tests with Fixtures

tests/
├── fixtures/
│   ├── duckduckgo/
│   │   ├── rust_programming.json
│   │   └── error_rate_limit.json
│   ├── brave/
│   │   └── rust_programming.json
│   └── synthetic/
│       └── rust_programming.json
├── integration/
│   ├── providers_test.rs      # Real API calls with recorded responses
│   ├── cache_test.rs          # SQLite with testcontainers
│   └── embeddings_test.rs     # Ollama integration
└── e2e/
    └── mcp_protocol_test.rs   # Full MCP tool flow

Test Command

# Run all tests
cargo test

# Run with recording (saves fixtures)
RECORD_FIXTURES=1 cargo test --features recording

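# Run integration tests only
# (Cargo only treats tests/integration/ as a test target named "integration" if the
# directory contains main.rs, or if Cargo.toml declares a matching [[test]] entry)
cargo test --test integration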

Dependencies

Core Dependencies (Cargo.toml)

[dependencies]
# Async runtime
tokio = { version = "1", features = ["full"] }

# MCP protocol
rmcp = "0.1"

# HTTP client (with rustls, NOT OpenSSL)
reqwest = { version = "0.12", features = ["rustls-tls", "json"], default-features = false }

# Serialization
serde = { version = "1", features = ["derive"] }
serde_json = "1"

# Database
sqlx = { version = "0.7", features = ["runtime-tokio", "sqlite", "postgres"] }

# Embeddings
async-openai = "0.20"  # Ollama-compatible

# Cryptography
blake3 = "1"

# Circuit breaker
tokio-circuit-breaker = "0.1"

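# Secrets handling ("serde" feature is needed to derive Deserialize on configs holding SecretString)
secrecy = { version = "0.8", features = ["serde"] }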

# Error handling
thiserror = "1"
anyhow = "1"

# Logging
tracing = "0.1"
tracing-subscriber = "0.3"

# Date/time
chrono = { version = "0.4", features = ["serde"] }

# HNSW for semantic search
instant-distance = "0.6"

# Used by the code listings in this plan (#[async_trait], retry jitter, NFC normalization)
async-trait = "0.1"
rand = "0.8"
unicode-normalization = "0.1"

[dev-dependencies]
tokio-test = "0.4"
testcontainers = "0.15"
wiremock = "0.5"

[build-dependencies]
# Minimal build.rs, prefer const evaluation

Implementation Phases

Phase 1: Foundation (~1-2 days)

  • Project setup (Cargo.toml, structure)
  • Domain models (domain/models.rs)
  • Port traits (ports/)
  • Error types (infrastructure/error.rs)
  • Configuration infrastructure

Phase 2: Core Adapters (~2-3 days)

  • DuckDuckGo provider (adapters/providers/duckduckgo.rs)
  • Brave provider (adapters/providers/brave.rs)
  • Synthetic provider (adapters/providers/synthetic.rs)
  • Result normalization (adapters/providers/normalized_result.rs)
  • Provider pool with weighting (adapters/providers/provider_pool.rs)

Phase 3: Circuit Breaker + Cache (~2-3 days)

  • Lock-free circuit breaker (adapters/circuit_breaker/in_memory.rs)
  • SQLite cache backend (adapters/cache/sqlite_backend.rs)
  • HNSW semantic index (adapters/cache/semantic_index.rs)
  • Cache migrations (adapters/cache/migrations.rs)

Phase 4: Embeddings (~1-2 days)

  • Ollama embedding provider (adapters/embeddings/ollama.rs)
  • OpenAI embedding provider (adapters/embeddings/openai.rs)
  • Embedding quantization utilities

Phase 5: MCP Integration (~2 days)

  • Stdio transport (adapters/mcp/stdio.rs)
  • web_search tool
  • search_similar tool (RAG)
  • search_news tool
  • search_cached tool
  • Admin tools (cache_clear, cache_migrate, etc.)
  • Observability tools (get_provider_status, get_server_stats)

Phase 6: Testing + Polish (~1-2 days)

  • Integration test fixtures
  • E2E MCP protocol tests
  • Documentation
  • README.md with usage examples

Verification Steps

1. Build and Test

cargo build --release
cargo test
cargo clippy -- -D warnings

2. Run MCP Server

RUST_LOG=debug ./target/release/mcp-websearch

3. Test with MCP Client

Add to Claude Desktop's claude_desktop_config.json:

{
  "mcpServers": {
    "websearch": {
      "command": "/path/to/mcp-websearch"
    }
  }
}

4. Verify Tools

In Claude:

  • "Search for Rust async programming" → Should use web_search tool
  • "Find similar results to previous search" → Should use search_similar tool
  • "What is the cache status?" → Should use get_cache_stats tool
  • "Check provider health" → Should use get_provider_status tool

5. Performance Validation

# Cache hit should be <10ms
# Cache miss should be <500ms
# Provider fallback should work (disable one provider, search should succeed)

Critical Files to Modify/Create

  1. New files (all) - Greenfield project
  2. Key implementation files:
    • src/domain/models.rs - Core types
    • src/ports/search_provider.rs - Provider abstraction
    • src/adapters/providers/*.rs - Provider implementations
    • src/adapters/cache/sqlite_backend.rs - Cache implementation
    • src/adapters/mcp/tools/*.rs - MCP tool handlers
    • migrations/V1__initial_schema.sql - Database schema

Review Feedback Incorporated

Architecture Review

  • ✅ Provider capabilities exposed via trait for routing decisions
  • ✅ Two-tier circuit breaker with fallback chain
  • ✅ CacheBackend trait supports exact and semantic lookup
  • ✅ MCP adapter is thin, delegates to application layer

Security Review

  • ✅ SSRF protection via URL allowlisting
  • ✅ Prompt injection detection in input validation
  • ✅ Budget controls for paid embedding APIs
  • ✅ secrecy crate for API key memory protection
  • ✅ File permission enforcement (0o600)
  • ✅ rustls instead of OpenSSL

Performance Review

  • ✅ Multi-tier cache (L1 HNSW, L2 quantized, L3 SQLite)
  • ✅ Embedding quantization (96% memory reduction)
  • ✅ Lock-free circuit breaker
  • ✅ Shared HTTP client with connection pooling
  • ✅ Provider racing with timeout (not waiting for all)
  • ✅ WAL mode for SQLite concurrency

Agent Guardrails (MANDATORY)

Cloned from agent-guardrails-template

The Four Laws of Agent Safety

  1. Read Before Editing - Never modify code without reading first
  2. Stay in Scope - Only touch authorized files
  3. Verify Before Committing - Test all changes
  4. Halt When Uncertain - Ask instead of guessing

Pre-Execution Checklist

Before ANY file modification, verify:

| # | Check | Requirement |
|---|---|---|
| 1 | READ FIRST | NEVER edit a file without reading it first |
| 2 | SCOPE LOCK | Only modify files explicitly in scope |
| 3 | NO FEATURE CREEP | Do NOT add features, refactor, or "improve" unrelated code |
| 4 | SUB-500 LINES | No file should exceed 500 lines |
| 5 | TEST BEFORE COMMIT | All tests must pass before committing |
| 6 | CHECK FAILURE REGISTRY | Review known bugs for affected files |
| 7 | VERIFY FIXES INTACT | Confirm previous fixes not being undone |

Git Safety Rules

| Rule | Description |
|---|---|
| NO FORCE PUSH | Never use git push --force |
| NO AMEND | Do not amend commits you didn't create this session |
| NO CONFIG CHANGES | Do not modify git config |
| NO PUSH WITHOUT PERMISSION | Only push if user explicitly requests |
| NO SKIP HOOKS | Never use --no-verify |
| NO REBASE | Never rebase shared branches |

Code Safety Rules

| Rule | Rationale |
|---|---|
| EXACT REPLACEMENT | Use provided code exactly - no "improvements" |
| NO NEW IMPORTS | Unless explicitly required by the task |
| PRESERVE FORMATTING | Match existing indentation and style |
| NO SECRETS | Never commit credentials, keys, tokens |

HALT Conditions

Stop immediately and report to user if ANY of these occur:

  • Target file does not exist
  • Line numbers don't match expected
  • File has unexpected modifications
  • Syntax check fails after edit
  • Any test fails after edit
  • Merge conflicts encountered
  • Uncertain about ANY step
  • Edit tool reports "string not found"
  • Permission denied errors

Guardrails Files to Create

mcp-websearch/
├── CLAUDE.md                      # Agent guidelines (see below)
├── .guardrails/
│   ├── pre-work-check.md          # Pre-work checklist
│   ├── failure-registry.jsonl     # Bug database
│   └── prevention-rules/
│       ├── pattern-rules.json     # Regex-based rules
│       ├── semantic-rules.json    # AST-based rules
│       └── extracted-rules.json   # Rules from AGENT_GUARDRAILS.md
└── docs/
    └── AGENT_GUARDRAILS.md        # Full guardrails documentation

CLAUDE.md Content

Create /mnt/ollama/git/mcp-websearch/CLAUDE.md:

# mcp-websearch - Rust MCP Web Search Server

## Project Navigation

- **INDEX_MAP.md**: Find documents by keyword/category (TODO: create)
- **HEADER_MAP.md**: Find specific sections with file:line references (TODO: create)
- **Flow**: INDEX_MAP → identify doc → HEADER_MAP → read specific section

## Context

Rust-based MCP server for AI assistants to perform intelligent web searches with:
- Multiple providers (DuckDuckGo, Brave, Synthetic Search)
- Weighted round-robin with automatic fallback
- Semantic caching with RAG integration
- Enterprise-grade observability via MCP tools

## Stack

- **Language**: Rust 1.80+
- **Runtime**: Tokio async
- **Architecture**: Hexagonal (ports/adapters)
- **Database**: SQLite (default) / PostgreSQL (optional)
- **Embeddings**: Ollama (default) / OpenAI (optional)
- **Protocol**: MCP via stdio

## Quick Commands

```bash
# Build
cargo build --release

# Test
cargo test

# Run
RUST_LOG=info ./target/release/mcp-websearch

# Lint
cargo clippy -- -D warnings
```

## File Organization

    src/
    ├── domain/         # Core business logic (NO external deps)
    ├── application/    # Use case orchestration
    ├── ports/          # Trait definitions
    ├── adapters/       # External integrations (MCP, providers, cache, embeddings)
    └── infrastructure/ # Cross-cutting concerns

## Constraint: Sub-500 Lines

All source files MUST be under 500 lines. Split files that exceed this limit.

## Agent Guardrails

MANDATORY: Read .guardrails/pre-work-check.md before any modifications.

### The Four Laws

1. Read Before Editing - Never modify without reading
2. Stay in Scope - Only touch authorized files
3. Verify Before Committing - Test all changes
4. Halt When Uncertain - Ask instead of guessing

### Git Rules

- NO force push
- NO skip hooks (--no-verify)
- NO amend commits you didn't create
- NO push without permission

### Code Rules

- NO secrets in code
- NO feature creep
- NO unnecessary imports
- Production code BEFORE test code

## Key Files

| File | Purpose |
|---|---|
| src/domain/models.rs | Core types (SearchResult, SearchQuery, CacheEntry) |
| src/ports/search_provider.rs | SearchProvider trait |
| src/adapters/providers/*.rs | DuckDuckGo, Brave, Synthetic implementations |
| src/adapters/cache/sqlite_backend.rs | SQLite cache with semantic search |
| src/adapters/mcp/tools/*.rs | MCP tool handlers |
| migrations/*.sql | Database schema |

## Testing

- Integration tests with recorded fixtures in tests/fixtures/
- Run with RECORD_FIXTURES=1 to capture real API responses
- All external provider calls should use fixtures in CI