mcp-websearch: Rust MCP Server - Implementation Plan

Context

This plan describes a Rust MCP server for web search. It uses a hexagonal architecture, supports multiple search providers, keeps a persistent semantic cache with RAG integration, and exposes observability as MCP tools. The server lets AI assistants (Claude, Cursor, etc.) search the web with automatic provider fallback, persistent semantic caching, and self-diagnosing health monitoring.

Architecture Decisions

Decision	Choice	Rationale
Use Case	AI Assistant Integration	MCP-native, shapes API for AI consumption
Providers	DuckDuckGo, Brave, Synthetic Search	Free tiers + premium fallback path
Distribution	Weighted + Fallback	Smart load distribution with resilience
Auth	MCP Protocol Config	Native MCP integration, simple
Response Format	Normalized Schema	Consistent for AI clients
Caching	Persistent + RAG + Migrations	SQLite default, external DB optional
Embeddings	Ollama default, configurable providers	Local-first with cloud fallback
Error Handling	Circuit breaker + retry + cache fallback	Enterprise resilience
Observability	MCP tools (self-diagnosing)	AI clients can query health
Runtime	Tokio	Industry standard, rich ecosystem
Architecture	Hexagonal	Clean separation, testable
Rust Version	MSRV 1.80	Needed for std::sync::LazyLock (stabilized in 1.80)
Testing	Integration-focused with fixtures	Tests against real behavior

Project Structure (Hexagonal Architecture)

mcp-websearch/
├── Cargo.toml
├── Cargo.lock
├── .cargo/
│   └── config.toml              # Build configurations
├── src/
│   ├── main.rs                  # Composition root (~100 lines)
│   ├── lib.rs                   # Public API exports (~50 lines)
│   │
│   ├── domain/                  # Core business logic - NO external deps
│   │   ├── mod.rs
│   │   ├── models.rs            # SearchQuery, SearchResult, CacheEntry (~300 lines)
│   │   ├── search_service.rs    # Core search orchestration (~400 lines)
│   │   ├── provider.rs          # SearchProvider trait (~150 lines)
│   │   ├── cache.rs             # Cache trait definitions (~200 lines)
│   │   ├── embeddings.rs        # EmbeddingProvider trait (~150 lines)
│   │   └── circuit_breaker.rs   # CircuitBreaker trait + state machine (~250 lines)
│   │
│   ├── application/             # Use case orchestration
│   │   ├── mod.rs
│   │   ├── use_cases/
│   │   │   ├── mod.rs
│   │   │   ├── search.rs        # Search use case (~350 lines)
│   │   │   ├── health.rs        # Health/observability use case (~200 lines)
│   │   │   ├── cache.rs         # Cache management use case (~250 lines)
│   │   │   └── config.rs        # Configuration use case (~200 lines)
│   │   └── dto.rs               # Data transfer objects (~150 lines)
│   │
│   ├── ports/                   # Port definitions (interfaces)
│   │   ├── mod.rs
│   │   ├── search_provider.rs   # SearchProvider port (~100 lines)
│   │   ├── cache_backend.rs     # CacheBackend port (~120 lines)
│   │   ├── embedding.rs         # EmbeddingProvider port (~100 lines)
│   │   └── mcp_transport.rs     # MCP transport abstraction (~150 lines)
│   │
│   ├── adapters/                # Outer hexagon - external integrations
│   │   ├── mod.rs
│   │   │
│   │   ├── mcp/                 # MCP protocol adapter
│   │   │   ├── mod.rs
│   │   │   ├── stdio.rs         # Stdio transport (~200 lines)
│   │   │   └── tools/
│   │   │       ├── mod.rs
│   │   │       ├── search.rs    # web_search tool (~300 lines)
│   │   │       ├── search_similar.rs # search_similar tool (RAG) (~250 lines)
│   │   │       ├── search_news.rs    # search_news tool (~250 lines)
│   │   │       ├── search_cached.rs  # search_cached tool (~200 lines)
│   │   │       ├── cache_stats.rs    # get_cache_stats tool (~150 lines)
│   │   │       ├── cache_clear.rs    # cache_clear tool (~150 lines)
│   │   │       ├── cache_migrate.rs  # cache_migrate tool (~200 lines)
│   │   │       ├── provider_status.rs # get_provider_status tool (~150 lines)
│   │   │       └── server_stats.rs   # get_server_stats tool (~200 lines)
│   │   │
│   │   ├── providers/           # Search provider adapters
│   │   │   ├── mod.rs
│   │   │   ├── duckduckgo.rs    # DuckDuckGo adapter (~350 lines)
│   │   │   ├── brave.rs         # Brave Search adapter (~350 lines)
│   │   │   ├── synthetic.rs     # Synthetic Search adapter (~400 lines)
│   │   │   ├── provider_pool.rs # Weighted provider pool (~300 lines)
│   │   │   └── normalized_result.rs # Result normalization (~250 lines)
│   │   │
│   │   ├── cache/               # Cache backend adapters
│   │   │   ├── mod.rs
│   │   │   ├── sqlite_backend.rs    # SQLite implementation (~450 lines)
│   │   │   ├── postgres_backend.rs  # PostgreSQL implementation (~450 lines)
│   │   │   ├── semantic_index.rs    # HNSW semantic index (~400 lines)
│   │   │   ├── cache_entry.rs       # Cache entry model (~150 lines)
│   │   │   └── migrations.rs        # Schema migrations (~300 lines)
│   │   │
│   │   ├── embeddings/          # Embedding provider adapters
│   │   │   ├── mod.rs
│   │   │   ├── ollama.rs        # Ollama embeddings (~300 lines)
│   │   │   ├── openai.rs        # OpenAI embeddings (~250 lines)
│   │   │   └── model_registry.rs # Available models (~150 lines)
│   │   │
│   │   └── circuit_breaker/     # Circuit breaker implementations
│   │       ├── mod.rs
│   │       └── in_memory.rs     # Lock-free circuit breaker (~300 lines)
│   │
│   └── infrastructure/          # Cross-cutting concerns
│       ├── mod.rs
│       ├── config.rs            # Configuration loading (~350 lines)
│       ├── error.rs             # Error types hierarchy (~300 lines)
│       ├── telemetry.rs         # Logging/tracing setup (~200 lines)
│       └── http_client.rs       # Shared HTTP client factory (~150 lines)
│
├── migrations/                  # Database migrations
│   ├── V1__initial_schema.sql
│   ├── V2__add_semantic_index.sql
│   └── V3__add_embedding_metadata.sql
│
├── tests/                       # Integration tests
│   ├── fixtures/
│   │   ├── duckduckgo/
│   │   ├── brave/
│   │   └── synthetic/
│   ├── integration/
│   │   ├── main.rs              # `cargo test --test integration` target; declares the modules below
│   │   ├── providers_test.rs
│   │   ├── cache_test.rs
│   │   └── embeddings_test.rs
│   └── e2e/
│       └── mcp_protocol_test.rs
│
├── config/
│   └── default.toml             # Default configuration
│
└── README.md

Domain Models

Core Types (`domain/models.rs`)

use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};

/// Normalized search result - consistent across all providers
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SearchResult {
    pub title: String,
    pub url: String,
    pub snippet: String,
    pub source_provider: ProviderName,
    pub published_date: Option<DateTime<Utc>>,
    pub relevance_score: Option<f32>,
}

/// Provider identification
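#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")] // "duckduckgo", "brave", "synthetic", "cache", matching the tool schema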
pub enum ProviderName {
    DuckDuckGo,
    Brave,
    Synthetic,
    Cache, // For cached results
}

/// Search query with options
#[derive(Debug, Clone, Deserialize)]
pub struct SearchQuery {
    pub query: String,
    pub max_results: Option<u8>,
    pub providers: Option<Vec<ProviderName>>,
    pub use_cache: Option<bool>,
    pub freshness: Option<Freshness>,
}

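/// Derives required because SearchQuery derives Debug, Clone and Deserialize
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")] // "day", "week", "month", "year"
pub enum Freshness {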
    Day,
    Week,
    Month,
    Year,
}

/// Cache entry with semantic indexing
#[derive(Debug, Clone)]
pub struct CacheEntry {
    pub id: i64,
    pub query_hash: [u8; 32],       // Blake3 hash
    pub query_text: String,
    pub embedding: Vec<f32>,         // Full precision in memory; quantized when persisted
    pub results: Vec<SearchResult>,
    pub provider_used: ProviderName,
    pub created_at: DateTime<Utc>,
    pub last_accessed: DateTime<Utc>,
    pub access_count: u32,
    pub embedding_model: String,     // Track for dimension changes
    pub embedding_dimension: u16,
}
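The plan keys exact lookups on a Blake3 `query_hash` but does not say what gets hashed. A minimal sketch, assuming the key should cover the normalized query text plus every option that changes the result set (`max_results`, `providers`, `freshness`); the string would then be passed to `blake3::hash` to produce the 32-byte key. `cache_key_material` and the trimmed-down types are hypothetical.

```rust
/// Simplified stand-ins for the domain types (illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum ProviderName {
    DuckDuckGo,
    Brave,
    Synthetic,
}

#[derive(Debug, Clone, Copy)]
pub enum Freshness {
    Day,
    Week,
    Month,
    Year,
}

pub struct SearchQuery {
    pub query: String,
    pub max_results: Option<u8>,
    pub providers: Option<Vec<ProviderName>>,
    pub freshness: Option<Freshness>,
}

/// Canonical string whose Blake3 hash becomes `query_hash` (hypothetical helper).
pub fn cache_key_material(q: &SearchQuery) -> String {
    // Collapse whitespace and lowercase so trivially different spellings share an entry.
    let text = q
        .query
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_lowercase();

    // Order-insensitive provider set; `None` means "all available".
    let providers = match &q.providers {
        Some(list) => {
            let mut sorted = list.clone();
            sorted.sort();
            sorted.dedup();
            format!("{sorted:?}")
        }
        None => "all".to_string(),
    };

    // Options that change the result set are part of the key.
    format!(
        "q={text}|n={}|p={providers}|f={:?}",
        q.max_results.unwrap_or(10), // schema default
        q.freshness
    )
}
```

`use_cache` is deliberately left out: it decides whether the cache is consulted, not what the results are.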

Port Definitions

SearchProvider Port (`ports/search_provider.rs`)

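use async_trait::async_trait; // keeps the trait usable as Arc<dyn SearchProvider>
use std::time::Instant;

#[async_trait]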
pub trait SearchProvider: Send + Sync {
    /// Provider name for logging/routing
    fn name(&self) -> ProviderName;

    /// Weight for weighted selection (higher = more traffic)
    fn weight(&self) -> u32;

    /// Execute search
    async fn search(&self, query: &SearchQuery) -> Result<Vec<SearchResult>, ProviderError>;

    /// Health check for circuit breaker
    async fn health_check(&self) -> ProviderHealth;

    /// Provider capabilities for routing decisions
    fn capabilities(&self) -> ProviderCapabilities;
}

pub struct ProviderCapabilities {
    pub max_concurrent_requests: u32,
    pub typical_latency_ms: u32,
    pub supports_pagination: bool,
    pub supports_freshness: bool,
    pub rate_limit_per_minute: u32,
}

pub struct ProviderHealth {
    pub is_healthy: bool,
    pub latency_ms: Option<u32>,
    pub error_rate: f32,
    pub last_success: Option<Instant>,
}
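The trait exposes `weight()` for weighted selection, and the plan pairs weighting with a fallback chain. One way to combine the two, sketched here as an assumption rather than the plan's design: pick the primary provider in proportion to its weight from a caller-supplied roll in `0..total_weight` (e.g. produced by `rand`, which keeps the function deterministic for tests), then order the remaining providers by descending weight as fallbacks. `selection_order` is a hypothetical helper, and a real pool would first drop providers whose circuit breaker is open.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProviderName {
    DuckDuckGo,
    Brave,
    Synthetic,
}

/// Order in which providers are tried: the primary is chosen proportionally to
/// its weight using `roll`; the rest follow by descending weight as fallbacks.
/// Zero-weight providers are never primary but remain available as fallbacks.
pub fn selection_order(providers: &[(ProviderName, u32)], roll: u32) -> Vec<ProviderName> {
    let total: u64 = providers.iter().map(|&(_, w)| u64::from(w)).sum();
    let mut order: Vec<(ProviderName, u32)> = providers.to_vec();

    if total > 0 {
        // Walk the cumulative weights until the roll falls inside a bucket.
        let mut remaining = u64::from(roll) % total;
        let primary = providers
            .iter()
            .position(|&(_, w)| {
                if remaining < u64::from(w) {
                    true
                } else {
                    remaining -= u64::from(w);
                    false
                }
            })
            .expect("roll is always below the total weight");
        let chosen = order.remove(primary);
        // sort_by is stable: equal weights keep configuration order.
        order.sort_by(|a, b| b.1.cmp(&a.1));
        order.insert(0, chosen);
    } else {
        order.sort_by(|a, b| b.1.cmp(&a.1));
    }
    order.into_iter().map(|(name, _)| name).collect()
}
```

With weights 70/20/10, a roll of 0 yields DuckDuckGo first, 75 yields Brave first, and 95 yields Synthetic first; the other two providers always follow as the fallback chain.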

CacheBackend Port (`ports/cache_backend.rs`)

#[async_trait]
pub trait CacheBackend: Send + Sync {
    /// Exact key lookup
    async fn get_exact(&self, query_hash: &[u8; 32]) -> Result<Option<CacheEntry>, CacheError>;

    /// Semantic similarity search
    async fn get_semantic(
        &self,
        embedding: &[f32],
        threshold: f32,
        limit: usize,
    ) -> Result<Vec<SemanticMatch>, CacheError>;

    /// Store entry
    async fn store(&self, entry: CacheEntry) -> Result<(), CacheError>;

    /// Invalidate by query hash and/or age; returns the number of entries removed
    async fn invalidate(
        &self,
        query_hash: Option<&[u8; 32]>,
        older_than: Option<Duration>,
    ) -> Result<u64, CacheError>;

    /// Get statistics
    async fn stats(&self) -> Result<CacheStats, CacheError>;

    /// Run migrations
    async fn migrate(&self, target_version: u32) -> Result<(), CacheError>;
}

pub struct SemanticMatch {
    pub entry: CacheEntry,
    pub similarity: f32,
}
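`get_semantic` returns entries whose similarity to the query embedding is at least `threshold`, best first, capped at `limit`. The plan backs this with HNSW; an exhaustive version is handy as a reference oracle when testing the approximate index. A minimal sketch, assuming similarity means cosine similarity and reducing entries to `(id, embedding)` pairs; `cosine` and `semantic_matches` are hypothetical helpers.

```rust
/// Cosine similarity in [-1, 1]; 0.0 if either vector has zero norm or the
/// lengths differ (e.g. embeddings produced by different models).
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Exhaustive equivalent of `CacheBackend::get_semantic`: every entry at or
/// above `threshold`, sorted by descending similarity, at most `limit` of them.
pub fn semantic_matches(
    query: &[f32],
    entries: &[(i64, Vec<f32>)],
    threshold: f32,
    limit: usize,
) -> Vec<(i64, f32)> {
    let mut hits: Vec<(i64, f32)> = entries
        .iter()
        .map(|(id, emb)| (*id, cosine(query, emb)))
        .filter(|&(_, sim)| sim >= threshold)
        .collect();
    // total_cmp gives a total order even if a NaN slipped in.
    hits.sort_by(|a, b| b.1.total_cmp(&a.1));
    hits.truncate(limit);
    hits
}
```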

EmbeddingProvider Port (`ports/embedding.rs`)

#[async_trait]
pub trait EmbeddingProvider: Send + Sync {
    /// Generate embedding for text
    async fn embed(&self, text: &str) -> Result<Vec<f32>, EmbeddingError>;

    /// Model identifier
    fn model_name(&self) -> &str;

    /// Embedding dimension
    fn dimension(&self) -> u16;

    /// Health check
    async fn health_check(&self) -> bool;
}
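`CacheEntry` records `embedding_model` and `embedding_dimension` so that a model switch does not silently compare incompatible vectors. A minimal sketch of the guard a semantic lookup might apply before computing any similarity, assuming mismatched entries are skipped (and later re-embedded or evicted) rather than compared; `ProviderInfo`, `EntryInfo` and `is_comparable` are hypothetical.

```rust
/// What the provider reports via `model_name()` and `dimension()`.
pub struct ProviderInfo<'a> {
    pub model_name: &'a str,
    pub dimension: u16,
}

/// Metadata stored alongside each cache entry.
pub struct EntryInfo<'a> {
    pub embedding_model: &'a str,
    pub embedding_dimension: u16,
    pub embedding: &'a [f32],
}

/// True only if the stored vector came from the same model, has the declared
/// length, and contains no NaN or infinite components.
pub fn is_comparable(provider: &ProviderInfo<'_>, entry: &EntryInfo<'_>) -> bool {
    entry.embedding_model == provider.model_name
        && entry.embedding_dimension == provider.dimension
        && entry.embedding.len() == usize::from(provider.dimension)
        && entry.embedding.iter().all(|v| v.is_finite())
}
```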

Circuit Breaker Implementation

Lock-Free Circuit Breaker (`adapters/circuit_breaker/in_memory.rs`)

use std::future::Future;
use std::sync::atomic::{AtomicU8, AtomicU32, AtomicU64, Ordering};
use std::time::Duration;

#[repr(u8)] // stored in an AtomicU8
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CircuitState {
    Closed = 0,   // Normal operation
    Open = 1,     // Failing fast
    HalfOpen = 2, // Testing recovery
}

pub struct CircuitBreaker {
    state: AtomicU8,            // CircuitState as u8
    failure_count: AtomicU32,
    success_count: AtomicU32,
    last_failure: AtomicU64,    // Unix time (ms) of the last recorded failure
    config: CircuitConfig,
}

pub struct CircuitConfig {
    pub failure_threshold: u32,      // Open after N failures
    pub success_threshold: u32,      // Close after N successes in half-open
    pub timeout_ms: u64,             // Time before half-open attempt
    pub max_retries: u32,            // Attempts per call, including the first (>= 1)
    pub base_delay_ms: u64,          // Base retry delay
    pub max_delay_ms: u64,           // Max retry delay
    pub backoff_multiplier: f32,     // Exponential backoff factor
}

impl CircuitBreaker {
    // allow_request(), record_success() and record_failure() perform the
    // atomic state transitions (not shown here).

    /// Two-tier: run `f` with retries or, while the circuit is open, run
    /// `fallback` (e.g. a search against the next provider) instead of failing fast.
    pub async fn call_with_fallback<F, Fut, G, GFut, T>(
        &self,
        f: F,
        fallback: Option<G>,
    ) -> Result<T, CircuitError>
    where
        F: Fn() -> Fut, // Fn, not FnOnce: it is called once per attempt
        Fut: Future<Output = Result<T, ProviderError>>,
        G: FnOnce() -> GFut,
        GFut: Future<Output = Result<T, ProviderError>>,
    {
        if !self.allow_request() {
            return match fallback {
                Some(fb) => fb().await.map_err(CircuitError::from),
                None => Err(CircuitError::Open),
            };
        }

        // Execute with retry, exponential backoff, and up to 30% jitter
        let attempts = self.config.max_retries.max(1); // no underflow or panic when 0
        let mut delay = self.config.base_delay_ms;
        let mut attempt = 1;
        loop {
            match f().await {
                Ok(v) => {
                    self.record_success();
                    return Ok(v);
                }
                Err(_) if attempt < attempts => {
                    let jitter = rand::random::<f32>() * 0.3;
                    tokio::time::sleep(Duration::from_millis(
                        (delay as f32 * (1.0 + jitter)) as u64,
                    ))
                    .await;
                    delay = ((delay as f32 * self.config.backoff_multiplier) as u64)
                        .min(self.config.max_delay_ms);
                    attempt += 1;
                }
                Err(e) => {
                    self.record_failure();
                    return Err(e.into());
                }
            }
        }
    }
}
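`call_with_fallback` relies on `allow_request`, `record_success` and `record_failure`, which the plan does not spell out. A minimal lock-free sketch of those transitions, simplified (thresholds held directly on the struct instead of in `CircuitConfig`; constants mirror `CircuitState`'s discriminants) and assuming `last_failure` holds Unix milliseconds. The `_at(now)` variants take the clock as a parameter so the state machine can be tested without sleeping; they, `new` and `state` are hypothetical helpers.

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, AtomicU8, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

// Mirrors `CircuitState as u8`.
const CLOSED: u8 = 0;
const OPEN: u8 = 1;
const HALF_OPEN: u8 = 2;

pub struct CircuitBreaker {
    state: AtomicU8,
    failure_count: AtomicU32, // consecutive failures while closed
    success_count: AtomicU32, // successes while half-open
    last_failure: AtomicU64,  // Unix ms when the circuit last opened
    failure_threshold: u32,
    success_threshold: u32,
    timeout_ms: u64,
}

fn now_ms() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map_or(0, |d| d.as_millis() as u64)
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, success_threshold: u32, timeout_ms: u64) -> Self {
        Self {
            state: AtomicU8::new(CLOSED),
            failure_count: AtomicU32::new(0),
            success_count: AtomicU32::new(0),
            last_failure: AtomicU64::new(0),
            failure_threshold,
            success_threshold,
            timeout_ms,
        }
    }

    pub fn state(&self) -> u8 {
        self.state.load(Ordering::Acquire)
    }

    pub fn allow_request(&self) -> bool {
        self.allow_request_at(now_ms())
    }

    pub fn allow_request_at(&self, now: u64) -> bool {
        if self.state() != OPEN {
            return true; // Closed: normal traffic; HalfOpen: probe traffic
        }
        if now.saturating_sub(self.last_failure.load(Ordering::Acquire)) < self.timeout_ms {
            return false; // still cooling down: fail fast
        }
        // Cool-down over: move to HalfOpen. If another caller won this race,
        // the request is still let through as a probe.
        let _ = self
            .state
            .compare_exchange(OPEN, HALF_OPEN, Ordering::AcqRel, Ordering::Acquire);
        true
    }

    pub fn record_success(&self) {
        if self.state() == HALF_OPEN {
            let successes = self.success_count.fetch_add(1, Ordering::AcqRel) + 1;
            if successes >= self.success_threshold
                && self
                    .state
                    .compare_exchange(HALF_OPEN, CLOSED, Ordering::AcqRel, Ordering::Acquire)
                    .is_ok()
            {
                self.failure_count.store(0, Ordering::Release);
            }
        } else {
            self.failure_count.store(0, Ordering::Release); // a success ends the failure streak
        }
    }

    pub fn record_failure(&self) {
        self.record_failure_at(now_ms())
    }

    pub fn record_failure_at(&self, now: u64) {
        let state = self.state();
        let failures = self.failure_count.fetch_add(1, Ordering::AcqRel) + 1;
        // A failed probe, or too many consecutive failures, (re)opens the circuit.
        if state == HALF_OPEN || (state == CLOSED && failures >= self.failure_threshold) {
            self.last_failure.store(now, Ordering::Release);
            self.success_count.store(0, Ordering::Release);
            self.state.store(OPEN, Ordering::Release); // published after last_failure
        }
    }
}
```

This sketch counts consecutive failures; a sliding-window error rate (as `ProviderHealth::error_rate` hints at) could be layered on top without changing the transitions.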

MCP Tools

Tool Definitions (9 tools total)

Tool	Purpose	File
`web_search`	Primary search with provider fallback	`adapters/mcp/tools/search.rs`
`search_similar`	RAG-style semantic cache lookup	`adapters/mcp/tools/search_similar.rs`
`search_news`	News-focused search with freshness	`adapters/mcp/tools/search_news.rs`
`search_cached`	Cache-only search (no external calls)	`adapters/mcp/tools/search_cached.rs`
`get_cache_stats`	Cache hit rate, size, health	`adapters/mcp/tools/cache_stats.rs`
`cache_clear`	Clear cache entries	`adapters/mcp/tools/cache_clear.rs`
`cache_migrate`	Run schema migrations	`adapters/mcp/tools/cache_migrate.rs`
`get_provider_status`	Provider health + circuit breaker state	`adapters/mcp/tools/provider_status.rs`
`get_server_stats`	Server metrics (latency, throughput)	`adapters/mcp/tools/server_stats.rs`

web_search Tool Schema

{
  "name": "web_search",
  "description": "Search the web using multiple providers with automatic fallback and semantic caching",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Search query"
      },
      "max_results": {
        "type": "integer",
        "default": 10,
        "minimum": 1,
        "maximum": 50
      },
      "providers": {
        "type": "array",
        "items": { "type": "string", "enum": ["duckduckgo", "brave", "synthetic"] },
        "description": "Specific providers to use (default: all available)"
      },
      "use_cache": {
        "type": "boolean",
        "default": true,
        "description": "Use semantic cache for faster results"
      },
      "freshness": {
        "type": "string",
        "enum": ["day", "week", "month", "year"],
        "description": "Result freshness filter"
      }
    },
    "required": ["query"]
  }
}
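In JSON Schema, `default` is only an annotation and `minimum`/`maximum` are enforced only if something validates the arguments, so the server should apply them itself rather than trust every MCP client. A minimal sketch, assuming the raw `max_results` is read as an `i64` so that out-of-range values produce a clear message instead of a `u8` deserialization error; `ResolvedOptions` and `resolve` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub struct ResolvedOptions {
    pub max_results: u8,
    pub use_cache: bool,
}

/// Applies the schema defaults (max_results = 10, use_cache = true) and
/// rejects values outside 1..=50 instead of silently clamping them.
pub fn resolve(max_results: Option<i64>, use_cache: Option<bool>) -> Result<ResolvedOptions, String> {
    let n = max_results.unwrap_or(10);
    if !(1..=50).contains(&n) {
        return Err(format!("max_results must be between 1 and 50, got {n}"));
    }
    Ok(ResolvedOptions {
        max_results: n as u8, // lossless: range-checked above
        use_cache: use_cache.unwrap_or(true),
    })
}
```

Rejecting rather than clamping lets the calling model see and correct its mistake in the tool error.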

Semantic Cache Architecture

Multi-Tier Cache (`adapters/cache/semantic_index.rs`)

┌───────────────────────────────────────────┐
│  L1: Hot (in-memory HNSW, full precision) │ 1000 entries × 6KB = 6MB
│  TTL: 5 minutes                           │
├───────────────────────────────────────────┤
│  L2: Warm (in-memory HNSW, quantized)     │ 10000 entries × 192B = 2MB
│  TTL: 30 minutes                          │
├───────────────────────────────────────────┤
│  L3: Cold (SQLite, compressed)            │ Unlimited
│  TTL: 7 days                              │
└───────────────────────────────────────────┘
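The per-entry sizes in the diagram follow from the vector layout: 1536 dimensions × 4 bytes (f32) = 6,144 bytes, and 1536 bits / 8 = 192 bytes once binary-quantized; a 768-dimension model such as `nomic-embed-text` halves both. A minimal sketch of that arithmetic plus a TTL-based tier assignment, assuming an entry's tier is decided by how long it has been idle; `Tier` and `tier_for_idle` are hypothetical.

```rust
use std::time::Duration;

/// Bytes for one full-precision (f32) embedding.
pub fn f32_bytes(dim: usize) -> usize {
    dim * 4
}

/// Bytes for one binary-quantized embedding (1 bit per dimension, rounded up).
pub fn binary_bytes(dim: usize) -> usize {
    dim.div_ceil(8)
}

#[derive(Debug, PartialEq, Eq)]
pub enum Tier {
    Hot,
    Warm,
    Cold,
}

/// Tier for an entry idle for `idle`, using the diagram's TTLs
/// (5 min hot, 30 min warm, 7 days cold); `None` means expired.
pub fn tier_for_idle(idle: Duration) -> Option<Tier> {
    const MIN: u64 = 60;
    if idle < Duration::from_secs(5 * MIN) {
        Some(Tier::Hot)
    } else if idle < Duration::from_secs(30 * MIN) {
        Some(Tier::Warm)
    } else if idle < Duration::from_secs(7 * 24 * 60 * MIN) {
        Some(Tier::Cold)
    } else {
        None
    }
}
```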

Embedding Quantization

/// Quantize f32 embedding to binary (96% size reduction)
pub fn quantize_to_binary(embedding: &[f32]) -> Vec<u8> {
    embedding.chunks(8)
        .map(|chunk| {
            chunk.iter().enumerate()
                .filter(|(_, v)| **v > 0.0)
                .map(|(i, _)| 1 << i)
                .fold(0u8, |acc, bit| acc | bit)
        })
        .collect()
}

/// Hamming distance for binary embeddings (fast XOR + popcount)
pub fn hamming_distance(a: &[u8], b: &[u8]) -> u32 {
    a.iter().zip(b.iter())
        .map(|(x, y)| (x ^ y).count_ones())
        .sum()
}
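A common way to use binary codes is as a cheap first pass: rank stored entries by Hamming distance, keep a small candidate set, then rescore those candidates at full precision. A minimal sketch of the first pass, restating the two functions above so the block compiles on its own; `nearest_by_hamming` is hypothetical, and its exhaustive scan stands in for the HNSW index the plan actually uses.

```rust
/// Bit i of byte k is set if component 8k + i is positive.
pub fn quantize_to_binary(embedding: &[f32]) -> Vec<u8> {
    embedding
        .chunks(8)
        .map(|chunk| {
            chunk
                .iter()
                .enumerate()
                .filter(|(_, v)| **v > 0.0)
                .fold(0u8, |acc, (i, _)| acc | (1u8 << i))
        })
        .collect()
}

/// Number of differing bits (XOR + popcount).
pub fn hamming_distance(a: &[u8], b: &[u8]) -> u32 {
    debug_assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Indices of the `k` stored codes closest to `query`; ties keep insertion order.
pub fn nearest_by_hamming(query: &[u8], stored: &[Vec<u8>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, u32)> = stored
        .iter()
        .enumerate()
        .map(|(i, code)| (i, hamming_distance(query, code)))
        .collect();
    scored.sort_by_key(|&(_, d)| d); // stable sort
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}
```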

Database Schema

V1__initial_schema.sql

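-- Connection-level settings are applied when the pool is opened, not in this
-- migration: journal_mode = WAL cannot be changed inside a transaction (which
-- migration runners typically use), and synchronous = NORMAL,
-- cache_size = -64000 (~64MB) and mmap_size = 268435456 (256MB) only affect
-- the connection that sets them.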

CREATE TABLE IF NOT EXISTS cache_entries (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    query_hash BLOB NOT NULL UNIQUE,      -- 32-byte Blake3 hash
    query_text TEXT NOT NULL,             -- For debugging/FTS
    embedding BLOB,                       -- Quantized f16/u8
    results_json BLOB NOT NULL,           -- zstd compressed
    provider_used TEXT NOT NULL,          -- "duckduckgo", "brave", "synthetic"
    created_at INTEGER NOT NULL,          -- Unix timestamp ms
    last_accessed INTEGER NOT NULL,
    access_count INTEGER DEFAULT 0,
    embedding_model TEXT NOT NULL,        -- "nomic-embed-text"
    embedding_dimension INTEGER NOT NULL  -- 768, 1536, etc.
);

-- Fast hash lookup: covered by the implicit index SQLite creates for
-- UNIQUE(query_hash); a separate index on query_hash would be redundant.

-- Time-based cleanup
CREATE INDEX idx_created_at ON cache_entries(created_at);

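-- Full-text search for pre-filtering (external-content FTS5 table)
CREATE VIRTUAL TABLE cache_fts USING fts5(query_text, content='cache_entries', content_rowid='id');

-- External-content tables are not updated automatically; triggers keep the index in sync
CREATE TRIGGER cache_entries_ai AFTER INSERT ON cache_entries BEGIN
    INSERT INTO cache_fts(rowid, query_text) VALUES (new.id, new.query_text);
END;

CREATE TRIGGER cache_entries_ad AFTER DELETE ON cache_entries BEGIN
    INSERT INTO cache_fts(cache_fts, rowid, query_text) VALUES ('delete', old.id, old.query_text);
END;

CREATE TRIGGER cache_entries_au AFTER UPDATE OF query_text ON cache_entries BEGIN
    INSERT INTO cache_fts(cache_fts, rowid, query_text) VALUES ('delete', old.id, old.query_text);
    INSERT INTO cache_fts(rowid, query_text) VALUES (new.id, new.query_text);
END;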

V2__add_semantic_index.sql

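-- HNSW index metadata for tracking loaded entries
-- ON DELETE CASCADE only fires on connections with PRAGMA foreign_keys = ON
-- (SQLite's default is OFF)
CREATE TABLE semantic_index_state (
    id INTEGER PRIMARY KEY,
    entry_id INTEGER NOT NULL,
    hnsw_node_id INTEGER NOT NULL,
    tier TEXT NOT NULL CHECK (tier IN ('hot', 'warm')),
    FOREIGN KEY (entry_id) REFERENCES cache_entries(id) ON DELETE CASCADE
);

CREATE INDEX idx_semantic_tier ON semantic_index_state(tier);

-- Child-key index so cascading deletes do not scan the whole table
CREATE INDEX idx_semantic_entry ON semantic_index_state(entry_id);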
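`CacheBackend::migrate(target_version)` and the `cache_migrate` tool imply a runner driven by the `V<n>__` file prefixes. A minimal sketch of computing the pending set, assuming the current version is read from a bookkeeping table and that migrations are forward-only (downgrades are refused, not reversed); `parse_version` and `pending_migrations` are hypothetical.

```rust
/// Extracts `n` from a Flyway-style file name such as `V2__add_semantic_index.sql`.
pub fn parse_version(file_name: &str) -> Option<u32> {
    let rest = file_name.strip_prefix('V')?;
    let (num, _description) = rest.split_once("__")?;
    num.parse().ok()
}

/// Versions to apply, ascending, to move the schema from `current` to `target`.
pub fn pending_migrations(files: &[&str], current: u32, target: u32) -> Result<Vec<u32>, String> {
    if target < current {
        return Err(format!("downgrade from V{current} to V{target} is not supported"));
    }
    // Non-migration files (README.md, etc.) are ignored.
    let mut versions: Vec<u32> = files.iter().filter_map(|f| parse_version(f)).collect();
    versions.sort_unstable();
    if let Some(pair) = versions.windows(2).find(|w| w[0] == w[1]) {
        return Err(format!("duplicate migration version V{}", pair[0]));
    }
    if target > current && !versions.contains(&target) {
        return Err(format!("no migration file for V{target}"));
    }
    Ok(versions.into_iter().filter(|v| *v > current && *v <= target).collect())
}
```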

Configuration Schema

MCP Client Configuration (e.g. Claude Desktop's claude_desktop_config.json)

{
  "mcpServers": {
    "websearch": {
      "command": "/path/to/mcp-websearch",
      "args": [],
      "env": {
        "RUST_LOG": "info"
      }
    }
  }
}

Server Configuration (internal)

use secrecy::SecretString;
use serde::Deserialize;
use std::{collections::HashMap, path::PathBuf};

#[derive(Debug, Deserialize)]
pub struct Config {
    pub providers: ProvidersConfig,
    pub cache: CacheConfig,
    pub embeddings: EmbeddingsConfig,
    pub circuit_breaker: CircuitBreakerConfig,
}

#[derive(Debug, Deserialize)]
pub struct ProvidersConfig {
    pub duckduckgo: Option<DuckDuckGoConfig>,
    pub brave: Option<BraveConfig>,
    pub synthetic: Option<SyntheticConfig>,
    pub weights: HashMap<String, u32>,
    pub fallback_order: Vec<String>,
}

#[derive(Debug, Deserialize)]
pub struct BraveConfig {
    pub api_key: SecretString,
    pub rate_limit_per_minute: u32,
}

#[derive(Debug, Deserialize)]
pub struct SyntheticConfig {
    pub api_key: SecretString,
    pub endpoint: String,
}

#[derive(Debug, Deserialize)]
pub struct CacheConfig {
    pub backend: CacheBackendKind, // not `CacheBackend`: that name is the port trait
    pub path: PathBuf,
    pub max_size_mb: u64,
    pub default_ttl_hours: u64,
    pub semantic_threshold: f32,
}

#[derive(Debug, Deserialize)]
#[serde(rename_all = "lowercase")] // "sqlite", "postgres"
pub enum CacheBackendKind {
    Sqlite,
    Postgres { connection_string: SecretString },
}

#[derive(Debug, Deserialize)]
pub struct EmbeddingsConfig {
    pub provider: EmbeddingProviderKind, // not `EmbeddingProvider`: that name is the port trait
    pub model: String,
    pub dimension: u16,
    pub budget_limit_usd: Option<f32>,
}

#[derive(Debug, Deserialize)]
#[serde(rename_all = "lowercase")] // "ollama", "openai"
pub enum EmbeddingProviderKind {
    Ollama { endpoint: String },
    OpenAI { api_key: SecretString },
}
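`budget_limit_usd` backs the "mandatory caps for paid APIs" control from the security table. A minimal lock-free sketch, assuming spend is tracked in integer micro-dollars (avoiding float drift) and that a paid call must reserve its estimated cost up front, being refused if that would exceed the cap; `BudgetGuard` and the per-call costs are hypothetical, since real pricing depends on the provider and token count.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Tracks spend against a fixed cap in micro-dollars (1 USD = 1_000_000).
pub struct BudgetGuard {
    limit_micros: u64,
    spent_micros: AtomicU64,
}

impl BudgetGuard {
    pub fn from_usd(limit_usd: f32) -> Self {
        Self {
            limit_micros: (f64::from(limit_usd) * 1_000_000.0).round() as u64,
            spent_micros: AtomicU64::new(0),
        }
    }

    /// Reserves `cost_micros` before a paid call; returns false (reserving
    /// nothing) if the reservation would push spend over the limit.
    pub fn try_reserve(&self, cost_micros: u64) -> bool {
        self.spent_micros
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |spent| {
                let next = spent.checked_add(cost_micros)?;
                (next <= self.limit_micros).then_some(next)
            })
            .is_ok()
    }

    pub fn spent_usd(&self) -> f64 {
        self.spent_micros.load(Ordering::Acquire) as f64 / 1_000_000.0
    }
}
```

Since caps are described as mandatory, a configuration that selects a paid embedding provider with `budget_limit_usd = None` should arguably fail at startup rather than run unbounded.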

Error Handling

Error Hierarchy (`infrastructure/error.rs`)

use thiserror::Error;

#[derive(Error, Debug)]
pub enum SearchError {
    // ProviderName only derives Debug, so it is formatted with {:?}
    #[error("Provider error: {provider:?} - {message}")]
    Provider {
        provider: ProviderName,
        message: String,
        #[source]
        source: ProviderError,
    },

    #[error("All providers failed")]
    AllProvidersFailed {
        errors: Vec<(ProviderName, ProviderError)>,
    },

    #[error("Circuit breaker open for: {0:?}")]
    CircuitBreakerOpen(ProviderName),

    #[error("Cache error: {0}")]
    Cache(#[from] CacheError),

    #[error("Embedding generation failed: {0}")]
    Embedding(#[from] EmbeddingError),

    #[error("Query validation failed: {reasons:?}")]
    Validation { reasons: Vec<String> },
}

// Sanitized for MCP responses
impl From<SearchError> for McpError {
    fn from(err: SearchError) -> Self {
        match err {
            SearchError::AllProvidersFailed { .. } => {
                McpError::internal("All search providers are currently unavailable. Try again later.")
            }
            SearchError::CircuitBreakerOpen(provider) => {
                McpError::unavailable(&format!("Provider {:?} is temporarily unavailable", provider))
            }
            _ => McpError::internal("Search failed. Please try again."),
        }
    }
}

Security Measures

Critical Security Controls

Area	Control	Implementation
API Keys	Memory protection	`secrecy` crate, zero-on-drop
API Keys	File permissions	Enforce 0o600 on config files
SSRF	URL allowlisting	Block private IP ranges, disable redirects
Prompt Injection	Input validation	Unicode NFC normalization, pattern detection
Denial-of-Wallet	Budget controls	Circuit breakers, mandatory caps for paid APIs
Cache Poisoning	Integrity	Blocklist validation, content hashing
Supply Chain	Audit	`cargo audit` in CI, pin critical versions
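For the SSRF control, checking the URL's hostname string is not enough: the resolved addresses must be checked too, and checked again (or pinned) at connect time to resist DNS rebinding. A minimal std-only sketch of the address check, assuming the caller has already resolved the host; `is_blocked` is hypothetical and covers common non-public ranges rather than the full IANA special-purpose registry. The IPv6 unique-local and link-local tests are written out by hand because `Ipv6Addr::is_unique_local` and `is_unicast_link_local` are not available at the 1.80 MSRV.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// True if `ip` is loopback, private, link-local or otherwise non-public
/// space that outbound search requests must never reach.
pub fn is_blocked(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        IpAddr::V6(v6) => match v6.to_ipv4_mapped() {
            Some(v4) => is_blocked_v4(v4), // ::ffff:a.b.c.d
            None => is_blocked_v6(v6),
        },
    }
}

fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    let [a, b, ..] = ip.octets();
    ip.is_private()             // 10/8, 172.16/12, 192.168/16
        || ip.is_loopback()     // 127/8
        || ip.is_link_local()   // 169.254/16, incl. 169.254.169.254 metadata endpoints
        || ip.is_unspecified()
        || ip.is_broadcast()
        || ip.is_multicast()
        || ip.is_documentation()
        || a == 0                          // 0.0.0.0/8 "this network"
        || (a == 100 && (b & 0xc0) == 64)  // 100.64/10 carrier-grade NAT
}

fn is_blocked_v6(ip: Ipv6Addr) -> bool {
    let first = ip.segments()[0];
    ip.is_loopback()
        || ip.is_unspecified()
        || ip.is_multicast()
        || (first & 0xfe00) == 0xfc00 // fc00::/7 unique local
        || (first & 0xffc0) == 0xfe80 // fe80::/10 link local
}
```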

Input Validation

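use unicode_normalization::UnicodeNormalization;

/// Validates a query and returns the NFC-normalized text that should be searched.
pub fn validate_query(query: &str) -> Result<String, ValidationError> {
    // 1. Length check (in bytes)
    if query.len() > 1000 {
        return Err(ValidationError::TooLong);
    }

    // 2. Unicode NFC normalization: canonically equivalent sequences compare
    //    equal. This is not a homograph defense; cross-script look-alikes need
    //    a confusables check (Unicode UTS #39).
    let normalized = query.nfc().collect::<String>();

    // 3. Control character filtering
    if normalized.chars().any(|c| c.is_control()) {
        return Err(ValidationError::InvalidCharacters);
    }

    // 4. Prompt injection patterns (basic substring match; expect false
    //    positives). Patterns must be lowercase because `lower` is.
    let injection_patterns = [
        "ignore previous",
        "ignore all previous",
        "disregard",
        "system:",
        "[inst]",
    ];
    let lower = normalized.to_lowercase();
    for pattern in injection_patterns {
        if lower.contains(pattern) {
            return Err(ValidationError::SuspiciousPattern);
        }
    }

    Ok(normalized)
}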

Performance Targets

Response Time Budget (Target: <500ms p99)

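Operation	Budget	Expected
Cache lookup (HNSW hit)	1ms	0.5ms
Cache lookup (DB miss)	50ms	10-30ms
Embedding generation	100ms	50-200ms
Provider race timeout	250ms	250ms
Result normalization	5ms	1ms
Total (cache-miss path)	406ms	311-481ms

The expected embedding time can exceed its 100ms budget; even so, the summed worst case (481ms) stays under the 500ms p99 target.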
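The 250ms provider race means the first successful provider wins and slower ones are not awaited. In the async server this would typically be `tokio::time::timeout` around `tokio::select!` or a set of spawned futures; the std-only sketch below shows the same logic with threads and `mpsc::Receiver::recv_timeout`, assuming each provider is a closure returning `Result<String, String>`. `ProviderFn` and `race_first_ok` are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

pub type ProviderFn = Box<dyn FnOnce() -> Result<String, String> + Send>;

/// Runs all providers concurrently and returns the first `Ok` that arrives
/// before `deadline`; errors are returned only if nothing succeeds in time.
pub fn race_first_ok(providers: Vec<ProviderFn>, deadline: Duration) -> Result<String, Vec<String>> {
    let (tx, rx) = mpsc::channel();
    let n = providers.len();
    for provider in providers {
        let tx = tx.clone();
        // Losers keep running; their sends fail harmlessly once `rx` is dropped.
        thread::spawn(move || {
            let _ = tx.send(provider());
        });
    }
    drop(tx);

    let start = Instant::now();
    let mut errors = Vec::new();
    for _ in 0..n {
        let remaining = deadline.saturating_sub(start.elapsed());
        match rx.recv_timeout(remaining) {
            Ok(Ok(result)) => return Ok(result),
            Ok(Err(e)) => errors.push(e),
            Err(_) => {
                errors.push("race deadline exceeded".to_string());
                break;
            }
        }
    }
    Err(errors)
}
```

An early error (such as a rate limit) does not end the race; only a success or the deadline does.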

Resource Budgets

Resource	Limit
RAM (resident)	256MB
RAM (with SQLite mmap)	512MB
Disk I/O	<10MB/s sustained
File descriptors	<200
CPU cores	2+

Testing Strategy

Integration Tests with Fixtures

tests/
├── fixtures/
│   ├── duckduckgo/
│   │   ├── rust_programming.json
│   │   └── error_rate_limit.json
│   ├── brave/
│   │   └── rust_programming.json
│   └── synthetic/
│       └── rust_programming.json
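├── integration/
│   ├── main.rs                # Test target for `cargo test --test integration`
│   ├── providers_test.rs      # Replays recorded responses; real API calls only when recording
│   ├── cache_test.rs          # SQLite (temporary file); PostgreSQL via testcontainers
│   └── embeddings_test.rs     # Ollama integration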
└── e2e/
    └── mcp_protocol_test.rs   # Full MCP tool flow

Test Command

# Run all tests
cargo test

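# Run with recording (saves fixtures; requires a `recording` feature in Cargo.toml)
RECORD_FIXTURES=1 cargo test --features recording

# Run integration tests only (the `integration` target is tests/integration/main.rs)
cargo test --test integration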
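The recording workflow implies a small switch inside the test harness. A minimal sketch, assuming fixtures live at `tests/fixtures/<provider>/<name>.json` as in the tree above and that any value of `RECORD_FIXTURES` other than empty or `0` enables recording; `FixtureMode`, `fixture_mode` and `fixture_path` are hypothetical helpers, and replayed bodies would presumably be served through `wiremock` from the dev-dependencies.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq, Eq)]
pub enum FixtureMode {
    /// Call the real provider API and overwrite the fixture file.
    Record,
    /// Serve the stored fixture with no network access (the CI default).
    Replay,
}

/// Interprets RECORD_FIXTURES; call as
/// `fixture_mode(std::env::var("RECORD_FIXTURES").ok().as_deref())`.
pub fn fixture_mode(record_var: Option<&str>) -> FixtureMode {
    match record_var {
        Some(v) if !v.is_empty() && v != "0" => FixtureMode::Record,
        _ => FixtureMode::Replay,
    }
}

/// `<root>/tests/fixtures/<provider>/<name>.json`
pub fn fixture_path(root: &Path, provider: &str, name: &str) -> PathBuf {
    root.join("tests")
        .join("fixtures")
        .join(provider)
        .join(format!("{name}.json"))
}
```

Taking the variable's value as a parameter keeps the mode testable without mutating the process environment.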

Dependencies

Core Dependencies (Cargo.toml)

[dependencies]
# Async runtime
tokio = { version = "1", features = ["full"] }

# MCP protocol
rmcp = "0.1"

# HTTP client (with rustls, NOT OpenSSL)
reqwest = { version = "0.12", features = ["rustls-tls", "json"], default-features = false }

# Serialization
serde = { version = "1", features = ["derive"] }
serde_json = "1"

# Database
sqlx = { version = "0.7", features = ["runtime-tokio", "tls-rustls", "sqlite", "postgres"] }  # tls-rustls: PostgreSQL TLS without OpenSSL

# Embeddings
async-openai = "0.20"  # Ollama-compatible

# Cryptography
blake3 = "1"

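# Retry jitter (the circuit breaker itself is implemented in adapters/circuit_breaker/)
rand = "0.8"

# Secrets handling ("serde" feature lets SecretString fields deserialize from config)
secrecy = { version = "0.8", features = ["serde"] }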

# Error handling
thiserror = "1"
anyhow = "1"

# Logging
tracing = "0.1"
tracing-subscriber = "0.3"

# Date/time
chrono = { version = "0.4", features = ["serde"] }

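# HNSW for semantic search
instant-distance = "0.6"

# Async trait methods usable as trait objects (Arc<dyn SearchProvider>)
async-trait = "0.1"

# Query validation (NFC normalization)
unicode-normalization = "0.1"

# Compression for results_json
zstd = "0.13"

# Configuration file (config/default.toml)
toml = "0.8"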

[dev-dependencies]
tokio-test = "0.4"
testcontainers = "0.15"
wiremock = "0.5"

[features]
# Enables fixture recording in tests (RECORD_FIXTURES=1 cargo test --features recording)
recording = []

Implementation Phases

Phase 1: Foundation (~1-2 days)

Project setup (Cargo.toml, structure)
Domain models (domain/models.rs)
Port traits (ports/)
Error types (infrastructure/error.rs)
Configuration infrastructure

Phase 2: Core Adapters (~2-3 days)

DuckDuckGo provider (adapters/providers/duckduckgo.rs)
Brave provider (adapters/providers/brave.rs)
Synthetic provider (adapters/providers/synthetic.rs)
Result normalization (adapters/providers/normalized_result.rs)
Provider pool with weighting (adapters/providers/provider_pool.rs)

Phase 3: Circuit Breaker + Cache (~2-3 days)

Lock-free circuit breaker (adapters/circuit_breaker/in_memory.rs)
SQLite cache backend (adapters/cache/sqlite_backend.rs)
HNSW semantic index (adapters/cache/semantic_index.rs)
Cache migrations (adapters/cache/migrations.rs)

Phase 4: Embeddings (~1-2 days)

Ollama embedding provider (adapters/embeddings/ollama.rs)
OpenAI embedding provider (adapters/embeddings/openai.rs)
Embedding quantization utilities

Phase 5: MCP Integration (~2 days)

Stdio transport (adapters/mcp/stdio.rs)
web_search tool
search_similar tool (RAG)
search_news tool
search_cached tool
Admin tools (cache_clear, cache_migrate, etc.)
Observability tools (get_provider_status, get_server_stats)

Phase 6: Testing + Polish (~1-2 days)

Integration test fixtures
E2E MCP protocol tests
Documentation
README.md with usage examples

Verification Steps

1. Build and Test

cargo build --release
cargo test
cargo clippy -- -D warnings

2. Run MCP Server

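RUST_LOG=debug ./target/release/mcp-websearch

The server speaks JSON-RPC over stdin/stdout, so when started by hand it simply waits for input; this only confirms that it starts. All logging must go to stderr, because anything written to stdout corrupts the protocol stream.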

3. Test with MCP Client

Add to Claude Desktop's claude_desktop_config.json:

{
  "mcpServers": {
    "websearch": {
      "command": "/path/to/mcp-websearch"
    }
  }
}

4. Verify Tools

In Claude:

"Search for Rust async programming" → Should use web_search tool
"Find similar results to previous search" → Should use search_similar tool
"What is the cache status?" → Should use get_cache_stats tool
"Check provider health" → Should use get_provider_status tool

5. Performance Validation

# Cache hit should be <10ms
# Cache miss should be <500ms
# Provider fallback should work (disable one provider, search should succeed)

Critical Files to Modify/Create

New files (all) - Greenfield project
Key implementation files:
- src/domain/models.rs - Core types
- src/ports/search_provider.rs - Provider abstraction
- src/adapters/providers/*.rs - Provider implementations
- src/adapters/cache/sqlite_backend.rs - Cache implementation
- src/adapters/mcp/tools/*.rs - MCP tool handlers
- migrations/V1__initial_schema.sql - Database schema

Review Feedback Incorporated

Architecture Review

✅ Provider capabilities exposed via trait for routing decisions
✅ Two-tier circuit breaker with fallback chain
✅ CacheBackend trait supports exact and semantic lookup
✅ MCP adapter is thin, delegates to application layer

Security Review

✅ SSRF protection via URL allowlisting
✅ Prompt injection detection in input validation
✅ Budget controls for paid embedding APIs
✅ secrecy crate for API key memory protection
✅ File permission enforcement (0o600)
✅ rustls instead of OpenSSL

Performance Review

✅ Multi-tier cache (L1 HNSW, L2 quantized, L3 SQLite)
✅ Embedding quantization (96% memory reduction)
✅ Lock-free circuit breaker
✅ Shared HTTP client with connection pooling
✅ Provider racing with timeout (not waiting for all)
✅ WAL mode for SQLite concurrency

Agent Guardrails (MANDATORY)

Cloned from agent-guardrails-template

The Four Laws of Agent Safety

Read Before Editing - Never modify code without reading first
Stay in Scope - Only touch authorized files
Verify Before Committing - Test all changes
Halt When Uncertain - Ask instead of guessing

Pre-Execution Checklist

Before ANY file modification, verify:

#	Check	Requirement
1	READ FIRST	NEVER edit a file without reading it first
2	SCOPE LOCK	Only modify files explicitly in scope
3	NO FEATURE CREEP	Do NOT add features, refactor, or "improve" unrelated code
4	SUB-500 LINES	No file should exceed 500 lines
5	TEST BEFORE COMMIT	All tests must pass before committing
6	CHECK FAILURE REGISTRY	Review known bugs for affected files
7	VERIFY FIXES INTACT	Confirm previous fixes not being undone

Git Safety Rules

Rule	Description
NO FORCE PUSH	Never use `git push --force`
NO AMEND	Do not amend commits you didn't create this session
NO CONFIG CHANGES	Do not modify git config
NO PUSH WITHOUT PERMISSION	Only push if user explicitly requests
NO SKIP HOOKS	Never use `--no-verify`
NO REBASE	Never rebase shared branches

Code Safety Rules

Rule	Rationale
EXACT REPLACEMENT	Use provided code exactly - no "improvements"
NO NEW IMPORTS	Unless explicitly required by the task
PRESERVE FORMATTING	Match existing indentation and style
NO SECRETS	Never commit credentials, keys, tokens

HALT Conditions

Stop immediately and report to user if ANY of these occur:

Target file does not exist
Line numbers don't match expected
File has unexpected modifications
Syntax check fails after edit
Any test fails after edit
Merge conflicts encountered
Uncertain about ANY step
Edit tool reports "string not found"
Permission denied errors

Guardrails Files to Create

mcp-websearch/
├── CLAUDE.md                      # Agent guidelines (see below)
├── .guardrails/
│   ├── pre-work-check.md          # Pre-work checklist
│   ├── failure-registry.jsonl     # Bug database
│   └── prevention-rules/
│       ├── pattern-rules.json     # Regex-based rules
│       ├── semantic-rules.json    # AST-based rules
│       └── extracted-rules.json   # Rules from AGENT_GUARDRAILS.md
└── docs/
    └── AGENT_GUARDRAILS.md        # Full guardrails documentation

CLAUDE.md Content

Create /mnt/ollama/git/mcp-websearch/CLAUDE.md:

# mcp-websearch - Rust MCP Web Search Server

## Project Navigation

- **INDEX_MAP.md**: Find documents by keyword/category (TODO: create)
- **HEADER_MAP.md**: Find specific sections with file:line references (TODO: create)
- **Flow**: INDEX_MAP → identify doc → HEADER_MAP → read specific section

## Context

Rust-based MCP server for AI assistants to perform intelligent web searches with:
- Multiple providers (DuckDuckGo, Brave, Synthetic Search)
- Weighted round-robin with automatic fallback
- Semantic caching with RAG integration
- Enterprise-grade observability via MCP tools

## Stack

- **Language**: Rust 1.80+
- **Runtime**: Tokio async
- **Architecture**: Hexagonal (ports/adapters)
- **Database**: SQLite (default) / PostgreSQL (optional)
- **Embeddings**: Ollama (default) / OpenAI (optional)
- **Protocol**: MCP via stdio

## Quick Commands

```bash
# Build
cargo build --release

# Test
cargo test

# Run
RUST_LOG=info ./target/release/mcp-websearch

# Lint
cargo clippy -- -D warnings
```

## File Organization

src/
├── domain/         # Core business logic (NO external deps)
├── application/    # Use case orchestration
├── ports/          # Trait definitions
├── adapters/       # External integrations (MCP, providers, cache, embeddings)
└── infrastructure/ # Cross-cutting concerns

## Constraint: Sub-500 Lines

All source files MUST be under 500 lines. Split files that exceed this limit.

## Agent Guardrails

**MANDATORY**: Read `.guardrails/pre-work-check.md` before any modifications.

### The Four Laws

1. **Read Before Editing** - Never modify without reading
2. **Stay in Scope** - Only touch authorized files
3. **Verify Before Committing** - Test all changes
4. **Halt When Uncertain** - Ask instead of guessing

### Git Rules

- NO force push
- NO skip hooks (`--no-verify`)
- NO amend commits you didn't create
- NO push without permission

### Code Rules

- NO secrets in code
- NO feature creep
- NO unnecessary imports
- Production code BEFORE test code

## Key Files

| File | Purpose |
|------|---------|
| `src/domain/models.rs` | Core types (SearchResult, SearchQuery, CacheEntry) |
| `src/ports/search_provider.rs` | SearchProvider trait |
| `src/adapters/providers/*.rs` | DuckDuckGo, Brave, Synthetic implementations |
| `src/adapters/cache/sqlite_backend.rs` | SQLite cache with semantic search |
| `src/adapters/mcp/tools/*.rs` | MCP tool handlers |
| `migrations/*.sql` | Database schema |

## Testing

- Integration tests with recorded fixtures in `tests/fixtures/`
- Run with `RECORD_FIXTURES=1 cargo test --features recording` to capture real API responses
- All external provider calls should use fixtures in CI
