Helm Deployment Guide

This guide covers deploying the MCP Context Server using the official Helm chart.

Prerequisites

  • Kubernetes 1.21+
  • Helm 3.8+
  • PV provisioner support (for SQLite persistence)
  • Optional: PostgreSQL database for production deployments

Installation

Add Repository (Future)

# Once published to a Helm repository
helm repo add mcp-context https://alex-feel.github.io/mcp-context-server
helm repo update

Install from Local Chart

# Clone the repository
git clone https://github.com/alex-feel/mcp-context-server.git
cd mcp-context-server

# Install with default values (SQLite)
helm install mcp ./deploy/helm/mcp-context-server

# Install with custom values
helm install mcp ./deploy/helm/mcp-context-server -f my-values.yaml

Configuration Profiles

SQLite (Development)

Best for single-user deployments and development:

helm install mcp ./deploy/helm/mcp-context-server \
  -f ./deploy/helm/mcp-context-server/values-sqlite.yaml

PostgreSQL (Production)

Best for multi-user and production deployments:

helm install mcp ./deploy/helm/mcp-context-server \
  -f ./deploy/helm/mcp-context-server/values-postgresql.yaml \
  --set storage.postgresql.host=your-postgres-host \
  --set storage.postgresql.password=your-password
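Passing the password with --set leaves it in shell history. The chart's values also expose storage.postgresql.existingSecret and existingSecretKey, so one alternative is to create a Kubernetes Secret first and reference it at install time. The following is a minimal sketch under that assumption; the function name and its arguments are hypothetical, and the key name postgresql-password is taken from the documented default for existingSecretKey.

```shell
# Sketch: install the PostgreSQL profile with the password held in a Secret.
# install_with_pg_secret is a hypothetical helper, not part of the chart.
install_with_pg_secret() {
  secret_name=$1   # e.g. postgres-credentials
  pg_host=$2       # e.g. postgres.example.com
  pg_password=$3

  # Store the password under the key the chart reads by default.
  kubectl create secret generic "$secret_name" \
    --from-literal=postgresql-password="$pg_password" || return 1

  # Reference the Secret instead of setting storage.postgresql.password.
  helm install mcp ./deploy/helm/mcp-context-server \
    -f ./deploy/helm/mcp-context-server/values-postgresql.yaml \
    --set storage.postgresql.host="$pg_host" \
    --set storage.postgresql.existingSecret="$secret_name"
}
```

Called as install_with_pg_secret postgres-credentials postgres.example.com "$PGPASSWORD", with the password read from an environment variable or a prompt rather than typed inline.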

With Semantic Search

Enable embedding-based semantic search, served by an Ollama sidecar:

helm install mcp ./deploy/helm/mcp-context-server \
  --set search.semantic.enabled=true \
  --set ollama.enabled=true

With Summary Generation

Enable automatic LLM-based summarization with the Ollama sidecar:

helm install mcp ./deploy/helm/mcp-context-server \
  --set ollama.enabled=true \
  --set search.summary.enabled=true \
  --set search.summary.provider=ollama \
  --set search.summary.model=qwen3:0.6b

Note: When using the Ollama sidecar, models must be pulled manually before use:

# Pull the summary model
kubectl exec -it <pod-name> -c ollama -- ollama pull qwen3:0.6b
# Pull the embedding model (if semantic search is also enabled)
kubectl exec -it <pod-name> -c ollama -- ollama pull qwen3-embedding:0.6b
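To avoid looking up the pod name by hand, the two pulls can be wrapped in a small helper. This is a sketch only: pull_ollama_models is a hypothetical function, and it assumes the pod carries the app.kubernetes.io/instance=<release> label, the same label used for PVC cleanup elsewhere in this guide.

```shell
# Sketch: pull Ollama models into the sidecar of the first pod of a release.
# Assumes pods are labelled app.kubernetes.io/instance=<release>.
pull_ollama_models() {
  release=$1
  shift

  # Resolve the first matching pod name.
  pod=$(kubectl get pods -l "app.kubernetes.io/instance=$release" \
    -o jsonpath='{.items[0].metadata.name}') || return 1
  [ -n "$pod" ] || { echo "no pod found for release $release" >&2; return 1; }

  # Pull each requested model inside the ollama container.
  for model in "$@"; do
    kubectl exec "$pod" -c ollama -- ollama pull "$model" || return 1
  done
}
```

For the models named above: pull_ollama_models mcp qwen3:0.6b qwen3-embedding:0.6b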

Full-Featured Production

All features enabled with PostgreSQL:

helm install mcp ./deploy/helm/mcp-context-server \
  --set storage.backend=postgresql \
  --set storage.postgresql.enabled=true \
  --set storage.postgresql.host=postgres.example.com \
  --set storage.postgresql.password=secure-password \
  --set search.fts.enabled=true \
  --set search.semantic.enabled=true \
  --set search.hybrid.enabled=true \
  --set search.summary.enabled=true \
  --set ollama.enabled=true \
  --set ingress.enabled=true \
  --set "ingress.hosts[0].host=mcp.example.com"

Values Reference

Image Configuration

image:
  repository: ghcr.io/alex-feel/mcp-context-server
  tag: ""  # Defaults to Chart.appVersion
  pullPolicy: IfNotPresent

imagePullSecrets: []

Service Configuration

service:
  type: ClusterIP  # or LoadBalancer, NodePort
  port: 8000

Storage Configuration

SQLite

storage:
  backend: sqlite
  sqlite:
    enabled: true
    path: /data/context_storage.db
    persistence:
      enabled: true
      size: 1Gi
      storageClassName: ""  # Use default
      accessModes:
        - ReadWriteOnce

PostgreSQL

storage:
  backend: postgresql
  postgresql:
    enabled: true
    host: "postgresql-host"
    port: "5432"
    user: "postgres"
    password: ""  # Set via --set or secret
    database: "mcp_context"
    sslMode: "prefer"
    # Use existing secret instead of password
    existingSecret: ""
    existingSecretKey: "postgresql-password"

Search Configuration

search:
  fts:
    enabled: true
    language: "english"
  semantic:
    enabled: false
    model: "qwen3-embedding:0.6b"
    dim: 1024
  hybrid:
    enabled: false
    rrfK: 60

Summary Generation

search:
  summary:
    enabled: true             # ENABLE_SUMMARY_GENERATION
    provider: "ollama"        # SUMMARY_PROVIDER: ollama, openai, or anthropic
    model: "qwen3:0.6b"      # SUMMARY_MODEL
    maxTokens: 2000           # SUMMARY_MAX_TOKENS (tokens, 50-5000)
    minContentLength: 500     # SUMMARY_MIN_CONTENT_LENGTH (chars, 0=always)
    prompt: ""                # SUMMARY_PROMPT: empty uses built-in default
    timeout: 240.0            # SUMMARY_TIMEOUT_S
    retryMaxAttempts: 5       # SUMMARY_RETRY_MAX_ATTEMPTS
    retryBaseDelay: 1.0       # SUMMARY_RETRY_BASE_DELAY_S
    maxConcurrent: 3          # SUMMARY_MAX_CONCURRENT (1-20)

For OpenAI or Anthropic providers, configure API keys:

summarySecrets:
  openaiApiKey: ""        # For summary.provider=openai
  anthropicApiKey: ""     # For summary.provider=anthropic
  existingSecret: ""      # Use pre-existing Kubernetes secret

Ollama Sidecar

ollama:
  enabled: false
  image:
    repository: ollama/ollama
    tag: "latest"
    pullPolicy: IfNotPresent
  resources:
    requests:
      cpu: "500m"
      memory: "2Gi"
    limits:
      cpu: "2000m"
      memory: "4Gi"
  persistence:
    enabled: true
    size: 5Gi
    storageClassName: ""

GPU Acceleration

The Ollama sidecar supports GPU acceleration for faster inference. Configure GPU resources in values.yaml:

NVIDIA GPU:

ollama:
  enabled: true
  resources:
    limits:
      nvidia.com/gpu: "1"

Requires the NVIDIA device plugin for Kubernetes.

AMD GPU:

ollama:
  enabled: true
  image:
    tag: "rocm"
  resources:
    limits:
      amd.com/gpu: "1"

Requires the AMD GPU device plugin for Kubernetes.

For complete GPU setup instructions, troubleshooting, and Intel/Vulkan considerations, see the GPU Acceleration Guide.

Ingress Configuration

ingress:
  enabled: false
  className: ""
  annotations: {}
  hosts:
    - host: mcp-context-server.local
      paths:
        - path: /
          pathType: Prefix
  tls: []

Resource Limits

resources:
  requests:
    cpu: "100m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

Health Probes

probes:
  liveness:
    enabled: true
    path: /health
    initialDelaySeconds: 30
    periodSeconds: 30
    failureThreshold: 3
  readiness:
    enabled: true
    path: /health
    initialDelaySeconds: 10
    periodSeconds: 10
    failureThreshold: 3
  startup:
    enabled: true
    path: /health
    initialDelaySeconds: 5
    periodSeconds: 10
    failureThreshold: 30
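The startup probe settings determine how long the container may take to become healthy before Kubernetes restarts it; liveness and readiness checks only begin once the startup probe has succeeded. A rough upper bound can be worked out from the values above, treating the result as an approximation rather than an exact timing:

```shell
# Values copied from the startup probe defaults above.
initial_delay=5
period=10
failure_threshold=30

# Approximate time allowed for startup before the container is restarted.
startup_budget=$((initial_delay + period * failure_threshold))
echo "startup budget: ${startup_budget}s"   # prints "startup budget: 305s"
```

If the server needs longer than this to start in your environment, raising probes.startup.failureThreshold extends the window without slowing down later liveness checks.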

Security Context

podSecurityContext:
  runAsNonRoot: true
  runAsUser: 10001
  runAsGroup: 10001
  fsGroup: 10001

securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: false
  capabilities:
    drop:
      - ALL

Service Account

serviceAccount:
  create: true
  name: ""
  annotations: {}

Common Operations

Upgrade

helm upgrade mcp ./deploy/helm/mcp-context-server

Rollback

# Roll back to revision 1 (list revisions with: helm history mcp)
helm rollback mcp 1

Uninstall

helm uninstall mcp

Note: PersistentVolumeClaims are not deleted automatically. To delete them, along with the data they hold:

kubectl delete pvc -l app.kubernetes.io/instance=mcp

View Rendered Templates

helm template mcp ./deploy/helm/mcp-context-server

Debug Installation

helm install mcp ./deploy/helm/mcp-context-server --debug --dry-run

Examples

External PostgreSQL with Existing Secret

# values-production.yaml
storage:
  backend: postgresql
  sqlite:
    enabled: false
  postgresql:
    enabled: true
    host: "postgres.production.svc.cluster.local"
    port: "5432"
    user: "mcp_user"
    database: "mcp_context"
    sslMode: "require"
    existingSecret: "postgres-credentials"
    existingSecretKey: "password"

search:
  fts:
    enabled: true
  semantic:
    enabled: true
  hybrid:
    enabled: true

ollama:
  enabled: true
  resources:
    requests:
      memory: "4Gi"
    limits:
      memory: "8Gi"

ingress:
  enabled: true
  className: "nginx"
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
  hosts:
    - host: mcp.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: mcp-tls
      hosts:
        - mcp.example.com

Install with:

helm install mcp ./deploy/helm/mcp-context-server -f values-production.yaml

Development with Port Forward

# Install minimal setup
helm install mcp ./deploy/helm/mcp-context-server

# Port forward to local machine (runs in the foreground; use a second terminal for the next step)
kubectl port-forward svc/mcp 8000:8000

# Test connection
curl http://localhost:8000/health
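Right after installation the pod may still be starting, so a single curl can fail even though the deployment is fine. A small polling loop, sketched below, retries the health endpoint until it answers. The function name wait_for_health and the one-second interval are illustrative choices, not part of the chart.

```shell
# Sketch: poll a health URL until it responds or the attempts run out.
# wait_for_health is a hypothetical helper.
wait_for_health() {
  url=$1
  attempts=${2:-30}   # default: 30 attempts, one second apart
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl fail on HTTP errors; -sS hides progress but keeps errors.
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "healthy after $((i + 1)) attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "not healthy after $attempts attempts" >&2
  return 1
}
```

With the port forward running: wait_for_health http://localhost:8000/health 60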

Troubleshooting

Pod Stuck in Pending

Check whether the PersistentVolumeClaim is bound:

kubectl get pvc
kubectl describe pvc mcp-data

Ollama Out of Memory

Raise the sidecar's memory limit in your values file:

ollama:
  resources:
    limits:
      memory: "8Gi"

Health Check Failing

Check logs:

kubectl logs -l app.kubernetes.io/name=mcp-context-server
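Logs alone do not always show why a probe fails; pod events often do. The sketch below gathers pod status, events, and recent logs in one pass, using the same label selector as the command above. The function name is hypothetical, and the 100-line tail is an arbitrary choice.

```shell
# Sketch: collect basic diagnostics for the server pods.
# collect_mcp_diagnostics is a hypothetical helper.
collect_mcp_diagnostics() {
  selector="app.kubernetes.io/name=mcp-context-server"

  # Pod phase, restarts, and node placement.
  kubectl get pods -l "$selector" -o wide

  # Events, including probe failures and scheduling problems.
  kubectl describe pods -l "$selector"

  # Recent log lines from every container, sidecar included.
  kubectl logs -l "$selector" --all-containers --tail=100
}
```

Running collect_mcp_diagnostics > mcp-diagnostics.txt captures the output in a file that can be attached to a bug report.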

Additional Resources

Related Documentation