This guide covers deploying the MCP Context Server using the official Helm chart.

Prerequisites:
- Kubernetes 1.21+
- Helm 3.8+
- PV provisioner support (for SQLite persistence)
- Optional: PostgreSQL database for production deployments
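
The minimum versions listed above can be checked before installing. The following is a minimal sketch, not part of the chart: the function names `version_ge`, `normalize_version`, and `check_helm` are hypothetical, and it assumes a `sort` that supports `-V` (GNU coreutils or a recent BSD `sort`).

```shell
#!/usr/bin/env bash
# Sketch: compare an installed tool version against a required minimum.
# All function names here are illustrative, not part of the Helm chart.

# version_ge A B: succeeds when version A is greater than or equal to B.
# Relies on `sort -V` (version sort) being available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# normalize_version V: strip a leading "v" and any build suffix,
# e.g. "v3.8.0+gd141386" becomes "3.8.0".
normalize_version() {
  printf '%s\n' "$1" | sed -E 's/^v//; s/[+-].*$//'
}

# check_helm: report whether the local Helm client meets the 3.8 minimum.
check_helm() {
  local v
  v="$(normalize_version "$(helm version --short)")"
  if version_ge "$v" "3.8.0"; then
    echo "helm $v OK"
  else
    echo "helm $v is older than 3.8"
  fi
}
```

Calling `check_helm` prints a one-line verdict for the Helm client. The same `version_ge` helper can be applied to the Kubernetes server version, and `kubectl get storageclass` lists the provisioners available for the SQLite volume.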
# Once published to a Helm repository
helm repo add mcp-context https://alex-feel.github.io/mcp-context-server
helm repo update

# Clone the repository
git clone https://github.com/alex-feel/mcp-context-server.git
cd mcp-context-server
# Install with default values (SQLite)
helm install mcp ./deploy/helm/mcp-context-server
# Install with custom values
helm install mcp ./deploy/helm/mcp-context-server -f my-values.yaml

Best for single-user deployments and development:
helm install mcp ./deploy/helm/mcp-context-server \
  -f ./deploy/helm/mcp-context-server/values-sqlite.yaml

Best for multi-user and production deployments:
helm install mcp ./deploy/helm/mcp-context-server \
  -f ./deploy/helm/mcp-context-server/values-postgresql.yaml \
  --set storage.postgresql.host=your-postgres-host \
  --set storage.postgresql.password=your-password

Enable AI-powered semantic search with the Ollama sidecar:
helm install mcp ./deploy/helm/mcp-context-server \
  --set search.semantic.enabled=true \
  --set ollama.enabled=true

Enable automatic LLM-based summarization with the Ollama sidecar:
helm install mcp ./deploy/helm/mcp-context-server \
  --set ollama.enabled=true \
  --set search.summary.enabled=true \
  --set search.summary.provider=ollama \
  --set search.summary.model=qwen3:0.6b

Note: When using the Ollama sidecar, models must be pulled manually before use:
# Pull the summary model
kubectl exec -it <pod-name> -c ollama -- ollama pull qwen3:0.6b
# Pull the embedding model (if semantic search is also enabled)
kubectl exec -it <pod-name> -c ollama -- ollama pull qwen3-embedding:0.6b

All features enabled with PostgreSQL:
helm install mcp ./deploy/helm/mcp-context-server \
  --set storage.backend=postgresql \
  --set storage.postgresql.enabled=true \
  --set storage.postgresql.host=postgres.example.com \
  --set storage.postgresql.password=secure-password \
  --set search.fts.enabled=true \
  --set search.semantic.enabled=true \
  --set search.hybrid.enabled=true \
  --set search.summary.enabled=true \
  --set ollama.enabled=true \
  --set ingress.enabled=true \
  --set 'ingress.hosts[0].host=mcp.example.com'

image:
  repository: ghcr.io/alex-feel/mcp-context-server
  tag: ""  # Defaults to Chart.appVersion
  pullPolicy: IfNotPresent

imagePullSecrets: []

service:
  type: ClusterIP  # or LoadBalancer, NodePort
  port: 8000

storage:
  backend: sqlite
  sqlite:
    enabled: true
    path: /data/context_storage.db
    persistence:
      enabled: true
      size: 1Gi
      storageClassName: ""  # Use default
      accessModes:
        - ReadWriteOnce

storage:
  backend: postgresql
  postgresql:
    enabled: true
    host: "postgresql-host"
    port: "5432"
    user: "postgres"
    password: ""  # Set via --set or secret
    database: "mcp_context"
    sslMode: "prefer"
    # Use existing secret instead of password
    existingSecret: ""
    existingSecretKey: "postgresql-password"

search:
  fts:
    enabled: true
    language: "english"
  semantic:
    enabled: false
    model: "qwen3-embedding:0.6b"
    dim: 1024
  hybrid:
    enabled: false
    rrfK: 60

search:
  summary:
    enabled: true          # ENABLE_SUMMARY_GENERATION
    provider: "ollama"     # SUMMARY_PROVIDER: ollama, openai, or anthropic
    model: "qwen3:0.6b"    # SUMMARY_MODEL
    maxTokens: 2000        # SUMMARY_MAX_TOKENS (tokens, 50-5000)
    minContentLength: 500  # SUMMARY_MIN_CONTENT_LENGTH (chars, 0=always)
    prompt: ""             # SUMMARY_PROMPT: empty uses built-in default
    timeout: 240.0         # SUMMARY_TIMEOUT_S
    retryMaxAttempts: 5    # SUMMARY_RETRY_MAX_ATTEMPTS
    retryBaseDelay: 1.0    # SUMMARY_RETRY_BASE_DELAY_S
    maxConcurrent: 3       # SUMMARY_MAX_CONCURRENT (1-20)

For OpenAI or Anthropic providers, configure API keys:
summarySecrets:
  openaiApiKey: ""     # For summary.provider=openai
  anthropicApiKey: ""  # For summary.provider=anthropic
  existingSecret: ""   # Use pre-existing Kubernetes secret

ollama:
  enabled: false
  image:
    repository: ollama/ollama
    tag: "latest"
    pullPolicy: IfNotPresent
  resources:
    requests:
      cpu: "500m"
      memory: "2Gi"
    limits:
      cpu: "2000m"
      memory: "4Gi"
  persistence:
    enabled: true
    size: 5Gi
    storageClassName: ""

The Ollama sidecar supports GPU acceleration for faster inference. Configure GPU resources in values.yaml:
NVIDIA GPU:

ollama:
  enabled: true
  resources:
    limits:
      nvidia.com/gpu: "1"

Requires the NVIDIA device plugin for Kubernetes.
AMD GPU:

ollama:
  enabled: true
  image:
    tag: "rocm"
  resources:
    limits:
      amd.com/gpu: "1"

Requires the AMD GPU device plugin for Kubernetes.

For complete GPU setup instructions, troubleshooting, and Intel/Vulkan considerations, see the GPU Acceleration Guide.
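
Before requesting a GPU limit, it can help to confirm that a device plugin is actually advertising the resource on at least one node. The following is a rough sketch under stated assumptions: the helper name `has_gpu_resource` is hypothetical, and it assumes the usual `kubectl describe nodes` layout in which capacity lines look like `nvidia.com/gpu:  1`.

```shell
#!/usr/bin/env bash
# Sketch: detect whether any node advertises a non-zero GPU extended resource.
# The function name is illustrative; it is not provided by the chart or kubectl.

# has_gpu_resource: reads `kubectl describe nodes` output on stdin and
# succeeds if a line such as "nvidia.com/gpu: 1" or "amd.com/gpu: 2" is present.
has_gpu_resource() {
  grep -Eq '(nvidia|amd)\.com/gpu:[[:space:]]*[1-9]'
}
```

A typical invocation would be `kubectl describe nodes | has_gpu_resource && echo "GPU resource available"`. If the check fails, a pod with a GPU limit will stay in Pending until the matching device plugin is installed.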
ingress:
  enabled: false
  className: ""
  annotations: {}
  hosts:
    - host: mcp-context-server.local
      paths:
        - path: /
          pathType: Prefix
  tls: []

resources:
  requests:
    cpu: "100m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

probes:
  liveness:
    enabled: true
    path: /health
    initialDelaySeconds: 30
    periodSeconds: 30
    failureThreshold: 3
  readiness:
    enabled: true
    path: /health
    initialDelaySeconds: 10
    periodSeconds: 10
    failureThreshold: 3
  startup:
    enabled: true
    path: /health
    initialDelaySeconds: 5
    periodSeconds: 10
    failureThreshold: 30

podSecurityContext:
  runAsNonRoot: true
  runAsUser: 10001
  runAsGroup: 10001
  fsGroup: 10001

securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: false
  capabilities:
    drop:
      - ALL

serviceAccount:
  create: true
  name: ""
  annotations: {}

# Upgrade the release
helm upgrade mcp ./deploy/helm/mcp-context-server

# Roll back to revision 1
helm rollback mcp 1

# Uninstall the release
helm uninstall mcp

Note: PersistentVolumeClaims are not deleted automatically. To remove data:

kubectl delete pvc -l app.kubernetes.io/instance=mcp

# Render the templates locally
helm template mcp ./deploy/helm/mcp-context-server

# Simulate an install with debug output
helm install mcp ./deploy/helm/mcp-context-server --debug --dry-run

# values-production.yaml
storage:
  backend: postgresql
  sqlite:
    enabled: false
  postgresql:
    enabled: true
    host: "postgres.production.svc.cluster.local"
    port: "5432"
    user: "mcp_user"
    database: "mcp_context"
    sslMode: "require"
    existingSecret: "postgres-credentials"
    existingSecretKey: "password"

search:
  fts:
    enabled: true
  semantic:
    enabled: true
  hybrid:
    enabled: true

ollama:
  enabled: true
  resources:
    requests:
      memory: "4Gi"
    limits:
      memory: "8Gi"

ingress:
  enabled: true
  className: "nginx"
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
  hosts:
    - host: mcp.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: mcp-tls
      hosts:
        - mcp.example.com

Install with:

helm install mcp ./deploy/helm/mcp-context-server -f values-production.yaml

# Install minimal setup
helm install mcp ./deploy/helm/mcp-context-server
# Port forward to local machine
kubectl port-forward svc/mcp 8000:8000
# Test connection
curl http://localhost:8000/health

Check PersistentVolumeClaim:

kubectl get pvc
kubectl describe pvc mcp-data

Increase resources:

ollama:
  resources:
    limits:
      memory: "8Gi"

Check logs:

kubectl logs -l app.kubernetes.io/name=mcp-context-server

- GPU Acceleration Guide - NVIDIA, AMD, and Intel GPU setup for Docker and Kubernetes
- Kubernetes Deployment Guide - General Kubernetes deployment
- Docker Deployment Guide - Alternative Docker Compose deployment
- Summary Generation Guide - LLM-based summary configuration
- Database Backends - Database configuration details
- Semantic Search - Ollama and embedding configuration