
Monitoring

Griffen Fargo edited this page Apr 21, 2026 · 2 revisions


Self-hosted monitoring with Prometheus, Grafana, and Alertmanager for strut stacks.

Quick Reference

strut monitoring deploy --env prod
strut monitoring add-target my-stack --env prod
strut monitoring alert-channel add email \
  --to alerts@yourdomain.com \
  --from monitoring@yourdomain.com \
  --resend-api-key re_xxx
strut monitoring alert-channel test email
strut monitoring status
strut monitoring reload

Architecture

| Component | Purpose | Default Port |
| --- | --- | --- |
| Prometheus | Metrics collection, time-series DB, alert rules | 9090 |
| Grafana | Dashboards and visualization | 3000 |
| Alertmanager | Alert routing, grouping, notifications | 9093 |
| Node Exporter | System metrics (CPU, memory, disk, network) | 9100 |
| cAdvisor | Per-container resource metrics | 8080 |
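
As a quick reachability check for the components above, the sketch below maps each component name to a health URL and probes it with curl. The ports come from the table; the `localhost` default and the function names `health_url` and `check_components` are assumptions made for illustration. The paths are the upstream defaults for each project (`/-/healthy` for Prometheus and Alertmanager, `/api/health` for Grafana, `/metrics` for Node Exporter, `/healthz` for cAdvisor), and a strut deployment may expose them differently.

```shell
#!/bin/sh
# Host to probe; "localhost" is an assumption, override HOST for a remote machine.
HOST="${HOST:-localhost}"

# Map a component name to its health URL (default ports from the table above).
health_url() {
  case "$1" in
    prometheus)    echo "http://$HOST:9090/-/healthy" ;;
    grafana)       echo "http://$HOST:3000/api/health" ;;
    alertmanager)  echo "http://$HOST:9093/-/healthy" ;;
    node-exporter) echo "http://$HOST:9100/metrics" ;;
    cadvisor)      echo "http://$HOST:8080/healthz" ;;
    *)             return 1 ;;
  esac
}

# Probe every component and print one status line per component.
check_components() {
  for name in prometheus grafana alertmanager node-exporter cadvisor; do
    if curl -fsS -o /dev/null --max-time 5 "$(health_url "$name")"; then
      echo "$name: ok"
    else
      echo "$name: unreachable"
    fi
  done
}
```

Running `check_components` on the monitoring host prints one line per component. It complements `strut monitoring status` and is not a replacement for it.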

Installation

Step 1: Deploy

strut monitoring deploy --env prod

Step 2: Configure Environment

Edit .monitoring-prod.env:

RESEND_API_KEY=re_xxx
ALERT_EMAIL_TO=alerts@yourdomain.com
ALERT_EMAIL_FROM=monitoring@yourdomain.com
GRAFANA_ADMIN_USER=admin
GRAFANA_ADMIN_PASSWORD=<secure-password>

# Optional
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/xxx
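
Before restarting the stack it can help to confirm that the env file has no missing or placeholder values. The following is a minimal sketch, assuming the five keys listed above the `# Optional` marker are required and that `re_xxx` and `<...>` values are placeholders. The function name `check_monitoring_env` is hypothetical and is not part of strut.

```shell
#!/bin/sh
# Report required keys that are missing or still set to a placeholder.
# Returns 0 when every key has a real value, 1 otherwise.
check_monitoring_env() {
  env_file=$1
  status=0
  for key in RESEND_API_KEY ALERT_EMAIL_TO ALERT_EMAIL_FROM \
             GRAFANA_ADMIN_USER GRAFANA_ADMIN_PASSWORD; do
    # Take the last assignment of the key, if any.
    value=$(sed -n "s/^${key}=//p" "$env_file" | tail -n 1)
    case $value in
      '')             echo "missing: $key"; status=1 ;;
      re_xxx|'<'*'>') echo "placeholder: $key"; status=1 ;;
    esac
  done
  return $status
}
```

For example, `check_monitoring_env .monitoring-prod.env` prints one line per problem key and exits non-zero if any are found.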

Step 3: Add Stacks

strut monitoring add-target my-stack --env prod
strut monitoring add-target another-stack --env prod

Step 4: Access Grafana

Open http://<vps-ip>:3000 and log in with the credentials from the env file.

Alert Channels

Email (Resend SMTP)

strut monitoring alert-channel add email \
  --to alerts@yourdomain.com \
  --from monitoring@yourdomain.com \
  --resend-api-key re_xxx

SMTP settings: host smtp.resend.com, port 587, username resend, password set to your Resend API key, TLS required.
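
To verify the SMTP credentials independently of strut, the settings above can be exercised directly with curl. This is a sketch under assumptions: it requires a curl build with SMTP support, reads the key from a `RESEND_API_KEY` environment variable, and the function names `build_test_message` and `send_test_email` are hypothetical.

```shell
#!/bin/sh
# Build a minimal email message (headers, blank line, body) on stdout.
build_test_message() {
  from=$1
  to=$2
  printf 'From: %s\r\nTo: %s\r\nSubject: strut monitoring SMTP test\r\n\r\nTest message sent via smtp.resend.com.\r\n' \
    "$from" "$to"
}

# Send the message through smtp.resend.com:587, refusing to proceed without TLS.
send_test_email() {
  from=$1
  to=$2
  build_test_message "$from" "$to" |
    curl --silent --show-error --ssl-reqd \
      --url 'smtp://smtp.resend.com:587' \
      --user "resend:${RESEND_API_KEY:?set RESEND_API_KEY first}" \
      --mail-from "$from" --mail-rcpt "$to" \
      --upload-file -
}
```

Calling `send_test_email monitoring@yourdomain.com alerts@yourdomain.com` should deliver one message if the key and sender domain are valid. `strut monitoring alert-channel test email` remains the supported way to test the full alert path.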

Slack

strut monitoring alert-channel add slack \
  --webhook-url https://hooks.slack.com/services/xxx

Generic Webhooks

strut monitoring alert-channel add webhook \
  --url https://your-service.com/alerts --method POST

Alert Routing by Severity

strut monitoring alert-route critical email,slack
strut monitoring alert-route warning email
strut monitoring alert-route info slack

| Severity | Triggers | Examples |
| --- | --- | --- |
| Critical | Immediate action | Service down, DB unreachable, disk >95% |
| Warning | Attention needed | CPU >80% for 5 min, memory >90%, disk >85% |
| Info | Informational | Backup completed, deployment successful |
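
To make the disk thresholds in the severity examples concrete, the sketch below classifies a disk-usage percentage the same way. The function names `disk_severity` and `disk_used_percent` are hypothetical. The cut-offs mirror the examples in the table and are not necessarily the exact rules strut installs.

```shell
#!/bin/sh
# Classify disk usage (integer percent used) with the table's example thresholds:
# above 95 is critical, above 85 is warning, anything else is ok.
disk_severity() {
  used=$1
  if [ "$used" -gt 95 ]; then
    echo critical
  elif [ "$used" -gt 85 ]; then
    echo warning
  else
    echo ok
  fi
}

# Percent used on a mount point such as "/" (POSIX df output, % sign stripped).
disk_used_percent() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}
```

For example, `disk_severity "$(disk_used_percent /)"` prints the severity the root filesystem would fall under.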

Default Alert Rules

  • ServiceDown: up == 0 for 2+ minutes → critical
  • HighCPU: CPU >80% for 5+ minutes → warning
  • HighMemory: Memory >90% for 5+ minutes → warning
  • DiskSpaceLow: Disk <15% free for 5+ minutes → warning

Custom Alert Rules

Create stacks/monitoring/prometheus/alerts/custom.yml:

groups:
  - name: custom_alerts
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected"

Reload: strut monitoring reload
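
A syntax error in a rules file can prevent Prometheus from loading it, so it is worth validating the file before reloading. The sketch below wraps `promtool check rules`, which ships with Prometheus. It assumes `promtool` is on the PATH, and the wrapper name `validate_rules` is hypothetical.

```shell
#!/bin/sh
# Validate a Prometheus rules file with promtool before reloading.
# Returns promtool's exit status, or 127 if promtool is not installed.
validate_rules() {
  rules_file=$1
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found; install it or run it from the prom/prometheus image" >&2
    return 127
  fi
  promtool check rules "$rules_file"
}

# Usage (path from the step above):
#   validate_rules stacks/monitoring/prometheus/alerts/custom.yml && strut monitoring reload
```

The example expression `rate(...) > 0.05` compares an absolute per-series rate (0.05 responses per second with a 5xx status), not a 5% error ratio. A ratio would need the 5xx rate divided by the total request rate.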

Pre-configured Dashboards

  • Stack Overview — all stacks at a glance (health, resources, alerts, uptime)
  • Stack Health — per-stack service availability, response times, error rates
  • Resource Usage — CPU/memory/disk/network per service with trends
  • Backup Status — success rate, last backup time, verification, storage

Cross-VPS Monitoring

Set Up Node Exporter and cAdvisor on Remote VPS

ssh ubuntu@<remote-vps>
docker run -d --name node-exporter --restart unless-stopped -p 9100:9100 prom/node-exporter
docker run -d --name cadvisor --restart unless-stopped -p 8080:8080 \
  -v /:/rootfs:ro -v /var/run:/var/run:ro -v /sys:/sys:ro \
  -v /var/lib/docker/:/var/lib/docker:ro gcr.io/cadvisor/cadvisor

SSH Tunnel (Recommended)

autossh -M 0 -f -N \
  -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" \
  -L 9100:localhost:9100 ubuntu@<remote-vps>
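
The command above forwards only Node Exporter, and its local port 9100 can collide with a Node Exporter already listening on the monitoring host. The sketch below builds a variant that also forwards cAdvisor and lets the local ports be chosen. The function name `tunnel_command` is hypothetical, and the source does not say which local ports `strut monitoring add-target --vps` expects, so treat any alternative ports as an assumption to verify.

```shell
#!/bin/sh
# Print an autossh command that forwards the remote Node Exporter (9100) and
# cAdvisor (8080) to local ports. Local ports default to the same numbers and
# can be moved when those are already taken on the monitoring host.
tunnel_command() {
  remote=$1
  local_node_port=${2:-9100}
  local_cadvisor_port=${3:-8080}
  printf 'autossh -M 0 -f -N -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" -L %s:localhost:9100 -L %s:localhost:8080 %s\n' \
    "$local_node_port" "$local_cadvisor_port" "$remote"
}
```

For instance, `tunnel_command ubuntu@vps-2 19100 18080` prints a command that exposes the remote exporters on local ports 19100 and 18080 (both chosen arbitrarily here). Once the tunnel is running, `curl -s localhost:19100/metrics | head` should show Node Exporter output.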

Add Remote Stack

strut monitoring add-target my-stack --env prod --vps vps-2

Maintenance

strut monitoring update --env prod
strut monitoring restart --env prod
strut monitoring backup prometheus --env prod
strut monitoring backup grafana --env prod

Security

  1. Use a strong Grafana admin password
  2. Don't expose metrics endpoints publicly
  3. Use SSH tunnels for cross-VPS (not open ports)
  4. Restrict metrics ports via firewall
  5. Secure webhook URLs and API keys
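
Item 4 could be implemented with ufw along the following lines. This is a sketch that only prints the commands for review. The function name `ufw_rules` is hypothetical, and the port list (Node Exporter 9100, cAdvisor 8080) is taken from the architecture table.

```shell
#!/bin/sh
# Print ufw commands that allow the metrics ports only from the monitoring host.
# The allow rule is emitted before the deny rule because ufw applies the first
# matching rule.
ufw_rules() {
  monitoring_ip=$1
  for port in 9100 8080; do
    echo "ufw allow from $monitoring_ip to any port $port proto tcp"
    echo "ufw deny $port/tcp"
  done
}
```

`ufw_rules 203.0.113.10` (a documentation address standing in for the monitoring host) prints four commands to run with sudo on the remote VPS. Ports published by Docker with `-p` are typically handled by Docker's own iptables rules and can bypass ufw. When metrics are reached only through an SSH tunnel, binding the containers to loopback (for example `-p 127.0.0.1:9100:9100`) is a more reliable way to keep them private.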

Related

  • Security Posture: strut posture runs a scheduled/CI security audit across every stack (placeholder secrets, exposed ports, missing resource limits, env files in git).
  • Notifications: strut can fire Slack/Discord/webhook events on deploy.success, backup.success, health.fail, drift.detected, etc., independent of the monitoring stack. Useful when you want deploy pings without running Prometheus.
  • Debugging: strut status-all gives a one-shot cross-stack overview without requiring Grafana dashboards.
