Running Sentinel

Status: Active | Owner: HyperFleet Team | Last Updated: 2026-03-12

Audience: Developers running Sentinel for development and testing purposes.

IMPORTANT: This guide covers development and testing only. Production deployments are handled via CI/CD pipelines.

This guide enables developers to run Sentinel both locally (for development) and on GKE (for integration) before merging code changes.


Running Locally

Prerequisites for Running Locally

  • Go 1.25+ installed
  • Podman (for running broker locally and integration tests)
  • Make utility
  • Access to a message broker (RabbitMQ recommended for local development)
  • HyperFleet API accessible (local or remote instance)

Tip: To simulate HyperFleet API responses, use the mock HyperFleet API, which serves a configurable number of resources.

1. Setting Up a Message Broker

Option A: RabbitMQ via Podman (Recommended)

podman run -d --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:3-management

Verify: Open the RabbitMQ management console at http://localhost:15672 and log in with username guest and password guest.

Option B: Google Pub/Sub Emulator (gcloud CLI)

# Start the emulator (runs on port 8085 by default)
gcloud beta emulators pubsub start --project=test-project --host-port=localhost:8085

Note: The emulator runs in the foreground. Open a new terminal for subsequent commands.

Option C: Google Pub/Sub Emulator via Podman

You can also run the emulator in Podman, as documented in the broker library:

export PUBSUB_PROJECT_ID=test-project
export PUBSUB_EMULATOR_HOST=localhost:8085

podman run --rm --name pubsub-emulator -d -p 8085:8085 google/cloud-sdk:emulators \
  /bin/bash -c "gcloud beta emulators pubsub start --project=test-project --host-port='0.0.0.0:8085'"

2. Configuring Sentinel

Broker configuration happens in two places:

  • broker.yaml - For non-sensitive settings (broker type, project ID, etc.)
  • Environment variables - For sensitive settings (credentials, URLs with passwords)
    • For the Pub/Sub emulator, environment variables are also required for the Google SDK to work properly

Note: If using real Google Pub/Sub (not the emulator), you need GCP credentials in place via gcloud auth application-default login or by setting GOOGLE_APPLICATION_CREDENTIALS to a service account key file.
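
Before starting Sentinel against real Pub/Sub, a quick credential check can save a failed run. The sketch below is an assumption-laden helper, not part of Sentinel: the function name has_gcp_credentials is hypothetical, and it assumes the Linux/macOS location where gcloud auth application-default login stores its file ($HOME/.config/gcloud/application_default_credentials.json).

```shell
# Hypothetical pre-flight helper (not part of Sentinel).
# Returns 0 if a credential source for real Google Pub/Sub appears to exist.
has_gcp_credentials() {
  # Case 1: an explicit service account key file is configured.
  if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    [ -r "$GOOGLE_APPLICATION_CREDENTIALS" ] && return 0
    echo "GOOGLE_APPLICATION_CREDENTIALS points to an unreadable file" >&2
    return 1
  fi
  # Case 2: application-default credentials created by gcloud
  # (assumed Linux/macOS path).
  [ -r "$HOME/.config/gcloud/application_default_credentials.json" ]
}

# Usage:
#   has_gcp_credentials || echo "run: gcloud auth application-default login" >&2
```

The check only confirms that a file is present and readable. It does not verify that the credentials are valid or have Pub/Sub permissions.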

Step 1: Generate the OpenAPI Client

Before running Sentinel, you must generate the OpenAPI client:

make generate

This extracts the OpenAPI spec from the hyperfleet-api-spec Go module (version pinned in go.mod) and generates the Go client code.

Step 2: Configure Broker

The default broker.yaml is configured for RabbitMQ. Choose your broker below:

For RabbitMQ (default, no changes needed to broker.yaml):

export BROKER_RABBITMQ_URL="amqp://guest:guest@localhost:5672/"

For Google Pub/Sub Emulator (requires broker.yaml modification):

  1. Edit broker.yaml to use googlepubsub:

    broker:
      type: googlepubsub
      googlepubsub:
        project_id: test-project
  2. Set the emulator host (required for the Google SDK):

    export PUBSUB_EMULATOR_HOST=localhost:8085

Step 3: Set Topic Name

Set the topic name for event publishing. Both examples set the same variable, so export only the one for the resource type you are running:

# For clusters
export HYPERFLEET_BROKER_TOPIC=hyperfleet-dev-${USER}-clusters

# For nodepools
export HYPERFLEET_BROKER_TOPIC=hyperfleet-dev-${USER}-nodepools

This sets the full topic name where events will be published (e.g., hyperfleet-dev-rafael-clusters). See Naming Strategy for details.
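
The examples above follow the pattern hyperfleet-{env}-{username}-{resourceType}. As a minimal sketch, a small helper can build the name so the clusters and nodepools variants stay consistent. The function name hyperfleet_topic is hypothetical and not part of Sentinel.

```shell
# Hypothetical helper (not part of Sentinel): builds a topic name of the form
# hyperfleet-<env>-<user>-<resource_type>, matching the examples above.
hyperfleet_topic() {
  env_name="$1"       # e.g. dev
  user_name="$2"      # e.g. the value of $USER
  resource_type="$3"  # clusters or nodepools
  printf 'hyperfleet-%s-%s-%s\n' "$env_name" "$user_name" "$resource_type"
}

# Usage: export the topic for the resource type you are running.
HYPERFLEET_BROKER_TOPIC="$(hyperfleet_topic dev "${USER:-unknown}" clusters)"
export HYPERFLEET_BROKER_TOPIC
echo "$HYPERFLEET_BROKER_TOPIC"
```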

3. Running Sentinel

Option A: Build and Run Binary

# Build the binary
make build

# Run Sentinel (uses broker.yaml from current directory)
./bin/sentinel serve --config=configs/dev-example.yaml

# With custom log settings
./bin/sentinel serve --config=configs/dev-example.yaml --log-level=debug --log-format=json

Option B: Run Directly with Go

# Run with explicit broker config path
BROKER_CONFIG_FILE=broker.yaml go run ./cmd/sentinel serve --config=configs/dev-example.yaml

# With environment variables for logging
HYPERFLEET_LOG_LEVEL=debug HYPERFLEET_LOG_FORMAT=json go run ./cmd/sentinel serve --config=configs/dev-example.yaml

Logging Configuration

Flag          Environment Variable     Values                      Default
--log-level   HYPERFLEET_LOG_LEVEL     debug, info, warn, error    info
--log-format  HYPERFLEET_LOG_FORMAT    text, json                  json
--log-output  HYPERFLEET_LOG_OUTPUT    stdout, stderr              stdout

Precedence (highest to lowest): flags → environment variables → defaults
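
The lookup order can be illustrated with a short shell function. This is only a sketch of the rule, not Sentinel's actual implementation. The function name resolve_setting is hypothetical, and treating an empty string as "not set" is an assumption made for the example.

```shell
# Hypothetical illustration of "flag, then environment variable, then default".
# An empty string is treated as "not set".
resolve_setting() {
  flag_value="$1"
  env_value="$2"
  default_value="$3"
  if [ -n "$flag_value" ]; then
    printf '%s\n' "$flag_value"
  elif [ -n "$env_value" ]; then
    printf '%s\n' "$env_value"
  else
    printf '%s\n' "$default_value"
  fi
}

# A flag beats the environment variable:
resolve_setting "debug" "warn" "info"   # prints debug
# No flag, so the environment variable wins:
resolve_setting "" "warn" "info"        # prints warn
# Neither is set, so the default applies:
resolve_setting "" "" "info"            # prints info
```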

4. Verification Steps

Check Health Endpoints

# Liveness
curl http://localhost:8080/healthz

# Readiness
curl http://localhost:8080/readyz

Expected responses:

# /healthz (healthy, HTTP 200)
{"status":"ok"}

# /healthz (stale poll, HTTP 503)
{"status":"poll stale"}

# /readyz (healthy, HTTP 200)
{"status":"ok","checks":{...}}

Note: /readyz returns 503 until the first successful poll completes.
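
Because /readyz starts at 503, scripts that depend on Sentinel being ready usually need to poll. The following is a minimal sketch of such a loop. The function name wait_for_ready and the default attempt count and delay are arbitrary choices for illustration, not values taken from Sentinel.

```shell
# Hypothetical helper: poll a URL until it returns HTTP 200 or attempts run out.
# Arguments: URL, number of attempts (default 30), delay in seconds (default 2).
wait_for_ready() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-2}"
  code=""
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s silences progress output, -o /dev/null discards the body,
    # -w '%{http_code}' prints only the HTTP status code.
    code="$(curl -s -o /dev/null -w '%{http_code}' "$url" 2>/dev/null || true)"
    if [ "$code" = "200" ]; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after $attempts attempt(s), last status: ${code:-none}" >&2
  return 1
}

# Usage:
#   wait_for_ready http://localhost:8080/readyz 30 2
```

If the HyperFleet API is unreachable, the first poll may never succeed, in which case the loop times out.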

Check Metrics Endpoint

curl http://localhost:8080/metrics | grep hyperfleet_sentinel

Without HyperFleet API running, you will see error metrics:

hyperfleet_sentinel_api_errors_total{error_type="fetch_error",resource_selector="all",resource_type="clusters"} 1

Note: This is expected when running locally without a HyperFleet API instance. The api_errors_total metric indicates that Sentinel is running correctly but cannot reach the API.

With HyperFleet API running, you will see additional metrics:

hyperfleet_sentinel_pending_resources{...} 0
hyperfleet_sentinel_events_published_total{...} 0
hyperfleet_sentinel_poll_duration_seconds_bucket{...}
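
To turn the metrics output into a single number (for example, in a smoke test), the samples of one metric can be summed with awk. This is a sketch under two assumptions: the endpoint emits the standard Prometheus text format, and samples carry no trailing timestamp. The function name sum_metric is hypothetical.

```shell
# Hypothetical helper: sum all samples of one metric from Prometheus
# text-format input on stdin. Matches the metric name exactly, with or
# without a {label} set, and skips "# HELP" / "# TYPE" comment lines.
# Assumes the sample value is the last field on the line (no timestamp).
sum_metric() {
  awk -v name="$1" '
    /^#/ { next }
    {
      metric = $1
      sub(/[{].*/, "", metric)   # strip the label set, if any
      if (metric == name) total += $NF
    }
    END { print total + 0 }
  '
}

# Usage:
#   curl -s http://localhost:8080/metrics | sum_metric hyperfleet_sentinel_api_errors_total
```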

Monitor Logs

Watch console output for startup and broker connection messages.

Startup messages (always visible):

2025-12-17T14:07:30.136547Z INFO [sentinel] [dev] [hostname] Loading configuration from configs/dev-example.yaml
2025-12-17T14:07:30.137373Z INFO [sentinel] [dev] [hostname] Configuration loaded successfully: resource_type=clusters
2025-12-17T14:07:30.137382Z INFO [sentinel] [dev] [hostname] Starting HyperFleet Sentinel

Note: Log format can be configured via --log-format flag or HYPERFLEET_LOG_FORMAT environment variable. Use json for production (structured logging) and text for development (human-readable).

For RabbitMQ, you should also see the broker connection log:

[watermill] 2025/12/01 15:28:26.051755 connection.go:99: level=INFO msg="Connected to AMQP"

For Google Pub/Sub, there is no explicit connection log. The Google Pub/Sub SDK does not expose connection events, so the publisher initializes silently. You can verify it's working by checking the health endpoints (curl http://localhost:8080/healthz and curl http://localhost:8080/readyz) and metrics. For debugging, you can enable SDK debug logging with these environment variables:

export GOOGLE_SDK_GO_LOGGING_LEVEL=debug
export GRPC_GO_LOG_VERBOSITY_LEVEL=99
export GODEBUG=http2debug=1

Note: If the HyperFleet API is not running, Sentinel still starts, but API polling fails without stopping the process. The failures are visible in metrics as api_errors_total. This is expected for local broker validation.


Running on GKE

Prerequisites for GKE

  • GKE cluster access (use the shared cluster below or your own)
  • GKE cluster must have Workload Identity enabled (required for Pub/Sub authentication; the shared dev cluster already has this enabled)
  • gcloud CLI configured and authenticated
  • kubectl configured for the cluster
  • podman for building images
  • helm for deploying the chart
  • Access to Google Container Registry (GCR) for your project

1. Set Up Environment Variables

Set these variables once and use them throughout the deployment:

# GCP project ID
export GCP_PROJECT=hcm-hyperfleet

# Your namespace: hyperfleet-{env}-{username}
export NAMESPACE=hyperfleet-dev-${USER}

# Image tag: {namespace}-{git-sha-short} (follows naming convention)
export IMAGE_TAG=${NAMESPACE}-$(git rev-parse --short HEAD)
# Example: hyperfleet-dev-rafael-a1b2c3d (if USER=rafael)

Note: The image tag format {namespace}-{git-sha-short} follows the Naming Strategy convention to prevent collisions between developers.
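
Because NAMESPACE is derived from ${USER}, a username containing uppercase letters, dots, or underscores produces a name that Kubernetes rejects. Namespace names must be RFC 1123 DNS labels. The check below is a minimal sketch, and the function name is_valid_namespace is hypothetical.

```shell
# Hypothetical helper: returns 0 if the argument is a valid Kubernetes
# namespace name (RFC 1123 DNS label: lowercase alphanumerics and '-',
# starting and ending with an alphanumeric, at most 63 characters).
is_valid_namespace() {
  name="$1"
  [ "${#name}" -le 63 ] || return 1
  # LC_ALL=C keeps [a-z] from matching uppercase letters in some locales.
  printf '%s\n' "$name" | LC_ALL=C grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'
}

# Usage: validate the derived name before creating anything.
candidate="hyperfleet-dev-${USER:-unknown}"
if is_valid_namespace "$candidate"; then
  echo "namespace ok: $candidate"
else
  echo "invalid namespace: $candidate (set NAMESPACE manually)" >&2
fi
```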

2. Connect to GKE Cluster

A shared GKE cluster with Config Connector enabled is available for development and testing:

gcloud container clusters get-credentials hyperfleet-dev --zone=us-central1-a --project=${GCP_PROJECT}

Usage guidelines:

  • For personal work, create a namespace named after yourself to isolate resources
  • For team collaboration, use a designated namespace to separate resources among members

Note: This environment is scheduled for deletion every Friday at 8:00 PM (EST). See GKE deployment docs for more details.

3. Building Container Image

Option A: Using Makefile Targets (Recommended for Dev)

For pushing to your personal Quay registry:

# One-time login (required before pushing to Quay)
make quay-login

# Build and push to quay.io/${QUAY_USER}/${IMAGE_NAME}:dev-<commit>
QUAY_USER=${USER} make image-dev

This outputs the image tag (dev-<commit>) to use as image.tag in the Helm Deployment step or in your terraform.tfvars.

Option B: Manual Build for GCR

For pushing to Google Container Registry:

# Build for AMD64 (required for GKE)
podman build --platform linux/amd64 -t gcr.io/${GCP_PROJECT}/sentinel:${IMAGE_TAG} .

Note: If building on ARM64 Mac for AMD64 GKE, you must use --platform linux/amd64 to avoid architecture mismatch errors.

4. Authentication and Image Push

Note: If you used make image-dev (Option A above), authentication and push are handled automatically, provided you ran make quay-login first. Skip the rest of this step and continue with Configure Workload Identity.

Configure Authentication with GCR

gcloud auth configure-docker gcr.io

Push Image to Registry

podman push gcr.io/${GCP_PROJECT}/sentinel:${IMAGE_TAG}

5. Configure Workload Identity

When deploying Sentinel on GKE with real Google Pub/Sub, you need to configure authentication. Workload Identity Federation grants permissions directly to the Kubernetes ServiceAccount without needing intermediate GCP service accounts or annotations.

First, get your project number:

export GCP_PROJECT_NUMBER=$(gcloud projects describe ${GCP_PROJECT} \
  --format="value(projectNumber)")

Then grant the Pub/Sub publisher role to the K8s ServiceAccount:

gcloud projects add-iam-policy-binding ${GCP_PROJECT} \
  --role="roles/pubsub.publisher" \
  --member="principal://iam.googleapis.com/projects/${GCP_PROJECT_NUMBER}/locations/global/workloadIdentityPools/${GCP_PROJECT}.svc.id.goog/subject/ns/${NAMESPACE}/sa/sentinel-test" \
  --condition=None

Note: The principal:// format is: principal://iam.googleapis.com/projects/{PROJECT_NUMBER}/locations/global/workloadIdentityPools/{PROJECT_ID}.svc.id.goog/subject/ns/{NAMESPACE}/sa/{K8S_SA_NAME}
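
The member string is long and is used in three places in this guide (grant, verify, cleanup), so building it once reduces the chance of a mismatch. This is a sketch only. The function name wi_principal is hypothetical, and the project number in the usage line is a made-up placeholder.

```shell
# Hypothetical helper: builds the principal:// member string in the format
# described above.
# Arguments: project number, project ID, namespace, Kubernetes ServiceAccount.
wi_principal() {
  project_number="$1"
  project_id="$2"
  namespace="$3"
  k8s_sa="$4"
  printf 'principal://iam.googleapis.com/projects/%s/locations/global/workloadIdentityPools/%s.svc.id.goog/subject/ns/%s/sa/%s\n' \
    "$project_number" "$project_id" "$namespace" "$k8s_sa"
}

# Usage (123456789012 is a placeholder project number):
WI_MEMBER="$(wi_principal 123456789012 hcm-hyperfleet hyperfleet-dev-rafael sentinel-test)"
echo "$WI_MEMBER"
```

The resulting value can then be passed as --member="${WI_MEMBER}" to the add-iam-policy-binding and remove-iam-policy-binding commands.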

6. Helm Deployment

Deploy Sentinel using the image you built:

Option A: Using Quay Image (from make image-dev)

If you used make image-dev, deploy with:

helm upgrade --install sentinel-test ./charts \
  --namespace ${NAMESPACE} \
  --create-namespace \
  --set global.imageRegistry=quay.io \
  --set image.repository=${USER}/hyperfleet-sentinel \
  --set image.tag=dev-$(git rev-parse --short HEAD) \
  --set broker.type=googlepubsub \
  --set broker.googlepubsub.projectId=${GCP_PROJECT} \
  --set monitoring.podMonitoring.enabled=true

Option B: Using GCR Image (from manual build)

If you manually built and pushed to GCR, deploy with:

helm upgrade --install sentinel-test ./charts \
  --namespace ${NAMESPACE} \
  --create-namespace \
  --set image.repository=gcr.io/${GCP_PROJECT}/sentinel \
  --set image.tag=${IMAGE_TAG} \
  --set broker.type=googlepubsub \
  --set broker.googlepubsub.projectId=${GCP_PROJECT} \
  --set monitoring.podMonitoring.enabled=true

# For Prometheus Operator environments (OpenShift, vanilla Kubernetes):
helm upgrade --install sentinel-test ./charts \
  --namespace ${NAMESPACE} \
  --create-namespace \
  --set image.repository=gcr.io/${GCP_PROJECT}/sentinel \
  --set image.tag=${IMAGE_TAG} \
  --set broker.type=googlepubsub \
  --set broker.googlepubsub.projectId=${GCP_PROJECT} \
  --set monitoring.serviceMonitor.enabled=true \
  --set monitoring.serviceMonitor.additionalLabels.release=prometheus

Tip: The default topic is {namespace}-{resourceType} (e.g., hyperfleet-dev-rafael-clusters). Override with --set broker.topic=custom-topic. See Naming Strategy for details.

7. Verification Steps

Check Pod Status

kubectl get pods -n ${NAMESPACE} -l app.kubernetes.io/name=sentinel

View Pod Logs

kubectl logs -n ${NAMESPACE} -l app.kubernetes.io/name=sentinel -f

You should see the startup messages:

2025-12-17T14:07:30.136547Z INFO [sentinel] [0.1.0] [pod-name] Loading configuration from /app/configs/sentinel.yaml
2025-12-17T14:07:30.137373Z INFO [sentinel] [0.1.0] [pod-name] Configuration loaded successfully: resource_type=clusters
2025-12-17T14:07:30.137382Z INFO [sentinel] [0.1.0] [pod-name] Starting HyperFleet Sentinel

Note: Sentinel outputs minimal logs during normal operation. Use the health endpoints (/healthz, /readyz) and metrics to verify the service is running correctly. Configure --log-format=json for production deployments.

Verify Health Endpoints

Start port-forward in a separate terminal:

kubectl port-forward -n ${NAMESPACE} svc/sentinel-test 8080:8080

Check health endpoints:

# Liveness
curl http://localhost:8080/healthz

# Readiness
curl http://localhost:8080/readyz

Check metrics:

curl http://localhost:8080/metrics | grep hyperfleet_sentinel

Check Monitoring Resources

For GKE with Google Cloud Managed Prometheus (PodMonitoring):

kubectl get podmonitoring -n ${NAMESPACE}
kubectl describe podmonitoring -n ${NAMESPACE} -l app.kubernetes.io/name=sentinel

For Prometheus Operator environments (ServiceMonitor):

kubectl get servicemonitor -n ${NAMESPACE}
kubectl describe servicemonitor -n ${NAMESPACE} -l app.kubernetes.io/name=sentinel

Verify Metrics in Google Cloud Console

  1. Open the Metrics Explorer for your project
  2. In "Select a metric", search for hyperfleet_sentinel
  3. Select Prometheus Target > Hyperfleet > choose a metric (e.g., api_errors_total)

Verify Workload Identity IAM Binding

If the pod fails to authenticate with Pub/Sub, verify the IAM binding exists:

gcloud projects get-iam-policy ${GCP_PROJECT} \
  --flatten="bindings[].members" \
  --filter="bindings.members:principal://iam.googleapis.com/projects/${GCP_PROJECT_NUMBER}" \
  --format="table(bindings.role, bindings.members)"

You should see an entry with roles/pubsub.publisher for your namespace/SA.

8. Cleanup

Remove the deployment when done:

helm uninstall sentinel-test -n ${NAMESPACE}

Optionally, delete the image from the registry:

gcloud container images delete gcr.io/${GCP_PROJECT}/sentinel:${IMAGE_TAG} \
  --quiet --force-delete-tags

If you configured Workload Identity, remove the IAM binding:

# Remove the Pub/Sub publisher role from the K8s ServiceAccount
gcloud projects remove-iam-policy-binding ${GCP_PROJECT} \
  --role="roles/pubsub.publisher" \
  --member="principal://iam.googleapis.com/projects/${GCP_PROJECT_NUMBER}/locations/global/workloadIdentityPools/${GCP_PROJECT}.svc.id.goog/subject/ns/${NAMESPACE}/sa/sentinel-test"

Deployment Configuration

Basic Production Configuration

# sentinel-config.yaml
resource_type: clusters
poll_interval: 5s

# Watch all clusters (no filtering)
resource_selector: []

hyperfleet_api:
  endpoint: http://hyperfleet-api.hyperfleet-system.svc.cluster.local:8000
  timeout: 5s

# CloudEvent data payload using CEL expressions
message_data:
  resource_id: "resource.id"        # CEL expression accessing resource.id field
  resource_type: "resource.kind"   # CEL expression accessing resource.kind field
  generation: "resource.generation" # CEL expression accessing resource.generation field
  region: "resource.labels.region" # CEL expression accessing nested labels.region field

Development Environment Configuration

# sentinel-dev-config.yaml
resource_type: clusters
poll_interval: 10s      # Slower polling for dev

resource_selector:
  - label: environment
    value: development

hyperfleet_api:
  endpoint: http://hyperfleet-api.hyperfleet-system.svc.cluster.local:8000
  timeout: 5s

message_data:
  resource_id: "resource.id"
  resource_type: "resource.kind"
  generation: "resource.generation"
  environment: "resource.labels.environment"

Troubleshooting

Exec format error

Problem: Container fails to start with exec format error

Cause: Architecture mismatch - image was built for a different CPU architecture than the target

Solution: Ensure --platform linux/amd64 is used when building:

podman build --platform linux/amd64 -t gcr.io/${GCP_PROJECT}/sentinel:${IMAGE_TAG} .

Broker connection refused

Problem: Sentinel fails to start with "connection refused" errors for broker

Cause: Broker is not running or broker.yaml is configured for the wrong broker type

Solution:

  1. Verify the broker is running (RabbitMQ or Pub/Sub emulator)
  2. Ensure broker.yaml has the correct type (rabbitmq or googlepubsub)
  3. For Pub/Sub emulator, ensure PUBSUB_EMULATOR_HOST is set
  4. For RabbitMQ, ensure BROKER_RABBITMQ_URL is set or the URL in broker.yaml is correct
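
The environment-variable part of this checklist can be scripted as a pre-flight check. The following is a minimal sketch: the function name check_broker_env is hypothetical, and it only inspects the two variables named above. It does not read broker.yaml or contact the broker.

```shell
# Hypothetical pre-flight helper (not part of Sentinel).
# Argument: broker type as set in broker.yaml (rabbitmq or googlepubsub).
# Returns 0 if the matching environment variable is set, 1 if it is missing,
# and 2 for an unknown broker type.
check_broker_env() {
  broker_type="$1"
  case "$broker_type" in
    rabbitmq)
      if [ -z "${BROKER_RABBITMQ_URL:-}" ]; then
        echo "BROKER_RABBITMQ_URL is not set (fine only if broker.yaml contains the URL)" >&2
        return 1
      fi
      ;;
    googlepubsub)
      if [ -z "${PUBSUB_EMULATOR_HOST:-}" ]; then
        echo "PUBSUB_EMULATOR_HOST is not set (required for the emulator)" >&2
        return 1
      fi
      ;;
    *)
      echo "unknown broker type: $broker_type (expected rabbitmq or googlepubsub)" >&2
      return 2
      ;;
  esac
  echo "environment looks consistent for $broker_type"
}

# Usage:
#   check_broker_env rabbitmq && ./bin/sentinel serve --config=configs/dev-example.yaml
```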

Metrics not appearing in GMP

Problem: Metrics are not visible in Google Cloud Metrics Explorer

Cause: PodMonitoring not configured correctly or GMP collector not scraping

Solution:

  1. Verify PodMonitoring is created:

    kubectl get podmonitoring -n ${NAMESPACE}
  2. Check GMP collector logs:

    kubectl logs -n gmp-system -l app.kubernetes.io/name=collector
  3. Ensure the metrics endpoint is accessible:

    kubectl port-forward -n ${NAMESPACE} svc/sentinel-test 8080:8080
    curl http://localhost:8080/metrics

ConfigMap vs Environment Variable Configuration

Problem: Broker credentials not being picked up

Cause: Broker credentials must be set via environment variables, not ConfigMap

Solution: Use --set flags or a values file to set broker credentials:

--set broker.rabbitmq.url="amqp://user:pass@host:5672/"

HyperFleet API Connection Errors

Problem: Sentinel cannot connect to HyperFleet API

Solution:

  1. Verify the API endpoint is correct in your config

  2. For local execution, ensure the API is running

  3. For GKE, use the in-cluster service name:

    clients:
      hyperfleet_api:
        base_url: http://hyperfleet-api.hyperfleet-system.svc.cluster.local:8080

OpenAPI Client Not Generated

Problem: Build fails with missing package errors

Cause: OpenAPI client was not generated

Solution: Run the generate target before building:

make generate
make build