flux-commit-tracker

A Kubernetes controller built with controller-runtime that tracks the time taken for commits referenced in Flux Kustomizations (kustomize.toolkit.fluxcd.io/v1) to be applied to the cluster after they were merged to the source repository. This is used for a Platform SLO which tracks the speed of our deployment pipeline for Kubernetes changes.

Metrics, logs and traces are exported to an OTLP collector (e.g., the OpenTelemetry collector or Grafana Alloy).

How does it work?

Although this repository is open source, the process whose changes it tracks is not yet open source. Here is a brief description of that process, so you can understand what we are measuring.

  1. Engineers make changes to Jsonnet code in a repository grafana/deployment_tools. This is our infrastructure-as-code monorepo. The Jsonnet code describes Tanka environments.
  2. Pull requests get created and reviewed as normal.
  3. Once merged, a program called kube-manifests-exporter runs on the master branch. Only one of these runs can be in progress at a time, to prevent out-of-order deploys. A run finds all the commits made since the last run (including any that earlier runs missed because of this locking) and runs tk export on all the modified environments. When a library changes, the tk tool importers subcommand is used to determine which environments are affected.
  4. If there are any changes, a commit to kube-manifests with the updated Kubernetes YAMLs produced by tk export is made and pushed, and an OCI image containing the manifests of the changed cluster(s) is pushed to its own GAR repo. Grafana Cloud is made up of several clusters, and the YAMLs end up in a directory corresponding to their cluster. Alongside this, a file exporter-info.json is updated to contain details about the commits which were exported in this run.
  5. Flux, running in each tracked cluster, reconciles from the cluster's OCI image in GAR. It detects new revisions, determines if it needs to apply changes, and acts accordingly.

When Flux finds a new revision, it updates the status.lastAppliedRevision of its Kustomization objects. This creates an event in the cluster which we hook into here. We read the applied OCI revision as Flux sees it, fetch the OCI manifest by digest, extract the exporter-info layer (application/vnd.grafana.exporter-info.v1+json), and calculate timings from that payload.
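As a rough illustration of the "extract the exporter-info layer" step, the sketch below picks the layer with the exporter-info media type out of an OCI image manifest. It uses only the standard library and the standard OCI manifest fields (layers, mediaType, digest, size). The function and type names are made up for this example and are not taken from the controller's source, and the manifest in main is a placeholder.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// exporterInfoMediaType is the layer media type named in this README.
const exporterInfoMediaType = "application/vnd.grafana.exporter-info.v1+json"

// descriptor holds the OCI content descriptor fields needed here.
type descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
	Size      int64  `json:"size"`
}

// manifest is the subset of an OCI image manifest used in this sketch.
type manifest struct {
	Layers []descriptor `json:"layers"`
}

var errLayerNotFound = errors.New("exporter-info layer not found")

// findExporterInfoLayer (hypothetical helper) returns the first layer
// whose media type matches exporterInfoMediaType.
func findExporterInfoLayer(rawManifest []byte) (descriptor, error) {
	var m manifest
	if err := json.Unmarshal(rawManifest, &m); err != nil {
		return descriptor{}, fmt.Errorf("decoding manifest: %w", err)
	}
	for _, l := range m.Layers {
		if l.MediaType == exporterInfoMediaType {
			return l, nil
		}
	}
	return descriptor{}, errLayerNotFound
}

func main() {
	// Placeholder manifest; the digests are not real content addresses.
	raw := []byte(`{
	  "layers": [
	    {"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", "digest": "sha256:aaa", "size": 1024},
	    {"mediaType": "application/vnd.grafana.exporter-info.v1+json", "digest": "sha256:bbb", "size": 256}
	  ]
	}`)

	layer, err := findExporterInfoLayer(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(layer.Digest)
}
```

The returned digest is what a registry client would then use to fetch the layer blob itself.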

Metrics exported

  • flux_commit_tracker.e2e.export-time: The time taken for the deployment_tools commit to be applied by Flux in the cluster.

How to run

There are two main ways to run flux-commit-tracker:

Locally, talking to a remote cluster

  • You need access to a Kubernetes cluster configuration (e.g., via ~/.kube/config).

  • Specify the Kubernetes context to use with the --kube-context flag. If omitted, we will use the default context.

  • Ensure you have a Grafana observability stack running for OTLP export.

    You can use the provided docker-compose.yml:

    # Start the Grafana stack
    docker-compose up -d

    The controller's default OTLP endpoint points at this stack. Visit http://localhost:3000/explore to see the exported metrics, logs and traces.

In-Cluster

  • Deploy the controller within your Kubernetes cluster (e.g., using a Deployment).
  • It will automatically use the service account token mounted into the pod for Kubernetes API access. Do not provide --kube-context.
  • Ensure network connectivity to your OTLP collector endpoint if using OTLP.

Configuration

Configuration can be provided via command-line flags or environment variables:

Flag                   Environment Variable   Description
--health-addr          HEALTH_ADDR            Health address (default: :9440)
--kube-context         KUBE_CONTEXT           Kubernetes context (when running locally)
--log-level            LOG_LEVEL              Log level (e.g., info)
--metrics-addr         METRICS_ADDR           Metrics address (default: :8888)
--telemetry-endpoint   TELEMETRY_ENDPOINT     OTLP endpoint (host:port)
--telemetry-insecure   TELEMETRY_INSECURE     Use an insecure OTLP connection
--telemetry-mode       TELEMETRY_MODE         Telemetry mode
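The flag and environment variable pairing above can be sketched with the standard library as follows. This is only an illustration of the pattern, not the controller's actual parsing code, which may use a different library. The precedence shown (flag, then environment variable, then default) and the info default for the log level are assumptions. The names parseConfig and envOr are made up for this example.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// config holds a few of the documented options.
type config struct {
	HealthAddr  string
	MetricsAddr string
	LogLevel    string
}

// envOr returns the value of key from getenv, or fallback if it is empty.
func envOr(getenv func(string) string, key, fallback string) string {
	if v := getenv(key); v != "" {
		return v
	}
	return fallback
}

// parseConfig uses the environment value as each flag's default, so an
// explicit flag wins over the environment, which wins over the built-in
// default. This precedence is an assumption.
func parseConfig(args []string, getenv func(string) string) (config, error) {
	var c config
	fs := flag.NewFlagSet("flux-commit-tracker", flag.ContinueOnError)
	fs.StringVar(&c.HealthAddr, "health-addr", envOr(getenv, "HEALTH_ADDR", ":9440"), "health address")
	fs.StringVar(&c.MetricsAddr, "metrics-addr", envOr(getenv, "METRICS_ADDR", ":8888"), "metrics address")
	// "info" as the default log level is assumed, not documented.
	fs.StringVar(&c.LogLevel, "log-level", envOr(getenv, "LOG_LEVEL", "info"), "log level")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	return c, nil
}

func main() {
	cfg, err := parseConfig(os.Args[1:], os.Getenv)
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing getenv as a parameter keeps the function easy to test without touching the process environment.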

The controller supports several telemetry modes via the --telemetry-mode flag / TELEMETRY_MODE env var. This determines whether telemetry is printed to standard output or exported via OTLP.

  • otlp: Sends telemetry (logs, traces, metrics) to an OTLP collector (default endpoint: localhost:4317, configure via --telemetry-endpoint / TELEMETRY_ENDPOINT).
  • stdout-logs: Outputs only application logs to the console.
  • stdout-logs+otlp: Sends to OTLP and outputs logs to the console.

The -all variants are very verbose because the traces and metrics contain a lot of data.

  • stdout-all: Outputs all telemetry to the console (can be noisy).
  • stdout-all+otlp: Sends to OTLP and outputs all telemetry to console.
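One way to read the five mode strings is as a combination of three independent outputs. The sketch below maps each documented mode onto such a set. The telemetryOutputs type and parseTelemetryMode function are hypothetical names chosen for this example, and the controller's internal representation may differ.

```go
package main

import "fmt"

// telemetryOutputs describes where telemetry goes for a given mode.
type telemetryOutputs struct {
	StdoutLogs             bool // application logs on the console
	StdoutTracesAndMetrics bool // traces and metrics on the console
	OTLP                   bool // logs, traces and metrics sent via OTLP
}

// parseTelemetryMode maps a documented mode string to its outputs and
// rejects anything else.
func parseTelemetryMode(mode string) (telemetryOutputs, error) {
	switch mode {
	case "otlp":
		return telemetryOutputs{OTLP: true}, nil
	case "stdout-logs":
		return telemetryOutputs{StdoutLogs: true}, nil
	case "stdout-logs+otlp":
		return telemetryOutputs{StdoutLogs: true, OTLP: true}, nil
	case "stdout-all":
		return telemetryOutputs{StdoutLogs: true, StdoutTracesAndMetrics: true}, nil
	case "stdout-all+otlp":
		return telemetryOutputs{StdoutLogs: true, StdoutTracesAndMetrics: true, OTLP: true}, nil
	default:
		return telemetryOutputs{}, fmt.Errorf("unknown telemetry mode %q", mode)
	}
}

func main() {
	for _, mode := range []string{"otlp", "stdout-logs+otlp", "bogus"} {
		out, err := parseTelemetryMode(mode)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %+v\n", mode, out)
	}
}
```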

Example (running locally, OTLP mode):

go run \
  github.com/grafana/flux-commit-tracker/cmd/ \
    --kube-context=dev-us-central-0 \
    --telemetry-mode otlp

If using OTLP, examine telemetry in Grafana (e.g., http://localhost:3000 with the default Docker Compose setup).

When running against private GAR repositories, OCI auth uses authn.DefaultKeychain from go-containerregistry (same pattern as kube-manifests-exporter). Configure credentials via standard Docker credential sources, e.g. $DOCKER_CONFIG/config.json and credential helpers.

See --help for all options.

Running from a Docker image

We push a Docker image to the GitHub Container Registry:

# Mount a Docker config directory for registry credentials,
# e.g. ~/.docker:/root/.docker:ro
docker run --rm \
  -v /path/to/docker-config-dir:/root/.docker:ro \
  ghcr.io/grafana/flux-commit-tracker:latest \
  --log-level=debug

Semver-tagged images will be available for releases once we have them. The main tag is also available and tracks the latest commit on the main branch.

These docker images are attested using GitHub Artifact Attestations. You can verify our container images by using the gh CLI:

$ gh attestation verify --repo grafana/flux-commit-tracker oci://ghcr.io/grafana/flux-commit-tracker:<tag>
Loaded digest sha256:... for oci://ghcr.io/grafana/flux-commit-tracker:<tag>
Loaded 3 attestations from GitHub API
✓ Verification succeeded!

sha256:... was attested by:
REPO                                PREDICATE_TYPE                  WORKFLOW
grafana/flux-commit-tracker         https://slsa.dev/provenance/v1  .github/workflows/build.yml@<someref>

Attestations can also be viewed on the attestation page of this repository.

What this lets you do is trace a container image back to a build in this repository. You'll still need to verify the build steps that were used to build the image to ensure that the image is safe to use.
