A Kubernetes controller built with controller-runtime that tracks the time
taken for commits referenced in Flux Kustomizations
(kustomize.toolkit.fluxcd.io/v1) to be applied to the cluster after they were
merged to the source repository. This is used for a Platform SLO which tracks
the speed of our deployment pipeline for Kubernetes changes.
Metrics, logs and traces are exported to an OTLP collector (e.g., the OpenTelemetry collector or Grafana Alloy).
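If you are standing up a collector yourself, a minimal OpenTelemetry Collector configuration that accepts OTLP over gRPC might look like the following. This is a sketch for local experimentation, not the configuration shipped in this repository; the `debug` exporter simply prints received telemetry.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  debug: {}

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]
    metrics:
      receivers: [otlp]
      exporters: [debug]
    logs:
      receivers: [otlp]
      exporters: [debug]
```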
Although this repository is open source, the process whose changes it tracks is not (yet). Here's a brief description of that process so you can understand what we're measuring here.
- Engineers make changes to Jsonnet code in a repository, `grafana/deployment_tools`. This is our infrastructure-as-code monorepo. The Jsonnet code describes Tanka environments.
- Pull requests get created and reviewed as normal.
- Once merged, a program called `kube-manifests-exporter` runs on the `master` branch. Only one of these runs can be in progress at a time, to prevent out-of-order deploys. A run finds all the commits that happened since the last run (which didn't get them due to this locking) and runs `tk export` on all the modified environments. The `tk tool importers` subcommand is used to determine this in the case of library changes.
- If there are any changes, a commit to `kube-manifests` with the updated Kubernetes YAMLs produced by `tk export` is made and pushed, and an OCI image containing the manifests of the changed cluster(s) is pushed to its own GAR repo. Grafana Cloud is made up of several clusters, and the YAMLs end up in a directory corresponding to their cluster. Alongside this, a file `exporter-info.json` is updated to contain details about the commits which were exported in this run.
- Flux, running in each tracked cluster, reconciles from the cluster's OCI image in GAR. It detects new revisions, determines if it needs to apply changes, and acts accordingly.
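For illustration, the exporter-info payload might carry entries like the following. The field names here are invented for the example; the real schema is owned by `kube-manifests-exporter` and may differ.

```json
{
  "commits": [
    {
      "hash": "abc123",
      "merged_at": "2024-01-01T12:00:00Z"
    }
  ]
}
```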
When Flux finds a new revision, it updates the status.lastAppliedRevision of
its Kustomization objects. This creates an event in the cluster which we hook
into here. We read the applied OCI revision as Flux sees it, fetch the OCI
manifest by digest, extract the exporter-info layer
(application/vnd.grafana.exporter-info.v1+json), and calculate timings from
that payload.
`flux_commit_tracker.e2e.export-time`: The time taken for the `deployment_tools` commit to be applied by Flux in the cluster.
There are two main ways to run flux-commit-tracker:
- You need access to a Kubernetes cluster configuration (e.g., via `~/.kube/config`).
- Specify the Kubernetes context to use with the `--kube-context` flag. If omitted, we will use the default context.
- Ensure you have a Grafana observability stack running for OTLP export. You can use the provided `docker-compose.yml`:

  ```shell
  # Start the Grafana stack
  docker-compose up -d
  ```

  The default OTLP endpoint will write here. Visit http://localhost:3000/explore to see the exported metrics, logs and traces.
- Deploy the controller within your Kubernetes cluster (e.g., using a Deployment).
- It will automatically use the service account token mounted into the pod for Kubernetes API access. Do not provide `--kube-context`.
- Ensure network connectivity to your OTLP collector endpoint if using OTLP.
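A minimal Deployment sketch is below. The image tag, namespace-local collector endpoint, and ServiceAccount name are placeholders; adapt them to your cluster, and bind the ServiceAccount to RBAC that allows watching Flux Kustomization objects.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flux-commit-tracker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flux-commit-tracker
  template:
    metadata:
      labels:
        app: flux-commit-tracker
    spec:
      serviceAccountName: flux-commit-tracker  # needs RBAC to watch Kustomizations
      containers:
        - name: flux-commit-tracker
          image: ghcr.io/grafana/flux-commit-tracker:latest
          env:
            - name: TELEMETRY_MODE
              value: otlp
            - name: TELEMETRY_ENDPOINT
              value: otel-collector.monitoring.svc:4317  # placeholder endpoint
```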
Configuration can be provided via command-line flags or environment variables:
| Flag | Environment Variable | Description |
|---|---|---|
| `--health-addr` | `HEALTH_ADDR` | Health probe address (default: `:9440`) |
| `--kube-context` | `KUBE_CONTEXT` | Kubernetes context (when running locally) |
| `--log-level` | `LOG_LEVEL` | Log level (e.g., `info`) |
| `--metrics-addr` | `METRICS_ADDR` | Metrics address (default: `:8888`) |
| `--telemetry-endpoint` | `TELEMETRY_ENDPOINT` | OTLP endpoint (`host:port`) |
| `--telemetry-insecure` | `TELEMETRY_INSECURE` | Use an insecure OTLP connection |
| `--telemetry-mode` | `TELEMETRY_MODE` | Telemetry mode |
The controller supports several telemetry modes via the --telemetry-mode
flag / TELEMETRY_MODE env var. This determines whether telemetry is printed to
standard output or exported via OTLP.
- `otlp`: Sends telemetry (logs, traces, metrics) to an OTLP collector (default endpoint: `localhost:4317`, configurable via `--telemetry-endpoint`/`TELEMETRY_ENDPOINT`).
- `stdout-logs`: Outputs only application logs to the console.
- `stdout-logs+otlp`: Sends to OTLP and also outputs logs to the console.
The `-all` variants are very verbose because the traces and metrics contain a lot of data:

- `stdout-all`: Outputs all telemetry to the console (can be noisy).
- `stdout-all+otlp`: Sends to OTLP and also outputs all telemetry to the console.
Example (running locally, OTLP mode):

```shell
go run \
  github.com/grafana/flux-commit-tracker/cmd/ \
  --kube-context=dev-us-central-0 \
  --telemetry-mode otlp
```

If using OTLP, examine the telemetry in Grafana (e.g., http://localhost:3000 with the default Docker Compose setup).
When running against private GAR repositories, OCI auth uses `authn.DefaultKeychain` from `go-containerregistry` (the same pattern as `kube-manifests-exporter`). Configure credentials via standard Docker credential sources, e.g. `$DOCKER_CONFIG/config.json` and credential helpers.

See `--help` for all options.
We push a Docker image to the GitHub Container Registry:
```shell
# Mount your Docker config directory (e.g. ~/.docker) read-only so the
# container can read registry credentials
docker run --rm \
  -v /path/to/docker-config-dir:/root/.docker:ro \
  ghcr.io/grafana/flux-commit-tracker:latest \
  --log-level=debug
```

Semver-tagged images will be available for any releases once we have them, as is `main` for the latest commit on the main branch.
These Docker images are attested using GitHub Artifact Attestations.
You can verify our container images by using the gh CLI:
```shell
$ gh attestation verify --repo grafana/flux-commit-tracker oci://ghcr.io/grafana/flux-commit-tracker:<tag>
Loaded digest sha256:... for oci://ghcr.io/grafana/flux-commit-tracker:<sometag>
Loaded 3 attestations from GitHub API
✓ Verification succeeded!

sha256:... was attested by:
REPO                         PREDICATE_TYPE                  WORKFLOW
grafana/flux-commit-tracker  https://slsa.dev/provenance/v1  .github/workflows/build.yml@<somref>
```

Attestations can also be viewed on the attestation page of this repository.
This lets you trace a container image back to a specific build in this repository. You'll still need to review the build steps used to produce the image to satisfy yourself that it is safe to use.