CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

What this repo is

A hands-on Platform Engineering training built around kcp and KDP (Kubermatic Developer Platform). It is not application source code; it is a sequence of numbered labs (READMEs plus supporting YAML and makefiles) that a trainee runs inside a prepared environment. Almost every command in the labs assumes the workspace is bind-mounted at the absolute path /training/ rather than at the host's repo path.

Runtime environment

The labs are designed for a GitHub Codespace using .devcontainer/devcontainer.json, which mounts the repo at /training/ inside the image quay.io/kubermatic-labs/training-ghcs-platform-engineering-trainee-environment:1.0.0 and runs as root. That image preinstalls kubectl, helm, helmfile, terraform, kubeone, etcdctl, gcloud, yq, and kubectx/kubens, plus a set of kcp kubectl plugins installed via krew.

When editing or testing lab content, do not rewrite /training/... paths to relative paths — trainees copy/paste these commands verbatim, and the absolute path is part of the contract.

Configuration model: `.trainingrc`

Lab steps repeatedly source /root/.trainingrc and append exports to it. This file is the single source of truth for per-trainee configuration. Required exports (verified by the root makefile's verify target):

GCE_PROJECT, TRAINEE_NAME, TRAINEE_EMAIL, DOMAIN, DNS_ZONE_NAME, K8S_VERSION, TF_VERSION, K1_VERSION, KUBECONFIG, PLATFORM_DOMAIN, PROVIDER_DOMAIN, plus runtime-derived values such as INGRESS_IP, KCP_FRONT_PROXY_IP, OIDC_CLIENT_SECRET, PASSWORD_HASH, GOOGLE_CREDENTIALS.
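
A minimal sketch of the kind of check make verify performs on these exports. The function name `missing_exports` is hypothetical, and the actual makefile target may be implemented differently. It covers only the variables listed above as required, not the runtime-derived ones.

```shell
# Hypothetical helper: print every required variable that is unset or empty.
# The real verify target in the root makefile may work differently.
missing_exports() (
  for name in GCE_PROJECT TRAINEE_NAME TRAINEE_EMAIL DOMAIN DNS_ZONE_NAME \
              K8S_VERSION TF_VERSION K1_VERSION KUBECONFIG \
              PLATFORM_DOMAIN PROVIDER_DOMAIN; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
)
```

After `. /root/.trainingrc`, calling `missing_exports` should print nothing if every required export is present.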

Trainee-specific secrets live in /training/.secrets/ (gitignored): the GCE service-account JSON, an ssh keypair, the trainee's .trainingrc fragment, and every generated kubeconfig-*.yaml.

Common commands

# verify the environment is fully provisioned (env vars + tools + secrets present)
make verify

# any lab with its own makefile is invoked with -C
make -C /training/<lab-dir> <target>
# e.g.
make -C /training/00_prerequisites    ssh
make -C /training/00_prerequisites    gce
make -C /training/10_create-platform-cluster create-cluster
make -C /training/11_create-provider-cluster create-cluster
make -C /training/99_teardown teardown   # full destroy (kubeone reset, tf destroy, DNS cleanup)

# helmfile is the install pattern for everything cluster-side; selectors pick one release
helmfile sync --file /training/12_setup-kcp-in-platform-cluster/helm/helmfile.yaml --selector id=ingress-nginx
helmfile sync --file /training/12_setup-kcp-in-platform-cluster/helm/helmfile.yaml --selector id=cert-manager
helmfile sync --file /training/12_setup-kcp-in-platform-cluster/helm/helmfile.yaml --selector id=dex
helmfile sync --file /training/12_setup-kcp-in-platform-cluster/helm/helmfile.yaml --selector id=kcp
helmfile sync --file /training/50_setup-kdp-in-platform-cluster/helm/helmfile.yaml --selector id=developer-platform

Helper targets (called by other makefiles, rarely by hand):

make squash-kubeconfigs      # merges /training/.secrets/kubeconfig-*.yaml into kubeconfig.yaml
                             # invoked automatically by 10/11_create-*-cluster and 13_create-kcp-root-kubeconfig
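
The merge can be approximated with kubectl's standard handling of a colon-separated KUBECONFIG list. This is a sketch under assumptions, not the makefile's actual recipe: the helper name `kubeconfig_path_list` and the output file name in the usage comment are hypothetical.

```shell
# Hypothetical helper: join every kubeconfig-*.yaml in a directory into a
# colon-separated list suitable for the KUBECONFIG variable.
kubeconfig_path_list() (
  dir=$1
  list=
  for f in "$dir"/kubeconfig-*.yaml; do
    [ -e "$f" ] || continue            # the glob matched nothing
    list="${list:+$list:}$f"
  done
  printf '%s\n' "$list"
)

# Usage (requires kubectl; --flatten inlines certificates so the merged file
# is self-contained; merged-kubeconfig.yaml is an illustrative output name):
#   KUBECONFIG="$(kubeconfig_path_list /training/.secrets)" \
#     kubectl config view --flatten > merged-kubeconfig.yaml
```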

There is no test suite, lint step, or build pipeline — make verify is the closest analogue to a smoke test.

Kubeconfig contexts produced by the labs

After labs 10–15 run, the squashed kubeconfig.yaml contains several contexts. Knowing which one targets what saves a lot of guessing:

| Context | Created by | Talks to |
| --- | --- | --- |
| `admin@k8s-platform` | `10_create-platform-cluster` | platform GCE cluster (kcp + DEX + ingress) |
| `admin@k8s-provider` | `11_create-provider-cluster` | provider GCE cluster (where syncagent runs) |
| `root@kcp`, `base@kcp` | `13_create-kcp-root-kubeconfig` | kcp via front-proxy, root workspace / `/clusters/` base |
| `provider@kcp` | `14_create-kcp-provider-kubeconfig` | kcp `:root:provider` workspace (SA token) |
| `consumer@kcp` | `15_create-kcp-consumer-kubeconfig` | kcp `:root:consumer` workspace (SA token) |

The two KDP-specific kubeconfigs (kubeconfig-kdp-root.yaml, kubeconfig-kdp-provider.yaml) used in lab 60_provide-a-service are downloaded from the KDP dashboard, not generated by makefiles, and are referenced explicitly via KUBECONFIG=... rather than as merged contexts.
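
To check that the squashed kubeconfig contains every merged context named in this section, something like the following can help. It is only a sketch: `missing_contexts` is a hypothetical helper, not part of the repo, and it compares names without testing connectivity.

```shell
# Hypothetical helper: read context names on stdin (one per line) and print
# each expected lab context that is absent.
missing_contexts() (
  present=$(cat)
  for ctx in admin@k8s-platform admin@k8s-provider root@kcp base@kcp \
             provider@kcp consumer@kcp; do
    # -x matches whole lines only, -F treats the name as a literal string.
    printf '%s\n' "$present" | grep -qxF -- "$ctx" || echo "$ctx"
  done
)

# Usage (requires kubectl and the squashed kubeconfig):
#   kubectl config get-contexts -o name | missing_contexts
#   kubectl config use-context admin@k8s-platform
```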

Lab architecture (big picture)

The labs build up two GCE Kubernetes clusters and layer kcp/KDP on top:

00_prerequisites — ssh key, gcloud auth, .trainingrc setup.
01_install-kcp-locally → 04_sharing-apis — concept-only labs that run kcp as a local binary. They teach the kcp primitives (kubectl ws, workspaces, APIResourceSchema, APIExport, APIBinding) before any real cluster work. The local kcp data dir is /training/.kcp/.
10_create-platform-cluster, 11_create-provider-cluster — provision two GCE clusters using Terraform (tf_infra/terraform.tfvars + .tf files copied from the kubeone examples directory) followed by kubeone apply. The makefiles also rewrite the resulting kubeconfig user/context names to admin@k8s-platform / admin@k8s-provider, deposit them into /training/.secrets/kubeconfig-*.yaml, and squash them.
12_setup-kcp-in-platform-cluster — installs ingress-nginx, cert-manager, DEX (OIDC IdP), and the kcp helm chart on the platform cluster; configures Let's Encrypt + GCP DNS records under $PLATFORM_DOMAIN and internal.$PLATFORM_DOMAIN (kcp front-proxy).
13/14/15_create-kcp-*-kubeconfig — manually craft kubeconfigs for the three kcp personas (root admin, provider, consumer) using the kcp front-proxy CA and either client certs or service-account tokens. Resulting contexts: root@kcp, base@kcp, provider@kcp, consumer@kcp.
20/21_* (provider) → 30/31_* (consumer) → 40_verify → 41_teardown — end-to-end demo of providing a service: create the MyService CRD on the provider cluster, install the kcp api-syncagent helm chart pointed at the provider workspace, publish a PublishedResource, then bind/consume from the consumer workspace and verify the synced object lands back on the provider cluster.
50_setup-kdp-in-platform-cluster — installs the KDP helm charts (developer-platform, developer-platform-dashboard) on top of kcp.
60_provide-a-service → 70_consume-a-service → 99_teardown — repeats the provide/consume flow but driven through the KDP dashboard. Trainees download kubeconfigs from the dashboard UI and drag them into /training/.secrets/.

Key cross-cutting pieces:

The kcp api-syncagent (api-syncagent helm chart, configured in *_syncagent-helmfile.yaml) is the bridge between a provider's real Kubernetes cluster and a kcp workspace; the chart value (apiExportName on ≤0.4, apiExportEndpointSliceName on ≥0.5) must match the APIExport / APIExportEndpointSlice name in the kcp provider workspace. The two labs are deliberately on different chart versions today — see "Known issues" below before bumping either.
DNS, TLS, and OIDC are coupled: ingress-nginx's LB IP is captured into INGRESS_IP, written into Google Cloud DNS for $PLATFORM_DOMAIN / *.$PLATFORM_DOMAIN, then DEX issues OIDC tokens at https://login.$PLATFORM_DOMAIN, and cert-manager + Let's Encrypt secure both.
The kcp front-proxy is exposed separately as a LoadBalancer; its IP (KCP_FRONT_PROXY_IP) is mapped to internal.$PLATFORM_DOMAIN and is what every kcp kubeconfig in 13/14/15 points at (https://internal.$PLATFORM_DOMAIN:8443).
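
As an illustration of how a kubeconfig can address a workspace through the front-proxy, here is a sketch that builds the server URL. It assumes kcp's `/clusters/<workspace-path>` URL convention. The helper name `kcp_server_url` is hypothetical, and the lab READMEs may construct the URL differently.

```shell
# Hypothetical helper: print the front-proxy URL for a kcp workspace path.
# With no argument it prints the bare front-proxy endpoint.
kcp_server_url() (
  base="https://internal.${PLATFORM_DOMAIN:?PLATFORM_DOMAIN must be set}:8443"
  if [ -n "${1:-}" ]; then
    printf '%s/clusters/%s\n' "$base" "$1"
  else
    printf '%s\n' "$base"
  fi
)

# Usage (requires kubectl; the cluster name "kcp-provider" is illustrative only):
#   kubectl config set-cluster kcp-provider --server="$(kcp_server_url root:provider)"
```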

A visual reference for lab 04_sharing-apis lives at .99_todos/lab04/kcp-sharing-apis.excalidraw (drag onto excalidraw.com to view). Like everything under .99_todos/, it's trainer scratch — useful background, but not surfaced to trainees in the codespace.

Known issues / pinned versions

The two syncagent helmfiles are intentionally on different chart versions today:

21_provide-a-service/myservice_syncagent-helmfile.yaml:11 — chart kcp/api-syncagent at 0.6.0 with the ≥0.5 field apiExportEndpointSliceName: myapiexport.
60_provide-a-service/myservice_syncagent-helmfile.yaml:11 — chart kcp/api-syncagent at 0.3.1 with the ≤0.4 field apiExportName: myorg.com.

Lab 60 (KDP-driven) was held back because the KDP servlet kubeconfig downloaded from the dashboard did not grant apiexportendpointslices permissions that the ≥0.5 chart requires. Re-confirm KDP's current behaviour before bumping lab 60 in lockstep with lab 21.
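
The version split can be expressed as a small lookup, which may help when reviewing a chart bump. This is a sketch: `syncagent_export_field` is a hypothetical helper, it relies on `sort -V` (GNU coreutils), and it expects a full MAJOR.MINOR.PATCH version string.

```shell
# Hypothetical helper: print the values field the api-syncagent chart expects
# for a given chart version, following the <=0.4 / >=0.5 split noted above.
syncagent_export_field() (
  version=$1
  # sort -V orders version strings; if 0.5.0 sorts first (or ties), the chart
  # version is at least 0.5.0.
  lowest=$(printf '%s\n' "$version" 0.5.0 | sort -V | head -n 1)
  if [ "$lowest" = "0.5.0" ]; then
    echo apiExportEndpointSliceName
  else
    echo apiExportName
  fi
)
```

With the versions pinned today, `syncagent_export_field 0.6.0` selects the field used by lab 21 and `syncagent_export_field 0.3.1` the one used by lab 60.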

Both helmfiles set enableLeaderElection: false, which sidesteps a chart bug where the leader-election Role and RoleBinding names didn't match (template "name" vs include "fullname"). Don't enable leader election without re-checking the chart.

Another known trap: the lab 21 README creates a ClusterRoleBinding for ServiceAccount myservice-syncagent-api-syncagent. Under the current values, the chart's fullname helper produces this longer name, not the shorter myservice-syncagent. If you set serviceAccount.name in the helmfile values, also update the --serviceaccount= flag in the README to match (and vice versa).
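
The longer ServiceAccount name is what the default `helm create` fullname logic would yield. The sketch below mirrors that logic in shell. It assumes the chart uses the stock helper with no nameOverride or fullnameOverride, and that the release is named myservice-syncagent; `helm_fullname` itself is a hypothetical name.

```shell
# Sketch of the default `helm create` fullname logic (an assumption about this
# chart): use the release name alone if it already contains the chart name,
# otherwise join the two, then cap the result at 63 characters and drop one
# trailing hyphen.
helm_fullname() (
  release=$1
  chart=$2
  case "$release" in
    *"$chart"*) name=$release ;;
    *)          name="$release-$chart" ;;
  esac
  printf '%s\n' "$name" | cut -c1-63 | sed 's/-$//'
)
```

Under those assumptions, `helm_fullname myservice-syncagent api-syncagent` gives the name the README's --serviceaccount= flag has to reference.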

Working with this repo

When a lab references an env var, assume it is supplied by .trainingrc — do not invent default values or hardcode trainee-specific data into committed files.
Files like <lab>/myservice_syncagent-helmfile.yaml, <lab>/myservice_published-resource.yaml, and the various kubeconfig*.yaml are templates that the lab steps mutate in place via sed/yq. Preserve the placeholder tokens (<DOMAIN>, <FILL-IN-YOUR-GCE-PROJECT-ID>, <FILL-IN-CLUSTER-NAME>, <FILL-IN-YOUR-PASSWORD>, your-email@example.com) when editing — the makefiles substitute them at runtime.
Lab 04_sharing-apis/ ships YAML for the APIResourceSchema and APIExport only — the corresponding APIBinding is created imperatively via kubectl kcp bind apiexport. There's no apibinding.yaml in the tree by design, even though the binding could be expressed declaratively.
.99_todos/ is the trainer's scratch area (open issues, slide notes); the devcontainer hides it from VS Code's file tree, but it is still part of the repo. Don't treat it as canonical content.
The two clusters' kubeone configs live at platform-cluster/kubeone.yaml and provider-cluster/kubeone.yaml; their Terraform state is materialized into */tf_infra/ only after make prepare-tf-config copies the kubeone-provided .tf files in.
.claude/skills/ contains two user-invocable lint skills: lint md runs the md-linter (prose-only review of every top-level */README.md) and lint code runs the code-linter (shell snippets in READMEs + YAML/makefile correctness). Both skip .secrets/ and .99_todos/, and treat the placeholder tokens (<FILL-IN-…>, <DOMAIN>, your-email@example.com, TODO, XXXXX) as intentional — do not "fix" those, and do not bump pinned versions during a lint run.
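
When editing a template file, a quick way to confirm the placeholder tokens survived is to list which ones it still contains. This is a sketch: `placeholders_in` is a hypothetical helper, and the token list is copied from the guidance above rather than from the makefiles.

```shell
# Hypothetical helper: list the placeholder tokens that still occur in a file.
# grep -F treats each token as a literal string, so < and > need no escaping.
placeholders_in() (
  file=$1
  for token in '<DOMAIN>' '<FILL-IN-YOUR-GCE-PROJECT-ID>' \
               '<FILL-IN-CLUSTER-NAME>' '<FILL-IN-YOUR-PASSWORD>' \
               'your-email@example.com'; do
    if grep -qF -- "$token" "$file"; then
      echo "$token"
    fi
  done
)
```

Running it on a file before and after an edit and comparing the two outputs shows whether a token was removed by accident.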
