
CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

What this repo is

A hands-on Platform Engineering training built around kcp and KDP (Kubermatic Developer Platform). It is not application source code — it is a sequence of numbered labs (READMEs + supporting YAML/makefiles) that a trainee runs inside a prepared environment. Almost every command in the labs assumes the workspace is bind-mounted at the absolute path /training/, not the host's repo path.

Runtime environment

The labs are designed for a GitHub Codespace using .devcontainer/devcontainer.json, which mounts the repo at /training/ inside the image quay.io/kubermatic-labs/training-ghcs-platform-engineering-trainee-environment:1.0.0 and runs as root. That image preinstalls kubectl, helm, helmfile, terraform, kubeone, etcdctl, gcloud, yq, kubectx/kubens, plus a kcp kubectl krew plugin set.

When editing or testing lab content, do not rewrite /training/... paths to relative paths — trainees copy/paste these commands verbatim, and the absolute path is part of the contract.

Configuration model: .trainingrc

Lab steps repeatedly source /root/.trainingrc and append exports to it. This file is the single source of truth for per-trainee configuration. Required exports (verified by the root makefile's verify target):

GCE_PROJECT, TRAINEE_NAME, TRAINEE_EMAIL, DOMAIN, DNS_ZONE_NAME, K8S_VERSION, TF_VERSION, K1_VERSION, KUBECONFIG, PLATFORM_DOMAIN, PROVIDER_DOMAIN, plus runtime-derived values such as INGRESS_IP, KCP_FRONT_PROXY_IP, OIDC_CLIENT_SECRET, PASSWORD_HASH, GOOGLE_CREDENTIALS.

Trainee-specific secrets live in /training/.secrets/ (gitignored): the GCE service-account JSON, an ssh keypair, the trainee's .trainingrc fragment, and every generated kubeconfig-*.yaml.
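The verify target checks three things: env vars, tools, and secrets. Below is a minimal POSIX-shell sketch of those three kinds of check. The function names are hypothetical, and the real root makefile target may check more, or check differently.

```shell
#!/bin/sh
# Hypothetical sketch of the checks behind "make verify"; the real target
# lives in the root makefile and may differ.

# Report every named variable that is unset or empty; fail if any are.
missing_env() {
  local rc=0 var val
  for var in "$@"; do
    eval "val=\${$var-}"
    if [ -z "$val" ]; then
      echo "missing env: $var"
      rc=1
    fi
  done
  return "$rc"
}

# Report every named tool that is not on PATH; fail if any are.
missing_tools() {
  local rc=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool"
      rc=1
    fi
  done
  return "$rc"
}

# Report every named path that does not exist; fail if any are.
missing_files() {
  local rc=0 path
  for path in "$@"; do
    if [ ! -e "$path" ]; then
      echo "missing file: $path"
      rc=1
    fi
  done
  return "$rc"
}

# Example (inside the codespace):
#   . /root/.trainingrc
#   missing_env GCE_PROJECT TRAINEE_NAME TRAINEE_EMAIL DOMAIN DNS_ZONE_NAME \
#     K8S_VERSION TF_VERSION K1_VERSION KUBECONFIG PLATFORM_DOMAIN PROVIDER_DOMAIN
#   missing_tools kubectl helm helmfile terraform kubeone etcdctl gcloud yq kubectx kubens
#   missing_files /training/.secrets
```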

Common commands

```shell
# verify the environment is fully provisioned (env vars + tools + secrets present)
make verify

# any lab with its own makefile is invoked with -C
make -C /training/<lab-dir> <target>
# e.g.
make -C /training/00_prerequisites    ssh
make -C /training/00_prerequisites    gce
make -C /training/10_create-platform-cluster create-cluster
make -C /training/11_create-provider-cluster create-cluster
make -C /training/99_teardown teardown   # full destroy (kubeone reset, tf destroy, DNS cleanup)

# helmfile is the install pattern for everything cluster-side; selectors pick one release
helmfile sync --file /training/12_setup-kcp-in-platform-cluster/helm/helmfile.yaml --selector id=ingress-nginx
helmfile sync --file /training/12_setup-kcp-in-platform-cluster/helm/helmfile.yaml --selector id=cert-manager
helmfile sync --file /training/12_setup-kcp-in-platform-cluster/helm/helmfile.yaml --selector id=dex
helmfile sync --file /training/12_setup-kcp-in-platform-cluster/helm/helmfile.yaml --selector id=kcp
helmfile sync --file /training/50_setup-kdp-in-platform-cluster/helm/helmfile.yaml --selector id=developer-platform
```
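Because each selector picks exactly one release, the four platform-cluster syncs above can be driven from a loop. This is a hedged sketch: the function name and the DRY_RUN switch are hypothetical. In the labs, other steps such as DNS records and new .trainingrc exports happen between these syncs, so treat this as a shorthand rather than a replacement for the READMEs.

```shell
#!/bin/sh
# Sketch: sync the platform-cluster releases one selector at a time, in the
# order listed above. DRY_RUN defaults to 1, so it only prints the commands;
# set DRY_RUN=0 to run them.
HELMFILE=/training/12_setup-kcp-in-platform-cluster/helm/helmfile.yaml

sync_platform_releases() {
  local id
  for id in ingress-nginx cert-manager dex kcp; do
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "helmfile sync --file $HELMFILE --selector id=$id"
    else
      # Stop at the first failure rather than layering releases on a broken one.
      helmfile sync --file "$HELMFILE" --selector "id=$id" || return 1
    fi
  done
}
```

Keeping ingress-nginx first fits the DNS flow described under "Key cross-cutting pieces": its load-balancer IP becomes INGRESS_IP, and the Cloud DNS records are built from that value.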

Helper targets (called by other makefiles, rarely by hand):

```shell
make squash-kubeconfigs      # merges /training/.secrets/kubeconfig-*.yaml into kubeconfig.yaml
                             # invoked automatically by 10/11_create-*-cluster and 13_create-kcp-root-kubeconfig
```
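The usual way to merge several kubeconfigs is to list them, colon-separated, in KUBECONFIG and let kubectl config view --flatten write a single self-contained file. The sketch below assumes the squash-kubeconfigs target works along those lines, which is not confirmed here. The function names and the output-path argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of a kubeconfig squash; the repo's squash-kubeconfigs
# target may be implemented differently.

# kubeconfig_path_list DIR: colon-join DIR/kubeconfig-*.yaml (glob order),
# the list format the KUBECONFIG variable expects.
kubeconfig_path_list() {
  local list="" f
  for f in "$1"/kubeconfig-*.yaml; do
    [ -e "$f" ] || continue   # glob matched nothing
    list="${list:+$list:}$f"
  done
  printf '%s\n' "$list"
}

# squash_kubeconfigs DIR OUT: merge all of them into one self-contained file.
squash_kubeconfigs() {
  local paths
  paths=$(kubeconfig_path_list "$1")
  if [ -z "$paths" ]; then
    echo "no kubeconfig-*.yaml in $1" >&2
    return 1
  fi
  # --flatten inlines referenced certificate and key files into the output.
  KUBECONFIG="$paths" kubectl config view --flatten > "$2"
}
```

When two files define the same context, user, or cluster name, kubectl keeps the entry from the first file in the list. Unique names such as admin@k8s-platform therefore keep the merge unambiguous.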

There is no test suite, lint step, or build pipeline — make verify is the closest analogue to a smoke test.

Kubeconfig contexts produced by the labs

After labs 10 through 15 run, the squashed kubeconfig.yaml contains several contexts. Knowing which one targets what saves a lot of guessing:

| Context | Created by | Talks to |
|---|---|---|
| admin@k8s-platform | 10_create-platform-cluster | platform GCE cluster (kcp + DEX + ingress) |
| admin@k8s-provider | 11_create-provider-cluster | provider GCE cluster (where syncagent runs) |
| root@kcp, base@kcp | 13_create-kcp-root-kubeconfig | kcp via front-proxy, root workspace / /clusters/ base |
| provider@kcp | 14_create-kcp-provider-kubeconfig | kcp :root:provider workspace (SA token) |
| consumer@kcp | 15_create-kcp-consumer-kubeconfig | kcp :root:consumer workspace (SA token) |

The two KDP-specific kubeconfigs (kubeconfig-kdp-root.yaml, kubeconfig-kdp-provider.yaml) used in lab 60_provide-a-service are downloaded from the KDP dashboard, not generated by makefiles, and are referenced explicitly via KUBECONFIG=... rather than as merged contexts.
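The merged contexts and the downloaded KDP files are selected differently. A tiny lookup can keep commands uniform across both. This helper is hypothetical, and its short target names (platform, kcp-provider, and so on) are invented for illustration. --context and --kubeconfig are standard kubectl flags, and --kubeconfig acts as the per-command equivalent of KUBECONFIG=....

```shell
#!/bin/sh
# Hypothetical helper: print the kubectl flag that selects a given target.
# Contexts come from the table above; the KDP files are the ones trainees
# drag into /training/.secrets/ from the dashboard.
kube_for() {
  case "$1" in
    platform)     echo "--context=admin@k8s-platform" ;;
    provider)     echo "--context=admin@k8s-provider" ;;
    kcp-root)     echo "--context=root@kcp" ;;
    kcp-base)     echo "--context=base@kcp" ;;
    kcp-provider) echo "--context=provider@kcp" ;;
    kcp-consumer) echo "--context=consumer@kcp" ;;
    kdp-root)     echo "--kubeconfig=/training/.secrets/kubeconfig-kdp-root.yaml" ;;
    kdp-provider) echo "--kubeconfig=/training/.secrets/kubeconfig-kdp-provider.yaml" ;;
    *) echo "unknown target: $1" >&2; return 1 ;;
  esac
}

# Usage: kubectl $(kube_for kcp-provider) get apiexports
```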

Lab architecture (big picture)

The labs build up two GCE Kubernetes clusters and layer kcp/KDP on top:

  • 00_prerequisites — ssh key, gcloud auth, .trainingrc setup.
  • 01_install-kcp-locally through 04_sharing-apis — concept-only labs that run kcp as a local binary. They teach the kcp primitives (kubectl ws, workspaces, APIResourceSchema, APIExport, APIBinding) before any real cluster work. The local kcp data dir is /training/.kcp/.
  • 10_create-platform-cluster, 11_create-provider-cluster — provision two GCE clusters using Terraform (tf_infra/terraform.tfvars + .tf files copied from the kubeone examples directory) followed by kubeone apply. The makefiles also rewrite the resulting kubeconfig user/context names to admin@k8s-platform / admin@k8s-provider, deposit them into /training/.secrets/kubeconfig-*.yaml, and squash them.
  • 12_setup-kcp-in-platform-cluster — installs ingress-nginx, cert-manager, DEX (OIDC IdP), and the kcp helm chart on the platform cluster; configures Let's Encrypt + GCP DNS records under $PLATFORM_DOMAIN and internal.$PLATFORM_DOMAIN (kcp front-proxy).
  • 13/14/15_create-kcp-*-kubeconfig — manually craft kubeconfigs for the three kcp personas (root admin, provider, consumer) using the kcp front-proxy CA and either client certs or service-account tokens. Resulting contexts: root@kcp, base@kcp, provider@kcp, consumer@kcp.
  • 20/21_* (provider) → 30/31_* (consumer) → 40_verify → 41_teardown — end-to-end demo of providing a service: create the MyService CRD on the provider cluster, install the kcp api-syncagent helm chart pointed at the provider workspace, publish a PublishedResource, then bind/consume from the consumer workspace and verify the synced object lands back on the provider cluster.
  • 50_setup-kdp-in-platform-cluster — installs the KDP helm charts (developer-platform, developer-platform-dashboard) on top of kcp.
  • 60_provide-a-service → 70_consume-a-service → 99_teardown — repeats the provide/consume flow but driven through the KDP dashboard. Trainees download kubeconfigs from the dashboard UI and drag them into /training/.secrets/.
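The DNS records described for lab 12 follow a simple pattern. The apex and wildcard names point at INGRESS_IP, and internal.$PLATFORM_DOMAIN points at KCP_FRONT_PROXY_IP. Below is a dry-run sketch using gcloud dns record-sets create. The TTL of 300, the APPLY switch, and the function names are assumptions, and the lab's own commands may differ.

```shell
#!/bin/sh
# Dry-run sketch of the Cloud DNS A records; PLATFORM_DOMAIN, INGRESS_IP,
# KCP_FRONT_PROXY_IP and DNS_ZONE_NAME are expected from .trainingrc.
# TTL 300 and the APPLY switch are assumptions, not taken from the labs.

# One "name ip" pair per line; the trailing dot makes each name fully qualified.
dns_records() {
  printf '%s %s\n' \
    "$PLATFORM_DOMAIN." "$INGRESS_IP" \
    "*.$PLATFORM_DOMAIN." "$INGRESS_IP" \
    "internal.$PLATFORM_DOMAIN." "$KCP_FRONT_PROXY_IP"
}

create_dns_records() {
  dns_records | while read -r name ip; do
    if [ "${APPLY:-0}" = 1 ]; then
      gcloud dns record-sets create "$name" --zone="$DNS_ZONE_NAME" \
        --type=A --ttl=300 --rrdatas="$ip"
    else
      echo "gcloud dns record-sets create '$name' --zone=$DNS_ZONE_NAME --type=A --ttl=300 --rrdatas=$ip"
    fi
  done
}
```

An explicit record such as internal.$PLATFORM_DOMAIN takes precedence over the wildcard. The front-proxy name therefore resolves to its own load balancer rather than to ingress-nginx.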

Key cross-cutting pieces:

  • The kcp api-syncagent (api-syncagent helm chart, configured in *_syncagent-helmfile.yaml) is the bridge between a provider's real Kubernetes cluster and a kcp workspace; the chart value (apiExportName on ≤0.4, apiExportEndpointSliceName on ≥0.5) must match the APIExport / APIExportEndpointSlice name in the kcp provider workspace. The two labs are deliberately on different chart versions today — see "Known issues" below before bumping either.
  • DNS, TLS, and OIDC are coupled: ingress-nginx's LB IP is captured into INGRESS_IP, written into Google Cloud DNS for $PLATFORM_DOMAIN / *.$PLATFORM_DOMAIN, then DEX issues OIDC tokens at https://login.$PLATFORM_DOMAIN, and cert-manager + Let's Encrypt provide the TLS certificates for these hostnames.
  • The kcp front-proxy is exposed separately as a LoadBalancer; its IP (KCP_FRONT_PROXY_IP) is mapped to internal.$PLATFORM_DOMAIN and is what every kcp kubeconfig in 13/14/15 points at (https://internal.$PLATFORM_DOMAIN:8443).
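Every kcp kubeconfig in labs 13/14/15 targets the front-proxy at https://internal.$PLATFORM_DOMAIN:8443. The sketch below writes a token-based kubeconfig of the provider/consumer kind. It assumes kcp's convention of selecting a workspace by appending /clusters/<workspace-path> to the server URL. The function name, its arguments, and the cluster/user entry names are hypothetical. How the lab obtains the service-account token and the front-proxy CA file is not shown.

```shell
#!/bin/sh
# Hypothetical sketch: write a token-based kubeconfig for one kcp persona.
# write_kcp_kubeconfig OUT PERSONA WORKSPACE_PATH CA_FILE TOKEN
write_kcp_kubeconfig() {
  local out="$1" persona="$2" workspace="$3" ca_file="$4" token="$5"
  local server="https://internal.$PLATFORM_DOMAIN:8443/clusters/$workspace"
  local ca_data
  # kubeconfigs embed the CA as single-line base64.
  ca_data=$(base64 < "$ca_file" | tr -d '\n')
  cat > "$out" <<EOF
apiVersion: v1
kind: Config
clusters:
- name: kcp-$persona
  cluster:
    server: $server
    certificate-authority-data: $ca_data
users:
- name: $persona
  user:
    token: $token
contexts:
- name: $persona@kcp
  context:
    cluster: kcp-$persona
    user: $persona
current-context: $persona@kcp
EOF
  chmod 600 "$out"   # the file carries a bearer token
}

# Example: write_kcp_kubeconfig /tmp/kc.yaml provider root:provider ./front-proxy-ca.crt "$TOKEN"
```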

A visual reference for lab 04_sharing-apis lives at .99_todos/lab04/kcp-sharing-apis.excalidraw (drag onto excalidraw.com to view). Like everything under .99_todos/, it's trainer scratch — useful background, but not surfaced to trainees in the codespace.

Known issues / pinned versions

The two syncagent helmfiles are intentionally on different chart versions today:

  • 21_provide-a-service/myservice_syncagent-helmfile.yaml:11 — chart kcp/api-syncagent at 0.6.0 with the ≥0.5 field apiExportEndpointSliceName: myapiexport.
  • 60_provide-a-service/myservice_syncagent-helmfile.yaml:11 — chart kcp/api-syncagent at 0.3.1 with the ≤0.4 field apiExportName: myorg.com.

Lab 60 (KDP-driven) was held back because the KDP servlet kubeconfig downloaded from the dashboard did not grant apiexportendpointslices permissions that the ≥0.5 chart requires. Re-confirm KDP's current behaviour before bumping lab 60 in lockstep with lab 21.

Both helmfiles set enableLeaderElection: false, which sidesteps a chart bug where the leader-election Role and RoleBinding names didn't match (template "name" vs include "fullname"). Don't enable leader election without re-checking the chart.
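Two changes are the easiest way to break either lab: bumping the chart without the matching field rename, or dropping the disabled leader election. A small guard can check both pinned helmfiles for these. This is a hypothetical sketch. It greps for version:, apiExportName:, apiExportEndpointSliceName: and enableLeaderElection: on their own lines, and will need adjusting if the helmfiles are laid out differently. It does not check that the value matches the APIExport or APIExportEndpointSlice in kcp. That still needs kubectl against the provider workspace.

```shell
#!/bin/sh
# Hypothetical guard for the two pinned syncagent helmfiles. Assumes each key
# sits on its own line (optionally after "- "), e.g. "version: 0.6.0".
check_syncagent_helmfile() {
  local f="$1" rc=0 version major minor want
  version=$(sed -n 's/^[[:space:]-]*version:[[:space:]]*"\{0,1\}\([0-9][0-9.]*\).*/\1/p' "$f" | head -n 1)
  if [ -z "$version" ]; then
    echo "$f: no chart version found"
    return 1
  fi
  major=$(echo "$version" | cut -d. -f1)
  minor=$(echo "$version" | cut -d. -f2)
  # Chart >= 0.5 reads apiExportEndpointSliceName; <= 0.4 reads apiExportName.
  if [ "$major" -gt 0 ] || [ "$minor" -ge 5 ]; then
    want=apiExportEndpointSliceName
  else
    want=apiExportName
  fi
  if ! grep -q "^[[:space:]-]*$want:" "$f"; then
    echo "$f: chart $version expects $want"
    rc=1
  fi
  # Leader election must stay off until the chart's Role/RoleBinding names match.
  if ! grep -q '^[[:space:]-]*enableLeaderElection:[[:space:]]*false' "$f"; then
    echo "$f: enableLeaderElection: false is missing"
    rc=1
  fi
  return "$rc"
}

# Usage:
#   check_syncagent_helmfile /training/21_provide-a-service/myservice_syncagent-helmfile.yaml
#   check_syncagent_helmfile /training/60_provide-a-service/myservice_syncagent-helmfile.yaml
```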

Another known trap: the lab 21 README creates a ClusterRoleBinding for ServiceAccount myservice-syncagent-api-syncagent — the chart's fullname helper produces this longer name under the current values, not the shorter myservice-syncagent. If you set serviceAccount.name in the helmfile values, also update the --serviceaccount= flag in the README to match (and vice versa).
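The longer name comes from Helm's naming helpers. The sketch below assumes the api-syncagent chart uses the standard helm create scaffold. Its observed output is consistent with that, but check the chart's _helpers.tpl. Under that scaffold, the release name is used alone if it already contains the chart name. Otherwise the name is <release>-<chart>, truncated to 63 characters with a trailing dash dropped. An explicit serviceAccount.name overrides the result. The sketch ignores nameOverride/fullnameOverride and assumes serviceAccount.create is true. The release name myservice-syncagent is inferred from the paragraph above.

```shell
#!/bin/sh
# Sketch of the "helm create" scaffold naming helpers (fullname and
# serviceAccountName), assuming the api-syncagent chart follows that scaffold.
# nameOverride/fullnameOverride are ignored; serviceAccount.create is assumed true.

# helm_fullname RELEASE CHART
helm_fullname() {
  local name
  case "$1" in
    *"$2"*) name="$1" ;;        # release name already contains the chart name
    *)      name="$1-$2" ;;
  esac
  name=$(printf '%.63s' "$name")  # trunc 63
  printf '%s\n' "${name%-}"       # trimSuffix "-"
}

# helm_service_account_name RELEASE CHART [SERVICE_ACCOUNT_NAME]
helm_service_account_name() {
  if [ -n "${3-}" ]; then
    printf '%s\n' "$3"
  else
    helm_fullname "$1" "$2"
  fi
}

# helm_service_account_name myservice-syncagent api-syncagent
#   -> myservice-syncagent-api-syncagent
```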

Working with this repo

  • When a lab references an env var, assume it is supplied by .trainingrc — do not invent default values or hardcode trainee-specific data into committed files.
  • Files like <lab>/myservice_syncagent-helmfile.yaml, <lab>/myservice_published-resource.yaml, and the various kubeconfig*.yaml are templates that the lab steps mutate in place via sed/yq. Preserve the placeholder tokens (<DOMAIN>, <FILL-IN-YOUR-GCE-PROJECT-ID>, <FILL-IN-CLUSTER-NAME>, <FILL-IN-YOUR-PASSWORD>, your-email@example.com) when editing — the makefiles substitute them at runtime.
  • Lab 04_sharing-apis/ ships YAML for the APIResourceSchema and APIExport only — the corresponding APIBinding is created imperatively via kubectl kcp bind apiexport. There's no apibinding.yaml in the tree by design, even though the binding could be expressed declaratively.
  • .99_todos/ is the trainer's scratch area (open issues, slide notes); the devcontainer hides it from VS Code's file tree, but it is still part of the repo. Don't treat it as canonical content.
  • The two clusters' kubeone configs live at platform-cluster/kubeone.yaml and provider-cluster/kubeone.yaml; their Terraform state is materialized into */tf_infra/ only after make prepare-tf-config copies the kubeone-provided .tf files in.
  • .claude/skills/ contains two user-invocable lint skills: lint md runs the md-linter (prose-only review of every top-level */README.md) and lint code runs the code-linter (shell snippets in READMEs + YAML/makefile correctness). Both skip .secrets/ and .99_todos/, and treat the placeholder tokens (<FILL-IN-…>, <DOMAIN>, your-email@example.com, TODO, XXXXX) as intentional — do not "fix" those, and do not bump pinned versions during a lint run.
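Lab steps replace placeholder tokens literally, so any edit that alters a token silently breaks the substitution. The sketch below does literal token replacement with sed. It uses the common idiom of escaping regex metacharacters in the token and & and / in the value. It also includes a check for the tokens the lint skills treat as intentional. The function names and the write-to-a-copy approach are hypothetical. The real lab steps edit files in place with their own sed/yq commands, and which .trainingrc variable fills which token is not specified here. Values containing backslashes or newlines are not handled.

```shell
#!/bin/sh
# Hypothetical helpers for literal placeholder substitution.

# Escape regex metacharacters so the token is matched literally by sed.
escape_pattern() { printf '%s\n' "$1" | sed -e 's/[]\/$*.^[]/\\&/g'; }
# Escape & and / so the value is inserted literally.
escape_replacement() { printf '%s\n' "$1" | sed -e 's/[\/&]/\\&/g'; }

# replace_token TOKEN VALUE IN OUT: write IN to OUT with every TOKEN replaced.
replace_token() {
  sed -e "s/$(escape_pattern "$1")/$(escape_replacement "$2")/g" "$3" > "$4"
}

# Tokens that are intentional and must survive edits and lint runs.
is_placeholder() {
  case "$1" in
    '<FILL-IN-'*'>'|'<DOMAIN>'|'your-email@example.com'|TODO|XXXXX) return 0 ;;
    *) return 1 ;;
  esac
}
```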