This file is written for coding assistants and human maintainers who need to understand how to work safely in this repository.
This repository manages one Hetzner dedicated server that runs:
- Debian 13 as the base operating system
- Proxmox VE 9 as the hypervisor
- an internal guest network on `10.10.10.0/24`
- six managed VMs for ingress, Docker runtime, Docker builds, monitoring, PostgreSQL, and backups
The repository is not only documentation. It is intended to be the operating contract for the live platform.
The current target shape is:
- Proxmox host on public IPv4 `203.0.113.1`
- `vmbr0` for the public uplink
- `vmbr10` for the private guest network
- `10.10.10.10` NGINX
- `10.10.10.20` Docker runtime
- `10.10.10.30` Docker build
- `10.10.10.40` monitoring
- `example.com` as the public DNS zone
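As a quick sanity check on the target shape, the guest addresses listed above can be tested against the guest network. This is a minimal sketch: the `GUEST_ADDRESSES` mapping and the helper name are illustrative and not part of the repository, and the authoritative values remain those in versions/stack.yaml.

```python
import ipaddress

# Values copied from the target shape; the names here are hypothetical.
GUEST_NETWORK = ipaddress.ip_network("10.10.10.0/24")
GUEST_ADDRESSES = {
    "nginx": "10.10.10.10",
    "docker-runtime": "10.10.10.20",
    "docker-build": "10.10.10.30",
    "monitoring": "10.10.10.40",
}


def addresses_outside_network(addresses, network):
    """Return the sorted names whose address is not inside the network."""
    return sorted(
        name
        for name, addr in addresses.items()
        if ipaddress.ip_address(addr) not in network
    )


if __name__ == "__main__":
    stray = addresses_outside_network(GUEST_ADDRESSES, GUEST_NETWORK)
    print("addresses outside guest network:", stray)
```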
The authoritative machine-readable state is versions/stack.yaml.
Use these defaults unless a runbook or break-glass situation explicitly requires otherwise:
- connect to the Proxmox host as `ops`
- use `sudo` for elevated Linux operations
- use `ops@pam` for routine Proxmox administration
- use `lv3-automation@pve` API tokens for non-human Proxmox object management
- treat `root` on the Proxmox host as break-glass only
- do not use `root` for guest SSH
- reach guests directly over the Tailscale-routed `10.10.10.0/24` path once ADR 0014 is applied
- if the tailnet path is unavailable, use the Proxmox host jump path only as break-glass
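The access defaults above could be encoded roughly as follows. This is a sketch under assumptions: the function name is hypothetical, the guest account name is not stated in this file and is therefore passed in by the caller, and the break-glass path is modelled with OpenSSH's `-J` (ProxyJump) option through the Proxmox host.

```python
import ipaddress

PROXMOX_HOST = "203.0.113.1"  # public IPv4 from the target shape
HOST_USER = "ops"             # routine host login per the defaults
GUEST_NETWORK = ipaddress.ip_network("10.10.10.0/24")


def guest_ssh_command(guest_ip, guest_user, break_glass=False):
    """Build an ssh argument list for a guest (hypothetical helper)."""
    if guest_user == "root":
        raise ValueError("root is not used for guest SSH")
    if ipaddress.ip_address(guest_ip) not in GUEST_NETWORK:
        raise ValueError(f"{guest_ip} is not in {GUEST_NETWORK}")
    argv = ["ssh"]
    if break_glass:
        # Jump through the Proxmox host only when the tailnet path is down.
        argv += ["-J", f"{HOST_USER}@{PROXMOX_HOST}"]
    argv.append(f"{guest_user}@{guest_ip}")
    return argv
```

With the default `break_glass=False` the command targets the guest directly, which matches the Tailscale-routed path; the jump variant has to be requested explicitly so it never becomes the routine path by accident.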
The canonical identity classification and metadata contract lives in docs/runbooks/identity-taxonomy-and-managed-principals.md and versions/stack.yaml.
- README.md
- AGENTS.md
- docs/repository-map.md
- workstreams.yaml
- relevant ADRs in docs/adr
- relevant runbooks in docs/runbooks
- Identify or create the workstream before changing code.
- Use one branch and preferably one worktree per workstream.
- Change the automation first when feasible.
- Update the workstream doc and registry while the work is in progress.
- Leave protected integration files alone unless you are doing the merge/integration step.
- Merge to `main`, then bump `VERSION`.
- Apply merged work live, then bump `platform_version` and refresh observed state.
- Run `make preflight WORKFLOW=<id>` before long-running workflows that depend on controller-local secrets or external tokens.
- Use `make workflows` or `make workflow-info WORKFLOW=<id>` when you need the canonical entry point instead of inferring it from prose.
- Use `make commands` or `make command-info COMMAND=<id>` before mutating live systems so the approval policy, expected inputs, and rollback guidance are explicit.
- After a real live apply, record the verification evidence in `receipts/live-applies/`.
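The inspection and preflight steps above can be sequenced with a small wrapper. This is only a sketch: the function names are hypothetical, and it uses just the `make` targets named in this file (`workflow-info`, `preflight`, `command-info`). It does not run the workflow or command itself, because the matching entry point should be read from the catalog output rather than guessed.

```python
import subprocess


def pre_run_commands(workflow_id=None, command_id=None):
    """Return the make invocations to run before a workflow or mutating command."""
    commands = []
    if workflow_id:
        # Look up the canonical entry point, then check secrets and tokens.
        commands.append(["make", "workflow-info", f"WORKFLOW={workflow_id}"])
        commands.append(["make", "preflight", f"WORKFLOW={workflow_id}"])
    if command_id:
        # Surface approval policy, expected inputs, and rollback guidance.
        commands.append(["make", "command-info", f"COMMAND={command_id}"])
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(pre_run_commands(workflow_id="converge-monitoring"))
```

Keeping `dry_run=True` as the default means the wrapper only prints what it would run unless the caller opts in.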
At minimum, review whether these files need updates:
- README.md
- workstreams.yaml
- a workstream file in docs/workstreams
- VERSION only during integration to `main`
- changelog.md only during integration to `main`
- versions/stack.yaml only for merged truth or verified live state
- a runbook in docs/runbooks
- an ADR in docs/adr
- versions/stack.yaml if an identity inventory, owner, scope boundary, or credential-storage contract changed
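One way to catch accidental edits to integration-only files on a workstream branch is a small check over the changed paths. This is a sketch under assumptions: the set below contains only the two files this list marks as "only during integration to main", the full set of protected integration files is not spelled out here, and the helper name is hypothetical. versions/stack.yaml is left out because its rule depends on whether the change reflects merged truth or verified live state, which a path check cannot decide.

```python
# Files this list restricts to the integration step; not an exhaustive set.
INTEGRATION_ONLY = {"VERSION", "changelog.md"}


def protected_files_touched(changed_paths, integrating=False):
    """Return integration-only files changed outside the integration step."""
    if integrating:
        return []
    return sorted(set(changed_paths) & INTEGRATION_ONLY)


if __name__ == "__main__":
    # Example input; in practice the list could come from
    # `git diff --name-only main...HEAD`.
    changed = ["docs/runbooks/example.md", "VERSION"]
    print(protected_files_touched(changed))
```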
Use the Makefile instead of rebuilding long commands from memory:
- `make start-workstream WORKSTREAM=adr-0011-monitoring`
- `make workflows`
- `make workflow-info WORKFLOW=converge-monitoring`
- `make commands`
- `make command-info COMMAND=configure-network`
- `make lanes`
- `make lane-info LANE=api`
- `make api-publication`
- `make api-publication-info SURFACE=proxmox-management-api`
- `make validate`
- `make validate-data-models`
- `make generate-status-docs`
- `make validate-generated-docs`
- `make receipts`
- `make preflight WORKFLOW=converge-monitoring`
- `make syntax-check`
- `make install-proxmox`
- `make configure-network`
- `make configure-ingress`
- `make configure-tailscale`
- `make provision-guests`
- `make harden-access`
- `make harden-guest-access`
- `make harden-security`
- `make provision-api-access`
- `make syntax-check-docker-runtime`
- `make converge-docker-runtime`
- `make syntax-check-portainer`
- `make converge-portainer`
- `make portainer-manage ACTION=list-containers PORTAINER_ARGS='--all'`
- `make converge-postgres-vm`
- `make syntax-check-uptime-kuma`
- `make syntax-check-open-webui`
- `make converge-open-webui`
- `make deploy-uptime-kuma`
- `make uptime-kuma-manage ACTION=list-monitors`
- `make database-dns`
- repo and platform versioning remain separate
- ADRs record both decision state and implementation state
- branch workstream state lives in `workstreams.yaml` and `docs/workstreams/`
- protected integration files are changed only during merge/integration
- every named human, service, agent, and break-glass principal is classified in ADR 0046 terms before more automation is added
- shared values stay in inventory and group vars rather than copied into many tasks
- live one-off shell changes are either codified immediately or explicitly documented as temporary
- secrets and ephemeral provider passwords do not get committed
- workflow entry points stay declared in `config/workflow-catalog.json`
- mutating command contracts stay declared in `config/command-catalog.json`
- control-plane communication surfaces stay declared in `config/control-plane-lanes.json`
- API and webhook publication tiers stay declared in `config/api-publication.json`
- controller-local secret prerequisites stay declared in `config/controller-local-secrets.json`
- live applies keep structured evidence under `receipts/live-applies/`
- shared controller-side Python primitives stay centralized in `scripts/controller_automation_toolkit.py`
- `make validate` is the minimum repository gate before merge to `main`
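Because several conventions depend on the declared catalog files, a lightweight presence-and-parse check can be useful while editing them. The sketch below assumes nothing about the catalogs' schemas and only confirms that each file exists and is valid JSON. The helper name is hypothetical, and the check does not replace `make validate`.

```python
import json
from pathlib import Path

# Paths taken from the conventions list; schemas are deliberately not assumed.
CATALOG_FILES = [
    "config/workflow-catalog.json",
    "config/command-catalog.json",
    "config/control-plane-lanes.json",
    "config/api-publication.json",
    "config/controller-local-secrets.json",
]


def catalog_problems(repo_root):
    """Map each catalog path to a problem string; empty dict means all parse."""
    problems = {}
    for rel in CATALOG_FILES:
        path = Path(repo_root) / rel
        if not path.is_file():
            problems[rel] = "missing"
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems[rel] = f"invalid JSON: {exc.msg}"
    return problems


if __name__ == "__main__":
    for rel, problem in catalog_problems(".").items():
        print(f"{rel}: {problem}")
```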
These are the highest-value incomplete areas:
- guest-level exporter and alert expansion beyond the current Grafana plus Proxmox-metrics baseline
- live apply of ADR 0020 storage and backup automation
- guest subnet-route completion for ADR 0014 private guest access
- ADR 0024 Docker guest security baseline
- ADR 0025 compose-managed runtime stacks