Commit 0620729

baditaflorin and claude authored and committed
[publish] Sanitized snapshot from c14f4aa

Source: platform_server main @ c14f4aa
Generated by: scripts/publish_to_serverclaw.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent c14f4aa commit 0620729

3,773 files changed

Lines changed: 274087 additions & 272566 deletions

CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -293,7 +293,7 @@ real values into `platform.yml`. The publish pipeline sanitises the real values
 when syncing to the public mirror. The private repo's `platform.yml` must
 always reflect actual deployment reality.

-> **Incident**: This gap caused `headscale.lv3.org` DNS to point at `203.0.113.1`
+> **Incident**: This gap caused `headscale.example.com` DNS to point at `203.0.113.1`
 > (a non-routable documentation IP), breaking Tailscale VPN for the entire
 > deployment.
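The substitutions visible throughout this commit (`lv3.org` becoming `example.com`, `65.108.75.123` becoming `203.0.113.1`, and `proxmox_florin_server` becoming `platform_server`) can be pictured as an ordered literal replacement. The following is a minimal sketch under that assumption only. It is not the contents of `scripts/publish_to_serverclaw.py`, and the names `REPLACEMENTS` and `sanitize` are hypothetical.

```python
# Hypothetical sketch of a sanitisation pass; the real publish script may work differently.
# Each pair is (private value, public placeholder), taken from the diff in this commit.
REPLACEMENTS = [
    ("proxmox_florin_server", "platform_server"),
    ("65.108.75.123", "203.0.113.1"),
    ("lv3.org", "example.com"),
]


def sanitize(text: str) -> str:
    """Apply each literal replacement in order and return the sanitised text."""
    for private, public in REPLACEMENTS:
        text = text.replace(private, public)
    return text


# Identifiers that merely contain "lv3" (for example `lv3-edge`) are left alone,
# because only the full domain string `lv3.org` is matched.
print(sanitize("https://chat.lv3.org/ served from 65.108.75.123"))
```

A literal replacement like this has no notion of context, so it would also rewrite the private values where they appear inside incident notes, which is consistent with the postmortem references changing in this commit.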

catalog/services/gitea/service.yaml

Lines changed: 1 addition & 1 deletion
@@ -121,7 +121,7 @@ health:
 argv:
 - test
 - -x
-- /opt/gitea/data/git/repositories/ops/proxmox_florin_server.git/custom_hooks/pre-receive
+- /opt/gitea/data/git/repositories/ops/platform_server.git/custom_hooks/pre-receive
 success_rc: 0
 docker_publication:
 container_name: gitea

collections/ansible_collections/lv3/platform/roles/proxmox_network/tasks/main.yml

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@
 placeholder. Writing this to /etc/network/interfaces causes TOTAL HOST LOCKOUT
 on the next reboot (wrong IP — server unreachable). Aborting convergence.
 Fix: ensure .local/identity.yml is injected via -e @.local/identity.yml.
-See incident postmortem 2026-04-12 (6h outage on 65.108.75.123).
+See incident postmortem 2026-04-12 (6h outage on 203.0.113.1).

 - name: Validate optional staging bridge inputs
 ansible.builtin.assert:
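The abort message in this hunk guards against writing a placeholder address to `/etc/network/interfaces`. One way such a value could be detected, sketched here in Python rather than as the role's actual Ansible assertion, is to test whether the address falls inside a block reserved for documentation by RFC 5737, which is where `203.0.113.1` sits. The function name `is_documentation_address` is hypothetical, and how the role itself performs the check is not shown in this diff.

```python
import ipaddress

# The three IPv4 blocks reserved for documentation by RFC 5737.
DOCUMENTATION_NETS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def is_documentation_address(value: str) -> bool:
    """Return True if `value` is an IP address inside a documentation-only block.

    Raises ValueError if `value` is not a valid IP address at all.
    """
    addr = ipaddress.ip_address(value)
    return any(addr in net for net in DOCUMENTATION_NETS)


# A guard would abort before touching network configuration when this is True.
print(is_documentation_address("203.0.113.1"))
```

Checking against the reserved blocks rather than one hard-coded placeholder means the guard would still fire if the sanitiser were later changed to use a different documentation address.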

collections/ansible_collections/lv3/platform/roles/proxmox_security/tasks/main.yml

Lines changed: 1 addition & 1 deletion
@@ -84,7 +84,7 @@
 # INPUT policy. If it crashes after setting DROP policy but before adding ACCEPT rules,
 # the host becomes unreachable. This guard detects that state, stops pve-firewall, and
 # aborts convergence so the operator can investigate.
-# Incident reference: 2026-04-12 — 6h outage on 65.108.75.123; root cause was
+# Incident reference: 2026-04-12 — 6h outage on 203.0.113.1; root cause was
 # placeholder IP in /etc/network/interfaces (wrong IP, not firewall), but this guard
 # provides defence-in-depth against the firewall-crash scenario.
 - name: Wait for pve-firewall to populate ACCEPT rules in PVEFW-HOST-IN (up to 30s)

config/health-probe-catalog.json

Lines changed: 1 addition & 1 deletion
@@ -790,7 +790,7 @@
 "argv": [
 "test",
 "-x",
-"/opt/gitea/data/git/repositories/ops/proxmox_florin_server.git/custom_hooks/pre-receive"
+"/opt/gitea/data/git/repositories/ops/platform_server.git/custom_hooks/pre-receive"
 ],
 "success_rc": 0,
 "docker_publication": {

docs/adr/0371-parameterized-verify-tasks.md

Lines changed: 2 additions & 2 deletions
@@ -24,9 +24,9 @@
 - Re-verified the latest realistic `origin/main@7b72694975ef8aae83e59d96c08dd27181595b2e`
 base (`repo 0.178.142`, `platform 0.178.141`) with exact-main replays for
 `repowise`, `litellm`, and `librechat`, plus direct controller-side health
-verification of `https://repowise.lv3.org/health`,
+verification of `https://repowise.example.com/health`,
 `http://10.10.10.20:4000/health/liveliness`, and
-`https://chat.lv3.org/`.
+`https://chat.example.com/`.

 ## Context

docs/adr/0373-service-registry-and-derived-defaults.md

Lines changed: 3 additions & 3 deletions
@@ -356,11 +356,11 @@ If a role's migration breaks, the fix is to temporarily re-add the removed defau
 - `python3 scripts/interface_contracts.py --list`
 - `uv run --with pytest --with pyyaml --with jsonschema --with fastapi --with jinja2 --with python-multipart --with itsdangerous --with httpx python -m pytest -q tests/test_restic_config_backup.py tests/test_repo_intake_runtime_role.py tests/test_environment_topology.py tests/test_interface_contracts.py tests/test_validate_service_registry.py tests/test_validate_service_completeness.py tests/test_ansible_execution_scopes.py`
 - `make preflight WORKFLOW=live-apply-service`
-- `LV3_PROXMOX_HOST_ADDR=65.108.75.123 LV3_PROXMOX_HOST_PORT=2222 make converge-restic-config-backup env=production`
+- `LV3_PROXMOX_HOST_ADDR=203.0.113.1 LV3_PROXMOX_HOST_PORT=2222 make converge-restic-config-backup env=production`
 - `python3 scripts/trigger_restic_live_apply.py --env production --mode backup --triggered-by ws-0373-live-apply --live-apply-trigger`
 - `uv run --with ansible-core --with pyyaml --with nats-py python scripts/security_posture_report.py --env production --skip-trivy --audit-surface manual --print-report-json`
 - `python3 scripts/vulnerability_budget.py --service repo_intake`
-- `ANSIBLE_COLLECTIONS_PATH="$PWD/collections:$PWD/.ansible/validation/collections" LV3_PROXMOX_HOST_ADDR=65.108.75.123 LV3_PROXMOX_HOST_PORT=2222 make live-apply-service service=repo_intake env=production ALLOW_IN_PLACE_MUTATION=true`
+- `ANSIBLE_COLLECTIONS_PATH="$PWD/collections:$PWD/.ansible/validation/collections" LV3_PROXMOX_HOST_ADDR=203.0.113.1 LV3_PROXMOX_HOST_PORT=2222 make live-apply-service service=repo_intake env=production ALLOW_IN_PLACE_MUTATION=true`
 - The 2026-04-21 replay exposed and repaired two latest-main regressions before
 the final rerun:
 - `repo_intake_runtime` readiness used the unsupported `connect_timeout`
@@ -375,7 +375,7 @@ If a role's migration breaks, the fix is to temporarily re-add the removed defau
 `0.178.148` base and the integrated `0.178.149` tree
 - direct health checks on `http://127.0.0.1:8101/health` returned `{"status":"ok"}`
 - edge verification from the `nginx` guest returned the expected OAuth redirect
-for `https://repo-intake.lv3.org/`
+for `https://repo-intake.example.com/`
 - the systemd-backed restic runtime converged successfully after syncing
 `outline_client.py`, and the governed trigger refreshed
 `receipts/restic-backups/20260421T105958Z.json`,

docs/adr/0382-keycloak-sign-in-button-stuck-postmortem.md

Lines changed: 2 additions & 2 deletions
@@ -17,7 +17,7 @@ ADR 0382 is now closed on the latest realistic `origin/main` base:
 version `0.178.143` at replay time.

 - The shared-edge certificate publication path now follows the effective live
-Certbot lineage, so `home.lv3.org` and `sso.lv3.org` stay published after
+Certbot lineage, so `home.example.com` and `sso.example.com` stay published after
 suffix rotation from `lv3-edge` to `lv3-edge-0001`.
 - The governed production replay of
 `make live-apply-service service=keycloak env=production ALLOW_IN_PLACE_MUTATION=true`
@@ -28,7 +28,7 @@ version `0.178.143` at replay time.
 Outline using the latest-main replayed state.
 - A fresh governed restore verification receipt
 `20260415T070524Z.json` passed and restored `4652` files from the historical
-`/srv/proxmox_florin_server/receipts` snapshot path, confirming the
+`/srv/platform_server/receipts` snapshot path, confirming the
 snapshot-root resolution fix still holds on mainline truth.

 ## Symptom

docs/adr/0410-docker-isolation-testing-and-ioc-completion.md

Lines changed: 1 addition & 1 deletion
@@ -295,7 +295,7 @@ step-ca.docker-dev.local → step-ca container (172.30.10.99)
 ```

 This enables services to resolve each other by FQDN, matching production
-behavior where all services use `*.lv3.org`.
+behavior where all services use `*.example.com`.

 ### Phase 5: Test Scenarios and Timing (P2)

docs/adr/0413-sso-redirect-uri-and-service-topology-variable-drift.md

Lines changed: 13 additions & 13 deletions
@@ -14,7 +14,7 @@ that together caused 7 of 17 services to be unavailable or broken.

 ### Bug Class 1 — SSO Redirect URI Mismatch (LibreChat / serverclaw client)

-When a user clicked "Login with Keycloak" on LibreChat (`chat.lv3.org`), Keycloak returned:
+When a user clicked "Login with Keycloak" on LibreChat (`chat.example.com`), Keycloak returned:

 ```
 We are sorry... An internal server error has occurred
@@ -53,22 +53,22 @@ play time, causing template rendering to silently fail or produce empty port num

 | Service | URL | Symptom | Broken variable |
 |---------|-----|---------|----------------|
-| Directus | data.lv3.org | 502 Bad Gateway | `directus_container_port` |
-| Paperless | paperless.lv3.org | 502 Bad Gateway | `paperless_service_topology` |
-| Coolify | coolify.lv3.org | 502 Bad Gateway | `coolify_dashboard_port` |
-| GlitchTip | errors.lv3.org | TLS + dead code | `glitchtip_internal_port` (dead code) |
+| Directus | data.example.com | 502 Bad Gateway | `directus_container_port` |
+| Paperless | paperless.example.com | 502 Bad Gateway | `paperless_service_topology` |
+| Coolify | coolify.example.com | 502 Bad Gateway | `coolify_dashboard_port` |
+| GlitchTip | errors.example.com | TLS + dead code | `glitchtip_internal_port` (dead code) |

 **Services with latent bugs (currently alive from old deployment):**

 | Service | URL | Broken variable | Risk |
 |---------|-----|----------------|------|
-| Dify | agents.lv3.org | `dify_port`, `dify_internal_base_url`, `dify_ollama_base_url` | Next converge would break port mapping |
+| Dify | agents.example.com | `dify_port`, `dify_internal_base_url`, `dify_ollama_base_url` | Next converge would break port mapping |

 **Services with TLS cert gaps (separate from above):**

 The nginx edge certificate `lv3-edge` was missing SANs for five subdomains that were
 added to the service topology after the last cert issuance:
-`grist.lv3.org`, `errors.lv3.org`, `bi.lv3.org`, `paperless.lv3.org`, `scheduler.lv3.org`.
+`grist.example.com`, `errors.example.com`, `bi.example.com`, `paperless.example.com`, `scheduler.example.com`.

 This causes hard TLS errors in browsers even when the backend containers are running.
 Fix: run `make converge-nginx-edge env=production` which will invoke certbot DNS-01
@@ -86,7 +86,7 @@ All other references (Keycloak client registration, service registry, tests)
 must match this value. The path `/oauth/openid/callback` is correct.

 **Immediate live fix:** Updated the Keycloak `serverclaw` client via the admin API
-on the live platform to register `https://chat.lv3.org/oauth/openid/callback`.
+on the live platform to register `https://chat.example.com/oauth/openid/callback`.
 This fix is reflected in code so the next `make converge-keycloak` is idempotent.

 ### 2. Eliminate all `platform_service_topology` references in role defaults
@@ -119,10 +119,10 @@ per ADR 0412).
 | Action | Command | Required for |
 |--------|---------|--------------|
 | Reissue TLS cert | `make converge-nginx-edge env=production` | grist, errors, bi, paperless, scheduler TLS |
-| Redeploy Directus | `make converge-directus env=production` | data.lv3.org 502 fix |
-| Redeploy Paperless | `make converge-paperless env=production` | paperless.lv3.org 502 fix |
-| Redeploy Coolify | `make converge-coolify env=production` | coolify.lv3.org 502 fix |
-| Investigate Superset | SSH to docker-runtime, `docker ps | grep superset` | bi.lv3.org — port chain correct, container may be stopped |
+| Redeploy Directus | `make converge-directus env=production` | data.example.com 502 fix |
+| Redeploy Paperless | `make converge-paperless env=production` | paperless.example.com 502 fix |
+| Redeploy Coolify | `make converge-coolify env=production` | coolify.example.com 502 fix |
+| Investigate Superset | SSH to docker-runtime, `docker ps | grep superset` | bi.example.com — port chain correct, container may be stopped |
 | Re-converge Keycloak | `make converge-keycloak env=production` | Pick up serverclaw redirect_uri fix |

 ---
@@ -142,7 +142,7 @@ per ADR 0412).
 - Four services (Directus, Paperless, Coolify, Superset) require a manual re-convergence
 to actually recover from 502. The code fix alone is not sufficient.
 - TLS cert expansion also requires a manual `make converge-nginx-edge` run.
-- Nomad scheduler (`scheduler.lv3.org`) has both a TLS cert gap and a backend timeout
+- Nomad scheduler (`scheduler.example.com`) has both a TLS cert gap and a backend timeout
 and requires separate investigation.

 ### Neutral
