Commit eaa165d

docs: release-readiness sweep — spec + populate changelog, refresh guides, future-proof upgrade test (#585)
* docs(changelog): spec a human-readable changelog + populate [Unreleased]

  Add specs/release/changelog.spec.yaml (release-changelog): every entry is a user-facing sentence (>=5 words, no commit-subject prefix), Keep a Changelog categories, dated version sections, no emoji. Enforced by packaging/tests/changelog_test.go (source-inspection, AC-01..05), scoped to the actively-edited [Unreleased] section so history is grandfathered.

  Populate [Unreleased] with the post-rc.7 work in that style: Settings activation (users, notifications, security/SSO), per-host SSH auth/sudo learning, one-command safe upgrade, airgap + fresh-install fixes, and the pre-release security-hardening batch (#584).

* docs: refresh hardening + install guides; de-hardcode upgrade test

  - SECURITY_HARDENING.md: drop the 'not yet implemented' rate-limit/headers callout (shipped in #584); document auth rate limiting, CSRF, and security headers as live; note always-on breach screening; add an outbound-SSH persistent known-hosts subsection; version line to 0.2.0 rc series.
  - install_guide.md: fix the stale DEB version example (rc.5 -> rc.7) and add an 'Upgrading' section documenting the one-command auto-migrate + backup + fail-safe path and /etc/openwatch/upgrade.conf.
  - upgrade-container-test.sh / run-upgrade-container-test.sh: stop hardcoding migration 34->35 and the host_connection_profile table. The host driver now derives the head migration's goose version_id and its -- +goose Down SQL and passes them in; the test reverses exactly the head migration and asserts a generic prior->head advance, so it survives every future migration.

  (CLAUDE.md was also refreshed locally for AI sessions but is intentionally gitignored per commit 7a73353, so it is not part of this commit.)

* test(packaging): pin upgrade-test driver to the host RPM arch

  dist/ can hold a leftover cross-built arm64 RPM (the packaging Go suite cross-builds it for AC-14), and the rockylinux:9 container runs the host platform, so the previous `cp dist/openwatch-*-1.*.rpm` swept in both arches and `rpm -i` collided on /usr/bin/openwatch. Glob a single host-arch RPM.

  Validated end-to-end: the container test now passes — real `rpm -U` runs the %post helper ($1=2), which stops the service, takes a pg_dump restore point, applies the head migration (35 -> 36), and restarts.
1 parent 1f6e72d commit eaa165d

9 files changed

Lines changed: 531 additions & 36 deletions

.secrets.baseline

Lines changed: 3 additions & 3 deletions
@@ -165,14 +165,14 @@
 "filename": "docs/engineering/install_guide.md",
 "hashed_secret": "c99a970222c9f5d73283b8be8021086dd666620b",
 "is_verified": false,
-"line_number": 139
+"line_number": 146
 },
 {
 "type": "Basic Auth Credentials",
 "filename": "docs/engineering/install_guide.md",
 "hashed_secret": "9d4e1e23bd5b727046a9e3b4b7db57bd8d6ee684",
 "is_verified": false,
-"line_number": 370
+"line_number": 377
 }
 ],
 "docs/engineering/prototypes/openwatch-v1/Host Management.html": [
@@ -2867,5 +2867,5 @@
 }
 ]
 },
-"generated_at": "2026-06-15T21:15:28Z"
+"generated_at": "2026-06-17T01:00:11Z"
 }

CHANGELOG.md

Lines changed: 68 additions & 0 deletions
@@ -10,6 +10,74 @@ Versioning: [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ## [Unreleased]
 
+Settings became a working control panel, OpenWatch started learning how to
+reach each host over SSH, package upgrades became a single safe command, and a
+pre-release security review closed a batch of perimeter and access-control
+gaps.
+
+### Added
+
+- Settings > Users now lets you invite people, manage their accounts, and
+assign roles from the UI instead of showing a placeholder (#552, #553).
+- Settings > Notifications can now send compliance alerts to Slack, a generic
+webhook, or email over SMTP, and each channel can be edited after creation
+(#554, #555).
+- Settings > Security is live end to end: scoped API tokens you can create and
+revoke, an authentication policy (password strength and session timeouts),
+and single sign-on through an OIDC identity provider (#556, #557, #558).
+- Settings > Audit and About now browse the audit log in-app and show the live
+license and build details instead of static text (#552).
+- Each host's actions menu now has Edit and Delete entries so you can correct
+or remove a host without leaving the list (#560).
+- OpenWatch now learns how to reach each host: it records which SSH
+authentication method and sudo style actually worked and reuses them on the
+next discovery, intelligence, and liveness pass, so it stops retrying
+combinations that already failed (#566, #575, #576).
+
+### Changed
+
+- Upgrading is now one command. `dnf update` (or `apt upgrade`) applies any
+pending database migrations automatically, takes a full database backup
+first, and on a failed migration leaves the service stopped with clear
+recovery steps instead of running against a half-migrated schema. The
+PostgreSQL engine upgrade itself stays an operator-supervised step (#569).
+- Web fonts now ship inside the application instead of loading from a font CDN,
+so the interface renders completely in air-gapped deployments (#561).
+- Updated the frontend build and CI tooling (Vite, Vitest, lucide-react, zod,
+and several GitHub Actions) to current major versions (#571, #572, #573).
+
+### Fixed
+
+- A fresh install now boots on the first try: the Kensa rule corpus and the
+server identity keys are provisioned during installation rather than failing
+at first startup (#564).
+- The SMTP notification channel edit form now pre-fills its current settings
+instead of opening blank (#561).
+- Removed leftover demo and sample data that could appear on the dashboard,
+the activity feed, and the host lists (#562).
+
+### Security
+
+A pre-release security review (six parallel audit dimensions, every
+high-severity finding re-verified by hand) closed eight findings (#584):
+
+- State-changing requests made with a session cookie are now CSRF-protected
+with a double-submit token; a request without a matching token is rejected.
+- The login and MFA-verify endpoints are now rate-limited per client address
+to slow online password and one-time-code guessing.
+- Every response now carries security headers: HSTS, a content-security policy
+that forbids framing, no-sniff, and a strict referrer policy.
+- SSH host keys learned on first connection are now stored in the database, so
+a restart no longer re-trusts every host and a changed host key is detected
+across restarts.
+- New passwords are now screened against a built-in list of common and breached
+passwords, with an option to point at a full breach corpus; the check now
+runs in production instead of being silently skipped.
+- Reading the audit-event API now requires the audit-read permission instead of
+being open to any caller.
+- Creating an API token or assigning a role can no longer grant more access
+than the caller already holds.
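The last Security entry describes a no-escalation rule: a grant is allowed only if everything it confers is already held by the caller. A minimal sketch of that subset check follows. The `permissionSet` type, the function names, and the permission strings are all hypothetical; the commit does not show how OpenWatch models permissions.

```go
package main

import "fmt"

// permissionSet is a hypothetical model of the permissions a caller holds.
type permissionSet map[string]struct{}

// newPermissionSet builds a set from a list of permission names.
func newPermissionSet(perms ...string) permissionSet {
	set := make(permissionSet, len(perms))
	for _, p := range perms {
		set[p] = struct{}{}
	}
	return set
}

// missingPermissions returns every requested permission the caller lacks.
// An empty result means the grant stays within the caller's own access.
func missingPermissions(caller permissionSet, requested []string) []string {
	var missing []string
	for _, p := range requested {
		if _, ok := caller[p]; !ok {
			missing = append(missing, p)
		}
	}
	return missing
}

func main() {
	caller := newPermissionSet("hosts:read", "audit:read") // illustrative names
	fmt.Println(missingPermissions(caller, []string{"audit:read"}))
	fmt.Println(missingPermissions(caller, []string{"audit:read", "users:write"}))
}
```

Returning the missing permissions rather than a bare boolean lets a handler name exactly what was refused when it rejects the request.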
+
 ---
 
 ## [0.2.0-rc.7] Eyrie — 2026-06-14
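The commit message says entries in `[Unreleased]` are checked for a minimum of five words, no commit-subject prefix, and no emoji. The sketch below shows one way such per-entry checks could look. It is not the code in `packaging/tests/changelog_test.go`; the function names, the prefix pattern, and the emoji ranges are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// commitPrefix matches conventional-commit style subjects such as "fix: " or
// "docs(changelog): ". The pattern the real test uses is not shown in the
// commit, so this one is an assumption.
var commitPrefix = regexp.MustCompile(`^[a-z]+(\([^)]*\))?!?: `)

// isEmoji is a rough check covering the most common emoji code-point ranges.
func isEmoji(r rune) bool {
	return (r >= 0x1F300 && r <= 0x1FAFF) || (r >= 0x2600 && r <= 0x27BF)
}

// checkEntry returns the style problems found in one changelog bullet.
func checkEntry(entry string) []string {
	text := strings.TrimSpace(entry)
	text = strings.TrimSpace(strings.TrimPrefix(text, "-"))

	var problems []string
	if len(strings.Fields(text)) < 5 {
		problems = append(problems, "fewer than 5 words")
	}
	if commitPrefix.MatchString(text) {
		problems = append(problems, "starts with a commit-subject prefix")
	}
	if strings.IndexFunc(text, isEmoji) >= 0 {
		problems = append(problems, "contains emoji")
	}
	return problems
}

func main() {
	for _, e := range []string{
		"- fix: typo",
		"- Web fonts now ship inside the application.",
	} {
		fmt.Printf("%q -> %v\n", e, checkEntry(e))
	}
}
```

A real test would also need to isolate the `[Unreleased]` section and join wrapped continuation lines into one entry before applying these checks.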

docs/engineering/install_guide.md

Lines changed: 58 additions & 1 deletion
@@ -247,7 +247,7 @@ PGPASSWORD='replace-with-a-strong-password' \
 ### Step 3 — Install the packages
 
 ```bash
-sudo apt install -y ./openwatch_0.2.0-rc.5_amd64.deb ./kensa-rules_0.4.3_all.deb
+sudo apt install -y ./openwatch_0.2.0-rc.7_amd64.deb ./kensa-rules_0.4.3_all.deb
 ```
 
 Install **both** files together — `openwatch` `Depends` on `kensa-rules` (the
@@ -404,6 +404,63 @@ the underlying error.
 
 ---
 
+## Upgrading
+
+Upgrading is one command. Download the newer `openwatch` package (and the newer
+`kensa-rules` package if the rule corpus moved) and install it the same way you
+did originally:
+
+```bash
+# RHEL family
+sudo dnf install -y ./openwatch-<new>.x86_64.rpm ./kensa-rules-<new>.noarch.rpm
+
+# Debian / Ubuntu
+sudo apt install -y ./openwatch_<new>_amd64.deb ./kensa-rules_<new>_all.deb
+```
+
+On an upgrade (and only on an upgrade — never on a fresh install) the package
+post-install step runs the upgrade helper, which:
+
+1. Checks the database is reachable. If it is not, it leaves the service alone,
+prints how to finish later (`openwatch migrate && systemctl restart
+openwatch`), and does **not** fail the package transaction.
+2. Stops the service so the old binary never runs against a half-migrated
+schema.
+3. Takes a full `pg_dump` restore point into `/var/lib/openwatch/backups/`
+before touching the schema. If the backup fails, it aborts **without**
+migrating (fail-closed) — your data is untouched.
+4. Applies any pending migrations, then starts the service again.
+
+If a migration fails, the helper leaves the service **stopped** and exits
+non-zero so the package manager surfaces the problem, and it prints the restore
+path. Your data is intact (each migration runs in its own transaction and rolls
+back on error). After fixing the cause:
+
+```bash
+openwatch migrate # re-apply; reads the same DSN from secrets.env
+sudo systemctl start openwatch
+```
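The ordering of the helper's four steps, and what each failure leaves behind, can be sketched as a small control-flow function. This is an illustration only: the real helper runs from the package scriptlet, and the `steps` struct, the `outcome` values, and `runUpgrade` are hypothetical names. The guide does not say whether the service is restarted after a failed backup; this sketch leaves it stopped.

```go
package main

import (
	"errors"
	"fmt"
)

// errSimulated is used by the demo to force a step to fail.
var errSimulated = errors.New("simulated failure")

// steps holds the side-effecting operations as function values so that the
// ordering logic can be read on its own. Every name here is hypothetical.
type steps struct {
	dbReachable  func() bool
	stopService  func() error
	backup       func() error
	migrate      func() error
	startService func() error
}

// outcome names where the helper ended up.
type outcome string

const (
	deferred outcome = "deferred" // DB unreachable: service untouched, package transaction succeeds
	aborted  outcome = "aborted"  // backup failed: nothing was migrated
	failed   outcome = "failed"   // another step failed: service is not restarted
	upgraded outcome = "upgraded" // migrations applied and service started again
)

// runUpgrade applies the steps in the documented order and stops at the
// first failure, so a later step never runs after an earlier one failed.
func runUpgrade(s steps) (outcome, error) {
	if !s.dbReachable() {
		return deferred, nil
	}
	if err := s.stopService(); err != nil {
		return failed, fmt.Errorf("stop service: %w", err)
	}
	if err := s.backup(); err != nil {
		return aborted, fmt.Errorf("backup: %w", err)
	}
	if err := s.migrate(); err != nil {
		return failed, fmt.Errorf("migrate: %w", err)
	}
	if err := s.startService(); err != nil {
		return failed, fmt.Errorf("start service: %w", err)
	}
	return upgraded, nil
}

func main() {
	ok := func() error { return nil }
	s := steps{
		dbReachable:  func() bool { return true },
		stopService:  ok,
		backup:       func() error { return errSimulated },
		migrate:      ok,
		startService: ok,
	}
	result, err := runUpgrade(s)
	fmt.Println(result, err)
}
```

Passing the steps in as function values keeps the fail-closed ordering testable without a database or a service manager.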
+
+To preview what an upgrade would apply without changing anything:
+
+```bash
+sudo -u openwatch openwatch migrate --status
+```
+
+Tunables live in `/etc/openwatch/upgrade.conf` (a `noreplace` config file):
+`AUTO_BACKUP=yes|no` toggles the pre-migration dump, and
+`BACKUP_RETENTION_DAYS` controls pruning. A `systemd` timer
+(`openwatch-backup-cleanup.timer`) prunes old dumps daily but **always keeps the
+most recent one** regardless of age.
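The retention rule (prune dumps older than `BACKUP_RETENTION_DAYS`, but never the newest one) is easy to get subtly wrong, so here is a minimal sketch of the selection logic. The `dump` type, the file names, and `toPrune` are hypothetical; the actual cleanup job is not shown in this commit.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// dump describes one backup file; the fields are illustrative.
type dump struct {
	Name    string
	Created time.Time
}

// demoNow is a fixed clock so the example is deterministic.
var demoNow = time.Date(2026, 6, 17, 0, 0, 0, 0, time.UTC)

// daysAgo returns the instant n days before now.
func daysAgo(now time.Time, n int) time.Time {
	return now.AddDate(0, 0, -n)
}

// toPrune returns the dumps older than retentionDays, newest first, but never
// the most recent dump, whatever its age.
func toPrune(dumps []dump, retentionDays int, now time.Time) []dump {
	if len(dumps) == 0 {
		return nil
	}
	sorted := append([]dump(nil), dumps...) // copy so the caller's slice is not reordered
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].Created.After(sorted[j].Created)
	})
	cutoff := daysAgo(now, retentionDays)
	var prune []dump
	for _, d := range sorted[1:] { // sorted[0] is the newest and is always kept
		if d.Created.Before(cutoff) {
			prune = append(prune, d)
		}
	}
	return prune
}

func main() {
	dumps := []dump{
		{Name: "pre-upgrade-a.sql", Created: daysAgo(demoNow, 40)},
		{Name: "pre-upgrade-b.sql", Created: daysAgo(demoNow, 35)},
	}
	for _, d := range toPrune(dumps, 30, demoNow) {
		fmt.Println("would delete", d.Name)
	}
}
```

In the demo both dumps are past a 30-day retention, yet only the older one is selected, because the newest dump is excluded before the age test runs.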
+
+> Scope: this automates the OpenWatch **application** schema only. A PostgreSQL
+> **engine** major-version upgrade (for example PostgreSQL 15 -> 16) is a
+> separate, operator-supervised `pg_upgrade` and is never triggered from a
+> package scriptlet. See `specs/release/upgrade.spec.yaml` for the full
+> contract.
+
+---
+
 ## Uninstall
 
 ### RPM

docs/guides/SECURITY_HARDENING.md

Lines changed: 39 additions & 18 deletions
@@ -1,6 +1,6 @@
 # OpenWatch security hardening guide
 
-**Applies to:** OpenWatch 0.2.0-rc.5 (Go single-binary build; pre-release)
+**Applies to:** OpenWatch 0.2.0 pre-release (rc series; Go single-binary build)
 **Audience:** System administrators, security engineers, compliance officers
 
 This guide covers the security controls you operate when you deploy OpenWatch
@@ -62,6 +62,20 @@ Hardening steps you perform at the host level:
 Source: `packaging/common/openwatch.service`,
 `docs/engineering/install_guide.md` (Step 2).
 
+### Outbound SSH to managed hosts
+
+When OpenWatch connects to a managed host it validates the host key on a
+trust-on-first-use basis and **persists** the accepted key in PostgreSQL
+(`ssh_known_hosts`, migration 0036). The first connection records the key; every
+later connection compares against it and is rejected if the key changed
+(`ErrHostKeyMismatch`). Because the store is durable, a service restart does not
+re-trust hosts, so an attacker cannot MITM the first scan after a restart to
+harvest credentials. Presented keys are also strength-validated per NIST SP
+800-57 (RSA >= 2048, Ed25519 always accepted). To rotate a host's key
+intentionally (a re-provisioned host), delete its row from `ssh_known_hosts` so
+the next connection re-learns it. Source: `internal/knownhosts/store.go`,
+`internal/ssh/`.
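The trust-on-first-use behavior described above (record on first contact, compare afterwards, re-learn after the row is deleted) can be sketched with the standard library alone. This is not the code in `internal/knownhosts/store.go`: the in-memory map stands in for the `ssh_known_hosts` table, key-strength validation is omitted, and every identifier, including `errHostKeyMismatch`, is a hypothetical stand-in.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
	"sync"
)

// errHostKeyMismatch stands in for the ErrHostKeyMismatch named in the guide.
var errHostKeyMismatch = errors.New("host key mismatch")

// knownHosts maps a host to the SHA-256 fingerprint of its accepted key.
// A durable implementation would read and write a database table instead.
type knownHosts struct {
	mu   sync.Mutex
	keys map[string][sha256.Size]byte
}

func newKnownHosts() *knownHosts {
	return &knownHosts{keys: make(map[string][sha256.Size]byte)}
}

// verify implements trust-on-first-use. It records the key the first time a
// host is seen (learned == true) and rejects any later key that differs.
func (k *knownHosts) verify(host string, publicKey []byte) (learned bool, err error) {
	fp := sha256.Sum256(publicKey)

	k.mu.Lock()
	defer k.mu.Unlock()

	stored, seen := k.keys[host]
	if !seen {
		k.keys[host] = fp
		return true, nil
	}
	if stored != fp {
		return false, errHostKeyMismatch
	}
	return false, nil
}

// forget drops the stored key so the next connection re-learns it, which
// mirrors deleting the host's row for an intentional key rotation.
func (k *knownHosts) forget(host string) {
	k.mu.Lock()
	defer k.mu.Unlock()
	delete(k.keys, host)
}

func main() {
	store := newKnownHosts()
	fmt.Println(store.verify("web01", []byte("key-A"))) // first contact
	fmt.Println(store.verify("web01", []byte("key-A"))) // same key again
	fmt.Println(store.verify("web01", []byte("key-B"))) // changed key
}
```

The property the guide cares about comes from where the map lives: if it were rebuilt empty at startup, the first connection after every restart would be a fresh "first use".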
+
 ---
 
 ## 3. Transport security (TLS)
@@ -164,7 +178,7 @@ Source: `packaging/common/openwatch.service`, `internal/config/config.go`
 | Session inactivity timeout | 15 minutes | `internal/identity/sessions.go` (`SessionInactivityWindow`) |
 | Session absolute timeout | 12 hours | `internal/identity/sessions.go` (`SessionAbsoluteWindow`) |
 | Password policy | Length only — 8 chars (regular), 15 chars (admin), max 128; NIST SP 800-63B | `internal/identity/password.go` |
-| Breach check | Optional corpus lookup rejects known-compromised passwords | `internal/identity/password.go` |
+| Breach check | Always-on in production: new passwords are screened against an embedded common/breached corpus (airgap-safe); point `OPENWATCH_BREACH_CORPUS_FILE` at a full HIBP list to extend it | `internal/identity/password.go`, `internal/identity/breach_corpus_default.go` |
 | MFA | TOTP enrollment and verification | `internal/identity/mfa.go` |
 
 The password policy is deliberately length-based with no character-class rules,
@@ -173,12 +187,12 @@ per NIST SP 800-63B. The first admin is created out-of-band with
 
 Source: `internal/identity/`, `cmd/openwatch/main.go` (`cmdCreateAdmin`).
 
-> Not yet implemented: there is no failed-login throttle, account lockout, or
-> per-IP brute-force backoff in the auth handlers. The Argon2id cost (~50–100 ms
-> per verification) is the only built-in slow-down on online guessing. Until
-> rate limiting lands (Section 9), protect `/api/v1/auth/login` with an upstream
-> control (a reverse proxy with rate limiting, or network ACLs) if you expose
-> 8443 beyond a trusted network. Source: `internal/server/auth_handlers.go`.
+`/api/v1/auth/login` and `/auth/mfa:verify` are rate-limited per client IP
+(Section 10), which throttles online guessing in addition to the Argon2id cost
+(~50-100 ms per verification). There is still no per-account lockout after N
+failed attempts, so for an internet-facing 8443 a reverse proxy or network ACL
+is still worthwhile as defense in depth. Source:
+`internal/server/auth_handlers.go`, `internal/server/ratelimit.go`.
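Section 10 of the guide describes this limiter as a per-client-IP sliding window keyed on `RemoteAddr` that answers `429` with `Retry-After` before any credential check runs. A minimal sketch of that shape follows. It is not the contents of `internal/server/ratelimit.go`; the type and function names, the limit of 2, and the one-minute window are assumptions chosen for the demo.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"strconv"
	"sync"
	"time"
)

// demoWindow and demoStart give the example a fixed window and clock.
const demoWindow = time.Minute

var demoStart = time.Date(2026, 6, 17, 0, 0, 0, 0, time.UTC)

// at returns the instant sec seconds after demoStart.
func at(sec int) time.Time {
	return demoStart.Add(time.Duration(sec) * time.Second)
}

// slidingWindow allows at most limit hits per key within any window-long span.
type slidingWindow struct {
	mu     sync.Mutex
	limit  int
	window time.Duration
	hits   map[string][]time.Time
}

func newSlidingWindow(limit int, window time.Duration) *slidingWindow {
	if limit < 1 {
		limit = 1
	}
	return &slidingWindow{limit: limit, window: window, hits: make(map[string][]time.Time)}
}

// allow reports whether a hit for key at now is within the limit. When it is
// not, it also returns how long until the oldest counted hit leaves the window.
func (s *slidingWindow) allow(key string, now time.Time) (bool, time.Duration) {
	s.mu.Lock()
	defer s.mu.Unlock()

	cutoff := now.Add(-s.window)
	recent := s.hits[key][:0] // filter in place, keeping only hits inside the window
	for _, t := range s.hits[key] {
		if t.After(cutoff) {
			recent = append(recent, t)
		}
	}
	if len(recent) >= s.limit {
		s.hits[key] = recent
		return false, recent[0].Add(s.window).Sub(now)
	}
	s.hits[key] = append(recent, now)
	return true, 0
}

// wrap rejects over-limit requests before the wrapped handler runs, so the
// credential check is skipped. The key is the connection address, never a
// client-supplied forwarding header.
func (s *slidingWindow) wrap(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		host, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			host = r.RemoteAddr
		}
		if ok, wait := s.allow(host, time.Now()); !ok {
			secs := int(wait/time.Second) + 1 // conservatively rounded up
			w.Header().Set("Retry-After", strconv.Itoa(secs))
			http.Error(w, "too many requests", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	rl := newSlidingWindow(2, demoWindow)
	for _, sec := range []int{0, 10, 20, 61} {
		ok, wait := rl.allow("192.0.2.10", at(sec))
		fmt.Println(sec, ok, wait)
	}
	_ = rl.wrap(http.NotFoundHandler()) // how the limiter would sit in front of a handler
}
```

This sketch never evicts idle keys, so a long-running server would also need a periodic sweep of the `hits` map.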
 
 ---
 
@@ -297,7 +311,7 @@ Hardening steps:
 
 ---
 
-## 10. Rate limiting and request controls — current state
+## 10. Rate limiting, CSRF, and security headers
 
 The HTTP server sets request-hardening timeouts and size limits:
 
@@ -309,15 +323,22 @@ The HTTP server sets request-hardening timeouts and size limits:
 | `IdleTimeout` | 120 s | `internal/server/server.go` |
 | `MaxHeaderBytes` | 64 KiB | `internal/server/server.go` |
 
-> Not yet implemented: there is no per-user or per-IP HTTP rate-limiting
-> middleware, and no HTTP security-header middleware (HSTS, CSP, `X-Frame-Options`,
-> `X-Content-Type-Options`, `Referrer-Policy`, `Permissions-Policy`). The
-> `RateLimit` constants in the codebase belong to the intelligence and discovery
-> schedulers (how many hosts to enqueue per tick), not to the HTTP surface.
-> Until these land, enforce request rate limiting and inject security response
-> headers at an upstream reverse proxy if you expose 8443 publicly. Source:
-> `internal/server/server.go`, `internal/intelligence/scheduler/service.go`,
-> `internal/intelligence/discovery/scheduler/service.go`.
+The single binary serves the SPA and the API from one origin with no required
+edge proxy, so the perimeter controls run in the application itself:
+
+| Control | Behavior | Source |
+|---------|----------|--------|
+| Auth rate limiting | Per-client-IP sliding window on `POST /api/v1/auth/login` and `/auth/mfa:verify`; over the limit returns `429` + `Retry-After` and skips the credential check. The key is the direct connection address (`RemoteAddr`), not a client-supplied `X-Forwarded-For`. | `internal/server/ratelimit.go` |
+| CSRF | Double-submit token: login and refresh set a non-HttpOnly `XSRF-TOKEN` cookie, and unsafe (POST/PUT/PATCH/DELETE) cookie-authenticated requests must echo it in `X-CSRF-Token` (constant-time compare) or get `403 authz.csrf_invalid`. Bearer/token requests and `/api/v1/auth/*` are exempt. | `internal/server/csrf.go` |
+| Security headers | Every response carries HSTS (>=1 year, includeSubDomains), a Content-Security-Policy that denies framing (`frame-ancestors 'none'`, `default-src 'self'`; `/docs` relaxes script/style for Swagger but still denies framing), `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`, and `Referrer-Policy: no-referrer`. | `internal/server/security_headers.go` |
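The CSRF row above describes a double-submit check: the token in the `XSRF-TOKEN` cookie must match the `X-CSRF-Token` header on unsafe, cookie-authenticated requests. A minimal sketch of the decision function follows. It is not `internal/server/csrf.go`; `csrfOK`, `demoRequest`, and the `/api/v1/hosts` path are hypothetical, and treating any `Authorization: Bearer` header as a token request is an assumption about how the exemption is detected.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// isUnsafe reports whether the method can change state.
func isUnsafe(method string) bool {
	switch method {
	case http.MethodPost, http.MethodPut, http.MethodPatch, http.MethodDelete:
		return true
	}
	return false
}

// csrfOK reports whether r passes the double-submit check.
func csrfOK(r *http.Request) bool {
	if !isUnsafe(r.Method) {
		return true // safe methods are not checked
	}
	if strings.HasPrefix(r.URL.Path, "/api/v1/auth/") {
		return true // auth endpoints are exempt per the guide
	}
	if strings.HasPrefix(r.Header.Get("Authorization"), "Bearer ") {
		return true // assumption: this is how token requests are recognized
	}
	cookie, err := r.Cookie("XSRF-TOKEN")
	if err != nil || cookie.Value == "" {
		return false
	}
	header := r.Header.Get("X-CSRF-Token")
	// Constant-time comparison avoids leaking how many bytes matched.
	return subtle.ConstantTimeCompare([]byte(cookie.Value), []byte(header)) == 1
}

// demoRequest builds a request with an optional cookie token and header token.
func demoRequest(method, path, cookieToken, headerToken string) *http.Request {
	r := httptest.NewRequest(method, path, nil)
	if cookieToken != "" {
		r.AddCookie(&http.Cookie{Name: "XSRF-TOKEN", Value: cookieToken})
	}
	if headerToken != "" {
		r.Header.Set("X-CSRF-Token", headerToken)
	}
	return r
}

func main() {
	fmt.Println(csrfOK(demoRequest("POST", "/api/v1/hosts", "abc", "abc")))
	fmt.Println(csrfOK(demoRequest("POST", "/api/v1/hosts", "abc", "")))
}
```

The cookie has to be readable by script (non-HttpOnly) for this pattern to work, since the SPA must copy its value into the header; a page on another origin cannot read it and so cannot forge the match.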
+
+> Scope notes: auth rate limiting covers the credential-guessing surface, not
+> every route — there is still no general per-route HTTP limiter, and no request
+> body-size cap (`http.MaxBytesReader`) on JSON endpoints. If you expose 8443
+> publicly, an upstream reverse proxy is still useful for global rate limiting
+> and body-size enforcement. The scheduler `RateLimit` constants are unrelated
+> (they bound how many hosts the intelligence and discovery schedulers enqueue
+> per tick), see `internal/intelligence/scheduler/service.go`.
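The security headers row in the table above maps naturally onto a single middleware. The sketch below is an illustration, not `internal/server/security_headers.go`: the guide only says HSTS is at least one year, so the exact `max-age` is an assumption, the CSP shown contains just the two directives the guide names, and the `/docs` relaxation is omitted.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// securityHeaders sets the response headers listed in the guide before
// handing the request to the wrapped handler.
func securityHeaders(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		h := w.Header()
		// One year in seconds; the guide only states ">= 1 year".
		h.Set("Strict-Transport-Security", "max-age=31536000; includeSubDomains")
		h.Set("Content-Security-Policy", "default-src 'self'; frame-ancestors 'none'")
		h.Set("X-Content-Type-Options", "nosniff")
		h.Set("X-Frame-Options", "DENY")
		h.Set("Referrer-Policy", "no-referrer")
		next.ServeHTTP(w, r)
	})
}

// demoHeaders runs one request through the middleware and returns the
// response headers it produced.
func demoHeaders() http.Header {
	handler := securityHeaders(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusNoContent)
	}))
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Result().Header
}

func main() {
	for name, values := range demoHeaders() {
		fmt.Println(name, values)
	}
}
```

Setting the headers before calling the wrapped handler matters: once a handler has written the status line, later header changes are not sent.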
 
 ---
 