fix(deploy): bump postgres to 18-alpine and move volume mount #41

Merged
Haibread merged 1 commit into main from fix/postgres-v18-investigate on May 4, 2026

Haibread commented May 4, 2026

Summary

Renovate PR #11 bumps the postgres Docker tag to v18 but leaves the volume mount unchanged. The official 18+ images changed the on-disk layout: PostgreSQL data now lives in a major-version-specific subdirectory under `/var/lib/postgresql`, and mounting the old `/var/lib/postgresql/data` path triggers an `unused mount/volume` startup error (see docker-library/postgres#1259).

The actual error from CI:
```
postgres-1 | Error: in 18+, these Docker images are configured to store database data in a
format which is compatible with "pg_ctlcluster" ... Counter to that, there appears to
be PostgreSQL data in: /var/lib/postgresql/data (unused mount/volume)
```

(This was only visible because #38 fixed the dump-logs step in the E2E workflow earlier today.)

Fix: mount the parent directory `/var/lib/postgresql` so pg18 manages its per-version data subdir. Applied to both `docker-compose.yml` and `docker-compose.dev.yml`. The Helm chart is unaffected — it provisions postgres via cloudnative-pg which manages its own storage.
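
For illustration, the mount change described above might look roughly like the following compose fragment. This is a sketch, not the actual diff: the service name `postgres` is inferred from the `postgres-1` log prefix, and the surrounding keys are assumptions. Only the image tag, the `postgres_data` volume name, and the two mount paths come from this PR.

```yaml
# Hypothetical excerpt of deploy/docker-compose.yml (sketch only).
services:
  postgres:                      # assumed service name
    image: postgres:18-alpine
    volumes:
      # Before (pg16 layout), rejected by the 18+ images:
      #   - postgres_data:/var/lib/postgresql/data
      # After: mount the parent directory so pg18 manages
      # its own per-version data subdirectory beneath it.
      - postgres_data:/var/lib/postgresql

volumes:
  postgres_data:
```

The same one-line path change would apply to `docker-compose.dev.yml` if that overlay declares its own mount for the service.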

⚠️ Breaking change for local dev

Existing `postgres_data` volumes from pg16 must be wiped before this change takes effect. Note that `down -v` removes the volume, so any local database contents are lost:
```
docker compose -f deploy/docker-compose.yml -f deploy/docker-compose.dev.yml down -v
```
CI starts each run with a fresh volume, so it's unaffected.

Test plan

  • CI green (E2E job exercises the full compose stack)

Renovate PR #11 bumps the postgres docker tag to v18 but leaves the
volume mount at /var/lib/postgresql/data. The 18+ official images
changed the on-disk layout so PostgreSQL data lives in a major-version-
specific subdirectory under /var/lib/postgresql, and mounting the old
"data" path now triggers an "unused mount/volume" startup error
(see docker-library/postgres#1259):

    postgres-1  | Error: in 18+, these Docker images are configured to
            store database data in a format which is compatible with
            "pg_ctlcluster" ... Counter to that, there appears to be
            PostgreSQL data in: /var/lib/postgresql/data (unused
            mount/volume)

The fix is to mount the parent directory (the new convention) so pg18
manages the per-version data subdir itself.

The Helm chart is unaffected — it provisions postgres via
cloudnative-pg, which manages its own storage layout.

BREAKING CHANGE for local dev: existing `postgres_data` volumes from
pg16 must be wiped before this change takes effect:

    docker compose -f deploy/docker-compose.yml \
        -f deploy/docker-compose.dev.yml down -v

CI starts each run with a fresh volume so it is unaffected.
Haibread merged commit fbd7acc into main May 4, 2026
6 checks passed
Haibread deleted the fix/postgres-v18-investigate branch May 10, 2026 14:35
Haibread added a commit that referenced this pull request May 10, 2026
…ge (#56)

The Helm chart was still pinning the CNPG cluster to postgres 16
while the docker-compose stack had moved to postgres:18-alpine in
PR #41. Project audit flagged this as a P0 — version drift between
local dev and a Helm-managed cluster means schema and tooling
behaviour differ in production from what we test in dev.

  - deploy/helm/ai-registry/values.yaml: postgresVersion "16" → "18".
    Renders to ghcr.io/cloudnative-pg/postgresql:18 in the CNPG
    Cluster manifest (verified via `helm template`). Comment now
    documents the parity with PR #41 so the next bump understands
    the relationship.
  - docs/runbook.md: the two `pg-probe` snippets that spin up an
    ephemeral pod for psql-into-the-cluster troubleshooting were
    pulling postgres:16-alpine. Bumped to 18-alpine to match.

Verified: `helm lint` passes, `helm template` produces
`imageName: ghcr.io/cloudnative-pg/postgresql:18` for the Cluster
resource.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Haibread added a commit that referenced this pull request May 14, 2026
…gn, ADR 0001, UI/UX plan (#60)

Sweep over the project's markdown to bring it back in line with the
post-Phase-7 + post-audit codebase. Documentation was lagging in
multiple places — the project audit's docs front had flagged most of
this; this PR ships the fixes.

CHANGELOG.md
  Repointed the `Unreleased` section to cover BOTH the existing
  Phase 7 access-control + change-approval bundle (PRs #28-#32 + #37)
  AND the project-audit follow-ups (PRs #52-#59) that just landed.
  The audit follow-ups subsection summarises each merged PR with its
  scope and verification, plus a deferred-items list (DisallowUnknownFields,
  rate-limiter janitor, per-handler child spans, eager markdown
  chunk). Added a note that a real version stamp is overdue.

CLAUDE.md
  - Tech stack: drop the "sqlc or pgx" hedge — the codebase is pgx
    + hand-written SQL only.
  - Auth bullet: stop claiming hashed API keys are supported; they're
    parked under v0.4.x. Mirrored on Decision B in the table.
  - Deployment bullet: stop claiming docker-compose has dev + prod
    profiles. Only dev + ci exist; the prod profile is parked.
  - Conventions: drop the obsolete `claude/ai-registry-setup-KMC3l`
    bootstrap branch reference; describe the actual feat/fix/docs/chore
    convention.
  - Configuration section: "**both** of the following" → "**all three**"
    (env / YAML / default). Original wording predated the YAML layer
    and was only updated on the precedence list, not the lede.

PLAN.md
  - Phase 5 hardening section: the four-line `**TODO — Phase 5:**`
    list was misleading — the items weren't unfinished Phase 5 work,
    they were carried forward into v0.4.x. Replaced with a "Parked
    from Phase 5 (now tracked under v0.4.x)" header that points at
    the README + CLAUDE.md Decision B for the live status.
  - v0.2.2 section: the entire DoD checklist was rendered as `- [ ]`
    even though the section header said "✅ SHIPPED". Replaced the
    unchecked checklist with a past-tense "What landed" summary, a
    "Carried forward" subsection for the OTel-spans gap that v0.2.2
    only partially closed (PR #58 finished it), and a DoD-met note.

README.md
  - Tech stack: PostgreSQL 16 → 18, matching the dev compose
    (postgres:18-alpine since PR #41) and the Helm CNPG cluster
    (PR #56).
  - Infra bullet: replace the "docker-compose (dev / ci / prod)"
    claim with an accurate description of the two real overlays
    (dev, ci) and a parked-under-v0.4.x note for prod.

design.md
  - Typography table: drop the "Geist (next/font)" row — that loader
    was removed with the rest of the Next.js stack in Phase 6
    (ADR 0004). The web app uses Tailwind's default `font-sans` /
    `font-mono` system stacks; explained in a follow-up paragraph.
  - Admin sidebar ASCII diagram: add the Workspaces nav item that
    Phase 7 introduced; rename "Activity" → "Audit" to match the
    actual nav label; flag the API Keys item as a placeholder.

docs/adr/0001-workspaces-under-publishers.md
  - Drop the "Migration numbers ... are **placeholders**" preamble —
    the actual numbered migrations (000008, 000009, 000010) are on
    disk and the placeholder language is no longer accurate.
  - Mark Step 1 + Step 2 as shipped with the actual entry points
    (Go-side `db.BackfillWorkspaces` instead of the original
    `make backfill-workspaces`).
  - Add an explicit "Status note (2026-05-10)" callout that Step 3
    (the finalising migration that drops `publisher_id` and flips
    `workspace_id` to NOT NULL) was scoped at design time but never
    landed; production keeps both columns coexisting and code paths
    still read `publisher_id` directly. Verified by grep against
    `internal/store/mcp.go`.

docs/ui-ux-implementation-plan.md
  - Added a "Status: largely shipped — retrospective document" banner
    at the top. The document plans 10 batches, the vast majority of
    which shipped across v0.2.x and v0.3.x. Without the banner the
    file reads as forward-looking work and gives a misleading picture
    to anyone landing on it via search. Points the reader at PLAN.md
    and README.md for current open work.

Out of scope (audit findings deliberately not addressed here):
docs/ui-ux-proposals.md is a decision record (proposals + accepted /
deferred verdicts), so staleness is by design — not a bug. The
runbook, db-backup, future-multi-environment, ADRs 0002/0003/0004,
and test/load README are all current.

No code changes; no test impact.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>