Skip to content

Commit 33f41a2

Browse files
feat(dropguard): name-convention guard on every customer-data drop + migration-safety CI gate (truehomie D3) (#56)
* feat(dropguard): name-convention guard on every customer-data drop + migration-safety CI gate (truehomie D3) Layer 2 of the truehomie-db DROP hardening (layer 1 = the guardedDrop audit chokepoint, PR #50). The chokepoint made every sanctioned drop AUDITABLE; this makes a MIS-TARGETED drop UNEXECUTABLE: - internal/dropguard: charset+denylist validation of naming tokens and final db/user identifiers. Refuses empty/garbage tokens, SQL metacharacters, system databases (postgres, template0/1, instant_customers, instant_platform, mongo admin/local/config) and admin roles (instanode_admin, instant_cust, doadmin, postgres, redis "default"). Deliberately permissive on token SHAPE (uuid, dashless hex, pool-*, e2e cohorts all pass) so no legitimate legacy deprovision can wedge — this repo auto-deploys. - server.guardedDrop: validates the poolident-resolved naming token BEFORE dispatch; refusal = error + `provisioner.drop.refused` log event + instant_provisioner_drop_total{outcome="refused"}. - postgres local Deprovision + cleanupProvisionPartial (non-chokepoint rollback path), dedicated deprovisionLocal: validate the FINAL constructed dbName/username via validateDropTargets right before DROP. - mongo Deprovision: token guard before connect + per-candidate name guard (refused canonical = hard error; refused legacy candidate = skip). - redis local/dedicated DELUSER: skip any non-tenant-shaped username (ACL DELUSER "default" would brick the shared pod). - scripts/check-migration-safety.sh + ci.yml step: destructive DDL (DROP TABLE/DATABASE/SCHEMA/COLUMN/ROLE/USER, TRUNCATE, DELETE FROM without WHERE) in migrations/*.sql fails CI unless the file carries `-- destructive: acknowledged <reason>`. Self-test fixture suite included. No behavior change for legitimate deprovision flows (regression tests assert uuid/dashless/pool/e2e token shapes still reach the backend). New refusal branches are unit-tested in every package; dropguard itself is 100% covered. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * feat(pool): chokepoint contract on the pool reaper's deprovision dispatch pool.deprovisionBacking is the ONE customer-infra drop dispatch that does not pass through server.guardedDrop (no gRPC request). Give it the same contract: dropguard naming-token validation (refusal = provisioner.drop.refused) and the `provisioner.drop` audit event emitted BEFORE the backend executes, with caller="pool_reaper" and the proto-style resource_type strings so one NR query covers RPC drops and pool-reap drops alike. Without this, every pool reap of a failed postgres item would page the new customer-db-destructive-ddl NR alert as an unsanctioned DROP (and was itself an un-audited drop path). Test fixture fix: fakeRows.Scan filled every string dest with "postgres", which dropguard now (correctly) refuses as a pool naming token — fill positionally so pool_token carries a valid pool-token shape. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(dropguard): identifier length bound is an absurdity bound (128), not postgres's 63 The live-cluster CI suite provisions tokens like tok<nano>_<TestName> (>63 bytes with the usr_ prefix) — postgres truncates such identifiers CONSISTENTLY on CREATE and DROP, so they round-trip fine in reality, and the 63-byte refusal wedged their Deprovision (4 coverage-job test failures). Keep the cap purely as an absurdity bound at 128. Verified the four failing live tests pass against a real postgres 16. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> --------- Co-authored-by: Manas Srivastava <[email protected]> Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
1 parent 1043175 commit 33f41a2

18 files changed

Lines changed: 1113 additions & 11 deletions

.github/workflows/ci.yml

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -55,6 +55,16 @@ jobs:
5555
steps:
5656
- uses: actions/checkout@v6
5757

58+
# Migration-safety gate (truehomie-db DROP hardening, task D3): any
59+
# destructive DDL (DROP TABLE/DATABASE/SCHEMA/COLUMN/ROLE/USER, TRUNCATE,
60+
# DELETE FROM without WHERE) in a migrations/*.sql file fails CI unless
61+
# the file carries an explicit `-- destructive: acknowledged <reason>`
62+
# marker. Self-test runs first so a regression in the gate itself reds CI.
63+
- name: Migration safety gate
64+
run: |
65+
bash scripts/check-migration-safety.sh --self-test
66+
bash scripts/check-migration-safety.sh
67+
5868
- name: Checkout proto sibling (replace ../proto)
5969
uses: actions/checkout@v6
6070
with:
Lines changed: 69 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,69 @@
1+
package mongo
2+
3+
// dropguard_test.go — name-convention guard tests for the Mongo Deprovision
4+
// path (truehomie hardening, task D3). The invariant: a reserved/system naming
5+
// token is refused BEFORE any connection, and a reserved derived candidate name
6+
// is never passed to dropUser/dropDatabase.
7+
8+
import (
9+
"context"
10+
"errors"
11+
"testing"
12+
"time"
13+
14+
"instant.dev/provisioner/internal/dropguard"
15+
)
16+
17+
// TestLocalDeprovision_RefusedToken_NeverConnects points the backend at an
18+
// unreachable URI: a refused token must return dropguard.ErrRefused instantly,
19+
// proving the guard runs before connect (a connect attempt would instead
20+
// return a server-selection error after the timeout).
21+
func TestLocalDeprovision_RefusedToken_NeverConnects(t *testing.T) {
22+
b := newLocalBackend("mongodb://127.0.0.1:1", "127.0.0.1:1")
23+
ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
24+
defer cancel()
25+
for _, tok := range []string{"postgres", "instant_customers", "", "admin", "a b"} {
26+
err := b.Deprovision(ctx, tok, "")
27+
if !errors.Is(err, dropguard.ErrRefused) {
28+
t.Fatalf("Deprovision(%q): expected dropguard.ErrRefused before connect, got %v", tok, err)
29+
}
30+
}
31+
}
32+
33+
// TestLocalDeprovision_ReservedLegacyCandidate_SkippedNotFatal uses a token
34+
// whose 12-char legacy truncation collides with a reserved identifier
35+
// ("instant_customersabc"[:12] == "instant_cust"). The legacy candidate must
36+
// be SKIPPED (refused, logged) while the canonical candidate proceeds — the
37+
// deprovision still succeeds.
38+
func TestLocalDeprovision_ReservedLegacyCandidate_SkippedNotFatal(t *testing.T) {
39+
uri, ok := liveMongoURI(t)
40+
if !ok {
41+
t.Skip("CUSTOMER_MONGO_URL unreachable; skipping")
42+
}
43+
b := newLocalBackend(uri, hostFromURI(uri))
44+
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
45+
defer cancel()
46+
if err := b.Deprovision(ctx, "instant_customersabc", ""); err != nil {
47+
t.Fatalf("Deprovision with reserved legacy candidate: want nil (skip), got %v", err)
48+
}
49+
}
50+
51+
// TestLocalDeprovision_ReservedCanonicalDB_ReturnsError uses a token that
52+
// passes the token guard ("ad-min") but whose canonical dash-stripped form is
53+
// the reserved identifier "admin" — so the CANONICAL candidates (usr_admin,
54+
// db_admin) embed a reserved tail. A refused canonical DB candidate is a hard
55+
// error (unlike a legacy candidate, which is skipped): the deprovision must
56+
// fail loudly rather than silently leak the real database.
57+
func TestLocalDeprovision_ReservedCanonicalDB_ReturnsError(t *testing.T) {
58+
uri, ok := liveMongoURI(t)
59+
if !ok {
60+
t.Skip("CUSTOMER_MONGO_URL unreachable; skipping")
61+
}
62+
b := newLocalBackend(uri, hostFromURI(uri))
63+
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
64+
defer cancel()
65+
err := b.Deprovision(ctx, "ad-min", "")
66+
if !errors.Is(err, dropguard.ErrRefused) {
67+
t.Fatalf("Deprovision(%q): expected dropguard.ErrRefused for reserved canonical db candidate, got %v", "ad-min", err)
68+
}
69+
}

internal/backend/mongo/mongo.go

Lines changed: 35 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,7 @@ import (
2020
"go.mongodb.org/mongo-driver/mongo"
2121
"go.mongodb.org/mongo-driver/mongo/options"
2222

23+
"instant.dev/provisioner/internal/dropguard"
2324
"instant.dev/provisioner/internal/poolident"
2425
)
2526

@@ -196,6 +197,25 @@ func (b *LocalBackend) StorageBytes(ctx context.Context, token, providerResource
196197
// Deprovision drops the user and database for the given token.
197198
// Drops user first, then drops the database.
198199
func (b *LocalBackend) Deprovision(ctx context.Context, token, providerResourceID string) error {
200+
// P0-2: a pool-claimed database/user are named from the pool token, not
201+
// the request token. Resolve the canonical naming token from
202+
// provider_resource_id so dropUser/dropDatabase target the real infra
203+
// instead of no-op'ing on db_<real-token>, which would leak it forever.
204+
namingToken := poolident.NamingToken(token, providerResourceID)
205+
206+
// Name-convention guard (truehomie hardening, task D3): every candidate
207+
// user/database name below derives from this token; refuse the whole
208+
// deprovision — before even connecting — when the token is empty,
209+
// malformed, or a reserved system identifier. Per-candidate names are
210+
// additionally validated in the loops so a future naming.go refactor
211+
// stays covered.
212+
if guardErr := dropguard.CheckNamingToken(namingToken); guardErr != nil {
213+
slog.Error("provisioner.drop.refused",
214+
"event", "provisioner.drop.refused", "site", "nosql.Deprovision",
215+
"token", token, "naming_token", namingToken, "error", guardErr)
216+
return fmt.Errorf("nosql.Deprovision: %w", guardErr)
217+
}
218+
199219
client, err := b.connect(ctx, options.Client().ApplyURI(b.adminURI).
200220
SetServerSelectionTimeout(connectTimeout))
201221
if err != nil {
@@ -207,18 +227,18 @@ func (b *LocalBackend) Deprovision(ctx context.Context, token, providerResourceI
207227
}
208228
}()
209229

210-
// P0-2: a pool-claimed database/user are named from the pool token, not
211-
// the request token. Resolve the canonical naming token from
212-
// provider_resource_id so dropUser/dropDatabase target the real infra
213-
// instead of no-op'ing on db_<real-token>, which would leak it forever.
214-
namingToken := poolident.NamingToken(token, providerResourceID)
215-
216230
// Drop every candidate user. The resource was provisioned under exactly
217231
// one scheme, but we cannot know which without a stored name, so we drop
218232
// the canonical name and every legacy form. dropUser on a non-existent
219233
// user is harmless (logged, non-fatal).
220234
adminDB := client.Database("admin")
221235
for _, username := range legacyMongoUserNames(namingToken) {
236+
if guardErr := dropguard.CheckUserName(username); guardErr != nil {
237+
slog.Error("provisioner.drop.refused",
238+
"event", "provisioner.drop.refused", "site", "nosql.Deprovision",
239+
"token", token, "user", username, "error", guardErr)
240+
continue
241+
}
222242
if r := adminDB.RunCommand(ctx, bson.D{{Key: "dropUser", Value: username}}); r.Err() != nil {
223243
slog.Debug("nosql.Deprovision: dropUser miss (continuing)", "token", token, "user", username, "error", r.Err())
224244
}
@@ -229,6 +249,15 @@ func (b *LocalBackend) Deprovision(ctx context.Context, token, providerResourceI
229249
// the canonical name still propagates.
230250
canonicalDB := mongoDBName(namingToken)
231251
for _, dbName := range legacyMongoDBNames(namingToken) {
252+
if guardErr := dropguard.CheckDatabaseName(dbName); guardErr != nil {
253+
slog.Error("provisioner.drop.refused",
254+
"event", "provisioner.drop.refused", "site", "nosql.Deprovision",
255+
"token", token, "db", dbName, "error", guardErr)
256+
if dbName == canonicalDB {
257+
return fmt.Errorf("nosql.Deprovision: %w", guardErr)
258+
}
259+
continue
260+
}
232261
if dropErr := client.Database(dbName).Drop(ctx); dropErr != nil {
233262
if dbName == canonicalDB {
234263
return fmt.Errorf("nosql.Deprovision: drop database %s: %w", dbName, dropErr)

internal/backend/postgres/dedicated.go

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -330,6 +330,13 @@ func (p *DedicatedProvider) deprovisionLocal(ctx context.Context, token string)
330330
dbName := "dedicated_db_" + token
331331
username := "dedicated_usr_" + token
332332

333+
// Name-convention guard (truehomie hardening, task D3): validate the FINAL
334+
// constructed identifiers before the destructive statements run. Refusal is
335+
// an error — a wrong-name bug must never reach DROP.
336+
if guardErr := validateDropTargets("db.dedicated.deprovisionLocal", token, dbName, username); guardErr != nil {
337+
return fmt.Errorf("db.dedicated.deprovisionLocal: %w", guardErr)
338+
}
339+
333340
adminDSN := p.localAdminDSN()
334341
conn, err := pgxConnect(ctx, adminDSN)
335342
if err != nil {
Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,31 @@
1+
package postgres
2+
3+
// dropcheck.go — the postgres-package seam for the dropguard name-convention
4+
// guard (truehomie hardening, task D3). Every DROP DATABASE / DROP USER this
5+
// package executes validates its FINAL constructed identifiers here first, so
6+
// a bug constructing a wrong target (empty token, system database, admin role)
7+
// is refused before the destructive statement runs.
8+
9+
import (
10+
"log/slog"
11+
12+
"instant.dev/provisioner/internal/dropguard"
13+
)
14+
15+
// validateDropTargets refuses a (dbName, username) pair that does not match the
16+
// per-tenant naming convention. On refusal it emits the structured
17+
// `provisioner.drop.refused` audit event (same event name as the server
18+
// chokepoint, so one NR/grep surface covers every refusal site) and returns the
19+
// dropguard error for the caller to wrap. site names the calling function.
20+
func validateDropTargets(site, token, dbName, username string) error {
21+
err := dropguard.CheckDatabaseName(dbName)
22+
if err == nil {
23+
err = dropguard.CheckUserName(username)
24+
}
25+
if err != nil {
26+
slog.Error("provisioner.drop.refused",
27+
"event", "provisioner.drop.refused", "site", site,
28+
"token", token, "db", dbName, "user", username, "error", err)
29+
}
30+
return err
31+
}
Lines changed: 87 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,87 @@
1+
package postgres
2+
3+
// dropcheck_test.go — unit tests for the name-convention guard on every DROP
4+
// DATABASE / DROP USER site in this package (truehomie hardening, task D3).
5+
// The invariant: a wrong-name bug must never reach a destructive statement,
6+
// and legitimate per-tenant names must pass untouched.
7+
8+
import (
9+
"context"
10+
"errors"
11+
"testing"
12+
13+
"instant.dev/provisioner/internal/dropguard"
14+
)
15+
16+
func TestValidateDropTargets_OK(t *testing.T) {
17+
if err := validateDropTargets("test.site", "tok",
18+
"db_96edf9eed8ed42929036b63298ec5b2b",
19+
"usr_96edf9eed8ed42929036b63298ec5b2b"); err != nil {
20+
t.Fatalf("validateDropTargets: unexpected refusal: %v", err)
21+
}
22+
}
23+
24+
func TestValidateDropTargets_RefusesBadDatabase(t *testing.T) {
25+
err := validateDropTargets("test.site", "tok", "instant_customers", "usr_ok123")
26+
if !errors.Is(err, dropguard.ErrRefused) {
27+
t.Fatalf("expected ErrRefused for system database, got %v", err)
28+
}
29+
}
30+
31+
func TestValidateDropTargets_RefusesBadUser(t *testing.T) {
32+
// Database valid, user invalid — exercises the second check independently.
33+
err := validateDropTargets("test.site", "tok", "db_ok123", "instanode_admin")
34+
if !errors.Is(err, dropguard.ErrRefused) {
35+
t.Fatalf("expected ErrRefused for admin role, got %v", err)
36+
}
37+
}
38+
39+
// TestLocalDeprovision_RefusedToken_NeverConnects asserts the guard refuses a
40+
// reserved/system naming token BEFORE any admin connection or DROP statement.
41+
// The backend is constructed with a nil router: if the guard let the call
42+
// through, the router deref would panic — passing proves the early return.
43+
func TestLocalDeprovision_RefusedToken_NeverConnects(t *testing.T) {
44+
b := &LocalBackend{}
45+
for _, tok := range []string{"postgres", "instant_customers", "", "x y"} {
46+
err := b.Deprovision(context.Background(), tok, "")
47+
if !errors.Is(err, dropguard.ErrRefused) {
48+
t.Fatalf("Deprovision(%q): expected ErrRefused, got %v", tok, err)
49+
}
50+
}
51+
}
52+
53+
func TestCleanupProvisionPartial_RefusedNames_ExecutesNothing(t *testing.T) {
54+
conn := &fakePGConn{}
55+
cleanupProvisionPartial(conn, "instant_customers", "usr_ok123")
56+
if len(conn.execCalls) != 0 {
57+
t.Fatalf("rollback executed %d statements for a refused database name: %v", len(conn.execCalls), conn.execCalls)
58+
}
59+
cleanupProvisionPartial(conn, "db_ok123", "instanode_admin")
60+
if len(conn.execCalls) != 0 {
61+
t.Fatalf("rollback executed %d statements for a refused user name: %v", len(conn.execCalls), conn.execCalls)
62+
}
63+
}
64+
65+
func TestCleanupProvisionPartial_ValidNames_StillExecutesDrops(t *testing.T) {
66+
// Behaviour preservation: the legitimate rollback path still runs both
67+
// IF EXISTS drops.
68+
conn := &fakePGConn{}
69+
cleanupProvisionPartial(conn, "db_ok123", "usr_ok123")
70+
if len(conn.execCalls) != 2 {
71+
t.Fatalf("rollback: want 2 statements (DROP DATABASE + DROP USER), got %d: %v", len(conn.execCalls), conn.execCalls)
72+
}
73+
}
74+
75+
// TestDedicatedDeprovisionLocal_RefusedToken_NeverConnects mirrors the local
76+
// test for the dedicated provider: a reserved token is refused before the
77+
// admin DSN is ever dialed (empty adminDSN would otherwise attempt a real
78+
// connection to the default customers URL).
79+
func TestDedicatedDeprovisionLocal_RefusedToken_NeverConnects(t *testing.T) {
80+
p := &DedicatedProvider{}
81+
for _, tok := range []string{"postgres", "", "a b"} {
82+
err := p.deprovisionLocal(context.Background(), tok)
83+
if !errors.Is(err, dropguard.ErrRefused) {
84+
t.Fatalf("deprovisionLocal(%q): expected ErrRefused, got %v", tok, err)
85+
}
86+
}
87+
}

internal/backend/postgres/local.go

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -252,6 +252,14 @@ func (b *LocalBackend) Provision(ctx context.Context, token, tier string, connLi
252252
// even if the incoming request context is on the verge of its deadline. Both
253253
// DROPs use IF EXISTS so partial states (db created, user not) are safe.
254254
func cleanupProvisionPartial(conn pgConn, dbName, username string) {
255+
// Name-convention guard (truehomie hardening, task D3): this rollback path
256+
// does NOT pass through the server's guardedDrop chokepoint, so it
257+
// validates its own targets. A wrong-name bug must SKIP the drops, never
258+
// execute them — the cost of a refusal is a leaked just-created db, not a
259+
// destroyed customer one.
260+
if validateDropTargets("db.local.cleanupProvisionPartial", "", dbName, username) != nil {
261+
return
262+
}
255263
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
256264
defer cancel()
257265
if _, err := conn.Exec(ctx, fmt.Sprintf("DROP DATABASE IF EXISTS %q WITH (FORCE)", dbName)); err != nil {
@@ -307,6 +315,14 @@ func (b *LocalBackend) Deprovision(ctx context.Context, token, providerResourceI
307315
dbName := dbNamePrefix + namingToken
308316
username := userNamePrefix + namingToken
309317

318+
// Name-convention guard (truehomie hardening, task D3): validate the FINAL
319+
// constructed identifiers right before the destructive statements run, so a
320+
// future refactor that builds them differently is still covered. Refusal is
321+
// an error (the caller's deprovision fails loudly) — never a wrong drop.
322+
if guardErr := validateDropTargets("db.local.Deprovision", token, dbName, username); guardErr != nil {
323+
return fmt.Errorf("db.local.Deprovision: %w", guardErr)
324+
}
325+
310326
adminURL := b.router.AdminURLForResource(poolident.BasePRID(providerResourceID))
311327
conn, err := pgxConnect(ctx, adminURL)
312328
if err != nil {

internal/backend/redis/dedicated.go

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,8 @@ import (
1818
"log/slog"
1919

2020
goredis "github.com/redis/go-redis/v9"
21+
22+
"instant.dev/provisioner/internal/dropguard"
2123
)
2224

2325
// DedicatedProvider provisions a dedicated Redis instance.
@@ -228,6 +230,14 @@ func (p *DedicatedProvider) deprovisionLocal(ctx context.Context, token, provide
228230
probes = append(probes, legacy)
229231
}
230232
for _, username := range probes {
233+
// Name-convention guard (truehomie hardening, task D3): refuse a
234+
// DELUSER of any non-tenant-shaped name (e.g. "default", admin users).
235+
if guardErr := dropguard.CheckUserName(username); guardErr != nil {
236+
slog.Error("provisioner.drop.refused",
237+
"event", "provisioner.drop.refused", "site", "cache.dedicated.deprovisionLocal",
238+
"token", token, "user", username, "error", guardErr)
239+
continue
240+
}
231241
if err := p.rdb.Do(ctx, "ACL", "DELUSER", username).Err(); err != nil {
232242
// Best-effort — user may not exist.
233243
slog.Warn("cache.dedicated.deprovisionLocal: ACL DELUSER (non-fatal)",

0 commit comments

Comments
 (0)