Commit 5955b51

fix(security): guard + audit the dev-fallback direct customer-DB DROP path (truehomie-db) (#286)
* fix(security): guard + audit the dev-fallback direct customer-DB DROP path (truehomie-db)

  The PROVISIONER_ADDR-unset dev-fallback providers (db/local.go, nosql/mongo.go) ran direct DROP DATABASE / DROP USER (Postgres) and dropDatabase / dropUser (Mongo) as the customer-DB superuser with ZERO audit trail. An api process started without PROVISIONER_ADDR but with a prod CUSTOMER_DATABASE_URL / MONGO_ADMIN_URI (a laptop/.env/E2E run pointed at the then-public admin host) would CREATE/DROP real customer databases unlogged. That is the 2026-06-03 truehomie incident signature: an active Pro customer's DB + role dropped by an unidentified, non-audited path.

  Defense in depth, smallest blast radius first, in a new internal/providers/dbsafety package (api-internal mirror of the provisioner-side dropguard D3):

  1. Production refusal. The fallback is DEV-ONLY, so GuardDrop fails closed when effectively in production: a prod-class ENVIRONMENT, OR a target DSN host that is not clearly local/in-cluster/dev (public FQDN, *.instanode.dev, *.ondigitalocean.com, routable public IP). Localhost / 127.x / ::1 / RFC1918 / in-cluster service short-names stay dev-safe, so local dev, CI (TEST_* DBs on localhost), and the port-forwarded full-stack E2E flow keep working.

  2. Name-convention + denylist guard. Every DROP target must carry the per-tenant db_ / usr_ prefix + a [A-Za-z0-9._-] token and miss the system denylist (postgres, template0/1, instant_customers, instant_platform, instanode_admin, doadmin, admin/local/config, …). An empty/wildcard/system name can never reach the DROP.

  3. Audit. Every sanctioned drop emits an audit_log row of new kind customer_db.direct_drop (operator-internal, NOT a customer email) via an injected sink: models.WireDBSafetyAuditSink wires a *sql.DB writer at handler construction; provider unit tests fall back to a structured-slog sink. So even if layers 1+2 pass there is a forensic trail.

  The guard runs BEFORE the superuser connection is opened, so a refused op never touches the customer cluster. New dbsafety + sink code is 100% unit-covered; the four DROP sites (local.go DROP DATABASE/USER, mongo.go dropUser/dropDatabase) are all guarded.

  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(dbsafety): cover uppercase char-class branch in validTokenChars

  CI diff-cover flagged dbsafety.go:134 (the `case c >= 'A' && c <= 'Z'` branch): every existing good-token in TestCheckDatabaseName was lowercase (UUIDs are lowercase hex), so the uppercase case never fired. Added a mixed-case good token (e2e/CI tokens may carry A-Z, per the function's own comment). dbsafety pkg now 100.0% statements; line 134 covered.

  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
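Layer 2 of the commit message (name convention plus denylist) can be illustrated with a small standalone sketch. The dbsafety source is not part of this page, so everything below is an assumption: `checkDropName` and `systemNames` are hypothetical names, the denylist holds only the entries the message spells out, and `validTokenChars` borrows the name mentioned in the second commit without claiming to match its real body.

```go
package main

import (
	"fmt"
	"strings"
)

// systemNames is a hypothetical denylist built from the names the commit
// message lists explicitly; the real list is described as longer.
var systemNames = map[string]bool{
	"postgres": true, "template0": true, "template1": true,
	"instant_customers": true, "instant_platform": true,
	"instanode_admin": true, "doadmin": true,
	"admin": true, "local": true, "config": true,
}

// validTokenChars reports whether s is non-empty and contains only
// characters from [A-Za-z0-9._-].
func validTokenChars(s string) bool {
	if s == "" {
		return false
	}
	for _, c := range s {
		switch {
		case c >= 'a' && c <= 'z':
		case c >= 'A' && c <= 'Z':
		case c >= '0' && c <= '9':
		case c == '.' || c == '_' || c == '-':
		default:
			return false
		}
	}
	return true
}

// checkDropName returns nil only when name is not a system name and has the
// shape prefix + valid token. Any other input is refused.
func checkDropName(name, prefix string) error {
	if systemNames[strings.ToLower(name)] {
		return fmt.Errorf("refusing to drop system name %q", name)
	}
	token, ok := strings.CutPrefix(name, prefix)
	if !ok || !validTokenChars(token) {
		return fmt.Errorf("name %q does not match %s<token>", name, prefix)
	}
	return nil
}

func main() {
	for _, n := range []string{"db_3f2a9c", "db_E2E-Run.7", "postgres", "db_", "db_*"} {
		fmt.Printf("%q allowed=%v\n", n, checkDropName(n, "db_") == nil)
	}
}
```

Checking the denylist before the prefix keeps the refusal reason specific for system names, and rejecting an empty token means a bare `db_` or `usr_` can never be passed on to a DROP.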
1 parent 337f2bd commit 5955b51

18 files changed

Lines changed: 1278 additions & 55 deletions

internal/handlers/db.go

Lines changed: 3 additions & 0 deletions
@@ -53,6 +53,9 @@ func NewDBHandler(db *sql.DB, rdb *redis.Client, cfg *config.Config, provClient
 	if provClient == nil {
 		// fall back to local provider
 		h.dbProvider = dbprovider.New(cfg, cfg.PostgresCustomersURL)
+		// Wire the *sql.DB-backed dbsafety audit sink so the dev-fallback
+		// provider's direct DROPs are recorded in audit_log (truehomie-db).
+		models.WireDBSafetyAuditSink(db)
 	}
 	return h
 }

internal/handlers/nosql.go

Lines changed: 4 additions & 1 deletion
@@ -41,7 +41,10 @@ func NewNoSQLHandler(db *sql.DB, rdb *redis.Client, cfg *config.Config, provClie
 	}
 	if provClient == nil {
 		// fall back to local provider
-		h.nosqlProvider = nosqlprovider.New(cfg.MongoAdminURI, cfg.MongoHost)
+		h.nosqlProvider = nosqlprovider.New(cfg.MongoAdminURI, cfg.MongoHost, cfg.Environment)
+		// Wire the *sql.DB-backed dbsafety audit sink so the dev-fallback
+		// provider's dropUser/dropDatabase are recorded in audit_log.
+		models.WireDBSafetyAuditSink(db)
 	}
 	return h
 }

internal/handlers/vector.go

Lines changed: 2 additions & 0 deletions
@@ -87,6 +87,8 @@ func NewVectorHandler(db *sql.DB, rdb *redis.Client, cfg *config.Config, provCli
 	}
 	if provClient == nil {
 		h.dbProvider = dbprovider.New(cfg, cfg.PostgresCustomersURL)
+		// Wire the dbsafety audit sink for the dev-fallback DROP path.
+		models.WireDBSafetyAuditSink(db)
 	}
 	return h
 }

internal/models/dbsafety_audit_sink.go

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
+package models
+
+// dbsafety_audit_sink.go: the *sql.DB-backed audit sink for the dbsafety guard
+// used by the dev-fallback (PROVISIONER_ADDR-unset) customer-data providers
+// (internal/providers/db/local.go + internal/providers/nosql/mongo.go).
+//
+// dbsafety lives below the models layer and must not import it (that would pull
+// the platform-DB stack into the provider packages and risk a cycle). So the
+// audit emission is an injected seam: dbsafety calls its AuditSink interface;
+// this file provides the production implementation that writes an audit_log row
+// of kind customer_db.direct_drop via InsertAuditEvent.
+//
+// The sink is registered once at handler construction (the fallback path of
+// NewDBHandler / NewNoSQLHandler / NewVectorHandler) via WireDBSafetyAuditSink.
+// Writes are best-effort and bounded: an audit failure must NEVER block (or
+// fail) a deprovision (audit_log.go's contract).
+
+import (
+	"context"
+	"database/sql"
+	"encoding/json"
+	"log/slog"
+	"time"
+
+	"instant.dev/internal/providers/dbsafety"
+	"instant.dev/internal/safego"
+)
+
+// dbSafetyAuditTimeout bounds the best-effort audit write so a stalled
+// platform-DB never wedges (or delays) the deprovision goroutine.
+const dbSafetyAuditTimeout = 3 * time.Second
+
+// dbSafetyAuditSink persists one audit_log row of kind customer_db.direct_drop
+// per sanctioned direct drop. It never blocks the caller: the row is written
+// in a panic-safe, bounded-context goroutine.
+type dbSafetyAuditSink struct {
+	db *sql.DB
+}
+
+// WireDBSafetyAuditSink installs a *sql.DB-backed dbsafety audit sink. Called
+// once from the fallback path of the provisioning handler constructors. A nil
+// db (test config) leaves the structured-slog default sink in place so the
+// event is still logged.
+func WireDBSafetyAuditSink(db *sql.DB) {
+	if db == nil {
+		return
+	}
+	dbsafety.SetAuditSink(&dbSafetyAuditSink{db: db})
+}
+
+// Emit writes the audit_log row best-effort. The metadata captures the
+// destroyed identifiers + the admin DSN host (never credentials) so an operator
+// can reconstruct exactly what the api dropped, and where, even though layers
+// 1+2 of the guard should make a prod drop unreachable.
+func (s *dbSafetyAuditSink) Emit(_ context.Context, rec dbsafety.AuditRecord) {
+	// json.Marshal of a fixed-shape map[string]string cannot error: the only
+	// failure modes are unsupported types / cycles, neither of which a string
+	// map has. A nil meta would be a strictly worse audit than the marshalled
+	// one, which always succeeds here, so the error is intentionally dropped.
+	meta, _ := json.Marshal(map[string]string{
+		"provider": rec.Provider,
+		"token":    rec.Token,
+		"database": rec.DatabaseName,
+		"user":     rec.UserName,
+		"dsn_host": rec.DSNHost,
+	})
+
+	safego.Go("dbsafety.audit.emit", func() {
+		bgCtx, cancel := context.WithTimeout(context.Background(), dbSafetyAuditTimeout)
+		defer cancel()
+		ev := AuditEvent{
+			Actor:    "system",
+			Kind:     rec.Kind,
+			Summary:  "direct customer-DB drop via dev-fallback provider",
+			Metadata: meta,
+		}
+		if err := InsertAuditEvent(bgCtx, s.db, ev); err != nil {
+			// Best-effort: a failed audit write must not surface anywhere.
+			// Log loudly, since a missing trail for THIS kind is itself notable.
+			slog.WarnContext(bgCtx, "dbsafety_audit: InsertAuditEvent failed",
+				"audit_kind", rec.Kind, "error", err)
+		}
+	})
+}

internal/models/dbsafety_audit_sink_mock_test.go

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+package models
+
+// dbsafety_audit_sink_mock_test.go: white-box (package models) coverage of the
+// dbSafetyAuditSink.Emit branches via sqlmock, with no live DB. Covers the
+// goroutine insert (happy) and the InsertAuditEvent-failure log branch
+// deterministically. The integration row-lands test lives in the package
+// models_test file.
+
+import (
+	"context"
+	"errors"
+	"testing"
+	"time"
+
+	"github.com/DATA-DOG/go-sqlmock"
+	"github.com/stretchr/testify/require"
+
+	"instant.dev/internal/providers/dbsafety"
+)
+
+func sampleRecord() dbsafety.AuditRecord {
+	return dbsafety.AuditRecord{
+		Kind:         dbsafety.AuditKindCustomerDBDirectDrop,
+		Provider:     "db.local",
+		Token:        "tok",
+		DatabaseName: "db_tok",
+		UserName:     "usr_tok",
+		DSNHost:      "postgres-customers",
+	}
+}
+
+// waitForExpectations polls the mock until every expectation is met or the
+// deadline passes, because the Emit insert runs in a safego goroutine.
+func waitForExpectations(t *testing.T, mock sqlmock.Sqlmock) {
+	t.Helper()
+	deadline := time.Now().Add(2 * time.Second)
+	for {
+		if err := mock.ExpectationsWereMet(); err == nil {
+			return
+		}
+		if time.Now().After(deadline) {
+			t.Fatalf("sqlmock expectations not met before deadline: %v", mock.ExpectationsWereMet())
+		}
+		time.Sleep(10 * time.Millisecond)
+	}
+}
+
+// TestDBSafetyAuditSink_Emit_Inserts covers the happy goroutine path: Emit
+// marshals the metadata and InsertAuditEvent writes the row.
+func TestDBSafetyAuditSink_Emit_Inserts(t *testing.T) {
+	db, mock := newMock(t)
+	mock.ExpectExec(`INSERT INTO audit_log`).WillReturnResult(sqlmock.NewResult(0, 1))
+
+	s := &dbSafetyAuditSink{db: db}
+	s.Emit(context.Background(), sampleRecord())
+
+	waitForExpectations(t, mock)
+}
+
+// TestDBSafetyAuditSink_Emit_InsertError covers the InsertAuditEvent-failure log
+// branch: the insert errors but Emit must not panic or surface anything.
+func TestDBSafetyAuditSink_Emit_InsertError(t *testing.T) {
+	db, mock := newMock(t)
+	mock.ExpectExec(`INSERT INTO audit_log`).WillReturnError(errors.New("boom"))
+
+	s := &dbSafetyAuditSink{db: db}
+	s.Emit(context.Background(), sampleRecord())
+
+	waitForExpectations(t, mock)
+	// Give the goroutine a beat to run its error-log branch after the failing
+	// exec is observed (the log call itself has no observable side effect).
+	require.NotNil(t, s.db)
+}

internal/models/dbsafety_audit_sink_test.go

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
+package models_test
+
+// dbsafety_audit_sink_test.go: covers the *sql.DB-backed dbsafety audit sink
+// (truehomie-db hardening). Integration test: needs TEST_DATABASE_URL; skips
+// cleanly otherwise.
+
+import (
+	"context"
+	"encoding/json"
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+
+	"instant.dev/internal/models"
+	"instant.dev/internal/providers/dbsafety"
+	"instant.dev/internal/testhelpers"
+)
+
+// TestWireDBSafetyAuditSink_NilIsNoop asserts a nil db leaves the default
+// (structured-slog) sink in place rather than installing a panicking nil-db
+// sink. Pure: no DB needed.
+func TestWireDBSafetyAuditSink_NilIsNoop(t *testing.T) {
+	dbsafety.SetAuditSink(nil)
+	t.Cleanup(func() { dbsafety.SetAuditSink(nil) })
+
+	models.WireDBSafetyAuditSink(nil) // must NOT install a *sql.DB sink over a nil db
+
+	// A guard against a dev host with valid names triggers exactly one emit;
+	// with the default slog sink this must not panic.
+	err := dbsafety.GuardDrop(context.Background(), dbsafety.DropParams{
+		Provider:     "db.local",
+		Env:          dbsafety.EnvDevelopment,
+		DSNHost:      "postgres://u:p@localhost:5432/d",
+		Token:        "tok",
+		DatabaseName: "db_tok",
+		UserName:     "usr_tok",
+	})
+	require.NoError(t, err)
+}
+
+// TestDBSafetyAuditSink_EmitWritesRow drives the production sink end-to-end: a
+// sanctioned (dev-host, valid-name) GuardDrop emits a customer_db.direct_drop
+// audit_log row carrying the destroyed identifiers + DSN host. The emit fires
+// from a goroutine, so the row is polled for.
+func TestDBSafetyAuditSink_EmitWritesRow(t *testing.T) {
+	db, clean := testhelpers.SetupTestDB(t)
+	defer clean()
+
+	models.WireDBSafetyAuditSink(db)
+	t.Cleanup(func() { dbsafety.SetAuditSink(nil) })
+
+	const token = "auditseam-tok"
+	err := dbsafety.GuardDrop(context.Background(), dbsafety.DropParams{
+		Provider:     "db.local",
+		Env:          dbsafety.EnvDevelopment,
+		DSNHost:      "postgres://u:p@postgres-customers:5432/d",
+		Token:        token,
+		DatabaseName: "db_" + token,
+		UserName:     "usr_" + token,
+	})
+	require.NoError(t, err)
+
+	// Poll for the audit row (team_id is NULL for these admin-only rows, so query
+	// the kind directly rather than via ListAuditEventsByTeam, which filters team).
+	deadline := time.Now().Add(3 * time.Second)
+	var (
+		gotKind, gotActor, gotSummary, gotMeta string
+		found                                  bool
+	)
+	for {
+		row := db.QueryRowContext(context.Background(), `
+			SELECT kind, actor, summary, COALESCE(metadata::text, '')
+			FROM audit_log
+			WHERE kind = $1
+			AND metadata->>'token' = $2
+			ORDER BY created_at DESC
+			LIMIT 1
+		`, dbsafety.AuditKindCustomerDBDirectDrop, token)
+		if err := row.Scan(&gotKind, &gotActor, &gotSummary, &gotMeta); err == nil {
+			found = true
+			break
+		}
+		if time.Now().After(deadline) {
+			break
+		}
+		time.Sleep(25 * time.Millisecond)
+	}
+	require.True(t, found, "a customer_db.direct_drop audit row must land after a sanctioned drop")
+
+	assert.Equal(t, dbsafety.AuditKindCustomerDBDirectDrop, gotKind)
+	assert.Equal(t, "system", gotActor)
+	assert.Contains(t, gotSummary, "direct customer-DB drop")
+
+	var meta map[string]string
+	require.NoError(t, json.Unmarshal([]byte(gotMeta), &meta))
+	assert.Equal(t, "db.local", meta["provider"])
+	assert.Equal(t, "db_"+token, meta["database"])
+	assert.Equal(t, "usr_"+token, meta["user"])
+	assert.Equal(t, "postgres-customers", meta["dsn_host"], "DSN host (no credentials) must be recorded")
+}

internal/providers/db/local.go

Lines changed: 23 additions & 2 deletions
@@ -13,6 +13,7 @@ import (
 
 	"github.com/jackc/pgx/v5"
 	"github.com/jackc/pgx/v5/pgconn"
+	"instant.dev/internal/providers/dbsafety"
 )
 
 const defaultCustomersURL = "postgres://instant_cust:instant_cust@postgres-customers:5432/instant_customers?sslmode=disable"
@@ -47,14 +48,18 @@ var pgxConnect = func(ctx context.Context, connString string) (pgConn, error) {
 // LocalBackend provisions databases on the shared postgres-customers instance.
 type LocalBackend struct {
 	customersURL string // admin connection URL
+	env          string // process ENVIRONMENT; feeds the dbsafety production refusal
 }
 
 // newLocalBackend creates a LocalBackend using the given admin connection URL.
-func newLocalBackend(customersURL string) *LocalBackend {
+// env is the process ENVIRONMENT (cfg.Environment); it feeds the dbsafety
+// production-refusal guard so this dev-only fallback fails closed when
+// PROVISIONER_ADDR is unset against a non-dev customer-DB host.
+func newLocalBackend(customersURL, env string) *LocalBackend {
 	if customersURL == "" {
 		customersURL = defaultCustomersURL
 	}
-	return &LocalBackend{customersURL: customersURL}
+	return &LocalBackend{customersURL: customersURL, env: env}
 }
 
 // generatePassword returns a cryptographically random alphanumeric string of length n.
@@ -206,6 +211,22 @@ func (b *LocalBackend) Deprovision(ctx context.Context, token, providerResourceI
 	dbName := "db_" + token
 	username := "usr_" + token
 
+	// dbsafety guard (truehomie-db incident): refuse the DROP entirely when
+	// this dev-only fallback is effectively in production (non-dev customer-DB
+	// host) or the target name doesn't match the per-tenant convention, and
+	// audit every sanctioned drop. Runs BEFORE we open the superuser
+	// connection so a refused op never even touches the customer cluster.
+	if err := dbsafety.GuardDrop(ctx, dbsafety.DropParams{
+		Provider:     "db.local",
+		Env:          b.env,
+		DSNHost:      b.customersURL,
+		Token:        token,
+		DatabaseName: dbName,
+		UserName:     username,
+	}); err != nil {
+		return fmt.Errorf("db.local.Deprovision: %w", err)
+	}
+
 	conn, err := pgxConnect(ctx, b.customersURL)
 	if err != nil {
 		return fmt.Errorf("db.local.Deprovision: connect: %w", err)
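The call above hands the whole admin URL to the guard in `DSNHost`, and the integration test expects only the bare host (`postgres-customers`) in the audit row, so the guard presumably reduces the DSN to a host and then classifies it. The following sketch shows one way that could look for single-host URL-style DSNs. `hostFromDSN` and `isDevSafeHost` are hypothetical names, and the rule "a dot-free name is an in-cluster short-name" is an assumption drawn from the commit message, not from the dbsafety source.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
	"strings"
)

// hostFromDSN extracts the lowercase host (no port, no credentials) from a
// URL-style DSN. The parse error is deliberately replaced with a fixed one,
// because url.Parse errors echo the input and could leak a password.
func hostFromDSN(dsn string) (string, error) {
	u, err := url.Parse(dsn)
	if err != nil || u.Hostname() == "" {
		return "", errors.New("dsn has no parseable host")
	}
	return strings.ToLower(u.Hostname()), nil
}

// isDevSafeHost reports whether host looks local, private, or in-cluster.
// Anything it cannot positively classify is treated as unsafe.
func isDevSafeHost(host string) bool {
	if host == "localhost" {
		return true
	}
	if ip := net.ParseIP(host); ip != nil {
		// Loopback covers 127.0.0.0/8 and ::1; IsPrivate covers RFC 1918
		// ranges and IPv6 unique local addresses.
		return ip.IsLoopback() || ip.IsPrivate()
	}
	// Assumption: a name without dots is an in-cluster service short-name.
	return host != "" && !strings.Contains(host, ".")
}

func main() {
	dsns := []string{
		"postgres://u:p@localhost:5432/d",
		"postgres://u:p@postgres-customers:5432/d",
		"postgres://u:p@10.0.0.5:5432/d",
		"postgres://u:p@db.example.instanode.dev:5432/d",
		"postgres://u:p@8.8.8.8:5432/d",
	}
	for _, dsn := range dsns {
		host, err := hostFromDSN(dsn)
		if err != nil {
			fmt.Println("refused:", err)
			continue
		}
		fmt.Printf("%s devSafe=%v\n", host, isDevSafeHost(host))
	}
}
```

A multi-host Mongo URI does not parse as a single URL, so under this sketch it would be refused rather than guessed at, which matches the fail-closed intent described in the commit message.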
