Skip to content

Commit e4b33c3

Browse files
fix(provisioner): bug-bash — pool item leak, redis ACL fail-closed, quota error matching (#37)
* fix(provisioner): bug-bash — pool item leak, redis ACL fail-closed, quota error matching Three bugs from the 2026-06-02 platform bug bash: - #3 (P1) pool: a Claimed Postgres pool item was LEAKED when the connection- limit regrade failed or the item carried no PoolToken — the code fell through to live provisioning leaving the pool_items row stuck 'assigned' with no owning resource. Add pool.Manager.Discard (marks 'failed' + refills) and a Discard method on the PoolClaimer interface; call it in both fallback branches of provisionPostgres. - #19 (P2) redis LocalBackend.Provision: on ACL SETUSER failure it silently returned a credential-less shared-instance URL (the shared redis-provision pod is nopass/+@ALL, so KeyPrefix is advisory only) — full cross-tenant access. Now fails CLOSED on the shared multi-tenant backend; a single-tenant/ dev deployment opts into the namespace-only fallback via REDIS_ALLOW_INSECURE_NO_ACL_FALLBACK=true. - #24 (P3) redis StorageBytes: the MEMORY USAGE error guard skipped any error whose string contained "ERR", silently swallowing real server errors ("ERR max number of clients reached", "ERR DENIED by ACL") and under-counting quota. The deleted-key race is goredis.Nil only — skip ONLY that. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(provisioner): cover Discard call sites + pool.Discard; defer #24 - Add hermetic server tests exercising the Discard-error log branches in provisionPostgres (regrade-fail + missing-PoolToken) so the bug-bash #3 changed lines hit 100% patch coverage. - Add a DB-gated pool.Manager.Discard test (marks an assigned item 'failed'). - Revert the #24 StorageBytes error-matching change: the only changed line sits inside the MemoryUsage-error block, which has no deterministic test without a redis mock dependency (disproportionate for a P3 quota-undercount edge case). Deferred — will reland with a redismock-based test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(provisioner): StorageBytes propagates server errors (#24, security HIGH) Re-land the #24 fix flagged HIGH by the automated security review (quota bypass / fail-open): drop the `strings.Contains(err.Error(),"ERR")` clause so MEMORY USAGE skips ONLY the goredis.Nil deleted-key race and propagates every real server error, instead of swallowing them and reporting a truncated total. Covered hermetically via redismock (no real Redis): one test asserts a server error propagates, one asserts goredis.Nil skips just that key. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(provisioner): cover pool.Discard error-return branch (#3) Add a closed-pool case to the DB-gated Discard test so the exec-error wrap (manager.go) is covered — closes the #37 patch-coverage gap. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Manas Srivastava <[email protected]> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 6286073 commit e4b33c3

9 files changed

Lines changed: 280 additions & 27 deletions

File tree

go.mod

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@ go 1.25.0
55
toolchain go1.25.10
66

77
require (
8+
github.com/go-redis/redismock/v9 v9.2.0
89
github.com/google/uuid v1.6.0
910
github.com/jackc/pgx/v5 v5.9.2
1011
github.com/minio/minio-go/v7 v7.0.90

go.sum

Lines changed: 12 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,8 @@ github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkp
2424
github.com/dustin/go-humanize v1.0.1/go.mod h1:Mu1zIs6XwVuF/gI1OepvI0qD18qycQx+mFykh5fBlto=
2525
github.com/emicklei/go-restful/v3 v3.11.0 h1:rAQeMHw1c7zTmncogyy8VvRZwtkmkZ4FxERmMY4rD+g=
2626
github.com/emicklei/go-restful/v3 v3.11.0/go.mod h1:6n3XBCmQQb25CM2LCACGz8ukIrRry+4bhvbpWn3mrbc=
27+
github.com/fsnotify/fsnotify v1.4.9 h1:hsms1Qyu0jgnwNXIxa+/V/PDsU6CfLf6CNO8H7IWoS4=
28+
github.com/fsnotify/fsnotify v1.4.9/go.mod h1:znqG4EE+3YCdAaPaxE2ZRY/06pZUdp0tY4IgpuI1SZQ=
2729
github.com/fxamacker/cbor/v2 v2.7.0 h1:iM5WgngdRBanHcxugY4JySA0nk1wZorNOpTgCMedv5E=
2830
github.com/fxamacker/cbor/v2 v2.7.0/go.mod h1:pxXPTn3joSm21Gbwsv0w9OSA2y1HFR9qXEeXQVeNoDQ=
2931
github.com/go-ini/ini v1.67.0 h1:z6ZrTEZqSWOTyH2FlglNbNgARyHG8oLW9gMELqKr06A=
@@ -39,6 +41,8 @@ github.com/go-openapi/jsonreference v0.20.4 h1:bKlDxQxQJgwpUSgOENiMPzCTBVuc7vTdX
3941
github.com/go-openapi/jsonreference v0.20.4/go.mod h1:5pZJyJP2MnYCpoeoMAql78cCHauHj0V9Lhc506VOpw4=
4042
github.com/go-openapi/swag v0.22.9 h1:XX2DssF+mQKM2DHsbgZK74y/zj4mo9I99+89xUmuZCE=
4143
github.com/go-openapi/swag v0.22.9/go.mod h1:3/OXnFfnMAwBD099SwYRk7GD3xOrr1iL7d/XNLXVVwE=
44+
github.com/go-redis/redismock/v9 v9.2.0 h1:ZrMYQeKPECZPjOj5u9eyOjg8Nnb0BS9lkVIZ6IpsKLw=
45+
github.com/go-redis/redismock/v9 v9.2.0/go.mod h1:18KHfGDK4Y6c2R0H38EUGWAdc7ZQS9gfYxc94k7rWT0=
4246
github.com/go-task/slim-sprig/v3 v3.0.0 h1:sUs3vkvUymDpBKi3qH1YSqBQk9+9D/8M2mN1vB6EwHI=
4347
github.com/go-task/slim-sprig/v3 v3.0.0/go.mod h1:W848ghGpv3Qj3dhTPRyJypKRiqCdHZiAzKg9hl15HA8=
4448
github.com/goccy/go-json v0.10.5 h1:Fq85nIqj+gXn/S5ahsiTlK3TmC85qgirsdTP/+DeaC4=
@@ -125,10 +129,14 @@ github.com/newrelic/go-agent/v3/integrations/nrgrpc v1.4.9 h1:mkoYqqEjFTNjJURsX+
125129
github.com/newrelic/go-agent/v3/integrations/nrgrpc v1.4.9/go.mod h1:KkYfN06JZLI/H6l7w2+TJ5ILKF5NCXN5iysLsKkzMiI=
126130
github.com/newrelic/go-agent/v3/integrations/nrsecurityagent v1.1.0 h1:gqkTDYUHWUyiG+u0PJQCRh98rcHLxP/w7GtIbJDVULY=
127131
github.com/newrelic/go-agent/v3/integrations/nrsecurityagent v1.1.0/go.mod h1:3wugGvRmOVYov/08y+D8tB1uYIZds5bweVdr5vo4Gbs=
132+
github.com/nxadm/tail v1.4.8 h1:nPr65rt6Y5JFSKQO7qToXr7pePgD6Gwiw05lkbyAQTE=
133+
github.com/nxadm/tail v1.4.8/go.mod h1:+ncqLTQzXmGhMZNUePPaPqPvBxHAIsmXswZKocGu+AU=
134+
github.com/onsi/ginkgo v1.16.5 h1:8xi0RTUf59SOSfEtZMvwTvXYMzG4gV23XVHOZiXNtnE=
135+
github.com/onsi/ginkgo v1.16.5/go.mod h1:+E8gABHa3K6zRBolWtd+ROzc/U5bkGt0FwiG042wbpU=
128136
github.com/onsi/ginkgo/v2 v2.19.0 h1:9Cnnf7UHo57Hy3k6/m5k3dRfGTMXGvxhHFvkDTCTpvA=
129137
github.com/onsi/ginkgo/v2 v2.19.0/go.mod h1:rlwLi9PilAFJ8jCg9UE1QP6VBpd6/xj3SRC0d6TU0To=
130-
github.com/onsi/gomega v1.19.0 h1:4ieX6qQjPP/BfC3mpsAtIGGlxTWPeA3Inl/7DtXw1tw=
131-
github.com/onsi/gomega v1.19.0/go.mod h1:LY+I3pBVzYsTBU1AnDwOSxaYi9WoWiqgwooUqq9yPro=
138+
github.com/onsi/gomega v1.25.0 h1:Vw7br2PCDYijJHSfBOWhov+8cAnUf8MfMaIOV323l6Y=
139+
github.com/onsi/gomega v1.25.0/go.mod h1:r+zV744Re+DiYCIPRlYOTxn0YkOLcAnW8k1xXdMPGhM=
132140
github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4=
133141
github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
134142
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
@@ -270,6 +278,8 @@ gopkg.in/evanphx/json-patch.v4 v4.12.0 h1:n6jtcsulIzXPJaxegRbvFNNrZDjbij7ny3gmSP
270278
gopkg.in/evanphx/json-patch.v4 v4.12.0/go.mod h1:p8EYWUEYMpynmqDbY58zCKCFZw8pRWMG4EsWvDvM72M=
271279
gopkg.in/inf.v0 v0.9.1 h1:73M5CoZyi3ZLMOyDlQh031Cx6N9NDJ2Vvfl76EDAgDc=
272280
gopkg.in/inf.v0 v0.9.1/go.mod h1:cWUDdTG/fYaXco+Dcufb5Vnc6Gp2YChqWtbxRZE0mXw=
281+
gopkg.in/tomb.v1 v1.0.0-20141024135613-dd632973f1e7 h1:uRGJdciOHaEIrze2W8Q3AKkepLTh2hOroT7a+7czfdQ=
282+
gopkg.in/tomb.v1 v1.0.0-20141024135613-dd632973f1e7/go.mod h1:dt/ZhP58zS4L8KSrWDmTeBkI65Dw0HsyUHuEVlX15mw=
273283
gopkg.in/yaml.v2 v2.2.8/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
274284
gopkg.in/yaml.v2 v2.4.0 h1:D8xgwECY7CYvx+Y2n4sBz93Jn9JRvxdiyyo8CTfuKaY=
275285
gopkg.in/yaml.v2 v2.4.0/go.mod h1:RDklbk79AGWmwhnvt/jBztapEOGDOx6ZbXqjP6csGnQ=

internal/backend/redis/coverage_test.go

Lines changed: 29 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1587,19 +1587,37 @@ func TestK8sBackend_StorageBytes_GetServiceError(t *testing.T) {
15871587
// TestLocalBackend_Provision_ACLFails_FallsBackToNamespace forces the
15881588
// ACL-fallback branch by pointing the LocalBackend at a closed port — Do(ACL
15891589
// SETUSER) errors and Provision returns the namespace-only URL.
1590-
func TestLocalBackend_Provision_ACLFails_FallsBackToNamespace(t *testing.T) {
1591-
b := &LocalBackend{
1592-
rdb: goredis.NewClient(&goredis.Options{
1593-
Addr: "127.0.0.1:1",
1594-
DialTimeout: 200 * time.Millisecond,
1595-
MaxRetries: -1,
1596-
}),
1597-
redisHost: "127.0.0.1:1",
1590+
func TestLocalBackend_Provision_ACLFails_FailsClosedUnlessOptIn(t *testing.T) {
1591+
newBackend := func() *LocalBackend {
1592+
return &LocalBackend{
1593+
rdb: goredis.NewClient(&goredis.Options{
1594+
Addr: "127.0.0.1:1",
1595+
DialTimeout: 200 * time.Millisecond,
1596+
MaxRetries: -1,
1597+
}),
1598+
redisHost: "127.0.0.1:1",
1599+
}
15981600
}
1599-
defer b.rdb.Close()
1600-
creds, err := b.Provision(context.Background(), "abc1234deadbeef", "anonymous")
1601+
1602+
// Default (no opt-in): an ACL SETUSER failure on the shared multi-tenant
1603+
// backend MUST fail closed — never hand out a credential-less shared URL
1604+
// that grants cross-tenant access (bug bash 2026-06-02 #19).
1605+
t.Setenv("REDIS_ALLOW_INSECURE_NO_ACL_FALLBACK", "")
1606+
b := newBackend()
1607+
if _, err := b.Provision(context.Background(), "abc1234deadbeef", "anonymous"); err == nil {
1608+
b.rdb.Close()
1609+
t.Fatal("Provision must fail closed when ACL SETUSER fails on the shared backend; got nil error")
1610+
}
1611+
b.rdb.Close()
1612+
1613+
// Explicit single-tenant/dev opt-in: the credential-less namespace fallback
1614+
// is permitted and still returns a redis:// URL embedding redisHost.
1615+
t.Setenv("REDIS_ALLOW_INSECURE_NO_ACL_FALLBACK", "true")
1616+
b2 := newBackend()
1617+
defer b2.rdb.Close()
1618+
creds, err := b2.Provision(context.Background(), "abc1234deadbeef", "anonymous")
16011619
if err != nil {
1602-
t.Fatalf("Provision (fallback): %v", err)
1620+
t.Fatalf("Provision (opt-in fallback): %v", err)
16031621
}
16041622
if !strings.Contains(creds.URL, "redis://") {
16051623
t.Errorf("fallback URL should still start with redis://; got %s", creds.URL)

internal/backend/redis/local.go

Lines changed: 34 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -10,8 +10,8 @@ import (
1010
"crypto/rand"
1111
"encoding/hex"
1212
"fmt"
13+
"log/slog"
1314
"os"
14-
"strings"
1515

1616
goredis "github.com/redis/go-redis/v9"
1717

@@ -149,14 +149,28 @@ func (b *LocalBackend) Provision(ctx context.Context, token, tier string) (*Cred
149149
}, nil
150150
}
151151

152-
// ACL failed (Redis < 6 or ACL disabled) — fall back to key-namespace isolation.
153-
// Return the shared Redis URL. Client must prefix all keys with {token}: to
154-
// stay in their namespace.
155-
url := fmt.Sprintf("redis://%s/0", userHost)
156-
return &Credentials{
157-
URL: url,
158-
KeyPrefix: keyPrefix,
159-
}, nil
152+
// ACL SETUSER failed. The LocalBackend serves the SHARED, multi-tenant
153+
// redis-provision pool whose default user is nopass/+@all — so the old
154+
// credential-less "redis://host/0" fallback handed this tenant UNRESTRICTED
155+
// access to every other tenant's keys (the KeyPrefix is advisory, NOT
156+
// enforced). That is a cross-tenant isolation failure, so we now fail
157+
// CLOSED on the shared backend (bug bash 2026-06-02 #19).
158+
//
159+
// The credential-less, key-namespace-only fallback is only acceptable on a
160+
// genuinely single-tenant deployment (or Redis < 6 with no ACL support in
161+
// dev). Such deployments opt in explicitly via
162+
// REDIS_ALLOW_INSECURE_NO_ACL_FALLBACK=true; prod leaves it unset and gets
163+
// a hard error instead of a silent isolation downgrade.
164+
if os.Getenv("REDIS_ALLOW_INSECURE_NO_ACL_FALLBACK") == "true" {
165+
slog.Warn("cache.provisionLocal: ACL SETUSER failed — returning INSECURE credential-less shared URL (REDIS_ALLOW_INSECURE_NO_ACL_FALLBACK=true; no enforced cross-tenant isolation)",
166+
"error", aclCmd.Err())
167+
url := fmt.Sprintf("redis://%s/0", userHost)
168+
return &Credentials{
169+
URL: url,
170+
KeyPrefix: keyPrefix,
171+
}, nil
172+
}
173+
return nil, fmt.Errorf("cache.provisionLocal: ACL SETUSER failed on shared multi-tenant Redis — refusing to return a credential-less shared URL (set REDIS_ALLOW_INSECURE_NO_ACL_FALLBACK=true only for single-tenant/dev): %w", aclCmd.Err())
160174
}
161175

162176
// publicHostPort returns the host:port to embed in user-facing Redis URLs.
@@ -212,9 +226,17 @@ func (b *LocalBackend) StorageBytes(ctx context.Context, token, providerResource
212226
// MEMORY USAGE returns bytes used by the key including metadata.
213227
mem, err := b.rdb.MemoryUsage(ctx, key).Result()
214228
if err != nil {
215-
// goredis.Nil / "ERR" => the key was deleted between SCAN and
216-
// MEMORY USAGE — a benign race, skip just that key.
217-
if err == goredis.Nil || strings.Contains(err.Error(), "ERR") {
229+
// Key deleted between SCAN and MEMORY USAGE — a benign race.
230+
// Redis returns a nil bulk reply for MEMORY USAGE on a missing
231+
// key, which go-redis maps to goredis.Nil. Skip ONLY that case
232+
// (bug bash 2026-06-02 #24 / security review HIGH): the old
233+
// `|| strings.Contains(err.Error(), "ERR")` clause added no
234+
// real race coverage and silently swallowed genuine server
235+
// errors ("ERR max number of clients reached", "ERR DENIED by
236+
// ACL"), under-counting quota and reporting a partial sum as
237+
// authoritative. Every other error propagates so quota is never
238+
// decided on partial data.
239+
if err == goredis.Nil {
218240
continue
219241
}
220242
// Any other error (conn drop, timeout, ctx-cancel) means the
Lines changed: 47 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,47 @@
1+
package redis
2+
3+
// storagebytes_error_test.go — hermetic (redismock, no real Redis) coverage
4+
// for the MEMORY USAGE error handling in LocalBackend.StorageBytes. Pins bug
5+
// bash 2026-06-02 #24 / the HIGH security finding: a genuine server error must
6+
// propagate (quota is never decided on a truncated total), while the benign
7+
// deleted-key race (goredis.Nil) is skipped.
8+
9+
import (
10+
"context"
11+
"errors"
12+
"testing"
13+
14+
"github.com/go-redis/redismock/v9"
15+
)
16+
17+
func TestLocalBackend_StorageBytes_MemoryUsageServerError_Propagates(t *testing.T) {
18+
rdb, mock := redismock.NewClientMock()
19+
b := &LocalBackend{rdb: rdb, redisHost: "x"}
20+
const token = "tok24err"
21+
key := token + ":k1"
22+
23+
mock.ExpectScan(0, token+":*", 100).SetVal([]string{key}, 0)
24+
mock.ExpectMemoryUsage(key).SetErr(errors.New("ERR max number of clients reached"))
25+
26+
if _, err := b.StorageBytes(context.Background(), token, ""); err == nil {
27+
t.Fatal("StorageBytes must PROPAGATE a server error from MEMORY USAGE (not swallow it and report a truncated total)")
28+
}
29+
}
30+
31+
func TestLocalBackend_StorageBytes_MemoryUsageNil_SkipsKey(t *testing.T) {
32+
rdb, mock := redismock.NewClientMock()
33+
b := &LocalBackend{rdb: rdb, redisHost: "x"}
34+
const token = "tok24nil"
35+
key := token + ":k1"
36+
37+
mock.ExpectScan(0, token+":*", 100).SetVal([]string{key}, 0)
38+
mock.ExpectMemoryUsage(key).RedisNil()
39+
40+
got, err := b.StorageBytes(context.Background(), token, "")
41+
if err != nil {
42+
t.Fatalf("goredis.Nil (deleted-key race) must be skipped, not propagated; got err: %v", err)
43+
}
44+
if got != 0 {
45+
t.Errorf("got %d; want 0 (the only key was skipped as a deleted-key race)", got)
46+
}
47+
}

internal/pool/manager.go

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -204,6 +204,30 @@ func (m *Manager) Claim(ctx context.Context, resourceType string) (*Item, error)
204204
return &item, nil
205205
}
206206

207+
// Discard marks a previously-Claimed item as 'failed' so it is never handed
208+
// out again, and triggers a refill to replace it. Callers invoke this when
209+
// they cannot safely use a claimed item (e.g. the connection-limit regrade
210+
// failed, or the item carries no pool token) and fall back to live
211+
// provisioning. Without it the row stays 'assigned' forever with no owning
212+
// resource row — a slow leak of pre-provisioned backing infra that no sweeper
213+
// can distinguish from a legitimately in-use item (bug bash 2026-06-02 #3).
214+
func (m *Manager) Discard(ctx context.Context, item *Item) error {
215+
if item == nil {
216+
return nil
217+
}
218+
if _, err := m.db.Exec(ctx, `
219+
UPDATE pool_items
220+
SET status = 'failed', assigned_at = now()
221+
WHERE id = $1
222+
`, item.ID); err != nil {
223+
return fmt.Errorf("pool.Discard item %s: %w", item.ID, err)
224+
}
225+
slog.Warn("pool.Discard: claimed item marked failed (unusable, falling back to live)",
226+
"pool_id", item.ID, "resource_type", item.ResourceType)
227+
m.triggerRefill(item.ResourceType)
228+
return nil
229+
}
230+
207231
// Stats returns the count of ready items per resource type.
208232
func (m *Manager) Stats(ctx context.Context) (map[string]int, error) {
209233
rows, err := m.db.Query(ctx, `

internal/pool/manager_db_test.go

Lines changed: 41 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -269,3 +269,44 @@ func TestStartShutdown_Lifecycle(t *testing.T) {
269269
t.Fatal("Shutdown did not return within 5s")
270270
}
271271
}
272+
273+
// TestManager_Discard_MarksFailed covers bug bash #3: Discard flips a claimed
274+
// (assigned) item to 'failed' so it is never handed out again and a sweeper
275+
// can reclaim it. DB-gated (skips without TEST_PROVISIONER_DATABASE_URL).
276+
func TestManager_Discard_MarksFailed(t *testing.T) {
277+
m, pool, _ := newDBManager(t, Config{})
278+
ctx := context.Background()
279+
280+
var id string
281+
if err := pool.QueryRow(ctx, `
282+
INSERT INTO pool_items (resource_type, connection_url, status)
283+
VALUES ('postgres', 'enc-url', 'assigned')
284+
RETURNING id
285+
`).Scan(&id); err != nil {
286+
t.Fatalf("seed assigned pool_item: %v", err)
287+
}
288+
289+
if err := m.Discard(ctx, &Item{ID: id, ResourceType: "postgres"}); err != nil {
290+
t.Fatalf("Discard: %v", err)
291+
}
292+
293+
var status string
294+
if err := pool.QueryRow(ctx, `SELECT status FROM pool_items WHERE id = $1`, id).Scan(&status); err != nil {
295+
t.Fatalf("select status: %v", err)
296+
}
297+
if status != "failed" {
298+
t.Errorf("status = %q; want 'failed' after Discard", status)
299+
}
300+
301+
// A nil item is a no-op (defensive guard).
302+
if err := m.Discard(ctx, nil); err != nil {
303+
t.Errorf("Discard(nil) should be a no-op, got %v", err)
304+
}
305+
306+
// Error path: a closed pool makes the UPDATE fail → Discard wraps and
307+
// returns the error (covers the error-return branch).
308+
pool.Close()
309+
if err := m.Discard(ctx, &Item{ID: id, ResourceType: "postgres"}); err == nil {
310+
t.Error("Discard on a closed pool must return the wrapped exec error")
311+
}
312+
}

internal/server/server.go

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -61,6 +61,10 @@ func teamIDFromContext(ctx context.Context) string {
6161
// (*pool.Manager).Claim so the concrete type continues to satisfy it.
6262
type PoolClaimer interface {
6363
Claim(ctx context.Context, resourceType string) (*pool.Item, error)
64+
// Discard releases a claimed item the caller cannot safely use, marking it
65+
// so it is never handed out again (and refilling). Without it a claimed-
66+
// but-unusable item leaks as a stuck 'assigned' row (bug bash #3).
67+
Discard(ctx context.Context, item *pool.Item) error
6468
}
6569

6670
// Server implements ProvisionerServiceServer.
@@ -415,6 +419,12 @@ func (s *Server) provisionPostgres(ctx context.Context, req *provisionerv1.Provi
415419
// which applies the cap at CREATE USER time.
416420
slog.Warn("server.provisionPostgres: pool item connection-limit regrade failed (falling back to live)",
417421
"pool_id", item.ID, "tier", req.Tier, "conn_limit", connLimit, "error", rerr)
422+
// Release the consumed item so it isn't leaked as a stuck
423+
// 'assigned' row with no owning resource (bug bash #3).
424+
if derr := s.pool.Discard(ctx, item); derr != nil {
425+
slog.Warn("server.provisionPostgres: failed to discard unusable pool item (manual reclaim needed)",
426+
"pool_id", item.ID, "error", derr)
427+
}
418428
} else {
419429
slog.Info("server.provisionPostgres: pool item connection limit applied",
420430
"pool_id", item.ID, "tier", req.Tier,
@@ -438,6 +448,11 @@ func (s *Server) provisionPostgres(ctx context.Context, req *provisionerv1.Provi
438448
// provision rather than hand out an uncapped role.
439449
slog.Warn("server.provisionPostgres: pool item missing PoolToken — cannot apply connection limit, falling back to live",
440450
"pool_id", item.ID, "username", item.Username)
451+
// Release the consumed item so it isn't leaked (bug bash #3).
452+
if derr := s.pool.Discard(ctx, item); derr != nil {
453+
slog.Warn("server.provisionPostgres: failed to discard unusable pool item (manual reclaim needed)",
454+
"pool_id", item.ID, "error", derr)
455+
}
441456
}
442457
}
443458
}

0 commit comments

Comments
 (0)