Commit d2e512b

Author: claude
apiserver: backup upload hardening — size pre-flight, key entropy, rate limiting
Three defects from the backup upload half of the first-customer audit (part 2). Complements the earlier webhook-race fix (20e4243) to round out the 'first customer contact' path.

1. B1: pre-flight size check via Content-Length. The handler was reading the body through a 100MB-cap tee reader and only rejecting oversize uploads AFTER writing to the blob store, then deleting: 100MB of wasted disk I/O every time a buggy client retries an oversize upload. Now the ContentLength header is checked against maxBackupSize first, and oversize requests are rejected with 413 before touching disk. The post-write check stays as a backstop for chunked / Content-Length-less clients, but the common case is cheap now.

2. B3: blob key collision prevention. The key format was acct-{N}-site-{N}-{UnixNano}.blob. Two uploads for the same (account, site) in the same nanosecond produce identical keys; LocalBlobStore.Put uses os.Rename, which silently overwrites, so metadata row #1 ends up pointing at row #2's bytes, and restore-by-id would return the wrong data. Extremely rare in practice but catastrophic when it hits. Fix: append 8 bytes of crypto/rand to the key as hex. Collisions are now effectively impossible (2^64 entropy on top of the nanoseconds).

3. B6: per-account rate limiting. The backup upload endpoint had no limits, so a runaway client (loop bug, stolen license, malicious script) could hammer it and fill the Railway volume. Limits: 10 uploads per hour, 100 per day, per cloud account. Moderate by design: a small business doing a few scheduled plus a few manual backups per day won't notice, while a runaway client exceeds them quickly. Implementation: a sliding window via SELECT COUNT(*) on cloud_backup_blobs.uploaded_at (already indexed), matching the datetime('now', '-N seconds') pattern already in cloud_retention.go. No new table, no separate state to sync. Returns 429 with a Retry-After header on breach. Fails OPEN on DB error: a spurious 429 for a legitimate customer is worse than a rare skipped check.
New test file internal/apiserver/cloud_backup_hardening_test.go with 6 tests:

- TestBackupUpload_PreflightSizeCheck: 413 before any blob or DB write when Content-Length > max
- TestBackupUpload_KeysUnique: 5 rapid uploads produce 5 distinct keys, each with the expected 5-dash shape
- TestBackupUpload_RateLimitHourly: exactly N succeed; the (N+1)th gets 429 with Retry-After and a 'per hour' message
- TestBackupUpload_RateLimitIsolatedByAccount: account A maxing out does not affect account B
- TestCountBackupsInWindow: unit test on the DB helper across 5 window sizes (1m, 1h, 3h, 3d, 0)
- TestRateLimitConstants: sanity-check the limit values

New DB method: SqliteDB.CountBackupsInWindow(accountID, windowSeconds). New constants: backupsPerHourLimit = 10, backupsPerDayLimit = 100. New imports in cloud_handlers.go: crypto/rand, encoding/binary.

Remaining first-customer audit items not addressed:

- B9 (client memory bomb during encrypt, 2x allocation): desktop-side, low priority for initial customers given the 100MB upload cap
- B12 (no upload progress UI): desktop-side UX polish
- B13 (no retry on transient network failure): desktop-side resilience; would add retry with exponential backoff on 5xx/network errors
- B16 (zip-bomb defense on restore): requires a MITM plus stolen-key combo to exploit, low likelihood
- Pre-launch: bearer-token hashing, per-IP rate limiting, License.Validate fail-closed on unknown tiers

Verification: go vet clean, gofmt clean, full apiserver test suite passes (15s).
1 parent 20e4243 commit d2e512b

3 files changed

Lines changed: 461 additions & 16 deletions

internal/apiserver/cloud_backup_hardening_test.go (new file)

Lines changed: 339 additions & 0 deletions
@@ -0,0 +1,339 @@
package apiserver

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http/httptest"
	"strings"
	"sync"
	"testing"
	"time"
)

// inMemBlobs is a BlobStore stub that actually accepts writes, for
// tests that need uploads to succeed (rate limiting, key generation).
// It reads and discards the body (counting bytes) and stores nothing;
// we don't need to inspect blob contents for these tests.
type inMemBlobs struct {
	mu   sync.Mutex
	keys map[string]int64
}

func newInMemBlobs() *inMemBlobs {
	return &inMemBlobs{keys: map[string]int64{}}
}

func (b *inMemBlobs) Put(key string, r io.Reader) (int64, error) {
	n, err := io.Copy(io.Discard, r)
	if err != nil {
		return n, err
	}
	b.mu.Lock()
	defer b.mu.Unlock()
	// Match LocalBlobStore's overwrite semantics: a reused key silently
	// replaces the old entry, so len(b.keys) stays flat and tests can
	// detect the collision by counting keys.
	b.keys[key] = n
	return n, nil
}

func (b *inMemBlobs) Get(key string) (io.ReadCloser, error) {
	return nil, ErrBlobNotFound
}

func (b *inMemBlobs) Delete(key string) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	delete(b.keys, key)
	return nil
}

func (b *inMemBlobs) KeyCount() int {
	b.mu.Lock()
	defer b.mu.Unlock()
	return len(b.keys)
}

func (b *inMemBlobs) Keys() []string {
	b.mu.Lock()
	defer b.mu.Unlock()
	out := make([]string, 0, len(b.keys))
	for k := range b.keys {
		out = append(out, k)
	}
	return out
}

// --- B1: pre-flight size check ---

// TestBackupUpload_PreflightSizeCheck: a client that sets
// Content-Length larger than maxBackupSize should get 413 WITHOUT
// any bytes being written to the blob store.
func TestBackupUpload_PreflightSizeCheck(t *testing.T) {
	db, cleanup := newTestDB(t)
	defer cleanup()

	email := "preflight@example.com"
	acct, err := db.CreateCloudAccount(email, "cloud-single", "cus_P", "sub_P")
	if err != nil {
		t.Fatalf("create account: %v", err)
	}
	licenseKey := seedActiveCloudLicense(t, db, email, "cloud-single")
	blobs := newInMemBlobs()
	svc := NewCloudService(db, &LogMailer{}, blobs, "http://localhost", false)

	// Build a request with Content-Length larger than max.
	body := strings.NewReader("fake body doesn't matter")
	r := httptest.NewRequest("POST", "/api/cloud/desktop/backup", body)
	r.Header.Set("Authorization", "Bearer "+licenseKey)
	r.ContentLength = maxBackupSize + 1
	r.Header.Set("X-Site-Slug", "default")

	w := httptest.NewRecorder()
	svc.HandleBackupUpload(w, r)

	if w.Code != 413 {
		t.Fatalf("expected 413 (payload too large), got %d: %s", w.Code, w.Body.String())
	}
	if blobs.KeyCount() != 0 {
		t.Fatalf("expected no blobs written on preflight reject, got %d", blobs.KeyCount())
	}
	// Verify nothing made it into the DB either.
	var count int
	if err := db.conn.QueryRow("SELECT COUNT(*) FROM cloud_backup_blobs WHERE account_id = ?", acct.ID).Scan(&count); err != nil {
		t.Fatalf("count rows: %v", err)
	}
	if count != 0 {
		t.Fatalf("expected no DB rows on preflight reject, got %d", count)
	}
}
// --- B3: blob key collision prevention ---

// TestBackupUpload_KeysUnique: 5 uploads in a tight loop (same account
// + site) should produce 5 distinct blob keys. Before the fix the key
// was acct-{N}-site-{N}-{nanos}.blob and same-nanosecond uploads
// produced identical keys; now there's an 8-byte random suffix.
func TestBackupUpload_KeysUnique(t *testing.T) {
	db, cleanup := newTestDB(t)
	defer cleanup()

	email := "keys@example.com"
	if _, err := db.CreateCloudAccount(email, "cloud-multi", "cus_K", "sub_K"); err != nil {
		t.Fatalf("create account: %v", err)
	}
	licenseKey := seedActiveCloudLicense(t, db, email, "cloud-multi")
	blobs := newInMemBlobs()
	svc := NewCloudService(db, &LogMailer{}, blobs, "http://localhost", false)

	for i := 0; i < 5; i++ {
		body := bytes.NewReader([]byte("payload"))
		r := httptest.NewRequest("POST", "/api/cloud/desktop/backup", body)
		r.Header.Set("Authorization", "Bearer "+licenseKey)
		r.Header.Set("X-Site-Slug", "default")
		r.ContentLength = 7

		w := httptest.NewRecorder()
		svc.HandleBackupUpload(w, r)

		if w.Code != 200 {
			t.Fatalf("upload %d: expected 200, got %d: %s", i, w.Code, w.Body.String())
		}
	}

	keys := blobs.Keys()
	if len(keys) != 5 {
		t.Fatalf("expected 5 unique keys, got %d: %v", len(keys), keys)
	}
	// Spot-check: all keys start with the expected prefix and contain
	// BOTH a nanosecond chunk and a random hex chunk (two sections
	// separated by '-' after the site-id).
	for _, k := range keys {
		if !strings.HasPrefix(k, "acct-") || !strings.HasSuffix(k, ".blob") {
			t.Errorf("unexpected key shape: %s", k)
		}
		// acct-N-site-N-{nanos}-{hex}.blob has 5 dashes
		if n := strings.Count(k, "-"); n != 5 {
			t.Errorf("key %s has %d dashes, expected 5 (acct-N-site-N-nanos-rand)", k, n)
		}
	}
}
// --- B6: rate limiting ---

// TestBackupUpload_RateLimitHourly: exactly backupsPerHourLimit
// uploads succeed, the next one returns 429 with Retry-After set.
func TestBackupUpload_RateLimitHourly(t *testing.T) {
	db, cleanup := newTestDB(t)
	defer cleanup()

	email := "rlh@example.com"
	if _, err := db.CreateCloudAccount(email, "cloud-multi", "cus_RH", "sub_RH"); err != nil {
		t.Fatalf("create account: %v", err)
	}
	licenseKey := seedActiveCloudLicense(t, db, email, "cloud-multi")
	blobs := newInMemBlobs()
	svc := NewCloudService(db, &LogMailer{}, blobs, "http://localhost", false)

	doUpload := func() *httptest.ResponseRecorder {
		body := bytes.NewReader([]byte("x"))
		r := httptest.NewRequest("POST", "/api/cloud/desktop/backup", body)
		r.Header.Set("Authorization", "Bearer "+licenseKey)
		r.Header.Set("X-Site-Slug", "default")
		r.ContentLength = 1
		w := httptest.NewRecorder()
		svc.HandleBackupUpload(w, r)
		return w
	}

	// First N should succeed.
	for i := 0; i < backupsPerHourLimit; i++ {
		w := doUpload()
		if w.Code != 200 {
			t.Fatalf("upload %d/%d: expected 200, got %d: %s",
				i+1, backupsPerHourLimit, w.Code, w.Body.String())
		}
	}

	// The (N+1)th should 429.
	w := doUpload()
	if w.Code != 429 {
		t.Fatalf("upload %d: expected 429, got %d: %s",
			backupsPerHourLimit+1, w.Code, w.Body.String())
	}
	if w.Header().Get("Retry-After") == "" {
		t.Errorf("expected Retry-After header on 429, got empty")
	}

	// Verify the 429 response body mentions the limit.
	var rlResp map[string]string
	if err := json.Unmarshal(w.Body.Bytes(), &rlResp); err != nil {
		t.Fatalf("decode 429 body: %v", err)
	}
	if !strings.Contains(rlResp["error"], "rate limit") {
		t.Errorf("expected error to mention rate limit, got: %s", rlResp["error"])
	}
	if !strings.Contains(rlResp["limit"], "per hour") {
		t.Errorf("expected limit to mention 'per hour', got: %s", rlResp["limit"])
	}
}
// TestBackupUpload_RateLimitIsolatedByAccount: account A hitting its
// limit does NOT affect account B.
func TestBackupUpload_RateLimitIsolatedByAccount(t *testing.T) {
	db, cleanup := newTestDB(t)
	defer cleanup()

	// Account A maxes out the hourly limit.
	emailA := "a@example.com"
	if _, err := db.CreateCloudAccount(emailA, "cloud-multi", "cus_A", "sub_A"); err != nil {
		t.Fatalf("create A: %v", err)
	}
	keyA := seedActiveCloudLicense(t, db, emailA, "cloud-multi")

	// Account B with 0 uploads.
	emailB := "b@example.com"
	if _, err := db.CreateCloudAccount(emailB, "cloud-multi", "cus_B", "sub_B"); err != nil {
		t.Fatalf("create B: %v", err)
	}
	keyB := seedActiveCloudLicense(t, db, emailB, "cloud-multi")

	blobs := newInMemBlobs()
	svc := NewCloudService(db, &LogMailer{}, blobs, "http://localhost", false)

	upload := func(bearer string) int {
		body := bytes.NewReader([]byte("x"))
		r := httptest.NewRequest("POST", "/api/cloud/desktop/backup", body)
		r.Header.Set("Authorization", "Bearer "+bearer)
		r.Header.Set("X-Site-Slug", "default")
		r.ContentLength = 1
		w := httptest.NewRecorder()
		svc.HandleBackupUpload(w, r)
		return w.Code
	}

	for i := 0; i < backupsPerHourLimit; i++ {
		if code := upload(keyA); code != 200 {
			t.Fatalf("A upload %d: got %d", i+1, code)
		}
	}
	// A should be rate-limited now.
	if code := upload(keyA); code != 429 {
		t.Fatalf("A overflow: expected 429, got %d", code)
	}
	// But B should still be fine.
	if code := upload(keyB); code != 200 {
		t.Fatalf("B fresh upload: expected 200, got %d", code)
	}
}
// --- CountBackupsInWindow unit test ---

// TestCountBackupsInWindow: insert rows with controlled timestamps,
// verify the count matches for different window sizes.
func TestCountBackupsInWindow(t *testing.T) {
	db, cleanup := newTestDB(t)
	defer cleanup()

	email := "cnt@example.com"
	acct, err := db.CreateCloudAccount(email, "cloud-multi", "cus_CNT", "sub_CNT")
	if err != nil {
		t.Fatalf("create account: %v", err)
	}
	svc := NewCloudService(db, &LogMailer{}, newInMemBlobs(), "http://localhost", false)
	siteID, err := svc.resolveOrCreateSite(&CloudAccount{ID: acct.ID, Tier: "cloud-multi"}, "default")
	if err != nil {
		t.Fatalf("resolve site: %v", err)
	}

	// Insert: one 30s old, one 2h old, one 2d old.
	cases := []string{"-30 seconds", "-2 hours", "-2 days"}
	for i, offset := range cases {
		if _, err := db.conn.Exec(`
			INSERT INTO cloud_backup_blobs
				(account_id, site_id, blob_key, size_bytes, sha256_hex, client_version, uploaded_at)
			VALUES (?, ?, ?, ?, ?, ?, datetime('now', ?))
		`, acct.ID, siteID, fmt.Sprintf("tkey-%d", i), 100, "sha", "v0", offset); err != nil {
			t.Fatalf("insert %s: %v", offset, err)
		}
	}

	// 1-minute window: only the 30s-old row.
	if n, err := db.CountBackupsInWindow(acct.ID, 60); err != nil || n != 1 {
		t.Errorf("1-minute window: n=%d err=%v, want n=1", n, err)
	}
	// 1-hour window: same (the 2h row is outside).
	if n, err := db.CountBackupsInWindow(acct.ID, 3600); err != nil || n != 1 {
		t.Errorf("1-hour window: n=%d err=%v, want n=1", n, err)
	}
	// 3-hour window: 30s + 2h = 2.
	if n, err := db.CountBackupsInWindow(acct.ID, 10800); err != nil || n != 2 {
		t.Errorf("3-hour window: n=%d err=%v, want n=2", n, err)
	}
	// 3-day window: all 3.
	if n, err := db.CountBackupsInWindow(acct.ID, 3*86400); err != nil || n != 3 {
		t.Errorf("3-day window: n=%d err=%v, want n=3", n, err)
	}
	// Zero-length window: 0.
	if n, err := db.CountBackupsInWindow(acct.ID, 0); err != nil || n != 0 {
		t.Errorf("0-window: n=%d err=%v, want n=0", n, err)
	}
}
// Sanity: limit constants are sensible values.
func TestRateLimitConstants(t *testing.T) {
	if backupsPerHourLimit <= 0 {
		t.Error("backupsPerHourLimit must be positive")
	}
	if backupsPerDayLimit <= backupsPerHourLimit {
		t.Error("daily limit should be higher than hourly limit")
	}
	// 10/hour over 24 hours would allow 240; a daily limit of 100 is
	// tighter, so the daily cap is the binding one. Exercising this
	// math here documents the intent.
	if backupsPerDayLimit >= backupsPerHourLimit*24 {
		t.Errorf("daily limit (%d) >= hourly x 24 (%d): the daily cap is not binding",
			backupsPerDayLimit, backupsPerHourLimit*24)
	}
}

// Keep the otherwise-unused "time" import referenced so the file compiles.
var _ = time.Now
