Skip to content

Commit 7b3bae5

Browse files
authored
Merge pull request #62 from weroperking/codex/create-branch-for-back-end-code-review
Add v3 backend/core hardening review document
2 parents 6051622 + f6decd8 commit 7b3bae5

1 file changed

Lines changed: 339 additions & 0 deletions

File tree

docs/core/hardening-review-v3.md

Lines changed: 339 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,339 @@
1+
# Betterbase Backend/Core Hardening Review (v3)
2+
3+
## 1) Objective (final outcome)
4+
5+
Deliver a **backend-first security and reliability hardening baseline** for Betterbase, focused on:
6+
7+
- `packages/server` runtime and auth boundaries,
8+
- database/migration safety,
9+
- Docker/container defaults and edge proxying,
10+
- operational protections for production readiness.
11+
12+
> Out of scope by request: dashboard/frontend quality review.
13+
14+
---
15+
16+
## 2) Scope map (what is reviewed)
17+
18+
### Included
19+
- API/auth/device/admin flows (`packages/server/src/**`)
20+
- Migration and schema lifecycle (`packages/server/src/lib/migrate.ts`, `packages/server/migrations/**`)
21+
- Container/runtime and local infra defaults (`Dockerfile`, `docker-compose.yml`, `docker/nginx/nginx.conf`)
22+
23+
### Excluded
24+
- Dashboard UX, styling, and frontend correctness (`apps/dashboard/**`)
25+
26+
---
27+
28+
## 3) Plan (how this review is executed)
29+
30+
1. **Synchronize branch state** before any edits (fetch/pull when tracking exists).
31+
2. **Static backend attack-surface review**:
32+
- auth/session/device/token issuance,
33+
- admin routing and privilege checks,
34+
- secret handling and env defaults,
35+
- SQL use patterns and dynamic identifier safety.
36+
3. **Docker and reverse-proxy hardening pass**:
37+
- insecure defaults,
38+
- exposed ports/services,
39+
- transport/auth boundaries.
40+
4. **Reliability pass**:
41+
- migration atomicity,
42+
- logging correctness,
43+
- operational behavior under failure.
44+
5. **Produce v3 documentation artifact** with severity, impact, evidence, and fixes.
45+
46+
---
47+
48+
## 4) Acceptance criteria vs non-acceptance criteria
49+
50+
### Accepted criteria
51+
A review is accepted only if it:
52+
53+
- Prioritizes **critical/high backend risks first**,
54+
- Lists concrete findings with **file-level evidence**,
55+
- Proposes remediation that is actionable and ordered,
56+
- Separates scope (core/backend) from excluded dashboard concerns,
57+
- Defines objective pass/fail criteria for closure.
58+
59+
### Not accepted
60+
A review is not accepted if it:
61+
62+
- Focuses mostly on frontend/dashboard problems,
63+
- Gives generic best-practice lists without code evidence,
64+
- Omits severity and business impact,
65+
- Doesn’t define what “done” looks like.
66+
67+
---
68+
69+
## 5) Problem classes targeted in this review
70+
71+
- **Authentication hardening** (brute-force, token abuse, device flow safety)
72+
- **Authorization hardening** (scope enforcement, privilege boundaries)
73+
- **Input/output security** (HTML injection, unsafe interpolation)
74+
- **Secrets/config hardening** (weak defaults, exposure risks)
75+
- **Operational resilience** (migration safety, observability correctness)
76+
- **Container/proxy hardening** (network exposure, transport/auth controls)
77+
78+
---
79+
80+
## 6) Findings (backend/core only)
81+
82+
## F-01 (HIGH): No brute-force/rate limiting on admin login and device verification
83+
84+
**Evidence**
85+
- `/admin/auth/login` verifies credentials directly with no throttling/lockout/captcha/backoff. (`packages/server/src/routes/admin/auth.ts`)
86+
- `/device/verify` form POST verifies admin credentials with no anti-automation controls. (`packages/server/src/routes/device/index.ts`)
87+
88+
**Impact**
89+
- Enables online password guessing and credential stuffing against administrative entry points.
90+
91+
**Recommended fix**
92+
1. Add IP + account-based rate limits (e.g., sliding window).
93+
2. Add progressive delays/temporary lockouts after repeated failures.
94+
3. Emit structured security events for failed auth attempts.
95+
96+
---
97+
98+
## F-02 (HIGH): API key scopes are stored but not enforced
99+
100+
**Evidence**
101+
- API keys include `scopes` at creation and persistence. (`packages/server/src/routes/admin/api-keys.ts`, `packages/server/migrations/008_api_keys.sql`)
102+
- Auth middleware validates key hash and expiration only; it does not evaluate `scopes` for route/action authorization. (`packages/server/src/lib/admin-middleware.ts`)
103+
104+
**Impact**
105+
- A supposedly limited key can act as full-admin bearer credentials, breaking least-privilege expectations.
106+
107+
**Recommended fix**
108+
1. Introduce route-level scope requirements (e.g., `requireScopes(["projects:read"])`).
109+
2. Validate scopes in middleware before allowing handler execution.
110+
3. Add tests proving denied access when scope is insufficient.
111+
112+
---
113+
114+
## F-03 (MEDIUM-HIGH): Reflected HTML injection risk in device verification page
115+
116+
**Evidence**
117+
- `userCode` query param is interpolated directly into HTML input value without escaping. (`packages/server/src/routes/device/index.ts`)
118+
119+
**Impact**
120+
- Crafted links can inject markup/script contexts depending on browser parsing behavior, risking phishing/session abuse in admin verification flow.
121+
122+
**Recommended fix**
123+
1. Escape HTML entities for all interpolated values.
124+
2. Prefer server-side templates with automatic escaping.
125+
3. Add CSP headers and disable inline script where possible.
126+
127+
---
128+
129+
## F-04 (MEDIUM): Migration execution is not wrapped in per-file transactions
130+
131+
**Evidence**
132+
- Migration runner executes raw SQL then records applied filename, without explicit transaction guards around each migration file. (`packages/server/src/lib/migrate.ts`)
133+
134+
**Impact**
135+
- Partial migration failures can leave inconsistent schema states that are hard to recover in production rollouts.
136+
137+
**Recommended fix**
138+
1. Wrap each migration in `BEGIN/COMMIT` with rollback on failure.
139+
2. Abort startup if any migration fails.
140+
3. Add idempotency checks for unsafe DDL operations.
141+
142+
---
143+
144+
## F-05 (MEDIUM): Insecure local Docker defaults can leak into non-local usage
145+
146+
**Evidence**
147+
- Static/dev credentials and wide host port exposure in compose defaults (`5432`, `9000`, `9001`, `8288`) plus default passwords. (`docker-compose.yml`)
148+
149+
**Impact**
150+
- If reused outside isolated local environments, these defaults materially increase compromise likelihood.
151+
152+
**Recommended fix**
153+
1. Gate local-only config via explicit profile names and warnings.
154+
2. Require strong secrets through `.env` for non-dev profiles.
155+
3. Limit host port publishing to required services only.
156+
157+
---
158+
159+
## 7) Closure criteria for this hardening cycle
160+
161+
To mark this cycle complete, all must be true:
162+
163+
1. Login/device endpoints enforce rate limits and lockout policy.
164+
2. API key scopes are enforced in middleware/route guards.
165+
3. Device verification HTML interpolation is safely escaped.
166+
4. Migration runner supports transactional per-file execution.
167+
5. Container docs/config clearly separate local unsafe defaults from production-secure defaults.
168+
169+
---
170+
171+
## 8) Implementation priority order
172+
173+
1. **P0:** F-01, F-02
174+
2. **P1:** F-03
175+
3. **P2:** F-04, F-05
176+
177+
---
178+
179+
## 9) Summary
180+
181+
This v3 review identifies backend/core hardening gaps primarily in auth abuse resistance, authorization scope enforcement, HTML output safety, migration resilience, and Docker default safety posture. Addressing P0/P1 items first materially reduces compromise risk while improving operational confidence.
182+
183+
---
184+
185+
## 10) AI reviewer operating manual (for the next reviewing agent)
186+
187+
This section is intentionally prescriptive so another AI/code reviewer can continue this work with high consistency and low ambiguity.
188+
189+
### 10.1 Review intent
190+
191+
You are not only finding bugs. You are validating whether Betterbase can safely operate as:
192+
- an internet-facing backend API,
193+
- a multi-project control plane,
194+
- a migration-driven data platform,
195+
- and a containerized service.
196+
197+
### 10.2 Required review mindset
198+
199+
For every endpoint or subsystem, ask:
200+
1. **Can an attacker reach this?**
201+
2. **What identity context is assumed?**
202+
3. **What authorization boundary is enforced?**
203+
4. **What happens on repeated abuse/failure?**
204+
5. **How does this fail in production under partial outage?**
205+
206+
If any answer is unclear, treat it as a review finding candidate.
207+
208+
### 10.3 Evidence standard (non-optional)
209+
210+
Every finding should include all of:
211+
- exact file path(s),
212+
- function/route names,
213+
- exploit or failure path in 2–5 steps,
214+
- business impact,
215+
- concrete mitigation order (quick patch + durable fix).
216+
217+
Avoid generic “best-practice only” findings with no file-level proof.
218+
219+
---
220+
221+
## 11) Backend review blueprint by domain
222+
223+
Use this sequence to avoid missing critical areas.
224+
225+
### 11.1 Identity and authentication
226+
- Admin login (`/admin/auth/login`)
227+
- Setup flow (`/admin/auth/setup`)
228+
- Device flow (`/device/code`, `/device/verify`, `/device/token`)
229+
- Token issuance/validation (`signAdminToken`, `verifyAdminToken`)
230+
- API key lifecycle and bearer handling
231+
232+
**What to catch**
233+
- brute force/stuffing paths,
234+
- token replay windows,
235+
- missing audience/issuer constraints,
236+
- no rotation/revocation strategy,
237+
- secret leakage in logs/errors.
238+
239+
### 11.2 Authorization and tenancy boundaries
240+
- `requireAdmin` middleware behavior for JWT vs API keys
241+
- project-scoped routes and project attachment middleware
242+
- actions that should require scoped permissions but do not
243+
244+
**What to catch**
245+
- privilege escalation across routes,
246+
- “scope exists in DB but not enforced” patterns,
247+
- project hopping / IDOR possibilities,
248+
- routes inheriting global admin access unintentionally.
249+
250+
### 11.3 Data and SQL safety
251+
- dynamic schema/table identifier usage
252+
- raw SQL interpolation (especially schema-qualified queries)
253+
- migration sequencing and rollback resilience
254+
255+
**What to catch**
256+
- unsafe identifier injection opportunities,
257+
- inconsistent transaction handling,
258+
- ordering collisions in migration naming,
259+
- partial-apply states causing startup drift.
260+
261+
### 11.4 Output and interface safety
262+
- HTML-rendering endpoints (device verification page)
263+
- plain-text/HTML response interpolation
264+
- error message leakage
265+
266+
**What to catch**
267+
- reflected/stored injection risks,
268+
- security header gaps (CSP/XFO/etc.),
269+
- differential error behavior that aids attackers.
270+
271+
### 11.5 Container, proxy, and runtime posture
272+
- `docker-compose.yml` defaults
273+
- nginx routing and trust headers
274+
- image pinning and exposed ports
275+
276+
**What to catch**
277+
- weak default credentials reused outside dev,
278+
- unintended public service exposure,
279+
- over-trusting forwarded headers,
280+
- missing TLS/security hardening assumptions in docs.
281+
282+
---
283+
284+
## 12) Severity rubric for this project
285+
286+
Use this rubric consistently for triage:
287+
288+
- **Critical**: direct compromise of admin control plane, tenant data exposure, or remote code/data integrity impact with low attacker cost.
289+
- **High**: realistic unauthorized access path, major authz bypass, or high-probability account compromise vector.
290+
- **Medium**: meaningful security/reliability weakness requiring additional conditions.
291+
- **Low**: hygiene/documentation/observability issues with limited direct exploitability.
292+
293+
When uncertain between two severities, choose the higher one and justify assumptions.
294+
295+
---
296+
297+
## 13) Acceptance gates for a “secure-enough” next milestone
298+
299+
The next implementation PR(s) should not be accepted unless all gates pass:
300+
301+
1. **Auth abuse gate**
302+
- login + device verification endpoints have measurable rate limiting and lockout/backoff.
303+
2. **Authorization gate**
304+
- API key scopes are enforceable and tested on representative privileged routes.
305+
3. **Injection gate**
306+
- user-controlled values rendered in HTML are escaped or template-engine protected.
307+
4. **Migration safety gate**
308+
- each migration file executes atomically with rollback on failure.
309+
5. **Runtime hygiene gate**
310+
- local-only insecure defaults are explicitly isolated from production paths.
311+
312+
---
313+
314+
## 14) Reviewer handoff format (for future AI agents)
315+
316+
When handing off to another reviewer/engineer, provide:
317+
318+
1. **Top-5 unresolved risks** (ordered by exploitability × impact),
319+
2. **Proof snippets** (route/file/line references),
320+
3. **Proposed patch plan** split by:
321+
- immediate hotfixes (hours),
322+
- short-term hardening (days),
323+
- structural improvements (weeks),
324+
4. **Regression checks** that must run after each fix,
325+
5. **Known blind spots** (areas not fully reviewed).
326+
327+
This prevents review drift and keeps findings actionable.
328+
329+
---
330+
331+
## 15) Suggested follow-up implementation roadmap (non-doc work)
332+
333+
1. Implement centralized rate-limit middleware for auth-sensitive endpoints.
334+
2. Introduce explicit scope policy map and enforce in `requireAdmin`/route guards.
335+
3. Add safe HTML escaping helper for any server-rendered templates.
336+
4. Refactor migration runner to transactional execution with deterministic failure logging.
337+
5. Add a production-hardening compose profile and deployment checklist.
338+
339+
These implementation tracks should be handled in dedicated code PRs with tests.

0 commit comments

Comments
 (0)