|
| 1 | +# Betterbase Backend/Core Hardening Review (v3) |
| 2 | + |
| 3 | +## 1) Objective (final outcome) |
| 4 | + |
| 5 | +Deliver a **backend-first security and reliability hardening baseline** for Betterbase, focused on: |
| 6 | + |
| 7 | +- `packages/server` runtime and auth boundaries, |
| 8 | +- database/migration safety, |
| 9 | +- Docker/container defaults and edge proxying, |
| 10 | +- operational protections for production readiness. |
| 11 | + |
| 12 | +> Out of scope by request: dashboard/frontend quality review. |
| 13 | +
|
| 14 | +--- |
| 15 | + |
| 16 | +## 2) Scope map (what is reviewed) |
| 17 | + |
| 18 | +### Included |
| 19 | +- API/auth/device/admin flows (`packages/server/src/**`) |
| 20 | +- Migration and schema lifecycle (`packages/server/src/lib/migrate.ts`, `packages/server/migrations/**`) |
| 21 | +- Container/runtime and local infra defaults (`Dockerfile`, `docker-compose.yml`, `docker/nginx/nginx.conf`) |
| 22 | + |
| 23 | +### Excluded |
| 24 | +- Dashboard UX, styling, and frontend correctness (`apps/dashboard/**`) |
| 25 | + |
| 26 | +--- |
| 27 | + |
| 28 | +## 3) Plan (how this review is executed) |
| 29 | + |
| 30 | +1. **Synchronize branch state** before any edits (fetch/pull when tracking exists). |
| 31 | +2. **Static backend attack-surface review**: |
| 32 | + - auth/session/device/token issuance, |
| 33 | + - admin routing and privilege checks, |
| 34 | + - secret handling and env defaults, |
| 35 | + - SQL use patterns and dynamic identifier safety. |
| 36 | +3. **Docker and reverse-proxy hardening pass**: |
| 37 | + - insecure defaults, |
| 38 | + - exposed ports/services, |
| 39 | + - transport/auth boundaries. |
| 40 | +4. **Reliability pass**: |
| 41 | + - migration atomicity, |
| 42 | + - logging correctness, |
| 43 | + - operational behavior under failure. |
| 44 | +5. **Produce v3 documentation artifact** with severity, impact, evidence, and fixes. |
| 45 | + |
| 46 | +--- |
| 47 | + |
| 48 | +## 4) Acceptance criteria vs non-acceptance criteria |
| 49 | + |
| 50 | +### Accepted criteria |
| 51 | +A review is accepted only if it: |
| 52 | + |
| 53 | +- Prioritizes **critical/high backend risks first**, |
| 54 | +- Lists concrete findings with **file-level evidence**, |
| 55 | +- Proposes remediation that is actionable and ordered, |
| 56 | +- Separates scope (core/backend) from excluded dashboard concerns, |
| 57 | +- Defines objective pass/fail criteria for closure. |
| 58 | + |
| 59 | +### Not accepted |
| 60 | +A review is not accepted if it: |
| 61 | + |
| 62 | +- Focuses mostly on frontend/dashboard problems, |
| 63 | +- Gives generic best-practice lists without code evidence, |
| 64 | +- Omits severity and business impact, |
| 65 | +- Doesn’t define what “done” looks like. |
| 66 | + |
| 67 | +--- |
| 68 | + |
| 69 | +## 5) Problem classes targeted in this review |
| 70 | + |
| 71 | +- **Authentication hardening** (brute-force, token abuse, device flow safety) |
| 72 | +- **Authorization hardening** (scope enforcement, privilege boundaries) |
| 73 | +- **Input/output security** (HTML injection, unsafe interpolation) |
| 74 | +- **Secrets/config hardening** (weak defaults, exposure risks) |
| 75 | +- **Operational resilience** (migration safety, observability correctness) |
| 76 | +- **Container/proxy hardening** (network exposure, transport/auth controls) |
| 77 | + |
| 78 | +--- |
| 79 | + |
| 80 | +## 6) Findings (backend/core only) |
| 81 | + |
| 82 | +## F-01 (HIGH): No brute-force/rate limiting on admin login and device verification |
| 83 | + |
| 84 | +**Evidence** |
| 85 | +- `/admin/auth/login` verifies credentials directly with no throttling/lockout/captcha/backoff. (`packages/server/src/routes/admin/auth.ts`) |
| 86 | +- `/device/verify` form POST verifies admin credentials with no anti-automation controls. (`packages/server/src/routes/device/index.ts`) |
| 87 | + |
| 88 | +**Impact** |
| 89 | +- Enables online password guessing and credential stuffing against administrative entry points. |
| 90 | + |
| 91 | +**Recommended fix** |
| 92 | +1. Add IP + account-based rate limits (e.g., sliding window). |
| 93 | +2. Add progressive delays/temporary lockouts after repeated failures. |
| 94 | +3. Emit structured security events for failed auth attempts. |
| 95 | + |
| 96 | +--- |
| 97 | + |
| 98 | +## F-02 (HIGH): API key scopes are stored but not enforced |
| 99 | + |
| 100 | +**Evidence** |
| 101 | +- API keys include `scopes` at creation and persistence. (`packages/server/src/routes/admin/api-keys.ts`, `packages/server/migrations/008_api_keys.sql`) |
| 102 | +- Auth middleware validates key hash and expiration only; it does not evaluate `scopes` for route/action authorization. (`packages/server/src/lib/admin-middleware.ts`) |
| 103 | + |
| 104 | +**Impact** |
| 105 | +- A supposedly limited key can act as full-admin bearer credentials, breaking least-privilege expectations. |
| 106 | + |
| 107 | +**Recommended fix** |
| 108 | +1. Introduce route-level scope requirements (e.g., `requireScopes(["projects:read"])`). |
| 109 | +2. Validate scopes in middleware before allowing handler execution. |
| 110 | +3. Add tests proving denied access when scope is insufficient. |
| 111 | + |
| 112 | +--- |
| 113 | + |
| 114 | +## F-03 (MEDIUM-HIGH): Reflected HTML injection risk in device verification page |
| 115 | + |
| 116 | +**Evidence** |
| 117 | +- `userCode` query param is interpolated directly into HTML input value without escaping. (`packages/server/src/routes/device/index.ts`) |
| 118 | + |
| 119 | +**Impact** |
| 120 | +- Crafted links can inject markup/script contexts depending on browser parsing behavior, risking phishing/session abuse in admin verification flow. |
| 121 | + |
| 122 | +**Recommended fix** |
| 123 | +1. Escape HTML entities for all interpolated values. |
| 124 | +2. Prefer server-side templates with automatic escaping. |
| 125 | +3. Add CSP headers and disable inline script where possible. |
| 126 | + |
| 127 | +--- |
| 128 | + |
| 129 | +## F-04 (MEDIUM): Migration execution is not wrapped in per-file transactions |
| 130 | + |
| 131 | +**Evidence** |
| 132 | +- Migration runner executes raw SQL then records applied filename, without explicit transaction guards around each migration file. (`packages/server/src/lib/migrate.ts`) |
| 133 | + |
| 134 | +**Impact** |
| 135 | +- Partial migration failures can leave inconsistent schema states that are hard to recover in production rollouts. |
| 136 | + |
| 137 | +**Recommended fix** |
| 138 | +1. Wrap each migration in `BEGIN/COMMIT` with rollback on failure. |
| 139 | +2. Abort startup if any migration fails. |
| 140 | +3. Add idempotency checks for unsafe DDL operations. |
| 141 | + |
| 142 | +--- |
| 143 | + |
| 144 | +## F-05 (MEDIUM): Insecure local Docker defaults can leak into non-local usage |
| 145 | + |
| 146 | +**Evidence** |
| 147 | +- Static/dev credentials and wide host port exposure in compose defaults (`5432`, `9000`, `9001`, `8288`) plus default passwords. (`docker-compose.yml`) |
| 148 | + |
| 149 | +**Impact** |
| 150 | +- If reused outside isolated local environments, these defaults materially increase compromise likelihood. |
| 151 | + |
| 152 | +**Recommended fix** |
| 153 | +1. Gate local-only config via explicit profile names and warnings. |
| 154 | +2. Require strong secrets through `.env` for non-dev profiles. |
| 155 | +3. Limit host port publishing to required services only. |
| 156 | + |
| 157 | +--- |
| 158 | + |
| 159 | +## 7) Closure criteria for this hardening cycle |
| 160 | + |
| 161 | +To mark this cycle complete, all must be true: |
| 162 | + |
| 163 | +1. Login/device endpoints enforce rate limits and lockout policy. |
| 164 | +2. API key scopes are enforced in middleware/route guards. |
| 165 | +3. Device verification HTML interpolation is safely escaped. |
| 166 | +4. Migration runner supports transactional per-file execution. |
| 167 | +5. Container docs/config clearly separate local unsafe defaults from production-secure defaults. |
| 168 | + |
| 169 | +--- |
| 170 | + |
| 171 | +## 8) Implementation priority order |
| 172 | + |
| 173 | +1. **P0:** F-01, F-02 |
| 174 | +2. **P1:** F-03 |
| 175 | +3. **P2:** F-04, F-05 |
| 176 | + |
| 177 | +--- |
| 178 | + |
| 179 | +## 9) Summary |
| 180 | + |
| 181 | +This v3 review identifies backend/core hardening gaps primarily in auth abuse resistance, authorization scope enforcement, HTML output safety, migration resilience, and Docker default safety posture. Addressing P0/P1 items first materially reduces compromise risk while improving operational confidence. |
| 182 | + |
| 183 | +--- |
| 184 | + |
| 185 | +## 10) AI reviewer operating manual (for the next reviewing agent) |
| 186 | + |
| 187 | +This section is intentionally prescriptive so another AI/code reviewer can continue this work with high consistency and low ambiguity. |
| 188 | + |
| 189 | +### 10.1 Review intent |
| 190 | + |
| 191 | +You are not only finding bugs. You are validating whether Betterbase can safely operate as: |
| 192 | +- an internet-facing backend API, |
| 193 | +- a multi-project control plane, |
| 194 | +- a migration-driven data platform, |
| 195 | +- and a containerized service. |
| 196 | + |
| 197 | +### 10.2 Required review mindset |
| 198 | + |
| 199 | +For every endpoint or subsystem, ask: |
| 200 | +1. **Can an attacker reach this?** |
| 201 | +2. **What identity context is assumed?** |
| 202 | +3. **What authorization boundary is enforced?** |
| 203 | +4. **What happens on repeated abuse/failure?** |
| 204 | +5. **How does this fail in production under partial outage?** |
| 205 | + |
| 206 | +If any answer is unclear, treat it as a review finding candidate. |
| 207 | + |
| 208 | +### 10.3 Evidence standard (non-optional) |
| 209 | + |
| 210 | +Every finding should include all of: |
| 211 | +- exact file path(s), |
| 212 | +- function/route names, |
| 213 | +- exploit or failure path in 2–5 steps, |
| 214 | +- business impact, |
| 215 | +- concrete mitigation order (quick patch + durable fix). |
| 216 | + |
| 217 | +Avoid generic “best-practice only” findings with no file-level proof. |
| 218 | + |
| 219 | +--- |
| 220 | + |
| 221 | +## 11) Backend review blueprint by domain |
| 222 | + |
| 223 | +Use this sequence to avoid missing critical areas. |
| 224 | + |
| 225 | +### 11.1 Identity and authentication |
| 226 | +- Admin login (`/admin/auth/login`) |
| 227 | +- Setup flow (`/admin/auth/setup`) |
| 228 | +- Device flow (`/device/code`, `/device/verify`, `/device/token`) |
| 229 | +- Token issuance/validation (`signAdminToken`, `verifyAdminToken`) |
| 230 | +- API key lifecycle and bearer handling |
| 231 | + |
| 232 | +**What to catch** |
| 233 | +- brute force/stuffing paths, |
| 234 | +- token replay windows, |
| 235 | +- missing audience/issuer constraints, |
| 236 | +- no rotation/revocation strategy, |
| 237 | +- secret leakage in logs/errors. |
| 238 | + |
| 239 | +### 11.2 Authorization and tenancy boundaries |
| 240 | +- `requireAdmin` middleware behavior for JWT vs API keys |
| 241 | +- project-scoped routes and project attachment middleware |
| 242 | +- actions that should require scoped permissions but do not |
| 243 | + |
| 244 | +**What to catch** |
| 245 | +- privilege escalation across routes, |
| 246 | +- “scope exists in DB but not enforced” patterns, |
| 247 | +- project hopping / IDOR possibilities, |
| 248 | +- routes inheriting global admin access unintentionally. |
| 249 | + |
| 250 | +### 11.3 Data and SQL safety |
| 251 | +- dynamic schema/table identifier usage |
| 252 | +- raw SQL interpolation (especially schema-qualified queries) |
| 253 | +- migration sequencing and rollback resilience |
| 254 | + |
| 255 | +**What to catch** |
| 256 | +- unsafe identifier injection opportunities, |
| 257 | +- inconsistent transaction handling, |
| 258 | +- ordering collisions in migration naming, |
| 259 | +- partial-apply states causing startup drift. |
| 260 | + |
| 261 | +### 11.4 Output and interface safety |
| 262 | +- HTML-rendering endpoints (device verification page) |
| 263 | +- plain-text/HTML response interpolation |
| 264 | +- error message leakage |
| 265 | + |
| 266 | +**What to catch** |
| 267 | +- reflected/stored injection risks, |
| 268 | +- security header gaps (CSP/XFO/etc.), |
| 269 | +- differential error behavior that aids attackers. |
| 270 | + |
| 271 | +### 11.5 Container, proxy, and runtime posture |
| 272 | +- `docker-compose.yml` defaults |
| 273 | +- nginx routing and trust headers |
| 274 | +- image pinning and exposed ports |
| 275 | + |
| 276 | +**What to catch** |
| 277 | +- weak default credentials reused outside dev, |
| 278 | +- unintended public service exposure, |
| 279 | +- over-trusting forwarded headers, |
| 280 | +- missing TLS/security hardening assumptions in docs. |
| 281 | + |
| 282 | +--- |
| 283 | + |
| 284 | +## 12) Severity rubric for this project |
| 285 | + |
| 286 | +Use this rubric consistently for triage: |
| 287 | + |
| 288 | +- **Critical**: direct compromise of admin control plane, tenant data exposure, or remote code/data integrity impact with low attacker cost. |
| 289 | +- **High**: realistic unauthorized access path, major authz bypass, or high-probability account compromise vector. |
| 290 | +- **Medium**: meaningful security/reliability weakness requiring additional conditions. |
| 291 | +- **Low**: hygiene/documentation/observability issues with limited direct exploitability. |
| 292 | + |
| 293 | +When uncertain between two severities, choose the higher one and justify assumptions. |
| 294 | + |
| 295 | +--- |
| 296 | + |
| 297 | +## 13) Acceptance gates for a “secure-enough” next milestone |
| 298 | + |
| 299 | +The next implementation PR(s) should not be accepted unless all gates pass: |
| 300 | + |
| 301 | +1. **Auth abuse gate** |
| 302 | + - login + device verification endpoints have measurable rate limiting and lockout/backoff. |
| 303 | +2. **Authorization gate** |
| 304 | + - API key scopes are enforceable and tested on representative privileged routes. |
| 305 | +3. **Injection gate** |
| 306 | + - user-controlled values rendered in HTML are escaped or template-engine protected. |
| 307 | +4. **Migration safety gate** |
| 308 | + - each migration file executes atomically with rollback on failure. |
| 309 | +5. **Runtime hygiene gate** |
| 310 | + - local-only insecure defaults are explicitly isolated from production paths. |
| 311 | + |
| 312 | +--- |
| 313 | + |
| 314 | +## 14) Reviewer handoff format (for future AI agents) |
| 315 | + |
| 316 | +When handing off to another reviewer/engineer, provide: |
| 317 | + |
| 318 | +1. **Top-5 unresolved risks** (ordered by exploitability × impact), |
| 319 | +2. **Proof snippets** (route/file/line references), |
| 320 | +3. **Proposed patch plan** split by: |
| 321 | + - immediate hotfixes (hours), |
| 322 | + - short-term hardening (days), |
| 323 | + - structural improvements (weeks), |
| 324 | +4. **Regression checks** that must run after each fix, |
| 325 | +5. **Known blind spots** (areas not fully reviewed). |
| 326 | + |
| 327 | +This prevents review drift and keeps findings actionable. |
| 328 | + |
| 329 | +--- |
| 330 | + |
| 331 | +## 15) Suggested follow-up implementation roadmap (non-doc work) |
| 332 | + |
| 333 | +1. Implement centralized rate-limit middleware for auth-sensitive endpoints. |
| 334 | +2. Introduce explicit scope policy map and enforce in `requireAdmin`/route guards. |
| 335 | +3. Add safe HTML escaping helper for any server-rendered templates. |
| 336 | +4. Refactor migration runner to transactional execution with deterministic failure logging. |
| 337 | +5. Add a production-hardening compose profile and deployment checklist. |
| 338 | + |
| 339 | +These implementation tracks should be handled in dedicated code PRs with tests. |
0 commit comments