
Commit d5518a6

docs: catch up DEVLOG + architecture after the launch-readiness audit
DEVLOG Day 79 covers the launch-readiness audit + the 10-fix launch PR
(merge a868b24): 73-agent discovery + 3-lens adversarial verification,
2 Criticals (SECURITY-1 BOLA + CRITICAL-2 async quota envelope), 8 Highs
(FLOW-3, FE-SEC-1, BACKEND-2, LLM-1+OBS-1, OBS-2, PERF-1+2, A11Y-1+2,
TEST-1), and the deferrals (H1, PERFDB-1/2/3/4, TEST-2).

DEVLOG Day 80 covers the Medium + Low cleanup PR (merge 507cb3f): 24
Mediums across three domain-coherent phases + 8 Lows one commit each,
plus deferrals (M3, M15, M20, M19 multi-row, M11 follow-ups) and the
five Architectural Recommendations (R1-R5) parked in report.md.

architecture.md splices:
- backend/ section now mentions backend/services/workspace_run_jobs.py
  (owner-scoped, sync quota pre-flight, per-user in-flight cap) and the
  admin-gated /health/sentry-debug
- Observability section now records the Sentry stage-boundary
  breadcrumbs/tags/context/user, the saved-workspaces-retention
  sentry_cron_monitor, and the X-PostHog-Distinct-Id header for
  anonymous attribution
- Persistence Model now records the app_users BEFORE-UPDATE entitlement
  trigger, the atomic save_saved_job RPC, the count_active()
  workspace-quota head-read, and the saved_workspaces 1/1/1 single-slot
  reality
- New "Browser security baseline" subsection documents the
  X-Frame-Options/HSTS/nosniff/Referrer-Policy/CSP-Report-Only header
  set, the safeRedirect/isAllowedRedirect allowlist on backend-supplied
  URLs, and the useAccessibleDialog primitive behind the ⌘K palette +
  assistant FAB

report.md (intentionally untracked per docs/README.md governance) got a
new PARKED (2026-05-30) section that captures the deferred Highs +
Mediums + Lows plus the five Architectural Recommendations verbatim from
the audit report, so the source-of-findings worktree can be cleaned up
without losing the open items.
1 parent 507cb3f commit d5518a6

2 files changed

Lines changed: 209 additions & 0 deletions


docs/DEVLOG.md

Lines changed: 189 additions & 0 deletions
@@ -3983,3 +3983,192 @@ pg_cron now records a fast 202 instead of a 524; nothing else changes.
Verification: test suite green; the `/admin/refresh-cache` endpoint
test was updated to assert 202 + a scheduled background worker.

## Day 79: Launch-readiness audit — 73-agent swarm, 10 fixes shipped

Twitter-launch readiness pass. A 12-domain discovery swarm (security,
correctness/concurrency, performance/DB, LLM integration, frontend
security/correctness/perf/a11y, API-contract integrity, observability,
testing, E2E flows) read the codebase in parallel, then every Critical
and High finding went through a 3-lens adversarial verification —
correctness/repro, impact/exploitability, missed-mitigation. A finding
survived only if at least two of three skeptics could NOT refute it.
73 agents total. Final counts: **2 Critical · 18 High · 24 Medium ·
8 Low**; every one of the 20 Critical/High findings survived
verification (19 at 3/3, TEST-2 at 2/3).

The two Criticals were the launch blockers:

- **SECURITY-1** — unauthenticated BOLA on
  `GET /workspace/analyze-jobs/{job_id}` + `POST .../cancel`. The
  async-job dict was looked up purely by id; the returned payload
  included `artifacts.tailored_resume` + `cover_letter` (the PII-densest
  object in the product). Fix: `owner_user_id` bound at start,
  `Depends(get_required_auth_tokens)` on both routes, **404** (not
  403) when not owner so existence isn't confirmed. (`87117f5`)
- **CRITICAL-2** — the async `/analyze-jobs` path never called the
  quota gate; a capped Free user got *"The agentic workflow failed
  unexpectedly"* instead of the structured 429 + upgrade nudge. Fix:
  run `enforce_llm_budget` synchronously **before** spawning the
  worker, plus widen `_serialize_job` to round-trip the structured
  `{code, counter, cap, tier, reset_period}` envelope so the polling
  hook renders the existing upgrade CTA. (`d19030b`)
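Both fixes follow one pattern: bind the owner when the job is created, run the quota check before any worker exists, and answer "not yours" exactly like "doesn't exist". A minimal in-memory sketch of that pattern, assuming a dict-backed store; `JobStore`, `Job`, `QuotaExceeded`, `JobNotFound`, `serialize_job`, the cap of 3, and the envelope values are all hypothetical, and in the real FastAPI routes the two exceptions would surface as 429 and 404.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


class QuotaExceeded(Exception):
    """Hypothetical: carries the structured envelope a route turns into a 429."""

    def __init__(self, envelope: dict) -> None:
        super().__init__(envelope["code"])
        self.envelope = envelope


class JobNotFound(Exception):
    """Hypothetical: a route maps this to 404 for unknown AND foreign job ids."""


@dataclass
class Job:
    job_id: str
    owner_user_id: str
    status: str = "running"
    error: dict | None = None  # structured envelope if the run failed on quota


@dataclass
class JobStore:
    cap: int = 3  # hypothetical per-user run budget
    used: dict[str, int] = field(default_factory=dict)
    jobs: dict[str, Job] = field(default_factory=dict)

    def enforce_budget(self, user_id: str, tier: str = "free") -> None:
        used = self.used.get(user_id, 0)
        if used >= self.cap:
            # Same keys as the DEVLOG envelope; the values are placeholders.
            raise QuotaExceeded({
                "code": "quota_exceeded",
                "counter": "analysis_runs",
                "cap": self.cap,
                "tier": tier,
                "reset_period": "weekly",
            })

    def start(self, user_id: str) -> Job:
        # Quota gate runs synchronously, before any worker is spawned, so the
        # caller sees the structured envelope instead of a generic failure.
        self.enforce_budget(user_id)
        self.used[user_id] = self.used.get(user_id, 0) + 1
        job = Job(job_id=uuid.uuid4().hex, owner_user_id=user_id)
        self.jobs[job.job_id] = job
        return job

    def get(self, job_id: str, user_id: str) -> Job:
        job = self.jobs.get(job_id)
        # One error for "missing" and "someone else's": no existence oracle.
        if job is None or job.owner_user_id != user_id:
            raise JobNotFound(job_id)
        return job


def serialize_job(job: Job) -> dict:
    # Pass the envelope through untouched so a polling client can render the
    # same upgrade CTA as the synchronous path.
    return {"job_id": job.job_id, "status": job.status, "error": job.error}
```

Raising the same `JobNotFound` for both branches is the point: a caller probing random ids learns nothing from the status code.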

Eight Highs landed alongside the Criticals:

- Theme-entitlement scope (FLOW-3 — Free résumé export no longer
  blocked by an unrelated cover-letter theme, `17f160f`).
- Browser-security baseline (FE-SEC-1 — CSP Report-Only,
  X-Frame-Options DENY, HSTS, X-Content-Type-Options, Referrer-Policy
  on the Next frontend, `69e36c8`).
- Per-user in-flight-runs cap (BACKEND-2 — closes the concurrent-run
  weekly-token bypass and the fairness gap where one user's 5 runs
  locked out the process-wide semaphore, `fc6a8c4`).
- A cost-attribution chokepoint (LLM-1 + OBS-1 — `web_search` routed
  through `OpenAIService` so it meters and cost-traces;
  `_record_cost_trace` falls back to the `meter_user_scope` ContextVar
  so JD / résumé parser + embedding spend finally lands in
  `aijobagent_run_traces`, `8cdbc38`).
- Two missing PostHog funnel events (OBS-2 — `jd_parsed` +
  `resume_built`, plugging the hole between `job_searched` and
  `analysis_started`, `d064241`).
- Two render-storm fixes (PERF-1 + PERF-2 — assistant streaming state
  moved out of `WorkspaceShell`; `buildJobReview` memoized; `b-canvas`
  children `React.memo`-d, so a multi-paragraph answer no longer drives
  hundreds of whole-tree reconciliations and JD keystrokes no longer
  re-parse the multi-KB JD on every character, `f870667`).
- A shared accessible-dialog primitive (A11Y-1 + A11Y-2 —
  `useAccessibleDialog` with focus trap, initial focus, Escape, focus
  restore, applied to the ⌘K palette + assistant FAB; palette also gets
  combobox/listbox semantics, `6b454c6`).
- A Vitest baseline wired into CI (TEST-1 — 5 coverage cases over
  `humanizeApiError`, `auth-session`, the workspace-quota hook, the
  tier-gate render, and `JDReview` submit wiring; CI frontend job now
  runs lint + build + test, `d376aac`).
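The ContextVar fallback is what lets spend from deep helpers (parsers, embeddings) land on the right user without threading a `user_id` through every signature. A sketch with the standard `contextvars` module; only the `meter_user_scope` name comes from the DEVLOG, while `metered_user`, `record_cost_trace`, `parse_resume`, and the in-memory `run_traces` list are hypothetical stand-ins.

```python
from __future__ import annotations

import contextvars
from contextlib import contextmanager

# Ambient "who is paying" for the current request; None outside any scope.
meter_user_scope: contextvars.ContextVar[str | None] = contextvars.ContextVar(
    "meter_user_scope", default=None
)
run_traces: list[dict] = []  # hypothetical stand-in for aijobagent_run_traces


@contextmanager
def metered_user(user_id: str):
    token = meter_user_scope.set(user_id)
    try:
        yield
    finally:
        meter_user_scope.reset(token)  # restore the previous scope


def record_cost_trace(operation: str, cost_usd: float, user_id: str | None = None) -> None:
    # An explicit user wins; otherwise fall back to the ambient scope so
    # helpers deep in the call stack still attribute their spend.
    resolved = user_id if user_id is not None else meter_user_scope.get()
    run_traces.append({"operation": operation, "cost_usd": cost_usd, "user_id": resolved})


def parse_resume(text: str) -> str:
    # No user_id parameter threaded through: attribution comes from the scope.
    record_cost_trace("resume_parse", 0.002)
    return text.strip()
```

Because each asyncio task runs with its own copy of the context, concurrent requests see their own `meter_user_scope`, which a module-level global could not guarantee.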
4044+
4045+
Deferred from this PR by deliberate decision (parked in `report.md`):
4046+
H1 (upgrade CTAs all point at `/pricing` which 404s — gated on
4047+
payment going live); PERFDB-1/2/3/4 (four 1000-row time-bombs:
4048+
`cleanup_missing` can hard-delete a bookmarked row, unpaginated
4049+
missing-row enumeration, the workspace-retention sweeper's N+1 +
4050+
1000-row cap, and the `cached_jobs` DDL only living in the live DB —
4051+
acceptable pre-traction, will bite around the thousandth user);
4052+
TEST-2 (`tests/quality/` runners aren't collected by pytest, so a
4053+
prompt edit can silently degrade tailoring/review quality with CI
4054+
green — defer until there's a hermetic no-live-key path).
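The "1000-row" wording points at a per-response row cap: a single unpaginated read silently stops at the cap, which is one plausible way an unpaginated missing-row enumeration starts acting on rows it never saw. A sketch of the difference, assuming a server-side cap of 1000 rows and sorted integer ids; `fetch_page`, `naive_scan`, and `keyset_scan` are hypothetical, and keyset pagination is one option, not necessarily the fix R4 will adopt.

```python
from __future__ import annotations

from typing import Iterator

SERVER_MAX_ROWS = 1000  # assumption: per-response cap enforced server-side


def fetch_page(table: list[int], after_id: int, limit: int) -> list[int]:
    """Hypothetical stand-in for `WHERE id > :after ORDER BY id LIMIT :n`."""
    rows = [row_id for row_id in table if row_id > after_id]  # table is sorted
    return rows[: min(limit, SERVER_MAX_ROWS)]


def naive_scan(table: list[int]) -> list[int]:
    # One request: silently truncated once the table grows past the cap.
    return fetch_page(table, after_id=-1, limit=10_000)


def keyset_scan(table: list[int], page_size: int = 500) -> Iterator[int]:
    after_id = -1
    while True:
        page = fetch_page(table, after_id, page_size)
        if not page:
            return
        yield from page
        after_id = page[-1]  # resume strictly after the last id seen
```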

Verification: 502 backend pytest tests passing, Vitest baseline green,
tsc + eslint clean on touched files. Merge: `a868b24`. Live-API smoke
after deploy confirmed `/workspace/analyze-jobs/<fake>` with no auth →
401 (SECURITY-1 enforced), and the security headers landed on both the
app subdomain and the marketing site.

## Day 80: Medium + Low cleanup — 24 + 8 from the same audit

Cleanup PR for the Medium and Low findings the launch PR scoped out.
Three phases, thirteen commits on the feature branch, merge `507cb3f`.
Verification: **980 backend pytest tests** (up from 502; this PR adds
substantial new test coverage), 33 Vitest tests, clean production
build.

Phase 1 — Tier-1 Mediums:

- **M1** — users could PATCH `app_users.plan_tier` / `account_status`
  through their own JWT because the RLS UPDATE policy was
  `using/with check (auth.uid() = id)` with no column restriction,
  and the legacy daily-quota path read `app_users.plan_tier`. Now a
  BEFORE-UPDATE trigger rejects non-`service_role` writes to those
  columns, and `get_daily_quota_for_plan` reads from
  `resolve_user_tier` (which sources `aijobagent_subscriptions`).
  (`36c2aa8`)
- **M5–M10, M14, M16** — iframe `sandbox=""` on the preview surfaces;
  session-replay PII masked via privacy-by-default;
  `safeRedirect`/`isAllowedRedirect` allowlist on every
  backend-supplied URL the client navigates to; JD auto-parse 429
  notices now surface an inline upgrade CTA; clear-then-repaste resets
  `lastParsedTextRef` so the LLM-parsed panels return on retry;
  `handleSignOut` resets workspace content slices so isolation holds
  even without the hard-nav backstop; account popover restores focus
  and ditches the wrong-widget `role="menu"` for a labelled
  disclosure; debounced JD auto-parse threads `AbortSignal` through
  `request()` so a superseded LLM parse actually cancels.
  (`d42332b`)
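The M1 guard lives in Postgres as a BEFORE-UPDATE trigger; to keep the examples in one language, its decision logic is modeled here in Python. Assumptions: the trigger compares old and new row values (so an UPDATE that leaves the protected columns unchanged still passes), the caller's role is available as a string, and `display_name` is a hypothetical unprotected column. `before_update_app_users` and `EntitlementWriteRejected` are illustrative names.

```python
from __future__ import annotations

PROTECTED_COLUMNS = ("plan_tier", "account_status")


class EntitlementWriteRejected(Exception):
    """Hypothetical: models the trigger's RAISE EXCEPTION."""


def before_update_app_users(role: str, old: dict, new: dict) -> dict:
    """Python model of the trigger's check; the real guard runs in Postgres."""
    if role == "service_role":
        return new  # billing/webhook paths may change entitlements
    changed = [col for col in PROTECTED_COLUMNS if old.get(col) != new.get(col)]
    if changed:
        raise EntitlementWriteRejected(f"non-service_role write to {', '.join(changed)}")
    return new  # other columns stay user-editable under the existing RLS policy
```

In the Postgres version, a trigger fires on every row an UPDATE touches, so the check holds regardless of how permissive the RLS policy is.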

Phase 2 — Tier-2 Mediums (three sub-commits, domain-coherent):

- **Backend correctness + coverage** (`c18109b`): atomic
  `save_saved_job` RPC closes the count-then-upsert TOCTOU on the
  persistent saved-jobs cap (M2); `/workspace/quota`'s
  `_persistent_count()` uses a new `count_active()` head-read
  instead of deserializing the fat saved-workspace blob on every
  mount-and-after-every-run poll (M4); `POST /billing/portal` got
  tests across all six outcomes (M18); `saved_workspaces` per-tier
  cap pinned to **1/1/1** (M19 — the schema is one-row-per-user, so
  the cap+1 case was structurally unenforceable; multi-row history
  flagged as future enhancement).
- **Observability + anon attribution** (`11eb8c5`): backend events
  for unauthenticated callers now use the browser's PostHog distinct
  id via a new `X-PostHog-Distinct-Id` request header (M21 — closes
  the funnel hole where every anon visitor mapped to one literal
  `"anonymous"` person and anonymous→signup conversion couldn't be
  computed); the retention sweeper got its `sentry_cron_monitor`
  (`saved-workspaces-retention`) so a stuck cron pages instead of
  silently leaving Free data past its 7-day retention promise (M22);
  Sentry breadcrumbs / tags / context / user are now set on each
  analysis stage and on the export route, defeating the
  AI-Agents-Monitoring blind spot ADR-024 was adopted for (M23).
- **Frontend perf + UX** (`1a3bc69`): job-grid memoized via
  `React.memo(JobCard)` + stabilized per-card callbacks (M11 — full
  `JobSearch` memo benefit + virtualization deferred); session
  replay is route-gated to marketing pages (M12 — `posthog-js` has
  no client `session_recording` sample rate, so route gating is the
  available knob); `--fg-4` lightened to a contrast-passing token
  for its four text-uses (M13); dead `BackendHealth` type +
  `getBackendHealth` deleted (M17 — no caller; better to remove than
  fix the drift); JD paste no longer collapses the input textarea
  ~1.5s after a paste (M24).
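The M21 attribution rule reduces to a small resolution order: account id if signed in, otherwise the browser's PostHog id from the new header, otherwise the old constant. A sketch of that order; only the header name comes from the DEVLOG. The id validation, the use of the account id for signed-in callers, and keeping `"anonymous"` as a last resort are assumptions, and `resolve_distinct_id` is a hypothetical name.

```python
from __future__ import annotations

import re
from typing import Mapping

DISTINCT_ID_HEADER = "x-posthog-distinct-id"  # header name from the DEVLOG
# Hypothetical hardening: accept only short ids from a conservative charset.
_VALID_ID = re.compile(r"[A-Za-z0-9_.:-]{1,200}")


def resolve_distinct_id(user_id: str | None, headers: Mapping[str, str]) -> str:
    if user_id:
        return user_id  # signed-in caller: attribute to the account
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    candidate = normalized.get(DISTINCT_ID_HEADER, "").strip()
    if _VALID_ID.fullmatch(candidate):
        return candidate  # anonymous caller: reuse the browser's PostHog id
    return "anonymous"  # old constant, kept only as a last-resort fallback
```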

Phase 3 — Lows, one commit per finding:

- `/health/sentry-debug` gated behind the admin bearer secret so anyone
  curling it stops burning Sentry quota (L1, `2987364`).
- `fetch_github_readme` sets `allow_redirects=False` so the
  SSRF-adjacent surface disappears (L2, `1cb5a63`).
- A regression test pins the `web_search` 30s timeout the launch PR
  already shipped via LLM-1 (L4, `b7f2884`).
- Completed analysis jobs drop `job.result` on the first terminal get +
  `JOB_TTL_SECONDS` tightened 1800→600 (L3, `cf6f8f4`).
- The "Parsing JD…" indicator is gated on the current `AbortController`
  so a superseded request's `finally` doesn't hide the busy hint while a
  newer parse is still in flight (L5, `a4239c8`).
- The VoiceInputButton reduced-motion override is driven off a class
  instead of a brittle `[style*="animation"]` substring selector (L6,
  `7035d2f`).
- The dead non-streaming `askWorkspaceAssistant` client fn is gone, but
  the backend `/workspace/assistant/answer` route stays as a tested
  lockstep fallback. The report's "dead" framing was inaccurate: the
  route shares the metered `answer_workspace_question` path (L7,
  `064532c`).
- Auth-cookie tests now assert `Secure`, `SameSite`, and clear-scope so
  a config refactor dropping any of those would fail (L8, `3ec4b6a`).
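The L3 change bounds memory two ways: the heavy result leaves the registry on its first terminal read, and every entry expires after `JOB_TTL_SECONDS`. A sketch under stated assumptions: pruning happens lazily on access (the periodic prune timer is still a listed follow-up), the TTL is measured from creation, and `JobRegistry`, `TERMINAL`, and the status names are hypothetical.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Any, Callable

JOB_TTL_SECONDS = 600  # value from the DEVLOG (tightened from 1800)
TERMINAL = {"succeeded", "failed", "cancelled"}  # hypothetical status names


@dataclass
class Job:
    status: str
    created_at: float
    result: Any = None


@dataclass
class JobRegistry:
    clock: Callable[[], float] = time.monotonic
    jobs: dict[str, Job] = field(default_factory=dict)

    def get(self, job_id: str) -> dict | None:
        self._prune()
        job = self.jobs.get(job_id)
        if job is None:
            return None
        payload = {"status": job.status, "result": job.result}
        if job.status in TERMINAL:
            # Hand the (large) result to the client once, then drop it so the
            # entry no longer pins artifacts in memory until the TTL.
            job.result = None
        return payload

    def _prune(self) -> None:
        now = self.clock()
        expired = [jid for jid, j in self.jobs.items() if now - j.created_at > JOB_TTL_SECONDS]
        for jid in expired:
            del self.jobs[jid]
```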

Deferred from the cleanup (parked in `report.md` alongside the launch
PR's deferrals):

- **M3** (process-global run-concurrency cap with no per-user
  fairness; effectively addressed by BACKEND-2's per-user cap, while
  the architectural piece is Rec #1/#3 territory).
- **M15, M20** (export + streaming-assistant 429 upgrade CTAs; blocked
  on the same `/pricing` destination that gates H1).
- **M19 multi-row workspaces** (single-slot is shipped reality;
  multi-row needs a schema migration plus un-deferring the
  structural-enforcement test).
- **M11 follow-ups** (wrap `WorkspaceShell`'s `JobSearch` callbacks in
  `useCallback` to fully activate the memo boundary, and add grid
  virtualization, which needs a windowing dep).
- **L3 follow-up** (an optional periodic prune timer; the terminal-get
  drop + lowered TTL already bound resident memory).

Plus the five **Architectural Recommendations** from the audit report
(R1 async-as-transparent-transport; R2 `OpenAIService` as the only
door; R3 per-user authZ + HTTP security to enforced edges; R4
paginated maintenance scans + tracked `cached_jobs` migration; R5
shared accessible-overlay primitive + workspace shell split + CI test
tier). The launch + cleanup PRs partially deliver some of them, but
the architectural half of each is parked. All five are documented in
`report.md`.

Live smoke post-deploy: `GET /health/sentry-debug` with no auth →
**401** (was 500 before, which proves both the L1 fix and the deploy
on `507cb3f`), `GET /workspace/analyze-jobs/<fake>` with no auth → 401
(SECURITY-1 still enforced), security headers still healthy on both
subdomains, and 31/31 hermetic new cleanup tests pass locally.
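The L1 gate behind that 401 comes down to comparing a bearer token with a configured secret. A sketch assuming an `Authorization: Bearer <secret>` header and a handler reduced to a plain function returning `(status, body)`; `is_admin_request` and `sentry_debug` are hypothetical names. `hmac.compare_digest` keeps the comparison constant-time, and an empty configured secret fails closed.

```python
from __future__ import annotations

import hmac


def is_admin_request(authorization: str | None, admin_secret: str) -> bool:
    """Constant-time check of `Authorization: Bearer <secret>` (format assumed)."""
    if not admin_secret or not authorization:
        return False  # fail closed when unconfigured or header absent
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    return hmac.compare_digest(token.encode(), admin_secret.encode())


def sentry_debug(authorization: str | None, admin_secret: str) -> tuple[int, str]:
    # Hypothetical handler shape: unauthenticated callers never reach the
    # deliberate error, so they can no longer burn Sentry quota.
    if not is_admin_request(authorization, admin_secret):
        return 401, "unauthorized"
    raise ZeroDivisionError("sentry-debug: deliberate test error")
```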

docs/architecture.md

Lines changed: 20 additions & 0 deletions
@@ -72,6 +72,8 @@ Owns the FastAPI API surface:
- `backend/routers/billing.py` owns the HMAC-verified `POST /webhooks/lemonsqueezy` subscription-event endpoint + the customer-portal redirect; the signature-verification + event-routing logic lives in `backend/webhooks/lemonsqueezy.py`
- `backend/prompt_registry.py` loads every LLM prompt from `prompts/<name>/<version>.json` — all 11 builders migrated off Python f-string concats; see [ADR-018](adr/ADR-018-three-layer-llm-retry-and-per-agent-fallback-isolation.md) family + the prompt-registry DEVLOG entries
- `backend/services/job_cache_service.py` runs the per-source refresh + smart-cleanup worker invoked by the admin endpoint
- `backend/services/workspace_run_jobs.py` owns the async `/analyze-jobs` job system. Each `WorkspaceRunJob` is bound to its `owner_user_id` at start time and the status/cancel routes check it (returning **404** — same code for "unknown" and "not yours" — so existence isn't confirmed); the quota gate runs **synchronously before** the worker is spawned, with the structured `{code, counter, cap, tier, reset_period}` envelope round-tripped through `_serialize_job` so the polling hook renders the same 429 upgrade CTA the sync path does; a per-user in-flight cap (1 run/user) sits in front of the process-global `BoundedSemaphore(5)` so one user's burst can't 503 every other account. The launch-readiness pass that introduced these guarantees is DEVLOG Day 79
- `backend/routers/health.py` also hosts `/health/sentry-debug` — now gated behind the admin bearer secret so an unauthenticated curl gets a 401 instead of a `ZeroDivisionError` that would burn Sentry quota (DEVLOG Day 80)
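The ordering described for `workspace_run_jobs.py` (a per-user cap of 1 in front of a process-global `BoundedSemaphore(5)`) can be sketched with `threading` primitives. The two limits come from the bullet above; `RunGate`, `TooManyRuns`, `ServerBusy`, and the non-blocking acquire (mapping a full server to an immediate 503) are assumptions.

```python
from __future__ import annotations

import threading
from collections import Counter

GLOBAL_SLOTS = 5        # process-global BoundedSemaphore(5)
PER_USER_IN_FLIGHT = 1  # 1 run/user


class TooManyRuns(Exception):
    """Hypothetical: this user already has a run in flight."""


class ServerBusy(Exception):
    """Hypothetical: every global slot is taken (a 503)."""


class RunGate:
    def __init__(self) -> None:
        self._global = threading.BoundedSemaphore(GLOBAL_SLOTS)
        self._lock = threading.Lock()
        self._in_flight: Counter[str] = Counter()

    def acquire(self, user_id: str) -> None:
        # Per-user check first: one account's burst is rejected before it can
        # occupy global slots that other accounts need.
        with self._lock:
            if self._in_flight[user_id] >= PER_USER_IN_FLIGHT:
                raise TooManyRuns(user_id)
            self._in_flight[user_id] += 1
        if not self._global.acquire(blocking=False):
            with self._lock:
                self._in_flight[user_id] -= 1  # undo the reservation
            raise ServerBusy()

    def release(self, user_id: str) -> None:
        self._global.release()
        with self._lock:
            self._in_flight[user_id] -= 1
            if self._in_flight[user_id] <= 0:
                del self._in_flight[user_id]
```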

### `src/services/`

@@ -221,6 +223,8 @@ Each `cached_jobs` row holds one upstream posting keyed on `(source, job_id)`. T

`aijobagent_feedback` holds one row per artifact thumbs-up/down (`user_id`, `workspace_id`, `artifact_kind`, `rating`, `comment`, `created_at`), RLS-scoped to the owning user; admin reads go through the service role.

A small set of structural reinforcements landed during the launch-readiness cleanup (DEVLOG Day 80) that are worth flagging here because they're load-bearing on the entitlement and read-fast paths: (1) a BEFORE-UPDATE trigger on `app_users` rejects non-`service_role` writes to `plan_tier` / `account_status`, so the unrestricted RLS UPDATE policy can no longer be abused to PATCH one's own tier; the legacy daily-quota path now sources tier from `resolve_user_tier` (which reads `aijobagent_subscriptions`) instead of `app_users.plan_tier`; (2) `save_saved_job` is now an atomic SECURITY DEFINER RPC that counts and inserts in one transaction under an advisory lock, closing the TOCTOU window where two concurrent saves at count=cap−1 could both pass and exceed the persistent cap; (3) `/workspace/quota`'s `_persistent_count()` no longer reads the fat `saved_workspaces` blob — a `count_active(user_id)` head-read returns 0/1 without deserializing `workflow_snapshot_json` / `cover_letter_payload_json` / `tailored_resume_payload_json`. `saved_workspaces` per-tier caps are pinned to **1/1/1** because the schema is one-row-per-user (multi-row history is flagged as a future enhancement requiring a schema migration).

## Observability And Telemetry Layer

Wired Day 46. The compliance posture is enforced at the SDK-init level, not as legalese on a privacy page — see [ADR-024](adr/ADR-024-observability-stack-sentry-and-posthog.md) and [ADR-025](adr/ADR-025-eu-cookie-consent-banner-and-gdpr-analytics-gating.md).
@@ -232,6 +236,8 @@ Two vendors, one bootstrap path:

Both clients are no-ops when their DSN / key is empty, so dev, CI, and the test suite run without observability wiring or network calls.

The launch-readiness cleanup (DEVLOG Day 80) added three reinforcements to this surface: (1) Sentry breadcrumbs / tags / context / user are now set on each pipeline stage in `src/agents/orchestrator.py` (via the stage-boundary callback, not the orchestrator internals) and on the export route, so a mid-pipeline 5xx is localizable to the failing agent — defeating the AI-Agents-Monitoring blind spot ADR-024 was adopted for; (2) the `saved-workspaces-retention` sweeper got its `sentry_cron_monitor` wrapper so a stuck retention cron now pages instead of silently leaving Free data past its 7-day retention promise; (3) backend events emitted by unauthenticated callers now carry the browser's PostHog distinct id via a new `X-PostHog-Distinct-Id` request header — the previous `"anonymous"` constant collapsed every anon visitor onto one PostHog person and made anonymous→signup conversion uncomputable.

### Consent gating

The single source of truth is `localStorage["jobagent-cookie-consent"]`, set by the custom in-house cookie banner (`frontend/src/components/cookie-consent.tsx`), three states: `pending` / `accepted` / `declined`. The split:
@@ -245,6 +251,20 @@ A `jobagent-cookie-consent-change` custom event re-evaluates the gated integrati

A Sentry Uptime monitor pings `https://api.job-application-copilot.xyz/health` every 5 minutes from the EU region. Configured in the Sentry dashboard rather than in code — a fresh-project rebuild must recreate it manually.

## Browser security baseline

The Next.js app sends a fixed set of response headers on every route, configured via `headers()` in `frontend/next.config.ts`. The defense-in-depth posture is the same on the marketing site and the workspace subdomain:

- **`X-Frame-Options: DENY`** + **`Content-Security-Policy: frame-ancestors 'none'`** — clickjacking defense. The workspace can't be framed and overlaid to trick a signed-in user into destructive actions; SameSite=Lax cookies would otherwise ride along on top-level navigation.
- **`Strict-Transport-Security: max-age=63072000; includeSubDomains; preload`** — HTTPS for two years across all subdomains, preload-eligible.
- **`X-Content-Type-Options: nosniff`** — disables MIME-type sniffing on responses (resource loaders honor the declared `Content-Type`).
- **`Referrer-Policy: strict-origin-when-cross-origin`** — sends the full URL as the Referer on same-origin requests, only the origin (no path or query) on cross-origin requests, and no Referer at all on an HTTPS→HTTP downgrade.
- **`Content-Security-Policy-Report-Only`** for the first weeks of public traffic — same-origin defaults plus the actual allowlist (PostHog `eu.i.posthog.com`, Sentry `*.sentry.io`, Lemon Squeezy, Supabase `*.supabase.co`). The switch to enforce mode will be tuned against the violation reports collected in Sentry.
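To re-run the post-deploy header check from a script, the expected values in the list above can be encoded directly. A sketch; `security_header_problems` is hypothetical, and it assumes the Report-Only policy is sent under the standard `Content-Security-Policy-Report-Only` header name and the enforced `frame-ancestors` directive under `Content-Security-Policy`.

```python
from __future__ import annotations

from typing import Mapping

# Values copied from the header list above.
EXPECTED = {
    "x-frame-options": "DENY",
    "strict-transport-security": "max-age=63072000; includeSubDomains; preload",
    "x-content-type-options": "nosniff",
    "referrer-policy": "strict-origin-when-cross-origin",
}


def security_header_problems(headers: Mapping[str, str]) -> list[str]:
    # Header names are case-insensitive; normalize before comparing.
    got = {name.lower(): value.strip() for name, value in headers.items()}
    problems = [
        f"{name}: expected {want!r}, got {got.get(name)!r}"
        for name, want in EXPECTED.items()
        if got.get(name, "").lower() != want.lower()
    ]
    if "frame-ancestors 'none'" not in got.get("content-security-policy", ""):
        problems.append("content-security-policy: frame-ancestors 'none' missing")
    if "content-security-policy-report-only" not in got:
        problems.append("content-security-policy-report-only: missing")
    return problems
```

Against a live deploy, the `headers` attribute of a `urllib.request.urlopen(...)` response can be passed in directly, since it exposes `.items()`.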

The launch-readiness pass that introduced this baseline is DEVLOG Day 79 (FE-SEC-1). On the client, every backend-supplied redirect URL passes through an explicit allowlist (`frontend/src/lib/redirectAllowlist.ts`: `safeRedirect` / `isAllowedRedirect`) before navigation, so the OAuth handoff + workspace-shell redirects can't be steered to an attacker-controlled origin (DEVLOG Day 80, M7).
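The real helpers are TypeScript in `frontend/src/lib/redirectAllowlist.ts`; to keep the examples in one language, the same idea is sketched here in Python with `urllib.parse`. The allowed origins are hypothetical placeholders, and this version accepts only absolute `https` URLs on the allowlist (whether the shipped helper also admits same-origin relative paths isn't stated here).

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Hypothetical origins; the real list lives in redirectAllowlist.ts.
ALLOWED_ORIGINS = {
    ("https", "app.example.com"),
    ("https", "checkout.example.com"),
}


def is_allowed_redirect(url: str) -> bool:
    try:
        parts = urlsplit(url.strip())
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return False
    if parts.username is not None or parts.password is not None:
        return False  # userinfo has no business in a redirect target
    origin = (parts.scheme.lower(), (parts.hostname or "").lower())
    return origin in ALLOWED_ORIGINS and port in (None, 443)


def safe_redirect(url: str, fallback: str = "/") -> str:
    return url if is_allowed_redirect(url) else fallback
```

Matching on the parsed scheme and hostname, rather than a string prefix, is what defeats tricks like `https://app.example.com@evil.test`, whose real host is `evil.test`.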

The accessible-overlay primitive (`frontend/src/lib/useAccessibleDialog.ts`) is the shared focus-trap + initial-focus + Escape + focus-restore contract behind every modal surface in the workspace shell — the ⌘K command palette and the assistant FAB use it directly; the palette also gets combobox/listbox semantics (`role="combobox"` + `aria-expanded` + `aria-controls` + `aria-activedescendant`, list `role="listbox"`, items `role="option"` with `aria-selected`). DEVLOG Day 79 (A11Y-1/A11Y-2).

## Testing Model

The repo includes focused tests for:
