Skip to content

Commit 507cb3f

Browse files
committed
Merge PR: Medium + Low cleanup (24 Mediums + 8 Lows from audit swarm)
Clears the Medium/Low backlog scoped out of the launch PR (a868b24). Three phases, 13 commits on the feature branch. 980 pytest pass (up from 502 baseline -- this PR adds substantial new test coverage), 33 Vitest, clean production build. Phase 1 -- Tier-1 Mediums: - M1 plan_tier/account_status BEFORE-UPDATE trigger; daily quota sourced via resolve_user_tier (36c2aa8) - M5-M10, M14, M16 bundled in d42332b: iframe sandbox=""; session- replay PII masking; redirect allowlist (safeRedirect); JD-parse 429 CTA; clear-then-repaste re-parse; sign-out content reset; account-popover focus + disclosure semantics; cancellable JD upload Phase 2 -- Tier-2 Mediums (3 sub-commits): - M2,M4,M18,M19 (c18109b): atomic save_saved_job RPC; /workspace/quota count_active() (no blob deser); billing-portal tests; saved_workspaces caps -> 1/1/1 - M21-M23 (11eb8c5): anon PostHog distinct-id via header; retention- sweeper Sentry cron monitor + inventory; Sentry breadcrumbs / tags / context / user on analysis + export - M11,M12,M13,M17,M24 (1a3bc69): job-grid memoization + JobCard React.memo with stabilized callbacks; route-gated session replay; --fg-4 contrast; deleted dead BackendHealth; JD-paste no longer collapses input mid-edit Phase 3 -- Lows (one commit each): - L1 (2987364) gate /health/sentry-debug behind admin secret - L2 (1cb5a63) allow_redirects=False on GitHub README fetch - L3 (cf6f8f4) drop job.result on first terminal get; TTL 1800->600 - L4 (b7f2884) regression test for web_search 30s timeout (fix already shipped in launch PR LLM-2) - L5 (a4239c8) JD busy hint reset gated on current abort - L6 (7035d2f) voice-input pulse driven by class for reduced-motion - L7 (064532c) deleted dead askWorkspaceAssistant client fn; kept the backend route as a tested lockstep fallback - L8 (3ec4b6a) assert Secure/SameSite + clear scope on auth cookies Audit-trail note: Phases 1 and 2 deviated from the brief's "one commit per finding" rule -- 20 Mediums landed across 4 squashed (but domain- coherent) commits. The per-finding audit trail is preserved in the PR body's commit -> finding table, not in git log alone. Lows held the per-finding rule (8 findings, 8 commits). Deferred / out of scope (with reasons): - M3 (process-global run concurrency) -- effectively addressed by launch PR's BACKEND-2 per-user in-flight cap; the remaining process-global piece is Architectural Rec #1/#3 - M15, M20 (export + streaming-assistant 429 upgrade CTAs) -- both wire to /pricing which currently 404s. Revisit when the pricing page / payment lands (tracked with launch PR's H1 deferral). - M19 multi-row workspace history -- shipped 1/1/1 single-slot per the handoff's "if unsure" guidance; multi-row would need a schema migration + un-deferring the structural-enforcement test. - M11 follow-ups -- (a) wrap WorkspaceShell's ~15 JobSearch callbacks in useCallback to activate the JobSearch React.memo on stream / keystroke churn; (b) add grid virtualization (needs a windowing dependency). - L3 optional periodic prune timer -- the terminal-get drop + lowered TTL already bound resident memory. Test gates (final pre-merge, all green): - uv run pytest -q -> 980 passed - npx vitest run -> 33 passed - npx tsc --noEmit -> clean - npm run build -> production build succeeds
2 parents a868b24 + 3ec4b6a commit 507cb3f

43 files changed

Lines changed: 1831 additions & 154 deletions

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

backend/app.py

Lines changed: 17 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
from contextlib import asynccontextmanager
22
from typing import AsyncIterator
33

4-
from fastapi import FastAPI, Request
4+
from fastapi import Depends, FastAPI, Request
55
from fastapi.middleware.cors import CORSMiddleware
66
from fastapi.responses import JSONResponse
77
from slowapi.errors import RateLimitExceeded
@@ -16,7 +16,11 @@
1616
from backend.routers.auth import router as auth_router
1717
from backend.routers.billing import router as billing_router
1818
from backend.routers.health import router as health_router
19-
from backend.routers.jobs import admin_router as jobs_admin_router, router as jobs_router
19+
from backend.routers.jobs import (
20+
_verify_refresh_secret,
21+
admin_router as jobs_admin_router,
22+
router as jobs_router,
23+
)
2024
from backend.routers.workspace import router as workspace_router
2125
from src.errors import QuotaExceededError
2226

@@ -119,16 +123,23 @@ def root():
119123

120124

121125
@app.get("/health/sentry-debug")
122-
def sentry_debug() -> None:
126+
def sentry_debug(_: None = Depends(_verify_refresh_secret)) -> None:
123127
"""Raise an unhandled exception so Sentry sees the issue end-to-end.
124128
125129
Used once at deploy time to confirm the DSN, environment, and
126130
release tag are wired. The route returns no JSON — Sentry's
127131
FastAPI integration catches the raise, ships the event, and
128132
FastAPI's default 500 handler returns "Internal Server Error" to
129-
the caller. There is intentionally no auth on this route: it must
130-
be callable from anywhere with curl. Remove or gate it behind a
131-
feature flag if your threat model objects.
133+
the caller.
134+
135+
Gated behind the admin bearer secret (``_verify_refresh_secret``,
136+
review L1): previously anyone could curl this to mint 500s / Sentry
137+
noise on demand. Deploy-time smoke testing still works — pass the
138+
``REFRESH_CACHE_SECRET`` bearer token — but anonymous callers now get
139+
a 401 (or 503 if no secret is configured) before the body ever runs,
140+
so the crash only fires for an authorized caller. The gate raises an
141+
HTTPException, which FastAPI handles cleanly (not an unhandled 500),
142+
so a blocked call generates no Sentry event.
132143
133144
Path is OUTSIDE ``settings.api_prefix`` (no /api/) so it doesn't
134145
accidentally appear in the workspace-API surface that auth /

backend/maintenance.py

Lines changed: 15 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -40,6 +40,11 @@
4040
from datetime import datetime, timedelta, timezone
4141
from typing import Any, Optional
4242

43+
from backend.observability import (
44+
SAVED_WORKSPACES_RETENTION_MONITOR_CONFIG,
45+
SAVED_WORKSPACES_RETENTION_MONITOR_SLUG,
46+
sentry_cron_monitor,
47+
)
4348
from backend.tiers import Tier, resolve_user_tier, retention_days_for_tier
4449
from src.config import (
4550
SUPABASE_SAVED_WORKSPACES_TABLE,
@@ -309,7 +314,16 @@ def main() -> None:
309314
run) invokes this with `python -m backend.maintenance`; output
310315
is JSON so structured-log pipelines can ingest it directly.
311316
"""
312-
summary = sweep_expired_workspaces()
317+
# Wrap the sweep in a Sentry cron check-in (review M22) so a stopped or
318+
# failing retention cron pages the operator instead of silently leaving
319+
# Free-tier data past its retention promise. No-op when Sentry is
320+
# disabled / under pytest; the monitor is upserted from the config so it
321+
# never needs touching in the Sentry UI.
322+
with sentry_cron_monitor(
323+
SAVED_WORKSPACES_RETENTION_MONITOR_SLUG,
324+
SAVED_WORKSPACES_RETENTION_MONITOR_CONFIG,
325+
):
326+
summary = sweep_expired_workspaces()
313327
print(json.dumps(summary.to_dict(), indent=2))
314328

315329

backend/observability.py

Lines changed: 151 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -262,6 +262,46 @@ def _init_posthog(settings: BackendSettings) -> None:
262262
# product-scoped dashboards.
263263
_PRODUCT_TAG = "jobagent"
264264

265+
# Header the browser sets on anonymous calls (review M21) so a server-side
266+
# funnel event can be attributed to the SAME PostHog person as the client-side
267+
# session, instead of every anonymous visitor collapsing onto one shared
268+
# "anonymous" distinct id. The browser sends `posthog.get_distinct_id()`; on
269+
# login the client calls `posthog.identify(userId)`, which auto-aliases that
270+
# anonymous id to the Supabase id — so the whole pre-login → signup path
271+
# stitches into one person and anonymous→signup conversion is computable.
272+
#
273+
# PostHog distinct ids are client-controlled by design, so trusting this header
274+
# for analytics attribution carries no security weight: a signed-in caller's
275+
# Supabase id ALWAYS takes precedence (see `resolve_distinct_id`), and the value
276+
# is never used for auth, ownership, or quota — only as a PostHog person key.
277+
POSTHOG_DISTINCT_ID_HEADER = "X-PostHog-Distinct-Id"
278+
279+
# Upper bound on the browser-supplied id we'll honor. PostHog's own ids are
280+
# short uuids; this just stops a pathological header from ballooning an event.
281+
_MAX_BROWSER_DISTINCT_ID_LEN = 200
282+
283+
284+
def resolve_distinct_id(
285+
user_id: str | None,
286+
browser_distinct_id: str | None = None,
287+
) -> str:
288+
"""Pick the best PostHog distinct id for an event (review M21).
289+
290+
Precedence: the authenticated Supabase id, then the browser's own anonymous
291+
distinct id (so anonymous funnel events join the client-side session rather
292+
than collapsing onto one shared ``"anonymous"`` person), then the
293+
``"anonymous"`` constant as a last resort for server-to-server callers with
294+
neither. The browser value is length-clamped and never trusted for anything
295+
but this person key.
296+
"""
297+
resolved = (user_id or "").strip()
298+
if resolved:
299+
return resolved
300+
browser = (browser_distinct_id or "").strip()
301+
if browser:
302+
return browser[:_MAX_BROWSER_DISTINCT_ID_LEN]
303+
return "anonymous"
304+
265305

266306
def capture_event(
267307
distinct_id: str,
@@ -359,6 +399,21 @@ def shutdown_observability() -> None:
359399
"recovery_threshold": 1,
360400
}
361401

402+
# The tier-aware retention sweeper (backend/maintenance.py) runs daily and
403+
# deletes Free workspaces > 7d / Pro > 30d. Unmonitored, a stopped cron would
404+
# silently leave Free-tier data past its stated retention / GDPR promise with
405+
# nothing to page the operator (review M22). Daily at 03:00 UTC; reconcile the
406+
# `value` here with the actual prod crontab if it differs.
407+
SAVED_WORKSPACES_RETENTION_MONITOR_SLUG = "saved-workspaces-retention"
408+
SAVED_WORKSPACES_RETENTION_MONITOR_CONFIG: dict[str, Any] = {
409+
"schedule": {"type": "crontab", "value": "0 3 * * *"},
410+
"timezone": "UTC",
411+
"checkin_margin": 60,
412+
"max_runtime": 10,
413+
"failure_issue_threshold": 1,
414+
"recovery_threshold": 1,
415+
}
416+
362417

363418
def _sentry_active() -> bool:
364419
"""True when cron check-ins should actually be sent.
@@ -441,3 +496,99 @@ def sentry_cron_monitor(
441496
duration=time.monotonic() - started,
442497
monitor_config=monitor_config,
443498
)
499+
500+
501+
# ---------------------------------------------------------------------------
502+
# Sentry scope-enrichment helpers (review M23)
503+
#
504+
# Analysis (POST /workspace/{id}/run) and artifact export raise into Sentry as
505+
# bare 5xx with no actor, no pipeline stage, and no export descriptor — every
506+
# issue reads identically and triage starts from zero. These thin wrappers add
507+
# the actor (set_user), the failing stage (set_tag + breadcrumb), and the
508+
# export descriptor (set_context) onto the active Sentry scope.
509+
#
510+
# Telemetry must never break the request it decorates: each helper is a no-op
511+
# when Sentry is inactive (or under pytest) and swallows every error. They are
512+
# intentionally tiny pass-throughs so callers don't import sentry_sdk directly
513+
# or repeat the hasattr/guard dance — and so the orchestrator's own control
514+
# flow (praised, left untouched) never has to learn about Sentry.
515+
# ---------------------------------------------------------------------------
516+
517+
518+
def _sentry_sdk_or_none():
519+
"""Return the ``sentry_sdk`` module when enrichment should apply, else None.
520+
521+
Mirrors ``_sentry_active``'s guard (pytest off, client active) and folds in
522+
the import so every helper below is a one-liner that can't raise on a
523+
missing/disabled SDK.
524+
"""
525+
if not _sentry_active():
526+
return None
527+
try:
528+
import sentry_sdk
529+
except Exception: # pragma: no cover — defensive
530+
return None
531+
return sentry_sdk
532+
533+
534+
def set_sentry_user(user_id: str | None) -> None:
535+
"""Attach the acting user's id to the Sentry scope (just the id, no PII).
536+
537+
Lets an analysis/export 5xx be filtered to a single account instead of
538+
reading as an anonymous platform-wide failure. No-op for anonymous calls
539+
(falsy ``user_id``) and when Sentry is inactive.
540+
"""
541+
if not user_id:
542+
return
543+
sentry_sdk = _sentry_sdk_or_none()
544+
if sentry_sdk is None:
545+
return
546+
with suppress(Exception):
547+
if hasattr(sentry_sdk, "set_user"):
548+
sentry_sdk.set_user({"id": str(user_id)})
549+
550+
551+
def set_sentry_tag(key: str, value: Any) -> None:
552+
"""Set an indexed Sentry tag (e.g. ``pipeline_stage``) on the active scope."""
553+
sentry_sdk = _sentry_sdk_or_none()
554+
if sentry_sdk is None:
555+
return
556+
with suppress(Exception):
557+
if hasattr(sentry_sdk, "set_tag"):
558+
sentry_sdk.set_tag(key, value)
559+
560+
561+
def set_sentry_context(key: str, data: dict[str, Any]) -> None:
562+
"""Attach a structured context block (e.g. the export descriptor)."""
563+
sentry_sdk = _sentry_sdk_or_none()
564+
if sentry_sdk is None:
565+
return
566+
with suppress(Exception):
567+
if hasattr(sentry_sdk, "set_context"):
568+
sentry_sdk.set_context(key, dict(data))
569+
570+
571+
def add_sentry_breadcrumb(
572+
*,
573+
category: str,
574+
message: str,
575+
data: dict[str, Any] | None = None,
576+
level: str = "info",
577+
) -> None:
578+
"""Drop a breadcrumb so a later error carries the trail that led to it.
579+
580+
Used at orchestrator stage boundaries (``category="agent"``) so an analysis
581+
failure shows which stage was entered last — without touching the
582+
orchestrator's own control flow.
583+
"""
584+
sentry_sdk = _sentry_sdk_or_none()
585+
if sentry_sdk is None:
586+
return
587+
with suppress(Exception):
588+
if hasattr(sentry_sdk, "add_breadcrumb"):
589+
sentry_sdk.add_breadcrumb(
590+
category=category,
591+
message=message,
592+
level=level,
593+
data=dict(data or {}),
594+
)

backend/routers/jobs.py

Lines changed: 10 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,9 @@
1515
CACHED_JOBS_HEALTHCHECK_MONITOR_SLUG,
1616
CACHED_JOBS_REFRESH_MONITOR_CONFIG,
1717
CACHED_JOBS_REFRESH_MONITOR_SLUG,
18+
POSTHOG_DISTINCT_ID_HEADER,
1819
capture_event,
20+
resolve_distinct_id,
1921
sentry_cron_monitor,
2022
)
2123
from backend.rate_limit import LIMIT_LLM, limiter
@@ -101,10 +103,15 @@ def search_jobs(
101103
result = service.search(domain_query) if live else service.search_cached(domain_query)
102104
# PostHog funnel event — the top of the job-application funnel.
103105
# Server-side capture, fire-and-forget; carries no PII (counts +
104-
# tier only). `quota_user_id` is "" for anonymous callers, which
105-
# capture_event maps to the "anonymous" distinct id.
106+
# tier only). For anonymous callers `quota_user_id` is "", so we fall
107+
# back to the browser's own PostHog distinct id (review M21) — passed in
108+
# the X-PostHog-Distinct-Id header — so this top-of-funnel event attaches
109+
# to the same person the client identifies on signup, instead of every
110+
# anonymous visitor collapsing onto one shared "anonymous" id.
106111
capture_event(
107-
distinct_id=quota_user_id,
112+
distinct_id=resolve_distinct_id(
113+
quota_user_id, request.headers.get(POSTHOG_DISTINCT_ID_HEADER)
114+
),
108115
event="job_searched",
109116
properties={
110117
"mode": "live" if live else "cached",

backend/routers/workspace.py

Lines changed: 33 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,11 @@
11
from fastapi import APIRouter, Depends, File, HTTPException, Request, UploadFile
22
from fastapi.responses import StreamingResponse
33

4-
from backend.observability import capture_event
4+
from backend.observability import (
5+
capture_event,
6+
set_sentry_context,
7+
set_sentry_user,
8+
)
59
from backend.quota import enforce_export_entitlement, enforce_llm_budget
610
from backend.rate_limit import LIMIT_HEAVY, LIMIT_LLM, LIMIT_PARSE, limiter
711
from backend.request_auth import get_optional_auth_tokens, get_required_auth_tokens
@@ -1106,6 +1110,17 @@ def answer_assistant_question(
11061110
payload: WorkspaceAssistantRequestModel,
11071111
auth_tokens=Depends(get_required_auth_tokens),
11081112
):
1113+
"""Non-streaming assistant answer — retained as a tested fallback (L1/L7).
1114+
1115+
The UI talks only to the SSE sibling below; the dead client wrapper
1116+
(``askWorkspaceAssistant``) was removed. This route is deliberately KEPT
1117+
rather than deleted (review L7): it shares the SAME accounted code path
1118+
(``answer_workspace_question`` -> assistant_service, one monthly
1119+
assistant-turn counter across both routes) and is pinned by the
1120+
quota / login-required / error-handling suites, so it cannot silently
1121+
drift from the stream. If a future change makes it a true parallel
1122+
contract instead of a thin sync mirror, delete it and its tests then.
1123+
"""
11091124
access_token, refresh_token = auth_tokens
11101125
try:
11111126
return answer_workspace_question(
@@ -1290,6 +1305,23 @@ def export_workspace_artifact_route(
12901305
export_format=payload.export_format,
12911306
themes=(export_theme,),
12921307
)
1308+
# Resolve the actor once and reuse it for both Sentry enrichment and the
1309+
# PostHog funnel event below. Enrich the Sentry scope BEFORE the render
1310+
# (review M23) so an export crash — captured by the request scope — carries
1311+
# who exported what (format / theme / artifact) instead of a bare 5xx.
1312+
# Both calls are no-ops when Sentry is inactive.
1313+
user_id, tier = _event_identity(access_token or "", refresh_token or "")
1314+
set_sentry_user(user_id)
1315+
set_sentry_context(
1316+
"export",
1317+
{
1318+
"artifact_kind": payload.artifact_kind,
1319+
"export_format": payload.export_format,
1320+
"resume_theme": payload.resume_theme,
1321+
"cover_letter_theme": payload.cover_letter_theme,
1322+
"tier": tier,
1323+
},
1324+
)
12931325
try:
12941326
result = export_workspace_artifact(
12951327
workspace_snapshot=payload.workspace_snapshot,
@@ -1301,7 +1333,6 @@ def export_workspace_artifact_route(
13011333
# PostHog funnel event — the conversion point: the user took an
13021334
# artifact away. Theme + format properties show which export
13031335
# options actually get used.
1304-
user_id, tier = _event_identity(access_token or "", refresh_token or "")
13051336
capture_event(
13061337
distinct_id=user_id,
13071338
event="artifact_exported",

backend/services/auth_session_service.py

Lines changed: 8 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -4,13 +4,14 @@
44
from typing import Any
55

66
from backend.config import get_backend_settings
7+
from backend.tiers import resolve_user_tier
78
from src.auth_service import AuthService, AuthSession
89
from src.config import (
910
AUTH_DEFAULT_ACCOUNT_STATUS,
1011
assisted_workflow_requires_login,
1112
get_default_plan_tier_for_email,
1213
)
13-
from src.errors import AgentExecutionError, AppError, InputValidationError
14+
from src.errors import AgentExecutionError, InputValidationError
1415
from src.openai_service import OpenAIService
1516
from src.quota_service import QuotaService
1617
from src.saved_jobs_store import SavedJobsStore
@@ -94,11 +95,16 @@ def _load_daily_quota(
9495
if not usage_store.is_configured():
9596
return None
9697
quota_service = QuotaService(auth_service, usage_store)
98+
# Source the tier from resolve_user_tier (subscriptions-backed), NOT
99+
# app_user.plan_tier (review M1). app_users.plan_tier is client-writable
100+
# via the RLS UPDATE policy, so trusting it here let a self-promoted
101+
# plan_tier raise the daily allowance; resolve_user_tier reads the
102+
# service-role subscriptions table the rest of the gates already use.
97103
return quota_service.get_daily_quota_status(
98104
access_token,
99105
refresh_token,
100106
app_user.id,
101-
app_user.plan_tier,
107+
resolve_user_tier(app_user),
102108
)
103109

104110

backend/services/resume_builder_tools.py

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -134,7 +134,13 @@ def _fetch_text(url: str, *, timeout: float, max_bytes: int) -> dict:
134134
response = requests.get(
135135
url,
136136
timeout=timeout,
137-
allow_redirects=True,
137+
# Do NOT follow redirects (review L2). The caller validates the
138+
# URL host up front, but a followed redirect chain isn't re-checked
139+
# per hop — a residual SSRF surface if raw.githubusercontent.com
140+
# ever 3xx'd off-host. The validated raw URL returns 200 with the
141+
# body directly; an unexpected redirect now lands as a non-200 and
142+
# is rejected as "http_status" below (fail closed).
143+
allow_redirects=False,
138144
# raw.githubusercontent.com returns text — we don't need
139145
# any of the github.com auth/session cookies, and we don't
140146
# send a custom UA so we look like a vanilla client.

0 commit comments

Comments
 (0)