This document describes the current runtime architecture of the AI Job Application Agent.
The app helps a candidate:
- sign in with Google
- upload and parse a resume
- search technical jobs or import a supported job link
- upload or paste a job description
- review a structured JD summary
- run a grounded agentic workflow
- review a tailored resume and cover letter
- ask grounded follow-up questions in the workspace assistant
- export DOCX or PDF versions of the generated documents (the earlier Markdown export path was removed in 2026-05; see ADR-015)
The product now runs as a split web application:
- `frontend/` is the Next.js workspace deployed on Vercel
- `backend/` is the FastAPI API deployed on the VPS
- `src/` contains the shared Python workflow, builders, orchestration, auth helpers, and persistence logic
- `backend/vps/` contains the Docker Compose + Caddy deployment bundle for the backend stack
This is no longer a Streamlit runtime. The old Streamlit shell and related deployment files were removed from the active codebase.
- The user opens the Next.js workspace.
- The user signs in with Google through Supabase-backed auth endpoints.
- The user uploads a resume.
- The backend parses the resume and builds a normalized candidate profile.
- The user can search configured Greenhouse, Lever, Ashby, and Workday sources via the Supabase-cached job index (or paste a supported job URL, or continue manually with JD text).
- The app builds a structured JD summary for review.
- The user explicitly triggers the agentic workflow.
- The orchestrator runs `tailoring`, `review`, `resume_generation`, and `cover_letter`. The earlier `fit` and `strategy` stages were removed from the live workflow; the deterministic fit-scoring service in `src/services/fit_service.py` is still available as a building block for `tailoring` but is no longer a visible workflow stage.
- Builders assemble the tailored resume and cover letter.
- The workspace assistant answers grounded questions from the current workspace state.
- Export helpers produce DOCX and PDF files for the current document; both formats share one typed `ThemeSpec` registry of 12 résumé themes (6 single-column ATS-safe + 6 two-column). `professional_neutral` is the product-wide default theme. Format + theme are an entitlement gate: Free is limited to PDF + `professional_neutral`, while Pro/Business unlock DOCX + every non-default theme. The gate is enforced server-side on both export routes via the shared 429 upgrade path (see ADR-027, ADR-029, ADR-032).
- For authenticated users, the latest workspace snapshot and saved jobs are persisted in Supabase.
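The format + theme entitlement gate described in the export step can be sketched as a small pure function. This is a minimal illustration, not the project's implementation: `ExportDecision`, `check_export_entitlement`, the lowercase tier strings, and the reason codes are hypothetical names. Only the Free / Pro / Business tiers, the PDF / DOCX formats, the `professional_neutral` default, and the 429 status come from this document.

```python
from dataclasses import dataclass

DEFAULT_THEME = "professional_neutral"  # product-wide default named in this document
PAID_TIERS = {"pro", "business"}        # assumption: lowercase tier identifiers


@dataclass(frozen=True)
class ExportDecision:
    allowed: bool
    status_code: int  # 200 when allowed, 429 for the shared upgrade path
    reason: str = ""  # hypothetical reason code for the upgrade CTA


def check_export_entitlement(tier: str, fmt: str, theme: str) -> ExportDecision:
    """Hypothetical gate: Free gets PDF + default theme, paid tiers get everything."""
    if tier.lower() in PAID_TIERS:
        return ExportDecision(True, 200)
    if fmt.lower() != "pdf":
        return ExportDecision(False, 429, "format_requires_upgrade")
    if theme != DEFAULT_THEME:
        return ExportDecision(False, 429, "theme_requires_upgrade")
    return ExportDecision(True, 200)
```

Keeping the check as one function that both export routes call is one way to make sure the DOCX and PDF paths cannot drift apart.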
Owns the user-facing workspace:
- account state and sign-in flow (signed-out users hitting `/workspace` get redirected to the landing page; cross-origin host strip mirrors the existing app-subdomain middleware)
- resume intake (Upload mode + Build with assistant conversational chat; see ADR-016)
- job search and saved jobs
- JD review
- workflow progress UI
- document preview and export actions
- assistant chat — not gated; answers product-help questions from the first visit and grounded package questions once an analysis has run; see ADR-017
Step rail navigation: Resume / Job Search / Job Detail are independently accessible — a user can paste a JD without a resume, or browse listings without uploading anything. Only Analysis is gated (it requires both a parsed resume and a parsed JD); the rail-level lock is a hint, and the AnalysisRunner page surfaces what's missing when the user lands there early. See ADR-019.
Owns the FastAPI API surface:
- `backend/app.py` bootstraps the API
- `backend/observability.py` is the single observability bootstrap, imported before `FastAPI()` is constructed so the Sentry ASGI middleware wraps the app at startup; no-op when the Sentry DSN / PostHog key are empty; see the Observability And Telemetry Layer section and ADR-024
- `backend/nightly_eval.py` is the manual-only LLM quality-regression CLI; it exists and is tested but is deliberately not on a production cron at pre-revenue stage; see ADR-026
- `backend/routers/health.py` exposes deployment smoke signals (also the Sentry Uptime monitor target)
- `backend/routers/jobs.py` exposes the cache-backed search, the `?live=true` escape-hatch fan-out, direct job-resolution endpoints, and the bearer-protected `POST /admin/refresh-cache` endpoint that drives the cached-jobs refresh worker
- `backend/routers/auth.py` owns auth/session endpoints
- `backend/routers/workspace.py` owns resume, JD, workflow, assistant (both non-streaming and SSE), persistence, preview, export, resume-builder chat, resume-builder export, voice transcription (`/workspace/transcribe`), and artifact feedback (`/workspace/feedback`) endpoints
- `backend/routers/billing.py` owns the HMAC-verified `POST /webhooks/lemonsqueezy` subscription-event endpoint + the customer-portal redirect; the signature-verification + event-routing logic lives in `backend/webhooks/lemonsqueezy.py`
- `backend/prompt_registry.py` loads every LLM prompt from `prompts/<name>/<version>.json`; all 11 builders migrated off Python f-string concats; see ADR-018 family + the prompt-registry DEVLOG entries
- `backend/services/job_cache_service.py` runs the per-source refresh + smart-cleanup worker invoked by the admin endpoint
- `backend/services/workspace_run_jobs.py` owns the async `/analyze-jobs` job system. Each `WorkspaceRunJob` is bound to its `owner_user_id` at start time and the status/cancel routes check it (returning 404, the same code for "unknown" and "not yours", so existence isn't confirmed). The quota gate runs synchronously before the worker is spawned, with the structured `{code, counter, cap, tier, reset_period}` envelope round-tripped through `_serialize_job` so the polling hook renders the same 429 upgrade CTA the sync path does. A per-user in-flight cap (1 run/user) sits in front of the process-global `BoundedSemaphore(5)` so one user's burst can't 503 every other account. The launch-readiness pass that introduced these guarantees is DEVLOG Day 79.
- `backend/routers/health.py` also hosts `/health/sentry-debug`, now gated behind the admin bearer secret so an unauthenticated curl gets a 401 instead of a `ZeroDivisionError` that would burn Sentry quota (DEVLOG Day 80)
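The ownership check and the two-level concurrency cap on the async job system can be sketched as follows. This is an illustrative model under stated assumptions, not the code in `backend/services/workspace_run_jobs.py`: `RunJobRegistry`, `RunJob`, `JobNotFound`, and `TooManyRuns` are hypothetical names, and the mapping of these exceptions to HTTP 404 / 503 responses is left to the (unshown) route layer. The limits of 1 run per user and 5 global slots are the ones stated above.

```python
import threading
from dataclasses import dataclass
from typing import Dict


class JobNotFound(Exception):
    """Would map to HTTP 404 for unknown ids and for jobs owned by someone else."""


class TooManyRuns(Exception):
    """Raised when the per-user cap or the global cap is reached."""


@dataclass
class RunJob:
    job_id: str
    owner_user_id: str
    status: str = "running"


class RunJobRegistry:
    def __init__(self, global_limit: int = 5, per_user_limit: int = 1) -> None:
        self._jobs: Dict[str, RunJob] = {}
        self._in_flight: Dict[str, int] = {}
        self._per_user_limit = per_user_limit
        self._global = threading.BoundedSemaphore(global_limit)
        self._lock = threading.Lock()

    def start(self, job_id: str, user_id: str) -> RunJob:
        with self._lock:
            # The per-user cap is checked first so one user's burst never
            # consumes the global slots that other accounts need.
            if self._in_flight.get(user_id, 0) >= self._per_user_limit:
                raise TooManyRuns("per-user cap reached")
            if not self._global.acquire(blocking=False):
                raise TooManyRuns("global cap reached")
            self._in_flight[user_id] = self._in_flight.get(user_id, 0) + 1
            job = RunJob(job_id, owner_user_id=user_id)
            self._jobs[job_id] = job
            return job

    def get(self, job_id: str, user_id: str) -> RunJob:
        job = self._jobs.get(job_id)
        # Same error for "unknown" and "not yours": existence is never confirmed.
        if job is None or job.owner_user_id != user_id:
            raise JobNotFound(job_id)
        return job

    def finish(self, job_id: str, user_id: str) -> None:
        job = self.get(job_id, user_id)
        with self._lock:
            if job.status != "running":
                return
            job.status = "done"
            self._in_flight[user_id] -= 1
            self._global.release()
```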
Owns deterministic business logic:
- candidate-profile construction from resume input (`profile_service.py`)
- JD normalization (`job_service.py`) plus a `jd_summary_service.py` view layer
- LLM-hybrid resume + JD parsers (`resume_llm_parser_service.py`, `jd_llm_parser_service.py`): pure-LLM source of truth with a deterministic fallback
- fit scoring (`fit_service.py`): still used by tailoring, no longer a visible workflow stage
- first-pass tailoring guidance (`tailoring_service.py`)
These services are transport-agnostic and do not depend on Next.js or FastAPI.
Owns the supervised orchestration layer.
The active orchestrator path runs:
- tailoring
- review
- resume generation
- cover letter
The earlier fit and strategy stages are no longer part of the live workflow. The TailoringAgent consumes the structured FitAnalysis produced by src/services/fit_service.py directly — no FitAgent narration step. Each agent has a Tier-2/Tier-3 quality runner under tests/quality/ that scores it on fixture (resume, JD) pairs.
Per-agent retry + fallback isolation. Each agent step inside the orchestrator gets its own retry budget and its own fallback path. If an agent's LLM call raises AgentExecutionError (after the OpenAI service's own SDK + app-level retries exhaust), the orchestrator retries the agent's full .run(...) once with a 400 ms delay. If the retry also fails, only THAT agent's deterministic fallback runs — downstream agents continue trying the LLM path. A single bad packet during the Forge agent no longer cascades to "downgrade the whole pipeline to deterministic." The whole-pipeline deterministic fallback remains as a safety net for the unusual case where a per-agent deterministic path itself errors out. If every agent ended up falling back per-agent (zero LLM successes), result.mode is honestly downgraded to deterministic_fallback. See ADR-018.
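The per-agent retry and fallback isolation described above can be sketched like this. It is a simplified model, not the orchestrator's actual code: `run_agent_step`, `run_pipeline`, and the `"llm"` mode string are hypothetical, and `AgentExecutionError` here is a local stand-in for the error named in the text. The single retry, the 400 ms delay, and the `deterministic_fallback` mode come from this document.

```python
import time
from typing import Callable, List, Sequence, Tuple


class AgentExecutionError(Exception):
    """Local stand-in for the orchestrator's agent failure error."""


Step = Tuple[Callable[[], str], Callable[[], str]]  # (llm_run, deterministic_run)


def run_agent_step(
    llm_run: Callable[[], str],
    deterministic_run: Callable[[], str],
    retry_delay_s: float = 0.4,
) -> Tuple[str, bool]:
    """Return (output, used_llm): one retry, then this agent's own fallback."""
    for attempt in range(2):
        try:
            return llm_run(), True
        except AgentExecutionError:
            if attempt == 0:
                time.sleep(retry_delay_s)
    return deterministic_run(), False


def run_pipeline(steps: Sequence[Step], retry_delay_s: float = 0.4) -> Tuple[List[str], str]:
    outputs: List[str] = []
    llm_successes = 0
    for llm_run, deterministic_run in steps:
        # A failure here only affects this step; later steps still try the LLM path.
        output, used_llm = run_agent_step(llm_run, deterministic_run, retry_delay_s)
        outputs.append(output)
        llm_successes += int(used_llm)
    # Downgrade the reported mode only when no agent succeeded with the LLM.
    mode = "deterministic_fallback" if llm_successes == 0 else "llm"
    return outputs, mode
```

The sketch omits the whole-pipeline safety net for the case where a per-agent deterministic path itself raises.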
Owns grounded prompt builders for the specialist agents and assistant.
Owns the thin OpenAI wrapper used by the workflow and assistant layers.
Responsibilities include:
- task-aware model routing
- Responses API calls (JSON-contract path via `run_json_prompt`, streaming prose path via `run_text_stream`)
- GPT-5 reasoning-effort routing
- usage accounting metadata
- optional persisted usage-event callbacks
- daily-quota preflight checks
- output-budget retry handling (when responses are truncated due to insufficient `max_output_tokens`)
- application-level retry on top of the OpenAI Python SDK's own retries (`max_retries=2`): adds one extra attempt on the narrow allow-list `APIConnectionError` / `APITimeoutError` / `InternalServerError`. Every `responses.create` in the codebase routes through `_create_response_with_app_retry`, so the resume parser, JD parser, JD summary, all four supervised-workflow agents, and the assistant chat all inherit the retry layer for free. See ADR-018.
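The application-level retry contract in the last bullet can be illustrated with a small wrapper. This is a sketch, not the body of `_create_response_with_app_retry`: `create_with_app_retry` is a hypothetical name, and the exception classes below are local stand-ins so the example runs without the `openai` package installed (in the real SDK the allow-listed classes are importable from `openai`).

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


# Local stand-ins for the SDK exception classes named in the text.
class APIConnectionError(Exception): ...
class APITimeoutError(APIConnectionError): ...
class InternalServerError(Exception): ...
class AuthenticationError(Exception): ...  # example of a non-retryable error


RETRYABLE: Tuple[Type[Exception], ...] = (
    APIConnectionError,
    APITimeoutError,
    InternalServerError,
)


def create_with_app_retry(
    call: Callable[[], T],
    extra_attempts: int = 1,
    retryable: Tuple[Type[Exception], ...] = RETRYABLE,
) -> T:
    """Run `call`, retrying only allow-listed errors, `extra_attempts` more times."""
    attempts = 1 + extra_attempts
    for attempt in range(attempts):
        try:
            return call()
        except retryable:
            if attempt == attempts - 1:
                raise  # allow-listed error on the final attempt: surface it
    raise RuntimeError("unreachable: loop always returns or raises")
```

Anything outside the allow-list (for example an auth error) propagates on the first failure, which matches the "does NOT retry on 4xx / auth" contract the tests assert.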
Owns the single in-app assistant behavior. The chat is not gated on having run an analysis — it answers product-help questions ("how do I use this?", "what's step 03 for?") from the very first visit and grounded package questions ("summarize my fit") once an analysis has run. See ADR-017.
Responsibilities include:
- routing between product-help questions and grounded package questions
- compact workspace-context assembly, including a `workspace_state` projection (`current_step`, `has_resume`, `resume_summary`, `has_jd`, `jd_summary`, `has_analysis`, `saved_jobs_count`, `last_search_query`) sent on every query so the LLM can answer state-aware questions before any analysis exists
- deterministic fallback behavior when assisted execution is unavailable
- `src/resume_builder.py`: deterministic tailored-resume assembly
- `src/cover_letter_builder.py`: deterministic grounded cover-letter assembly
- `src/exporters.py`: DOCX/PDF export helpers (`export_docx_bytes`, `export_pdf_bytes`) plus HTML preview generation, sharing a theme palette across formats; see ADR-015
- `src/job_sources/`: per-provider adapter implementations (Greenhouse, Lever, Ashby, Workday) feeding the cached-jobs refresh worker
The user-facing workspace is now centered on two visible outputs:
- tailored resume
- cover letter
Both ship in 12 themes (6 single-column ATS-safe + 6 two-column, all from one typed ThemeSpec registry — see ADR-029 and ADR-032) and both formats (DOCX, PDF). The earlier Markdown export path was removed in 2026-05 alongside the DOCX rollout. The earlier internal report builder was removed when the FitAgent + bundle endpoint were retired.
- `src/auth_service.py`: Supabase Auth wrapper for Google OAuth
- `src/user_store.py`: syncs lightweight `app_users` records
- `src/usage_store.py`: persists authenticated assisted usage events
- `src/quota_service.py`: computes daily quota state from persisted usage
- `src/saved_workspace_store.py`: persists and loads the latest reloadable workspace snapshot
- `src/saved_jobs_store.py`: persists and loads shortlisted jobs
- `src/cached_jobs_store.py`: service-role-backed access layer for the global `cached_jobs` index, covering bulk upsert (with embed-on-write for newly-cached jobs when hybrid search is enabled), smart cleanup, and lexical + hybrid search via Postgres RPCs; see ADR-013 and ADR-014
- `src/resume_builder_store.py`: persists and loads conversational resume-builder draft sessions (`resume_builder_sessions` table) with the 7-day TTL + active-user refresh policy; see ADR-016
Owns environment-backed configuration for:
- model routing
- reasoning routing
- quota defaults
- auth and Supabase settings
- saved-workspace retention settings
- frontend/backend integration settings
Owns shared typed models for:
- resumes
- candidate profiles
- work experience
- education
- job descriptions
- fit analyses
- tailoring drafts
- tailored resume artifacts
- cover letter artifacts
- internal reports
- agent outputs
- orchestrated workflow results
- auth and persistence records
The runtime uses a split state model:
- browser state for the current workspace session
- Supabase Postgres for authenticated persistence and the global cached-jobs index
Per-user persistent state:
- `app_users`
- `usage_events`
- `saved_workspaces`
- `saved_jobs`
- `resume_builder_sessions`
Global (non-user-scoped) state:
- `cached_jobs`: the indexed set of upstream postings refreshed every 4 hours (six times a day) by the backend's `refresh_cached_jobs` worker; see ADR-013
Each saved_workspaces row stores one latest snapshot per user, including enough data to restore the current resume/JD/workflow state.
Each saved_jobs row stores one shortlisted posting per user and normalized job id, including:
- source/provider identity
- title, company, location, and employment type
- source URL
- normalized summary and description text
- provider metadata
- saved and updated timestamps
Each resume_builder_sessions row stores one in-progress conversational resume-builder draft per user with a 7-day TTL refreshed on every save. A pg_cron job (cleanup-expired-resume-builder-sessions) hard-deletes expired rows every 5 min and RLS hides expired rows from per-user queries; see ADR-016.
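The TTL policy above (7 days, refreshed on every save, expired rows hidden from reads) can be modeled in a few lines. This is an in-memory sketch of the policy only: `DraftSession`, `save_draft`, and `is_visible` are hypothetical names, and in the real system the hiding is done by RLS and the deletion by the `pg_cron` job rather than by application code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

SESSION_TTL = timedelta(days=7)  # TTL stated in this document


@dataclass
class DraftSession:
    user_id: str
    draft: dict
    expires_at: datetime


def save_draft(session: DraftSession, draft: dict, now: datetime) -> DraftSession:
    """Every save pushes the expiry out to now + 7 days (active-user refresh)."""
    session.draft = draft
    session.expires_at = now + SESSION_TTL
    return session


def is_visible(session: DraftSession, now: datetime) -> bool:
    """Expired rows are treated as absent even before cleanup deletes them."""
    return session.expires_at > now
```

A usage example: a draft saved on day 0 and again on day 5 stays visible until day 12, while a draft saved once on day 0 disappears from reads on day 7.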
Each cached_jobs row holds one upstream posting keyed on (source, job_id). The table has GENERATED STORED columns (work_mode, employment_type_norm) backing the dropdown filters, removed_at tombstones for upstream-closed jobs the user has bookmarked, and an embedding vector(1536) column (pgvector, HNSW cosine index) for semantic search. A pg_cron + pg_net schedule (cached_jobs_refresh_4h) POSTs to /admin/refresh-cache every 4 hours, six times a day (see docs/sql/job_cache_cron_setup.sql for the template — production runs 0 */4 * * *). Search is two-tier: the lexical search_cached_jobs_ranked RPC (ADR-014) and the hybrid search_cached_jobs_hybrid RPC, which fuses that lexical ranking with a pgvector semantic ranking via Reciprocal Rank Fusion. The hybrid path is gated behind the JOB_SEARCH_HYBRID_ENABLED flag and degrades to lexical on any failure; see ADR-033. As of 2026-06-23, production runs in LEAN MODE (Supabase free-tier downgrade): the embedding column + HNSW index + hybrid RPC are DROPPED in prod and the flag is false, so live search is lexical-only. The hybrid capability remains in the codebase + the SQL files; it's a restore-on-Pro operation, not a deleted feature. See the "Hybrid-search lean/full switch" runbook in deployment.md and DEVLOG Day 82.
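Reciprocal Rank Fusion, and the "degrade to lexical on any failure" behavior, can be sketched in plain Python. The production fusion runs inside the `search_cached_jobs_hybrid` Postgres RPC; the version below is only an illustration of the technique. The function names and the smoothing constant `k=60` (a commonly used default for RRF) are assumptions, since the RPC's actual constant is not stated here.

```python
from typing import Callable, Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked id lists: each list contributes 1 / (k + rank) per item."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def search(
    query: str,
    lexical: Callable[[str], List[str]],
    semantic: Callable[[str], List[str]],
    hybrid_enabled: bool,
) -> List[str]:
    """Lexical-only when the flag is off; hybrid degrades to lexical on failure."""
    lexical_hits = lexical(query)
    if not hybrid_enabled:
        return lexical_hits
    try:
        return reciprocal_rank_fusion([lexical_hits, semantic(query)])
    except Exception:
        return lexical_hits
```

For the rankings `["a", "b", "c"]` and `["b", "d", "a"]`, item `b` (ranks 2 and 1) edges out `a` (ranks 1 and 3), and `d` (rank 2 in one list) beats `c` (rank 3 in one list).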
aijobagent_run_traces is an append-only cost-attribution table — one row per successful LLM call (user_id, model, task, prompt_tokens, completion_tokens, cost_usd, created_at). Writes are best-effort: a missing table or a write error never propagates to the user-facing path. It is the canonical answer to "what is OpenAI spend doing", separate from the Sentry/PostHog telemetry surface.
aijobagent_feedback holds one row per artifact thumbs-up/down (user_id, workspace_id, artifact_kind, rating, comment, created_at), RLS-scoped to the owning user; admin reads go through the service role.
A small set of structural reinforcements landed during the launch-readiness cleanup (DEVLOG Day 80) that are worth flagging here because they're load-bearing on the entitlement and read-fast paths: (1) a BEFORE-UPDATE trigger on app_users rejects non-service_role writes to plan_tier / account_status, so the unrestricted RLS UPDATE policy can no longer be abused to PATCH one's own tier; the legacy daily-quota path now sources tier from resolve_user_tier (which reads aijobagent_subscriptions) instead of app_users.plan_tier; (2) save_saved_job is now an atomic SECURITY DEFINER RPC that count-and-inserts in one transaction (advisory lock), closing the TOCTOU window where two concurrent saves at count=cap−1 could both pass and exceed the persistent cap; (3) /workspace/quota's _persistent_count() no longer reads the fat saved_workspaces blob — a count_active(user_id) head-read returns 0/1 without deserializing workflow_snapshot_json / cover_letter_payload_json / tailored_resume_payload_json. saved_workspaces per-tier caps are pinned to 1/1/1 because the schema is one-row-per-user (multi-row history is flagged as a future enhancement requiring a schema migration).
Wired Day 46. The compliance posture is enforced at the SDK-init level, not as legalese on a privacy page — see ADR-024 and ADR-025.
Two vendors, one bootstrap path:
- Sentry: error tracking, performance traces, AI Agents Monitoring (`OpenAIIntegration(include_prompts=False)`, which gives token/model/latency spans without prompt-body PII), Logs, and session replay (errors-only). `backend/observability.py` is the only place the SDK is touched on the backend; it's imported before `FastAPI()` is constructed so the ASGI middleware wraps the app at startup. The `before_send` hook drops intentional `HTTPException` 4xx flow-control + the "not configured / temporarily unavailable" 5xx guards so the issue feed stays focused on genuine bugs. A `_running_under_pytest()` check skips Sentry entirely during the test suite. Frontend Sentry is wired via `instrumentation-client.ts` / `instrumentation.ts` / `sentry.server.config.ts` / `sentry.edge.config.ts`; `next.config.ts` uploads source maps through `withSentryConfig`.
- PostHog: product analytics, session replay, identify/group cohorts. The free Developer plan caps at one project per org, so the project is shared with the developer's other product; every event carries a `product: "jobagent"` super-property (frontend `posthog.register`, backend `capture_event` merge) so dashboards slice cleanly with `where properties.product = 'jobagent'`. Exception capture is off; Sentry is the source of truth for errors.
Both clients are no-ops when their DSN / key is empty, so dev, CI, and the test suite run without observability wiring or network calls.
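The `before_send` filtering described in the Sentry bullet can be sketched as below. Sentry's Python SDK calls `before_send(event, hint)` and drops the event when the hook returns `None`; the rest is an assumption for illustration: the local `HTTPException` stand-in (so the example runs without FastAPI installed) and the idea of matching the 5xx guards by phrases in `detail` are not confirmed by this document.

```python
from typing import Any, Dict, Optional


class HTTPException(Exception):
    """Local stand-in with the same status_code/detail shape as FastAPI's."""

    def __init__(self, status_code: int, detail: str = "") -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


# Assumption: the 5xx guards are recognizable by these phrases in `detail`.
GUARD_PHRASES = ("not configured", "temporarily unavailable")


def before_send(event: Dict[str, Any], hint: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return None to drop flow-control errors; return the event to keep it."""
    exc_info = hint.get("exc_info")
    if not exc_info:
        return event
    exc = exc_info[1]  # exc_info is (type, value, traceback)
    if isinstance(exc, HTTPException):
        if 400 <= exc.status_code < 500:
            return None  # intentional 4xx flow control
        detail = str(exc.detail).lower()
        if exc.status_code >= 500 and any(p in detail for p in GUARD_PHRASES):
            return None  # expected "feature off / dependency down" guard
    return event
```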
The launch-readiness cleanup (DEVLOG Day 80) added three reinforcements to this surface: (1) Sentry breadcrumbs / tags / context / user are now set on each pipeline stage in src/agents/orchestrator.py (via the stage-boundary callback, not the orchestrator internals) and on the export route, so a mid-pipeline 5xx is localizable to the failing agent — defeating the AI-Agents-Monitoring blind spot ADR-024 was adopted for; (2) the saved-workspaces-retention sweeper got its sentry_cron_monitor wrapper so a stuck retention cron now pages instead of silently leaving Free data past its 7-day retention promise; (3) backend events emitted by unauthenticated callers now carry the browser's PostHog distinct id via a new X-PostHog-Distinct-Id request header — the previous "anonymous" constant collapsed every anon visitor onto one PostHog person and made anonymous→signup conversion uncomputable.
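Reinforcement (3) amounts to choosing a per-visitor identifier instead of a shared constant. A minimal sketch of that selection follows; the `X-PostHog-Distinct-Id` header name comes from this document, while `resolve_distinct_id`, the case-insensitive header lookup, and returning `None` when no id is available are assumptions for illustration.

```python
from typing import Mapping, Optional

DISTINCT_ID_HEADER = "x-posthog-distinct-id"  # compared case-insensitively


def resolve_distinct_id(headers: Mapping[str, str], user_id: Optional[str]) -> Optional[str]:
    """Prefer the signed-in user id, else the browser's PostHog distinct id."""
    if user_id:
        return user_id
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get(DISTINCT_ID_HEADER, "").strip()
    # None (rather than a shared "anonymous" constant) avoids collapsing
    # every anonymous visitor onto a single analytics person.
    return value or None
```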
The single source of truth is localStorage["jobagent-cookie-consent"], set by the custom in-house cookie banner (frontend/src/components/cookie-consent.tsx), three states: pending / accepted / declined. The split:
- Always-on (legitimate interest, GDPR Art. 6(1)(f) — crash reporting is operationally necessary): Sentry error tracking + traces + Feedback widget. Load regardless of banner state.
- Consent-gated (explicit opt-in required, ePrivacy Art. 5(3)): PostHog product analytics + PostHog session replay + Sentry Session Replay. Load only when consent `=== "accepted"`.
A jobagent-cookie-consent-change custom event re-evaluates the gated integrations on flip without a page reload (Sentry.addIntegration(...) hot-adds Replay; PostHog opt_in_capturing() / opt_out_capturing()). The banner is in-bundle (no third-party JS loads before consent) and scoped under the .ja-cookie-banner CSS class.
A Sentry Uptime monitor pings https://api.job-application-copilot.xyz/health every 5 minutes from the EU region. Configured in the Sentry dashboard rather than in code — a fresh-project rebuild must recreate it manually.
The Next.js app sends a fixed set of response headers on every route, configured via headers() in frontend/next.config.ts. The defense-in-depth posture is the same on the marketing site and the workspace subdomain:
- `X-Frame-Options: DENY` + `Content-Security-Policy: frame-ancestors 'none'`: clickjacking defense. The workspace can't be framed and overlaid to trick a signed-in user into destructive actions; SameSite=Lax cookies would otherwise ride along on top-level navigation.
- `Strict-Transport-Security: max-age=63072000; includeSubDomains; preload`: HTTPS for two years across all subdomains, preload-eligible.
- `X-Content-Type-Options: nosniff`: disables MIME-type sniffing on responses (resource loaders honor the declared `Content-Type`).
- `Referrer-Policy: strict-origin-when-cross-origin`: strips path + query from the Referer on cross-origin navigation while keeping it intact within the site.
- `Content-Security-Policy` as Report-Only for the first weeks of public traffic: same-origin defaults plus the actual allowlist (PostHog `eu.i.posthog.com`, Sentry `*.sentry.io`, Lemon Squeezy, Supabase `*.supabase.co`). Tuning to enforce-mode tracks violation reports in Sentry.
The launch-readiness pass that introduced this baseline is DEVLOG Day 79 (FE-SEC-1). On the client side, every backend-supplied redirect URL the browser navigates to passes through an explicit allowlist (`frontend/src/lib/redirectAllowlist.ts`, via `safeRedirect` / `isAllowedRedirect`) so the OAuth handoff + workspace-shell redirects can't be steered to an attacker-controlled origin (DEVLOG Day 80, M7).
The accessible-overlay primitive (frontend/src/lib/useAccessibleDialog.ts) is the shared focus-trap + initial-focus + Escape + focus-restore contract behind every modal surface in the workspace shell — the ⌘K command palette and the assistant FAB use it directly; the palette also gets combobox/listbox semantics (role="combobox" + aria-expanded + aria-controls + aria-activedescendant, list role="listbox", items role="option" with aria-selected). DEVLOG Day 79 (A11Y-1/A11Y-2).
The repo includes focused tests for:
- resume parsing
- JD parsing (deterministic + LLM-hybrid)
- profile normalization
- job normalization
- tailoring guidance
- orchestrator behavior
- resume and cover-letter building
- DOCX + PDF export formatting
- auth and quota behavior
- saved-workspace persistence
- saved-job persistence
- cached-jobs store + RPC arg shape
- cached-jobs refresh worker (per-source isolation, cleanup gating, status reporting)
- per-provider job source adapters (Greenhouse, Lever, Ashby, Workday)
- conversational resume-builder turn handling + structuring pass
- backend workspace routes
- assistant SSE streaming endpoint
- OpenAI application-level retry contract (`tests/test_openai_app_retry.py`): retries on the narrow allow-list `APIConnectionError` / `APITimeoutError` / `InternalServerError`, does NOT retry on 4xx / auth / persistent rate-limit, returns success after retry, raises on double-failure
- per-agent orchestrator behavior (`tests/test_orchestrator.py`): per-agent retry recovers a flaky agent, per-agent fallback isolates a single failing agent (downstream agents still use LLM), `result.mode` reconciles to `deterministic_fallback` when no agent succeeded with LLM
- tier enforcement (`tests/backend/test_tiers.py`, `test_quota.py`, `test_workspace_quota_enforcement.py`, and siblings): atomic check-and-increment under thread races, refund-on-failure, lifetime-vs-monthly period switching, P0001 → 429 translation, Business unbounded-retention skip
- Lemon Squeezy webhook (`tests/backend/test_lemonsqueezy_webhook.py`): HMAC signature verification, event routing, unknown-variant silent-ack
- prompt registry byte-identity (`tests/test_prompts.py`): every one of the 11 migrated JSON templates is asserted bit-exact against the original Python concat
- voice transcription + artifact feedback backend routes (`tests/backend/test_transcribe.py`, `test_feedback.py`): multipart handling, 60s overrun rejection, RLS-scoped feedback writes
The _running_under_pytest() guard means Sentry never initializes during the test run, so the observability wiring adds zero test-suite coupling beyond a small leaky-detail-allowlist line-offset in tests/test_error_messages.py.
Tier-2 / Tier-3 quality runners under tests/quality/ evaluate LLM-driven components (resume parser, JD parser, renderer fidelity, skill canonicalization, tailoring, review, resume generation, cover letter, resume builder, assistant, end-to-end orchestrator) on fixture sets with weighted scorecards and a --include-llm cost gate. backend/nightly_eval.py wraps these into a single unattended batch with regression-threshold checking — manual-only at pre-revenue stage, see ADR-026.
- Long AI-assisted runs on the synchronous path still execute as one request/response cycle; only runs started through the async `/analyze-jobs` job system are polled as jobs.
- The product stores one latest saved workspace snapshot per user; it does not expose a multi-entry history browser.
- Large binary artifacts are regenerated on demand instead of being stored in object storage.
- The visible workspace centers on resume and cover letter only; the earlier internal report builder was removed when the FitAgent + bundle endpoint were retired.
The next meaningful expansion is product hardening on the current stack:
- background execution for long-running workflow jobs
- tighter hosted reliability around retries and timeouts
- continued UI simplification around review and export
- broader hosted QA across Vercel, VPS, Supabase, and Cloudflare