Skip to content

Latest commit

 

History

History
314 lines (221 loc) · 27.4 KB

File metadata and controls

314 lines (221 loc) · 27.4 KB

Architecture Overview

This document describes the current runtime architecture of the AI Job Application Agent.

System Goal

The app helps a candidate:

  • sign in with Google
  • upload and parse a resume
  • search technical jobs or import a supported job link
  • upload or paste a job description
  • review a structured JD summary
  • run a grounded agentic workflow
  • review a tailored resume and cover letter
  • ask grounded follow-up questions in the workspace assistant
  • export DOCX or PDF versions of the generated documents (the earlier Markdown export path was removed in 2026-05; see ADR-015)

Runtime Shape

The product now runs as a split web application:

  • frontend/ is the Next.js workspace deployed on Vercel
  • backend/ is the FastAPI API deployed on the VPS
  • src/ contains the shared Python workflow, builders, orchestration, auth helpers, and persistence logic
  • backend/vps/ contains the Docker Compose + Caddy deployment bundle for the backend stack

This is no longer a Streamlit runtime. The old Streamlit shell and related deployment files were removed from the active codebase.

High-Level Flow

  1. The user opens the Next.js workspace.
  2. The user signs in with Google through Supabase-backed auth endpoints.
  3. The user uploads a resume.
  4. The backend parses the resume and builds a normalized candidate profile.
  5. The user can search configured Greenhouse, Lever, Ashby, and Workday sources via the Supabase-cached job index (or paste a supported job URL, or continue manually with JD text).
  6. The app builds a structured JD summary for review.
  7. The user explicitly triggers the agentic workflow.
  8. The orchestrator runs tailoring, review, resume_generation, and cover_letter. The earlier fit and strategy stages were removed from the live workflow; the deterministic fit-scoring service in src/services/fit_service.py is still available as a building block for tailoring but is no longer a visible workflow stage.
  9. Builders assemble the tailored resume and cover letter.
  10. The workspace assistant answers grounded questions from the current workspace state.
  11. Export helpers produce DOCX and PDF files for the current document; both formats share one typed ThemeSpec registry — 12 résumé themes (6 single-column ATS-safe + 6 two-column). professional_neutral is the product-wide default theme. Format + theme are an entitlement gate: Free is limited to PDF + professional_neutral, Pro/Business unlock DOCX + every non-default theme, enforced server-side on both export routes via the shared 429 upgrade path (see ADR-027, ADR-029, ADR-032).
  12. For authenticated users, the latest workspace snapshot and saved jobs are persisted in Supabase.

Main Modules

frontend/

Owns the user-facing workspace:

  • account state and sign-in flow (signed-out users hitting /workspace get redirected to the landing page; cross-origin host strip mirrors the existing app-subdomain middleware)
  • resume intake (Upload mode + Build with assistant conversational chat; see ADR-016)
  • job search and saved jobs
  • JD review
  • workflow progress UI
  • document preview and export actions
  • assistant chat — not gated; answers product-help questions from the first visit and grounded package questions once an analysis has run; see ADR-017

Step rail navigation: Resume / Job Search / Job Detail are independently accessible — a user can paste a JD without a resume, or browse listings without uploading anything. Only Analysis is gated (it requires both a parsed resume and a parsed JD); the rail-level lock is a hint, and the AnalysisRunner page surfaces what's missing when the user lands there early. See ADR-019.

backend/

Owns the FastAPI API surface:

  • backend/app.py bootstraps the API
  • backend/observability.py is the single observability bootstrap, imported before FastAPI() is constructed so the Sentry ASGI middleware wraps the app at startup; no-op when the Sentry DSN / PostHog key are empty; see the Observability And Telemetry Layer section and ADR-024
  • backend/nightly_eval.py is the manual-only LLM quality-regression CLI; exists + tested but deliberately not on a production cron at pre-revenue stage; see ADR-026
  • backend/routers/health.py exposes deployment smoke signals (also the Sentry Uptime monitor target)
  • backend/routers/jobs.py exposes the cache-backed search, the ?live=true escape-hatch fan-out, direct job-resolution endpoints, and the bearer-protected POST /admin/refresh-cache endpoint that drives the cached-jobs refresh worker
  • backend/routers/auth.py owns auth/session endpoints
  • backend/routers/workspace.py owns resume, JD, workflow, assistant (both non-streaming and SSE), persistence, preview, export, resume-builder chat, resume-builder export, voice transcription (/workspace/transcribe), and artifact feedback (/workspace/feedback) endpoints
  • backend/routers/billing.py owns the HMAC-verified POST /webhooks/lemonsqueezy subscription-event endpoint + the customer-portal redirect; the signature-verification + event-routing logic lives in backend/webhooks/lemonsqueezy.py
  • backend/prompt_registry.py loads every LLM prompt from prompts/<name>/<version>.json — all 11 builders migrated off Python f-string concats; see ADR-018 family + the prompt-registry DEVLOG entries
  • backend/services/job_cache_service.py runs the per-source refresh + smart-cleanup worker invoked by the admin endpoint
  • backend/services/workspace_run_jobs.py owns the async /analyze-jobs job system. Each WorkspaceRunJob is bound to its owner_user_id at start time and the status/cancel routes check it (returning 404 — same code for "unknown" and "not yours" — so existence isn't confirmed); the quota gate runs synchronously before the worker is spawned, with the structured {code, counter, cap, tier, reset_period} envelope round-tripped through _serialize_job so the polling hook renders the same 429 upgrade CTA the sync path does; a per-user in-flight cap (1 run/user) sits in front of the process-global BoundedSemaphore(5) so one user's burst can't 503 every other account. The launch-readiness pass that introduced these guarantees is DEVLOG Day 79
  • backend/routers/health.py also hosts /health/sentry-debug — now gated behind the admin bearer secret so an unauthenticated curl gets a 401 instead of a ZeroDivisionError that would burn Sentry quota (DEVLOG Day 80)

src/services/

Owns deterministic business logic:

  • candidate-profile construction from resume input (profile_service.py)
  • JD normalization (job_service.py) plus a jd_summary_service.py view layer
  • LLM-hybrid resume + JD parsers (resume_llm_parser_service.py, jd_llm_parser_service.py) — pure-LLM source of truth with a deterministic fallback
  • fit scoring (fit_service.py) — still used by tailoring, no longer a visible workflow stage
  • first-pass tailoring guidance (tailoring_service.py)

These services are transport-agnostic and do not depend on Next.js or FastAPI.

src/agents/

Owns the supervised orchestration layer.

The active orchestrator path runs:

  • tailoring
  • review
  • resume generation
  • cover letter

The earlier fit and strategy stages are no longer part of the live workflow. The TailoringAgent consumes the structured FitAnalysis produced by src/services/fit_service.py directly — no FitAgent narration step. Each agent has a Tier-2/Tier-3 quality runner under tests/quality/ that scores it on fixture (resume, JD) pairs.

Per-agent retry + fallback isolation. Each agent step inside the orchestrator gets its own retry budget and its own fallback path. If an agent's LLM call raises AgentExecutionError (after the OpenAI service's own SDK + app-level retries exhaust), the orchestrator retries the agent's full .run(...) once with a 400 ms delay. If the retry also fails, only THAT agent's deterministic fallback runs — downstream agents continue trying the LLM path. A single bad packet during the Forge agent no longer cascades to "downgrade the whole pipeline to deterministic." The whole-pipeline deterministic fallback remains as a safety net for the unusual case where a per-agent deterministic path itself errors out. If every agent ended up falling back per-agent (zero LLM successes), result.mode is honestly downgraded to deterministic_fallback. See ADR-018.

src/prompts.py

Owns grounded prompt builders for the specialist agents and assistant.

src/openai_service.py

Owns the thin OpenAI wrapper used by the workflow and assistant layers.

Responsibilities include:

  • task-aware model routing
  • Responses API calls (JSON-contract path via run_json_prompt, streaming prose path via run_text_stream)
  • GPT-5 reasoning-effort routing
  • usage accounting metadata
  • optional persisted usage-event callbacks
  • daily-quota preflight checks
  • output-budget retry handling (when responses are truncated due to insufficient max_output_tokens)
  • application-level retry on top of the OpenAI Python SDK's own retries (max_retries=2) — adds one extra attempt on the narrow allow-list APIConnectionError / APITimeoutError / InternalServerError. Every responses.create in the codebase routes through _create_response_with_app_retry, so the resume parser, JD parser, JD summary, all four supervised-workflow agents, and the assistant chat all inherit the retry layer for free. See ADR-018.

src/assistant_service.py

Owns the single in-app assistant behavior. The chat is not gated on having run an analysis — it answers product-help questions ("how do I use this?", "what's step 03 for?") from the very first visit and grounded package questions ("summarize my fit") once an analysis has run. See ADR-017.

Responsibilities include:

  • routing between product-help questions and grounded package questions
  • compact workspace-context assembly, including a workspace_state projection (current_step, has_resume, resume_summary, has_jd, jd_summary, has_analysis, saved_jobs_count, last_search_query) sent on every query so the LLM can answer state-aware questions before any analysis exists
  • deterministic fallback behavior when assisted execution is unavailable

Builders and Exporters

  • src/resume_builder.py: deterministic tailored-resume assembly
  • src/cover_letter_builder.py: deterministic grounded cover-letter assembly
  • src/exporters.py: DOCX/PDF export helpers (export_docx_bytes, export_pdf_bytes) plus HTML preview generation, sharing a theme palette across formats; see ADR-015
  • src/job_sources/: per-provider adapter implementations (Greenhouse, Lever, Ashby, Workday) feeding the cached-jobs refresh worker

The user-facing workspace is now centered on two visible outputs:

  • tailored resume
  • cover letter

Both ship in 12 themes (6 single-column ATS-safe + 6 two-column, all from one typed ThemeSpec registry — see ADR-029 and ADR-032) and both formats (DOCX, PDF). The earlier Markdown export path was removed in 2026-05 alongside the DOCX rollout. The earlier internal report builder was removed when the FitAgent + bundle endpoint were retired.

Auth and Persistence Modules

  • src/auth_service.py: Supabase Auth wrapper for Google OAuth
  • src/user_store.py: syncs lightweight app_users records
  • src/usage_store.py: persists authenticated assisted usage events
  • src/quota_service.py: computes daily quota state from persisted usage
  • src/saved_workspace_store.py: persists and loads the latest reloadable workspace snapshot
  • src/saved_jobs_store.py: persists and loads shortlisted jobs
  • src/cached_jobs_store.py: service-role-backed access layer for the global cached_jobs index — bulk upsert (with embed-on-write for newly-cached jobs when hybrid search is enabled), smart cleanup, and lexical + hybrid search via Postgres RPCs; see ADR-013 and ADR-014
  • src/resume_builder_store.py: persists and loads conversational resume-builder draft sessions (resume_builder_sessions table) with the 7-day TTL + active-user refresh policy; see ADR-016

src/config.py

Owns environment-backed configuration for:

  • model routing
  • reasoning routing
  • quota defaults
  • auth and Supabase settings
  • saved-workspace retention settings
  • frontend/backend integration settings

src/schemas.py

Owns shared typed models for:

  • resumes
  • candidate profiles
  • work experience
  • education
  • job descriptions
  • fit analyses
  • tailoring drafts
  • tailored resume artifacts
  • cover letter artifacts
  • internal reports
  • agent outputs
  • orchestrated workflow results
  • auth and persistence records

Persistence Model

The runtime uses a split state model:

  • browser state for the current workspace session
  • Supabase Postgres for authenticated persistence and the global cached-jobs index

Per-user persistent state:

  • app_users
  • usage_events
  • saved_workspaces
  • saved_jobs
  • resume_builder_sessions

Global (non-user-scoped) state:

  • cached_jobs — the indexed set of upstream postings refreshed every 4 hours (six times a day) by the backend's refresh_cached_jobs worker; see ADR-013

Each saved_workspaces row stores one latest snapshot per user, including enough data to restore the current resume/JD/workflow state.

Each saved_jobs row stores one shortlisted posting per user and normalized job id, including:

  • source/provider identity
  • title, company, location, and employment type
  • source URL
  • normalized summary and description text
  • provider metadata
  • saved and updated timestamps

Each resume_builder_sessions row stores one in-progress conversational resume-builder draft per user with a 7-day TTL refreshed on every save. A pg_cron job (cleanup-expired-resume-builder-sessions) hard-deletes expired rows every 5 min and RLS hides expired rows from per-user queries; see ADR-016.

Each cached_jobs row holds one upstream posting keyed on (source, job_id). The table has GENERATED STORED columns (work_mode, employment_type_norm) backing the dropdown filters, removed_at tombstones for upstream-closed jobs the user has bookmarked, and an embedding vector(1536) column (pgvector, HNSW cosine index) for semantic search. A pg_cron + pg_net schedule (cached_jobs_refresh_4h) POSTs to /admin/refresh-cache every 4 hours, six times a day (see docs/sql/job_cache_cron_setup.sql for the template — production runs 0 */4 * * *). Search is two-tier: the lexical search_cached_jobs_ranked RPC (ADR-014) and the hybrid search_cached_jobs_hybrid RPC, which fuses that lexical ranking with a pgvector semantic ranking via Reciprocal Rank Fusion. The hybrid path is gated behind the JOB_SEARCH_HYBRID_ENABLED flag and degrades to lexical on any failure; see ADR-033. As of 2026-06-23, production runs in LEAN MODE (Supabase free-tier downgrade): the embedding column + HNSW index + hybrid RPC are DROPPED in prod and the flag is false, so live search is lexical-only. The hybrid capability remains in the codebase + the SQL files; it's a restore-on-Pro operation, not a deleted feature. See the "Hybrid-search lean/full switch" runbook in deployment.md and DEVLOG Day 82.

aijobagent_run_traces is an append-only cost-attribution table — one row per successful LLM call (user_id, model, task, prompt_tokens, completion_tokens, cost_usd, created_at). Writes are best-effort: a missing table or a write error never propagates to the user-facing path. It is the canonical answer to "what is OpenAI spend doing", separate from the Sentry/PostHog telemetry surface.

aijobagent_feedback holds one row per artifact thumbs-up/down (user_id, workspace_id, artifact_kind, rating, comment, created_at), RLS-scoped to the owning user; admin reads go through the service role.

A small set of structural reinforcements landed during the launch-readiness cleanup (DEVLOG Day 80) that are worth flagging here because they're load-bearing on the entitlement and read-fast paths: (1) a BEFORE-UPDATE trigger on app_users rejects non-service_role writes to plan_tier / account_status, so the unrestricted RLS UPDATE policy can no longer be abused to PATCH one's own tier; the legacy daily-quota path now sources tier from resolve_user_tier (which reads aijobagent_subscriptions) instead of app_users.plan_tier; (2) save_saved_job is now an atomic SECURITY DEFINER RPC that count-and-inserts in one transaction (advisory lock), closing the TOCTOU window where two concurrent saves at count=cap−1 could both pass and exceed the persistent cap; (3) /workspace/quota's _persistent_count() no longer reads the fat saved_workspaces blob — a count_active(user_id) head-read returns 0/1 without deserializing workflow_snapshot_json / cover_letter_payload_json / tailored_resume_payload_json. saved_workspaces per-tier caps are pinned to 1/1/1 because the schema is one-row-per-user (multi-row history is flagged as a future enhancement requiring a schema migration).

Observability And Telemetry Layer

Wired Day 46. The compliance posture is enforced at the SDK-init level, not as legalese on a privacy page — see ADR-024 and ADR-025.

Two vendors, one bootstrap path:

  • Sentry — error tracking, performance traces, AI Agents Monitoring (OpenAIIntegration(include_prompts=False) — token/model/latency spans without prompt-body PII), Logs, and session replay (errors-only). backend/observability.py is the only place the SDK is touched on the backend; it's imported before FastAPI() is constructed so the ASGI middleware wraps the app at startup. The before_send hook drops intentional HTTPException 4xx flow-control + the "not configured / temporarily unavailable" 5xx guards so the issue feed stays focused on genuine bugs. A _running_under_pytest() check skips Sentry entirely during the test suite. Frontend Sentry is wired via instrumentation-client.ts / instrumentation.ts / sentry.server.config.ts / sentry.edge.config.ts; next.config.ts uploads source maps through withSentryConfig.
  • PostHog — product analytics, session replay, identify/group cohorts. The free Developer plan caps at one project per org, so the project is shared with the developer's other product; every event carries a product: "jobagent" super-property (frontend posthog.register, backend capture_event merge) so dashboards slice cleanly with where properties.product = 'jobagent'. Exception capture is off — Sentry is the source of truth for errors.

Both clients are no-ops when their DSN / key is empty, so dev, CI, and the test suite run without observability wiring or network calls.

The launch-readiness cleanup (DEVLOG Day 80) added three reinforcements to this surface: (1) Sentry breadcrumbs / tags / context / user are now set on each pipeline stage in src/agents/orchestrator.py (via the stage-boundary callback, not the orchestrator internals) and on the export route, so a mid-pipeline 5xx is localizable to the failing agent — defeating the AI-Agents-Monitoring blind spot ADR-024 was adopted for; (2) the saved-workspaces-retention sweeper got its sentry_cron_monitor wrapper so a stuck retention cron now pages instead of silently leaving Free data past its 7-day retention promise; (3) backend events emitted by unauthenticated callers now carry the browser's PostHog distinct id via a new X-PostHog-Distinct-Id request header — the previous "anonymous" constant collapsed every anon visitor onto one PostHog person and made anonymous→signup conversion uncomputable.

Consent gating

The single source of truth is localStorage["jobagent-cookie-consent"], set by the custom in-house cookie banner (frontend/src/components/cookie-consent.tsx), three states: pending / accepted / declined. The split:

  • Always-on (legitimate interest, GDPR Art. 6(1)(f) — crash reporting is operationally necessary): Sentry error tracking + traces + Feedback widget. Load regardless of banner state.
  • Consent-gated (explicit opt-in required, ePrivacy Art. 5(3)): PostHog product analytics + PostHog session replay + Sentry Session Replay. Load only when consent === "accepted".

A jobagent-cookie-consent-change custom event re-evaluates the gated integrations on flip without a page reload (Sentry.addIntegration(...) hot-adds Replay; PostHog opt_in_capturing() / opt_out_capturing()). The banner is in-bundle (no third-party JS loads before consent) and scoped under the .ja-cookie-banner CSS class.

Uptime

A Sentry Uptime monitor pings https://api.job-application-copilot.xyz/health every 5 minutes from the EU region. Configured in the Sentry dashboard rather than in code — a fresh-project rebuild must recreate it manually.

Browser security baseline

The Next.js app sends a fixed set of response headers on every route, configured via headers() in frontend/next.config.ts. The defense-in-depth posture is the same on the marketing site and the workspace subdomain:

  • X-Frame-Options: DENY + Content-Security-Policy: frame-ancestors 'none' — clickjacking defense. The workspace can't be framed and overlaid to trick a signed-in user into destructive actions; SameSite=Lax cookies would otherwise ride along on top-level navigation.
  • Strict-Transport-Security: max-age=63072000; includeSubDomains; preload — HTTPS for two years across all subdomains, preload-eligible.
  • X-Content-Type-Options: nosniff — disables MIME-type sniffing on responses (resource loaders honor the declared Content-Type).
  • Referrer-Policy: strict-origin-when-cross-origin — strips path + query from the Referer on cross-origin navigation while keeping it intact within the site.
  • Content-Security-Policy as Report-Only for the first weeks of public traffic — same-origin defaults plus the actual allowlist (PostHog eu.i.posthog.com, Sentry *.sentry.io, Lemon Squeezy, Supabase *.supabase.co). Tuning to enforce-mode tracks violation reports in Sentry.

The launch-readiness pass that introduced this baseline is DEVLOG Day 79 (FE-SEC-1). Backend-side, every backend-supplied redirect URL the client navigates to passes through an explicit allowlist (frontend/src/lib/redirectAllowlist.tssafeRedirect / isAllowedRedirect) so the OAuth handoff + workspace-shell redirects can't be steered to an attacker-controlled origin (DEVLOG Day 80, M7).

The accessible-overlay primitive (frontend/src/lib/useAccessibleDialog.ts) is the shared focus-trap + initial-focus + Escape + focus-restore contract behind every modal surface in the workspace shell — the ⌘K command palette and the assistant FAB use it directly; the palette also gets combobox/listbox semantics (role="combobox" + aria-expanded + aria-controls + aria-activedescendant, list role="listbox", items role="option" with aria-selected). DEVLOG Day 79 (A11Y-1/A11Y-2).

Testing Model

The repo includes focused tests for:

  • resume parsing
  • JD parsing (deterministic + LLM-hybrid)
  • profile normalization
  • job normalization
  • tailoring guidance
  • orchestrator behavior
  • resume and cover-letter building
  • DOCX + PDF export formatting
  • auth and quota behavior
  • saved-workspace persistence
  • saved-job persistence
  • cached-jobs store + RPC arg shape
  • cached-jobs refresh worker (per-source isolation, cleanup gating, status reporting)
  • per-provider job source adapters (Greenhouse, Lever, Ashby, Workday)
  • conversational resume-builder turn handling + structuring pass
  • backend workspace routes
  • assistant SSE streaming endpoint
  • OpenAI application-level retry contract (tests/test_openai_app_retry.py): retries on the narrow allow-list APIConnectionError / APITimeoutError / InternalServerError, does NOT retry on 4xx / auth / persistent rate-limit, returns success after retry, raises on double-failure
  • per-agent orchestrator behavior (tests/test_orchestrator.py): per-agent retry recovers a flaky agent, per-agent fallback isolates a single failing agent (downstream agents still use LLM), result.mode reconciles to deterministic_fallback when no agent succeeded with LLM
  • tier enforcement (tests/backend/test_tiers.py, test_quota.py, test_workspace_quota_enforcement.py, and siblings): atomic check-and-increment under thread races, refund-on-failure, lifetime-vs-monthly period switching, P0001 → 429 translation, Business unbounded-retention skip
  • Lemon Squeezy webhook (tests/backend/test_lemonsqueezy_webhook.py): HMAC signature verification, event routing, unknown-variant silent-ack
  • prompt registry byte-identity (tests/test_prompts.py): every one of the 11 migrated JSON templates is asserted bit-exact against the original Python concat
  • voice transcription + artifact feedback backend routes (tests/backend/test_transcribe.py, test_feedback.py): multipart handling, 60s overrun rejection, RLS-scoped feedback writes

The _running_under_pytest() guard means Sentry never initializes during the test run, so the observability wiring adds zero test-suite coupling beyond a small leaky-detail-allowlist line-offset in tests/test_error_messages.py.

Tier-2 / Tier-3 quality runners under tests/quality/ evaluate LLM-driven components (resume parser, JD parser, renderer fidelity, skill canonicalization, tailoring, review, resume generation, cover letter, resume builder, assistant, end-to-end orchestrator) on fixture sets with weighted scorecards and a --include-llm cost gate. backend/nightly_eval.py wraps these into a single unattended batch with regression-threshold checking — manual-only at pre-revenue stage, see ADR-026.

Current Constraints

  • Long AI-assisted runs still execute as one request/response cycle today; they are not yet background jobs.
  • The product stores one latest saved workspace snapshot per user; it does not expose a multi-entry history browser.
  • Large binary artifacts are regenerated on demand instead of being stored in object storage.
  • The internal report builder still exists in Python, but the visible workspace now centers on resume and cover letter only.

Next Architecture Step

The next meaningful expansion is product hardening on the current stack:

  • background execution for long-running workflow jobs
  • tighter hosted reliability around retries and timeouts
  • continued UI simplification around review and export
  • broader hosted QA across Vercel, VPS, Supabase, and Cloudflare