Hosted clone-free portfolio report (free tier) #102

Merged
saagpatel merged 13 commits into main from feat/api-only-scoring on Jun 21, 2026

saagpatel (Owner) commented Jun 20, 2026

Adds a free, hosted "paste a GitHub username → portfolio health report" on top of the existing auditor engine. The OSS CLI is untouched; this is additive.

What's here

Engine (clone-free scoring)

  • API-only mode: materializes a sparse skeleton per repo from the GitHub API and runs the existing 13 analyzers — no cloning.
  • Prefers GraphQL for the repo list (one query + language breakdowns), REST fallback when unauthenticated.
  • Parallel per-repo scan (thread pool) + relevance ranking (original/active work first), capped per request. Cold scan of a small user ~6s; a prolific user's top-20 ~19s; cached hits ~2ms.

HTTP API (FastAPI, src/serve)

  • GET /api/report/{username} -> JSON report; POST /api/waitlist -> email capture; GET /api/health.
  • Report cache (TTL) + per-IP fixed-window throttle, pluggable KV (in-memory default, Redis/Upstash drop-in).
  • Durable SQLite waitlist with dedup + source attribution. CORS for the browser frontend.

Frontend (web/, Next.js 15 + React 19)

  • Paste-username form -> shareable /u/{username} route -> report with a "top fixes" framing.
  • Monitoring-waitlist email capture.
  • web/pnpm-workspace.yaml now explicitly approves the required sharp build script so pnpm/Vercel installs are not blocked.

Deploy

  • Dockerfile + fly.toml and DEPLOY.md cover the Fly API, Vercel frontend, Upstash/Redis optional cache, and env reference.
  • Vercel production frontend was deployed from this lane: https://ghra-report-web.vercel.app.
  • Fly API deployment is blocked until Fly billing is configured for the account; fly apps create ghra-report-api --yes returns a payment-information-required error before app creation.

Verification - 2026-06-20

  • uv run --extra serve --extra hosting ruff check src/ tests/: pass.
  • uv run --extra serve --extra hosting python -m pytest tests/test_api_checkout.py tests/test_api_only.py tests/test_serve.py tests/test_serve_api.py tests/test_serve_hosting.py tests/test_serve_waitlist.py tests/test_github_client.py -q -p no:cacheprovider: 166 passed, 1 warning.
  • uv run --extra serve --extra hosting --extra semantic python -m pytest -q -p no:cacheprovider: 2584 passed, 2 skipped, 2 warnings.
  • pnpm typecheck: pass after sharp approval.
  • pnpm build: pass locally and on Vercel production.
  • Local authenticated API smoke: /api/health reports github_token: true; /api/report/octocat returns HTTP 200, mode=api_only, 8 repos; /api/waitlist is idempotent.
  • Playwright/Chrome local browser proof captured for /u/octocat and invalid username error state.
  • Read-only merge preview against current origin/main found no conflict markers; GitHub reports this PR mergeable.

Remaining operator checklist

  1. Add Fly billing/payment or credits for the saagar Fly organization.
  2. Re-run:
    • fly apps create ghra-report-api --yes
    • fly volumes create ghra_data --region iad --size 1 --app ghra-report-api --yes
    • set Fly secrets by name only: GHRA_GITHUB_TOKEN, GHRA_CORS_ORIGINS=https://ghra-report-web.vercel.app, optional GHRA_REDIS_URL
    • fly deploy
  3. Verify https://ghra-report-api.fly.dev/api/health returns github_token: true.
  4. Verify https://ghra-report-web.vercel.app/u/octocat renders a report, not the graceful service-unreachable state.
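Step 3 of the checklist can be scripted. The following is a minimal sketch, assuming the health endpoint returns a JSON object with a boolean github_token field (the smoke test above reports that field, but the full payload shape is not shown here). The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def health_reports_token(payload: dict) -> bool:
    """True only when the payload explicitly says a GitHub token is configured."""
    return payload.get("github_token") is True


def fetch_health(base_url: str, timeout: float = 10.0) -> dict:
    """GET {base_url}/api/health and decode the JSON body."""
    url = f"{base_url.rstrip('/')}/api/health"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Host taken from the checklist; it only resolves once the Fly app exists.
    payload = fetch_health("https://ghra-report-api.fly.dev")
    print("token configured:", health_reports_token(payload))
```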

Notes

  • The hosted lane is additive and does not weaken the workbook/control-center/release-gate paths.
  • Waitlist is SQLite on a Fly volume for v1; Postgres (Neon) remains the documented multi-writer follow-up.

saagpatel added 11 commits June 19, 2026 22:44
Add an API-only scoring path so an arbitrary public GitHub user can be scored without cloning any repository — the engine behind the hosted 'paste your username' report.

- github_client: get_repo_tree (Git Trees API, recursive) + get_file_content (Contents API, base64); fail-soft with logging on unexpected statuses and a tree-truncation signal.

- api_checkout: materialize a sparse on-disk skeleton from the tree (dirs + presence files + curated file content), path-traversal + null-byte guarded; drop-in replacement for cloner.clone_workspace.

- api_only: score_repos_api_only / audit_user_api_only run the existing, unmodified 13-analyzer engine against the skeleton. Interactive 'fast' mode skips slow async stats endpoints (~10x faster on live scans).

OSS CLI and analyzers unchanged. New unit tests cover the client methods, materializer, and orchestrator; live-verified on a public user.
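For readers who want a feel for the materializer, here is a minimal sketch of the idea, not the code in api_checkout. It assumes tree entries shaped like Git Trees API items (a path plus a type of "tree" or "blob") and a dict of curated file contents; the function names and the exact guard rules are illustrative assumptions.

```python
from pathlib import Path, PurePosixPath


def is_safe_relpath(path: str) -> bool:
    """Reject empty, null-byte, absolute, and parent-escaping paths."""
    if not path or "\x00" in path:
        return False
    pure = PurePosixPath(path)
    if pure.is_absolute():
        return False
    return ".." not in pure.parts


def materialize_skeleton(root: Path, tree: list, contents: dict) -> list:
    """Write dirs, empty presence files, and curated content under root.

    Returns the relative paths that were actually written.
    """
    root = root.resolve()
    written = []
    for entry in tree:
        rel = entry.get("path", "")
        if not is_safe_relpath(rel):
            continue
        target = (root / rel).resolve()
        # Second line of defence: the resolved target must stay inside root.
        if not target.is_relative_to(root):  # Python 3.9+
            continue
        kind = entry.get("type")
        if kind == "tree":
            target.mkdir(parents=True, exist_ok=True)
        elif kind == "blob":
            target.parent.mkdir(parents=True, exist_ok=True)
            # Files without curated content become empty presence markers.
            target.write_bytes(contents.get(rel, b""))
        else:
            continue  # e.g. submodule entries are skipped in this sketch
        written.append(rel)
    return written
```

Analyzers that only test for file presence see the empty markers; analyzers that read content only get the curated files.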
Add GET /api/report/{username} wrapping audit_user_api_only(fast=True),
returning ApiOnlyReport.to_dict() JSON. Plain-def route offloads the
blocking, network-bound scan to FastAPI's threadpool; GitHubClient is
injected via a dependency for test/deploy override and server-side token.

- Validates the username via the existing validate_username gate (422).
- Maps GitHub errors: 404 not-found, 429 rate-limit (429 or 403 w/ zero
  quota), 403 forbidden, 502 for other HTTP/network/client errors.
- Clamps repos scored to MAX_REPOS_CAP to bound public cost.
- 14 endpoint tests; wired into the serve app factory.
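The error mapping above is small enough to sketch as a pure function. This is an illustration under stated assumptions, not the route's actual code: the function name is hypothetical, and the remaining quota is assumed to come from GitHub's X-RateLimit-Remaining response header.

```python
from typing import Optional


def map_github_error(status: Optional[int],
                     rate_limit_remaining: Optional[int] = None) -> int:
    """Translate an upstream GitHub failure into the report API's status code.

    status is None when there was no HTTP response at all
    (network or client error).
    """
    if status == 404:
        return 404  # user not found
    if status == 429:
        return 429  # explicit rate limit
    if status == 403:
        # GitHub also signals an exhausted quota with 403; treat it as 429.
        return 429 if rate_limit_remaining == 0 else 403
    return 502  # any other HTTP, network, or client error
```

Keeping the mapping free of framework imports makes it easy to unit-test separately from the endpoint.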
Add CORSMiddleware to the app factory with origins resolved from
GHRA_CORS_ORIGINS (defaults to the local Next.js dev server). GET-only,
no credentials — the report endpoint is public and unauthenticated.
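A minimal sketch of the origin resolution, assuming GHRA_CORS_ORIGINS holds a comma-separated list and that the local default is http://localhost:3000 (the stock Next.js dev port); both are assumptions, since the commit does not spell out the format. The helper name is hypothetical.

```python
import os
from typing import Mapping, Optional

# Assumed default: the stock Next.js dev server address.
DEFAULT_ORIGINS = ["http://localhost:3000"]


def resolve_cors_origins(env: Optional[Mapping[str, str]] = None) -> list:
    """Parse GHRA_CORS_ORIGINS into a list, falling back to the dev default."""
    env = os.environ if env is None else env
    raw = env.get("GHRA_CORS_ORIGINS", "")
    origins = [o.strip().rstrip("/") for o in raw.split(",") if o.strip()]
    return origins or list(DEFAULT_ORIGINS)
```

The result would then be passed to FastAPI's CORSMiddleware as allow_origins, with allow_methods=["GET"] and allow_credentials=False to match the description above.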
Add a Next.js 15 / React 19 App Router app under web/ that powers the
free hosted report: a username form does a client-side fetch to the
FastAPI /api/report endpoint and renders the result with a top-fixes
framing — grades, repo health, flags, and the engine's ranked action
candidates as the hero of each card. Repos sort worst-health-first.

- lib/api.ts: typed fetch client with per-status messages + a boundary
  shape guard; lib/url.ts: https-only href allowlist (XSS guard).
- ReportExplorer: client-side idle/loading/done/error state machine
  with ARIA live regions; ReportView + RepoCard are presentational.
- Dark editorial design, color-coded grade chips, mono accents.
- Typechecks clean, production build passes, visually verified end to
  end against a live GitHub user (8 cards rendered).

Add a pluggable hosting layer so the free report endpoint survives a
second visitor: an in-memory KV store (thread-safe, lazy-expiring, with
size-triggered reaping) backs a report cache (1h TTL default) and a
fixed-window per-IP rate limiter (20/hr default). A Redis/Upstash backend
drops in via GHRA_REDIS_URL for multi-instance deploys.

Endpoint flow is now throttle -> validate -> cache get/hit -> scan ->
cache put. Cache verified live: cold scan 6.3s, warm hit 1.5ms.
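The endpoint flow can be sketched in memory as follows. This is an illustration only: the class and function names are hypothetical, the clock is injectable purely for testing, and lowercasing the username for the cache key is an assumption (GitHub logins are case-insensitive). Size-triggered reaping is omitted for brevity.

```python
import threading
import time
from typing import Any, Callable, Optional


class RateLimited(Exception):
    """Raised when a client exceeds its window quota (hypothetical name)."""


class FixedWindowLimiter:
    def __init__(self, limit: int = 20, window_s: float = 3600.0,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window_s = window_s
        self._clock = clock
        self._lock = threading.Lock()
        self._counts = {}  # key -> (count, window expiry time)

    def allow(self, key: str) -> bool:
        now = self._clock()
        with self._lock:
            count, expires = self._counts.get(key, (0, 0.0))
            if now >= expires:
                # First hit of a new window: the expiry is set once here
                # and never extended by later hits.
                count, expires = 0, now + self.window_s
            count += 1
            self._counts[key] = (count, expires)
            return count <= self.limit


class TTLCache:
    def __init__(self, ttl_s: float = 3600.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._ttl_s = ttl_s
        self._clock = clock
        self._lock = threading.Lock()
        self._items = {}  # key -> (value, expiry time)

    def get(self, key: str) -> Optional[Any]:
        with self._lock:
            item = self._items.get(key)
            if item is None:
                return None
            value, expires = item
            if self._clock() >= expires:
                del self._items[key]  # lazy expiry on read
                return None
            return value

    def put(self, key: str, value: Any) -> None:
        with self._lock:
            self._items[key] = (value, self._clock() + self._ttl_s)


def serve_report(username: str, client_ip: str, limiter: FixedWindowLimiter,
                 cache: TTLCache, validate: Callable[[str], None],
                 scan: Callable[[str], Any]) -> Any:
    """throttle -> validate -> cache get/hit -> scan -> cache put."""
    if not limiter.allow(client_ip):
        raise RateLimited(client_ip)
    validate(username)  # expected to raise on invalid input
    key = username.lower()
    cached = cache.get(key)
    if cached is not None:
        return cached
    report = scan(username)
    cache.put(key, report)
    return report
```

Throttling before validation means malformed requests also count against the caller's quota, which keeps the cheapest check first.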

Hardening from review:
- Redis incr uses plain EXPIRE on first hit (no NX) — works on all server
  versions, never extends the window.
- X-Forwarded-For honored only when GHRA_TRUST_FORWARDED_FOR is set
  (default off) — XFF is spoofable, so default keys on the direct peer.
- Counter/value stores reap expired entries past a threshold to bound
  memory under unique-IP churn.
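The X-Forwarded-For rule could look roughly like this. It is a sketch with assumptions called out: the function name is hypothetical, header names are assumed to arrive lowercased, and taking the leftmost XFF entry is one common choice that is only sound when a trusted proxy sets or sanitizes the header.

```python
from typing import Mapping


def client_key(peer_ip: str, headers: Mapping[str, str],
               trust_forwarded: bool = False) -> str:
    """Pick the rate-limit key: the direct peer unless XFF is explicitly trusted."""
    if trust_forwarded:
        xff = headers.get("x-forwarded-for", "")
        first = xff.split(",")[0].strip()
        if first:
            return first
    # Default: ignore the spoofable header and key on the socket peer.
    return peer_ip
```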

99 tests pass; ruff + mypy clean.

Load Space Grotesk (display/body) and JetBrains Mono (code/labels) via
next/font, replacing the system-ui stack with intentional, self-hosted
faces. Wired through the --font-sans / --font-mono CSS variables.

The report cache keys on username alone, but the endpoint accepted a
max_repos query param — a report cached for one cap could be served to a
request expecting another. Remove the public knob (MAX_REPOS_CAP already
bounds cost); the scan is always capped server-side, so a username fully
determines its cached report.

When the client has a token, list a user's repos via the existing
bulk_fetch_repos GraphQL query — one paginated call that also returns
per-repo language byte breakdowns, so metadata.languages is now populated
(REST left it empty). Falls back to REST list_repos when unauthenticated,
when GraphQL returns no user (clean 404), or on any GraphQL error.

Live-verified on octocat: language breakdowns populate, grades stable.
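The fallback order can be sketched with the two fetchers injected as callables. This is not the client's real interface: the function name, the convention that the GraphQL fetcher returns None for "no user", and the returned source label are all assumptions made for illustration.

```python
from typing import Callable, Optional


def list_repos_with_fallback(
    username: str,
    has_token: bool,
    graphql_fetch: Callable[[str], Optional[list]],
    rest_fetch: Callable[[str], list],
) -> tuple:
    """Return (repos, source), preferring GraphQL when a token is available."""
    if has_token:
        try:
            repos = graphql_fetch(username)
        except Exception:
            repos = None  # any GraphQL error falls through to REST
        if repos is not None:
            return repos, "graphql"
    # Unauthenticated, no user from GraphQL, or GraphQL failed:
    # REST is the path that produces the clean 404 for unknown users.
    return rest_fetch(username), "rest"
```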
Step 4 — the 'earn the tier' instrumentation:

Backend:
- POST /api/waitlist captures emails (Pydantic-validated, throttled on a
  separate per-IP bucket so browsing reports never blocks signup).
- SqliteWaitlistStore: durable, dedup on lowercased email, thread-safe via
  lock + contextlib.closing per connection. Path from GHRA_WAITLIST_DB or
  under the app output dir.

Frontend:
- Shareable /u/[username] route: the form now routes there (via
  useTransition) instead of fetching in place, so every report has a URL.
- ReportLoader fetches client-side with an AbortController that cancels the
  orphaned scan on unmount/username change.
- WaitlistForm email capture on the report, with idle/submitting/done/error
  states; CORS now allows POST.

Verified live end to end: home → submit → /u/octocat → 8 cards → waitlist
signup persists with source attribution. 131 backend tests pass; web
typechecks + builds; reviewer findings from both stacks addressed.
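A minimal sketch of the waitlist store's dedup and locking pattern, assuming a single table keyed on the lowercased email. The class name, schema, and column names are illustrative, not the schema used by SqliteWaitlistStore.

```python
import sqlite3
import threading
from contextlib import closing
from pathlib import Path
from typing import Optional


class WaitlistSketch:
    def __init__(self, db_path) -> None:
        self._path = Path(db_path)
        # Create the parent dir so a fresh host works before any data exists.
        self._path.parent.mkdir(parents=True, exist_ok=True)
        self._lock = threading.Lock()
        with self._lock, closing(sqlite3.connect(self._path)) as conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS waitlist ("
                "email TEXT PRIMARY KEY, "
                "source TEXT, "
                "created_at TEXT DEFAULT CURRENT_TIMESTAMP)"
            )
            conn.commit()

    def add(self, email: str, source: Optional[str] = None) -> bool:
        """Insert the email; return True if new, False if already present."""
        normalized = email.strip().lower()
        # One short-lived connection per call, serialized by the lock.
        with self._lock, closing(sqlite3.connect(self._path)) as conn:
            cur = conn.execute(
                "INSERT OR IGNORE INTO waitlist (email, source) VALUES (?, ?)",
                (normalized, source),
            )
            conn.commit()
            return cur.rowcount == 1

    def count(self) -> int:
        with self._lock, closing(sqlite3.connect(self._path)) as conn:
            return conn.execute("SELECT COUNT(*) FROM waitlist").fetchone()[0]
```

Because a repeat signup is a no-op rather than an error, the endpoint can return the same success response either way, which is what makes it idempotent.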
Cold-scan latency was the gap (a 30-repo user took 60-90s, prolific users
timed out). Two changes:

- score_repos_api_only now materializes + analyzes + scores each repo
  concurrently (ThreadPoolExecutor, 8 workers). Safe on the hosted path: it
  is authenticated (ample rate limit), uses no shared response/analyzer
  cache, each repo writes its own temp subdir, and requests.Session is
  thread-safe. Scores are byte-identical to the sequential run.
- _select_repos ranks original active work (non-fork, non-archived) by
  recency then stars and takes the top N, so a prolific account's report
  showcases their best/current repos instead of an arbitrary slice. Cap
  lowered 30 -> 20.

Live: octocat 19.6s -> 6.3s; tiangolo (hundreds of repos) 90s+ timeout ->
18.7s scanning his top-20, all non-fork high-star repos.
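Both changes can be sketched briefly. The field names below (fork, archived, pushed_at, stargazers_count) are the GitHub REST repo fields; whether _select_repos drops forks and archived repos or merely ranks them last is not stated here, so this sketch ranks them last, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def select_repos(repos: list, cap: int = 20) -> list:
    """Rank original, active repos first, then by recency, then by stars."""
    def rank(repo: dict) -> tuple:
        original = not repo.get("fork") and not repo.get("archived")
        # ISO 8601 UTC timestamps in one format sort correctly as strings.
        return (original, repo.get("pushed_at", ""), repo.get("stargazers_count", 0))

    return sorted(repos, key=rank, reverse=True)[:cap]


def score_all(repos: list, score_one: Callable[[dict], Any], workers: int = 8) -> list:
    """Score repos concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map preserves input order, so the output matches a
        # sequential run as long as score_one has no shared mutable state.
        return list(pool.map(score_one, repos))
```

Threads fit here because the per-repo work is dominated by network calls, where the interpreter lock is released while waiting.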
Make the hosted report deployable:
- Dockerfile (uv, frozen lockfile, serve+hosting extras) running uvicorn
  with --forwarded-allow-ips so the per-IP throttle sees real client IPs
  behind a proxy. Built + ran the image: /api/health returns 200.
- fly.toml: health check on /api/health, /data volume for the waitlist DB,
  non-secret config; secrets via fly secrets set.
- .dockerignore trims the context to pyproject/uv.lock + src.
- DEPLOY.md: full runbook (Fly API, Vercel frontend, Upstash Redis, env
  reference, Postgres follow-up).
- Add GET /api/health (reports token presence) for platform probes.
- uv.lock now includes the redis (hosting) extra.

Fix found via the container run: SqliteWaitlistStore now creates its parent
dir so it works on a fresh host before the volume path is populated.

Comment thread src/github_client.py Fixed
Comment thread src/github_client.py Fixed
Comment thread src/serve/waitlist.py Fixed
Comment thread src/api_only.py Fixed
Comment thread src/serve/hosting.py Fixed
Comment thread src/serve/hosting.py Fixed
Comment thread src/serve/hosting.py Fixed
Comment thread src/serve/waitlist.py Fixed
Comment thread src/serve/waitlist.py Fixed
Comment thread tests/test_serve_api.py Fixed

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f9dc63c7e6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/github_client.py Outdated
Comment thread src/github_client.py Outdated
Comment thread Dockerfile Outdated
saagpatel merged commit 99df92c into main on Jun 21, 2026
4 checks passed

2 participants