This document explains the current runtime architecture of the GitHub Portfolio Reviewer Agent.
The app evaluates a single public GitHub repository or a selected set of public repositories and produces a recruiter-facing report with:
- deterministic repository scoring
- repo-level LLM analysis
- portfolio-level synthesis when multiple repositories are selected
- exportable Markdown and PDF output
- A user enters a public GitHub username or signs in with GitHub OAuth to analyze their own public profile.
- The app loads the user's public repositories from the GitHub API.
- The user chooses an analysis scope:
  - single repository
  - selected repositories
  - portfolio slice
- The app fetches repository facts:
  - repo metadata
  - languages
  - README
  - root-level files
  - default-branch HEAD commit SHA
- The app builds a cache fingerprint from the selected scope and repository freshness metadata.
- If a valid cached report exists, it is returned immediately.
- Otherwise, deterministic checks and scores are computed for each repository.
- The app generates:
  - repo-level LLM audits for each repository
  - a portfolio summary for multi-repo analysis
- The final report is rendered through deterministic Markdown templates.
- The result is saved in persistent SQLite cache and shown in the Streamlit UI.
- The user can export Markdown or PDF.
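The flow above can be sketched as a cache-first pipeline. This is a minimal illustration, not the app's actual orchestration code: `run_pipeline`, the plain-dict cache, and the injected `score`, `audit`, and `render` callables are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List, Tuple


def run_pipeline(
    cache_key: str,
    repos: List[dict],
    cache: Dict[str, str],
    score: Callable[[dict], int],
    audit: Callable[[dict, int], str],
    render: Callable[[List[Tuple[dict, int, str]]], str],
) -> Tuple[str, bool]:
    """Return (report, served_from_cache). All names are illustrative."""
    # A valid cached report short-circuits every later step.
    if cache_key in cache:
        return cache[cache_key], True

    results = []
    for repo in repos:
        repo_score = score(repo)              # deterministic checks and scores
        repo_audit = audit(repo, repo_score)  # repo-level model analysis
        results.append((repo, repo_score, repo_audit))

    report = render(results)   # deterministic template rendering
    cache[cache_key] = report  # saved for the next identical request
    return report, False
```

Passing the stages in as callables keeps the sketch independent of any particular GitHub client or model SDK.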
Owns the Streamlit UI:
- landing page and auth controls
- repository loading
- scope selection
- progress and cache-status messages
- report rendering and download actions
Owns GitHub API access:
- public repo listing
- repo languages
- README fetch
- root-entry fetch
- default-branch HEAD commit fetch
- retry handling and low-concurrency parallel detail fetching
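One way to combine retry handling with low-concurrency parallel fetching is a small thread pool wrapped around a retrying call. The sketch below assumes a hypothetical `TransientError` type and a caller-supplied `fetch` function; it does not reflect the app's real GitHub client.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


class TransientError(Exception):
    """Hypothetical marker for retryable failures (rate limits, timeouts)."""


def with_retry(fetch: Callable[[str], object], name: str,
               attempts: int = 3, base_delay: float = 0.5) -> object:
    """Call fetch(name), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fetch(name)
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller surface the error
            time.sleep(base_delay * (2 ** attempt))


def fetch_details(names: List[str], fetch: Callable[[str], object],
                  max_workers: int = 3, base_delay: float = 0.5) -> Dict[str, object]:
    """Fetch details for several repositories with a small worker pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with names.
        results = pool.map(lambda n: with_retry(fetch, n, base_delay=base_delay), names)
        return dict(zip(names, results))
```

A small `max_workers` value keeps the request rate modest, which matters when the API enforces rate limits.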
Owns deterministic evaluation:
- category scores
- findings
- rule-based repository signals
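As an illustration of rule-based evaluation, the sketch below derives category scores and findings from fetched repository facts. The categories, thresholds, and the `RepoFacts` and `ScoreResult` types are assumptions made for this example; the app's real rules may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RepoFacts:
    """Hypothetical container for the facts fetched per repository."""
    readme: str
    root_files: List[str]


@dataclass
class ScoreResult:
    scores: Dict[str, int] = field(default_factory=dict)
    findings: List[str] = field(default_factory=list)


def score_repo(facts: RepoFacts) -> ScoreResult:
    """Apply fixed rules; the same input always yields the same output."""
    result = ScoreResult()
    names = {name.lower() for name in facts.root_files}

    documentation = 0
    if facts.readme.strip():
        documentation += 1
    else:
        result.findings.append("README is missing or empty")
    if len(facts.readme) >= 500:  # illustrative threshold
        documentation += 1
    result.scores["documentation"] = documentation

    hygiene = 0
    if "license" in names or "license.md" in names:
        hygiene += 1
    else:
        result.findings.append("No LICENSE file at the repository root")
    if ".gitignore" in names:
        hygiene += 1
    result.scores["hygiene"] = hygiene
    return result
```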
Owns model calls:
- per-repo analysis
- portfolio summary generation
The app still uses LLMs for content quality, but not for final freeform report formatting.
Owns orchestration:
- cache key generation
- cache lookup and save
- deterministic checks
- repo-analysis execution
- portfolio synthesis
- deterministic final report assembly
- fallback behavior when model steps fail
Owns persistent report cache:
- SQLite schema
- report serialization and deserialization
- cache save and load
- cache freshness matching
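A persistent report cache of this kind can be sketched with the standard-library `sqlite3` module. The table name, columns, and function names below are hypothetical, and reports are assumed to be JSON-serializable dictionaries.

```python
import json
import sqlite3
from typing import Optional

# Hypothetical schema: one row per cache fingerprint.
SCHEMA = """
CREATE TABLE IF NOT EXISTS report_cache (
    fingerprint TEXT PRIMARY KEY,
    payload     TEXT NOT NULL,
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the cache database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_report(conn: sqlite3.Connection, fingerprint: str, report: dict) -> None:
    """Serialize the report to JSON and upsert it under its fingerprint."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO report_cache (fingerprint, payload) VALUES (?, ?)",
            (fingerprint, json.dumps(report)),
        )


def load_report(conn: sqlite3.Connection, fingerprint: str) -> Optional[dict]:
    """Return the cached report, or None when the fingerprint is unknown."""
    row = conn.execute(
        "SELECT payload FROM report_cache WHERE fingerprint = ?", (fingerprint,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Because the fingerprint is the primary key, a freshness mismatch simply produces a different key and therefore a cache miss.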
Owns report export:
- Markdown bytes output
- HTML/CSS-to-PDF rendering via Playwright
- ReportLab fallback path
Defines shared typed application errors so GitHub, OAuth, OpenAI, and export failures can surface clearer user-facing messages.
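A shared error hierarchy along these lines might look like the following. The class names and messages are illustrative assumptions, not the app's actual definitions.

```python
class AppError(Exception):
    """Hypothetical base class carrying a safe, user-facing message."""
    user_message = "Something went wrong. Please try again."

    def __init__(self, detail: str = ""):
        super().__init__(detail or self.user_message)
        self.detail = detail  # technical detail for logs, not for the UI


class GitHubError(AppError):
    user_message = "GitHub could not be reached or returned an error."


class OAuthError(AppError):
    user_message = "Sign-in with GitHub failed."


class OpenAIError(AppError):
    user_message = "The analysis model is currently unavailable."


class ExportError(AppError):
    user_message = "The report could not be exported."


def to_user_message(exc: Exception) -> str:
    """Map any exception to text that is safe to show in the UI."""
    if isinstance(exc, AppError):
        return exc.user_message
    return AppError.user_message
```

The UI layer can then catch `AppError` once and display `to_user_message(exc)` without knowing which subsystem failed.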
The app uses two caching layers:
- a short-lived cache for repeated GitHub fetches within the running app session
- a SQLite-backed cache of finished reports, stored on the app host machine

Freshness of a cached report is validated using:
- selected analysis scope
- repository selection
- updated_at
- default branch
- default-branch HEAD commit SHA
- internal analysis cache version
This means unchanged repositories can reuse prior analysis, while modified repositories trigger a fresh run.
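The freshness inputs listed above could be folded into a single cache key by hashing them. This is a sketch under assumptions: the dictionary keys, `CACHE_VERSION`, and the choice of SHA-256 are illustrative and not confirmed by the source.

```python
import hashlib
import json
from typing import Iterable

CACHE_VERSION = "v1"  # hypothetical internal analysis cache version


def build_fingerprint(scope: str, repos: Iterable[dict],
                      version: str = CACHE_VERSION) -> str:
    """Hash the scope, per-repo freshness fields, and cache version."""
    # Sorting makes the key independent of selection order.
    entries = sorted(
        (r["full_name"], r["updated_at"], r["default_branch"], r["head_sha"])
        for r in repos
    )
    payload = json.dumps(
        {"scope": scope, "repos": entries, "version": version}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Any new commit changes `head_sha`, so the fingerprint changes and the old cache entry is no longer matched. Bumping the version invalidates every entry at once.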
The app uses public-only GitHub OAuth.
- default scope: read:user user:email
- OAuth is used for identity and public-profile convenience
- repository analysis remains limited to public repositories
This avoids the broad repo scope that GitHub OAuth Apps would require for private-repo access.
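For illustration, an authorization URL requesting only these scopes can be built as follows. The function name and parameters are hypothetical; the endpoint and query parameter names follow GitHub's documented OAuth web application flow.

```python
from typing import Sequence
from urllib.parse import urlencode

AUTHORIZE_URL = "https://github.com/login/oauth/authorize"
DEFAULT_SCOPES = ("read:user", "user:email")  # no "repo" scope is requested


def build_authorize_url(client_id: str, redirect_uri: str, state: str,
                        scopes: Sequence[str] = DEFAULT_SCOPES) -> str:
    """Build the URL the user is redirected to when signing in."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),  # GitHub expects space-delimited scopes
        "state": state,             # unguessable value checked on callback
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}"
```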
The report pipeline is intentionally split:
- deterministic scoring and findings
- LLM-generated repo and portfolio analysis
- deterministic final Markdown formatting
That design preserves content quality while preventing the section-shape drift that occurred when the entire final report was generated as freeform LLM Markdown.
Markdown is assembled deterministically in src/report_builder.py.
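The idea can be sketched as a fixed template that slots model-written text into a predetermined section layout. The function below is a hypothetical example and does not reproduce the contents of src/report_builder.py; the section names are assumptions.

```python
from typing import Dict, List


def render_repo_section(name: str, scores: Dict[str, int],
                        findings: List[str], analysis: str) -> str:
    """Render one repository section with a fixed heading structure."""
    lines = [f"## {name}", "", "### Scores", ""]
    # Sorted categories keep the output stable across runs.
    lines += [f"- {category}: {value}" for category, value in sorted(scores.items())]
    lines += ["", "### Findings", ""]
    lines += [f"- {finding}" for finding in findings] or ["- None"]
    # Model text fills only this slot; it cannot alter the headings.
    lines += ["", "### Analysis", "", analysis.strip(), ""]
    return "\n".join(lines)
```

Because the headings come from code rather than from the model, every report has the same section shape regardless of how the model phrases its analysis.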
PDF output is generated by:
- converting report content into structured HTML
- styling it with CSS
- rendering via Playwright/Chromium
If the primary PDF backend fails, the app falls back to ReportLab.
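The fallback behavior can be sketched as an ordered list of backends tried in turn. The backends are passed in as plain callables here, so the example makes no assumptions about the Playwright or ReportLab APIs; `render_pdf` and this local `ExportError` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple


class ExportError(Exception):
    """Hypothetical error raised when every PDF backend fails."""


def render_pdf(html: str,
               backends: Sequence[Tuple[str, Callable[[str], bytes]]]) -> Tuple[bytes, str]:
    """Try each (name, backend) pair in order; return (pdf_bytes, backend_name)."""
    errors: List[str] = []
    for name, backend in backends:
        try:
            return backend(html), name
        except Exception as exc:  # any backend failure triggers the next one
            errors.append(f"{name}: {exc}")
    raise ExportError("All PDF backends failed: " + "; ".join(errors))
```

In the app's terms, the first entry would wrap the Chromium-based renderer and the second the ReportLab path.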
The app now treats failures explicitly:
- GitHub API failures surface typed messages
- rate-limit and transient request issues use retry handling where appropriate
- repo-analysis and portfolio-summary model failures fall back to deterministic summaries with warnings
- export failures are surfaced as dedicated export errors
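The model-failure fallback can be illustrated with a wrapper that substitutes a summary built from the deterministic scores and records a warning. The names and the wording of the fallback text are assumptions for this sketch.

```python
from typing import Callable, Dict, List, Tuple


def analyze_with_fallback(
    repo_name: str,
    scores: Dict[str, int],
    call_model: Callable[[str, Dict[str, int]], str],
) -> Tuple[str, List[str]]:
    """Return (analysis_text, warnings); never raises on model failure."""
    warnings: List[str] = []
    try:
        text = call_model(repo_name, scores)
    except Exception:
        # The report still renders, but the reader is told why it is thinner.
        warnings.append(
            f"Model analysis unavailable for {repo_name}; showing deterministic summary."
        )
        total = sum(scores.values())
        text = f"{repo_name} scored {total} across {len(scores)} categories."
    return text, warnings
```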
The current architecture is strongest for:
- local use
- single-instance hosting
SQLite is acceptable in that model because the cache file lives on the same server that runs Streamlit. If the app later moves to multi-instance or ephemeral hosting, the persistence layer may need to move to a shared database or cache service.