Architecture Overview

This document explains the current runtime architecture of the GitHub Portfolio Reviewer Agent.

System Goal

The app evaluates a single public GitHub repository, or a selected set of public repositories, and produces a recruiter-facing report with:

  • deterministic repository scoring
  • repo-level LLM analysis
  • portfolio-level synthesis when multiple repositories are selected
  • exportable Markdown and PDF output

High-Level Flow

  1. A user enters a public GitHub username or signs in with GitHub OAuth to analyze their own public profile.
  2. The app loads the user's public repositories from the GitHub API.
  3. The user chooses an analysis scope:
    • single repository
    • selected repositories
    • portfolio slice
  4. The app fetches repository facts:
    • repo metadata
    • languages
    • README
    • root-level files
    • default-branch HEAD commit SHA
  5. The app builds a cache fingerprint from the selected scope and repository freshness metadata.
  6. If a valid cached report exists, it is returned immediately.
  7. Otherwise, deterministic checks and scores are computed for each repository.
  8. The app generates:
    • repo-level LLM audits for each repository
    • a portfolio summary for multi-repo analysis
  9. The final report is rendered through deterministic Markdown templates.
  10. The result is saved in persistent SQLite cache and shown in the Streamlit UI.
  11. The user can export Markdown or PDF.

Main Modules

app.py

Owns the Streamlit UI:

  • landing page and auth controls
  • repository loading
  • scope selection
  • progress and cache-status messages
  • report rendering and download actions

src/github_client.py

Owns GitHub API access:

  • public repo listing
  • repo languages
  • README fetch
  • root-entry fetch
  • default-branch HEAD commit fetch
  • retry handling and low-concurrency parallel detail fetching
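Retry handling combined with low-concurrency parallel fetching could look roughly like the sketch below. The names `TransientError`, `with_retry`, and `fetch_details`, the backoff values, and the worker count are all assumptions made for illustration; the real client may be structured differently.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 5xx, rate limits)."""


def with_retry(call: Callable[[], T], attempts: int = 3, base_delay: float = 0.5) -> T:
    """Run `call`, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the failure
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


def fetch_details(
    repo_names: Iterable[str],
    fetch_one: Callable[[str], dict],
    max_workers: int = 4,  # assumed value; kept small to stay within API rate limits
) -> Dict[str, dict]:
    """Fetch per-repository details in parallel with a small worker pool."""
    names = list(repo_names)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda name: with_retry(lambda: fetch_one(name)), names)
        return dict(zip(names, results))
```

Keeping the pool small trades some latency for a lower chance of hitting GitHub's rate limits when many repositories are selected.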

src/repo_checks.py

Owns deterministic evaluation:

  • category scores
  • findings
  • rule-based repository signals
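As a rough illustration of what deterministic evaluation means here, the sketch below derives category scores and findings purely from fetched facts. The `RepoFacts` and `CheckResult` types, the two categories, and every weight are hypothetical; the project's actual rubric is not described in this document.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RepoFacts:
    """Hypothetical container for the facts fetched per repository."""
    description: str = ""
    readme: str = ""
    root_files: List[str] = field(default_factory=list)


@dataclass
class CheckResult:
    scores: Dict[str, int] = field(default_factory=dict)
    findings: List[str] = field(default_factory=list)


def run_checks(facts: RepoFacts) -> CheckResult:
    """Score a repository from fetched facts only; no model call is involved."""
    result = CheckResult()
    names = {name.lower() for name in facts.root_files}

    # Documentation: illustrative weights, not the project's real rubric.
    documentation = 0
    if len(facts.readme) >= 500:
        documentation += 60
    elif facts.readme:
        documentation += 30
        result.findings.append("README is short.")
    else:
        result.findings.append("README is missing.")
    if facts.description:
        documentation += 40
    else:
        result.findings.append("Repository description is missing.")
    result.scores["documentation"] = documentation

    # Hygiene: presence of common root-level files.
    hygiene = 0
    if any(name.startswith("license") for name in names):
        hygiene += 50
    else:
        result.findings.append("No license file at the repository root.")
    if ".gitignore" in names:
        hygiene += 50
    else:
        result.findings.append("No .gitignore at the repository root.")
    result.scores["hygiene"] = hygiene
    return result
```

Because the same facts always yield the same scores, results of this kind are safe to cache and to compare across runs.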

src/openai_service.py

Owns model calls:

  • per-repo analysis
  • portfolio summary generation

LLMs generate the analysis content, but the final report layout comes from deterministic templates rather than freeform model output.

src/report_builder.py

Owns orchestration:

  • cache key generation
  • cache lookup and save
  • deterministic checks
  • repo-analysis execution
  • portfolio synthesis
  • deterministic final report assembly
  • fallback behavior when model steps fail
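The orchestration responsibilities listed above can be sketched as a single function. This is a simplified illustration, not the module's actual code: `build_report`, its parameters, the dict-based cache, and the fallback wording are assumptions, and portfolio synthesis is left out for brevity.

```python
from typing import Callable, Dict, List


def build_report(
    repos: List[dict],
    cache: Dict[str, dict],                      # stand-in for the persistent cache
    cache_key: str,
    run_checks: Callable[[dict], dict],          # deterministic step
    analyze_repo: Callable[[dict, dict], str],   # model step; may raise
) -> dict:
    """Return a cached report if present; otherwise compute, cache, and return one."""
    cached = cache.get(cache_key)
    if cached is not None:
        return {**cached, "from_cache": True}

    sections, warnings = [], []
    for repo in repos:
        checks = run_checks(repo)
        try:
            analysis = analyze_repo(repo, checks)
        except Exception as exc:
            # Model failure: degrade to a deterministic summary and record a warning.
            analysis = f"Deterministic summary: score {checks['score']}."
            warnings.append(f"{repo['name']}: model analysis failed ({exc}); used fallback.")
        sections.append({"name": repo["name"], "checks": checks, "analysis": analysis})

    report = {"sections": sections, "warnings": warnings}
    cache[cache_key] = report
    return {**report, "from_cache": False}
```

Passing the check and model steps in as callables keeps the orchestration easy to exercise with stubs, which is one plausible reason to isolate it in its own module.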

src/analysis_store.py

Owns persistent report cache:

  • SQLite schema
  • report serialization and deserialization
  • cache save and load
  • cache freshness matching
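A minimal sketch of such a store, using only the standard-library `sqlite3` and `json` modules, is shown below. The table name, column names, and function names are assumptions; the real schema is not given in this document.

```python
import json
import sqlite3
from typing import Optional

# Hypothetical schema: one row per cache key, report stored as JSON text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS reports (
    cache_key  TEXT PRIMARY KEY,
    payload    TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the cache database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_report(conn: sqlite3.Connection, cache_key: str, report: dict) -> None:
    """Serialize the report as JSON and upsert it under its cache key."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO reports (cache_key, payload) VALUES (?, ?)",
            (cache_key, json.dumps(report)),
        )


def load_report(conn: sqlite3.Connection, cache_key: str) -> Optional[dict]:
    """Return the cached report for the key, or None on a cache miss."""
    row = conn.execute(
        "SELECT payload FROM reports WHERE cache_key = ?", (cache_key,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

If the cache key already encodes repository freshness, a lookup by key doubles as the freshness check: a changed repository produces a different key and therefore a miss.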

src/exporters.py

Owns report export:

  • Markdown bytes output
  • HTML/CSS-to-PDF rendering via Playwright
  • ReportLab fallback path

src/errors.py

Defines shared typed application errors so GitHub, OAuth, OpenAI, and export failures can surface clearer user-facing messages.
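One common way to structure typed errors like these is a small hierarchy in which each class carries a user-facing message. The class names and message texts below are illustrative assumptions, not the module's actual definitions.

```python
class AppError(Exception):
    """Hypothetical base class; subclasses override the user-facing message."""
    user_message = "Something went wrong. Please try again."

    def __init__(self, detail: str = ""):
        super().__init__(detail or self.user_message)
        self.detail = detail  # technical detail for logs, not shown to users


class GitHubError(AppError):
    user_message = "GitHub could not be reached or rejected the request."


class OAuthError(AppError):
    user_message = "GitHub sign-in failed."


class ModelError(AppError):
    user_message = "The analysis model did not return a usable result."


class ExportError(AppError):
    user_message = "The report could not be exported."


def to_user_message(exc: Exception) -> str:
    """Map any exception to text that is safe to show in the UI."""
    if isinstance(exc, AppError):
        return exc.user_message
    return AppError.user_message
```

The UI layer can then catch `AppError` once and show `to_user_message(exc)`, while logging `exc.detail` separately.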

Caching Design

The app uses two caching layers:

1. Streamlit data cache

Short-lived cache for repeated GitHub fetches within the running app session.

2. Persistent analysis cache

SQLite-backed cache stored on the app host machine.

Freshness is validated using:

  • selected analysis scope
  • repository selection
  • updated_at
  • default branch
  • default-branch HEAD commit SHA
  • internal analysis cache version

This means unchanged repositories can reuse prior analysis, while modified repositories trigger a fresh run.
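A cache fingerprint over these fields could be computed as in the sketch below. The function name, the dictionary keys, and the `CACHE_VERSION` value are assumptions; only the list of freshness inputs comes from this document.

```python
import hashlib
import json
from typing import List

CACHE_VERSION = 1  # assumed: bumped whenever the analysis logic changes


def cache_fingerprint(scope: str, repos: List[dict], version: int = CACHE_VERSION) -> str:
    """Hash the analysis scope plus per-repository freshness metadata."""
    # Sort so the fingerprint does not depend on selection order.
    freshness = sorted(
        (r["full_name"], r["updated_at"], r["default_branch"], r["head_sha"])
        for r in repos
    )
    payload = json.dumps(
        {"scope": scope, "version": version, "repos": freshness},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Any change to a repository's HEAD commit, `updated_at`, default branch, the selection itself, or the cache version yields a different digest, so stale reports are never matched.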

Auth Model

The app uses public-only GitHub OAuth.

  • default scope: read:user user:email
  • OAuth is used for identity and public-profile convenience
  • repository analysis remains limited to public repositories

This avoids the broad repo scope that GitHub OAuth Apps would require for private-repo access.
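For illustration, an authorization URL requesting only these scopes might be built as follows. The endpoint is GitHub's standard OAuth authorize URL; the function name and its parameters are assumptions, and the caller is expected to store the returned `state` value and verify it on the callback.

```python
import secrets
from typing import Sequence, Tuple
from urllib.parse import urlencode

AUTHORIZE_URL = "https://github.com/login/oauth/authorize"
DEFAULT_SCOPES = ("read:user", "user:email")  # no `repo` scope: public data only


def build_authorize_url(
    client_id: str,
    redirect_uri: str,
    scopes: Sequence[str] = DEFAULT_SCOPES,
) -> Tuple[str, str]:
    """Return (url, state); persist `state` and compare it on the OAuth callback."""
    state = secrets.token_urlsafe(16)  # random value to protect against CSRF
    query = urlencode(
        {
            "client_id": client_id,
            "redirect_uri": redirect_uri,
            "scope": " ".join(scopes),  # GitHub expects a space-delimited list
            "state": state,
        }
    )
    return f"{AUTHORIZE_URL}?{query}", state
```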

Report Generation Model

The report pipeline is intentionally split:

  • deterministic scoring and findings
  • LLM-generated repo and portfolio analysis
  • deterministic final Markdown formatting

That design preserves content quality while preventing the section-shape drift that occurred when the entire final report was generated as freeform LLM Markdown.
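The idea of fixing the report shape in code while letting the model supply only the prose can be sketched as below. The section headings, the placeholder text, and the function name are hypothetical.

```python
from typing import Dict

# Assumed headings; the template, not the model, decides which sections exist.
SECTION_ORDER = ("Summary", "Strengths", "Risks")


def render_repo_section(name: str, scores: Dict[str, int], analysis: Dict[str, str]) -> str:
    """Render one repository section with a fixed layout."""
    lines = [f"## {name}", "", "| Category | Score |", "| --- | --- |"]
    for category in sorted(scores):
        lines.append(f"| {category} | {scores[category]} |")
    for heading in SECTION_ORDER:
        # Missing model output becomes a placeholder; unknown keys are ignored.
        body = analysis.get(heading, "").strip() or "_Not available._"
        lines += ["", f"### {heading}", "", body]
    return "\n".join(lines) + "\n"
```

Because headings and their order live in the template, a model response with missing or extra sections cannot change the structure of the final document.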

Export Pipeline

Markdown is assembled deterministically in src/report_builder.py.

PDF output is generated by:

  1. converting report content into structured HTML
  2. styling it with CSS
  3. rendering via Playwright/Chromium

If the primary PDF backend fails, the app falls back to ReportLab.
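The fallback behavior can be sketched as a loop over an ordered list of backends. The backends are passed in as plain callables here so the example stays independent of Playwright and ReportLab; `render_pdf` and this local `ExportError` are illustrative names rather than the project's actual API.

```python
from typing import Callable, List, Sequence, Tuple


class ExportError(Exception):
    """Raised when no PDF backend could produce output (hypothetical name)."""


def render_pdf(
    content: str,
    backends: Sequence[Tuple[str, Callable[[str], bytes]]],
) -> Tuple[str, bytes]:
    """Try each backend in order; return (backend_name, pdf_bytes) from the first success."""
    failures: List[str] = []
    for name, backend in backends:
        try:
            return name, backend(content)
        except Exception as exc:
            failures.append(f"{name}: {exc}")  # remember why this backend failed
    raise ExportError("all PDF backends failed: " + "; ".join(failures))
```

In the app, the list would presumably hold the Playwright renderer first and the ReportLab renderer second, and the returned backend name lets the UI note when the fallback was used.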

Failure Handling

The app handles failures explicitly:

  • GitHub API failures surface typed messages
  • rate-limit and transient request issues use retry handling where appropriate
  • repo-analysis and portfolio-summary model failures fall back to deterministic summaries with warnings
  • export failures are surfaced as dedicated export errors

Current Hosting Assumption

The current architecture is best suited to:

  • local use
  • single-instance hosting

SQLite is acceptable in that model because the cache file lives on the same server that runs Streamlit. If the app later moves to multi-instance or ephemeral hosting, the persistence layer may need to move to a shared database or cache service.