Architecture Overview

This document explains the current runtime architecture of the GitHub Portfolio Reviewer Agent.

System Goal

The app evaluates one public GitHub repository or a selected public set of repositories and produces a recruiter-facing report with:

- deterministic repository scoring
- repo-level LLM analysis
- portfolio-level synthesis when multiple repositories are selected
- exportable Markdown and PDF output

High-Level Flow

1. A user enters a public GitHub username or signs in with GitHub OAuth to analyze their own public profile.
2. The app loads the user's public repositories from the GitHub API.
3. The user chooses an analysis scope:
   - single repository
   - selected repositories
   - portfolio slice
4. The app fetches repository facts:
   - repo metadata
   - languages
   - README
   - root-level files
   - default-branch HEAD commit SHA
5. The app builds a cache fingerprint from the selected scope and repository freshness metadata.
6. If a valid cached report exists, it is returned immediately.
7. Otherwise, deterministic checks and scores are computed for each repository.
8. The app generates:
   - repo-level LLM audits for each repository
   - a portfolio summary for multi-repo analysis
9. The final report is rendered through deterministic Markdown templates.
10. The result is saved in the persistent SQLite cache and shown in the Streamlit UI.
11. The user can export Markdown or PDF.

Main Modules

`app.py`

Owns the Streamlit UI:

- landing page and auth controls
- repository loading
- scope selection
- progress and cache-status messages
- report rendering and download actions

`src/github_client.py`

Owns GitHub API access:

- public repo listing
- repo languages
- README fetch
- root-entry fetch
- default-branch HEAD commit fetch
- retry handling and low-concurrency parallel detail fetching
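
The retry and low-concurrency fetching behavior could look roughly like the sketch below. It uses only the standard library; `TransientError`, `fetch_with_retry`, `fetch_details`, the backoff values, and the worker count of 4 are illustrative assumptions, not the names or settings used in `src/github_client.py`.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, rate limits)."""


def fetch_with_retry(fetch: Callable[[str], dict], name: str,
                     attempts: int = 3, base_delay: float = 0.5) -> dict:
    """Call fetch(name), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fetch(name)
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller surface the error
            time.sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")


def fetch_details(fetch: Callable[[str], dict], names: List[str],
                  max_workers: int = 4, base_delay: float = 0.5) -> Dict[str, dict]:
    """Fetch details for several repositories with a small worker pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(
            lambda n: fetch_with_retry(fetch, n, base_delay=base_delay), names
        )
        # Consume the lazy iterator while the pool is still open.
        return dict(zip(names, results))
```

A small pool keeps the request rate modest, which matters more for rate-limited APIs than raw throughput does.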

`src/repo_checks.py`

Owns deterministic evaluation:

- category scores
- findings
- rule-based repository signals
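
To make "deterministic" concrete, here is a minimal sketch of rule-based scoring from fetched repository facts. The categories, thresholds, point values, and the names `CheckResult` and `run_checks` are invented for illustration; the real rules in `src/repo_checks.py` may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CheckResult:
    scores: Dict[str, int] = field(default_factory=dict)
    findings: List[str] = field(default_factory=list)


def run_checks(root_files: List[str], readme: Optional[str]) -> CheckResult:
    """Score a repository from facts alone; the same input always gives the same output."""
    names = {f.lower() for f in root_files}
    result = CheckResult()

    # Documentation: hypothetical rule based on README presence and length.
    if not readme:
        result.scores["documentation"] = 0
        result.findings.append("No README found.")
    elif len(readme) < 200:
        result.scores["documentation"] = 5
        result.findings.append("README is very short.")
    else:
        result.scores["documentation"] = 10

    # Hygiene: hypothetical rule based on common root-level files.
    expected = ["license", ".gitignore"]
    present = [e for e in expected if e in names]
    result.scores["hygiene"] = 5 * len(present)
    for missing in (e for e in expected if e not in names):
        result.findings.append(f"Missing root file: {missing}")
    return result
```

Because no model call is involved, these scores can be recomputed cheaply and compared across runs.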

`src/openai_service.py`

Owns model calls:

- per-repo analysis
- portfolio summary generation

LLMs still produce the analysis content, but the final report layout comes from deterministic templates rather than freeform model output.

`src/report_builder.py`

Owns orchestration:

- cache key generation
- cache lookup and save
- deterministic checks
- repo-analysis execution
- portfolio synthesis
- deterministic final report assembly
- fallback behavior when model steps fail

`src/analysis_store.py`

Owns persistent report cache:

- SQLite schema
- report serialization and deserialization
- cache save and load
- cache freshness matching
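
A minimal sketch of a SQLite-backed report cache along these lines is shown below. The table name, column names, JSON serialization, and function names are assumptions for illustration; the actual schema in `src/analysis_store.py` is not shown in this document.

```python
import json
import sqlite3
from typing import Optional

# Hypothetical schema: one row per cache fingerprint.
SCHEMA = """
CREATE TABLE IF NOT EXISTS reports (
    fingerprint TEXT PRIMARY KEY,
    payload     TEXT NOT NULL,
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) the cache database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_report(conn: sqlite3.Connection, fingerprint: str, report: dict) -> None:
    """Serialize the report as JSON and upsert it under its fingerprint."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO reports (fingerprint, payload) VALUES (?, ?)",
            (fingerprint, json.dumps(report)),
        )


def load_report(conn: sqlite3.Connection, fingerprint: str) -> Optional[dict]:
    """Return the cached report, or None when the fingerprint is unknown."""
    row = conn.execute(
        "SELECT payload FROM reports WHERE fingerprint = ?", (fingerprint,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Keying rows by fingerprint means freshness matching reduces to a primary-key lookup: a changed repository yields a different fingerprint and therefore a cache miss.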

`src/exporters.py`

Owns report export:

- Markdown bytes output
- HTML/CSS-to-PDF rendering via Playwright
- ReportLab fallback path

`src/errors.py`

Defines shared typed application errors so GitHub, OAuth, OpenAI, and export failures can surface clearer user-facing messages.
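
One way such a typed error hierarchy could be laid out is sketched below. The class names and the `describe` helper are hypothetical and chosen to mirror the failure sources listed above; they are not taken from `src/errors.py`.

```python
class AppError(Exception):
    """Hypothetical base class; carries a message that is safe to show in the UI."""

    def __init__(self, user_message: str):
        super().__init__(user_message)
        self.user_message = user_message


class GitHubError(AppError):
    """GitHub API failure (for example, not found or rate limited)."""


class OAuthError(AppError):
    """Sign-in or token exchange failure."""


class OpenAIError(AppError):
    """Model call failure."""


class ExportError(AppError):
    """Markdown or PDF export failure."""


def describe(exc: Exception) -> str:
    """Map any exception to a user-facing message without leaking internals."""
    if isinstance(exc, AppError):
        return f"{type(exc).__name__}: {exc.user_message}"
    return "Unexpected error. Please try again."
```

A shared base class lets the UI catch one type and still distinguish the source of the failure.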

Caching Design

The app uses two caching layers:

1. Streamlit data cache

Short-lived cache for repeated GitHub fetches within the running app session.

2. Persistent analysis cache

SQLite-backed cache stored on the app host machine.

Freshness is validated using:

- selected analysis scope
- repository selection
- `updated_at`
- default branch
- default-branch HEAD commit SHA
- internal analysis cache version

This means unchanged repositories can reuse prior analysis, while modified repositories trigger a fresh run.
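
As a rough illustration, the freshness inputs above can be folded into a single fingerprint by hashing a canonical JSON payload. The field names (`full_name`, `head_sha`), the version value, and the choice of SHA-256 are assumptions; the document does not specify how the real cache key is computed.

```python
import hashlib
import json
from typing import Dict, List

ANALYSIS_CACHE_VERSION = "1"  # hypothetical value; bump to invalidate old reports


def build_fingerprint(scope: str, repos: List[Dict[str, str]]) -> str:
    """Hash the scope, the selected repositories, and their freshness metadata."""
    # Sort so that selection order does not change the key.
    freshness = sorted(
        (r["full_name"], r["updated_at"], r["default_branch"], r["head_sha"])
        for r in repos
    )
    payload = json.dumps(
        {"version": ANALYSIS_CACHE_VERSION, "scope": scope, "repos": freshness},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Any new commit changes the HEAD SHA, so the fingerprint changes and the cached report is no longer matched.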

Auth Model

The app uses public-only GitHub OAuth.

- default scope: `read:user user:email`
- OAuth is used for identity and public-profile convenience
- repository analysis remains limited to public repositories

This avoids the broad `repo` scope that GitHub OAuth Apps would require for private-repo access.
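
For illustration, a sign-in link requesting only these scopes could be built as follows. The endpoint and query parameters are those of GitHub's OAuth web flow; the function names and the client ID and redirect URI values passed in are placeholders, and the app's actual OAuth code is not shown in this document.

```python
import secrets
from urllib.parse import urlencode

AUTHORIZE_URL = "https://github.com/login/oauth/authorize"
DEFAULT_SCOPE = "read:user user:email"  # no `repo` scope, so no private-repo access


def new_state() -> str:
    """Random value to store in the session and verify on the OAuth callback."""
    return secrets.token_urlsafe(16)


def build_authorize_url(client_id: str, redirect_uri: str, state: str,
                        scope: str = DEFAULT_SCOPE) -> str:
    """Build the URL the user is sent to in order to approve the sign-in."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}"
```

The `state` value should be checked when GitHub redirects back, which guards against cross-site request forgery.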

Report Generation Model

The report pipeline is intentionally split:

- deterministic scoring and findings
- LLM-generated repo and portfolio analysis
- deterministic final Markdown formatting

That design preserves content quality while preventing the section-shape drift that occurred when the entire final report was generated as freeform LLM Markdown.
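
The split can be pictured as a template function that fixes the section layout and only slots model text into one place. This is a sketch under assumptions: the section titles, heading levels, and `render_repo_section` are invented here and do not reflect the actual templates.

```python
from typing import Dict, List


def render_repo_section(name: str, scores: Dict[str, int],
                        findings: List[str], analysis: str) -> str:
    """Render one repository section; headings and order never depend on the model."""
    lines = [f"## {name}", "", "### Scores", ""]
    lines += [f"- {category}: {value}" for category, value in sorted(scores.items())]
    lines += ["", "### Findings", ""]
    lines += [f"- {f}" for f in findings] or ["- None"]
    # The LLM contributes only this body text, never the structure around it.
    lines += ["", "### Analysis", "", analysis.strip()]
    return "\n".join(lines) + "\n"
```

Since the model output is confined to a single slot, a rambling or oddly formatted response cannot rearrange or drop report sections.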

Export Pipeline

Markdown is assembled deterministically in `src/report_builder.py`.

PDF output is generated by:

1. converting report content into structured HTML
2. styling it with CSS
3. rendering via Playwright/Chromium

If the primary PDF backend fails, the app falls back to ReportLab.
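
The fallback can be sketched as an ordered list of backends tried in turn. To keep the example self-contained, the backends are passed in as plain callables instead of calling Playwright or ReportLab directly; `render_pdf`, `ExportError`, and the backend labels are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple


class ExportError(Exception):
    """Hypothetical error raised when no backend can produce a PDF."""


# A backend is a (label, render function) pair; the function returns PDF bytes.
Backend = Tuple[str, Callable[[str], bytes]]


def render_pdf(content: str, backends: Sequence[Backend]) -> Tuple[str, bytes]:
    """Try each backend in order and return the first successful result."""
    failures: List[str] = []
    for name, render in backends:
        try:
            return name, render(content)
        except Exception as exc:  # any backend failure triggers the next one
            failures.append(f"{name}: {exc}")
    raise ExportError("All PDF backends failed: " + "; ".join(failures))
```

In use, the list would hold the Playwright renderer first and the ReportLab renderer second, and the returned label tells the UI which path produced the file.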

Failure Handling

The app now treats failures explicitly:

- GitHub API failures surface typed messages
- rate-limit and transient request issues use retry handling where appropriate
- repo-analysis and portfolio-summary model failures fall back to deterministic summaries with warnings
- export failures are surfaced as dedicated export errors
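
The model-failure fallback might look like the sketch below: when the model call raises, a summary is built from the deterministic scores and a warning is returned alongside it. The function name, the summary wording, and the warning text are illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple


def analyze_with_fallback(repo: str, scores: Dict[str, int],
                          call_model: Callable[[str], str]) -> Tuple[str, List[str]]:
    """Return (analysis text, warnings); never raise because the model failed."""
    try:
        return call_model(repo), []
    except Exception as exc:
        # Build a plain summary from the scores that were already computed.
        ordered = ", ".join(f"{k} {v}" for k, v in sorted(scores.items()))
        summary = f"Automated summary for {repo}: {ordered}."
        warning = (
            f"Model analysis unavailable for {repo} ({exc}); "
            "used deterministic summary."
        )
        return summary, [warning]
```

Returning warnings as data, instead of logging them, lets the report and the UI show the reader that a section was produced without the model.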

Current Hosting Assumption

The current architecture is strongest for:

- local use
- single-instance hosting

SQLite is acceptable in that model because the cache file lives on the same server that runs Streamlit. If the app later moves to multi-instance or ephemeral hosting, the persistence layer may need to move to a shared database or cache service.
