A deterministic, policy-driven system to audit, score, and optimize a GitHub portfolio.
When reviewing GitHub profiles, one recurring issue appears: there is no objective, structured way to evaluate the quality of a portfolio.
Most repositories:
- vary widely in quality
- lack clear positioning
- overlap in purpose
- fail to communicate technical depth
As a result, even strong candidates often present portfolios that underperform.
This project was built to solve that problem with a deterministic and explainable approach.
GitHub Portfolio Auditor is a system that:
- Collects repositories from GitHub
- Scans each repository (structure, documentation, testing, CI, signals)
- Scores them using a configurable policy
- Generates deterministic reviews (no LLM dependency)
- Ranks repositories for portfolio visibility
- Recommends which repositories to:
- feature
- improve
- merge
- archive
- make private
- Provides a dashboard to explore and optimize decisions
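The "deterministic reviews" point can be illustrated with a small sketch: a fixed rule table maps scan signals to canned review lines, so identical inputs always yield identical text. The names here (`RepoSignals`, `generate_review`) are illustrative, not the project's actual API.

```python
from dataclasses import dataclass

# Hypothetical signal set; the real scanners collect more dimensions.
@dataclass(frozen=True)
class RepoSignals:
    has_readme: bool
    has_tests: bool
    has_ci: bool

# Fixed rule table: (predicate, review line). No randomness, no LLM:
# the same signals always produce the same review text.
RULES = [
    (lambda s: not s.has_readme, "Add a README describing purpose and usage."),
    (lambda s: not s.has_tests, "Add a test suite to signal reliability."),
    (lambda s: not s.has_ci, "Add a CI workflow to automate checks."),
]

def generate_review(signals: RepoSignals) -> list[str]:
    """Return deterministic review lines for the given signals."""
    return [msg for predicate, msg in RULES if predicate(signals)]
```

Because the rule table is data, the review layer stays explainable: every emitted line can be traced back to one predicate.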
A typical workflow:
- Run an audit on a GitHub account
- Get a scored and ranked list of repositories
- Identify weak or redundant projects
- Apply prioritized improvements
- Re-run the audit and measure impact
The system is structured as follows:
```
src/
  portfolio_auditor/
    collectors/   # GitHub API
    scanners/     # repository analysis
    scoring/      # policy-driven scoring
    reviewing/    # deterministic review
    ranking/      # ranking and selection
    dashboard/    # Streamlit UI
    models/       # domain models
configs/
  scoring.yaml              # score dimension weights
  action_impact_rules.yaml  # optimizer ROI rules (effort, category per action)
  portfolio_rules.yaml      # portfolio decision thresholds
tests/
  unit/
  integration/
  golden/   # snapshot tests
  smoke/
docs/
  scoring_methodology.md
  portfolio_decision_rules.md
```
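As a sketch of what configs/scoring.yaml might look like (the actual keys and weights may differ), the scoring policy externalizes per-dimension weights so they can be tuned without touching code:

```yaml
# Hypothetical shape of configs/scoring.yaml; real keys and values may differ.
weights:
  documentation: 0.25
  testing: 0.25
  ci: 0.15
  structure: 0.20
  signals: 0.15
```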
- **Deterministic first:** all outputs are reproducible and explainable.
- **Policy-driven:** scoring rules and optimizer effort estimates are externalized in YAML; tuning them requires no code change.
- **Separation of concerns:** scanning, scoring, reviewing, and ranking are distinct layers.
- **Portfolio-oriented:** this is not a generic repo scorer but a portfolio optimization tool.
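Policy-driven scoring can be sketched as a weighted sum over dimension scores, with the weight table coming from the YAML policy rather than being hard-coded. The dimension names and weights below are illustrative assumptions, not the project's real policy:

```python
# Illustrative weight table: in the real system this would be loaded from
# configs/scoring.yaml (e.g. via yaml.safe_load); hard-coded here to keep
# the sketch self-contained.
POLICY_WEIGHTS = {
    "documentation": 0.25,
    "testing": 0.25,
    "ci": 0.15,
    "structure": 0.20,
    "signals": 0.15,
}

def score_repo(dimension_scores: dict[str, float],
               weights: dict[str, float] = POLICY_WEIGHTS) -> float:
    """Weighted sum of per-dimension scores (each in 0..1), scaled to 0..100."""
    total = sum(weights[dim] * dimension_scores.get(dim, 0.0)
                for dim in weights)
    return round(100 * total, 1)
```

Because the weights live in configuration, re-tuning the policy means editing YAML and re-running the audit, which keeps results reproducible across runs.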
Requirements:
- Python 3.11 or 3.12
- Git
```shell
# 1. Clone the repository
git clone https://github.com/<your-username>/github-portfolio-auditor.git
cd github-portfolio-auditor

# 2. Create and activate a virtual environment
python -m venv .venv

# Linux / macOS
source .venv/bin/activate

# Windows (PowerShell)
.venv\Scripts\Activate.ps1

# 3. Install the package and dev dependencies
pip install -e .[dev]

# 4. Create a .env file with your GitHub token
cp .env.example .env
# Then edit .env and set GITHUB_TOKEN=your_token_here
```

The .env file supports the following variables:

```shell
GITHUB_TOKEN=your_token_here            # required for private repos and higher rate limits
GITHUB_OWNER=your_github_username       # optional — default owner for CLI and dashboard
GITHUB_EXCLUDED_REPO_NAMES=repo1,repo2  # optional — comma-separated repos to exclude
WORKSPACE_DIR=data                      # optional — root of the data directory tree
```

Scoring weights are in configs/scoring.yaml.
Optimizer action rules (effort estimates, categories) are in configs/action_impact_rules.yaml.
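The optimizer's ROI logic can be sketched as expected score gain divided by the effort declared per action. The rule shape and action names below are guesses for illustration, not the real schema of action_impact_rules.yaml:

```python
# Hypothetical action rules as they might look after loading
# configs/action_impact_rules.yaml; the real schema may differ.
ACTION_RULES = {
    "add_readme": {"effort": 1.0, "category": "documentation"},
    "add_tests": {"effort": 3.0, "category": "testing"},
    "add_ci": {"effort": 2.0, "category": "ci"},
}

def rank_actions(expected_gains: dict[str, float]) -> list[tuple[str, float]]:
    """Rank candidate actions by ROI = expected score gain / declared effort."""
    roi = {
        action: expected_gains.get(action, 0.0) / rule["effort"]
        for action, rule in ACTION_RULES.items()
    }
    return sorted(roi.items(), key=lambda kv: kv[1], reverse=True)
```

Keeping effort estimates in YAML means the prioritization can be recalibrated (e.g. if adding tests turns out cheaper than assumed) without touching the optimizer code.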
```shell
portfolio-auditor full-run --owner <github-username>
```

Use `--refresh-clones` to force re-cloning even if local clones already exist.
```shell
# Linux / macOS
PYTHONPATH=src streamlit run src/portfolio_auditor/dashboard/app.py

# Windows (PowerShell)
$env:PYTHONPATH = "src"
streamlit run src/portfolio_auditor/dashboard/app.py
```

The dashboard reads from data/processed/<owner>/ — run the audit first.
```shell
# Full test suite with coverage
pytest --cov=portfolio_auditor --cov-report=term-missing --cov-fail-under=72

# By category
pytest tests/unit -q
pytest tests/golden -q
pytest tests/smoke -q
```

```shell
# Formatting check
ruff format --check .

# Lint
ruff check .

# Auto-fix lint issues
ruff check --fix .
```

This project enables:
- objective evaluation of a GitHub portfolio
- structured prioritization of improvements
- reduction of redundancy between projects
- stronger technical signaling
- measurable portfolio progression
It transforms a portfolio from a collection of repositories into a curated, intentional system.
Current limitations:
- heuristic-based scoring
- no empirical calibration yet
- redundancy detection is approximate
- dashboard depends on generated artifacts
See docs/roadmap.md.
Mathieu
Data / BI / Analytics Engineering



