| 1 | +# CONTEXT: GitHub Portfolio Auditor |
| 2 | + |
| 3 | +## Why this project? |
| 4 | + |
| 5 | +This project was born from a concrete need during my apprenticeship search in March 2026: |
| 6 | +- I wanted to **build a portfolio** to showcase my projects. |
| 7 | +- But I didn't know **which projects to highlight**, nor **which ones to improve or archive**. |
| 8 | +- No existing tool let me **audit a GitHub portfolio objectively and reproducibly**. |
| 9 | + |
| 10 | +The goal was to create a tool that: |
| 11 | +1. Analyzes each repository (structure, documentation, tests, CI, etc.). |
| 12 | +2. Scores projects based on configurable criteria. |
| 13 | +3. Ranks repositories by relevance for a portfolio. |
| 14 | +4. Recommends concrete actions (improve, merge, archive, etc.). |
| 15 | + |
| 16 | +--- |
| 17 | + |
| 18 | +## Origin and context |
| 19 | + |
| 20 | +- **Timeframe**: Started in mid-March 2026, developed over **about two intensive weeks** (roughly 4 hours per day). |
| 21 | +- **Motivation**: |
| 22 | + - **Professional**: Prepare for my apprenticeship search by having a coherent and impactful portfolio. |
| 23 | + - **Personal**: I am meticulous and like things to be well-structured and clean. |
| 24 | +- **Target audience**: Primarily myself (for now), as the tool uses a confidential GitHub API key. |
| 25 | +- **Usage**: |
| 26 | + - I applied this tool to **all my projects** to identify inconsistencies, redundancies, and areas for improvement. |
| 27 | + - Result: I was able to **prioritize which projects to improve** and **archive or delete those that didn't add value**. |
| 28 | + |
| 29 | +--- |
| 30 | + |
| 31 | +## Key learnings |
| 32 | + |
| 33 | +- **Software architecture**: |
| 34 | + - Design of a **modular system** (collectors, scanners, scorers, rankers). |
| 35 | + - **Policy-driven approach** (scoring rules externalized in YAML files). |
| 36 | +- **Advanced Python development**: |
| 37 | + - Extensive use of `dataclasses`, `pydantic`, and `Protocol` for typing interfaces. |
| 38 | + - Dependency and error management in a data pipeline. |
| 39 | +- **Data engineering**: |
| 40 | + - Building a complete pipeline: collection → processing → scoring → visualization. |
| 41 | +- **Testing**: |
| 42 | + - Implementation of unit tests, integration tests, and **golden tests** (snapshots) to ensure score stability. |
| 43 | +- **GitHub API**: |
| 44 | + - Managing API calls, rate limits, and network errors. |
| 45 | +- **Streamlit**: |
| 46 | + - Creating an interactive interface to explore results. |
| 47 | + |
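To illustrate the modular, `Protocol`-typed design mentioned above, here is a minimal sketch. It is not the project's actual code: `RepoFacts`, `Scorer`, `DocumentationScorer`, `QualityScorer`, and `run_scorers` are hypothetical names, and the scoring rules are invented for the example.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class RepoFacts:
    """Facts a scanner might extract from one repository (hypothetical fields)."""
    name: str
    has_readme: bool
    has_tests: bool
    has_ci: bool


class Scorer(Protocol):
    """Structural interface: any object with these members can be used as a scorer."""
    name: str

    def score(self, facts: RepoFacts) -> float: ...


@dataclass
class DocumentationScorer:
    name: str = "documentation"

    def score(self, facts: RepoFacts) -> float:
        # Illustrative rule: full credit when a README is present.
        return 1.0 if facts.has_readme else 0.0


@dataclass
class QualityScorer:
    name: str = "testing"

    def score(self, facts: RepoFacts) -> float:
        # Illustrative rule: half credit for tests, half for CI.
        return 0.5 * facts.has_tests + 0.5 * facts.has_ci


def run_scorers(facts: RepoFacts, scorers: list[Scorer]) -> dict[str, float]:
    """Apply every scorer to the same facts; the result is deterministic."""
    return {s.name: s.score(facts) for s in scorers}


facts = RepoFacts(name="demo", has_readme=True, has_tests=True, has_ci=False)
print(run_scorers(facts, [DocumentationScorer(), QualityScorer()]))
```

Because `Scorer` is a `Protocol`, a new criterion only needs a `name` and a `score` method; it does not have to inherit from a base class.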
| 48 | +--- |
| 49 | +## Key technical decisions |
| 50 | + |
| 51 | +- **Deterministic approach**: |
| 52 | + - Deliberate choice **not to use LLMs** to ensure reproducibility, avoid costs, and limit environmental impact. |
| 53 | + - The rule-based scoring model was **the hardest part to design**: it took many iterations to balance relevance and simplicity. |
| 54 | + |
| 55 | +- **External configuration**: |
| 56 | + - Scoring weights and optimization rules are **modifiable via YAML files**, without touching the code. |
| 57 | + |
| 58 | +- **Abandoning LLM analysis**: |
| 59 | + - Initially considered for generating automatic reviews, this feature was discarded due to: |
| 60 | + - **Costs** (paid API calls). |
| 61 | + - **Non-determinism** (variable results). |
| 62 | + - **Environmental impact** (unnecessary calls). |
| 63 | + |
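As a rough illustration of the externalized, deterministic scoring described above, the sketch below combines per-criterion scores using weights from a policy. Everything in it is an assumption: the `POLICY` dictionary stands in for what a YAML loader (for example PyYAML's `yaml.safe_load`) might return, and the keys, weights, thresholds, and function names are hypothetical.

```python
# Stand-in for a parsed YAML policy file (hypothetical keys and values).
POLICY = {
    "weights": {"documentation": 3, "testing": 5, "structure": 2},
    "thresholds": {"showcase": 0.7, "improve": 0.4},
}


def weighted_score(criteria: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of criterion scores in [0, 1]; missing criteria count as 0."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * criteria.get(key, 0.0) for key, w in weights.items()) / total


def recommend(score: float, thresholds: dict[str, float]) -> str:
    """Map a score to an action using the policy thresholds."""
    if score >= thresholds["showcase"]:
        return "showcase"
    if score >= thresholds["improve"]:
        return "improve"
    return "archive"


criteria = {"documentation": 1.0, "testing": 0.5, "structure": 1.0}
score = weighted_score(criteria, POLICY["weights"])
print(score, recommend(score, POLICY["thresholds"]))
```

With this shape, changing a weight or a threshold means editing the policy file only, and the same inputs always produce the same score.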
| 64 | +--- |
| 65 | +## Impact and results |
| 66 | + |
| 67 | +- **On my portfolio**: |
| 68 | + - Identification of **several redundant projects** to merge or archive. |
| 69 | + - **Clear prioritization** of projects to improve (e.g., add tests, improve documentation). |
| 70 | + - Time saved: auditing all my repositories took **a few hours**, compared to days of manual review. |
| 71 | + |
| 72 | +- **On personal development**: |
| 73 | + - Better understanding of **open-source project quality criteria**. |
| 74 | + - Ability to **objectively evaluate** my own work. |
| 75 | + |
| 76 | +--- |
| 77 | +## Evolution and roadmap |
| 78 | + |
| 79 | +- **Future improvements**: |
| 80 | + - Add **business rules** to evaluate the consistency between a project's business needs and its implementation. |
| 81 | + - Integrate **additional metrics** (e.g., cyclomatic complexity, code coverage). |
| 82 | + - **Open-source the project** (if GitHub key management can be secured). |
| 83 | + |
| 84 | +- **Abandoned features**: |
| 85 | + - LLM analysis (see "Key technical decisions" section). |
| 86 | + |
| 87 | +--- |
| 88 | +## Why this project matters to me |
| 89 | + |
| 90 | +This is the project where I **learned the most about software architecture and data engineering**, while solving a **concrete and personal problem**. |
| 91 | +Unlike my other repositories (often school exercises or tests), this one is: |
| 92 | +- **100% autonomous** (no critical external dependencies). |
| 93 | +- **Tested and reliable** (72% coverage, golden tests). |
| 94 | +- **Used in production** (even if only by me for now). |