Commit 2deae70

Create CONTEXT.md
1 parent 626f8ed commit 2deae70

1 file changed

Lines changed: 94 additions & 0 deletions

CONTEXT.md

@@ -0,0 +1,94 @@
1+
# CONTEXT: GitHub Portfolio Auditor
2+
3+
## Why this project?
4+
5+
This project was born from a concrete need during my apprenticeship search in March 2026:
6+
- I wanted to **build a portfolio** to showcase my projects.
7+
- But I didn't know **which projects to highlight**, nor **which ones to improve or archive**.
8+
- I found no existing tool that could **audit a GitHub portfolio in an objective and reproducible way**.
9+
10+
The goal was to create a tool that:
11+
1. Analyzes each repository (structure, documentation, tests, CI, etc.).
12+
2. Scores projects based on configurable criteria.
13+
3. Ranks repositories by relevance for a portfolio.
14+
4. Recommends concrete actions (improve, merge, archive, etc.).
15+
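The four steps above can be sketched as follows. This is a minimal illustration, not the tool's actual code: `RepoFacts`, `ChecklistScorer`, `recommend`, `audit`, and the thresholds are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class RepoFacts:
    """Facts a scanner might collect about one repository (hypothetical fields)."""
    name: str
    has_readme: bool
    has_tests: bool
    has_ci: bool


class Scorer(Protocol):
    """Any object exposing this method can be plugged into the pipeline."""
    def score(self, facts: RepoFacts) -> float: ...


class ChecklistScorer:
    """Toy scorer: the fraction of checklist items a repository satisfies."""
    def score(self, facts: RepoFacts) -> float:
        checks = [facts.has_readme, facts.has_tests, facts.has_ci]
        return sum(checks) / len(checks)


def recommend(score: float) -> str:
    """Map a score to an action; the thresholds are illustrative only."""
    if score >= 0.75:
        return "highlight"
    if score >= 0.3:
        return "improve"
    return "archive"


def audit(repos: list[RepoFacts], scorer: Scorer) -> list[tuple[str, float, str]]:
    """Score every repository, then rank from best to worst (ties broken by name)."""
    scored = [(r.name, scorer.score(r)) for r in repos]
    ranked = sorted(scored, key=lambda item: (-item[1], item[0]))
    return [(name, s, recommend(s)) for name, s in ranked]


repos = [
    RepoFacts("school-exercise", has_readme=False, has_tests=False, has_ci=False),
    RepoFacts("portfolio-auditor", has_readme=True, has_tests=True, has_ci=True),
    RepoFacts("side-project", has_readme=True, has_tests=False, has_ci=False),
]
report = audit(repos, ChecklistScorer())
for name, score, action in report:
    print(f"{name}: {score:.2f} -> {action}")
```

Because `Scorer` is a `Protocol`, a different scoring strategy can be swapped in without changing `audit`, which is the kind of modularity the project aims for.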
16+
---
17+
18+
## Origin and context
19+
20+
- **Timeframe**: Started in mid-March 2026, developed over **about two intensive weeks** (roughly 4 hours per day).
21+
- **Motivation**:
22+
  - **Professional**: Prepare for my apprenticeship search by having a coherent and impactful portfolio.
23+
  - **Personal**: I am meticulous and like things to be well-structured and clean.
24+
- **Target audience**: Primarily myself (for now), as the tool uses a confidential GitHub API key.
25+
- **Usage**:
26+
  - I applied this tool to **all my projects** to identify inconsistencies, redundancies, and areas for improvement.
27+
  - Result: I was able to **prioritize which projects to improve** and **archive or delete those that didn't add value**.
28+
29+
---
30+
31+
## Key learnings
32+
33+
- **Software architecture**:
34+
  - Design of a **modular system** (collectors, scanners, scorers, rankers).
35+
  - **Policy-driven approach** (scoring rules externalized in YAML files).
36+
- **Advanced Python development**:
37+
  - Extensive use of `dataclasses` and `pydantic` for data models, and of `Protocol` for typing interfaces.
38+
  - Dependency and error management in a data pipeline.
39+
- **Data engineering**:
40+
  - Building a complete pipeline: collection → processing → scoring → visualization.
41+
- **Testing**:
42+
  - Implementation of unit tests, integration tests, and **golden tests** (snapshots) to ensure score stability.
43+
- **GitHub API**:
44+
  - Managing API calls, rate limits, and network errors.
45+
- **Streamlit**:
46+
  - Creating an interactive interface to explore results.
47+
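To illustrate the golden-test idea mentioned under **Testing**, here is a minimal sketch of a snapshot check. It is an assumption about how such a test could look, not the project's real test suite: `compute_scores`, `check_golden`, and the file name are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def compute_scores() -> dict[str, float]:
    """Stand-in for the real scoring pipeline (values are made up)."""
    return {"repo-a": 0.82, "repo-b": 0.41}


def check_golden(actual: dict[str, float], golden_path: Path) -> bool:
    """Compare scores with the stored snapshot; create the snapshot on the first run."""
    serialized = json.dumps(actual, indent=2, sort_keys=True)
    if not golden_path.exists():
        golden_path.write_text(serialized, encoding="utf-8")
        return True
    return golden_path.read_text(encoding="utf-8") == serialized


with tempfile.TemporaryDirectory() as tmp:
    golden = Path(tmp) / "scores.golden.json"
    first_run = check_golden(compute_scores(), golden)   # snapshot is created
    second_run = check_golden(compute_scores(), golden)  # same scores, still matches
    drifted = check_golden({"repo-a": 0.9, "repo-b": 0.41}, golden)  # one score changed
```

Serializing with `sort_keys=True` keeps the snapshot stable regardless of dictionary ordering, so a mismatch signals a real change in scores rather than a formatting difference.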
48+
---
49+
## Key technical decisions
50+
51+
- **Deterministic approach**:
52+
  - Deliberate choice **not to use LLMs** to ensure reproducibility, avoid costs, and limit environmental impact.
53+
  - The rule-based scoring model was **the most complex part to design**: it took many iterations to balance relevance and simplicity.
54+
55+
- **External configuration**:
56+
  - Scoring weights and optimization rules are **modifiable via YAML files**, without touching the code.
57+
58+
- **Abandoning LLM analysis**:
59+
  - Initially considered for generating automatic reviews, this feature was discarded due to:
60+
    - **Costs** (paid API calls).
61+
    - **Non-determinism** (variable results).
62+
    - **Environmental impact** (unnecessary calls).
63+
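As an illustration of deterministic, externally configured scoring, the sketch below computes a weighted score from a policy mapping. The policy schema, the criterion names, and `weighted_score` are assumptions made for this example; the real YAML files are not reproduced here. To stay dependency-free, the policy is written as the dictionary that PyYAML's `yaml.safe_load` would return for the commented YAML.

```python
# Hypothetical policy file (policy.yaml); the real schema is not shown in this document:
#
#   weights:
#     documentation: 0.4
#     tests: 0.4
#     ci: 0.2
#
# Parsed form, as yaml.safe_load would return it.
POLICY = {"weights": {"documentation": 0.4, "tests": 0.4, "ci": 0.2}}


def weighted_score(signals: dict[str, float], policy: dict) -> float:
    """Weighted average of per-criterion signals in [0, 1]; missing signals count as 0."""
    weights = policy["weights"]
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("policy weights must sum to a positive number")
    raw = sum(w * signals.get(name, 0.0) for name, w in weights.items())
    return round(raw / total, 4)


signals = {"documentation": 1.0, "tests": 0.5, "ci": 0.0}
score = weighted_score(signals, POLICY)
print(score)
```

The function has no randomness and no network access, so the same signals and the same policy always produce the same score. Changing a weight in the policy changes the ranking without any code edit.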
64+
---
65+
## Impact and results
66+
67+
- **On my portfolio**:
68+
  - Identification of **several redundant projects** to merge or archive.
69+
  - **Clear prioritization** of projects to improve (e.g., add tests, improve documentation).
70+
  - Time saved: auditing all my repositories now takes **a few hours**, compared to days when done manually.
71+
72+
- **On personal development**:
73+
  - Better understanding of **open-source project quality criteria**.
74+
  - Ability to **objectively evaluate** my own work.
75+
76+
---
77+
## Evolution and roadmap
78+
79+
- **Future improvements**:
80+
  - Add **business rules** to evaluate the consistency between a project's business needs and its implementation.
81+
  - Integrate **additional metrics** (e.g., cyclomatic complexity, code coverage).
82+
  - **Open-source the project** (once GitHub key management can be secured).
83+
84+
- **Abandoned features**:
85+
  - LLM analysis (see "Key technical decisions" section).
86+
87+
---
88+
## Why this project matters to me
89+
90+
This is the project where I **learned the most about software architecture and data engineering**, while solving a **concrete and personal problem**.
91+
Unlike my other repositories (often school exercises or tests), this one is:
92+
- **100% autonomous** (no critical external dependencies).
93+
- **Tested and reliable** (72% coverage, golden tests).
94+
- **Used in production** (even if only by me for now).
