Merged
Changes from 3 commits
326 changes: 326 additions & 0 deletions _doc/practice/years/2026/github_stat_pr.ipynb
@@ -0,0 +1,326 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Merged PRs per author, aggregated by week\n",
"\n",
"This notebook uses the GitHub API to retrieve the number of merged *pull requests* (PRs)\n",
"for **one or more repositories**, groups them by author and by week over the past year,\n",
"and then plots the result.\n",
"\n",
"**Dependencies:** `requests`, `pandas`, `matplotlib`.\n",
"\n",
"**GitHub token:** the GitHub API limits unauthenticated calls to 60 requests per hour.\n",
"To raise this limit, set the `GITHUB_TOKEN` environment variable\n",
"to a GitHub *Personal Access Token* (PAT):\n",
"\n",
"```bash\n",
"export GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxx\n",
"```\n",
"\n",
"Without a token, the notebook still works but may hit the rate limit on large repositories."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import datetime\n",
"import requests\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import matplotlib.dates as mdates"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Parameters\n",
"\n",
"Edit `REPOS` to list the repositories to analyze, in the form\n",
"`[(owner, repo), ...]`. You can add as many repositories as you like."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"REPOS = [\n",
"    (\"sdpython\", \"teachpyx\"),\n",
"    # (\"sdpython\", \"onnx-extended\"),  # add more repositories here\n",
"]\n",
"\n",
"# GitHub authentication token (optional but recommended)\n",
"GITHUB_TOKEN = os.environ.get(\"GITHUB_TOKEN\", \"\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fetching merged PRs through the GitHub API\n",
"\n",
"The GitHub REST API exposes the `/repos/{owner}/{repo}/pulls` endpoint,\n",
"queried here with `state=closed`. We then keep only the PRs whose `merged_at` field is set\n",
"and whose merge date falls within the last 12 months.\n",
"\n",
"Pagination is handled with the `page` parameter.\n",
"The main loop iterates over every repository listed in `REPOS`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def fetch_merged_prs(owner: str, repo: str, token: str = \"\") -> list[dict]:\n",
"    \"\"\"Fetch every PR merged during the past year for one repository.\n",
"\n",
"    :param owner: owner of the GitHub repository\n",
"    :param repo: name of the GitHub repository\n",
"    :param token: GitHub authentication token (optional)\n",
"    :return: list of dictionaries with the fields ``author``, ``merged_at``, ``repo``\n",
"    \"\"\"\n",
"    headers = {\"Accept\": \"application/vnd.github+json\"}\n",
"    if token:\n",
"        headers[\"Authorization\"] = f\"Bearer {token}\"\n",
"\n",
"    since = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=365)\n",
"\n",
"    results = []\n",
"    page = 1\n",
"    per_page = 100\n",
"\n",
"    while True:\n",
"        url = (\n",
"            f\"https://api.github.com/repos/{owner}/{repo}/pulls\"\n",
"            f\"?state=closed&per_page={per_page}&page={page}&sort=updated&direction=desc\"\n",
"        )\n",
"        response = requests.get(url, headers=headers, timeout=30)\n",
"        try:\n",
"            response.raise_for_status()\n",
"        except requests.HTTPError as exc:\n",
"            status = exc.response.status_code\n",
"            if status == 401:\n",
"                raise RuntimeError(\n",
"                    \"Authentication refused (401). Check your GITHUB_TOKEN.\"\n",
"                ) from exc\n",
"            if status == 403:\n",
"                raise RuntimeError(\n",
"                    \"Access denied (403). You may have reached the GitHub API rate limit \"\n",
"                    \"(60 requests/h without a token). Set GITHUB_TOKEN.\"\n",
"                ) from exc\n",
"            if status == 404:\n",
"                raise RuntimeError(\n",
"                    f\"Repository not found (404): {owner}/{repo}. Check the entries in REPOS.\"\n",
"                ) from exc\n",
"            raise\n",
"        prs = response.json()\n",
"\n",
"        if not prs:\n",
"            break\n",
"\n",
"        stop = False\n",
"        for pr in prs:\n",
"            # Results are sorted by last update (newest first): once a PR was last\n",
"            # updated before the window, no later entry can have been merged inside it.\n",
"            updated_at = pr.get(\"updated_at\")\n",
"            if updated_at:\n",
"                updated_dt = datetime.datetime.fromisoformat(updated_at.replace(\"Z\", \"+00:00\"))\n",
"                if updated_dt < since:\n",
"                    stop = True\n",
"                    break\n",
"            merged_at = pr.get(\"merged_at\")\n",
"            if not merged_at:\n",
"                # Closed without being merged\n",
"                continue\n",
"            merged_dt = datetime.datetime.fromisoformat(merged_at.replace(\"Z\", \"+00:00\"))\n",
"            if merged_dt < since:\n",
"                # Merged before the window but updated since then: skip it, keep scanning\n",
"                continue\n",
"            author = (pr.get(\"user\") or {}).get(\"login\", \"unknown\")\n",
"            results.append({\"author\": author, \"merged_at\": merged_dt, \"repo\": f\"{owner}/{repo}\"})\n",
"\n",
"        if stop:\n",
"            break\n",
"\n",
"        page += 1\n",
"\n",
"    return results\n",
"\n",
"\n",
"merged_prs = []\n",
"for owner, repo in REPOS:\n",
"    prs = fetch_merged_prs(owner, repo, GITHUB_TOKEN)\n",
"    print(f\"  {owner}/{repo}: {len(prs)} merged PR(s)\")\n",
"    merged_prs.extend(prs)\n",
"\n",
"print(f\"Total: {len(merged_prs)} merged PR(s) across all repositories.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Aggregation by author and by week"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df = pd.DataFrame(merged_prs)\n",
"\n",
"if df.empty:\n",
"    print(\"No data to display.\")\n",
"else:\n",
"    # Truncate each merge date to the Monday of its week; the timezone is dropped\n",
"    # explicitly because periods do not carry timezone information\n",
"    df[\"week\"] = df[\"merged_at\"].dt.tz_localize(None).dt.to_period(\"W\").dt.start_time\n",
"\n",
"    weekly = (\n",
"        df.groupby([\"repo\", \"author\", \"week\"])\n",
"        .size()\n",
"        .reset_index(name=\"pr_count\")\n",
"    )\n",
"    print(weekly.head(10))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pivot table (author × week, aggregated over all repositories)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not df.empty:\n",
"    # Aggregate over all repositories\n",
"    pivot = weekly.pivot_table(\n",
"        index=\"author\", columns=\"week\", values=\"pr_count\", aggfunc=\"sum\", fill_value=0\n",
"    )\n",
"    # Sort authors by decreasing total number of PRs\n",
"    pivot = pivot.loc[pivot.sum(axis=1).sort_values(ascending=False).index]\n",
"    # A bare expression inside an if block is not rendered, so display it explicitly\n",
"    display(pivot)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pivot table per repository"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not df.empty and len(REPOS) > 1:\n",
"    for repo_name, grp in weekly.groupby(\"repo\"):\n",
"        pvt = grp.pivot_table(\n",
"            index=\"author\", columns=\"week\", values=\"pr_count\", aggfunc=\"sum\", fill_value=0\n",
"        )\n",
"        pvt = pvt.loc[pvt.sum(axis=1).sort_values(ascending=False).index]\n",
"        print(f\"\\n=== {repo_name} ===\")\n",
"        display(pvt)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualization: merged PRs per week (stacked by author)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not df.empty:\n",
"    fig, ax = plt.subplots(figsize=(14, 5))\n",
"\n",
"    stacked_height = None\n",
"    weeks = pivot.columns  # DatetimeIndex\n",
"    week_nums = mdates.date2num(weeks.to_pydatetime())\n",
"\n",
"    # Draw one bar series per author, each stacked on top of the previous ones\n",
"    for author in pivot.index:\n",
"        values = pivot.loc[author].values\n",
"        if stacked_height is None:\n",
"            ax.bar(week_nums, values, width=5, label=author)\n",
"            stacked_height = values.copy()\n",
"        else:\n",
"            ax.bar(week_nums, values, width=5, bottom=stacked_height, label=author)\n",
"            stacked_height += values\n",
"\n",
"    ax.xaxis.set_major_formatter(mdates.DateFormatter(\"%Y-%m-%d\"))\n",
"    ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=mdates.MO, interval=4))\n",
"    plt.xticks(rotation=45, ha=\"right\")\n",
"    ax.set_xlabel(\"Week\")\n",
"    ax.set_ylabel(\"Number of merged PRs\")\n",
"    repos_label = \", \".join(f\"{o}/{r}\" for o, r in REPOS)\n",
"    ax.set_title(f\"Merged PRs per week - {repos_label}\")\n",
"    ax.legend(loc=\"upper left\", bbox_to_anchor=(1, 1), title=\"Author\")\n",
"    plt.tight_layout()\n",
"    plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualization: heatmap (author × week)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not df.empty:\n",
"    fig, ax = plt.subplots(figsize=(14, max(3, len(pivot) * 0.5)))\n",
"\n",
"    im = ax.imshow(pivot.values, aspect=\"auto\", cmap=\"YlOrRd\")\n",
"    plt.colorbar(im, ax=ax, label=\"Number of PRs\")\n",
"\n",
"    ax.set_yticks(range(len(pivot.index)))\n",
"    ax.set_yticklabels(pivot.index)\n",
"\n",
"    # Show roughly 12 evenly spaced week labels along the x axis\n",
"    step = max(1, len(pivot.columns) // 12)\n",
"    ax.set_xticks(range(0, len(pivot.columns), step))\n",
"    ax.set_xticklabels(\n",
"        [str(d)[:10] for d in pivot.columns[::step]], rotation=45, ha=\"right\"\n",
"    )\n",
"\n",
"    repos_label = \", \".join(f\"{o}/{r}\" for o, r in REPOS)\n",
"    ax.set_title(f\"Heatmap of merged PRs - {repos_label}\")\n",
"    ax.set_xlabel(\"Week\")\n",
"    ax.set_ylabel(\"Author\")\n",
"    plt.tight_layout()\n",
"    plt.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.12.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
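
The early-exit logic in `fetch_merged_prs` depends on how `sort=updated` interacts with the merge date, which is easy to get wrong: a PR merged long ago can be updated recently and appear near the top of the results. Below is a minimal, network-free sketch of how one page of results could be filtered. The helper names `parse_ts` and `filter_page` and the sample data are hypothetical (they are not part of the notebook), and the sketch assumes that a PR's `updated_at` is never earlier than its `merged_at`.

```python
import datetime


def parse_ts(value: str) -> datetime.datetime:
    """Parse a GitHub-style ISO 8601 timestamp such as 2025-01-31T12:00:00Z."""
    return datetime.datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_page(prs: list[dict], since: datetime.datetime) -> tuple[list[dict], bool]:
    """Return (PRs merged inside the window, whether paging can stop).

    Assumes ``prs`` is one page sorted by ``updated_at``, newest first.
    """
    kept = []
    for pr in prs:
        if parse_ts(pr["updated_at"]) < since:
            # Every later entry was last updated before the window,
            # so (under the stated assumption) none was merged inside it.
            return kept, True
        merged_at = pr.get("merged_at")
        if merged_at and parse_ts(merged_at) >= since:
            kept.append(pr)
    return kept, False


# Hypothetical sample page, sorted by updated_at descending
since = datetime.datetime(2025, 1, 1, tzinfo=datetime.timezone.utc)
page = [
    # Merged long ago but updated recently: skipped, not a stop signal
    {"number": 1, "updated_at": "2025-06-01T00:00:00Z", "merged_at": "2024-03-01T00:00:00Z"},
    # Merged inside the window: kept
    {"number": 2, "updated_at": "2025-05-01T00:00:00Z", "merged_at": "2025-05-01T00:00:00Z"},
    # Closed without merging: skipped
    {"number": 3, "updated_at": "2025-04-01T00:00:00Z", "merged_at": None},
    # Last updated before the window: paging can stop here
    {"number": 4, "updated_at": "2024-02-01T00:00:00Z", "merged_at": "2024-02-01T00:00:00Z"},
]
kept, stop = filter_page(page, since)
print([pr["number"] for pr in kept], stop)
```

With this sample page the sketch prints `[2] True`. Stopping on the first PR merged before the window instead would end the scan at PR 1 and miss PR 2.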
1 change: 1 addition & 0 deletions _doc/practice/years/2026/index.rst
@@ -8,3 +8,4 @@
:caption: machine learning

parcoursup_2026
github_stat_pr