README.md: +26 -23 (26 additions & 23 deletions)
@@ -1,22 +1,22 @@
-# Git Commit Categorize & ML.NET Analyser
+# ?? Git Commit Categorizer & ML.NET Analyser
+# Git Commit Categorizer & ML.NET Analyser

-A way to predict commit category using custom organisation data. Instead of thinking whether to type feat: spam or fix: spam, let AI/ML predict it for you!
+**Ever stared at a terminal wondering if your commit should be `feat:`, `fix:`, or `chore:`? Stop guessing.**

-An intelligent, console-based application built with **.NET 9** that automatically fetches GitHub commit messages, categorizes them using **ML.NET** (K-Means Clustering), and intelligently labels those categories in plain English using the **Google Gemini API**. Finally, it provides an interactive tool to format new commit messages based on the trained model.
+This tool acts as your personalized, AI-powered commit assistant. Rather than forcing you strictly into generic conventional commits, it fetches *your* organization's actual historical commit data, uses **Machine Learning (ML.NET)** to find natural patterns in how your team works, and leverages **Google Gemini AI** to automatically assign human-readable labels to those patterns. Finally, it provides an interactive prompt to correctly format your new commits on the fly!

-## ? Features
-- **GitHub Integration:** Fetches commits from repos in a target GitHub Organization.
-- **Data Caching & Hashing:** Efficiently caches your datasets and only invalidates / re-trains when new data arrives.
-- **ML.NET K-Means Clustering:** Learns from your commit history and automatically searches for the optimal $K$ value (number of clusters) using Grid Search and the Davies-Bouldin index.
-- **Gemini AI Labeling:** Employs the `gemini-2.5-flash` model to analyze representative commits from each cluster and dynamically assigns highly accurate, human-readable labels (e.g., "UI Refactoring", "Bug Fixes").
-- **Interactive Formatting:** Drop into an interactive command-line session to type new commit messages and instantly see them cleanly formatted as `Label: {commit}`.
+## Why It's Awesome
+- **Fully Autonomous ML Pipeline**: Automatically scales to your data. A grid search scored by the Davies-Bouldin index finds the best number of categories (clusters) for your specific repositories. It also penalizes outlier cluster counts so that the result leans towards a readable 4-8 categories.
+- **Smart & Configurable AI Labeling**: Uses the `gemini-2.5-flash` model to semantically label clusters (e.g., "Dependency Updates", "UI Fixes"). Four configurable execution modes (`Hybrid`, `SinglePrompt`, `PerCluster`, `LocalOnly`) let you control API quota usage and fall back to local heuristics.
+- **State-of-the-Art Reliability**: Hashes your datasets with `SHA256`. If the data hasn't changed, ML training is skipped entirely and the models and labels load from disk in milliseconds. Exponential backoff also protects against Gemini rate limiting.
+- **Interactive Output**: Drop into an interactive terminal loop to predict and format your next commit message with the trained model.
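The penalized choice of cluster count mentioned in the list above could work roughly as follows. This is a minimal sketch in Python for illustration only, not the project's .NET code: the function name `pick_cluster_count`, its parameters, and the multiplicative penalty are all assumptions, since the README does not state the actual formula. It takes one Davies-Bouldin score per candidate $K$ (lower is better) and inflates the scores of candidates outside the preferred 4-8 range.

```python
from typing import Mapping


def pick_cluster_count(
    db_scores: Mapping[int, float],
    preferred_min: int = 4,
    preferred_max: int = 8,
    penalty_per_step: float = 0.15,
) -> int:
    """Return the K with the lowest penalized Davies-Bouldin score.

    Lower Davies-Bouldin values mean tighter, better-separated clusters.
    The multiplicative penalty is an assumption made for this sketch.
    """
    if not db_scores:
        raise ValueError("db_scores must contain at least one candidate K")

    def penalized(k: int) -> float:
        # Distance (in cluster counts) from the preferred range; 0 inside it.
        distance = max(preferred_min - k, k - preferred_max, 0)
        return db_scores[k] * (1.0 + penalty_per_step * distance)

    return min(db_scores, key=penalized)


# Made-up scores for illustration only (hypothetical, not measured data).
example_scores = {2: 0.90, 3: 0.80, 4: 0.85, 5: 0.95, 10: 0.78}
best_k = pick_cluster_count(example_scores)
print(best_k)
```

With these made-up scores the unpenalized minimum is $K = 10$, while the penalty moves the choice to $K = 4$, which sits inside the preferred range.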
-2. A **GitHub Personal Access Token** (to fetch commits securely)
-3. A **Google Gemini API Key** (for cluster labeling inference)
+2. A **GitHub Personal Access Token** (to securely fetch organization commits)
+3. A **Google Gemini API Key** (for inferring semantic cluster names)

-## ?? Setup & Installation
+## Setup & Installation

1. **Clone the repository:**
```bash
@@ -38,7 +38,7 @@ An intelligent, console-based application built with **.NET 9** that automatical
dotnet build
```

-## ?? Usage
+## Usage

Run the project directly via the .NET CLI:
```bash
@@ -47,13 +47,16 @@ dotnet run
### What happens during runtime?
1. The app will pull and cache your JSON dataset.
-2. If the data is new or uncached, ML.NET prepares the data (80/20 train validation split) and extracts text features. Otherwise, it efficiently reloads the cached model!
+2. If the data is new or uncached, ML.NET prepares the data (80/20 train/validation split) and extracts text features. Otherwise, it reloads the cached model that matches the dataset hash.


-3. A Grid Search evaluates clusters $K=2$ through $10$, identifies the best clustering structure, and saves the trained ML model globally (`kmeans_model.zip`).
+
+3. A grid search evaluates candidate cluster counts, applies a penalty that favors $4 \le K \le 8$, picks the best-scoring valid clustering, and saves the trained ML model globally (`kmeans_model.zip`).