Commit 41e4eb1

Merge pull request #33 from andev0x/optimization/func
Optimization/func
2 parents 6b288c5 + 4edcdcb commit 41e4eb1

11 files changed

Lines changed: 681 additions & 91 deletions

ALGORITHMS.md

Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@
# Gitmit Algorithms

## Overview
Gitmit generates Conventional Commit messages by combining git diff parsing, heuristic analysis, weighted scoring, and template selection. The pipeline is fully offline and deterministic, with optional AI as a separate layer.

```
Git status/diff → Parser → Analyzer → Templater → Formatter → Commit message
```

## 1. Change Collection (Parser)
**Location:** `internal/parser/git.go`

1. **Staged file discovery:** `git status --porcelain` is scanned to identify staged files and their actions (A/M/D/R/C).
2. **Per-file diff extraction:** For each staged file, `git diff --cached -U0 -- <file>` is streamed.
3. **Line stats:** Added/removed lines are counted by diff prefixes (`+`/`-`).
4. **Major change flag:** A file is marked `IsMajor` when added+removed lines ≥ 500.

The parser returns a list of `Change` objects and aggregates totals for diff-stat analysis.

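The line-stat and major-change steps above can be sketched as follows. This is a minimal illustration, not the code in `internal/parser/git.go`: the `DiffStats` type and the `countDiffLines` function are hypothetical names, and the sketch assumes the diff text for one file is already in memory rather than streamed.

```go
package main

import "strings"

// majorThreshold mirrors the "added+removed lines ≥ 500" rule described above.
const majorThreshold = 500

// DiffStats is a hypothetical container for per-file line counts;
// the real Change type in the parser may look different.
type DiffStats struct {
	Added   int
	Removed int
	IsMajor bool
}

// countDiffLines counts added and removed lines in unified-diff text.
// Lines before the first "@@" hunk header (such as "--- a/file" and
// "+++ b/file") are treated as file headers and are not counted.
func countDiffLines(diff string) DiffStats {
	var s DiffStats
	inHunk := false
	for _, line := range strings.Split(diff, "\n") {
		switch {
		case strings.HasPrefix(line, "diff --git"):
			inHunk = false // a new file section starts with headers
		case strings.HasPrefix(line, "@@"):
			inHunk = true // hunk header: counted lines follow
		case !inHunk:
			// file header line, ignore
		case strings.HasPrefix(line, "+"):
			s.Added++
		case strings.HasPrefix(line, "-"):
			s.Removed++
		}
	}
	s.IsMajor = s.Added+s.Removed >= majorThreshold
	return s
}
```

Tracking whether the scanner is inside a hunk avoids miscounting the `+++`/`---` header lines, and it also keeps an added line whose content begins with `--` from being mistaken for a header.
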
## 2. Analyzer: Feature & Context Extraction
**Location:** `internal/analyzer/analyzer.go`

### 2.1 File/Topic/Item Detection
- **Topic** is inferred from directory path with configurable overrides (`topicMappings`).
- **Item** defaults to the filename without extension.
- **Purpose** is inferred from keyword mappings and built-in keyword heuristics.

### 2.2 Symbol Extraction
Regex-based extraction detects structures from added lines:
- **Functions** (Go, JS/TS, Python, Java)
- **Structs/Classes**
- **Methods** (receiver-based Go methods)

These symbols are used to populate `{item}` placeholders and improve specificity.

### 2.3 Change Pattern Detection
Single-file patterns include:
- error handling, tests, imports, docs/comments, refactors
- API/database/performance/security indicators
- validation, logging, middleware, DI, CLI changes

### 2.4 Multi-file Pattern Detection
Across all changes, Gitmit detects patterns such as:
- **feature-addition** (many new files)
- **bug-fix-cascade** (many modified files with fix keywords)
- **refactor-sweep** (mixed A/M/D)
- **test-suite-update** / **config-update**
- **api-redesign** / **database-migration**

### 2.5 Special-Case Fallbacks
Early exits provide deterministic messages for clear cases:
- Single added file → `feat`
- Single deleted file → `chore`
- Only docs/config/deps → `docs`/`ci`/`chore(deps)`

## 3. Action (Type) Scoring Algorithm
The commit **action** is determined by a weighted score map, with support for normalized confidence weights (default).

### 3.1 Normalized Scoring (Default)
Gitmit uses **normalized confidence weights** to reduce noise when multiple signals compete.

1. **Normalize signals (0–1):**
   - **Branch hint:** 1.0 if branch name matches an action, 0.0 otherwise.
   - **Diff-stat:** 0–1 based on distance from thresholds (added/removed ratio).
   - **Keywords:** Raw keyword scores are normalized relative to the highest-scoring action.
   - **Multi-file patterns:** 1.0 if a relevant pattern is detected, 0.0 otherwise.
2. **Apply confidence weights:**
   - branch: 0.35
   - diff-stat: 0.25
   - keywords: 0.25
   - multi-file patterns: 0.15
3. **Final score:** `sum(weight × normalized_signal)` per action.
4. **Selection:** The action with the highest final score is selected.
5. **Fallback:** If top action score < 0.35, Gitmit falls back to file-based heuristics.

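Steps 2 to 5 above amount to a weighted sum followed by a threshold check. The sketch below illustrates that arithmetic under stated assumptions: the `Signals` and `Weights` types and the `selectAction` function are hypothetical, the signals are assumed to be normalized to 0–1 already, and ties are broken by action name only so the result does not depend on Go's map iteration order.

```go
package main

// Signals holds the already-normalized (0–1) signal values for one action.
// Hypothetical type for illustration.
type Signals struct {
	Branch, DiffStat, Keywords, Patterns float64
}

// Weights holds one confidence weight per signal source.
type Weights struct {
	Branch, DiffStat, Keywords, Patterns float64
}

// defaultWeights uses the default values listed above.
var defaultWeights = Weights{Branch: 0.35, DiffStat: 0.25, Keywords: 0.25, Patterns: 0.15}

// fallbackThreshold is the score below which file-based heuristics take over.
const fallbackThreshold = 0.35

// score computes sum(weight × normalized_signal) for a single action.
func score(s Signals, w Weights) float64 {
	return w.Branch*s.Branch + w.DiffStat*s.DiffStat +
		w.Keywords*s.Keywords + w.Patterns*s.Patterns
}

// selectAction returns the highest-scoring action, its score, and whether
// the score reaches the fallback threshold. When the last value is false,
// the caller would fall back to file-based heuristics.
func selectAction(candidates map[string]Signals, w Weights) (string, float64, bool) {
	best, bestScore := "", -1.0
	for action, s := range candidates {
		sc := score(s, w)
		// Tie-break on the action name (an assumption of this sketch).
		if sc > bestScore || (sc == bestScore && action < best) {
			best, bestScore = action, sc
		}
	}
	return best, bestScore, bestScore >= fallbackThreshold
}
```

For example, a `feat` candidate with a branch match (1.0), diff-stat 0.8, and keywords 0.5 scores 0.35 + 0.20 + 0.125 = 0.675, while a `fix` candidate with only diff-stat 0.2 and keywords 1.0 scores 0.30 and would trigger the fallback if it were the only candidate.
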
### 3.2 Legacy Additive Scoring
If `normalizeScoring` is disabled in config, Gitmit falls back to raw score aggregation:
1. **Branch name hints:** +3 to matching action.
2. **Diff-stat ratio:** +2 to `feat` or `refactor`.
3. **Keyword scoring:** per-action weights are added directly.
4. **Multi-file patterns:** +3 or +4 to relevant actions.

## 4. Scope Selection
- Single topic → that topic
- Single directory → directory name
- 2–3 topics → combined scope (sorted)
- Many topics → most common or `core`
- Commit history can override scope when consistent across recent commits

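The topic-count rules above can be sketched roughly as below. Several details are assumptions of this sketch rather than documented behavior: the function name `selectScope`, the comma used to join a combined scope, and the rule that `core` is chosen over "most common" when no single topic is strictly the most frequent. The single-directory and commit-history rules are left out.

```go
package main

import (
	"sort"
	"strings"
)

// selectScope picks a commit scope from the topics of all changed files.
// Hypothetical helper; the separator and the tie rule are assumptions.
func selectScope(topics []string) string {
	counts := map[string]int{}
	for _, t := range topics {
		if t != "" {
			counts[t]++
		}
	}
	unique := make([]string, 0, len(counts))
	for t := range counts {
		unique = append(unique, t)
	}
	sort.Strings(unique) // sorted order keeps the output deterministic

	switch {
	case len(unique) == 0:
		return "core"
	case len(unique) <= 3:
		// One topic yields that topic; two or three yield a combined scope.
		return strings.Join(unique, ",")
	}

	// Many topics: use the most common one, or "core" when there is a tie.
	best, bestN, tie := "", 0, false
	for _, t := range unique {
		switch n := counts[t]; {
		case n > bestN:
			best, bestN, tie = t, n, false
		case n == bestN:
			tie = true
		}
	}
	if tie {
		return "core"
	}
	return best
}
```
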
## 5. Template Selection & Scoring
**Location:** `internal/templater/templater.go`

1. **Template group resolution:** action → template group (A/M/D/R/DOC/SECURITY/MISC).
2. **Topic match:** exact → fuzzy → `_default`.
3. **Template scoring:**
   - Base score 1.0
   - +2.0 for matching detected patterns
   - +1.5 for using detected symbols
   - +1.0 for meaningful purpose placeholders
   - +0.5–1.5 for file-type relevance
   - +1.0 for major change templates
   - -0.5 for generic templates when specifics exist
4. **History de-dup:** recent messages are avoided when possible.

The highest-scoring template is selected, and placeholders (`{topic}`, `{item}`, `{purpose}`, `{source}`, `{target}`) are replaced.

## 6. Alternative Suggestions (Diversity Algorithm)
When regenerating suggestions:
- Used messages are filtered out.
- Similarity is computed using:
  - **Word-level Jaccard similarity (60%)**
  - **Character position matching (40%)**
- A diversity bonus favors less similar suggestions.
- A small random factor introduces controlled variation.

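The 60/40 similarity blend above can be illustrated with a short sketch. The function names are hypothetical, and "character position matching" is interpreted here as the fraction of positions at which both messages hold the same character, measured against the longer message; the actual definition in Gitmit may differ. The diversity bonus and random factor are not modeled.

```go
package main

import "strings"

// wordSet returns the lowercase, whitespace-separated words of s as a set.
func wordSet(s string) map[string]bool {
	set := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(s)) {
		set[w] = true
	}
	return set
}

// jaccard is |A ∩ B| / |A ∪ B| over the word sets of a and b.
func jaccard(a, b string) float64 {
	setA, setB := wordSet(a), wordSet(b)
	if len(setA) == 0 && len(setB) == 0 {
		return 1 // two empty messages are treated as identical
	}
	inter := 0
	for w := range setA {
		if setB[w] {
			inter++
		}
	}
	union := len(setA) + len(setB) - inter
	return float64(inter) / float64(union)
}

// positionMatch is the share of character positions where a and b agree,
// relative to the longer of the two (an assumed definition).
func positionMatch(a, b string) float64 {
	ra, rb := []rune(a), []rune(b)
	shorter, longer := len(ra), len(rb)
	if shorter > longer {
		shorter, longer = longer, shorter
	}
	if longer == 0 {
		return 1
	}
	same := 0
	for i := 0; i < shorter; i++ {
		if ra[i] == rb[i] {
			same++
		}
	}
	return float64(same) / float64(longer)
}

// similarity blends both measures with the 60%/40% weights listed above.
func similarity(a, b string) float64 {
	return 0.6*jaccard(a, b) + 0.4*positionMatch(a, b)
}
```

A regeneration step could then prefer candidates whose `similarity` to already-shown messages is lowest.
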
## 7. Configuration Influence
**Location:** `internal/config/config.go` + `docs/CONFIGURATION.md`

Configuration can adjust:
- Topic mappings
- Keyword mappings and weights
- Diff-stat thresholds
- Project-specific defaults

This allows the algorithm’s weighting to be tuned without code changes.

assets/prompts/system_prompt.txt

Lines changed: 9 additions & 3 deletions
@@ -1,10 +1,13 @@
-You are an expert developer assistant. Analyze the provided structured git diff metadata and generate a single-line commit message following the Conventional Commits specification.
+You are an expert developer assistant. Analyze the provided structured git diff metadata and generate a commit message following the Conventional Commits specification.
 
 Guidelines:
 1. Format MUST be: <type>(<scope>): <short description in present tense>
 2. Allowed types: feat, fix, refactor, chore, test, docs, style, perf, ci, build, security
-3. Do NOT include any markdown, backticks, quotes, or introductory text like "Here is your commit message:".
-4. Output ONLY the raw string of the commit message.
+3. If the changes are complex, you MAY include a body separated by a blank line after the subject.
+4. Keep the subject line short (aim for ~50 characters).
+5. Wrap body lines at ~72 characters.
+6. Do NOT include any markdown, backticks, quotes, or introductory text like "Here is your commit message:".
+7. Output ONLY the raw string of the commit message.
 
 Metadata Context:
 - Project Type: {{.ProjectType}}
@@ -15,4 +18,7 @@ Metadata Context:
 - Dependency Changes: {{.DependencyAlert}}
 - Added/Deleted Line Ratio: {{printf "%.2f" .DiffSummary.Ratio}}
 
+Summarized Git Diff:
+{{.DiffContent}}
+
 Output:

cmd/propose.go

Lines changed: 5 additions & 5 deletions
@@ -92,7 +92,7 @@ func runPropose(cmd *cobra.Command, args []string) error {
 	return err
 }
 
-f := formatter.NewFormatter()
+f := formatter.NewFormatter(cfg.MaxSubjectLength, cfg.MaxBodyLength)
 
 // Calculate Heuristic Suggestion (Always available)
 heuristicMsg, err := templater.GetMessage(commitMessage)
@@ -112,7 +112,7 @@ func runPropose(cmd *cobra.Command, args []string) error {
 client := ai.NewOllamaClient(cfg.Ollama)
 aiResponse, err := client.Generate(prompt)
 if err == nil && ai.IsValidCommitMessage(aiResponse) {
-	aiMsg = strings.TrimSpace(aiResponse)
+	aiMsg = f.FormatMessage(strings.TrimSpace(aiResponse), commitMessage.IsMajor)
 	usingAI = true
 	finalMessage = aiMsg
 }
@@ -220,7 +220,7 @@ func runPropose(cmd *cobra.Command, args []string) error {
 editedMessage = strings.TrimSpace(editedMessage)
 
 if editedMessage != "" {
-	finalMessage = editedMessage
+	finalMessage = f.FormatMessage(editedMessage, commitMessage.IsMajor)
 	usedSuggestions[finalMessage] = true
 	color.Green("\n✓ Updated commit message:")
 } else {
@@ -240,7 +240,7 @@ func runPropose(cmd *cobra.Command, args []string) error {
 	client := ai.NewOllamaClient(cfg.Ollama)
 	aiResponse, err := client.Generate(prompt)
 	if err == nil && ai.IsValidCommitMessage(aiResponse) {
-		finalMessage = strings.TrimSpace(aiResponse)
+		finalMessage = f.FormatMessage(strings.TrimSpace(aiResponse), commitMessage.IsMajor)
 		regenerationCount++
 	}
 }
@@ -264,7 +264,7 @@ func runPropose(cmd *cobra.Command, args []string) error {
 client := ai.NewOllamaClient(cfg.Ollama)
 aiResponse, err := client.Generate(prompt)
 if err == nil && ai.IsValidCommitMessage(aiResponse) {
-	aiMsg = strings.TrimSpace(aiResponse)
+	aiMsg = f.FormatMessage(strings.TrimSpace(aiResponse), commitMessage.IsMajor)
 	finalMessage = aiMsg
 	usingAI = true
 } else {

docs/CONFIGURATION.md

Lines changed: 47 additions & 0 deletions
@@ -104,6 +104,53 @@ Controls the threshold for the diff stat analysis algorithm. This ratio determin
 }
 ```
 
+### Normalized Scoring
+
+**`normalizeScoring`** (boolean, default: true)
+
+Enables normalized confidence weights for action selection. This algorithm reduces noise when multiple weak signals compete by calculating a weighted average instead of a raw additive score.
+
+**`signalWeights`** (object)
+
+Defines the confidence weights for different signal sources. Only used when `normalizeScoring` is `true`.
+
+**Default weights:**
+- `branch`: 0.35 (strongest signal)
+- `diffStat`: 0.25
+- `keywords`: 0.25
+- `patterns`: 0.15 (multi-file patterns)
+
+**Example:**
+```json
+{
+  "normalizeScoring": true,
+  "signalWeights": {
+    "branch": 0.5,
+    "diffStat": 0.2,
+    "keywords": 0.2,
+    "patterns": 0.1
+  }
+}
+```
+
+### Message Length Constraints
+
+**`maxSubjectLength`** (int, default: 50)
+
+Specifies the maximum character length for the first line (subject) of the commit message. If the generated or edited subject exceeds this limit, it will be automatically wrapped to the next line.
+
+**`maxBodyLength`** (int, default: 72)
+
+Specifies the maximum character length for each line in the body of the commit message. If the body text exceeds this limit, it will be wrapped at word boundaries.
+
+**Example:**
+```json
+{
+  "maxSubjectLength": 50,
+  "maxBodyLength": 72
+}
+```
+
 ### Topic Mappings
 
 **`topicMappings`** (object)
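
The `maxBodyLength` option added in the hunk above wraps body text at word boundaries. A minimal sketch of greedy word wrapping is shown below; `wrapLine` is a hypothetical helper and not the `FormatMessage` implementation, and it assumes that a single word longer than the limit is left on its own line rather than split.

```go
package main

import (
	"strings"
	"unicode/utf8"
)

// wrapLine greedily packs words into lines of at most width characters.
// A word that is longer than width is kept whole on its own line.
func wrapLine(text string, width int) []string {
	words := strings.Fields(text)
	if len(words) == 0 {
		return nil
	}
	var lines []string
	current := words[0]
	for _, w := range words[1:] {
		// +1 accounts for the space that would join the two parts.
		if utf8.RuneCountInString(current)+1+utf8.RuneCountInString(w) > width {
			lines = append(lines, current)
			current = w
		} else {
			current += " " + w
		}
	}
	return append(lines, current)
}
```

Calling `wrapLine(body, 72)` on each body paragraph and joining the result with newlines would give lines within the default limit, under the assumptions stated above.
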

internal/ai/ai_test.go

Lines changed: 1 addition & 0 deletions
@@ -44,6 +44,7 @@ func TestIsValidCommitMessage(t *testing.T) {
 	expected bool
 }{
 	{"feat(auth): add login functionality", true},
+	{"feat(auth): add login\n\nThis is a body.", true},
 	{"fix: resolve memory leak", true},
 	{"chore(deps): update dependencies", true},
 	{"Invalid message", false},

internal/ai/prompt.go

Lines changed: 2 additions & 0 deletions
@@ -19,6 +19,7 @@ type PromptContext struct {
 	CodeSymbols []string
 	DependencyAlert string
 	DiffSummary DiffSummary
+	DiffContent string
 }
 
 // DiffSummary contains ratio of changes
@@ -70,6 +71,7 @@ func RenderPrompt(msg *analyzer.CommitMessage, projectType, branchName string) (
 	DiffSummary: DiffSummary{
 		Ratio: ratio,
 	},
+	DiffContent: msg.FullDiff,
 }
 
 var buf bytes.Buffer
