andev0x
diff --git a/ALGORITHMS.md b/ALGORITHMS.md
Lines changed: 125 additions & 0 deletions
diff --git a/assets/prompts/system_prompt.txt b/assets/prompts/system_prompt.txt
Lines changed: 9 additions & 3 deletions
diff --git a/cmd/propose.go b/cmd/propose.go
Lines changed: 5 additions & 5 deletions
diff --git a/docs/CONFIGURATION.md b/docs/CONFIGURATION.md
Lines changed: 47 additions & 0 deletions
diff --git a/internal/ai/ai_test.go b/internal/ai/ai_test.go
Lines changed: 1 addition & 0 deletions
diff --git a/internal/ai/prompt.go b/internal/ai/prompt.go
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,125 @@
+# Gitmit Algorithms
+
+## Overview
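+Gitmit generates Conventional Commit messages by combining git diff parsing, heuristic analysis, weighted scoring, and template selection. The heuristic pipeline runs fully offline and is deterministic, apart from the small random factor used when regenerating alternative suggestions (Section 6). Optional AI generation is a separate layer on top.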
+
+```
+Git status/diff → Parser → Analyzer → Templater → Formatter → Commit message
+```
+
+## 1. Change Collection (Parser)
+**Location:** `internal/parser/git.go`
+
+1. **Staged file discovery:** `git status --porcelain` is scanned to identify staged files and their actions (A/M/D/R/C).
+2. **Per-file diff extraction:** For each staged file, `git diff --cached -U0 -- <file>` is streamed.
+3. **Line stats:** Added/removed lines are counted by diff prefixes (`+`/`-`).
+4. **Major change flag:** A file is marked `IsMajor` when added+removed lines ≥ 500.
+
+The parser returns a list of `Change` objects and aggregates totals for diff-stat analysis.
+
+## 2. Analyzer: Feature & Context Extraction
+**Location:** `internal/analyzer/analyzer.go`
+
+### 2.1 File/Topic/Item Detection
+- **Topic** is inferred from directory path with configurable overrides (`topicMappings`).
+- **Item** defaults to the filename without extension.
+- **Purpose** is inferred from keyword mappings and built-in keyword heuristics.
+
+### 2.2 Symbol Extraction
+Regex-based extraction detects structures from added lines:
+- **Functions** (Go, JS/TS, Python, Java)
+- **Structs/Classes**
+- **Methods** (receiver-based Go methods)
+
+These symbols are used to populate `{item}` placeholders and improve specificity.
+
+### 2.3 Change Pattern Detection
+Single-file patterns include:
+- error handling, tests, imports, docs/comments, refactors
+- API/database/performance/security indicators
+- validation, logging, middleware, DI, CLI changes
+
+### 2.4 Multi-file Pattern Detection
+Across all changes, Gitmit detects patterns such as:
+- **feature-addition** (many new files)
+- **bug-fix-cascade** (many modified files with fix keywords)
+- **refactor-sweep** (mixed A/M/D)
+- **test-suite-update** / **config-update**
+- **api-redesign** / **database-migration**
+
+### 2.5 Special-Case Fallbacks
+Early exits provide deterministic messages for clear cases:
+- Single added file → `feat`
+- Single deleted file → `chore`
+- Only docs/config/deps → `docs`/`ci`/`chore(deps)`
+
+## 3. Action (Type) Scoring Algorithm
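+The commit **action** is chosen from a per-action score map. By default the scores are computed with normalized confidence weights (Section 3.1); legacy additive scoring (Section 3.2) remains available through configuration.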
+
+### 3.1 Normalized Scoring (Default)
+Gitmit uses **normalized confidence weights** to reduce noise when multiple signals compete.
+
+1. **Normalize signals (0–1):**
+   - **Branch hint:** 1.0 if branch name matches an action, 0.0 otherwise.
+   - **Diff-stat:** 0–1 based on distance from thresholds (added/removed ratio).
+   - **Keywords:** Raw keyword scores are normalized relative to the highest-scoring action.
+   - **Multi-file patterns:** 1.0 if a relevant pattern is detected, 0.0 otherwise.
+2. **Apply confidence weights:**
+   - branch: 0.35
+   - diff-stat: 0.25
+   - keywords: 0.25
+   - multi-file patterns: 0.15
+3. **Final score:** `sum(weight × normalized_signal)` per action.
+4. **Selection:** The action with the highest final score is selected.
+5. **Fallback:** If the top action's final score is below 0.35, Gitmit falls back to file-based heuristics.
+
+### 3.2 Legacy Additive Scoring
+If `normalizeScoring` is disabled in config, Gitmit falls back to raw score aggregation:
+1. **Branch name hints:** +3 to matching action.
+2. **Diff-stat ratio:** +2 to `feat` or `refactor`.
+3. **Keyword scoring:** per-action weights are added directly.
+4. **Multi-file patterns:** +3 or +4 to relevant actions.
+
+## 4. Scope Selection
+- Single topic → that topic
+- Single directory → directory name
+- 2–3 topics → combined scope (sorted)
+- Many topics → most common or `core`
+- Commit history can override scope when consistent across recent commits
+
+## 5. Template Selection & Scoring
+**Location:** `internal/templater/templater.go`
+
+1. **Template group resolution:** action → template group (A/M/D/R/DOC/SECURITY/MISC).
+2. **Topic match:** exact → fuzzy → `_default`.
+3. **Template scoring:**
+   - Base score 1.0
+   - +2.0 for matching detected patterns
+   - +1.5 for using detected symbols
+   - +1.0 for meaningful purpose placeholders
+   - +0.5–1.5 for file-type relevance
+   - +1.0 for major change templates
+   - -0.5 for generic templates when specifics exist
+4. **History de-dup:** recent messages are avoided when possible.
+
+The highest-scoring template is selected, and placeholders (`{topic}`, `{item}`, `{purpose}`, `{source}`, `{target}`) are replaced.
+
+## 6. Alternative Suggestions (Diversity Algorithm)
+When regenerating suggestions:
+- Used messages are filtered out.
+- Similarity is computed using:
+  - **Word-level Jaccard similarity (60%)**
+  - **Character position matching (40%)**
+- A diversity bonus favors less similar suggestions.
+- A small random factor introduces controlled variation.
+
+## 7. Configuration Influence
+**Location:** `internal/config/config.go` + `docs/CONFIGURATION.md`
+
+Configuration can adjust:
+- Topic mappings
+- Keyword mappings and weights
+- Diff-stat thresholds
+- Project-specific defaults
+
+This allows the algorithm’s weighting to be tuned without code changes.
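
To make the normalized scoring from Section 3.1 of ALGORITHMS.md concrete, here is a minimal Go sketch. The names `Signals`, `Weights`, `score`, and `pickAction` are hypothetical and are not taken from `internal/analyzer`; only the default weights and the 0.35 fallback threshold come from the document. The alphabetical tie-break is an assumption added to keep the sketch deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// Signals holds the four normalized signals (each in [0, 1]) for one action.
// Hypothetical type, for illustration only.
type Signals struct {
	Branch, DiffStat, Keywords, Patterns float64
}

// Weights holds the confidence weight applied to each signal source.
type Weights struct {
	Branch, DiffStat, Keywords, Patterns float64
}

// defaultWeights mirrors the defaults listed in ALGORITHMS.md.
var defaultWeights = Weights{Branch: 0.35, DiffStat: 0.25, Keywords: 0.25, Patterns: 0.15}

// fallbackThreshold is the documented cut-off below which file-based
// heuristics take over.
const fallbackThreshold = 0.35

// score computes sum(weight × normalized_signal) for a single action.
func score(s Signals, w Weights) float64 {
	return w.Branch*s.Branch + w.DiffStat*s.DiffStat +
		w.Keywords*s.Keywords + w.Patterns*s.Patterns
}

// pickAction returns the highest-scoring action and its score. ok is false
// when the best score is below the threshold, meaning the caller should
// fall back to file-based heuristics.
func pickAction(perAction map[string]Signals, w Weights) (action string, best float64, ok bool) {
	names := make([]string, 0, len(perAction))
	for name := range perAction {
		names = append(names, name)
	}
	sort.Strings(names) // assumed tie-break: alphabetical order

	best = -1
	for _, name := range names {
		if sc := score(perAction[name], w); sc > best {
			action, best = name, sc
		}
	}
	return action, best, best >= fallbackThreshold
}

func main() {
	signals := map[string]Signals{
		"feat": {Branch: 1.0, DiffStat: 0.8, Keywords: 0.4},
		"fix":  {DiffStat: 0.2, Keywords: 1.0},
	}
	action, sc, ok := pickAction(signals, defaultWeights)
	fmt.Printf("%s %.2f %v\n", action, sc, ok)
}
```

With these inputs `feat` scores 0.35 + 0.20 + 0.10 = 0.65 and `fix` scores 0.05 + 0.25 = 0.30, so `feat` is selected without triggering the fallback.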
@@ -1,10 +1,13 @@
-You are an expert developer assistant. Analyze the provided structured git diff metadata and generate a single-line commit message following the Conventional Commits specification.
+You are an expert developer assistant. Analyze the provided structured git diff metadata and generate a commit message following the Conventional Commits specification.
 
 Guidelines:
 1. Format MUST be: <type>(<scope>): <short description in present tense>
 2. Allowed types: feat, fix, refactor, chore, test, docs, style, perf, ci, build, security
-3. Do NOT include any markdown, backticks, quotes, or introductory text like "Here is your commit message:".
-4. Output ONLY the raw string of the commit message.
+3. If the changes are complex, you MAY include a body separated by a blank line after the subject.
+4. Keep the subject line short (aim for ~50 characters).
+5. Wrap body lines at ~72 characters.
+6. Do NOT include any markdown, backticks, quotes, or introductory text like "Here is your commit message:".
+7. Output ONLY the raw string of the commit message.
 
 Metadata Context:
 - Project Type: {{.ProjectType}}
@@ -15,4 +18,7 @@ Metadata Context:
 - Dependency Changes: {{.DependencyAlert}}
 - Added/Deleted Line Ratio: {{printf "%.2f" .DiffSummary.Ratio}}
 
+Summarized Git Diff:
+{{.DiffContent}}
+
 Output:
@@ -92,7 +92,7 @@ func runPropose(cmd *cobra.Command, args []string) error {
 		return err
 	}
 
-	f := formatter.NewFormatter()
+	f := formatter.NewFormatter(cfg.MaxSubjectLength, cfg.MaxBodyLength)
 
 	// Calculate Heuristic Suggestion (Always available)
 	heuristicMsg, err := templater.GetMessage(commitMessage)
@@ -112,7 +112,7 @@ func runPropose(cmd *cobra.Command, args []string) error {
 			client := ai.NewOllamaClient(cfg.Ollama)
 			aiResponse, err := client.Generate(prompt)
 			if err == nil && ai.IsValidCommitMessage(aiResponse) {
-				aiMsg = strings.TrimSpace(aiResponse)
+				aiMsg = f.FormatMessage(strings.TrimSpace(aiResponse), commitMessage.IsMajor)
 				usingAI = true
 				finalMessage = aiMsg
 			}
@@ -220,7 +220,7 @@ func runPropose(cmd *cobra.Command, args []string) error {
 				editedMessage = strings.TrimSpace(editedMessage)
 
 				if editedMessage != "" {
-					finalMessage = editedMessage
+					finalMessage = f.FormatMessage(editedMessage, commitMessage.IsMajor)
 					usedSuggestions[finalMessage] = true
 					color.Green("\n✓ Updated commit message:")
 				} else {
@@ -240,7 +240,7 @@ func runPropose(cmd *cobra.Command, args []string) error {
 						client := ai.NewOllamaClient(cfg.Ollama)
 						aiResponse, err := client.Generate(prompt)
 						if err == nil && ai.IsValidCommitMessage(aiResponse) {
-							finalMessage = strings.TrimSpace(aiResponse)
+							finalMessage = f.FormatMessage(strings.TrimSpace(aiResponse), commitMessage.IsMajor)
 							regenerationCount++
 						}
 					}
@@ -264,7 +264,7 @@ func runPropose(cmd *cobra.Command, args []string) error {
 					client := ai.NewOllamaClient(cfg.Ollama)
 					aiResponse, err := client.Generate(prompt)
 					if err == nil && ai.IsValidCommitMessage(aiResponse) {
-						aiMsg = strings.TrimSpace(aiResponse)
+						aiMsg = f.FormatMessage(strings.TrimSpace(aiResponse), commitMessage.IsMajor)
 						finalMessage = aiMsg
 						usingAI = true
 					} else {
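
The hunks above route AI and edited messages through `f.FormatMessage`, with the formatter built from `cfg.MaxSubjectLength` and `cfg.MaxBodyLength`. The formatter's implementation is not part of this diff, so the following is only a sketch of the kind of word-boundary wrapping such a step might perform. `wrapWords` is a hypothetical helper, not a function from the `formatter` package.

```go
package main

import (
	"fmt"
	"strings"
)

// wrapWords greedily wraps text at word boundaries so that no line exceeds
// width runes. A single word longer than width is kept whole on its own line.
func wrapWords(text string, width int) []string {
	var lines []string
	var cur strings.Builder
	curLen := 0 // length of the current line in runes

	for _, word := range strings.Fields(text) {
		wordLen := len([]rune(word))
		// Start a new line if appending " word" would overflow.
		if curLen > 0 && curLen+1+wordLen > width {
			lines = append(lines, cur.String())
			cur.Reset()
			curLen = 0
		}
		if curLen > 0 {
			cur.WriteByte(' ')
			curLen++
		}
		cur.WriteString(word)
		curLen += wordLen
	}
	if curLen > 0 {
		lines = append(lines, cur.String())
	}
	return lines
}

func main() {
	body := "Route AI and edited messages through the formatter so that body lines are wrapped consistently at the configured width."
	for _, line := range wrapWords(body, 72) {
		fmt.Println(line)
	}
}
```

Counting runes rather than bytes keeps the limit meaningful for non-ASCII text; whether Gitmit's formatter does the same is not visible in this diff.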
 
@@ -104,6 +104,53 @@ Controls the threshold for the diff stat analysis algorithm. This ratio determin
 }
 ```
 
+### Normalized Scoring
+
+**`normalizeScoring`** (boolean, default: true)
+
+Enables normalized confidence weights for action selection. This algorithm reduces noise when multiple weak signals compete by calculating a weighted average instead of a raw additive score.
+
+**`signalWeights`** (object)
+
+Defines the confidence weights for different signal sources. Only used when `normalizeScoring` is `true`.
+
+**Default weights:**
+- `branch`: 0.35 (strongest signal)
+- `diffStat`: 0.25
+- `keywords`: 0.25
+- `patterns`: 0.15 (multi-file patterns)
+
+**Example:**
+```json
+{
+  "normalizeScoring": true,
+  "signalWeights": {
+    "branch": 0.5,
+    "diffStat": 0.2,
+    "keywords": 0.2,
+    "patterns": 0.1
+  }
+}
+```
+
+### Message Length Constraints
+
+**`maxSubjectLength`** (int, default: 50)
+
+Specifies the maximum character length for the first line (subject) of the commit message. If the generated or edited subject exceeds this limit, it will be automatically wrapped to the next line.
+
+**`maxBodyLength`** (int, default: 72)
+
+Specifies the maximum character length for each line in the body of the commit message. If the body text exceeds this limit, it will be wrapped at word boundaries.
+
+**Example:**
+```json
+{
+  "maxSubjectLength": 50,
+  "maxBodyLength": 72
+}
+```
+
 ### Topic Mappings
 
 **`topicMappings`** (object)
 
@@ -44,6 +44,7 @@ func TestIsValidCommitMessage(t *testing.T) {
 		expected bool
 	}{
 		{"feat(auth): add login functionality", true},
+		{"feat(auth): add login\n\nThis is a body.", true},
 		{"fix: resolve memory leak", true},
 		{"chore(deps): update dependencies", true},
 		{"Invalid message", false},
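
The new test case expects a subject followed by a blank line and a body to be accepted. The body of `ai.IsValidCommitMessage` is not shown in this diff, so the sketch below is only one way such a check could be written. `isValidCommitMessage` and `subjectRe` are hypothetical names, and the type list is copied from the system prompt's allowed types.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// subjectRe matches "<type>(<scope>): <description>" with an optional scope.
// The allowed types are those listed in the system prompt.
var subjectRe = regexp.MustCompile(
	`^(feat|fix|refactor|chore|test|docs|style|perf|ci|build|security)(\([^)]+\))?: \S.*$`)

// isValidCommitMessage validates the subject line and, when a body is
// present, requires a blank line between subject and body.
func isValidCommitMessage(msg string) bool {
	msg = strings.TrimSpace(msg)
	subject, rest, hasBody := strings.Cut(msg, "\n")
	if !subjectRe.MatchString(subject) {
		return false
	}
	// After cutting at the first newline, a blank separator line means the
	// remainder starts with another newline.
	if hasBody && !strings.HasPrefix(rest, "\n") {
		return false
	}
	return true
}

func main() {
	for _, m := range []string{
		"feat(auth): add login\n\nThis is a body.",
		"Invalid message",
	} {
		fmt.Printf("%q -> %v\n", m, isValidCommitMessage(m))
	}
}
```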
 
@@ -19,6 +19,7 @@ type PromptContext struct {
 	CodeSymbols     []string
 	DependencyAlert string
 	DiffSummary     DiffSummary
+	DiffContent     string
 }
 
 // DiffSummary contains ratio of changes
@@ -70,6 +71,7 @@ func RenderPrompt(msg *analyzer.CommitMessage, projectType, branchName string) (
 		DiffSummary: DiffSummary{
 			Ratio: ratio,
 		},
+		DiffContent: msg.FullDiff,
 	}
 
 	var buf bytes.Buffer
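
The prompt template gains a `{{.DiffContent}}` placeholder and `PromptContext` gains the matching field. As a rough illustration of how the two fit together with Go's `text/template`, here is a self-contained sketch. The trimmed-down `promptContext` struct, the `promptTmpl` string, and `renderPrompt` are stand-ins for illustration, not the repository's actual `RenderPrompt` or prompt file.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// promptContext is a reduced stand-in for the PromptContext struct in the diff.
type promptContext struct {
	ProjectType string
	DiffContent string
}

// promptTmpl is an abbreviated stand-in for assets/prompts/system_prompt.txt.
const promptTmpl = "- Project Type: {{.ProjectType}}\n\nSummarized Git Diff:\n{{.DiffContent}}\n\nOutput:"

// renderPrompt parses the template and executes it against ctx.
func renderPrompt(ctx promptContext) (string, error) {
	tmpl, err := template.New("prompt").Parse(promptTmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, ctx); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := renderPrompt(promptContext{
		ProjectType: "go",
		DiffContent: "+DiffContent string",
	})
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Println(out)
}
```

Because `text/template` does not escape its output, the diff text reaches the model verbatim, including leading `+` and `-` markers.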