diff --git a/ALGORITHMS.md b/ALGORITHMS.md
new file mode 100644
index 0000000..f714562
--- /dev/null
+++ b/ALGORITHMS.md
@@ -0,0 +1,125 @@
+# Gitmit Algorithms
+
+## Overview
+Gitmit generates Conventional Commit messages by combining git diff parsing, heuristic analysis, weighted scoring, and template selection. The pipeline is fully offline and deterministic, with optional AI as a separate layer.
+
+```
+Git status/diff → Parser → Analyzer → Templater → Formatter → Commit message
+```
+
+## 1. Change Collection (Parser)
+**Location:** `internal/parser/git.go`
+
+1. **Staged file discovery:** `git status --porcelain` is scanned to identify staged files and their actions (A/M/D/R/C).
+2. **Per-file diff extraction:** For each staged file, `git diff --cached -U0 -- ` is streamed.
+3. **Line stats:** Added/removed lines are counted by diff prefixes (`+`/`-`).
+4. **Major change flag:** A file is marked `IsMajor` when added+removed lines ≥ 500.
+
+The parser returns a list of `Change` objects and aggregates totals for diff-stat analysis.
+
+## 2. Analyzer: Feature & Context Extraction
+**Location:** `internal/analyzer/analyzer.go`
+
+### 2.1 File/Topic/Item Detection
+- **Topic** is inferred from directory path with configurable overrides (`topicMappings`).
+- **Item** defaults to the filename without extension.
+- **Purpose** is inferred from keyword mappings and built-in keyword heuristics.
+
+### 2.2 Symbol Extraction
+Regex-based extraction detects structures from added lines:
+- **Functions** (Go, JS/TS, Python, Java)
+- **Structs/Classes**
+- **Methods** (receiver-based Go methods)
+
+These symbols are used to populate `{item}` placeholders and improve specificity.
+
+### 2.3 Change Pattern Detection
+Single-file patterns include:
+- error handling, tests, imports, docs/comments, refactors
+- API/database/performance/security indicators
+- validation, logging, middleware, DI, CLI changes
+
+### 2.4 Multi-file Pattern Detection
+Across all changes, Gitmit detects patterns such as:
+- **feature-addition** (many new files)
+- **bug-fix-cascade** (many modified files with fix keywords)
+- **refactor-sweep** (mixed A/M/D)
+- **test-suite-update** / **config-update**
+- **api-redesign** / **database-migration**
+
+### 2.5 Special-Case Fallbacks
+Early exits provide deterministic messages for clear cases:
+- Single added file → `feat`
+- Single deleted file → `chore`
+- Only docs/config/deps → `docs`/`ci`/`chore(deps)`
+
+## 3. Action (Type) Scoring Algorithm
+The commit **action** is determined by a weighted score map, with support for normalized confidence weights (default).
+
+### 3.1 Normalized Scoring (Default)
+Gitmit uses **normalized confidence weights** to reduce noise when multiple signals compete.
+
+1. **Normalize signals (0–1):**
+   - **Branch hint:** 1.0 if branch name matches an action, 0.0 otherwise.
+   - **Diff-stat:** 0–1 based on distance from thresholds (added/removed ratio).
+   - **Keywords:** Raw keyword scores are normalized relative to the highest-scoring action.
+   - **Multi-file patterns:** 1.0 if a relevant pattern is detected, 0.0 otherwise.
+2. **Apply confidence weights:**
+   - branch: 0.35
+   - diff-stat: 0.25
+   - keywords: 0.25
+   - multi-file patterns: 0.15
+3. **Final score:** `sum(weight × normalized_signal)` per action.
+4. **Selection:** The action with the highest final score is selected.
+5. **Fallback:** If top action score < 0.35, Gitmit falls back to file-based heuristics.
+
+### 3.2 Legacy Additive Scoring
+If `normalizeScoring` is disabled in config, Gitmit falls back to raw score aggregation:
+1. **Branch name hints:** +3 to matching action.
+2. **Diff-stat ratio:** +2 to `feat` or `refactor`.
+3. **Keyword scoring:** per-action weights are added directly.
+4. **Multi-file patterns:** +3 or +4 to relevant actions.
+
+## 4. Scope Selection
+- Single topic → that topic
+- Single directory → directory name
+- 2–3 topics → combined scope (sorted)
+- Many topics → most common or `core`
+- Commit history can override scope when consistent across recent commits
+
+## 5. Template Selection & Scoring
+**Location:** `internal/templater/templater.go`
+
+1. **Template group resolution:** action → template group (A/M/D/R/DOC/SECURITY/MISC).
+2. **Topic match:** exact → fuzzy → `_default`.
+3. **Template scoring:**
+   - Base score 1.0
+   - +2.0 for matching detected patterns
+   - +1.5 for using detected symbols
+   - +1.0 for meaningful purpose placeholders
+   - +0.5–1.5 for file-type relevance
+   - +1.0 for major change templates
+   - -0.5 for generic templates when specifics exist
+4. **History de-dup:** recent messages are avoided when possible.
+
+The highest-scoring template is selected, and placeholders (`{topic}`, `{item}`, `{purpose}`, `{source}`, `{target}`) are replaced.
+
+## 6. Alternative Suggestions (Diversity Algorithm)
+When regenerating suggestions:
+- Used messages are filtered out.
+- Similarity is computed using:
+  - **Word-level Jaccard similarity (60%)**
+  - **Character position matching (40%)**
+- A diversity bonus favors less similar suggestions.
+- A small random factor introduces controlled variation.
+
+## 7. Configuration Influence
+**Location:** `internal/config/config.go` + `docs/CONFIGURATION.md`
+
+Configuration can adjust:
+- Topic mappings
+- Keyword mappings and weights
+- Diff-stat thresholds
+- Project-specific defaults
+
+This allows the algorithm’s weighting to be tuned without code changes.
diff --git a/assets/prompts/system_prompt.txt b/assets/prompts/system_prompt.txt
index ffa2b96..d555461 100644
--- a/assets/prompts/system_prompt.txt
+++ b/assets/prompts/system_prompt.txt
@@ -1,10 +1,13 @@
-You are an expert developer assistant. Analyze the provided structured git diff metadata and generate a single-line commit message following the Conventional Commits specification.
+You are an expert developer assistant. Analyze the provided structured git diff metadata and generate a commit message following the Conventional Commits specification.
 
 Guidelines:
 1. Format MUST be: ():
 2. Allowed types: feat, fix, refactor, chore, test, docs, style, perf, ci, build, security
-3. Do NOT include any markdown, backticks, quotes, or introductory text like "Here is your commit message:".
-4. Output ONLY the raw string of the commit message.
+3. If the changes are complex, you MAY include a body separated by a blank line after the subject.
+4. Keep the subject line short (aim for ~50 characters).
+5. Wrap body lines at ~72 characters.
+6. Do NOT include any markdown, backticks, quotes, or introductory text like "Here is your commit message:".
+7. Output ONLY the raw string of the commit message.
 
 Metadata Context:
 - Project Type: {{.ProjectType}}
@@ -15,4 +18,7 @@ Metadata Context:
 - Dependency Changes: {{.DependencyAlert}}
 - Added/Deleted Line Ratio: {{printf "%.2f" .DiffSummary.Ratio}}
 
+Summarized Git Diff:
+{{.DiffContent}}
+
 Output:
diff --git a/cmd/propose.go b/cmd/propose.go
index af82a8f..cb97305 100644
--- a/cmd/propose.go
+++ b/cmd/propose.go
@@ -92,7 +92,7 @@ func runPropose(cmd *cobra.Command, args []string) error {
 		return err
 	}
 
-	f := formatter.NewFormatter()
+	f := formatter.NewFormatter(cfg.MaxSubjectLength, cfg.MaxBodyLength)
 
 	// Calculate Heuristic Suggestion (Always available)
 	heuristicMsg, err := templater.GetMessage(commitMessage)
@@ -112,7 +112,7 @@ func runPropose(cmd *cobra.Command, args []string) error {
 		client := ai.NewOllamaClient(cfg.Ollama)
 		aiResponse, err := client.Generate(prompt)
 		if err == nil && ai.IsValidCommitMessage(aiResponse) {
-			aiMsg = strings.TrimSpace(aiResponse)
+			aiMsg = f.FormatMessage(strings.TrimSpace(aiResponse), commitMessage.IsMajor)
 			usingAI = true
 			finalMessage = aiMsg
 		}
@@ -220,7 +220,7 @@ func runPropose(cmd *cobra.Command, args []string) error {
 			editedMessage = strings.TrimSpace(editedMessage)
 
 			if editedMessage != "" {
-				finalMessage = editedMessage
+				finalMessage = f.FormatMessage(editedMessage, commitMessage.IsMajor)
 				usedSuggestions[finalMessage] = true
 				color.Green("\n✓ Updated commit message:")
 			} else {
@@ -240,7 +240,7 @@ func runPropose(cmd *cobra.Command, args []string) error {
 				client := ai.NewOllamaClient(cfg.Ollama)
 				aiResponse, err := client.Generate(prompt)
 				if err == nil && ai.IsValidCommitMessage(aiResponse) {
-					finalMessage = strings.TrimSpace(aiResponse)
+					finalMessage = f.FormatMessage(strings.TrimSpace(aiResponse), commitMessage.IsMajor)
 					regenerationCount++
 				}
 			}
@@ -264,7 +264,7 @@ func runPropose(cmd *cobra.Command, args []string) error {
 			client := ai.NewOllamaClient(cfg.Ollama)
 			aiResponse, err := client.Generate(prompt)
 			if err == nil && ai.IsValidCommitMessage(aiResponse) {
-				aiMsg = strings.TrimSpace(aiResponse)
+				aiMsg = f.FormatMessage(strings.TrimSpace(aiResponse), commitMessage.IsMajor)
 				finalMessage = aiMsg
 				usingAI = true
 			} else {
diff --git a/docs/CONFIGURATION.md b/docs/CONFIGURATION.md
index 58609b0..d86a7f8 100644
--- a/docs/CONFIGURATION.md
+++ b/docs/CONFIGURATION.md
@@ -104,6 +104,53 @@ Controls the threshold for the diff stat analysis algorithm. This ratio determin
 }
 ```
 
+### Normalized Scoring
+
+**`normalizeScoring`** (boolean, default: true)
+
+Enables normalized confidence weights for action selection. This algorithm reduces noise when multiple weak signals compete by calculating a weighted average instead of a raw additive score.
+
+**`signalWeights`** (object)
+
+Defines the confidence weights for different signal sources. Only used when `normalizeScoring` is `true`.
+
+**Default weights:**
+- `branch`: 0.35 (strongest signal)
+- `diffStat`: 0.25
+- `keywords`: 0.25
+- `patterns`: 0.15 (multi-file patterns)
+
+**Example:**
+```json
+{
+  "normalizeScoring": true,
+  "signalWeights": {
+    "branch": 0.5,
+    "diffStat": 0.2,
+    "keywords": 0.2,
+    "patterns": 0.1
+  }
+}
+```
+
+### Message Length Constraints
+
+**`maxSubjectLength`** (int, default: 50)
+
+Specifies the maximum character length for the first line (subject) of the commit message. If the generated or edited subject exceeds this limit, it will be automatically wrapped to the next line.
+
+**`maxBodyLength`** (int, default: 72)
+
+Specifies the maximum character length for each line in the body of the commit message. If the body text exceeds this limit, it will be wrapped at word boundaries.
+
+**Example:**
+```json
+{
+  "maxSubjectLength": 50,
+  "maxBodyLength": 72
+}
+```
+
 ### Topic Mappings
 
 **`topicMappings`** (object)
diff --git a/internal/ai/ai_test.go b/internal/ai/ai_test.go
index 42d7e0a..ac25679 100644
--- a/internal/ai/ai_test.go
+++ b/internal/ai/ai_test.go
@@ -44,6 +44,7 @@ func TestIsValidCommitMessage(t *testing.T) {
 		expected bool
 	}{
 		{"feat(auth): add login functionality", true},
+		{"feat(auth): add login\n\nThis is a body.", true},
 		{"fix: resolve memory leak", true},
 		{"chore(deps): update dependencies", true},
 		{"Invalid message", false},
diff --git a/internal/ai/prompt.go b/internal/ai/prompt.go
index 60d8dca..b0ccb79 100644
--- a/internal/ai/prompt.go
+++ b/internal/ai/prompt.go
@@ -19,6 +19,7 @@ type PromptContext struct {
 	CodeSymbols []string
 	DependencyAlert string
 	DiffSummary DiffSummary
+	DiffContent string
 }
 
 // DiffSummary contains ratio of changes
@@ -70,6 +71,7 @@ func RenderPrompt(msg *analyzer.CommitMessage, projectType, branchName string) (
 		DiffSummary: DiffSummary{
 			Ratio: ratio,
 		},
+		DiffContent: msg.FullDiff,
 	}
 
 	var buf bytes.Buffer
diff --git a/internal/analyzer/analyzer.go b/internal/analyzer/analyzer.go
index 5b074c8..03b5569 100644
--- a/internal/analyzer/analyzer.go
+++ b/internal/analyzer/analyzer.go
@@ -2,6 +2,7 @@ package analyzer
 
 import (
 	"bufio"
+	"fmt"
 	"path/filepath"
 	"regexp"
 	"strings"
@@ -32,6 +33,7 @@ type CommitMessage struct {
 	DetectedStructs []string
 	DetectedMethods []string
 	ChangePatterns []string
+	FullDiff string
 }
 
 // Analyzer is responsible for analyzing git changes and generating commit message components
@@ -105,6 +107,15 @@ func (a *Analyzer) AnalyzeChanges(totalAdded, totalRemoved int, branchName strin
 	commitMessage.DetectedMethods = uniqueStrings(allMethods)
 	commitMessage.ChangePatterns = uniqueStrings(allPatterns)
 
+	// Collect summarized diff for AI
+	var diffSummary strings.Builder
+	for _, change := range a.changes {
+		diffSummary.WriteString(fmt.Sprintf("File: %s\n", change.File))
+		diffSummary.WriteString(a.summarizeDiff(change.Diff))
+		diffSummary.WriteString("\n")
+	}
+	commitMessage.FullDiff = diffSummary.String()
+
 	// Determine if changes are only documentation, config, or dependencies
 	commitMessage.IsDocsOnly = a.isDocsOnly()
 	commitMessage.IsConfigOnly = a.isConfigOnly()
@@ -115,62 +126,11 @@ func (a *Analyzer) AnalyzeChanges(totalAdded, totalRemoved int, branchName strin
 		return msg
 	}
 
-	// Initialize a score tracker for the action (type)
-	scoreMap := make(map[string]int)
-
-	// Step 1: Scan the Branch status
-	if branchName != "" {
-		branchAction, branchScope := a.parseBranchName(branchName)
-		if branchAction != "" {
-			scoreMap[branchAction] += 3
-		}
-		if branchScope != "" {
-			commitMessage.Scope = branchScope
-		}
-	}
-
-	// Step 2: Add weights from diff stat ratio
-	statAction := a.analyzeDiffStat(totalAdded, totalRemoved)
-	if statAction != "" {
-		scoreMap[statAction] += 2
-	}
-
-	// Step 3: Aggregate keyword scores
-	keywordScores := a.calculateKeywordScores()
-	for action, score := range keywordScores {
-		scoreMap[action] += score
-	}
-
-	// Step 4: Add weights from multi-file patterns
-	multiPatterns := a.detectMultiFilePatterns()
-	for _, p := range multiPatterns {
-		switch p {
-		case "feature-addition":
-			scoreMap["feat"] += 4
-		case "bug-fix-cascade":
-			scoreMap["fix"] += 4
-		case "refactor-sweep":
-			scoreMap["refactor"] += 3
-		case "test-suite-update":
-			scoreMap["test"] += 4
-		}
-	}
-
-	// Step 5: Select the recommended type with the highest accumulated score
-	bestAction := ""
-	maxScore := -1
-	for action, score := range scoreMap {
-		if score > maxScore {
-			maxScore = score
-			bestAction = action
-		}
-	}
-
-	if bestAction != "" {
-		commitMessage.Action = bestAction
+	// Determine the recommended action (type) using scoring
+	if a.config.NormalizeScoring {
+		commitMessage.Action = a.calculateNormalizedAction(totalAdded, totalRemoved, branchName, commitMessage)
 	} else {
-		// Fallback to default action determination if no signals
-		commitMessage.Action = a.determineAction(a.changes[0])
+		commitMessage.Action = a.calculateAdditiveAction(totalAdded, totalRemoved, branchName, commitMessage)
 	}
 
 	// Default analysis based on the first change if no specific fallback applies
@@ -1126,6 +1086,177 @@ func (a *Analyzer) analyzeHistoryScopes() string {
 	return a.calculateHistoryScope(commits)
 }
 
+// calculateAdditiveAction implements the legacy additive scoring logic
+func (a *Analyzer) calculateAdditiveAction(totalAdded, totalRemoved int, branchName string, commitMessage *CommitMessage) string {
+	scoreMap := make(map[string]int)
+
+	if branchName != "" {
+		branchAction, branchScope := a.parseBranchName(branchName)
+		if branchAction != "" {
+			scoreMap[branchAction] += 3
+		}
+		if branchScope != "" {
+			commitMessage.Scope = branchScope
+		}
+	}
+
+	statAction := a.analyzeDiffStat(totalAdded, totalRemoved)
+	if statAction != "" {
+		scoreMap[statAction] += 2
+	}
+
+	keywordScores := a.calculateKeywordScores()
+	for action, score := range keywordScores {
+		scoreMap[action] += score
+	}
+
+	multiPatterns := a.detectMultiFilePatterns()
+	for _, p := range multiPatterns {
+		switch p {
+		case "feature-addition":
+			scoreMap["feat"] += 4
+		case "bug-fix-cascade":
+			scoreMap["fix"] += 4
+		case "refactor-sweep":
+			scoreMap["refactor"] += 3
+		case "test-suite-update":
+			scoreMap["test"] += 4
+		}
+	}
+
+	bestAction := ""
+	maxScore := -1
+	for action, score := range scoreMap {
+		if score > maxScore {
+			maxScore = score
+			bestAction = action
+		}
+	}
+
+	if bestAction != "" {
+		return bestAction
+	}
+	return a.determineAction(a.changes[0])
+}
+
+// calculateNormalizedAction implements the new weighted average scoring logic
+func (a *Analyzer) calculateNormalizedAction(totalAdded, totalRemoved int, branchName string, commitMessage *CommitMessage) string {
+	signals := make(map[string]map[string]float64)
+	signals["branch"] = make(map[string]float64)
+	signals["diffStat"] = make(map[string]float64)
+	signals["keywords"] = make(map[string]float64)
+	signals["patterns"] = make(map[string]float64)
+
+	// 1. Branch signal (binary: 0 or 1)
+	if branchName != "" {
+		branchAction, branchScope := a.parseBranchName(branchName)
+		if branchAction != "" {
+			signals["branch"][branchAction] = 1.0
+		}
+		if branchScope != "" {
+			commitMessage.Scope = branchScope
+		}
+	}
+
+	// 2. Diff-stat signal (0-1 based on distance from thresholds)
+	ratio := 0.5
+	if totalAdded+totalRemoved > 0 {
+		ratio = float64(totalAdded) / float64(totalAdded+totalRemoved)
+	}
+
+	if ratio < 0.2 {
+		// Strong refactor signal
+		signals["diffStat"]["refactor"] = 1.0 - (ratio / 0.2)
+	} else if ratio > 0.8 {
+		// Strong feat signal if enough lines
+		if totalAdded > 30 {
+			signals["diffStat"]["feat"] = (ratio - 0.8) / 0.2
+		}
+	} else if ratio >= 0.3 && ratio <= 0.7 {
+		// Balanced: refactor/fix signal
+		// Max signal at 0.5
+		dist := ratio - 0.5
+		if dist < 0 {
+			dist = -dist
+		}
+		signals["diffStat"]["refactor"] = 1.0 - (dist / 0.2)
+	}
+
+	// 3. Keyword signal (min-max normalized per action relative to max score)
+	keywordScores := a.calculateKeywordScores()
+	maxKeywordScore := 0
+	for _, score := range keywordScores {
+		if score > maxKeywordScore {
+			maxKeywordScore = score
+		}
+	}
+	if maxKeywordScore > 0 {
+		for action, score := range keywordScores {
+			signals["keywords"][action] = float64(score) / float64(maxKeywordScore)
+		}
+	}
+
+	// 4. Multi-file pattern signal (binary: 0 or 1)
+	multiPatterns := a.detectMultiFilePatterns()
+	for _, p := range multiPatterns {
+		switch p {
+		case "feature-addition":
+			signals["patterns"]["feat"] = 1.0
+		case "bug-fix-cascade":
+			signals["patterns"]["fix"] = 1.0
+		case "refactor-sweep":
+			signals["patterns"]["refactor"] = 1.0
+		case "test-suite-update":
+			signals["patterns"]["test"] = 1.0
+		case "config-update":
+			signals["patterns"]["ci"] = 1.0
+		}
+	}
+
+	// Compute final weighted scores
+	finalScores := make(map[string]float64)
+	weights := a.config.SignalWeights
+	if weights == nil {
+		weights = map[string]float64{
+			"branch": 0.35,
+			"diffStat": 0.25,
+			"keywords": 0.25,
+			"patterns": 0.15,
+		}
+	}
+
+	allActions := make(map[string]bool)
+	for _, signalMap := range signals {
+		for action := range signalMap {
+			allActions[action] = true
+		}
+	}
+
+	bestAction := ""
+	maxFinalScore := -1.0
+
+	for action := range allActions {
+		score := 0.0
+		score += signals["branch"][action] * weights["branch"]
+		score += signals["diffStat"][action] * weights["diffStat"]
+		score += signals["keywords"][action] * weights["keywords"]
+		score += signals["patterns"][action] * weights["patterns"]
+		finalScores[action] = score
+
+		if score > maxFinalScore {
+			maxFinalScore = score
+			bestAction = action
+		}
+	}
+
+	// Fallback: If top action score is too low, use file-based heuristics
+	if maxFinalScore < 0.35 {
+		return a.determineAction(a.changes[0])
+	}
+
+	return bestAction
+}
+
 // calculateHistoryScope calculates the most frequent scope from a list of commit messages
 func (a *Analyzer) calculateHistoryScope(commits []string) string {
 	scopeCounts := make(map[string]int)
@@ -1153,3 +1284,29 @@ func (a *Analyzer) calculateHistoryScope(commits []string) string {
 
 	return ""
 }
+
+// summarizeDiff extracts the most relevant lines from a diff to keep it concise for AI
+func (a *Analyzer) summarizeDiff(diff string) string {
+	var summary strings.Builder
+	scanner := bufio.NewScanner(strings.NewReader(diff))
+	lineCount := 0
+	maxLines := 20 // Limit lines per file to avoid context bloat
+
+	for scanner.Scan() {
+		line := scanner.Text()
+		// Only include added/removed lines and hunk headers
+		if strings.HasPrefix(line, "+") || strings.HasPrefix(line, "-") || strings.HasPrefix(line, "@@") {
+			if strings.HasPrefix(line, "+++") || strings.HasPrefix(line, "---") {
+				continue
+			}
+			summary.WriteString(line)
+			summary.WriteString("\n")
+			lineCount++
+		}
+		if lineCount >= maxLines {
+			summary.WriteString("... (truncated)\n")
+			break
+		}
+	}
+	return summary.String()
+}
diff --git a/internal/analyzer/scoring_test.go b/internal/analyzer/scoring_test.go
new file mode 100644
index 0000000..ed77c0c
--- /dev/null
+++ b/internal/analyzer/scoring_test.go
@@ -0,0 +1,89 @@
+package analyzer
+
+import (
+	"github.com/andev0x/gitmit/internal/config"
+	"github.com/andev0x/gitmit/internal/parser"
+	"testing"
+)
+
+func TestNormalizedScoring(t *testing.T) {
+	cfg := &config.Config{
+		NormalizeScoring: true,
+		SignalWeights: map[string]float64{
+			"branch": 0.35,
+			"diffStat": 0.25,
+			"keywords": 0.25,
+			"patterns": 0.15,
+		},
+		Keywords: map[string]map[string]int{
+			"fix": {"error": 4},
+		},
+	}
+
+	t.Run("Branch signal dominates keyword", func(t *testing.T) {
+		a := &Analyzer{
+			config: cfg,
+			changes: []*parser.Change{
+				{File: "main.go", Action: "M", Diff: "+ var x = \"error\""},
+			},
+		}
+		// branch "feat/new-ui" -> feat: 0.35 * 1.0 = 0.35
+		// keyword "error" -> fix: 0.25 * 1.0 = 0.25
+		// feat should win
+		msg := a.AnalyzeChanges(1, 0, "feat/new-ui")
+		if msg.Action != "feat" {
+			t.Errorf("Expected action feat, got %s", msg.Action)
+		}
+	})
+
+	t.Run("Keywords dominate if branch is missing", func(t *testing.T) {
+		a := &Analyzer{
+			config: cfg,
+			changes: []*parser.Change{
+				{File: "main.go", Action: "M", Diff: "+ var x = \"error\""},
+			},
+		}
+		// keyword "error" -> fix: 0.25 * 1.0 = 0.25
+		// 0.25 < 0.35 (fallback threshold)
+		// So it should fallback to determineAction which for Action: "M" is refactor
+		msg := a.AnalyzeChanges(1, 0, "")
+		if msg.Action != "refactor" {
+			t.Errorf("Expected action refactor (fallback), got %s", msg.Action)
+		}
+	})
+
+	t.Run("Combined signals work together", func(t *testing.T) {
+		a := &Analyzer{
+			config: cfg,
+			changes: []*parser.Change{
+				{File: "main.go", Action: "M", Diff: "+ func NewFeature() {", Added: 40, Removed: 0},
+			},
+		}
+		// branch "feature/cool" -> feat: 0.35
+		// ratio 1.0 -> feat: 0.25 * 1.0 = 0.25
+		// total feat = 0.60
+		msg := a.AnalyzeChanges(40, 0, "feature/cool")
+		if msg.Action != "feat" {
+			t.Errorf("Expected action feat, got %s", msg.Action)
+		}
+	})
+
+	t.Run("Fallback to additive if disabled", func(t *testing.T) {
+		cfgDisabled := *cfg
+		cfgDisabled.NormalizeScoring = false
+		a := &Analyzer{
+			config: &cfgDisabled,
+			changes: []*parser.Change{
+				{File: "main.go", Action: "M", Diff: "+ var x = \"error\""},
+			},
+		}
+		// In additive:
+		// branch "feat/new-ui" -> feat: 3
+		// keyword "error" -> fix: 4
+		// fix should win
+		msg := a.AnalyzeChanges(1, 0, "feat/new-ui")
+		if msg.Action != "fix" {
+			t.Errorf("Expected action fix, got %s", msg.Action)
+		}
+	})
+}
diff --git a/internal/config/config.go b/internal/config/config.go
index aa61d4f..90fb43c 100644
--- a/internal/config/config.go
+++ b/internal/config/config.go
@@ -17,6 +17,10 @@ type Config struct {
 	Keywords map[string]map[string]int `json:"keywords"` // action -> keyword -> score
 	Templates map[string]map[string]string `json:"templates"` // Custom templates
 	DiffStatThreshold float64 `json:"diffStatThreshold"` // Threshold for add/delete ratio
+	NormalizeScoring bool `json:"normalizeScoring"` // Whether to use normalized confidence weights
+	SignalWeights map[string]float64 `json:"signalWeights"` // Weights for different signal sources
+	MaxSubjectLength int `json:"maxSubjectLength"` // Max length for the first line
+	MaxBodyLength int `json:"maxBodyLength"` // Max length for body lines
 }
 
 // OllamaConfig represents the structure of the ollama configuration block
@@ -41,6 +45,15 @@ func LoadConfig() (*Config, error) {
 		Keywords: make(map[string]map[string]int),
 		Templates: make(map[string]map[string]string),
 		DiffStatThreshold: 0.5,
+		NormalizeScoring: true,
+		SignalWeights: map[string]float64{
+			"branch": 0.35,
+			"diffStat": 0.25,
+			"keywords": 0.25,
+			"patterns": 0.15,
+		},
+		MaxSubjectLength: 50,
+		MaxBodyLength: 72,
 	}
 
 	// 1. Try to load embedded default config (optional)
@@ -260,5 +273,35 @@ func mergeConfigFromFile(cfg *Config, path string) error {
 		cfg.DiffStatThreshold = fileCfg.DiffStatThreshold
 	}
 
+	// Normalize scoring
+	if data, err := os.ReadFile(path); err == nil {
+		var raw map[string]interface{}
+		if err := json.Unmarshal(data, &raw); err == nil {
+			if val, ok := raw["normalizeScoring"]; ok {
+				if b, ok := val.(bool); ok {
+					cfg.NormalizeScoring = b
+				}
+			}
+		}
+	}
+
+	// Signal weights
+	if fileCfg.SignalWeights != nil {
+		if cfg.SignalWeights == nil {
+			cfg.SignalWeights = make(map[string]float64)
+		}
+		for k, v := range fileCfg.SignalWeights {
+			cfg.SignalWeights[k] = v
+		}
+	}
+
+	// Message lengths
+	if fileCfg.MaxSubjectLength > 0 {
+		cfg.MaxSubjectLength = fileCfg.MaxSubjectLength
+	}
+	if fileCfg.MaxBodyLength > 0 {
+		cfg.MaxBodyLength = fileCfg.MaxBodyLength
+	}
+
 	return nil
 }
diff --git a/internal/formatter/formatter.go b/internal/formatter/formatter.go
index ee1ce6d..06e1ebb 100644
--- a/internal/formatter/formatter.go
+++ b/internal/formatter/formatter.go
@@ -6,44 +6,103 @@ import (
 )
 
 // Formatter is responsible for applying final formatting to commit messages
-type Formatter struct{}
+type Formatter struct {
+	MaxSubjectLength int
+	MaxBodyLength int
+}
 
 // NewFormatter creates a new Formatter
-func NewFormatter() *Formatter {
-	return &Formatter{}
+func NewFormatter(maxSubject, maxBody int) *Formatter {
+	return &Formatter{
+		MaxSubjectLength: maxSubject,
+		MaxBodyLength: maxBody,
+	}
 }
 
 // FormatMessage applies formatting rules to the commit message
 func (f *Formatter) FormatMessage(msg string, isMajor bool) string {
-	// Capitalize the first letter
-	/* if len(msg) > 0 {
-		r := []rune(msg)
-		r[0] = unicode.ToUpper(r[0])
-		msg = string(r)
-	} */
-
-	// Remove redundant phrases
-	msg = strings.ReplaceAll(msg, "add add new", "add new")
-	msg = strings.ReplaceAll(msg, "feat feat", "feat")
-	msg = strings.ReplaceAll(msg, "fix fix", "fix")
-
-	// Enforce summary length (soft limit for now, try to break at word boundaries)
-	if len(msg) > 72 {
-		truncatedMsg := msg
-		if len(truncatedMsg) > 72 {
-			truncatedMsg = truncatedMsg[:72]
-			lastSpace := strings.LastIndex(truncatedMsg, " ")
-			if lastSpace != -1 {
-				truncatedMsg = truncatedMsg[:lastSpace]
+	if msg == "" {
+		return ""
+	}
+
+	// Split into subject and body
+	parts := strings.SplitN(msg, "\n", 2)
+	subject := strings.TrimSpace(parts[0])
+	body := ""
+	if len(parts) > 1 {
+		body = strings.TrimSpace(parts[1])
+	}
+
+	// Remove redundant phrases from subject
+	subject = strings.ReplaceAll(subject, "add add new", "add new")
+	subject = strings.ReplaceAll(subject, "feat feat", "feat")
+	subject = strings.ReplaceAll(subject, "fix fix", "fix")
+
+	// Add optional suffixes to subject
+	if isMajor {
+		subject = fmt.Sprintf("%s (massive refactor)", subject)
+	}
+
+	// Wrap subject if too long
+	if f.MaxSubjectLength > 0 && len(subject) > f.MaxSubjectLength {
+		wrapped := f.wrapString(subject, f.MaxSubjectLength)
+		subjectParts := strings.SplitN(wrapped, "\n", 2)
+		subject = subjectParts[0]
+		if len(subjectParts) > 1 {
+			if body != "" {
+				body = subjectParts[1] + "\n\n" + body
+			} else {
+				body = subjectParts[1]
 			}
-			msg = fmt.Sprintf("%s...", truncatedMsg)
 		}
 	}
 
-	// Add optional suffixes
-	if isMajor {
-		msg = fmt.Sprintf("%s (massive refactor)", msg)
+	// Wrap body if exists
+	if body != "" && f.MaxBodyLength > 0 {
+		body = f.wrapString(body, f.MaxBodyLength)
+	}
+
+	if body != "" {
+		return subject + "\n\n" + body
+	}
+	return subject
+}
+
+// wrapString wraps a string at the specified limit, preserving paragraphs
+func (f *Formatter) wrapString(s string, limit int) string {
+	if limit <= 0 {
+		return s
+	}
+
+	paragraphs := strings.Split(s, "\n\n")
+	var result strings.Builder
+
+	for i, p := range paragraphs {
+		if i > 0 {
+			result.WriteString("\n\n")
+		}
+
+		words := strings.Fields(p)
+		if len(words) == 0 {
+			continue
+		}
+
+		currentLineLength := 0
+		for j, word := range words {
+			if j > 0 {
+				if currentLineLength+1+len(word) > limit {
+					result.WriteString("\n")
+					currentLineLength = 0
+				} else {
+					result.WriteString(" ")
+					currentLineLength++
+				}
+			}
+
+			result.WriteString(word)
+			currentLineLength += len(word)
+		}
 	}
 
-	return msg
+	return result.String()
 }
diff --git a/internal/formatter/formatter_test.go b/internal/formatter/formatter_test.go
new file mode 100644
index 0000000..4203024
--- /dev/null
+++ b/internal/formatter/formatter_test.go
@@ -0,0 +1,61 @@
+package formatter
+
+import (
+	"testing"
+)
+
+func TestFormatMessage(t *testing.T) {
+	tests := []struct {
+		name string
+		msg string
+		maxSubject int
+		maxBody int
+		expected string
+	}{
+		{
+			name: "short subject, no wrapping",
+			msg: "feat: add feature",
+			maxSubject: 50,
+			maxBody: 72,
+			expected: "feat: add feature",
+		},
+		{
+			name: "long subject, wrap at 10",
+			msg: "feat: add new feature for login",
+			maxSubject: 10,
+			maxBody: 72,
+			expected: "feat: add\n\nnew feature for login",
+		},
+		{
+			name: "subject and body, no wrapping",
+			msg: "feat: add feature\n\nThis is a body message.",
+			maxSubject: 50,
+			maxBody: 72,
+			expected: "feat: add feature\n\nThis is a body message.",
+		},
+		{
+			name: "subject and body, wrap both",
+			msg: "feat: add feature\n\nThis is a body message that is very long.",
+			maxSubject: 10,
+			maxBody: 10,
+			expected: "feat: add\n\nfeature\n\nThis is a\nbody\nmessage\nthat is\nvery long.",
+		},
+		{
+			name: "redundant phrases",
+			msg: "feat feat: add add new feature",
+			maxSubject: 50,
+			maxBody: 72,
+			expected: "feat: add new feature",
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			f := NewFormatter(tt.maxSubject, tt.maxBody)
+			actual := f.FormatMessage(tt.msg, false)
+			if actual != tt.expected {
+				t.Errorf("FormatMessage() = %q, want %q", actual, tt.expected)
+			}
+		})
+	}
+}
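
Illustrative sketch (not part of the patch): the normalized scoring described in ALGORITHMS.md section 3.1 can be tried in isolation with a small standalone program. In the patch, `calculateNormalizedAction` ranges over the `allActions` map and keeps the first strictly greater score, so when two actions tie, the winner may depend on Go's unspecified map iteration order. The sketch below assumes one possible remedy, visiting candidates in sorted order. The names `signalScores` and `pickAction` are hypothetical and do not exist in the repository; the weights are the defaults from the patch.

```go
package main

import (
	"fmt"
	"sort"
)

// signalScores maps a signal source (e.g. "branch") to per-action values in [0, 1].
// Hypothetical type for illustration only.
type signalScores map[string]map[string]float64

// pickAction returns the action with the highest weighted score, and that score.
// Sources and actions are visited in sorted order, so the floating-point sum and
// the tie-break are the same on every run (ties go to the alphabetically first action).
func pickAction(signals signalScores, weights map[string]float64) (string, float64) {
	var sources []string
	seen := make(map[string]bool)
	var actions []string
	for source, perAction := range signals {
		sources = append(sources, source)
		for action := range perAction {
			if !seen[action] {
				seen[action] = true
				actions = append(actions, action)
			}
		}
	}
	sort.Strings(sources)
	sort.Strings(actions)

	best, bestScore := "", -1.0
	for _, action := range actions {
		score := 0.0
		for _, source := range sources {
			// A missing action in a signal map reads as 0, i.e. no signal.
			score += signals[source][action] * weights[source]
		}
		if score > bestScore {
			best, bestScore = action, score
		}
	}
	return best, bestScore
}

func main() {
	weights := map[string]float64{"branch": 0.35, "diffStat": 0.25, "keywords": 0.25, "patterns": 0.15}
	signals := signalScores{
		"branch":   {"feat": 1.0}, // branch name hints at feat
		"keywords": {"fix": 1.0},  // keywords hint at fix
	}
	action, score := pickAction(signals, weights)
	fmt.Printf("%s %.2f\n", action, score) // prints "feat 0.35"
}
```

This mirrors the first case in `scoring_test.go`: the branch signal contributes 0.35 to `feat`, the keyword signal contributes 0.25 to `fix`, and `feat` wins. Whether sorted tie-breaking is the right policy for Gitmit is a design choice left to the author; the sketch only shows that a stable order is cheap to obtain.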