125 changes: 125 additions & 0 deletions ALGORITHMS.md
@@ -0,0 +1,125 @@
# Gitmit Algorithms

## Overview
Gitmit generates Conventional Commit messages by combining git diff parsing, heuristic analysis, weighted scoring, and template selection. The heuristic pipeline runs fully offline, and its first suggestion is deterministic; regenerated alternatives add a small random factor (section 6). AI generation is an optional, separate layer.

```
Git status/diff → Parser → Analyzer → Templater → Formatter → Commit message
```

## 1. Change Collection (Parser)
**Location:** `internal/parser/git.go`

1. **Staged file discovery:** `git status --porcelain` is scanned to identify staged files and their actions (A/M/D/R/C).
2. **Per-file diff extraction:** For each staged file, `git diff --cached -U0 -- <file>` is streamed.
3. **Line stats:** Added/removed lines are counted by diff prefixes (`+`/`-`).
4. **Major change flag:** A file is marked `IsMajor` when added+removed lines ≥ 500.

The parser returns a list of `Change` objects and aggregates totals for diff-stat analysis.
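
As a rough illustration of steps 3 and 4, the sketch below counts added and removed lines in one file's unified diff and applies the 500-line rule. The names `lineStats`, `isMajor`, and `majorThreshold` are hypothetical and are not taken from `internal/parser/git.go`; the sketch also assumes hunk content only starts after the first `@@` header, so the `---`/`+++` file headers are not counted.

```go
package main

import (
	"fmt"
	"strings"
)

// majorThreshold mirrors the "added+removed lines ≥ 500" rule described above.
const majorThreshold = 500

// lineStats counts added and removed lines in the unified diff of one file.
// Everything before the first "@@" hunk header (including the "---" and
// "+++" file headers) is skipped so headers are not counted as changes.
// Hypothetical helper, not Gitmit's actual parser code.
func lineStats(diff string) (added, removed int) {
	inHunk := false
	for _, line := range strings.Split(diff, "\n") {
		switch {
		case strings.HasPrefix(line, "@@"):
			inHunk = true
		case !inHunk:
			// Still inside the file header; ignore.
		case strings.HasPrefix(line, "+"):
			added++
		case strings.HasPrefix(line, "-"):
			removed++
		}
	}
	return added, removed
}

// isMajor reports whether the combined line count reaches the threshold.
func isMajor(added, removed int) bool {
	return added+removed >= majorThreshold
}

func main() {
	diff := "--- a/main.go\n+++ b/main.go\n@@ -3,0 +4,2 @@\n+foo()\n+bar()\n@@ -9 +10,0 @@\n-baz()\n"
	added, removed := lineStats(diff)
	fmt.Println(added, removed, isMajor(added, removed))
}
```

Tracking the first hunk header, rather than skipping every line that starts with `---` or `+++`, avoids miscounting removed lines whose own content begins with `--`.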

## 2. Analyzer: Feature & Context Extraction
**Location:** `internal/analyzer/analyzer.go`

### 2.1 File/Topic/Item Detection
- **Topic** is inferred from directory path with configurable overrides (`topicMappings`).
- **Item** defaults to the filename without extension.
- **Purpose** is inferred from keyword mappings and built-in keyword heuristics.

### 2.2 Symbol Extraction
Regex-based extraction detects structures from added lines:
- **Functions** (Go, JS/TS, Python, Java)
- **Structs/Classes**
- **Methods** (receiver-based Go methods)

These symbols are used to populate `{item}` placeholders and improve specificity.

### 2.3 Change Pattern Detection
Single-file patterns include:
- error handling, tests, imports, docs/comments, refactors
- API/database/performance/security indicators
- validation, logging, middleware, DI, CLI changes

### 2.4 Multi-file Pattern Detection
Across all changes, Gitmit detects patterns such as:
- **feature-addition** (many new files)
- **bug-fix-cascade** (many modified files with fix keywords)
- **refactor-sweep** (mixed A/M/D)
- **test-suite-update** / **config-update**
- **api-redesign** / **database-migration**

### 2.5 Special-Case Fallbacks
Early exits provide deterministic messages for clear cases:
- Single added file → `feat`
- Single deleted file → `chore`
- Only docs/config/deps → `docs`/`ci`/`chore(deps)`

## 3. Action (Type) Scoring Algorithm
The commit **action** is chosen from a weighted score map. By default the scores are built from normalized confidence weights; a legacy additive mode is also available.

### 3.1 Normalized Scoring (Default)
Gitmit uses **normalized confidence weights** to reduce noise when multiple signals compete.

1. **Normalize signals (0–1):**
   - **Branch hint:** 1.0 if branch name matches an action, 0.0 otherwise.
   - **Diff-stat:** 0–1 based on distance from thresholds (added/removed ratio).
   - **Keywords:** Raw keyword scores are normalized relative to the highest-scoring action.
   - **Multi-file patterns:** 1.0 if a relevant pattern is detected, 0.0 otherwise.
2. **Apply confidence weights:**
   - branch: 0.35
   - diff-stat: 0.25
   - keywords: 0.25
   - multi-file patterns: 0.15
3. **Final score:** `sum(weight × normalized_signal)` per action.
4. **Selection:** The action with the highest final score is selected.
5. **Fallback:** If top action score < 0.35, Gitmit falls back to file-based heuristics.
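
The five steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `internal/analyzer/analyzer.go`: the function names, the `signals` map layout (signal name → action → normalized value), and the alphabetical tie-breaking are all hypothetical. Only the weights and the 0.35 threshold come from the text.

```go
package main

import (
	"fmt"
	"sort"
)

// Default confidence weights from the list above.
var weights = map[string]float64{
	"branch":   0.35,
	"diffStat": 0.25,
	"keywords": 0.25,
	"patterns": 0.15,
}

// signalOrder fixes the summation order so results are reproducible.
var signalOrder = []string{"branch", "diffStat", "keywords", "patterns"}

// fallbackThreshold is the score below which file-based heuristics take over.
const fallbackThreshold = 0.35

// normalizeKeywords scales raw keyword scores relative to the top action.
func normalizeKeywords(raw map[string]float64) map[string]float64 {
	top := 0.0
	for _, v := range raw {
		if v > top {
			top = v
		}
	}
	out := make(map[string]float64, len(raw))
	for action, v := range raw {
		if top > 0 {
			out[action] = v / top
		}
	}
	return out
}

// selectAction returns the best action, its score, and whether the score
// clears the fallback threshold. Ties are broken alphabetically (an
// assumption made here to keep the sketch deterministic).
func selectAction(signals map[string]map[string]float64) (string, float64, bool) {
	scores := map[string]float64{}
	for _, name := range signalOrder {
		for action, v := range signals[name] {
			scores[action] += weights[name] * v
		}
	}
	actions := make([]string, 0, len(scores))
	for a := range scores {
		actions = append(actions, a)
	}
	sort.Strings(actions)

	best, bestScore := "", 0.0
	for _, a := range actions {
		if scores[a] > bestScore {
			best, bestScore = a, scores[a]
		}
	}
	return best, bestScore, bestScore >= fallbackThreshold
}

func main() {
	signals := map[string]map[string]float64{
		"branch":   {"feat": 1.0},
		"diffStat": {"feat": 0.6},
		"keywords": normalizeKeywords(map[string]float64{"fix": 5, "feat": 2}),
	}
	action, score, ok := selectAction(signals)
	fmt.Printf("%s %.2f %v\n", action, score, ok)
}
```

In this example `feat` scores 0.35 + 0.25 × 0.6 + 0.25 × 0.4 = 0.60, while `fix` scores only 0.25 from keywords, so a strong keyword signal alone does not override the branch hint.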

### 3.2 Legacy Additive Scoring
If `normalizeScoring` is disabled in config, Gitmit falls back to raw score aggregation:
1. **Branch name hints:** +3 to matching action.
2. **Diff-stat ratio:** +2 to `feat` or `refactor`.
3. **Keyword scoring:** per-action weights are added directly.
4. **Multi-file patterns:** +3 or +4 to relevant actions.

## 4. Scope Selection
- Single topic → that topic
- Single directory → directory name
- 2–3 topics → combined scope (sorted)
- Many topics → most common or `core`
- Commit history can override scope when consistent across recent commits

## 5. Template Selection & Scoring
**Location:** `internal/templater/templater.go`

1. **Template group resolution:** action → template group (A/M/D/R/DOC/SECURITY/MISC).
2. **Topic match:** exact → fuzzy → `_default`.
3. **Template scoring:**
   - Base score 1.0
   - +2.0 for matching detected patterns
   - +1.5 for using detected symbols
   - +1.0 for meaningful purpose placeholders
   - +0.5–1.5 for file-type relevance
   - +1.0 for major change templates
   - -0.5 for generic templates when specifics exist
4. **History de-dup:** recent messages are avoided when possible.

The highest-scoring template is selected, and placeholders (`{topic}`, `{item}`, `{purpose}`, `{source}`, `{target}`) are replaced.
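
Placeholder replacement can be pictured with the short sketch below. `fillTemplate` is a hypothetical helper, not the function in `internal/templater/templater.go`; it assumes placeholders are plain `{name}` tokens and leaves any placeholder without a value untouched.

```go
package main

import (
	"fmt"
	"strings"
)

// fillTemplate substitutes the placeholders named in the text above.
// Placeholders with no entry in values are left as-is.
// Hypothetical helper for illustration only.
func fillTemplate(tmpl string, values map[string]string) string {
	pairs := make([]string, 0, len(values)*2)
	for _, key := range []string{"topic", "item", "purpose", "source", "target"} {
		if v, ok := values[key]; ok {
			pairs = append(pairs, "{"+key+"}", v)
		}
	}
	return strings.NewReplacer(pairs...).Replace(tmpl)
}

func main() {
	msg := fillTemplate("refactor({topic}): simplify {item} for {purpose}", map[string]string{
		"topic":   "parser",
		"item":    "git",
		"purpose": "streaming",
	})
	fmt.Println(msg)
}
```

`strings.NewReplacer` performs all substitutions in a single pass, so a replacement value that happens to contain `{item}` is not expanded a second time.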

## 6. Alternative Suggestions (Diversity Algorithm)
When regenerating suggestions:
- Used messages are filtered out.
- Similarity is computed using:
  - **Word-level Jaccard similarity (60%)**
  - **Character position matching (40%)**
- A diversity bonus favors less similar suggestions.
- A small random factor introduces controlled variation.
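
A minimal sketch of the 60/40 similarity blend follows. The source does not define "character position matching", so the sketch assumes it means the fraction of positions at which both messages hold the same character, divided by the longer length. All function names are hypothetical, and the diversity bonus and random factor are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// wordSet returns the set of lower-cased, whitespace-separated words in s.
func wordSet(s string) map[string]bool {
	set := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(s)) {
		set[w] = true
	}
	return set
}

// jaccard is |A ∩ B| / |A ∪ B| over the two word sets.
func jaccard(a, b string) float64 {
	sa, sb := wordSet(a), wordSet(b)
	if len(sa) == 0 && len(sb) == 0 {
		return 1
	}
	shared := 0
	for w := range sa {
		if sb[w] {
			shared++
		}
	}
	return float64(shared) / float64(len(sa)+len(sb)-shared)
}

// positional is the share of character positions that match, relative to
// the longer message (an assumed definition, see the lead-in).
func positional(a, b string) float64 {
	short, long := []rune(a), []rune(b)
	if len(short) > len(long) {
		short, long = long, short
	}
	if len(long) == 0 {
		return 1
	}
	same := 0
	for i := range short {
		if short[i] == long[i] {
			same++
		}
	}
	return float64(same) / float64(len(long))
}

// similarity blends both measures with the 60/40 split from the text.
func similarity(a, b string) float64 {
	return 0.6*jaccard(a, b) + 0.4*positional(a, b)
}

func main() {
	fmt.Printf("%.3f\n", similarity("fix(auth): handle nil token", "fix(auth): handle empty token"))
}
```

The word-level term rewards shared vocabulary regardless of order, while the positional term penalizes messages that merely reshuffle the same words.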

## 7. Configuration Influence
**Location:** `internal/config/config.go` + `docs/CONFIGURATION.md`

Configuration can adjust:
- Topic mappings
- Keyword mappings and weights
- Diff-stat thresholds
- Project-specific defaults

This allows the algorithm’s weighting to be tuned without code changes.
12 changes: 9 additions & 3 deletions assets/prompts/system_prompt.txt
@@ -1,10 +1,13 @@
- You are an expert developer assistant. Analyze the provided structured git diff metadata and generate a single-line commit message following the Conventional Commits specification.
+ You are an expert developer assistant. Analyze the provided structured git diff metadata and generate a commit message following the Conventional Commits specification.

Guidelines:
1. Format MUST be: <type>(<scope>): <short description in present tense>
2. Allowed types: feat, fix, refactor, chore, test, docs, style, perf, ci, build, security
- 3. Do NOT include any markdown, backticks, quotes, or introductory text like "Here is your commit message:".
- 4. Output ONLY the raw string of the commit message.
+ 3. If the changes are complex, you MAY include a body separated by a blank line after the subject.
+ 4. Keep the subject line short (aim for ~50 characters).
+ 5. Wrap body lines at ~72 characters.
+ 6. Do NOT include any markdown, backticks, quotes, or introductory text like "Here is your commit message:".
+ 7. Output ONLY the raw string of the commit message.

Metadata Context:
- Project Type: {{.ProjectType}}
@@ -15,4 +18,7 @@ Metadata Context:
- Dependency Changes: {{.DependencyAlert}}
- Added/Deleted Line Ratio: {{printf "%.2f" .DiffSummary.Ratio}}

Summarized Git Diff:
{{.DiffContent}}

Output:
10 changes: 5 additions & 5 deletions cmd/propose.go
@@ -92,7 +92,7 @@ func runPropose(cmd *cobra.Command, args []string) error {
return err
}

- f := formatter.NewFormatter()
+ f := formatter.NewFormatter(cfg.MaxSubjectLength, cfg.MaxBodyLength)

// Calculate Heuristic Suggestion (Always available)
heuristicMsg, err := templater.GetMessage(commitMessage)
@@ -112,7 +112,7 @@ func runPropose(cmd *cobra.Command, args []string) error {
client := ai.NewOllamaClient(cfg.Ollama)
aiResponse, err := client.Generate(prompt)
if err == nil && ai.IsValidCommitMessage(aiResponse) {
- aiMsg = strings.TrimSpace(aiResponse)
+ aiMsg = f.FormatMessage(strings.TrimSpace(aiResponse), commitMessage.IsMajor)
usingAI = true
finalMessage = aiMsg
}
@@ -220,7 +220,7 @@ func runPropose(cmd *cobra.Command, args []string) error {
editedMessage = strings.TrimSpace(editedMessage)

if editedMessage != "" {
- finalMessage = editedMessage
+ finalMessage = f.FormatMessage(editedMessage, commitMessage.IsMajor)
usedSuggestions[finalMessage] = true
color.Green("\n✓ Updated commit message:")
} else {
@@ -240,7 +240,7 @@ func runPropose(cmd *cobra.Command, args []string) error {
client := ai.NewOllamaClient(cfg.Ollama)
aiResponse, err := client.Generate(prompt)
if err == nil && ai.IsValidCommitMessage(aiResponse) {
- finalMessage = strings.TrimSpace(aiResponse)
+ finalMessage = f.FormatMessage(strings.TrimSpace(aiResponse), commitMessage.IsMajor)
regenerationCount++
}
}
@@ -264,7 +264,7 @@ func runPropose(cmd *cobra.Command, args []string) error {
client := ai.NewOllamaClient(cfg.Ollama)
aiResponse, err := client.Generate(prompt)
if err == nil && ai.IsValidCommitMessage(aiResponse) {
- aiMsg = strings.TrimSpace(aiResponse)
+ aiMsg = f.FormatMessage(strings.TrimSpace(aiResponse), commitMessage.IsMajor)
finalMessage = aiMsg
usingAI = true
} else {
47 changes: 47 additions & 0 deletions docs/CONFIGURATION.md
@@ -104,6 +104,53 @@ Controls the threshold for the diff stat analysis algorithm. This ratio determin
}
```

### Normalized Scoring

**`normalizeScoring`** (boolean, default: true)

Enables normalized confidence weights for action selection. This algorithm reduces noise when multiple weak signals compete by calculating a weighted average instead of a raw additive score.

**`signalWeights`** (object)

Defines the confidence weights for different signal sources. Only used when `normalizeScoring` is `true`.

**Default weights:**
- `branch`: 0.35 (strongest signal)
- `diffStat`: 0.25
- `keywords`: 0.25
- `patterns`: 0.15 (multi-file patterns)

**Example:**
```json
{
  "normalizeScoring": true,
  "signalWeights": {
    "branch": 0.5,
    "diffStat": 0.2,
    "keywords": 0.2,
    "patterns": 0.1
  }
}
```
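
For orientation, the fragment above could be read into Go roughly as shown below. The `scoringConfig` struct and both helpers are hypothetical and do not reproduce `internal/config/config.go`; only the JSON keys and the `true` default come from this section. The weight-sum check is an illustrative sanity test, not a documented requirement.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// scoringConfig is a hypothetical struct; only the JSON keys are taken
// from the documentation above.
type scoringConfig struct {
	NormalizeScoring bool               `json:"normalizeScoring"`
	SignalWeights    map[string]float64 `json:"signalWeights"`
}

// parseScoringConfig decodes the fragment, starting from the documented
// default of normalizeScoring = true.
func parseScoringConfig(data []byte) (scoringConfig, error) {
	cfg := scoringConfig{NormalizeScoring: true}
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

// weightSum adds the four documented weights in a fixed order.
func weightSum(cfg scoringConfig) float64 {
	sum := 0.0
	for _, key := range []string{"branch", "diffStat", "keywords", "patterns"} {
		sum += cfg.SignalWeights[key]
	}
	return sum
}

func main() {
	raw := []byte(`{"signalWeights": {"branch": 0.5, "diffStat": 0.2, "keywords": 0.2, "patterns": 0.1}}`)
	cfg, err := parseScoringConfig(raw)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Printf("normalize=%v sum=%.2f\n", cfg.NormalizeScoring, weightSum(cfg))
}
```

Both the default weights and the example weights add up to 1.0, which keeps the final score on the same 0–1 scale as the individual signals.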

### Message Length Constraints

**`maxSubjectLength`** (int, default: 50)

Specifies the maximum character length for the first line (subject) of the commit message. If the generated or edited subject exceeds this limit, it will be automatically wrapped to the next line.

**`maxBodyLength`** (int, default: 72)

Specifies the maximum character length for each line in the body of the commit message. If the body text exceeds this limit, it will be wrapped at word boundaries.

**Example:**
```json
{
  "maxSubjectLength": 50,
  "maxBodyLength": 72
}
```
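
Word-boundary wrapping of body text can be sketched as below. `wrapWords` is a hypothetical helper, not the formatter's actual implementation; it assumes length is measured in characters (runes) and that a single word longer than the limit is kept whole rather than split.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// wrapWords breaks text into lines of at most width characters, splitting
// only at whitespace. A single word longer than width stays on its own
// line. Hypothetical helper for illustration only.
func wrapWords(text string, width int) []string {
	var lines []string
	cur := ""
	for _, word := range strings.Fields(text) {
		switch {
		case cur == "":
			cur = word
		case utf8.RuneCountInString(cur)+1+utf8.RuneCountInString(word) > width:
			lines = append(lines, cur)
			cur = word
		default:
			cur += " " + word
		}
	}
	if cur != "" {
		lines = append(lines, cur)
	}
	return lines
}

func main() {
	body := "Route AI responses and edited messages through the formatter so subject and body limits apply to every suggestion source."
	for _, line := range wrapWords(body, 72) {
		fmt.Println(line)
	}
}
```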

### Topic Mappings

**`topicMappings`** (object)
1 change: 1 addition & 0 deletions internal/ai/ai_test.go
@@ -44,6 +44,7 @@ func TestIsValidCommitMessage(t *testing.T) {
expected bool
}{
{"feat(auth): add login functionality", true},
{"feat(auth): add login\n\nThis is a body.", true},
{"fix: resolve memory leak", true},
{"chore(deps): update dependencies", true},
{"Invalid message", false},
2 changes: 2 additions & 0 deletions internal/ai/prompt.go
@@ -19,6 +19,7 @@ type PromptContext struct {
CodeSymbols []string
DependencyAlert string
DiffSummary DiffSummary
DiffContent string
}

// DiffSummary contains ratio of changes
@@ -70,6 +71,7 @@ func RenderPrompt(msg *analyzer.CommitMessage, projectType, branchName string) (
DiffSummary: DiffSummary{
Ratio: ratio,
},
DiffContent: msg.FullDiff,
}

var buf bytes.Buffer