feat: implement Git-based real-time incremental sync #12
Commits: e455428, 15f65f4, fad111a, 9062c84
@@ -0,0 +1,29 @@

```markdown
# GrepTurbo Context

GrepTurbo provides index-accelerated regex search that stays in sync with real-time code changes using Git.

## Language

**Baseline**:
The immutable on-disk index tied to a specific Git commit.
_Avoid_: Old index, Disk index, Saved index

**Overlay**:
The transient in-memory index of uncommitted (modified, added, or untracked) files detected during search.
_Avoid_: Live index, Delta index, Fresh index

**Commit Drift**:
The state where the current Git HEAD differs from the commit hash stored in the **Baseline**.

**Tombstone**:
The set of file paths whose **Baseline** entries must be hidden: files deleted since the **Baseline** was built, plus changed files whose current version is indexed in the **Overlay**.

## Relationships

- A **Search** merges results from the **Baseline** and the **Overlay**, while filtering out paths in the **Tombstone** set.
- **Commit Drift** triggers an automatic rebuild of the **Baseline**.

## Example dialogue

> **Dev:** "If I just edited a file but haven't committed it, will the search find it?"
> **Domain expert:** "Yes. Once the file is saved to disk, the **Overlay** will index your uncommitted changes and merge them with the **Baseline** results."
```
@@ -0,0 +1 @@

```
AGENTS.md
```
@@ -0,0 +1,18 @@

```markdown
# 0001-git-sync-strategy

To ensure search results reflect real-time code changes without the overhead of a background daemon or full index rebuilds, we are implementing a Git-based on-demand synchronization strategy.

### Context
Users expect the search index to be "always fresh," but rebuilding a trigram index on every search is too slow. Approaches like background file watchers introduce complexity (daemons, locking), and content hashing requires reading every file (which defeats the purpose of the index).

### Decision
We will use `git status` to identify modified, deleted, and untracked files during every search.
- **Baseline**: The on-disk index is tied to a specific Git commit hash stored in `metadata.json`.
- **Overlay**: Dirty files (modified/untracked) are re-indexed in memory on the fly, and their results are merged with the Baseline.
- **Tombstones**: Deleted files, and the stale Baseline versions of modified files, are filtered out of the Baseline results.
- **Commit Drift**: If the current Git HEAD differs from the Baseline hash, an automatic full rebuild is triggered.

### Consequences
- **Trade-off**: Freshness depends on Git; outside a Git repository (or without the `git` binary), search falls back to the Baseline alone.
- **Accuracy**: Search results reflect the current working tree, not just the last indexed commit.
- **Performance**: High-volume changes (thousands of uncommitted files) will increase search latency because they must be indexed in memory, but we choose to prioritize accuracy over a "too many changes" threshold for now.
```
@@ -0,0 +1,61 @@

```go
package index

import (
	"encoding/json"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// Metadata contains information about the built Baseline index.
type Metadata struct {
	Commit  string   `json:"commit"`
	RootDir string   `json:"root_dir"`
	Skip    []string `json:"skip"`
}

// WriteMetadata saves index metadata to metadata.json in dir.
func WriteMetadata(dir, rootDir string, skip []string) error {
	commit, err := CurrentCommit(rootDir)
	if err != nil {
		commit = "unknown"
	}

	m := Metadata{
		Commit:  commit,
		RootDir: rootDir,
		Skip:    skip,
	}
	data, err := json.MarshalIndent(m, "", "  ")
	if err != nil {
		return err
	}

	return os.WriteFile(filepath.Join(dir, "metadata.json"), data, 0644)
}

// ReadMetadata loads index metadata from metadata.json in dir.
func ReadMetadata(dir string) (*Metadata, error) {
	data, err := os.ReadFile(filepath.Join(dir, "metadata.json"))
	if err != nil {
		return nil, err
	}

	var m Metadata
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	return &m, nil
}

// CurrentCommit returns the current Git HEAD commit hash in the given dir.
func CurrentCommit(dir string) (string, error) {
	cmd := exec.Command("git", "rev-parse", "HEAD")
	cmd.Dir = dir
	out, err := cmd.Output()
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(out)), nil
}
```
Comment on lines +53 to +61
🧩 Analysis chain (scripts executed against `yanurag-dev/GrepTurbo`):

```shell
cd /tmp && find . -name "metadata.go" -type f 2>/dev/null | head -20

# First, list the repository structure
git ls-files | head -30

# Look for the metadata.go file specifically
git ls-files | grep metadata.go

# Check if internal/index/metadata.go exists and read it
if [ -f "internal/index/metadata.go" ]; then
  wc -l internal/index/metadata.go
fi

# Read the metadata.go file
cat -n internal/index/metadata.go

# Now search for calls to CurrentCommit to verify if it's called from r.Sync()
rg "CurrentCommit" --type go -B 2 -A 2

# Check sync.go to see how CurrentCommit is used
rg "r\.Sync\|Sync\(" --type go -B 3 -A 5 | head -100

# Look for how the search path calls Sync
rg "\.Sync\(" --type go -B 2 -A 2

# Read sync.go to see the Sync method
cat -n internal/index/sync.go | head -50

# Check if there are any Go linting tools configured that would flag noctx
if [ -f ".golangci.yml" ] || [ -f "golangci.yml" ]; then
  echo "=== .golangci.yml ==="; cat -n .golangci.yml
fi
# Also check go.mod for any linters
cat -n go.mod

# Search for context usage patterns in the codebase to see if context passing is already used
rg "context\." --type go | head -20

# Check the search.go more carefully to see the full request path
cat -n internal/query/search.go
```
Use `exec.CommandContext` with a timeout: `CurrentCommit` runs on the search path via `r.Sync()`, and a plain `exec.Command` call has no deadline, so a hung `git` process would block the search.

🔒️ Proposed fix

```diff
+import (
+	"context"
+	"time"
+	...
+)
+
 // CurrentCommit returns the current Git HEAD commit hash in the given dir.
 func CurrentCommit(dir string) (string, error) {
-	cmd := exec.Command("git", "rev-parse", "HEAD")
+	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+	defer cancel()
+	cmd := exec.CommandContext(ctx, "git", "rev-parse", "HEAD")
 	cmd.Dir = dir
 	out, err := cmd.Output()
 	if err != nil {
 		return "", err
 	}
 	return strings.TrimSpace(string(out)), nil
 }
```

🧰 Tools
🪛 golangci-lint (2.11.4)
[error] 54-54: os/exec.Command must not be called. use os/exec.CommandContext (noctx)
@@ -0,0 +1,121 @@

```go
package index

import (
	"os"
	"os/exec"
	"path/filepath"
	"strings"
	"unicode/utf8"

	"grepturbo/internal/posting"
	"grepturbo/internal/trigram"
)

// GitStatus holds the lists of files that have changed since the baseline.
type GitStatus struct {
	Modified  []string
	Untracked []string
	Deleted   []string
}

// Overlay holds the transient in-memory index of dirty files.
type Overlay struct {
	Posts      posting.List
	Files      []string        // fileID → filepath (starts from len(Baseline.Files))
	Tombstones map[string]bool // paths that should be ignored from Baseline
}

// Sync performs a Git-based synchronization against the Baseline. It returns
// (overlay, false, nil) when the Baseline is usable, (nil, true, nil) on
// Commit Drift (the caller should rebuild), and an error if git status fails.
// Outside a Git repository it returns an empty Overlay.
func (r *Reader) Sync() (*Overlay, bool, error) {
	current, err := CurrentCommit(r.Meta.RootDir)
	if err != nil {
		// Not in a git repo (or git not installed): nothing to sync.
		return &Overlay{
			Posts:      make(posting.List),
			Tombstones: make(map[string]bool),
		}, false, nil
	}

	// Commit Drift detected
	if r.Meta.Commit != current && r.Meta.Commit != "unknown" {
		return nil, true, nil
	}

	status, err := GetGitStatus(r.Meta.RootDir)
	if err != nil {
		return nil, false, err
	}

	overlay := &Overlay{
		Posts:      make(posting.List),
		Tombstones: make(map[string]bool),
	}

	// Porcelain paths are relative to the repository root, so joining them with
	// RootDir assumes RootDir is the top-level directory of the work tree.

	// Deleted files are Tombstones
	for _, p := range status.Deleted {
		overlay.Tombstones[filepath.Join(r.Meta.RootDir, p)] = true
	}

	// Dirty files are both Tombstones (hide the stale Baseline version) and indexed
	// (show the new version). Untracked files are tombstoned too, in case the
	// Baseline indexed them from the working tree; otherwise this is a no-op.
	// A fresh slice avoids appending into status.Modified's backing array.
	dirtyFiles := make([]string, 0, len(status.Modified)+len(status.Untracked))
	dirtyFiles = append(dirtyFiles, status.Modified...)
	dirtyFiles = append(dirtyFiles, status.Untracked...)
	for _, p := range dirtyFiles {
		overlay.Tombstones[filepath.Join(r.Meta.RootDir, p)] = true
	}

	// Index dirty files in memory
	for _, relPath := range dirtyFiles {
		absPath := filepath.Join(r.Meta.RootDir, relPath)
		data, err := os.ReadFile(absPath)
		if err != nil {
			continue // skip files we can't read (e.g. removed after being staged)
		}
		if !utf8.Valid(data) || len(data) > maxFileSize {
			continue
		}

		fileID := uint32(len(r.Files) + len(overlay.Files))
		overlay.Files = append(overlay.Files, absPath)

		for _, t := range trigram.Extract(string(data)) {
			overlay.Posts.AddBatch(t, []uint32{fileID})
		}
	}
	overlay.Posts.Finalize()

	return overlay, false, nil
}

// GetGitStatus runs 'git status --porcelain -z --untracked-files=all' in dir and
// returns the categorized files.
func GetGitStatus(dir string) (*GitStatus, error) {
	// -z: NUL-terminated, unquoted paths (safe for spaces and special characters);
	// rename/copy entries become "XY new\0old\0".
	// --untracked-files=all: list each untracked file instead of collapsing a new
	// directory into a single "dir/" entry that cannot be read as a file.
	cmd := exec.Command("git", "status", "--porcelain", "-z", "--untracked-files=all")
	cmd.Dir = dir
	out, err := cmd.Output()
	if err != nil {
		return nil, err
	}

	status := &GitStatus{}
	fields := strings.Split(string(out), "\x00")
	for i := 0; i < len(fields); i++ {
		entry := fields[i]
		if len(entry) < 4 { // "XY " plus a path; also skips the trailing empty field
			continue
		}

		// Each entry is "XY PATH"
		xy := entry[:2]
		path := entry[3:]

		switch {
		case xy == "??":
			status.Untracked = append(status.Untracked, path)
		case strings.ContainsAny(xy, "RC"):
			// Renames and copies carry their source path in the next field.
			// The destination holds new content; a renamed source no longer exists.
			status.Modified = append(status.Modified, path)
			if i+1 < len(fields) {
				i++
				if strings.ContainsRune(xy, 'R') {
					status.Deleted = append(status.Deleted, fields[i])
				}
			}
		case strings.ContainsRune(xy, 'D'):
			status.Deleted = append(status.Deleted, path)
		default:
			// M, A, T (type change), U (unmerged): content differs from HEAD.
			status.Modified = append(status.Modified, path)
		}
	}

	return status, nil
}
```
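The `Overlay.Files` comment implies a single ID space: Baseline files keep IDs `0..len(Baseline.Files)-1`, and Overlay files continue from there, which is why `Sync` assigns `fileID := uint32(len(r.Files) + len(overlay.Files))`. A minimal sketch of resolving a posting-list ID back to a path under that assumption; `pathForID` is a hypothetical helper, and it reports which side an ID came from so the caller knows whether Tombstones apply.

```go
package main

import "fmt"

// pathForID (hypothetical) maps a combined file ID to a path. IDs below
// len(base) are Baseline files; the rest index into overlay, offset by len(base).
func pathForID(id uint32, base, overlay []string) (path string, fromBaseline, ok bool) {
	n := uint32(len(base))
	switch {
	case id < n:
		return base[id], true, true
	case id-n < uint32(len(overlay)): // safe: id >= n here, so no underflow
		return overlay[id-n], false, true
	default:
		return "", false, false
	}
}

func main() {
	base := []string{"/repo/a.go", "/repo/b.go"}
	overlay := []string{"/repo/b.go"} // b.go modified since the Baseline was built
	tombstones := map[string]bool{"/repo/b.go": true}

	for id := uint32(0); id < 4; id++ {
		p, fromBase, ok := pathForID(id, base, overlay)
		switch {
		case !ok:
			fmt.Println(id, "out of range")
		case fromBase && tombstones[p]:
			fmt.Println(id, p, "(stale Baseline entry, hidden)")
		default:
			fmt.Println(id, p)
		}
	}
}
```

ID 1 and ID 2 both name `/repo/b.go`, but only the Overlay copy (ID 2) survives, because Tombstones are checked for Baseline IDs only.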
Heading skips a level: use `##` for top-level ADR sections. `### Context` (h3) directly follows the document's h1 title, skipping h2. The same applies to `### Decision` (line 8) and `### Consequences` (line 15).

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 5-5: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3
(MD001, heading-increment)