Merged
29 changes: 29 additions & 0 deletions CONTEXT.md
@@ -0,0 +1,29 @@
# GrepTurbo Context

GrepTurbo provides index-accelerated regex search that stays in sync with real-time code changes using Git.

## Language

**Baseline**:
The immutable on-disk index tied to a specific Git commit.
_Avoid_: Old index, Disk index, Saved index

**Overlay**:
The transient in-memory index of uncommitted (modified, added, or untracked) files detected during search.
_Avoid_: Live index, Delta index, Fresh index

**Commit Drift**:
The state where the current Git HEAD differs from the commit hash stored in the **Baseline**.

**Tombstone**:
The set of file paths that have been deleted or modified since the **Baseline** was built, used to filter stale **Baseline** results.

## Relationships

- A **Search** merges results from the **Baseline** and the **Overlay**, while filtering out paths in the **Tombstone** set.
- **Commit Drift** triggers an automatic rebuild of the **Baseline**.
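
The merge rule above can be illustrated with a minimal Go sketch. The function name `mergeResults` and the use of plain path slices are assumptions made for illustration; the real search in this PR works on posting lists and file IDs.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeResults is a hypothetical helper: it combines Baseline and Overlay
// hits and drops any Baseline path that appears in the tombstone set.
func mergeResults(baseline, overlay []string, tombstones map[string]bool) []string {
	seen := make(map[string]bool)
	var out []string
	for _, p := range baseline {
		if tombstones[p] || seen[p] {
			continue // deleted or superseded since the Baseline was built
		}
		seen[p] = true
		out = append(out, p)
	}
	for _, p := range overlay {
		if !seen[p] {
			seen[p] = true
			out = append(out, p)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	baseline := []string{"a.go", "b.go", "c.go"}
	overlay := []string{"b.go", "d.go"} // b.go was edited, d.go is untracked
	tombstones := map[string]bool{"b.go": true, "c.go": true}
	fmt.Println(mergeResults(baseline, overlay, tombstones)) // [a.go b.go d.go]
}
```

Here `b.go` survives only through its Overlay entry, and `c.go` disappears because nothing replaces its tombstoned Baseline entry.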

## Example dialogue

> **Dev:** "If I just edited a file but haven't committed it, will the search find it?"
> **Domain expert:** "Yes, as long as the edit is saved to disk: the **Overlay** indexes your uncommitted changes and merges them with the **Baseline** results."
1 change: 1 addition & 0 deletions GEMINI.md
20 changes: 19 additions & 1 deletion cmd/grepturbo/main.go
@@ -86,7 +86,25 @@ func runSearch(idxDir, pattern string) error {

 	matches, err := query.Search(r, pattern)
 	if err != nil {
-		return fmt.Errorf("error searching: %w", err)
+		if drift, ok := err.(*query.ErrCommitDrift); ok {
+			fmt.Fprintf(os.Stderr, "Notice: %s. Rebuilding...\n", drift.Error())
+			r.Close() // close before rebuild
+			if err := runBuild(r.Meta.RootDir, idxDir, r.Meta.Skip); err != nil {
+				return err
+			}
+			// Re-open and try search again
+			r2, err := index.NewReader(idxDir)
+			if err != nil {
+				return err
+			}
+			defer r2.Close()
+			matches, err = query.Search(r2, pattern)
+			if err != nil {
+				return err
+			}
+		} else {
+			return fmt.Errorf("error searching: %w", err)
+		}
 	}

 	if len(matches) == 0 {
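
The type assertion in this hunk matches only when `query.Search` returns the drift error unwrapped. The sketch below shows how `errors.As` would still detect it through a `%w` wrapper. The `ErrCommitDrift` type here is a hypothetical stand-in, since the real definition in the `query` package is not shown in this diff.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrCommitDrift is a hypothetical stand-in for query.ErrCommitDrift.
type ErrCommitDrift struct {
	Baseline, Head string
}

func (e *ErrCommitDrift) Error() string {
	return fmt.Sprintf("commit drift: baseline %s, HEAD %s", e.Baseline, e.Head)
}

// search simulates a search that fails with a wrapped drift error.
func search() error {
	return fmt.Errorf("search: %w", &ErrCommitDrift{Baseline: "abc123", Head: "def456"})
}

// isDrift reports whether err, or any error it wraps, is a commit drift.
func isDrift(err error) (*ErrCommitDrift, bool) {
	var drift *ErrCommitDrift
	ok := errors.As(err, &drift)
	return drift, ok
}

func main() {
	err := search()
	_, direct := err.(*ErrCommitDrift) // false: the error is wrapped
	drift, viaAs := isDrift(err)
	fmt.Println(direct, viaAs, drift.Head) // false true def456
}
```

If `Search` always returns the bare pointer, both checks behave the same; `errors.As` only matters once a caller adds context to the error.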
18 changes: 18 additions & 0 deletions docs/adr/0001-git-sync-strategy.md
@@ -0,0 +1,18 @@
# 0001-git-sync-strategy

To ensure search results reflect real-time code changes without the overhead of a background daemon or full index rebuilds, we are implementing a Git-based on-demand synchronization strategy.

### Context

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Heading skips a level — use ## for top-level ADR sections.

### Context (h3) directly follows the document's h1 title, skipping h2. The same applies to ### Decision (line 8) and ### Consequences (line 15).

✏️ Proposed fix
-### Context
+## Context
 ...
-### Decision
+## Decision
 ...
-### Consequences
+## Consequences
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 5-5: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/adr/0001-git-sync-strategy.md` at line 5, The ADR uses h3 headings that
skip from the h1 title; change the section headings "Context", "Decision", and
"Consequences" from "### ..." to "## ..." so they are proper top-level ADR
sections; locate the heading lines matching "### Context", "### Decision", and
"### Consequences" and replace each "###" with "##" to restore correct heading
hierarchy.

Users expect the search index to be "always fresh," but rebuilding a trigram index on every search is too slow. Approaches like background file watchers introduce complexity (daemons, locking), and content-hashing requires reading every file (defeating the index's performance).

### Decision
We will use `git status` to identify modified, deleted, and untracked files during every search.
- **Baseline**: The on-disk index is tied to a specific Git commit hash stored in `metadata.json`.
- **Overlay**: Dirty files (modified/untracked) are re-indexed in-memory on the fly and their results are merged with the Baseline.
- **Tombstones**: Deleted files are filtered out of the Baseline results.
- **Commit Drift**: If the current Git HEAD differs from the Baseline hash, an automatic full rebuild is triggered.

### Consequences
- **Trade-off**: Search depends on Git being present in the repository.
- **Accuracy**: Search results are 100% accurate relative to the current working tree.
- **Performance**: High-volume changes (thousands of uncommitted files) will increase search latency as they must be indexed in-memory, but we choose to prioritize accuracy over a "too many changes" threshold for now.
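
The Decision bullets reduce to a three-way choice made on every search. Here is a minimal sketch of that choice, mirroring the checks visible in `Sync` in this PR. The names `action` and `decide` are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// action names the path a search takes. Hypothetical type for illustration.
type action string

const (
	baselineOnly action = "baseline-only" // Git unavailable: nothing to sync
	rebuild      action = "rebuild"       // Commit Drift: rebuild the Baseline
	withOverlay  action = "overlay"       // same commit: index dirty files in memory
)

// decide mirrors the order of checks in Sync: Git availability first,
// then Commit Drift, then the Overlay path.
func decide(baselineCommit, headCommit string, gitOK bool) action {
	switch {
	case !gitOK:
		return baselineOnly
	case baselineCommit != "unknown" && baselineCommit != headCommit:
		return rebuild
	default:
		return withOverlay
	}
}

func main() {
	fmt.Println(decide("abc123", "abc123", true))  // overlay
	fmt.Println(decide("abc123", "def456", true))  // rebuild
	fmt.Println(decide("unknown", "def456", true)) // overlay
	fmt.Println(decide("abc123", "", false))       // baseline-only
}
```

Note the third case: a Baseline whose commit was recorded as `"unknown"` never counts as drifted, so it is searched with an Overlay rather than rebuilt.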
15 changes: 12 additions & 3 deletions internal/index/builder.go
@@ -17,8 +17,10 @@ const maxFileSize = 1 << 20 // 1 MB — skip files larger than this
 // Builder walks a directory, extracts trigrams from each file,
 // and accumulates an in-memory posting list.
 type Builder struct {
-	Posts posting.List // trigram → sorted []fileID
-	Files []string     // fileID → filepath (index == fileID)
+	Posts   posting.List // trigram → sorted []fileID
+	Files   []string     // fileID → filepath (index == fileID)
+	RootDir string
+	Skip    []string
 }

 func NewBuilder() *Builder {
@@ -76,6 +78,13 @@ type extractResult struct {
 // Directories listed in skip are skipped entirely (e.g. "node_modules").
 // Directories and files that fail to read are silently skipped.
 func (b *Builder) Build(rootDir string, skip ...string) error {
+	absRoot, err := filepath.Abs(rootDir)
+	if err != nil {
+		return err
+	}
+	b.RootDir = absRoot
+	b.Skip = skip
Comment thread
coderabbitai[bot] marked this conversation as resolved.
+
 	skipSet := make(map[string]bool)
 	for k, v := range defaultSkipDirs {
 		skipSet[k] = v
@@ -126,7 +135,7 @@ func (b *Builder) Build(rootDir string, skip ...string) error {
 		close(done)
 	}()

-	err := filepath.WalkDir(rootDir, func(path string, d fs.DirEntry, err error) error {
+	err = filepath.WalkDir(absRoot, func(path string, d fs.DirEntry, err error) error {
 		if err != nil {
 			return nil
 		}
61 changes: 61 additions & 0 deletions internal/index/metadata.go
@@ -0,0 +1,61 @@
package index

import (
	"encoding/json"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// Metadata contains information about the built Baseline index.
type Metadata struct {
	Commit  string   `json:"commit"`
	RootDir string   `json:"root_dir"`
	Skip    []string `json:"skip"`
}

// WriteMetadata saves index metadata to metadata.json in dir.
func WriteMetadata(dir, rootDir string, skip []string) error {
	commit, err := CurrentCommit(rootDir)
	if err != nil {
		commit = "unknown"
	}

	m := Metadata{
		Commit:  commit,
		RootDir: rootDir,
		Skip:    skip,
	}
	data, err := json.MarshalIndent(m, "", " ")
	if err != nil {
		return err
	}

	return os.WriteFile(filepath.Join(dir, "metadata.json"), data, 0644)
}

// ReadMetadata loads index metadata from metadata.json in dir.
func ReadMetadata(dir string) (*Metadata, error) {
	data, err := os.ReadFile(filepath.Join(dir, "metadata.json"))
	if err != nil {
		return nil, err
	}

	var m Metadata
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	return &m, nil
}

// CurrentCommit returns the current Git HEAD commit hash in the given dir.
func CurrentCommit(dir string) (string, error) {
	cmd := exec.Command("git", "rev-parse", "HEAD")
	cmd.Dir = dir
	out, err := cmd.Output()
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(out)), nil
}
Comment on lines +53 to +61

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd /tmp && find . -name "metadata.go" -type f 2>/dev/null | head -20

Repository: yanurag-dev/GrepTurbo

Length of output: 47


🏁 Script executed:

# First, list the repository structure
git ls-files | head -30

Repository: yanurag-dev/GrepTurbo

Length of output: 661


🏁 Script executed:

# Look for the metadata.go file specifically
git ls-files | grep metadata.go

Repository: yanurag-dev/GrepTurbo

Length of output: 92


🏁 Script executed:

# Check if internal/index/metadata.go exists and read it
if [ -f "internal/index/metadata.go" ]; then
  wc -l internal/index/metadata.go
fi

Repository: yanurag-dev/GrepTurbo

Length of output: 95


🏁 Script executed:

# Read the metadata.go file
cat -n internal/index/metadata.go

Repository: yanurag-dev/GrepTurbo

Length of output: 1830


🏁 Script executed:

# Now search for calls to CurrentCommit to verify if it's called from r.Sync()
rg "CurrentCommit" --type go -B 2 -A 2

Repository: yanurag-dev/GrepTurbo

Length of output: 1374


🏁 Script executed:

# Check sync.go to see how CurrentCommit is used
rg "r\.Sync\|Sync\(" --type go -B 3 -A 5 | head -100

Repository: yanurag-dev/GrepTurbo

Length of output: 47


🏁 Script executed:

# Look for how the search path calls Sync
rg "\.Sync\(" --type go -B 2 -A 2

Repository: yanurag-dev/GrepTurbo

Length of output: 1482


🏁 Script executed:

# Read sync.go to see the Sync method
cat -n internal/index/sync.go | head -50

Repository: yanurag-dev/GrepTurbo

Length of output: 1571


🏁 Script executed:

# Check if there are any Go linting tools configured that would flag noctx
if [ -f ".golangci.yml" ] || [ -f "golangci.yml" ]; then
  echo "=== .golangci.yml ==="; cat -n .golangci.yml
fi

# Also check go.mod for any linters
cat -n go.mod

Repository: yanurag-dev/GrepTurbo

Length of output: 368


🏁 Script executed:

# Search for context usage patterns in the codebase to see if context passing is already used
rg "context\." --type go | head -20

Repository: yanurag-dev/GrepTurbo

Length of output: 47


🏁 Script executed:

# Check the search.go more carefully to see the full request path
cat -n internal/query/search.go

Repository: yanurag-dev/GrepTurbo

Length of output: 5054


CurrentCommit can hang indefinitely — use exec.CommandContext with a timeout.

exec.Command without a context can block forever if the git process hangs (slow NFS mount, blocking hook, etc.). This blocks every search call via the request path: Search() → r.Sync() → CurrentCommit() (and again at line 39 when reporting commit drift).

Use exec.CommandContext with a bounded timeout:

🔒️ Proposed fix
+import (
+    "context"
+    "time"
+    ...
+)
+
 // CurrentCommit returns the current Git HEAD commit hash in the given dir.
 func CurrentCommit(dir string) (string, error) {
-    cmd := exec.Command("git", "rev-parse", "HEAD")
+    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+    defer cancel()
+    cmd := exec.CommandContext(ctx, "git", "rev-parse", "HEAD")
     cmd.Dir = dir
     out, err := cmd.Output()
     if err != nil {
         return "", err
     }
     return strings.TrimSpace(string(out)), nil
 }
🧰 Tools
🪛 golangci-lint (2.11.4)

[error] 54-54: os/exec.Command must not be called. use os/exec.CommandContext

(noctx)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/index/metadata.go` around lines 53 - 61, The CurrentCommit function
uses exec.Command which can hang; change it to create a context with a bounded
timeout (e.g., context.WithTimeout(ctx, 5s)), call exec.CommandContext(ctx,
"git", "rev-parse", "HEAD"), defer cancel(), and use cmd.Output() as before so
the child is killed when the context times out; update any other callsites that
invoke CurrentCommit if they need to propagate a caller context (or accept a
timeout param) so commit lookups can't block indefinitely.

16 changes: 12 additions & 4 deletions internal/index/reader.go
@@ -15,14 +15,21 @@ import (
 // Reader holds the mmap'd lookup table and an open handle to postings.dat.
 // Use NewReader to open, and Close when done.
 type Reader struct {
-	table    []byte   // mmap'd contents of lookup.idx
-	numSlots uint32   // number of slots in the hash table
-	postings *os.File // open handle to postings.dat for random reads
-	Files    []string // fileID → filepath
+	table    []byte    // mmap'd contents of lookup.idx
+	numSlots uint32    // number of slots in the hash table
+	postings *os.File  // open handle to postings.dat for random reads
+	Files    []string  // fileID → filepath
+	Meta     *Metadata // index metadata (e.g. baseline commit)
 }

 // NewReader opens the index written by Write and mmap's the lookup table.
 func NewReader(dir string) (*Reader, error) {
+	// ── metadata.json ───────────────────────────────────────────────────────
+	meta, err := ReadMetadata(dir)
+	if err != nil {
+		return nil, fmt.Errorf("read metadata.json: %w", err)
+	}
+
 	// ── lookup.idx ──────────────────────────────────────────────────────────
 	lookupPath := filepath.Join(dir, "lookup.idx")
 	lf, err := os.Open(lookupPath)
@@ -77,6 +84,7 @@ func NewReader(dir string) (*Reader, error) {
 		numSlots: numSlots,
 		postings: pf,
 		Files:    files,
+		Meta:     meta,
 	}, nil
 }

121 changes: 121 additions & 0 deletions internal/index/sync.go
@@ -0,0 +1,121 @@
package index

import (
	"bufio"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
	"unicode/utf8"

	"grepturbo/internal/posting"
	"grepturbo/internal/trigram"
)

// GitStatus holds the lists of files that have changed since the baseline.
type GitStatus struct {
	Modified  []string
	Untracked []string
	Deleted   []string
}

// Overlay holds the transient in-memory index of dirty files.
type Overlay struct {
	Posts      posting.List
	Files      []string        // fileID → filepath (starts from len(Baseline.Files))
	Tombstones map[string]bool // paths that should be ignored from Baseline
}

// Sync performs a Git-based synchronization.
func (r *Reader) Sync() (*Overlay, bool, error) {
	current, err := CurrentCommit(r.Meta.RootDir)
	if err != nil {
		// Not in a git repo (or git not installed) — nothing to sync.
		return &Overlay{
			Posts:      make(posting.List),
			Tombstones: make(map[string]bool),
		}, false, nil
	}

	// Commit Drift detected
	if r.Meta.Commit != current && r.Meta.Commit != "unknown" {
		return nil, true, nil
	}

	status, err := GetGitStatus(r.Meta.RootDir)
	if err != nil {
		return nil, false, err
	}

	overlay := &Overlay{
		Posts:      make(posting.List),
		Tombstones: make(map[string]bool),
	}

	// Deleted files are Tombstones
	for _, p := range status.Deleted {
		overlay.Tombstones[filepath.Join(r.Meta.RootDir, p)] = true
	}

	// Modified files are both Tombstones (hide old version) and Indexed (show new version)
	dirtyFiles := append(status.Modified, status.Untracked...)
	for _, p := range status.Modified {
		overlay.Tombstones[filepath.Join(r.Meta.RootDir, p)] = true
	}

	// Index dirty files in memory
	for _, relPath := range dirtyFiles {
		absPath := filepath.Join(r.Meta.RootDir, relPath)
		data, err := os.ReadFile(absPath)
		if err != nil {
			continue // skip files we can't read
		}
		if !utf8.Valid(data) || len(data) > maxFileSize {
			continue
		}

		fileID := uint32(len(r.Files) + len(overlay.Files))
		overlay.Files = append(overlay.Files, absPath)

		for _, t := range trigram.Extract(string(data)) {
			overlay.Posts.AddBatch(t, []uint32{fileID})
		}
	}
	overlay.Posts.Finalize()

	return overlay, false, nil
}
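
Overlay file IDs continue where the Baseline's leave off (`len(r.Files) + len(overlay.Files)`), so a single ID space covers both indexes. Here is a minimal sketch of resolving an ID back to a path, assuming a hypothetical helper `pathFor` that is not part of this PR:

```go
package main

import "fmt"

// pathFor resolves a fileID against the Baseline file list first and
// falls back to the Overlay, whose IDs start at len(baseline).
// Hypothetical helper for illustration only.
func pathFor(id uint32, baseline, overlay []string) (string, bool) {
	n := uint32(len(baseline))
	if id < n {
		return baseline[id], true
	}
	if id-n < uint32(len(overlay)) {
		return overlay[id-n], true
	}
	return "", false // ID belongs to neither index
}

func main() {
	baseline := []string{"/repo/a.go", "/repo/b.go"}
	overlay := []string{"/repo/b.go", "/repo/new.go"} // b.go modified, new.go untracked
	for _, id := range []uint32{1, 2, 3, 4} {
		p, ok := pathFor(id, baseline, overlay)
		fmt.Println(id, p, ok)
	}
}
```

A modified file such as `/repo/b.go` ends up with two IDs, one per index, which is why its path is also added to the tombstone set to hide the Baseline copy.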

// GetGitStatus runs 'git status --porcelain' in dir and returns the categorized files.
func GetGitStatus(dir string) (*GitStatus, error) {
	cmd := exec.Command("git", "status", "--porcelain")
	cmd.Dir = dir
	out, err := cmd.Output()
	if err != nil {
		return nil, err
	}

	status := &GitStatus{}
	scanner := bufio.NewScanner(strings.NewReader(string(out)))
	for scanner.Scan() {
		line := scanner.Text()
		if len(line) < 3 {
			continue
		}

		// git status --porcelain format: "XY PATH"
		xy := line[:2]
		path := line[3:]

		switch {
		case xy == "??":
			status.Untracked = append(status.Untracked, path)
		case strings.Contains(xy, "D"):
			status.Deleted = append(status.Deleted, path)
		case strings.Contains(xy, "M") || strings.Contains(xy, "A"):
			status.Modified = append(status.Modified, path)
		}
	}

	return status, scanner.Err()
}
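
One case the switch above does not cover is renames, which the porcelain v1 format reports as `XY old -> new`. The sketch below shows one possible way to handle them: tombstone the old path and index the new one. The names `gitStatus` and `parsePorcelain` are hypothetical, and the sketch does not unquote paths that Git quotes because of special characters.

```go
package main

import (
	"fmt"
	"strings"
)

// gitStatus mirrors the GitStatus struct from this PR (hypothetical copy).
type gitStatus struct {
	Modified, Untracked, Deleted []string
}

// parsePorcelain categorizes lines of `git status --porcelain` (v1) output.
func parsePorcelain(out string) gitStatus {
	var s gitStatus
	for _, line := range strings.Split(out, "\n") {
		if len(line) < 4 {
			continue // need "XY", a space, and at least one path byte
		}
		xy, path := line[:2], line[3:]
		switch {
		case xy == "??":
			s.Untracked = append(s.Untracked, path)
		case strings.Contains(xy, "R"):
			// Renames: hide the old path, index the new one.
			if oldPath, newPath, ok := strings.Cut(path, " -> "); ok {
				s.Deleted = append(s.Deleted, oldPath)
				s.Modified = append(s.Modified, newPath)
			}
		case strings.Contains(xy, "D"):
			s.Deleted = append(s.Deleted, path)
		case strings.Contains(xy, "M") || strings.Contains(xy, "A"):
			s.Modified = append(s.Modified, path)
		}
	}
	return s
}

func main() {
	out := "?? new.txt\n M edited.go\nD  gone.go\nR  old.go -> renamed.go\n"
	s := parsePorcelain(out)
	fmt.Println(s.Untracked, s.Modified, s.Deleted)
}
```

Keeping the parser a pure function over the command output, separate from the `exec` call, also makes it straightforward to unit test without a Git repository.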