GrepTurbo

Index-accelerated regex search. Skip irrelevant files entirely.


GrepTurbo builds a local trigram index over your codebase so regex queries skip irrelevant files entirely — instead of scanning every byte like grep. The bigger your codebase, the bigger the win.


Benchmark

Tested on the Go standard library source (~10,000 files):

| Tool | Time | Files Scanned |
|---|---|---|
| grep -rn | 2.4 – 3.1s | All 10,000 |
| GrepTurbo search | 0.4 – 0.9s | ~50 candidates |

Up to 6–7x faster on 10k files, and the gap grows with codebase size. Repeated queries get faster as the OS keeps the mmap'd index in the page cache.


Install

Pre-compiled Binaries (Recommended)

Download the latest binary for your platform from the Releases page.

From Source

git clone https://github.com/yanurag-dev/GrepTurbo
cd GrepTurbo
go build -o GrepTurbo ./cmd/grepturbo

Usage

Step 1 — build the index (once, or when files change):

GrepTurbo build -root ./myproject -out .GrepTurbo

Step 2 — search:

GrepTurbo search -index .GrepTurbo 'func.*Error'

Output is file:line:text, same as grep -n:

internal/index/reader.go:25:func NewReader(dir string) (*Reader, error) {
internal/query/search.go:26:func Search(r *index.Reader, pattern string) ([]Match, error) {

Flags

GrepTurbo build
  -root   <dir>    Directory to index (default: .)
  -out    <dir>    Where to write the index (default: .GrepTurbo)

GrepTurbo search
  -index  <dir>    Index directory to query (default: .GrepTurbo)

How It Works

regex → trigram decomposition → index lookup → intersect posting lists → candidate files → verify with regex
  1. Trigram decomposition: func.*Error contains the literals func and Error, producing the trigrams fun, unc and Err, rro, ror
  2. Index lookup: each trigram maps to a sorted posting list of the file IDs that contain it
  3. Intersection: only files containing all required trigrams become candidates (10,000 → ~50)
  4. Verification: the real regex engine runs only on those ~50 files
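
Steps 1 to 3 can be sketched in a few lines of Go. This is a minimal illustration, not GrepTurbo's actual code: the function names (trigrams, intersect, candidates), the in-memory map standing in for the index, and the toy file IDs are all assumptions made for the example.

```go
package main

import "fmt"

// trigrams returns the distinct 3-byte substrings of a literal, in order of
// first appearance. Literals shorter than 3 bytes yield nothing.
func trigrams(literal string) []string {
	seen := make(map[string]bool)
	var out []string
	for i := 0; i+3 <= len(literal); i++ {
		t := literal[i : i+3]
		if !seen[t] {
			seen[t] = true
			out = append(out, t)
		}
	}
	return out
}

// intersect walks two ascending posting lists and keeps the IDs found in both.
func intersect(a, b []uint32) []uint32 {
	out := []uint32{}
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

// candidates intersects the posting lists of every required trigram.
// A trigram that is missing from the index empties the result.
func candidates(index map[string][]uint32, required []string) []uint32 {
	if len(required) == 0 {
		return nil // nothing to filter on; a real tool would have to scan every file
	}
	result := index[required[0]]
	for _, t := range required[1:] {
		result = intersect(result, index[t])
	}
	return result
}

func main() {
	// Hypothetical index: trigram -> ascending file IDs.
	index := map[string][]uint32{
		"fun": {1, 2, 3, 5}, "unc": {1, 2, 3, 5},
		"Err": {2, 4, 5}, "rro": {2, 4, 5}, "ror": {2, 4, 5, 6},
	}
	required := append(trigrams("func"), trigrams("Error")...)
	fmt.Println(required)                    // [fun unc Err rro ror]
	fmt.Println(candidates(index, required)) // [2 5]
}
```

Because the posting lists are sorted, each intersection is a single linear merge, so the cost depends on list lengths rather than on the total size of the codebase.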

The golden invariant: if a file matches the regex, it will always appear in the candidate set. No false negatives, ever.

Index on Disk

.GrepTurbo/
  lookup.idx    mmap'd hash table — trigram → byte offset in postings.dat
  postings.dat  posting lists — [count][fileID, fileID, ...]
  files.idx     fileID → filepath mapping

Only lookup.idx is loaded into memory (mmap'd). Posting lists are read from disk on demand.
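
To make the [count][fileID, fileID, ...] layout concrete, here is a hedged sketch of writing and reading one posting list. The field width and byte order (little-endian uint32) are assumptions chosen for the example, and encodePostings and decodePostings are hypothetical names. See ARCHITECTURE.md for the real on-disk format.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// encodePostings serializes a posting list as [count][fileID...].
// ASSUMPTION: every field is a little-endian uint32.
func encodePostings(ids []uint32) []byte {
	buf := make([]byte, 4+4*len(ids))
	binary.LittleEndian.PutUint32(buf, uint32(len(ids)))
	for i, id := range ids {
		binary.LittleEndian.PutUint32(buf[4+4*i:], id)
	}
	return buf
}

// decodePostings reads the posting list that starts at byte offset off.
func decodePostings(data []byte, off int) ([]uint32, error) {
	if off < 0 || off+4 > len(data) {
		return nil, errors.New("offset out of range")
	}
	count := int(binary.LittleEndian.Uint32(data[off:]))
	start := off + 4
	if count > (len(data)-start)/4 {
		return nil, errors.New("truncated posting list")
	}
	ids := make([]uint32, count)
	for i := range ids {
		ids[i] = binary.LittleEndian.Uint32(data[start+4*i:])
	}
	return ids, nil
}

func main() {
	// Two posting lists back to back, as they might sit in postings.dat.
	first := encodePostings([]uint32{2, 5})
	data := append(first, encodePostings([]uint32{1, 2, 3})...)

	// The lookup table would supply this byte offset for a given trigram.
	ids, err := decodePostings(data, len(first))
	fmt.Println(ids, err) // [1 2 3] <nil>
}
```

Storing a byte offset per trigram means a query touches only the posting lists it needs, which matches the on-demand reads described above.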


Testing

Run the dynamic test script to benchmark and verify correctness against grep:

# Test default patterns on this repo
./scripts/test.sh

# Test a single pattern
./scripts/test.sh 'func.*Error'

# Test on any large codebase
./scripts/test.sh 'func.*Error' /path/to/large/repo

The script builds the binary, indexes the target directory, runs each pattern through both grep and GrepTurbo, compares results, and reports speedup + any false negatives.

# Run unit + integration tests
go test ./...

# Run a specific test
go test ./internal/query/... -run TestCorrectnessVsGrep -v

# Run benchmarks
go test ./... -bench=.

Architecture

See ARCHITECTURE.md for full diagrams covering the index build pipeline, query flow, on-disk format, regex decomposition rules, and incremental sync strategy.


Built with Go · MIT License
