GrepTurbo builds a local trigram index over your codebase so that regex queries skip irrelevant files entirely, instead of scanning every byte the way grep does. The bigger your codebase, the bigger the win.
Tested on the Go standard library source (~10,000 files):
| Tool | Time | Files Scanned |
|---|---|---|
| grep -rn | 2.4 – 3.1s | All 10,000 |
| GrepTurbo search | 0.4 – 0.9s | ~50 candidates |
That is roughly 6–7x faster on 10k files, and the gap grows with codebase size. Repeated queries get faster still as the OS keeps the mmap'd index in the page cache.
Download the latest binary for your platform from the Releases page.
git clone https://github.com/yanurag-dev/GrepTurbo
cd GrepTurbo
go build -o grepturbo ./cmd/grepturbo

Step 1. Build the index (once, or whenever files change):

GrepTurbo build -root ./myproject -out .GrepTurbo

Step 2. Search:

GrepTurbo search -index .GrepTurbo 'func.*Error'

Output is file:line:text, same as grep -n:

internal/index/reader.go:25:func NewReader(dir string) (*Reader, error) {
internal/query/search.go:26:func Search(r *index.Reader, pattern string) ([]Match, error) {
GrepTurbo build
-root <dir> Directory to index (default: .)
-out <dir> Where to write the index (default: .GrepTurbo)
GrepTurbo search
-index <dir> Index directory to query (default: .GrepTurbo)
regex → trigram decomposition → index lookup → intersect posting lists → candidate files → verify with regex
- Trigram decomposition: `func.*Error` contains the literals `func` and `Error`, producing the trigrams `fun unc` and `Err rro ror`
- Index lookup: each trigram maps to a sorted posting list of the file IDs that contain it
- Intersection: only files containing all required trigrams become candidates (10,000 → ~50)
- Verification: the real regex engine runs only on those ~50 files
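
The decomposition and intersection steps can be sketched in a few lines of Go. This is a minimal illustration, not GrepTurbo's actual code: the function names (`trigrams`, `intersect`, `candidates`), the in-memory map standing in for the on-disk index, and the file IDs are all hypothetical.

```go
package main

import "fmt"

// trigrams returns every overlapping 3-byte substring of a literal.
func trigrams(literal string) []string {
	var out []string
	for i := 0; i+3 <= len(literal); i++ {
		out = append(out, literal[i:i+3])
	}
	return out
}

// intersect merges two ascending posting lists and keeps only the
// file IDs present in both.
func intersect(a, b []uint32) []uint32 {
	var out []uint32
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

// candidates intersects the posting lists of all required trigrams.
// A trigram missing from the index yields an empty list, so no file
// can be a candidate. This sketch assumes at least one required
// trigram; with none, a real tool would have to treat every file as
// a candidate to avoid false negatives.
func candidates(index map[string][]uint32, required []string) []uint32 {
	var result []uint32
	for n, t := range required {
		list := index[t]
		if n == 0 {
			result = list
			continue
		}
		result = intersect(result, list)
	}
	return result
}

func main() {
	// Toy index (hypothetical data): trigram -> ascending file IDs.
	index := map[string][]uint32{
		"fun": {1, 2, 3, 5},
		"unc": {1, 2, 3, 5},
		"Err": {2, 4, 5},
		"rro": {2, 4, 5},
		"ror": {2, 4, 5, 6},
	}
	required := append(trigrams("func"), trigrams("Error")...)
	fmt.Println(required)                    // [fun unc Err rro ror]
	fmt.Println(candidates(index, required)) // [2 5]
}
```

Because the posting lists are sorted, each intersection is a single linear merge, and the candidate set can only shrink as more trigrams are applied.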
The golden invariant: if a file matches the regex, it will always appear in the candidate set. No false negatives, ever.
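
The flip side of the invariant is that the candidate set may contain false positives, which is why verification exists. The sketch below illustrates that step under simplifying assumptions: candidate files are held as in-memory strings rather than read from disk, and the `file` type and `verify` function are hypothetical names, not GrepTurbo's API.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// file is a hypothetical stand-in for a candidate file on disk.
type file struct {
	path    string
	content string
}

// verify runs the real regex over each candidate line by line and
// returns the matches formatted as file:line:text.
func verify(pattern string, candidates []file) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, f := range candidates {
		sc := bufio.NewScanner(strings.NewReader(f.content))
		for n := 1; sc.Scan(); n++ {
			if re.MatchString(sc.Text()) {
				out = append(out, fmt.Sprintf("%s:%d:%s", f.path, n, sc.Text()))
			}
		}
	}
	return out, nil
}

func main() {
	candidates := []file{
		{"a.go", "package a\nfunc WrapError(err error) error { return err }\n"},
		// Contains every required trigram, but never "func" followed by
		// "Error" on one line: a false positive that verification drops.
		{"b.go", "package b\n// Error handling lives elsewhere\nfunc noop() {}\n"},
	}
	matches, err := verify(`func.*Error`, candidates)
	if err != nil {
		panic(err)
	}
	for _, m := range matches {
		fmt.Println(m) // a.go:2:func WrapError(err error) error { return err }
	}
}
```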
.GrepTurbo/
lookup.idx mmap'd hash table — trigram → byte offset in postings.dat
postings.dat posting lists — [count][fileID, fileID, ...]
files.idx fileID → filepath mapping
Only lookup.idx is loaded into memory (mmap'd). Posting lists are read from disk on demand.
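
To illustrate the `[count][fileID, fileID, ...]` layout, the sketch below decodes one posting list from a byte slice at a given offset, which is the value a `lookup.idx` entry would supply. The field widths and byte order are assumptions (little-endian `uint32` for both the count and the IDs); the layout description above does not specify them, and `readPostingList` is a hypothetical name.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// readPostingList decodes one [count][fileID, ...] record that starts
// at offset within the postings data.
// ASSUMPTION: count and file IDs are little-endian uint32 values.
func readPostingList(postings []byte, offset int) ([]uint32, error) {
	if offset < 0 || offset+4 > len(postings) {
		return nil, errors.New("offset out of range")
	}
	count := int(binary.LittleEndian.Uint32(postings[offset:]))
	start := offset + 4
	if count > (len(postings)-start)/4 {
		return nil, errors.New("truncated posting list")
	}
	ids := make([]uint32, count)
	for i := range ids {
		ids[i] = binary.LittleEndian.Uint32(postings[start+4*i:])
	}
	return ids, nil
}

func main() {
	// Two lists stored back to back: [2][7, 9] then [1][42].
	buf := make([]byte, 20)
	for i, v := range []uint32{2, 7, 9, 1, 42} {
		binary.LittleEndian.PutUint32(buf[4*i:], v)
	}
	secondOffset := 12 // byte offset of the second record

	ids, err := readPostingList(buf, secondOffset)
	fmt.Println(ids, err) // [42] <nil>
}
```

Storing only a byte offset per trigram keeps the lookup table small, and each query touches just the posting lists it needs.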
Run the dynamic test script to benchmark and verify correctness against grep:
# Test default patterns on this repo
./scripts/test.sh
# Test a single pattern
./scripts/test.sh 'func.*Error'
# Test on any large codebase
./scripts/test.sh 'func.*Error' /path/to/large/repo

The script builds the binary, indexes the target directory, runs each pattern through both grep and GrepTurbo, compares the results, and reports the speedup along with any false negatives.

# Run unit + integration tests
go test ./...
# Run a specific test
go test ./internal/query/... -run TestCorrectnessVsGrep -v
# Run benchmarks
go test ./... -bench=.

See ARCHITECTURE.md for full diagrams covering the index build pipeline, query flow, on-disk format, regex decomposition rules, and incremental sync strategy.