feat: BigramQuery trait + FileRecord/FileListView for zero-copy index support by magnusmalm · Pull Request #341 · dmtrKovalenko/fff.nvim

magnusmalm · 2026-04-05T22:32:21Z

Summary

Two changes toward zero-copy index support, following up on the discussion in #330.

Commit 1: BigramQuery trait

Extracts query() and is_ready() into a BigramQuery trait that BigramFilter implements. grep_search() now accepts Option<&dyn BigramQuery>, so external consumers can provide alternative implementations (e.g. an mmap-backed view) without changing the grep pipeline.

Static helpers (is_candidate, count_candidates) stay on BigramFilter since they operate on the returned Vec<u64>, not the index itself.
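To illustrate the split described above, here is a minimal sketch, not the code from this PR. Only the names `BigramQuery`, `BigramFilter`, `query`, `is_ready`, `is_candidate` and `count_candidates` come from the description. The method signatures, the per-bigram bitset layout, and the `grep_candidates` stand-in for `grep_search()` are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical shape of the trait: the two methods named in the PR.
/// The exact signatures are assumptions.
pub trait BigramQuery {
    fn is_ready(&self) -> bool;
    /// Returns a bitset (one bit per file) of files that may contain `pattern`.
    fn query(&self, pattern: &[u8]) -> Option<Vec<u64>>;
}

/// Toy in-memory index: one bitset of file indices per bigram (assumed layout).
pub struct BigramFilter {
    words: usize,
    index: HashMap<[u8; 2], Vec<u64>>,
}

impl BigramFilter {
    pub fn build(files: &[&[u8]]) -> Self {
        let words = (files.len() + 63) / 64;
        let mut index: HashMap<[u8; 2], Vec<u64>> = HashMap::new();
        for (i, content) in files.iter().enumerate() {
            for w in content.windows(2) {
                let bits = index.entry([w[0], w[1]]).or_insert_with(|| vec![0u64; words]);
                bits[i / 64] |= 1u64 << (i % 64);
            }
        }
        Self { words, index }
    }

    /// Static helper: works on the returned bitset, not on the index.
    pub fn is_candidate(bits: &[u64], file_idx: usize) -> bool {
        bits.get(file_idx / 64)
            .map_or(false, |w| (*w >> (file_idx % 64)) & 1 == 1)
    }

    /// Static helper: number of set bits in the returned bitset.
    pub fn count_candidates(bits: &[u64]) -> usize {
        bits.iter().map(|w| w.count_ones() as usize).sum()
    }
}

impl BigramQuery for BigramFilter {
    fn is_ready(&self) -> bool {
        true
    }

    fn query(&self, pattern: &[u8]) -> Option<Vec<u64>> {
        if pattern.len() < 2 {
            return None; // no bigram to filter on
        }
        // Intersect the bitsets of every bigram in the pattern.
        let mut acc = vec![u64::MAX; self.words];
        for w in pattern.windows(2) {
            match self.index.get(&[w[0], w[1]]) {
                Some(bits) => {
                    for (a, b) in acc.iter_mut().zip(bits) {
                        *a &= *b;
                    }
                }
                None => return Some(vec![0; self.words]),
            }
        }
        Some(acc)
    }
}

/// Hypothetical stand-in for the prefilter step of `grep_search()`:
/// it depends only on the trait object, so any implementation can be passed.
pub fn grep_candidates(
    n_files: usize,
    pattern: &[u8],
    index: Option<&dyn BigramQuery>,
) -> Vec<usize> {
    let bits = index.filter(|q| q.is_ready()).and_then(|q| q.query(pattern));
    match bits {
        Some(bits) => (0..n_files)
            .filter(|&i| BigramFilter::is_candidate(&bits, i))
            .collect(),
        None => (0..n_files).collect(), // no usable index: scan every file
    }
}
```

An mmap-backed view would implement the same two trait methods over borrowed bytes, and `grep_candidates(n, pattern, Some(&view))` would work unchanged.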

Commit 2: FileRecord + FileListView

Adds building blocks for mmap-friendly file list storage:

- FileRecord: 24-byte repr(C) struct storing path offset, lengths, size, modified, and is_binary flag
- FileListView<'a>: borrows records + string table from an mmap, provides indexed access without heap allocation
- build_file_records(): convert &[FileItem] to records + string table
- to_file_items(): convert back when the search pipeline needs owned FileItems
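A rough sketch of how these pieces could fit together, under stated assumptions. The field names and widths, the choice to pack is_binary into the high bit of the size field, and the simplified `FileItem` are all hypothetical. The PR only states a 24-byte repr(C) record with path offset, lengths, size, modified, and an is_binary flag. The view borrows plain slices here; obtaining them from an mmap is left out.

```rust
use std::mem::size_of;

/// Simplified, hypothetical stand-in for the crate's FileItem.
#[derive(Debug, Clone, PartialEq)]
pub struct FileItem {
    pub path: String,
    pub size: u64,
    pub modified: u64,
    pub is_binary: bool,
}

/// Assumption: the flag lives in the high bit of the size field.
const BINARY_FLAG: u64 = 1 << 63;

/// Assumed layout: 4 + 2 + 2 + 8 + 8 = 24 bytes, no padding.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub struct FileRecord {
    path_offset: u32,
    path_len: u16,
    name_len: u16,
    size_and_flags: u64,
    modified: u64,
}

// Compile-time layout check.
const _: () = assert!(size_of::<FileRecord>() == 24);

impl FileRecord {
    pub fn size(&self) -> u64 {
        self.size_and_flags & !BINARY_FLAG
    }
    pub fn is_binary(&self) -> bool {
        self.size_and_flags & BINARY_FLAG != 0
    }
}

/// Borrows records and the string table; accessors allocate nothing.
pub struct FileListView<'a> {
    records: &'a [FileRecord],
    strings: &'a [u8],
}

impl<'a> FileListView<'a> {
    pub fn new(records: &'a [FileRecord], strings: &'a [u8]) -> Self {
        Self { records, strings }
    }
    pub fn len(&self) -> usize {
        self.records.len()
    }
    pub fn record(&self, i: usize) -> Option<&'a FileRecord> {
        self.records.get(i)
    }
    pub fn path(&self, i: usize) -> Option<&'a str> {
        let r = self.records.get(i)?;
        let start = r.path_offset as usize;
        let bytes = self.strings.get(start..start + r.path_len as usize)?;
        std::str::from_utf8(bytes).ok()
    }
    pub fn file_name(&self, i: usize) -> Option<&'a str> {
        let p = self.path(i)?;
        let start = p.len().checked_sub(self.records.get(i)?.name_len as usize)?;
        p.get(start..)
    }
}

/// Flattens items into fixed-size records plus one contiguous string table.
/// The sketch does not guard against offsets or lengths overflowing u32/u16.
pub fn build_file_records(items: &[FileItem]) -> (Vec<FileRecord>, Vec<u8>) {
    let mut records = Vec::with_capacity(items.len());
    let mut strings: Vec<u8> = Vec::new();
    for item in items {
        let name_len = item.path.rsplit('/').next().map_or(0, str::len);
        let flag = if item.is_binary { BINARY_FLAG } else { 0 };
        records.push(FileRecord {
            path_offset: strings.len() as u32,
            path_len: item.path.len() as u16,
            name_len: name_len as u16,
            size_and_flags: (item.size & !BINARY_FLAG) | flag,
            modified: item.modified,
        });
        strings.extend_from_slice(item.path.as_bytes());
    }
    (records, strings)
}

/// Converts back to owned items for code that still needs them.
pub fn to_file_items(view: &FileListView) -> Vec<FileItem> {
    (0..view.len())
        .filter_map(|i| {
            let r = view.record(i)?;
            Some(FileItem {
                path: view.path(i)?.to_owned(),
                size: r.size(),
                modified: r.modified,
                is_binary: r.is_binary(),
            })
        })
        .collect()
}
```

In this sketch the only allocations happen in `build_file_records()` and `to_file_items()`. `path()` and `file_name()` return slices that borrow from the string table.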

Wiring FileListView directly into match_and_score_files / grep_search would require a FileEntry trait that changes field access to method calls throughout score.rs and grep.rs. That felt too invasive for this PR. Happy to do it as a follow-up if you want to go that direction.

Benchmarked with fff-cli on buildroot (13k files). The dyn BigramQuery vtable dispatch adds no measurable overhead to grep or search.

Extract query() and is_ready() into a BigramQuery trait that BigramFilter implements. grep_search() now accepts Option<&dyn BigramQuery>, allowing external consumers to provide alternative implementations (e.g. a zero-copy mmap-backed view) without changing the grep pipeline. Static helpers (is_candidate, count_candidates) remain on BigramFilter since they operate on the returned Vec<u64> directly.

Add building blocks for mmap-friendly file list storage:

- FileRecord: 24-byte repr(C) struct with path offset, lengths, size, modified time, and is_binary flag packed into the high bit
- FileListView: borrows records + string table from an mmap, provides indexed access to paths and metadata without heap allocation
- build_file_records(): convert &[FileItem] to records + string table
- to_file_items(): convert back to owned FileItems for the search pipeline

Tests for record layout, flags, and FileItem round-trip included.

## Motivation

fff-c is already an editor-agnostic C library. However, the only way external consumers (Emacs Lisp, Python, scripts) could access struct fields was by computing byte offsets manually, a silently fragile approach that breaks whenever the struct layout changes.

This is not theoretical: JonasThowsen/fff.el, an existing Emacs integration using libfff_c directly via FFI, hardcoded offsets that are **already wrong** against current main:

- Offset 32 → expected line_content, actually FffMatchRange* (pointer!)
- Offset 104 → expected line_number, actually byte_offset
- Offset 120 → expected col, actually context_before_count

The struct grew (file_name, git_status, match_ranges, context arrays added) between the time fff.el was written and today, pushing all subsequent field offsets without any compile-time signal.

## Change

Add crates/fff-c/src/accessors.rs with C-exported getter functions:

- FffFileItem: relative_path, file_name, git_status, size, is_binary
- FffGrepMatch: relative_path, file_name, line_content, line_number, col, byte_offset, is_binary
- FffSearchResult: count
- FffGrepResult: count

cbindgen picks these up automatically; no changes to cbindgen.toml are needed. Zero impact on the Neovim integration (Lua uses ffi.cdef, which parses the full header directly and is unaffected by adding new functions).

## Why accessor functions, not repr(C) guarantees alone

repr(C) (see also PR dmtrKovalenko#341 by magnusmalm) prevents the Rust compiler from reordering fields, but does not prevent upstream from adding new fields between existing ones. Accessor functions make the struct layout a true implementation detail: callers bind to names, not positions.

## Impact

With this change, fff-c becomes usable from any language that can call a C function: Emacs Lisp (emacs-ffi), Python (ctypes/cffi), Helix, Kakoune, shell scripts via a thin wrapper, etc.

A companion PR will follow at JonasThowsen/fff.el migrating from hardcoded offsets to these accessor functions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
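
As a rough illustration of the accessor approach in the commit message above, the sketch below exports two getters over a cut-down struct. The function names (`fff_grep_match_line_number`, `fff_grep_match_relative_path`), the field types, and the reduced field set are assumptions. The commit message names the fields covered but not the exported symbols. The plain `#[no_mangle]` form assumes a pre-2024 Rust edition.

```rust
use std::ffi::c_char;
use std::ptr;

/// Cut-down, hypothetical version of the FFI struct. The real one has more
/// fields, and their order may change between releases.
#[repr(C)]
pub struct FffGrepMatch {
    pub relative_path: *const c_char,
    pub line_number: u64,
    pub col: u32,
}

/// Returns the line number, or 0 if `m` is null.
///
/// # Safety
/// `m` must be null or point to a valid `FffGrepMatch`.
#[no_mangle]
pub unsafe extern "C" fn fff_grep_match_line_number(m: *const FffGrepMatch) -> u64 {
    unsafe { m.as_ref() }.map_or(0, |m| m.line_number)
}

/// Returns the NUL-terminated relative path, or a null pointer if `m` is null.
///
/// # Safety
/// `m` must be null or point to a valid `FffGrepMatch`.
#[no_mangle]
pub unsafe extern "C" fn fff_grep_match_relative_path(
    m: *const FffGrepMatch,
) -> *const c_char {
    unsafe { m.as_ref() }.map_or(ptr::null(), |m| m.relative_path)
}
```

A caller binds to the symbol name and passes the struct pointer through unchanged, so inserting a field before `line_number` changes only the library, not the caller.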

magnusmalm added 3 commits April 6, 2026 00:13

style: rustfmt

8ca54f6

magnusmalm wants to merge 3 commits into dmtrKovalenko:main from magnusmalm:feat/zerocopy-views
