feat: BigramQuery trait + FileRecord/FileListView for zero-copy index support #341
Open
magnusmalm wants to merge 3 commits into dmtrKovalenko:main
Extract query() and is_ready() into a BigramQuery trait that BigramFilter implements. grep_search() now accepts Option<&dyn BigramQuery>, allowing external consumers to provide alternative implementations (e.g. a zero-copy mmap-backed view) without changing the grep pipeline. Static helpers (is_candidate, count_candidates) remain on BigramFilter since they operate on the returned Vec<u64> directly.
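A minimal sketch of how such a trait boundary could look. Only the names `BigramQuery`, `query()`, `is_ready()` and the `Option<&dyn BigramQuery>` parameter come from the commit message; the method signatures, `ToyFilter`, and `candidate_files` are hypothetical stand-ins, not the crate's actual code.

```rust
use std::collections::HashMap;

/// Assumed trait shape: the commit message names the two methods,
/// the signatures here are a guess.
pub trait BigramQuery {
    fn is_ready(&self) -> bool;
    /// Returns a bitset with one bit per file, or None when the pattern
    /// is too short to filter on.
    fn query(&self, pattern: &[u8]) -> Option<Vec<u64>>;
}

/// Hypothetical in-memory implementation: one bitset per bigram.
pub struct ToyFilter {
    words: usize,
    postings: HashMap<[u8; 2], Vec<u64>>,
}

impl ToyFilter {
    pub fn build(files: &[&[u8]]) -> Self {
        let words = (files.len() + 63) / 64;
        let mut postings: HashMap<[u8; 2], Vec<u64>> = HashMap::new();
        for (i, content) in files.iter().enumerate() {
            for w in content.windows(2) {
                let bits = postings
                    .entry([w[0], w[1]])
                    .or_insert_with(|| vec![0u64; words]);
                bits[i / 64] |= 1u64 << (i % 64);
            }
        }
        ToyFilter { words, postings }
    }
}

impl BigramQuery for ToyFilter {
    fn is_ready(&self) -> bool {
        true
    }

    fn query(&self, pattern: &[u8]) -> Option<Vec<u64>> {
        if pattern.len() < 2 {
            return None;
        }
        // Intersect the bitsets of every bigram in the pattern.
        let mut acc = vec![u64::MAX; self.words];
        for w in pattern.windows(2) {
            match self.postings.get(&[w[0], w[1]]) {
                Some(bits) => {
                    for (a, b) in acc.iter_mut().zip(bits) {
                        *a &= *b;
                    }
                }
                // An unseen bigram rules out every file.
                None => return Some(vec![0; self.words]),
            }
        }
        Some(acc)
    }
}

/// Stand-in for the pipeline side: it only sees the trait object, so a
/// different backing store can be swapped in without touching this code.
pub fn candidate_files(
    filter: Option<&dyn BigramQuery>,
    pattern: &[u8],
    file_count: usize,
) -> Vec<usize> {
    let bits = filter
        .filter(|f| f.is_ready())
        .and_then(|f| f.query(pattern));
    match bits {
        Some(bits) => (0..file_count)
            .filter(|&i| bits[i / 64] & (1u64 << (i % 64)) != 0)
            .collect(),
        // No usable filter: every file stays a candidate.
        None => (0..file_count).collect(),
    }
}
```

Passing `None` falls back to scanning every file, which mirrors the optional parameter described above.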
Add building blocks for mmap-friendly file list storage:
- FileRecord: 24-byte repr(C) struct with path offset, lengths, size, modified time, and is_binary flag packed into the high bit
- FileListView: borrows records + string table from an mmap, provides indexed access to paths and metadata without heap allocation
- build_file_records(): convert &[FileItem] to records + string table
- to_file_items(): convert back to owned FileItems for the search pipeline

Tests for record layout, flags, and FileItem round-trip included.
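The round trip described above can be sketched as follows. The field names, field widths, and the choice to pack `is_binary` into the high bit of the size field are assumptions made for illustration; the commit message only fixes the 24-byte `repr(C)` size and the list of stored values. The `FileItem` here is a reduced hypothetical version, and paths are assumed to be shorter than 65,536 bytes.

```rust
use std::mem::size_of;

/// Reduced, hypothetical stand-in for the crate's `FileItem`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct FileItem {
    pub path: String,
    pub size: u64,
    pub modified: u64,
    pub is_binary: bool,
}

/// Assumption: the flag lives in the high bit of the size field.
const BINARY_FLAG: u64 = 1 << 63;

#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct FileRecord {
    path_offset: u32,    // byte offset into the string table
    path_len: u16,       // length of the full path in bytes
    name_len: u16,       // length of the trailing file name in bytes
    size_and_flags: u64, // size in the low 63 bits, is_binary in bit 63
    modified: u64,
}

// Compile-time check that this layout really is 24 bytes.
const _: () = assert!(size_of::<FileRecord>() == 24);

impl FileRecord {
    pub fn size(&self) -> u64 {
        self.size_and_flags & !BINARY_FLAG
    }
    pub fn is_binary(&self) -> bool {
        self.size_and_flags & BINARY_FLAG != 0
    }
}

/// Flattens owned items into fixed-size records plus one string table.
pub fn build_file_records(items: &[FileItem]) -> (Vec<FileRecord>, Vec<u8>) {
    let mut records = Vec::with_capacity(items.len());
    let mut strings = Vec::new();
    for item in items {
        let name_len = item.path.rsplit('/').next().map_or(0, str::len);
        let flag = if item.is_binary { BINARY_FLAG } else { 0 };
        records.push(FileRecord {
            path_offset: strings.len() as u32,
            path_len: item.path.len() as u16,
            name_len: name_len as u16,
            size_and_flags: (item.size & !BINARY_FLAG) | flag,
            modified: item.modified,
        });
        strings.extend_from_slice(item.path.as_bytes());
    }
    (records, strings)
}

/// Borrows both buffers; in the PR's setting they would come from an mmap.
pub struct FileListView<'a> {
    records: &'a [FileRecord],
    strings: &'a [u8],
}

impl<'a> FileListView<'a> {
    pub fn new(records: &'a [FileRecord], strings: &'a [u8]) -> Self {
        Self { records, strings }
    }
    pub fn len(&self) -> usize {
        self.records.len()
    }
    /// Returns a slice of the string table, with no allocation.
    pub fn path(&self, i: usize) -> &'a str {
        let r = &self.records[i];
        let start = r.path_offset as usize;
        let bytes = &self.strings[start..start + r.path_len as usize];
        std::str::from_utf8(bytes).expect("string table holds valid UTF-8")
    }
    pub fn file_name(&self, i: usize) -> &'a str {
        let p = self.path(i);
        &p[p.len() - self.records[i].name_len as usize..]
    }
    /// Converts back to owned items for code that still needs them.
    pub fn to_file_items(&self) -> Vec<FileItem> {
        (0..self.len())
            .map(|i| FileItem {
                path: self.path(i).to_owned(),
                size: self.records[i].size(),
                modified: self.records[i].modified,
                is_binary: self.records[i].is_binary(),
            })
            .collect()
    }
}
```

Fixed-size records make indexed lookup a plain slice access, and keeping all paths in one string table means the view holds only two borrowed slices.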
ciolansteen pushed a commit to ciolansteen/fff.nvim that referenced this pull request on Apr 19, 2026
## Motivation
fff-c is already an editor-agnostic C library. However, the only way
external consumers (Emacs Lisp, Python, scripts) could access struct
fields was by computing byte offsets manually — a silently fragile
approach that breaks whenever the struct layout changes.
This is not theoretical: JonasThowsen/fff.el, an existing Emacs
integration using libfff_c directly via FFI, hardcoded offsets that
are **already wrong** against current main:
- Offset 32 → expected line_content, actually FffMatchRange* (pointer!)
- Offset 104 → expected line_number, actually byte_offset
- Offset 120 → expected col, actually context_before_count
The struct grew (file_name, git_status, match_ranges, context arrays
added) between the time fff.el was written and today, pushing all
subsequent field offsets without any compile-time signal.
## Change
Add crates/fff-c/src/accessors.rs with C-exported getter functions:
- FffFileItem: relative_path, file_name, git_status, size, is_binary
- FffGrepMatch: relative_path, file_name, line_content, line_number, col, byte_offset, is_binary
- FffSearchResult: count
- FffGrepResult: count
cbindgen picks these up automatically — no changes to cbindgen.toml.
Zero impact on the Neovim integration (Lua uses ffi.cdef which parses
the full header directly and is unaffected by adding new functions).
## Why accessor functions, not repr(C) guarantees alone
repr(C) (see also PR dmtrKovalenko#341 by magnusmalm) prevents the Rust compiler
from reordering fields, but does not prevent upstream from adding new
fields between existing ones. Accessor functions make the struct layout
a true implementation detail — callers bind to names, not positions.
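A hedged sketch of what one pair of such getters might look like. The struct's field names come from the list above; the field types, the function names `fff_file_item_relative_path` and `fff_file_item_size`, and the null-handling policy are assumptions, not the contents of accessors.rs.

```rust
use std::ffi::c_char;
use std::ptr;

/// Assumed layout; only the field names are taken from the commit message.
#[repr(C)]
pub struct FffFileItem {
    pub relative_path: *const c_char,
    pub file_name: *const c_char,
    pub git_status: *const c_char,
    pub size: u64,
    pub is_binary: bool,
}

// A real export would also carry the no_mangle attribute so the symbol
// keeps its name in the shared library. It is left out here so the sketch
// builds unchanged on every Rust edition.

/// Hypothetical getter: callers bind to this name, not to a byte offset.
///
/// # Safety
/// `item` must be null or point to a valid `FffFileItem`.
pub unsafe extern "C" fn fff_file_item_relative_path(
    item: *const FffFileItem,
) -> *const c_char {
    match unsafe { item.as_ref() } {
        Some(i) => i.relative_path,
        None => ptr::null(),
    }
}

/// Hypothetical getter returning 0 for a null item.
///
/// # Safety
/// `item` must be null or point to a valid `FffFileItem`.
pub unsafe extern "C" fn fff_file_item_size(item: *const FffFileItem) -> u64 {
    unsafe { item.as_ref() }.map_or(0, |i| i.size)
}
```

If a field were later inserted ahead of `size`, a caller going through `fff_file_item_size` would keep working after relinking, while a caller reading a hardcoded offset would silently read the wrong bytes.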
## Impact
With this change, fff-c becomes usable from any language that can call
a C function: Emacs Lisp (emacs-ffi), Python (ctypes/cffi), Helix,
Kakoune, shell scripts via a thin wrapper, etc.
A companion PR will follow at JonasThowsen/fff.el migrating from
hardcoded offsets to these accessor functions.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Two changes toward zero-copy index support, following up on the discussion in #330.
Commit 1: BigramQuery trait

Extracts `query()` and `is_ready()` into a `BigramQuery` trait that `BigramFilter` implements. `grep_search()` now accepts `Option<&dyn BigramQuery>`, so external consumers can provide alternative implementations (e.g. an mmap-backed view) without changing the grep pipeline.

Static helpers (`is_candidate`, `count_candidates`) stay on `BigramFilter` since they operate on the returned `Vec<u64>`, not the index itself.

Commit 2: FileRecord + FileListView

Adds building blocks for mmap-friendly file list storage:
- `FileRecord`: 24-byte `repr(C)` struct storing path offset, lengths, size, modified, and is_binary flag
- `FileListView<'a>`: borrows records + string table from an mmap, provides indexed access without heap allocation
- `build_file_records()`: convert `&[FileItem]` to records + string table
- `to_file_items()`: convert back when the search pipeline needs owned `FileItem`s

Wiring `FileListView` directly into `match_and_score_files`/`grep_search` would require a `FileEntry` trait that changes field access to method calls throughout `score.rs` and `grep.rs`. That felt too invasive for this PR. Happy to do it as a follow-up if you want to go that direction.

Benchmarked with fff-cli on buildroot (13k files). The `dyn BigramQuery` vtable dispatch adds no measurable overhead to grep or search.
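For discussion, here is a rough sketch of what the `FileEntry` follow-up could involve. The trait is only named in this PR, so its methods, the reduced `FileItem`, `BorrowedEntry`, and the toy `score` function are all hypothetical.

```rust
/// Hypothetical trait: scoring code would call methods instead of
/// reading struct fields.
pub trait FileEntry {
    fn path(&self) -> &str;
    fn size(&self) -> u64;
}

/// Reduced stand-in for the owned item used by the search pipeline today.
pub struct FileItem {
    pub path: String,
    pub size: u64,
}

impl FileEntry for FileItem {
    fn path(&self) -> &str {
        &self.path
    }
    fn size(&self) -> u64 {
        self.size
    }
}

/// Stand-in for one row of a borrowed view such as `FileListView`.
pub struct BorrowedEntry<'a> {
    pub path: &'a str,
    pub size: u64,
}

impl FileEntry for BorrowedEntry<'_> {
    fn path(&self) -> &str {
        self.path
    }
    fn size(&self) -> u64 {
        self.size
    }
}

/// Toy scorer, generic over the trait: 2 when the file name contains the
/// needle, 1 when only the directory part does, 0 otherwise.
pub fn score<E: FileEntry>(entry: &E, needle: &str) -> u32 {
    let path = entry.path();
    let name = path.rsplit('/').next().unwrap_or(path);
    if name.contains(needle) {
        2
    } else if path.contains(needle) {
        1
    } else {
        0
    }
}
```

Every `item.path` in the scoring code would become `item.path()`, which is the invasive part mentioned above. In exchange, owned and borrowed entries would share one code path.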