
feat: BigramQuery trait + FileRecord/FileListView for zero-copy index support#341

Open
magnusmalm wants to merge 3 commits into dmtrKovalenko:main from magnusmalm:feat/zerocopy-views


@magnusmalm
Contributor

Summary

Two changes toward zero-copy index support, following up on the discussion in #330.

Commit 1: BigramQuery trait

Extracts query() and is_ready() into a BigramQuery trait that BigramFilter implements. grep_search() now accepts Option<&dyn BigramQuery>, so external consumers can provide alternative implementations (e.g. an mmap-backed view) without changing the grep pipeline.
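A minimal sketch of the shape this enables, assuming `query()` takes the search pattern and returns a per-file candidate bitset (`Option<Vec<u64>>`), and that `is_ready()` reports whether the index has been built. `ToyBigramIndex` and this `grep_search` signature are hypothetical stand-ins, not the PR's code; the point is that the search function only sees `&dyn BigramQuery`, so a heap-built index and an mmap-backed one are interchangeable.

```rust
use std::collections::HashSet;

/// Hypothetical trait shape: the parameter and return types are assumptions.
pub trait BigramQuery {
    /// Bitset of files (bit i % 64 of word i / 64 = file i) that may contain
    /// `pattern`, or `None` when the index cannot narrow the search.
    fn query(&self, pattern: &str) -> Option<Vec<u64>>;
    /// Whether the index has been built and can be queried.
    fn is_ready(&self) -> bool;
}

/// Toy heap-backed index standing in for BigramFilter.
pub struct ToyBigramIndex {
    per_file: Vec<HashSet<[u8; 2]>>,
}

/// All overlapping byte pairs of `text`.
fn bigrams(text: &str) -> impl Iterator<Item = [u8; 2]> + '_ {
    text.as_bytes().windows(2).map(|w| [w[0], w[1]])
}

impl ToyBigramIndex {
    pub fn build(contents: &[&str]) -> Self {
        let per_file: Vec<HashSet<[u8; 2]>> =
            contents.iter().map(|text| bigrams(text).collect()).collect();
        Self { per_file }
    }
}

impl BigramQuery for ToyBigramIndex {
    fn query(&self, pattern: &str) -> Option<Vec<u64>> {
        let wanted: Vec<[u8; 2]> = bigrams(pattern).collect();
        if wanted.is_empty() {
            return None; // a pattern shorter than 2 bytes has no bigrams to filter on
        }
        let mut bits = vec![0u64; (self.per_file.len() + 63) / 64];
        for (i, present) in self.per_file.iter().enumerate() {
            if wanted.iter().all(|b| present.contains(b)) {
                bits[i / 64] |= 1u64 << (i % 64);
            }
        }
        Some(bits)
    }

    fn is_ready(&self) -> bool {
        true
    }
}

/// Hypothetical grep entry point: it sees only the trait object, so any
/// BigramQuery implementation (heap or mmap) can be passed in.
pub fn grep_search(contents: &[&str], pattern: &str, index: Option<&dyn BigramQuery>) -> Vec<usize> {
    let candidates = index.filter(|ix| ix.is_ready()).and_then(|ix| ix.query(pattern));
    contents
        .iter()
        .enumerate()
        .filter(|&(i, _)| match &candidates {
            Some(bits) => ((bits[i / 64] >> (i % 64)) & 1) == 1,
            None => true, // no usable index: check every file
        })
        .filter(|(_, text)| text.contains(pattern)) // confirm real matches
        .map(|(i, _)| i)
        .collect()
}
```

Passing `None` falls back to scanning every file, so the result is the same with or without an index; the index only prunes files that cannot match.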

Static helpers (is_candidate, count_candidates) stay on BigramFilter since they operate on the returned Vec<u64>, not the index itself.
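Because these helpers only read the returned bitset, they need nothing from the index and can work with any `BigramQuery` implementation. A sketch as free functions, assuming bit `i % 64` of word `i / 64` marks file `i` (the real bit order and signatures may differ):

```rust
/// Hypothetical free-function form of the static helper: is file
/// `file_index` marked in the candidate bitset? Out-of-range indices are not.
pub fn is_candidate(bits: &[u64], file_index: usize) -> bool {
    bits.get(file_index / 64)
        .map_or(false, |word| (word & (1u64 << (file_index % 64))) != 0)
}

/// Hypothetical free-function form: number of files marked as candidates.
pub fn count_candidates(bits: &[u64]) -> usize {
    bits.iter().map(|word| word.count_ones() as usize).sum()
}
```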

Commit 2: FileRecord + FileListView

Adds building blocks for mmap-friendly file list storage:

  • FileRecord: 24-byte repr(C) struct storing path offset, lengths, size, modified, and is_binary flag
  • FileListView<'a>: borrows records + string table from an mmap and provides indexed access without heap allocation
  • build_file_records(): convert &[FileItem] to records + string table
  • to_file_items(): convert back when the search pipeline needs owned FileItems
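One way these pieces could fit together is sketched below. Field names, the split of the 24 bytes (a `u32` offset, two `u16` lengths, two `u64`s), keeping `is_binary` in the high bit of the size word, and the `FileItem` stand-in are all assumptions for illustration, not the PR's definitions. In a real mmap, the `&[FileRecord]` slice would come from reinterpreting aligned bytes, which needs alignment and validity checks and is left out here.

```rust
use std::convert::TryFrom;

/// Hypothetical 24-byte record; field names and the placement of the
/// is_binary bit are assumptions, not the PR's definition.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct FileRecord {
    pub path_offset: u32,    // byte offset of the path in the string table
    pub path_len: u16,       // path length in bytes
    pub file_name_len: u16,  // length of the file-name suffix of the path
    pub size_and_flags: u64, // size in the low 63 bits, is_binary in bit 63
    pub modified: u64,       // modification time (unit assumed, e.g. epoch seconds)
}

// Fails to compile if the layout stops being 24 bytes.
const _: () = assert!(std::mem::size_of::<FileRecord>() == 24);

pub const BINARY_BIT: u64 = 1u64 << 63;

impl FileRecord {
    pub fn size(&self) -> u64 { self.size_and_flags & !BINARY_BIT }
    pub fn is_binary(&self) -> bool { (self.size_and_flags & BINARY_BIT) != 0 }
}

/// Owned item used by the search pipeline (stand-in with assumed fields).
#[derive(Clone, Debug, PartialEq)]
pub struct FileItem {
    pub relative_path: String,
    pub file_name: String,
    pub size: u64,
    pub modified: u64,
    pub is_binary: bool,
}

/// Borrows records and the string table (e.g. from an mmap); copies nothing.
pub struct FileListView<'a> {
    records: &'a [FileRecord],
    strings: &'a str,
}

impl<'a> FileListView<'a> {
    pub fn new(records: &'a [FileRecord], strings: &'a str) -> Self {
        Self { records, strings }
    }
    pub fn len(&self) -> usize { self.records.len() }
    pub fn record(&self, i: usize) -> &'a FileRecord { &self.records[i] }
    /// Path slice borrowed straight from the string table.
    pub fn path(&self, i: usize) -> &'a str {
        let r = &self.records[i];
        let start = r.path_offset as usize;
        &self.strings[start..start + r.path_len as usize]
    }
    /// File name = trailing `file_name_len` bytes of the path.
    pub fn file_name(&self, i: usize) -> &'a str {
        let path = self.path(i);
        &path[path.len() - self.records[i].file_name_len as usize..]
    }
    /// Copies back into owned items for code that still needs them.
    pub fn to_file_items(&self) -> Vec<FileItem> {
        (0..self.len())
            .map(|i| {
                let r = self.record(i);
                FileItem {
                    relative_path: self.path(i).to_owned(),
                    file_name: self.file_name(i).to_owned(),
                    size: r.size(),
                    modified: r.modified,
                    is_binary: r.is_binary(),
                }
            })
            .collect()
    }
}

/// Packs owned items into records plus one concatenated string table.
pub fn build_file_records(items: &[FileItem]) -> (Vec<FileRecord>, String) {
    let mut strings = String::new();
    let mut records = Vec::with_capacity(items.len());
    for item in items {
        let path_offset = u32::try_from(strings.len()).expect("string table over 4 GiB");
        let path_len = u16::try_from(item.relative_path.len()).expect("path over 64 KiB");
        let file_name_len = u16::try_from(item.file_name.len()).expect("file name over 64 KiB");
        assert!((item.size & BINARY_BIT) == 0, "size does not fit in 63 bits");
        strings.push_str(&item.relative_path);
        records.push(FileRecord {
            path_offset,
            path_len,
            file_name_len,
            size_and_flags: item.size | if item.is_binary { BINARY_BIT } else { 0 },
            modified: item.modified,
        });
    }
    (records, strings)
}
```

Since `path()` and `file_name()` return slices of the borrowed string table, walking the list through the view allocates nothing; only `to_file_items()` copies, mirroring the split between the view and the owned pipeline described above.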

Wiring FileListView directly into match_and_score_files / grep_search would require a FileEntry trait that changes field access to method calls throughout score.rs and grep.rs. That felt too invasive for this PR. Happy to do it as a follow-up if you want to go that direction.
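For reference, such a follow-up might look roughly like this: a hypothetical `FileEntry` trait (method names assumed) that matching code is written against once, implemented both by an owned item and by a borrowed, view-backed entry. `OwnedItem`, `BorrowedItem`, and `match_files` are illustrative stand-ins, not code from `score.rs` or `grep.rs`.

```rust
/// Hypothetical trait: pipeline code calls methods instead of reading fields.
pub trait FileEntry {
    fn relative_path(&self) -> &str;
    fn size(&self) -> u64;
    fn is_binary(&self) -> bool;
}

/// Stand-in for an owned FileItem (fields are assumptions).
pub struct OwnedItem {
    pub relative_path: String,
    pub size: u64,
    pub is_binary: bool,
}

impl FileEntry for OwnedItem {
    fn relative_path(&self) -> &str { &self.relative_path }
    fn size(&self) -> u64 { self.size }
    fn is_binary(&self) -> bool { self.is_binary }
}

/// Stand-in for an entry borrowed from an mmap'd string table.
pub struct BorrowedItem<'a> {
    pub relative_path: &'a str,
    pub size_and_flags: u64, // high bit = is_binary (assumed layout)
}

impl FileEntry for BorrowedItem<'_> {
    fn relative_path(&self) -> &str { self.relative_path }
    fn size(&self) -> u64 { self.size_and_flags & !(1u64 << 63) }
    fn is_binary(&self) -> bool { (self.size_and_flags >> 63) == 1 }
}

/// Toy matcher written once against the trait: indices of non-binary
/// entries whose path contains `needle`.
pub fn match_files<E: FileEntry>(entries: &[E], needle: &str) -> Vec<usize> {
    entries
        .iter()
        .enumerate()
        .filter(|(_, e)| !e.is_binary() && e.relative_path().contains(needle))
        .map(|(i, _)| i)
        .collect()
}
```

The cost is the one the paragraph names: every `item.field` read in the pipeline becomes an `item.field()` call.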

Benchmarked with fff-cli on buildroot (13k files). The dyn BigramQuery vtable dispatch adds no measurable overhead to grep or search.

Extract query() and is_ready() into a BigramQuery trait that
BigramFilter implements. grep_search() now accepts
Option<&dyn BigramQuery>, allowing external consumers to provide
alternative implementations (e.g. a zero-copy mmap-backed view)
without changing the grep pipeline.

Static helpers (is_candidate, count_candidates) remain on
BigramFilter since they operate on the returned Vec<u64> directly.

Add building blocks for mmap-friendly file list storage:

- FileRecord: 24-byte repr(C) struct with path offset, lengths, size,
  modified time, and is_binary flag packed into the high bit
- FileListView: borrows records + string table from an mmap and provides
  indexed access to paths and metadata without heap allocation
- build_file_records(): convert &[FileItem] to records + string table
- to_file_items(): convert back to owned FileItems for the search pipeline

Tests for record layout, flags, and FileItem round-trip included.
ciolansteen pushed a commit to ciolansteen/fff.nvim that referenced this pull request Apr 19, 2026
## Motivation

fff-c is already an editor-agnostic C library. However, the only way
external consumers (Emacs Lisp, Python, scripts) could access struct
fields was by computing byte offsets manually — a silently fragile
approach that breaks whenever the struct layout changes.

This is not theoretical: JonasThowsen/fff.el, an existing Emacs
integration using libfff_c directly via FFI, hardcoded offsets that
are **already wrong** against current main:

  Offset 32 → expected line_content, actually FffMatchRange* (pointer!)
  Offset 104 → expected line_number,  actually byte_offset
  Offset 120 → expected col,          actually context_before_count

The struct grew (file_name, git_status, match_ranges, context arrays
added) between the time fff.el was written and today, pushing all
subsequent field offsets without any compile-time signal.
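A small illustration of this failure mode, using two hypothetical `repr(C)` layouts rather than the real `FffGrepMatch`: inserting fields ahead of `line_content` and `line_number` moves both, and nothing warns a binding that hardcoded the old numbers. `offset_of!` requires Rust 1.77 or newer.

```rust
use std::mem::offset_of;
use std::os::raw::c_char;

/// Hypothetical "old" layout that a binding hardcoded offsets against.
#[repr(C)]
pub struct GrepMatchV1 {
    pub relative_path: *const c_char,
    pub line_content: *const c_char,
    pub line_number: u64,
}

/// The same struct after two fields were inserted in the middle.
#[repr(C)]
pub struct GrepMatchV2 {
    pub relative_path: *const c_char,
    pub file_name: *const c_char, // new
    pub line_content: *const c_char,
    pub byte_offset: u64, // new
    pub line_number: u64,
}

/// (field, old offset, new offset) for the fields the old binding read.
/// On a typical 64-bit target: line_content 8 -> 16, line_number 16 -> 32.
pub fn shifted_offsets() -> [(&'static str, usize, usize); 2] {
    [
        ("line_content", offset_of!(GrepMatchV1, line_content), offset_of!(GrepMatchV2, line_content)),
        ("line_number", offset_of!(GrepMatchV1, line_number), offset_of!(GrepMatchV2, line_number)),
    ]
}
```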

## Change

Add crates/fff-c/src/accessors.rs with C-exported getter functions:

  FffFileItem:     relative_path, file_name, git_status, size, is_binary
  FffGrepMatch:    relative_path, file_name, line_content, line_number,
                   col, byte_offset, is_binary
  FffSearchResult: count
  FffGrepResult:   count
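A sketch of what one such getter could look like on the Rust side. The function names and the trimmed-down `FffGrepMatch` below are assumptions for illustration (the real struct has more fields, and the actual names and NULL handling may differ); what matters is that the caller binds to an exported symbol instead of a byte offset. `#[unsafe(no_mangle)]` requires Rust 1.82 or newer; older toolchains spell it `#[no_mangle]`.

```rust
use std::os::raw::c_char;

/// Trimmed-down, hypothetical stand-in for FffGrepMatch; field order here
/// is exactly what callers should not have to rely on.
#[repr(C)]
pub struct FffGrepMatch {
    pub relative_path: *const c_char,
    pub file_name: *const c_char,
    pub line_content: *const c_char,
    pub line_number: u64,
    pub byte_offset: u64,
    pub col: u32,
    pub is_binary: bool,
}

/// Hypothetical getter name. Returns 0 when `m` is NULL.
///
/// # Safety
/// `m` must be NULL or point to a valid FffGrepMatch.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn fff_grep_match_line_number(m: *const FffGrepMatch) -> u64 {
    match unsafe { m.as_ref() } {
        Some(m) => m.line_number,
        None => 0,
    }
}

/// Hypothetical getter name. Returns NULL when `m` is NULL; the string is
/// assumed to stay owned by the result, so the caller must not free it.
///
/// # Safety
/// `m` must be NULL or point to a valid FffGrepMatch.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn fff_grep_match_line_content(m: *const FffGrepMatch) -> *const c_char {
    match unsafe { m.as_ref() } {
        Some(m) => m.line_content,
        None => std::ptr::null(),
    }
}
```

A foreign binding then declares, for example, `fff_grep_match_line_number` as a function taking a pointer and returning a 64-bit integer, and never needs to know where `line_number` sits in the struct.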

cbindgen picks these up automatically — no changes to cbindgen.toml.
Zero impact on the Neovim integration (Lua uses ffi.cdef which parses
the full header directly and is unaffected by adding new functions).

## Why accessor functions, not repr(C) guarantees alone

repr(C) (see also PR dmtrKovalenko#341 by magnusmalm) prevents the Rust compiler
from reordering fields, but does not prevent upstream from adding new
fields between existing ones. Accessor functions make the struct layout
a true implementation detail — callers bind to names, not positions.

## Impact

With this change, fff-c becomes usable from any language or tool that can
call a C function: Emacs Lisp (emacs-ffi), Python (ctypes/cffi), Helix,
Kakoune, shell scripts via a thin wrapper, etc.

A companion PR will follow at JonasThowsen/fff.el migrating from
hardcoded offsets to these accessor functions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>