Skip to content

applyBatchFileOps releases FSM lock between ops — concurrent readers can see partial batch state #447

@xe-nvdk

Description

@xe-nvdk

Summary

internal/cluster/raft/fsm.go:applyBatchFileOps runs each op through apply*FileStruct, each of which takes and releases f.mu individually. Between ops, concurrent readers (e.g. GetFilesByDatabase, GetAllFiles) can observe the FSM in a partially-updated state.

For compaction batches that delete N source files and register 1 compacted file in a single Raft command, a reader running mid-batch can briefly see:

  • The compacted file already registered + some source files not yet deleted → query returns duplicate rows.
  • Some source files already deleted + the compacted file not yet registered → query returns missing rows.

Both shapes break query correctness within the batch window. The window is short (microseconds typically) but it exists.

Source

Flagged by Gemini Code Assist on PR #446 (review comment). Pre-existing behaviour — applyBatchFileOps has had this property since it was introduced. Not introduced or made worse by the path-validation fix in PR #446, but worth fixing.

Proposed fix

Refactor each apply*FileStruct to separate the locking from the state mutation:

  1. Add apply*FileLocked variants that mutate f.files / f.filesByDB directly without taking f.mu — caller is responsible for holding the lock.
  2. apply*FileStruct becomes a thin wrapper that acquires f.mu then calls apply*FileLocked.
  3. applyBatchFileOps acquires f.mu ONCE for the whole apply loop, then calls apply*FileLocked for each op. Now the entire batch is atomic from the readers' perspective.
  4. Callbacks (onFileRegistered, onFileDeleted) currently fire inside the lock-released window — preserve that property by deferring callback fires until after the batch lock is released. Buffer them in a slice during the apply loop, then iterate after f.mu.Unlock().

Why deferred from PR #446

PR #446 is a focused security fix for GHSA-f85q-mvg8-qf37. The lock-isolation issue is real but pre-existing and the fix is non-trivial (touches every apply* method + callback fan-out logic). Better as its own PR with its own review surface.

Testing strategy for the fix

A goroutine concurrent-reader race test: launch a reader on GetFilesByDatabase in a tight loop while applyBatchFileOps processes a batch that deletes 3 source files and registers 1 compacted file. Without the fix, the reader will observe at least one intermediate state (count != N before AND count != 1 after) over many iterations. With the fix, the reader sees either the pre-batch state (N files) or the post-batch state (1 file), never an intermediate.

Severity

Medium per Gemini's classification. Not a security issue — correctness is eventually consistent, no data loss. But mid-batch query incorrectness is a real symptom for compaction-heavy workloads.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions