Skip to content

Reduce memory allocations by using bitset instead of sync.map#2284

Closed
djeebus wants to merge 8 commits intomainfrom
improve-dirty-tracking
Closed

Reduce memory allocations by using bitset instead of sync.map#2284
djeebus wants to merge 8 commits intomainfrom
improve-dirty-tracking

Conversation

@djeebus
Copy link
Copy Markdown
Contributor

@djeebus djeebus commented Apr 1, 2026

AI results via benchtest:

● Benchmark Results                                                                                                                                                               
                                                                                                                                                                                  
  Performance Comparison (sync.Map → bitset)                                                                                                                                      
  ┌─────────────────────┬────────────────┬──────────────┬──────────────┐                                                                                                          
  │      Operation      │ sync.Map (old) │ bitset (new) │   Speedup    │                                                                                                          
  ├─────────────────────┼────────────────┼──────────────┼──────────────┤                                                                                                          
  │ SetIsCached         │ 374.70 ns/op   │ 20.28 ns/op  │ 18.5x faster │                                                                                                          
  ├─────────────────────┼────────────────┼──────────────┼──────────────┤                                                                                                          
  │ IsCached_Hit        │ 42.09 ns/op    │ 25.07 ns/op  │ 1.7x faster  │                                                                                                          
  ├─────────────────────┼────────────────┼──────────────┼──────────────┤                                                                                                          
  │ IsCached_Miss       │ 63.02 ns/op    │ 11.46 ns/op  │ 5.5x faster  │                                                                                                          
  ├─────────────────────┼────────────────┼──────────────┼──────────────┤                                                                                                          
  │ WriteAt_SingleBlock │ 707.1 ns/op    │ 676.2 ns/op  │ 1.05x faster │                                                                                                          
  ├─────────────────────┼────────────────┼──────────────┼──────────────┤                                                                                                          
  │ WriteAt_MultiBlock  │ 2244 ns/op     │ 1674 ns/op   │ 1.3x faster  │                                                                                                          
  └─────────────────────┴────────────────┴──────────────┴──────────────┘                                                                                                          
  Memory Allocations                                                                                                                                                              
  ┌─────────────────────┬────────────────────┬──────────────────┬───────────┐                                                                                                     
  │      Operation      │   sync.Map (old)   │   bitset (new)   │ Reduction │                                                                                                     
  ├─────────────────────┼────────────────────┼──────────────────┼───────────┤                                                                                                     
  │ SetIsCached         │ 56 B/op, 1 alloc   │ 0 B/op, 0 allocs │ 100%      │                                                                                                     
  ├─────────────────────┼────────────────────┼──────────────────┼───────────┤                                                                                                     
  │ IsCached_Hit        │ 0 B/op             │ 0 B/op           │ -         │                                                                                                     
  ├─────────────────────┼────────────────────┼──────────────────┼───────────┤                                                                                                     
  │ IsCached_Miss       │ 0 B/op             │ 0 B/op           │ -         │                                                                                                     
  ├─────────────────────┼────────────────────┼──────────────────┼───────────┤                                                                                                     
  │ WriteAt_SingleBlock │ 56 B/op, 1 alloc   │ 0 B/op, 0 allocs │ 100%      │                                                                                                     
  ├─────────────────────┼────────────────────┼──────────────────┼───────────┤                                                                                                     
  │ WriteAt_MultiBlock  │ 224 B/op, 7 allocs │ 0 B/op, 0 allocs │ 100%      │                                                                                                     
  └─────────────────────┴────────────────────┴──────────────────┴───────────┘                                                                                                     
  Summary                                                                                                                                                                         
                                                                                                                                                                                  
  - SetIsCached is 18.5x faster with zero allocations (was 1 alloc per call)                                                                                                      
  - IsCached_Miss is 5.5x faster (bitset Test is much faster than sync.Map Load miss)                                                                                             
  - Multi-block writes eliminated 7 allocations per operation (one per block touched)                                                                                             
  - Overall geomean speedup: 66.56%                                                                                                                                               
                                                                                                                                                                                  
  The biggest win is eliminating the per-operation allocations, which directly reduces GC pressure. In the original profile, setIsCached was responsible for 0.91s of CPU time    
  with 92% in sync.Map.Store - this should now be near zero.   ```

@cursor
Copy link
Copy Markdown

cursor bot commented Apr 1, 2026

PR Summary

Medium Risk
Touches core block-cache dirty tracking and range calculations, which could affect diff/export correctness and cache hit detection under concurrency. Added tests reduce risk, but boundary-condition and threading changes warrant careful review.

Overview
Replaces the block cache’s dirty-block tracking from a per-offset sync.Map approach to a pre-sized bitset.BitSet with its own lock to reduce allocations and speed up isCached/setIsCached checks. Tightens cache write/read bookkeeping by handling negative offsets, zero-length ranges, and out-of-bounds writes explicitly, and adds targeted unit tests plus benchmarks to validate correctness and measure the new dirty-tracking performance.

Written by Cursor Bugbot for commit d52f1fb. This will update automatically on new commits. Configure here.

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 3 total unresolved issues (including 1 from previous review).

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

@djeebus djeebus marked this pull request as ready for review April 1, 2026 22:58
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6a120a755c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@dobrac
Copy link
Copy Markdown
Contributor

dobrac commented Apr 2, 2026

We're already working on this performance improvement here: #2235

@dobrac dobrac marked this pull request as draft April 2, 2026 04:45
levb added a commit that referenced this pull request Apr 3, 2026
Roaring bitmap has a suspected concurrency issue under RWMutex that
causes spurious BytesNotAvailableError (EIO) in CI. Switch the default
from Roaring to bits-and-blooms/bitset which is proven reliable (Joe's
PR #2284 passes CI with it).

- Add BitsAndBlooms wrapper implementing atomicbitset.Bitset
- Default ("") now creates BitsAndBlooms (needs dirtyMu, like Roaring)
- "atomic" still uses Flat/Sharded (lock-free, best perf)
- "roaring" still available via feature flag
- Add cache-pattern test mirroring real chunk fetch/read sequences

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ValentaTomas
Copy link
Copy Markdown
Member

-> #2284 (comment), will work with that

@ValentaTomas ValentaTomas deleted the improve-dirty-tracking branch April 9, 2026 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants