perf(block): sync.Map -> atomic bitmap for cache dirty tracking #2235
Replace the `sync.Map`-based dirty block tracking in `block.Cache` with a
pre-allocated `[]atomic.Uint64` bitset. Each bit represents one block;
atomic OR (`atomic.Uint64.Or`) is used for concurrent-safe marking.
## Problem
The block cache tracks which blocks have been written (are "dirty") so that
`Slice()` can decide whether to serve from mmap or return `BytesNotAvailable`.
The previous implementation used `sync.Map` with one entry per dirty block
offset. For a 64 MiB file at 4K block size, that's up to 16,384 map entries —
each requiring a heap-allocated key and value, plus the internal hash map
overhead of sync.Map.
Every call to `setIsCached` allocated a `[]int64` via `header.BlocksOffsets()`
and then called `sync.Map.Store()` in a loop. Every call to `isCached` did
the same allocation and called `sync.Map.Load()` in a loop. These are the
hottest paths in the block device — called on every NBD read and every
chunk fetch.
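For reference, a minimal sketch of the previous `sync.Map` pattern described above (names like `cache` and `blocksOffsets` are illustrative, not the exact source): each call allocates an offsets slice and performs one map operation per block.

```go
package main

import (
	"fmt"
	"sync"
)

const blockSize = 4096

type cache struct {
	dirty sync.Map // block offset (int64) -> struct{}
}

// blocksOffsets mirrors the role of header.BlocksOffsets: it allocates
// a []int64 holding every block offset in [off, off+length).
func blocksOffsets(off, length int64) []int64 {
	offsets := make([]int64, 0, length/blockSize)
	for o := off; o < off+length; o += blockSize {
		offsets = append(offsets, o)
	}
	return offsets
}

func (c *cache) setIsCached(off, length int64) {
	for _, o := range blocksOffsets(off, length) { // allocation per call
		c.dirty.Store(o, struct{}{}) // one map store per block
	}
}

func (c *cache) isCached(off, length int64) bool {
	for _, o := range blocksOffsets(off, length) { // allocation per call
		if _, ok := c.dirty.Load(o); !ok {
			return false
		}
	}
	return true
}

func main() {
	c := &cache{}
	c.setIsCached(0, 4*1024*1024) // one 4 MiB chunk -> 1024 map stores
	fmt.Println(c.isCached(0, blockSize), c.isCached(4*1024*1024, blockSize))
}
```

Marking a single 4 MiB chunk dirty costs 1024 `Store` calls plus the offsets allocation, which is what the benchmarks below measure as the baseline.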
## Solution
- `dirty` field changed from `sync.Map` to `[]atomic.Uint64`, sized at
`ceil(numBlocks / 64)` and allocated once in `NewCache`.
- `setIsCached` computes a bitmask per 64-bit word and applies it with
`atomic.Uint64.Or` — one atomic op per word instead of one map store per
block.
- `isCached` / `isBlockCached` read bits with `atomic.Uint64.Load` — no
allocation, no map lookup.
- `dirtySortedKeys` iterates words and uses `bits.TrailingZeros64` to extract
set bits — produces a naturally sorted result without `slices.Sort`.
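The solution points above can be sketched as follows. This is an illustrative reconstruction, not the exact source: names like `dirtyBitset`, `setRange`, and `sortedKeys` are assumptions, and `atomic.Uint64.Or` requires Go 1.23+.

```go
package main

import (
	"fmt"
	"math/bits"
	"sync/atomic"
)

type dirtyBitset struct {
	words []atomic.Uint64 // bit i set == block i is dirty
}

func newDirtyBitset(numBlocks int64) *dirtyBitset {
	return &dirtyBitset{words: make([]atomic.Uint64, (numBlocks+63)/64)} // ceil(numBlocks/64)
}

// setRange marks blocks [start, end) dirty with one atomic OR per
// 64-bit word instead of one map store per block.
func (b *dirtyBitset) setRange(start, end int64) {
	for i := start; i < end; {
		word := i / 64
		mask := ^uint64(0) << uint(i%64)
		next := (word + 1) * 64
		if end < next { // range ends inside this word (end%64 != 0 here)
			mask &= ^uint64(0) >> uint(64-end%64)
		}
		b.words[word].Or(mask)
		i = next
	}
}

// isSet reads one bit with a single atomic load — no allocation.
func (b *dirtyBitset) isSet(block int64) bool {
	return b.words[block/64].Load()&(1<<uint(block%64)) != 0
}

// sortedKeys extracts set bits in ascending order via
// bits.TrailingZeros64; word order makes the result naturally sorted.
func (b *dirtyBitset) sortedKeys() []int64 {
	var out []int64
	for w := range b.words {
		v := b.words[w].Load()
		for v != 0 {
			bit := bits.TrailingZeros64(v)
			out = append(out, int64(w*64+bit))
			v &^= 1 << uint(bit)
		}
	}
	return out
}

func main() {
	b := newDirtyBitset(256)
	b.setRange(3, 70) // crosses a word boundary: one Or on word 0, one on word 1
	fmt.Println(b.isSet(3), b.isSet(69), b.isSet(70))
}
```

A 4 MiB chunk (1024 blocks) collapses to 16 atomic ORs, and a hit check to 16 atomic loads.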
## Benchmarks
64 MiB cache, 4K blocks, 4 MiB chunks (1024 blocks per chunk):
```
│ sync.Map │ bitmap │
│ sec/op │ sec/op vs base │
MarkRangeCached-16 63857.50n ± 3% 25.04n ± 6% -99.96% (p=0.002 n=6)
IsCached_Hit-16 19406.0n ± 1% 908.6n ± 0% -95.32% (p=0.002 n=6)
IsCached_Miss-16 1168.500n ± 1% 2.813n ± 1% -99.76% (p=0.002 n=6)
Slice_Hit-16 19296.0n ± 1% 913.7n ± 1% -95.27% (p=0.002 n=6)
Slice_Miss-16 1161.000n ± 3% 3.765n ± 0% -99.68% (p=0.002 n=6)
│ sync.Map │ bitmap │
│ B/op │ B/op vs base │
MarkRangeCached-16 64.05Ki ± 0% 0.00Ki ± 0% -100.00% (p=0.002 n=6)
IsCached_Hit-16 8.000Ki ± 0% 0.000Ki ± 0% -100.00% (p=0.002 n=6)
IsCached_Miss-16 8.000Ki ± 0% 0.000Ki ± 0% -100.00% (p=0.002 n=6)
Slice_Hit-16 8.000Ki ± 0% 0.000Ki ± 0% -100.00% (p=0.002 n=6)
Slice_Miss-16 8.000Ki ± 0% 0.000Ki ± 0% -100.00% (p=0.002 n=6)
│ sync.Map │ bitmap │
│ allocs/op │ allocs/op vs base │
MarkRangeCached-16 2.049k ± 0% 0.000k ± 0% -100.00% (p=0.002 n=6)
IsCached_Hit-16 1.000 ± 0% 0.000 ± 0% -100.00% (p=0.002 n=6)
IsCached_Miss-16 1.000 ± 0% 0.000 ± 0% -100.00% (p=0.002 n=6)
Slice_Hit-16 1.000 ± 0% 0.000 ± 0% -100.00% (p=0.002 n=6)
Slice_Miss-16 1.000 ± 0% 0.000 ± 0% -100.00% (p=0.002 n=6)
```
Zero allocations across all operations. `setIsCached` goes from 64µs to 25ns
(2550x), `Slice` hit from 19µs to 914ns (21x).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix OOB panic in `setIsCached` when the range extends past the cache size. The old `sync.Map.Store` silently accepted out-of-bounds keys, but the bitmap would panic on an out-of-range index. Cap `n` to `len(dirty)*64`.
- Add `TestSetIsCached_PastCacheSize` to verify the fix.
- Add `TestSetIsCached_ConcurrentOverlapping`: 8 goroutines with overlapping ranges under `-race` to prove atomic OR correctness.
- Remove redundant `sort.SliceIsSorted` (covered by `require.Equal`).
- Use a 2 MiB block size in the BoundaryCrossing test (the real hugepage size) to exercise a second block size and drop `nolint:unparam`.
- Make benchmark consts consistently `int64`.
dobrac left a comment:
Could we either use a library (like bits-and-blooms/bitset) for this functionality (preferred), or refactor the `[]atomic.Uint64` functionality into a separate file (module)?
dobrac left a comment:
Let's move it to its own utility right away, then; there is almost no added cost in doing it now.
Move the `[]atomic.Uint64` bitmap from `block.Cache` into a standalone `atomicbitset.Bitset` in `packages/shared/pkg/atomicbitset`. Word-level masking in `HasRange` reduces atomic loads on the chunker hot path.
@dobrac For compressed bitmaps, https://github.com/RoaringBitmap/roaring seems to be widely used, and for the filesystem it could really improve the default sizes (even beyond this PR's optimization). The only problem is that it is also not atomic, so we would need a mutex there. This PR is better in that it effectively shards the mutexes by using atomics.
We've agreed with @levb to put this on hold for now unless there is something important that needs it.
- Remove `newCache`/`newFullFetchChunker`/`newStreamingChunker` indirection.
- `Cache` creates its own bitset internally; no external passing.
- Remove `AddDirtyOffset`; pass the dirty `BitSet` directly to `DiffMetadataBuilder`.
- Remove debug logging.
CardinalityInRange correlated with integration test bus errors in VMs. Revert to the proven Contains loop until the library method is vetted.
Remove the Flat and Sharded impls and the `Bitset` interface: there is only one consumer, and it always uses roaring. Rename to a concrete `Bitset` struct with a `New()` constructor. Switch `HasRange` from a per-bit `Contains` loop to a `Rank()`-based cardinality check: O(containers) instead of O(range_size), ~19ns for 1024-block chunk queries. `CardinalityInRange` is buggy in roaring v2.16.0 (off-by-one in `runContainer16.getCardinalityInRange` when the query falls in a gap between run intervals); a test documents the bug for a future library upgrade.
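The rank-based cardinality check works because a range [start, end) is fully set iff the count of set bits below `end` minus the count below `start` equals the range length. The sketch below illustrates this on a plain `[]uint64` with popcounts (roaring is a third-party dependency, and its `Rank(x)` counts bits ≤ x from container cardinalities rather than scanning words); `rank` and `hasRange` here are illustrative names.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rank returns the number of set bits in [0, n).
func rank(words []uint64, n int64) int64 {
	var r int64
	for i := int64(0); i < n/64; i++ {
		r += int64(bits.OnesCount64(words[i])) // whole words below n
	}
	if rem := n % 64; rem != 0 {
		// partial word: mask off bits at positions >= rem
		r += int64(bits.OnesCount64(words[n/64] & (^uint64(0) >> uint(64-rem))))
	}
	return r
}

// hasRange reports whether every bit in [start, end) is set:
// the cardinality of the range must equal its length.
func hasRange(words []uint64, start, end int64) bool {
	return rank(words, end)-rank(words, start) == end-start
}

func main() {
	words := []uint64{^uint64(0), 0b101} // bits 0..63, 64, and 66 set
	fmt.Println(hasRange(words, 0, 65), hasRange(words, 0, 67))
}
```

With roaring, the same difference-of-ranks query is answered from per-container cardinalities, so the cost scales with containers touched rather than bits in the range.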
💡 Codex Review: automated review suggestions for this pull request. Reviewed commit: 6c0f1dd8fe.
For now, using a roaring fork with the fix. After RoaringBitmap/roaring#521 is merged, we can switch back to the original lib.
Switch `HasRange` from `Rank()` to `CardinalityInRange()`, which is O(k) in the containers spanned rather than O(C) in the total number of containers. Use the e2b-dev/roaring fork, which fixes an off-by-one in `runContainer16.getCardinalityInRange` for gaps between runs within a single container (upstream PR RoaringBitmap/roaring#521). Also: initialize the dirty bitset in the zero-size `Cache` path, and fix a stale comment in `streaming_chunk.go`.
Outdated after the latest changes.
Precursor to PR #2034, separated to reduce the scope there.