Skip to content

Reduce symbol list peak heap usage during load and compaction#2986

Open
G-D-Petrov wants to merge 9 commits into
masterfrom
gpetrov/reduce_sl_memory_load
Open

Reduce symbol list peak heap usage during load and compaction#2986
G-D-Petrov wants to merge 9 commits into
masterfrom
gpetrov/reduce_sl_memory_load

Conversation

@G-D-Petrov
Copy link
Copy Markdown
Collaborator

@G-D-Petrov G-D-Petrov commented Mar 25, 2026

Reference Issues/PRs

What does this implement or fix?

The symbol list load and compaction path had higher peak heap usage than necessary. The original
implementation collected all SYMBOL_LIST AtomKeys into a sorted vector in a first pass
(get_all_symbol_list_keys), then iterated that vector a second time to build the journal update
map (load_journal_keysMapType). This meant the full key vector and the full update map
coexisted on the heap throughout the merge phase. During compaction, a further
vector<VariantKey> was built from the same key vector before deletion — so all three
allocations could overlap.

For a library with N uncompacted journal entries (steady-state path where a compaction key exists):

Phase master branch
Load + merge vector<AtomKey>(N) + MapType(N entries) MapType(N entries) only
Compaction write vector<AtomKey>(N) + vector<VariantKey>(N) + CollectionType CollectionType + vector<VariantKey>(N)
Compaction deletion vector<AtomKey>(N) + vector<VariantKey>(N) + CollectionType vector<VariantKey>(N) only

Changes

cpp/arcticdb/version/symbol_list.cpp

Streaming journal load — eliminated the two-pass key vector:

  • Removed get_all_symbol_list_keys (first-pass collection into a sorted vector<AtomKey>)
    and load_journal_keys (second-pass map construction from that vector).
  • Replaced with load_journal_streaming: a single iterate_type pass that builds the
    update_map (MapType) directly without materialising an intermediate key vector. The
    latest compaction key is tracked inline. Each symbol's entry list is sorted at the end
    of the pass via sort_update_map_entries.
  • load_journal_streaming accepts a collect_keys parameter: when true (compaction
    expected), it also collects VariantKeys during the same iterate pass — avoiding a
    separate storage iteration for key collection while keeping the non-compaction load
    path lean (no key vector allocated at all).
  • Extracted helpers: add_update_map_entry, sort_update_map_entries, key_sort_comparator.

Compaction path — early free + direct VariantKey collection:

  • attempt_load now passes collect_keys = (will_attempt_compaction == YES), so keys are
    only collected when compaction may follow. The compaction path performs the same number
    of storage iterations as master (one full iterate for load + key collection, one filtered
    iterate for has_recent_compaction). The non-compaction read path saves one full iterate.
  • compact_internal moves symbols_ into write_symbols (which now takes CollectionType
    by value), so the symbol vector is freed when write_symbols returns. Deletion uses
    in-place erase (remove_if + erase) to filter out the newly written key, then
    passes the same vector to remove_keys_sync via move — no second vector is allocated.
    This ensures CollectionType and the deletion keys never coexist on the heap.
  • has_recent_compaction now takes optional<AtomKey> instead of
    optional<vector<AtomKey>::const_iterator> — simpler interface matching the new
    LoadResult::compaction_key_.
  • Deletion keys are collected as VariantKeys directly during the load iterate, avoiding
    the vector<AtomKey>vector<VariantKey> conversion that master performed in
    delete_keys. Keys are sorted before deletion for storage backend performance
    (e.g. InMemoryStorage).
  • Removed load_from_symbol_list_keys, load_from_version_keys,
    collect_all_symbol_list_keys, last_compaction, LoadResult::detach_symbol_list_keys().
  • Retained the "Would delete unseen key" defensive assertion from master in the
    version-keys path (no compaction key found). Adapted to use vector<VariantKey> keys
    via to_atom(). Only runs once per library before the first compaction.

Segment reading simplification:

  • Removed read_old_style_list_from_storage and read_new_style_list_from_storage;
    replaced with a single for_each_segment_entry template that dispatches based on
    field count (1 = old-style all-ADD, 6 = new-style with deletions). Column row-count
    consistency checks are preserved inside for_each_segment_entry before the additions
    and deletions loops. Used by read_from_storage via a visitor lambda.

Merge logic refactoring:

  • Extracted merge_existing_entries from the monolithic merge_existing_with_journal_keys.
    The new function takes existing_keys + update_map + min_allowed_interval and
    returns ExistingKeysMergeResult (symbols + problematic map). Consumed entries are
    erased from update_map.
  • Renamed merge_existing_with_journal_keysmerge_existing_with_journal_map (now
    takes MapType& directly instead of vector<AtomKey>&).

Other changes:

  • delete_keys return type changed from vector<Store::RemoveKeyResultType> to void
    (return value was unused at every call site).
  • LoadResult simplified: maybe_previous_compaction (optional<iterator>) →
    compaction_key_ (optional<AtomKey>); timestamp_ removed; symbol_list_keys_
    changed from vector<AtomKey> to vector<VariantKey> (only populated when compaction
    is expected); total_key_count_ added.

cpp/arcticdb/version/symbol_list.hpp

  • Removed Compaction and MaybeCompaction type aliases.
  • LoadResult updated as described above.
  • delete_keys declaration updated to return void.
  • load() now builds the output set<StreamId> before calling compact_internal,
    since compaction moves symbols_ into write_symbols (by value). Uses copy-insertion
    so that compact_internal can still consume symbols_ for the write.

cpp/arcticdb/version/test/test_symbol_list.cpp

Added two equivalence tests verifying load with and without no_compaction=true return
identical symbol sets:

  • DirectPathEquivalence: 50 symbols, compaction, 10 new + 10 remove journal entries.
    Remove entries resolve as ADD (version map still shows those symbols as existing). 60
    symbols expected.
  • DirectPathEquivalenceWithTrueDeletes: 50 symbols, compaction, 10 new + 10 tombstoned
    deletes (version map agrees). Remove entries resolve as DELETE. 50 symbols expected,
    deleted symbols verified absent.

cpp/arcticdb/version/test/benchmark_symbol_list.cpp (new)

New C++ microbenchmarks measuring wall time and peak heap delta via PeakHeapTracker
(mallinfo2 polled at 100 µs intervals):

Benchmark Scenario
BM_symbol_list_load Load from compacted segment + 100 journal entries (100K, 300K; 1M local only)
BM_symbol_list_load_many_entries Load with large uncompacted journal (1K×1K; 300K×1 local only)
BM_symbol_list_compaction Full compaction cycle (1K×100; 1K×1K, 10K×100, 300K×1 local only)
BM_symbol_list_load_s3 Load via mock S3 store (10K symbols)
BM_symbol_list_compaction_s3 Compaction via mock S3 store (1K×100, 10K×10 local only)

cpp/arcticdb/CMakeLists.txt

Added version/test/benchmark_symbol_list.cpp to the benchmarks executable source list.

python/benchmarks/list_symbols.py

Three new ASV benchmark classes covering the optimised paths, each with time_list_symbols
and peakmem_list_symbols measurements across LMDB and S3 backends:

Class Scenario
ListSymbolsWithCompactedCache Compacted cache + 0 or 100 journal entries on top — typical production read path
ListSymbolsCompaction list_symbols triggering compaction with 1K symbols × 10 uncompacted versions
ListSymbolsWithDeletes Compacted cache followed by 10% symbol deletes + new adds

ListSymbolsWithoutCache.num_symbols reduced from [100, 1000] to [1000] to trim the
benchmark matrix.

python/.asv/results/benchmarks.json

Updated ASV benchmark registry to include the three new benchmark classes.


Benchmark Results (C++)

All runs on a 128-core machine, release build. Peak figures are heap delta measured via
mallinfo2 polled at 100 µs intervals from a trimmed baseline.

Benchmarks marked (local only) are commented out in the source and excluded from CI
because their setup or run time exceeds ~5 s on CI hardware. Run them locally with
make bench-cpp FILTER=<name>.

Load path (compacted cache + small journal tail)

Benchmark master time branch time master peak branch peak peak reduction
BM_symbol_list_load/100K 70 ms 64 ms 22.5 MB 20.4 MB -9%
BM_symbol_list_load/300K 244 ms 252 ms 90.0 MB 74.6 MB -17%
BM_symbol_list_load/1M (local only) 940 ms 1035 ms 187.6 MB 176.7 MB -6%

The load path with no_compaction=true does not collect keys, so only the streaming
update_map construction contributes to peak. The key vector elimination cuts 9–17% peak
for the typical 100K–300K symbol range.

Load path (compacted cache + large uncompacted journal)

Benchmark master time branch time master peak branch peak peak reduction
BM_symbol_list_load_many_entries/1K×1K 1521 ms 731 ms 37.5 MB 31.4 MB -16%
BM_symbol_list_load_many_entries/300K×1 (local only) 840 ms 785 ms 197.3 MB 83.0 MB -58%

The 300K×1 case shows the largest memory gain: with 300K distinct symbols each contributing
one journal entry, master held a vector<AtomKey> (300K keys) alongside the update map
simultaneously. The branch builds the update map in a single pass without materialising
the key vector, cutting peak by ~114 MB.

The 1K×1K case is also 2× faster (1521 ms → 731 ms): the streaming approach avoids
a second iteration over the sorted key vector that master needed to build the update map.

Compaction path

Benchmark master time branch time master peak branch peak peak reduction
BM_symbol_list_compaction/1K×100 2580 ms 3239 ms 45.0 MB 42.1 MB -6%
BM_symbol_list_compaction/1K×1K (local only) 15416 ms 15067 ms 37.5 MB 42.7 MB +14%
BM_symbol_list_compaction/10K×100 (local only) 16012 ms 15423 ms 41.2 MB 41.5 MB
BM_symbol_list_compaction/300K×1 (local only) 8034 ms 7354 ms 223.7 MB 199.9 MB -11%

Wall-clock times are close to master (same number of storage iterations: one full + one
filtered). The 1K×100 and 300K×1 cases show clear peak improvements (-6% and -11%) from
the in-place erase of the written key (no second vector<VariantKey> is allocated for
deletion). The 1K×1K case shows a peak regression (+14%) because the pre-collected
VariantKey vector overlaps with the update_map during load — at that scale the
update_map (1K × 1K SymbolEntryData entries) dominates peak and the savings from
the deletion path cannot offset it.

S3 mock store

Benchmark master time branch time master peak branch peak peak reduction
BM_symbol_list_load_s3/10K 37.6 ms 36.4 ms 2.84 MB 2.38 MB -16%
BM_symbol_list_compaction_s3/1K×100 (local only) 38038 ms 39208 ms 103.8 MB 65.1 MB -37%
BM_symbol_list_compaction_s3/10K×10 (local only) 38950 ms 66.0 MB

S3 compaction peak for 1K×100 dropped from 103.8 to 65.1 MB (-37%). Moving symbols_
into write_symbols (by value) prevents overlap with the deletion key vector, and the
in-place erase eliminates the second vector<VariantKey> allocation during deletion.

The 10K×10 case (same 100K total entries, 10× more distinct symbols) peaks at 66.0 MB —
similar to the 1K×100 case since total entry count is the same.

@G-D-Petrov G-D-Petrov added the patch Small change, should increase patch version label Mar 26, 2026
@G-D-Petrov G-D-Petrov force-pushed the gpetrov/reduce_sl_memory_load branch from 2a815f0 to 5844198 Compare April 23, 2026 15:02
@G-D-Petrov G-D-Petrov changed the title Optimize SL memory load Reduce symbol list peak heap usage during load and compaction May 13, 2026
@G-D-Petrov G-D-Petrov marked this pull request as ready for review May 13, 2026 14:39
Comment on lines 154 to 167
for (auto i = 0L; i < seg.column(0).row_count(); ++i) {
auto stream_id = stream_id_from_segment(data_type, seg, i, 0);
auto reference_id = VersionId{scalar_at<uint64_t>(seg, i, 1)};
auto reference_time = timestamp{scalar_at<int64_t>(seg, i, 2)};
ARCTICDB_RUNTIME_DEBUG(
log::symbol(), "Reading added symbol {}: {}@{}", stream_id, reference_id, reference_time
);
output.emplace_back(stream_id, reference_id, reference_time, ActionType::ADD);
visitor(stream_id_from_segment(data_type, seg, i, 0),
VersionId{scalar_at<uint64_t>(seg, i, 1)},
timestamp{scalar_at<int64_t>(seg, i, 2)},
ActionType::ADD);
}

if (seg.descriptor().field_count() == 6) {
util::check(
seg.column(3).row_count() == seg.column(4).row_count() &&
seg.column(3).row_count() == seg.column(5).row_count(),
"Column mismatch in symbol segment deletions: {} {} {}",
seg.column(3).row_count(),
seg.column(4).row_count(),
seg.column(5).row_count()
);

for (auto i = 0L; i < seg.column(3).row_count(); ++i) {
auto stream_id = stream_id_from_segment(data_type, seg, i, 3);
auto reference_id = VersionId{scalar_at<uint64_t>(seg, i, 4)};
auto reference_time = timestamp{scalar_at<int64_t>(seg, i, 5)};
ARCTICDB_RUNTIME_DEBUG(
log::symbol(), "Reading deleted symbol {}: {}@{}", stream_id, reference_id, reference_time
);
output.emplace_back(stream_id, reference_id, reference_time, ActionType::DELETE);
visitor(stream_id_from_segment(data_type, seg, i, 3),
VersionId{scalar_at<uint64_t>(seg, i, 4)},
timestamp{scalar_at<int64_t>(seg, i, 5)},
ActionType::DELETE);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The column row-count consistency checks that previously lived in read_new_style_list_from_storage have been dropped here:

util::check(
    seg.column(0).row_count() == seg.column(1).row_count() &&
        seg.column(0).row_count() == seg.column(2).row_count(),
    "Column mismatch in symbol segment additions: {} {} {}", ...);

Without them, if a stored segment is corrupt and has column(1).row_count() < column(0).row_count(), the scalar_at<uint64_t>(seg, i, 1) / scalar_at<int64_t>(seg, i, 2) calls on line 156–157 (and 164–165 for deletions) will either read garbage or rely on scalar_at's internal bounds check to throw an opaque error — losing the clear "Column mismatch in symbol segment additions/deletions" diagnostic that pointed directly at the storage corruption.

This is a defensive on-disk-data-integrity check and is cheap (three integer comparisons before the loops). Please restore it inside for_each_segment_entry before the additions and deletions loops.

Comment on lines +510 to 524
load_result.total_key_count_ = journal.total_key_count;
load_result.symbol_list_keys_ = std::move(journal.all_keys);

if (journal.compaction_key) {
ARCTICDB_RUNTIME_DEBUG(log::symbol(), "Loading symbols from symbol list keys");
auto existing = read_from_storage(store, *journal.compaction_key);
load_result.symbols_ =
merge_existing_with_journal_map(version_map, store, journal.update_map, std::move(existing));
} else {
ARCTICDB_RUNTIME_DEBUG(log::symbol(), "Loading symbols from version keys");
auto previous_entries = load_previous_from_version_keys(store, data, will_attempt_compaction);
load_result.symbols_ =
merge_existing_with_journal_map(version_map, store, journal.update_map, std::move(previous_entries));
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "Would delete unseen key" sanity check from master has been removed without a replacement:

std::unordered_set<StreamId> keys_in_versions;
for (const auto& entry : load_result.symbols_)
    keys_in_versions.emplace(entry.stream_id_);

for (const auto& key : load_result.symbol_list_keys_)
    util::check(
        keys_in_versions.find(StreamId{std::get<StringIndex>(key.start_index())}) != keys_in_versions.end(),
        "Would delete unseen key {}", key);

This was a defensive assertion guarding the version-keys path: it ensured that every journal key about to be deleted during compaction corresponded to a symbol present in the merged output. If a bug in the merge logic (or in problematic-symbol resolution) ever drops a symbol that still has live journal entries, delete_keys/remove_keys_sync would now silently destroy those journal entries — exactly the silent-data-loss class of bug this check was designed to catch.

Either restore the equivalent assertion against load_result.symbol_list_keys_ (the VariantKey collection) before remove_keys_sync in compact_internal, or justify in the PR description why the check is no longer needed.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 13, 2026

ArcticDB Code Review Summary

All previously flagged issues have been addressed. No new issues introduced in the latest commit.

Correctness

  • Column row-count consistency validation in for_each_segment_entryfixed (symbol_list.cpp:154-161 for additions and :170-177 for deletions).
  • The "Would delete unseen key" assertion — restored (symbol_list.cpp:540-554).

Performance

  • The "Would delete unseen key" check at symbol_list.cpp:540-554fixed in latest commit. Now builds an unordered_set<StreamId> once from load_result.symbols_ and performs a hashed lookup per journal key (O(N+M)), matching master's complexity.

Testing

  • Test gap previously flagged on DirectPathEquivalenceaddressed by the new DirectPathEquivalenceWithTrueDeletes test which tombstones via the version map so the merge actually emits ActionType::DELETE.

Comment thread cpp/arcticdb/version/symbol_list.cpp Outdated
Comment on lines +540 to +555
// Verify every journal key we'd delete during compaction corresponds to a symbol in the
// merged output. Guards against silent data loss from merge bugs.
// O(N*M) but only runs once per library (version-keys path, before any compaction exists).
if (collect_keys) {
for (const auto& key : load_result.symbol_list_keys_) {
auto stream_id = StreamId{std::get<StringIndex>(to_atom(key).start_index())};
util::check(
std::any_of(
load_result.symbols_.begin(),
load_result.symbols_.end(),
[&stream_id](const SymbolListEntry& e) { return e.stream_id_ == stream_id; }
),
"Would delete unseen key {}",
key
);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to see this check restored, but the implementation is O(N*M) where master was O(N+M).

Master built an unordered_set<StreamId> once from load_result.symbols_ and did a single hashed lookup per journal key:

std::unordered_set<StreamId> keys_in_versions;
for (const auto& entry : load_result.symbols_)
    keys_in_versions.emplace(entry.stream_id_);

for (const auto& key : load_result.symbol_list_keys_)
    util::check(
        keys_in_versions.find(StreamId{std::get<StringIndex>(to_atom(key).start_index())}) != keys_in_versions.end(),
        "Would delete unseen key {}", key);

The current std::any_of does a full linear scan of symbols_ for every entry in symbol_list_keys_. This path triggers on the first compaction of a library — exactly the scenario the PR is optimising — so for a library reaching first compaction with e.g. 300K uncompacted journal entries (cf. the 300K×1 benchmark in the PR description) this becomes ~9×10¹⁰ comparisons.

Suggest using a hashed lookup like master, or std::binary_search over symbols_ (it's already sorted by stream_id_ when problematic_symbols is non-empty, and merge_existing_entries keeps it sorted otherwise). The comment claiming "only runs once per library" doesn't change that this single run can be very slow at scale.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

patch Small change, should increase patch version

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant