Add cross-key worker boot test (Phase 0a, #6244) #6332

Open: ndr-ds wants to merge 5 commits into main from ndr-ds/phase-0a-cross-key-boot

Contributor

ndr-ds commented May 18, 2026

Motivation

Part of the A/B testing infrastructure epic (#912). Before we can hot-swap validator databases, we need to confirm empirically that a WorkerState can start on a RocksDB backup written by a different validator — with no keypair (observer mode) or a mismatched keypair — without panicking or corrupting state.

Proposal

Adds a test-only DatabaseBackup trait (#[cfg(with_testing)]) to linera-views with a single method, backup_to(&self, dir: &Path) -> anyhow::Result<()>. The trait is implemented for RocksDbDatabaseInternal via the RocksDB backup API and threaded through the wrapper stack via blanket impls on LruCachingDatabase<D>, ValueSplittingDatabase<D>, and MeteredDatabase<D>. DbStorage gains a matching backup_to method and a new connect_for_testing constructor.
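As a rough illustration of how a backup method can be threaded through a stack of wrappers, here is a minimal, dependency-free sketch. It is not the PR's code: `InnerDb`, `Caching`, and `Metered` are hypothetical stand-ins for the real database types, the "backup" is a single marker file rather than a RocksDB backup, and `io::Result<()>` replaces `anyhow::Result<()>` only to avoid an external crate.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Stand-in for the test-only trait described above.
/// Assumption: `io::Result` is used here instead of `anyhow::Result`.
trait DatabaseBackup {
    fn backup_to(&self, dir: &Path) -> io::Result<()>;
}

/// Hypothetical innermost database. It "backs up" by writing one file.
struct InnerDb {
    contents: String,
}

impl DatabaseBackup for InnerDb {
    fn backup_to(&self, dir: &Path) -> io::Result<()> {
        fs::create_dir_all(dir)?;
        fs::write(dir.join("backup.txt"), &self.contents)
    }
}

/// Hypothetical wrappers, each owning the database it decorates.
struct Caching<D> {
    inner: D,
}

struct Metered<D> {
    inner: D,
}

/// Generic impls: any wrapper over a backup-capable database is itself
/// backup-capable and simply delegates to the layer below.
impl<D: DatabaseBackup> DatabaseBackup for Caching<D> {
    fn backup_to(&self, dir: &Path) -> io::Result<()> {
        self.inner.backup_to(dir)
    }
}

impl<D: DatabaseBackup> DatabaseBackup for Metered<D> {
    fn backup_to(&self, dir: &Path) -> io::Result<()> {
        self.inner.backup_to(dir)
    }
}

fn main() -> io::Result<()> {
    let db = Metered {
        inner: Caching {
            inner: InnerDb { contents: "chain state".to_string() },
        },
    };
    let dir = std::env::temp_dir().join(format!("backup_sketch_{}", std::process::id()));
    // The call on the outermost wrapper reaches the innermost database.
    db.backup_to(&dir)?;
    assert_eq!(fs::read_to_string(dir.join("backup.txt"))?, "chain state");
    fs::remove_dir_all(&dir)
}
```

The point of delegating impls is that callers holding the fully wrapped type can request a backup without knowing how many layers sit above the concrete database.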

Two tests in linera-core exercise the invariant:

  • test_no_key_boots_on_cross_key_backup — observer mode (key_pair = None)
  • test_mismatched_key_boots_on_cross_key_backup — fresh random keypair that does not match the backup source

Both tests spin up a 4-validator TestBuilder, issue a root chain, take a RocksDB backup of one validator's storage, restore it into a fresh namespace, open a new WorkerState over it, and assert that handle_chain_info_query succeeds.
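The shape shared by the two tests can be sketched with in-memory stand-ins. Everything below is hypothetical (`KeyPair`, `Backup`, `Worker`, and `boots_on_cross_key_backup` are illustration-only names, and cloning a map stands in for the RocksDB backup and restore); it only shows the invariant being checked, namely that a read-only query succeeds regardless of which key, if any, the worker holds.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a validator keypair.
#[derive(Clone, Copy, Debug, PartialEq)]
struct KeyPair(u64);

/// Hypothetical stand-in for a storage backup: chain id -> next block height.
#[derive(Clone, Default)]
struct Backup {
    chains: BTreeMap<u32, u64>,
}

/// Hypothetical stand-in for `WorkerState`: storage plus an optional key.
struct Worker {
    storage: Backup,
    key_pair: Option<KeyPair>,
}

impl Worker {
    fn boot(storage: Backup, key_pair: Option<KeyPair>) -> Self {
        Worker { storage, key_pair }
    }

    /// A worker without a key runs in observer mode.
    fn is_observer(&self) -> bool {
        self.key_pair.is_none()
    }

    /// The query only reads storage, so it does not consult the key.
    fn handle_chain_info_query(&self, chain: u32) -> Result<u64, String> {
        self.storage
            .chains
            .get(&chain)
            .copied()
            .ok_or_else(|| format!("unknown chain {chain}"))
    }
}

/// Shared body of both scenarios: restore a peer's backup, boot, query.
fn boots_on_cross_key_backup(backup: &Backup, key_pair: Option<KeyPair>) -> bool {
    let restored = backup.clone(); // stands in for backup + restore
    Worker::boot(restored, key_pair)
        .handle_chain_info_query(0)
        .is_ok()
}

fn main() {
    let source_key = KeyPair(1);
    let mut backup = Backup::default();
    backup.chains.insert(0, 1); // root chain written by the source validator

    // Scenario 1: observer mode, no keypair at all.
    assert!(Worker::boot(backup.clone(), None).is_observer());
    assert!(boots_on_cross_key_backup(&backup, None));

    // Scenario 2: a keypair that differs from the backup source.
    let other_key = KeyPair(2);
    assert_ne!(other_key, source_key);
    assert!(boots_on_cross_key_backup(&backup, Some(other_key)));
}
```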

Test Plan

cargo test -p linera-core --features rocksdb worker_backup -- --nocapture

Both tests pass in ~0.4 s.

Release Plan

  • Nothing to do / These changes follow the usual release cycle.

Links

@github-actions

github-actions Bot commented May 18, 2026

Instruction Count Benchmark Results

Baseline: d4536444eb

Deterministic metrics: reproducible across runs (34 benchmarks)

| Benchmark | Instructions | Total R+W |
|---|---|---|
| **BucketQueueView** | | |
| delete_500_from_1000 | 22,397 (No change) | 34,353 (No change) |
| front_100_from_1000 | 5,670 (-0.37%) | 8,377 (-0.31%) |
| pre_save_1000 | 42,532 (No change) | 59,695 (No change) |
| push_1000 | 24,336 (No change) | 33,279 (No change) |
| **Cold Load** | | |
| load_1000 | 693,857 (No change) | 1,010,338 (No change) |
| **CollectionView** | | |
| indices_100 | 188,727 (-0.08%) | 260,639 (-0.08%) |
| load_all_100_from_storage | 632,879 (-0.00%) | 891,540 (-0.00%) |
| load_all_100_in_memory | 337,000 (No change) | 469,568 (No change) |
| pre_save_100 | 265,494 (-0.01%) | 367,185 (-0.01%) |
| try_load_10_from_100 | 100,153 (+0.14%) | 141,860 (+0.14%) |
| **MapView** | | |
| contains_key_10_from_100 | 51,942 (-0.04%) | 73,842 (-0.04%) |
| contains_key_10_from_1000 | 351,912 (No change) | 497,658 (No change) |
| get_10_from_100 | 54,668 (No change) | 77,723 (No change) |
| get_10_from_1000 | 354,692 (+0.01%) | 501,608 (+0.01%) |
| get_100_missing_from_1000 | 606,340 (No change) | 846,298 (No change) |
| indices_100 | 98,672 (No change) | 135,748 (No change) |
| indices_1000 | 930,371 (No change) | 1,296,348 (No change) |
| insert_100 | 255,855 (-0.01%) | 354,597 (-0.01%) |
| insert_1000 | 2,949,625 (+0.00%) | 4,002,402 (+0.00%) |
| post_save_1000 | 1,024,163 (-0.00%) | 1,477,183 (-0.00%) |
| pre_save_100 | 330,195 (+0.01%) | 459,994 (+0.01%) |
| pre_save_1000 | 3,357,873 (No change) | 4,733,895 (No change) |
| remove_500_from_1000 | 1,192,752 (+0.00%) | 1,668,653 (+0.00%) |
| **QueueView** | | |
| delete_500_from_1000 | 11,238 (No change) | 13,342 (No change) |
| front_100_from_1000 | 8,635 (+0.24%) | 13,376 (+0.19%) |
| pre_save_1000 | 1,141,747 (No change) | 1,626,399 (No change) |
| push_1000 | 24,270 (-0.09%) | 33,192 (-0.08%) |
| **ReentrantCollectionView** | | |
| contains_key_10_from_100 | 142,258 (+0.10%) | 202,574 (+0.10%) |
| indices_100 | 234,264 (-0.07%) | 326,629 (-0.07%) |
| load_all_100_from_storage | 799,935 (No change) | 1,125,891 (No change) |
| load_all_100_in_memory | 409,119 (No change) | 561,465 (No change) |
| pre_save_100 | 351,225 (+0.01%) | 489,277 (+0.01%) |
| **RegisterView** | | |
| get_set_100 | 81,262 (-0.03%) | 120,084 (-0.02%) |
| pre_save | 5,488 (No change) | 8,099 (No change) |

Regression threshold: 1%. ${\color{red}\textbf{red}}$ = regression, ${\color{green}\textbf{green}}$ = improvement.

Cache-dependent metrics: expect fluctuations between runs (34 benchmarks)

| Benchmark | L1 Hits | LLC Hits | RAM Hits | Est. Cycles |
|---|---|---|---|---|
| **BucketQueueView** | | | | |
| delete_500_from_1000 | 34,151 (-0.01%) | 38 (${\color{red}\textbf{+5.56\%}}$) | 164 (No change) | 40,081 (+0.02%) |
| front_100_from_1000 | 8,207 (-0.27%) | 34 (${\color{green}\textbf{-8.11\%}}$) | 136 (-0.73%) | 13,137 (-0.55%) |
| pre_save_1000 | 59,218 (No change) | 77 (No change) | 400 (No change) | 73,603 (No change) |
| push_1000 | 33,069 (-0.00%) | 49 (${\color{red}\textbf{+2.08\%}}$) | 161 (No change) | 38,949 (+0.01%) |
| **Cold Load** | | | | |
| load_1000 | 1,001,804 (-0.00%) | 8,358 (+0.02%) | 176 (No change) | 1,049,754 (+0.00%) |
| **CollectionView** | | | | |
| indices_100 | 259,377 (-0.08%) | 865 (${\color{red}\textbf{+1.29\%}}$) | 397 (No change) | 277,597 (-0.06%) |
| load_all_100_from_storage | 887,042 (No change) | 3,833 (-0.65%) | 665 (-0.15%) | 929,482 (-0.02%) |
| load_all_100_in_memory | 467,439 (+0.00%) | 1,383 (-0.22%) | 746 (No change) | 500,464 (-0.00%) |
| pre_save_100 | 365,243 (-0.01%) | 1,341 (No change) | 601 (-0.17%) | 392,983 (-0.02%) |
| try_load_10_from_100 | 140,999 (+0.15%) | 630 (${\color{green}\textbf{-1.72\%}}$) | 231 (No change) | 152,234 (+0.10%) |
| **MapView** | | | | |
| contains_key_10_from_100 | 73,527 (-0.05%) | 106 (${\color{red}\textbf{+13.98\%}}$) | 209 (-0.48%) | 81,372 (-0.01%) |
| contains_key_10_from_1000 | 494,475 (+0.00%) | 2,973 (-0.10%) | 210 (No change) | 516,690 (-0.00%) |
| get_10_from_100 | 77,410 (-0.00%) | 92 (${\color{red}\textbf{+1.10\%}}$) | 221 (No change) | 85,605 (+0.00%) |
| get_10_from_1000 | 498,408 (+0.01%) | 2,979 (-0.23%) | 221 (+0.45%) | 521,038 (+0.01%) |
| get_100_missing_from_1000 | 843,082 (-0.00%) | 2,982 (+0.03%) | 234 (No change) | 866,182 (+0.00%) |
| indices_100 | 135,122 (+0.00%) | 222 (-0.89%) | 404 (No change) | 150,372 (-0.01%) |
| indices_1000 | 1,288,675 (+0.00%) | 6,485 (-0.03%) | 1,188 (No change) | 1,362,680 (-0.00%) |
| insert_100 | 353,849 (-0.01%) | 86 (No change) | 662 (-0.15%) | 377,449 (-0.02%) |
| insert_1000 | 3,995,370 (+0.00%) | 3,039 (+0.10%) | 3,993 (+0.03%) | 4,150,320 (+0.00%) |
| post_save_1000 | 1,465,791 (-0.00%) | 11,205 (-0.01%) | 187 (-0.53%) | 1,528,361 (-0.00%) |
| pre_save_100 | 458,609 (+0.01%) | 764 (-0.65%) | 621 (+0.16%) | 484,164 (+0.01%) |
| pre_save_1000 | 4,719,963 (+0.00%) | 10,109 (-0.02%) | 3,823 (No change) | 4,904,313 (-0.00%) |
| remove_500_from_1000 | 1,664,260 (+0.00%) | 4,205 (No change) | 188 (+0.53%) | 1,691,865 (+0.00%) |
| **QueueView** | | | | |
| delete_500_from_1000 | 13,164 (No change) | 39 (No change) | 139 (No change) | 18,224 (No change) |
| front_100_from_1000 | 13,172 (+0.15%) | 38 (${\color{red}\textbf{+15.15\%}}$) | 166 (+0.61%) | 19,172 (+0.42%) |
| pre_save_1000 | 1,621,701 (+0.00%) | 2,732 (-0.04%) | 1,966 (No change) | 1,704,171 (-0.00%) |
| push_1000 | 32,980 (-0.09%) | 50 (${\color{red}\textbf{+11.11\%}}$) | 162 (-0.61%) | 38,900 (-0.10%) |
| **ReentrantCollectionView** | | | | |
| contains_key_10_from_100 | 201,348 (+0.10%) | 1,025 (No change) | 201 (No change) | 213,508 (+0.09%) |
| indices_100 | 325,044 (-0.07%) | 1,216 (${\color{red}\textbf{+1.16\%}}$) | 369 (-0.27%) | 344,039 (-0.06%) |
| load_all_100_from_storage | 1,119,319 (+0.00%) | 6,165 (-0.02%) | 407 (No change) | 1,164,389 (-0.00%) |
| load_all_100_in_memory | 559,072 (No change) | 1,842 (No change) | 551 (No change) | 587,567 (No change) |
| pre_save_100 | 486,346 (+0.01%) | 2,237 (-0.13%) | 694 (+0.14%) | 521,821 (+0.01%) |
| **RegisterView** | | | | |
| get_set_100 | 119,867 (-0.02%) | 37 (${\color{green}\textbf{-9.76\%}}$) | 180 (-0.55%) | 126,352 (-0.06%) |
| pre_save | 7,889 (+0.03%) | 41 (${\color{green}\textbf{-4.65\%}}$) | 169 (No change) | 14,009 (-0.06%) |

Cache metrics fluctuate because anything that changes the virtual memory layout
shifts which data lands on which cache lines, changing the L1/LLC/RAM distribution.
Probable causes: ASLR (even across identical binaries), executable binary size changes,
shared library size changes, and even filename length differences.

Cachegrind simulates a two-level cache (L1 + LLC) auto-detected from the host CPU.
Est. Cycles = L1 hits + 5 × LLC hits + 35 × RAM hits.
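As a quick sanity check of the weighting above, the following sketch recomputes Est. Cycles for two rows of the cache-dependent table. The function name `estimated_cycles` is an illustration-only helper, not part of the benchmark tooling.

```rust
/// Est. Cycles = L1 hits + 5 × LLC hits + 35 × RAM hits.
fn estimated_cycles(l1_hits: u64, llc_hits: u64, ram_hits: u64) -> u64 {
    l1_hits + 5 * llc_hits + 35 * ram_hits
}

fn main() {
    // BucketQueueView / delete_500_from_1000: 34,151 + 5 × 38 + 35 × 164.
    assert_eq!(estimated_cycles(34_151, 38, 164), 40_081);

    // RegisterView / pre_save: 7,889 + 5 × 41 + 35 × 169.
    assert_eq!(estimated_cycles(7_889, 41, 169), 14_009);

    println!("both rows match the reported Est. Cycles");
}
```

Because RAM hits carry a weight of 35, a change of a few RAM hits moves Est. Cycles more than a large percentage swing in the small LLC counts.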

Runner cache sizes: L1d cache: 64 KiB (2 instances); L1i cache: 64 KiB (2 instances); L2 cache: 1 MiB (2 instances); L3 cache: 32 MiB (1 instance)

ndr-ds marked this pull request as ready for review May 19, 2026 20:10
ndr-ds requested review from afck, deuszx and ma2bd May 19, 2026 20:10
@ma2bd
Contributor

ma2bd commented May 20, 2026

I wonder if the test could be written with memory

Development

Successfully merging this pull request may close these issues.

[A/B] Phase 0a: cross-key worker integration test (validator boots on peer's DB backup)
