Improve the performance of the access to storage.#5744

Closed
MathieuDutSik wants to merge 4 commits into
linera-io:mainfrom
MathieuDutSik:performance_execution

Conversation

@MathieuDutSik
Contributor

Motivation

Storage is accessed in the execution_chain_actor. Every access takes two steps:
first loading the view, then performing the operation on it. This work is repeated
on each access, starting with the serialization of the key. It hits us quite hard
because a contract can issue a lot of read operations.

We used to have a cache of views in ReentrantCollectionView, but we removed it
because the views are not a great place to put the cache.

Proposal

Put the cache in the ExecutionStateActor. Unfortunately, we cannot access entries by
index because of long-lived services, so every query has to go through a BTreeMap<_,_>.

The views are accessed via Arc<RwLock<KeyValueStoreView<C>>>. So, we expose
some of the internals of the ReentrantCollectionView to the user. That is a little
unfortunate.
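
To make the proposal concrete, here is a minimal sketch of an actor-side cache of shared view handles. It is not the code from this PR: `AppId`, `KvView`, `Collection`, `ViewCache` and `get_or_load` are hypothetical stand-ins, and it uses the synchronous `std::sync::RwLock` where the real code is asynchronous.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for an application ID (not the real Linera type).
pub type AppId = u64;

/// Hypothetical stand-in for `KeyValueStoreView<C>`.
#[derive(Default)]
pub struct KvView {
    pub entries: BTreeMap<Vec<u8>, Vec<u8>>,
}

/// Hypothetical stand-in for the collection that owns the child views.
/// `loads` counts how often the slow path is taken.
#[derive(Default)]
pub struct Collection {
    pub loads: usize,
}

impl Collection {
    /// Simulates the expensive path (key serialization, lookup, loading).
    pub fn load_view(&mut self, _id: AppId) -> Arc<RwLock<KvView>> {
        self.loads += 1;
        Arc::new(RwLock::new(KvView::default()))
    }
}

/// Actor-side cache: one shared handle per application.
#[derive(Default)]
pub struct ViewCache {
    pub collection: Collection,
    pub key_value_stores: BTreeMap<AppId, Arc<RwLock<KvView>>>,
}

impl ViewCache {
    /// Returns the cached handle, loading it from the collection on a miss.
    pub fn get_or_load(&mut self, id: AppId) -> Arc<RwLock<KvView>> {
        if let Some(arc) = self.key_value_stores.get(&id) {
            return Arc::clone(arc);
        }
        let arc = self.collection.load_view(id);
        self.key_value_stores.insert(id, Arc::clone(&arc));
        arc
    }
}
```

In this sketch, repeated calls to `get_or_load` with the same ID return clones of the same `Arc`, so only the first call pays for the load. Every call still performs one `BTreeMap` lookup.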

Other approaches:

  • Put the Arc<RwLock<KeyValueStoreView<C>>> in the SyncRuntimeContract. But that is a major change, and for a start we do not even have a Context in SyncRuntimeContract, so block_on would be needed to use it.
  • Another idea is to store the KeyValueStoreView as a single BTreeMap<Vec<u8>,Vec<u8>> stored as a RegisterView. This is a somewhat orthogonal approach but still something to consider.
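
As an illustration of the second idea, the sketch below holds a whole key-value store in one `BTreeMap<Vec<u8>, Vec<u8>>`. The type `SingleMapStore` and its methods are hypothetical names, not an existing API. Persisting the map as a single register value is not shown, and the presumed trade-off is that the whole map would be read and written as one value.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory key-value store held as one ordered map.
#[derive(Default)]
pub struct SingleMapStore {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl SingleMapStore {
    /// Inserts or overwrites the value stored under `key`.
    pub fn write(&mut self, key: &[u8], value: &[u8]) {
        self.map.insert(key.to_vec(), value.to_vec());
    }

    /// Returns the value stored under `key`, if any.
    pub fn read(&self, key: &[u8]) -> Option<&[u8]> {
        self.map.get(key).map(Vec::as_slice)
    }

    /// Returns the keys starting with `prefix`, in sorted order.
    /// Byte-wise ordering keeps such keys contiguous, so the scan
    /// starts at `prefix` and stops at the first non-matching key.
    pub fn find_keys_by_prefix(&self, prefix: &[u8]) -> Vec<Vec<u8>> {
        self.map
            .range(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, _)| k.clone())
            .collect()
    }
}
```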

Test Plan

CI.

Release Plan

It can be put in testnet_conway.

Links

None

@github-actions

github-actions Bot commented Mar 19, 2026

Instruction Count Benchmark Results

Baseline: c6bffe64a7

Deterministic metrics — reproducible across runs (34 benchmarks)
| Benchmark | Instructions | Total R+W |
|---|---|---|
| **Cold Load** | | |
| load_1000 | 695,084 (No change) | 1,012,101 (No change) |
| **CollectionView** | | |
| indices_100 | 193,448 (-0.04%) | 269,167 (-0.04%) |
| load_all_100_from_storage | 638,987 (No change) | 902,291 (No change) |
| load_all_100_in_memory | 342,197 (No change) | 479,097 (No change) |
| pre_save_100 | 288,473 (No change) | 399,990 (No change) |
| try_load_10_from_100 | 101,875 (No change) | 144,259 (No change) |
| **MapView** | | |
| contains_key_10_from_100 | 53,905 (No change) | 76,560 (No change) |
| contains_key_10_from_1000 | 356,452 (No change) | 503,836 (No change) |
| get_10_from_100 | 56,372 (-0.09%) | 80,125 (-0.10%) |
| get_10_from_1000 | 359,016 (No change) | 507,549 (No change) |
| get_100_missing_from_1000 | 611,788 (No change) | 854,367 (No change) |
| indices_100 | 102,186 (No change) | 140,918 (No change) |
| indices_1000 | 952,656 (No change) | 1,329,289 (No change) |
| insert_100 | 258,818 (No change) | 357,934 (No change) |
| insert_1000 | 2,965,219 (No change) | 4,015,726 (No change) |
| post_save_1000 | 1,028,257 (No change) | 1,482,811 (No change) |
| pre_save_100 | 333,989 (No change) | 464,521 (No change) |
| pre_save_1000 | 3,500,325 (No change) | 4,935,486 (No change) |
| remove_500_from_1000 | 1,190,450 (No change) | 1,661,605 (No change) |
| **QueueView / BucketQueueView** | | |
| delete_500_from_1000 | 25,251 (No change) | 39,336 (No change) |
| front_100_from_1000 | 7,045 (No change) | 10,386 (No change) |
| pre_save_1000 | 44,367 (No change) | 62,329 (No change) |
| push_1000 | 25,711 (No change) | 35,288 (No change) |
| delete_500_from_1000 | 11,587 (No change) | 14,317 (No change) |
| front_100_from_1000 | 10,683 (No change) | 16,049 (No change) |
| pre_save_1000 | 1,044,276 (No change) | 1,500,608 (No change) |
| push_1000 | 25,638 (No change) | 35,191 (No change) |
| **ReentrantCollectionView** | | |
| contains_key_10_from_100 | 143,321 (-0.04%) | 203,873 (-0.04%) |
| indices_100 | 238,776 (-0.02%) | 334,668 (-0.02%) |
| load_all_100_from_storage | 805,118 (No change) | 1,134,918 (No change) |
| load_all_100_in_memory | 395,248 (No change) | 543,761 (No change) |
| pre_save_100 | 352,585 (+0.11%) | 491,163 (+0.14%) |
| **RegisterView** | | |
| get_set_100 | 82,696 (No change) | 122,160 (No change) |
| pre_save | 6,889 (No change) | 10,123 (No change) |

Regression threshold: 1%. ${\color{red}\textbf{red}}$ = regression, ${\color{green}\textbf{green}}$ = improvement.

Cache-dependent metrics — expect fluctuations between runs (34 benchmarks)
| Benchmark | L1 Hits | LLC Hits | RAM Hits | Est. Cycles |
|---|---|---|---|---|
| **Cold Load** | | | | |
| load_1000 | 1,003,549 (+0.00%) | 8,387 (+0.05%) | 165 (${\color{green}\textbf{-5.17%}}$) | 1,051,259 (-0.03%) |
| **CollectionView** | | | | |
| indices_100 | 267,909 (-0.03%) | 879 (-0.57%) | 379 (${\color{green}\textbf{-3.56%}}$) | 285,569 (-0.21%) |
| load_all_100_from_storage | 897,711 (+0.00%) | 3,928 (No change) | 652 (-0.91%) | 940,171 (-0.02%) |
| load_all_100_in_memory | 476,913 (+0.00%) | 1,448 (-0.21%) | 736 (${\color{green}\textbf{-1.74%}}$) | 509,913 (-0.09%) |
| pre_save_100 | 397,972 (+0.00%) | 1,423 (+0.42%) | 595 (${\color{green}\textbf{-1.82%}}$) | 425,912 (-0.08%) |
| try_load_10_from_100 | 143,386 (+0.01%) | 656 (+0.15%) | 217 (${\color{green}\textbf{-5.65%}}$) | 154,261 (-0.28%) |
| **MapView** | | | | |
| contains_key_10_from_100 | 76,253 (+0.02%) | 109 (${\color{red}\textbf{+1.87%}}$) | 198 (${\color{green}\textbf{-6.60%}}$) | 83,728 (-0.56%) |
| contains_key_10_from_1000 | 500,644 (+0.00%) | 2,994 (+0.10%) | 198 (${\color{green}\textbf{-6.60%}}$) | 522,544 (-0.09%) |
| get_10_from_100 | 79,818 (-0.08%) | 106 (-0.93%) | 201 (${\color{green}\textbf{-6.94%}}$) | 87,383 (-0.68%) |
| get_10_from_1000 | 504,356 (+0.00%) | 2,992 (-0.20%) | 201 (${\color{green}\textbf{-6.94%}}$) | 526,351 (-0.10%) |
| get_100_missing_from_1000 | 851,152 (+0.00%) | 2,995 (-0.27%) | 220 (${\color{green}\textbf{-6.38%}}$) | 873,827 (-0.06%) |
| indices_100 | 140,258 (+0.01%) | 261 (-0.38%) | 399 (${\color{green}\textbf{-2.92%}}$) | 155,528 (-0.26%) |
| indices_1000 | 1,321,605 (+0.00%) | 6,501 (-0.03%) | 1,183 (${\color{green}\textbf{-1.00%}}$) | 1,395,515 (-0.03%) |
| insert_100 | 357,170 (+0.00%) | 113 (+0.89%) | 651 (${\color{green}\textbf{-2.40%}}$) | 380,520 (-0.14%) |
| insert_1000 | 4,008,681 (+0.00%) | 3,064 (+0.10%) | 3,981 (-0.43%) | 4,163,336 (-0.01%) |
| post_save_1000 | 1,471,413 (+0.00%) | 11,226 (+0.01%) | 172 (${\color{green}\textbf{-8.02%}}$) | 1,533,563 (-0.03%) |
| pre_save_100 | 463,124 (+0.00%) | 790 (+0.25%) | 607 (${\color{green}\textbf{-1.94%}}$) | 488,319 (-0.08%) |
| pre_save_1000 | 4,919,825 (+0.00%) | 11,842 (-0.01%) | 3,819 (-0.31%) | 5,112,700 (-0.01%) |
| remove_500_from_1000 | 1,657,227 (+0.00%) | 4,208 (No change) | 170 (${\color{green}\textbf{-8.11%}}$) | 1,684,217 (-0.03%) |
| **QueueView / BucketQueueView** | | | | |
| delete_500_from_1000 | 39,116 (+0.02%) | 63 (${\color{green}\textbf{-1.56%}}$) | 157 (${\color{green}\textbf{-4.85%}}$) | 44,926 (-0.61%) |
| front_100_from_1000 | 10,188 (+0.07%) | 68 (${\color{red}\textbf{+1.49%}}$) | 130 (${\color{green}\textbf{-5.80%}}$) | 15,078 (${\color{green}\textbf{-1.75%}}$) |
| pre_save_1000 | 61,954 (+0.00%) | 96 (${\color{red}\textbf{+1.05%}}$) | 279 (${\color{green}\textbf{-1.06%}}$) | 72,199 (-0.14%) |
| push_1000 | 35,056 (+0.03%) | 74 (${\color{green}\textbf{-5.13%}}$) | 158 (${\color{green}\textbf{-4.82%}}$) | 40,956 (-0.70%) |
| delete_500_from_1000 | 14,116 (+0.03%) | 71 (${\color{red}\textbf{+5.97%}}$) | 130 (${\color{green}\textbf{-5.80%}}$) | 19,021 (${\color{green}\textbf{-1.33%}}$) |
| front_100_from_1000 | 15,830 (+0.04%) | 64 (${\color{red}\textbf{+3.23%}}$) | 155 (${\color{green}\textbf{-4.91%}}$) | 21,575 (${\color{green}\textbf{-1.21%}}$) |
| pre_save_1000 | 1,495,887 (+0.00%) | 2,765 (-0.18%) | 1,956 (-0.31%) | 1,578,172 (-0.01%) |
| push_1000 | 34,954 (+0.02%) | 81 (${\color{red}\textbf{+2.53%}}$) | 156 (${\color{green}\textbf{-4.88%}}$) | 40,819 (-0.64%) |
| **ReentrantCollectionView** | | | | |
| contains_key_10_from_100 | 202,632 (-0.03%) | 1,048 (-0.29%) | 193 (${\color{green}\textbf{-5.85%}}$) | 214,627 (-0.23%) |
| indices_100 | 333,067 (-0.02%) | 1,238 (-0.16%) | 363 (${\color{green}\textbf{-2.94%}}$) | 351,962 (-0.13%) |
| load_all_100_from_storage | 1,128,117 (+0.00%) | 6,399 (No change) | 402 (${\color{green}\textbf{-1.23%}}$) | 1,174,182 (-0.01%) |
| load_all_100_in_memory | 541,215 (+0.00%) | 1,931 (-0.10%) | 615 (${\color{green}\textbf{-1.28%}}$) | 572,395 (-0.05%) |
| pre_save_100 | 488,180 (+0.15%) | 2,306 (No change) | 677 (${\color{green}\textbf{-1.74%}}$) | 523,405 (+0.06%) |
| **RegisterView** | | | | |
| get_set_100 | 121,915 (+0.00%) | 71 (${\color{red}\textbf{+2.90%}}$) | 174 (${\color{green}\textbf{-4.40%}}$) | 128,360 (-0.21%) |
| pre_save | 9,893 (+0.10%) | 72 (${\color{red}\textbf{+1.41%}}$) | 158 (${\color{green}\textbf{-6.51%}}$) | 15,783 (${\color{green}\textbf{-2.29%}}$) |

Cache metrics fluctuate because anything that changes the virtual memory layout
shifts which data lands on which cache lines, changing the L1/LLC/RAM distribution.
Probable causes: ASLR (even across identical binaries), executable binary size changes,
shared library size changes, and even filename length differences.

Cachegrind simulates a two-level cache (L1 + LLC) auto-detected from the host CPU.
Est. Cycles = L1 hits + 5 × LLC hits + 35 × RAM hits.
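
As a quick check of the formula, the sketch below recomputes the estimate from the three hit counts. The function name `estimated_cycles` is made up for illustration. The weights are the ones quoted above.

```rust
/// Cycle estimate from the formula above:
/// L1 hits + 5 x LLC hits + 35 x RAM hits.
pub fn estimated_cycles(l1_hits: u64, llc_hits: u64, ram_hits: u64) -> u64 {
    l1_hits + 5 * llc_hits + 35 * ram_hits
}
```

For the Cold Load `load_1000` row this gives 1,003,549 + 5 × 8,387 + 35 × 165 = 1,003,549 + 41,935 + 5,775 = 1,051,259, which matches the reported Est. Cycles.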

Runner cache sizes: L1d cache: 512 KiB (16 instances); L1i cache: 512 KiB (16 instances); L2 cache: 8 MiB (16 instances); L3 cache: 512 MiB (16 instances)

@MathieuDutSik MathieuDutSik marked this pull request as ready for review March 19, 2026 10:25
@MathieuDutSik MathieuDutSik requested review from afck and ma2bd March 19, 2026 10:25
&mut self,
short_key: &[u8],
) -> Result<Arc<RwLock<W>>, ViewError> {
self.try_load_view_mut(short_key).await
Contributor

If they are identical, why do we need both try_load_view_mut and try_load_view_arc?

if let Some(arc) = self.key_value_stores.get(application_id) {
return Ok(arc.clone());
}
let arc = self.state.users.try_load_view_arc(application_id).await?;
Contributor

I'm probably still misunderstanding the underlying problem, but:

Now that we call try_load_view_arc/try_load_view_mut instead of just try_load_view, we are inserting the child view into the reentrant collection view's internal map, aren't we? So is it even necessary to keep the map in the actor, too? Isn't just replacing try_load_view with try_load_view_mut enough to make the collection view effectively cache its children?

Contributor Author

It is very similar, but not exactly the same since we are instead storing the serialized key in the ReentrantCollectionView and the key itself in the KeyValueStoreView.
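
The difference can be sketched as two lookups. This is only an illustration under assumptions: `AppId`, `to_bytes`, `get_by_bytes` and `get_by_id` are hypothetical, and the real ID type and its encoding differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical application ID; the real type and its encoding differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct AppId(pub [u8; 32]);

impl AppId {
    /// Stand-in for serialization: allocates a fresh byte vector per call.
    pub fn to_bytes(&self) -> Vec<u8> {
        self.0.to_vec()
    }
}

/// Lookup in a map keyed by serialized bytes: serialize first, then search.
pub fn get_by_bytes<'a, V>(map: &'a BTreeMap<Vec<u8>, V>, id: &AppId) -> Option<&'a V> {
    let short_key = id.to_bytes();
    map.get(&short_key)
}

/// Lookup in a map keyed by the ID itself: no serialization step.
pub fn get_by_id<'a, V>(map: &'a BTreeMap<AppId, V>, id: &AppId) -> Option<&'a V> {
    map.get(id)
}
```

Both functions still search a `BTreeMap`. The second one only skips the per-call serialization and allocation.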

Contributor

And serializing the key (just an app ID, is it?) is taking that much time that it's worth introducing a separate map?

Contributor Author

The problem in my view is that there are a lot of storage calls. So, improving their speed is quite important. But yes, there is another way to address this: use the ByteReentrantCollectionView and pass the serialization.

But to be honest, I was hoping for a bigger improvement: avoiding the access to the entry in the BTreeMap altogether. That proved impossible.

@MathieuDutSik
Contributor Author

This PR brings benefits, but they are small. So, I prefer to kill it.

@MathieuDutSik MathieuDutSik deleted the performance_execution branch May 10, 2026 10:13
@MathieuDutSik MathieuDutSik restored the performance_execution branch May 10, 2026 10:13

2 participants