feat(iota-core): Post consensus load shedding by cyberphysic4l · Pull Request #11301 · iotaledger/iota

cyberphysic4l · 2026-04-23T10:24:06Z

Description of change

Adds a post-consensus load shedding path for the certificate-less (white-flag) flow. Validators now broadcast their measured load shedding percentage to peers via a new consensus transaction, and each validator deterministically drops user transactions during post-consensus categorization based on the stake-weighted quorum percentile of those load shedding percentages — so the entire committee sheds the same set of transactions in the same round.

Prior to white-flag changes, the load shedding percentage was only used locally and was based on execution queue latency. There was also a transactions manager queue based threshold and writeback cache threshold above which everything was dropped. Now the previous latency based load shedding percentage is combined with a transaction manager queue and writeback cache based load shedding percentage, and all the dropping for these is done after consensus.

How the change has been tested

Basic tests (linting, compilation, formatting, unit/integration tests)
Patch-specific tests (correctness, functionality coverage)
I have added tests that prove my fix is effective or that my feature works
I have checked that new and existing unit tests pass locally with my changes

bingyanglin · 2026-05-04T09:00:10Z

+        let mut weighted_values: Vec<(u8, StakeUnit)> = committee
+            .members()
+            .map(|(authority, stake)| {
+                let percentage = notifications.get(authority).copied().unwrap_or(0);


Validators that haven't sent a notification default to 0%. A malicious validator could stay silent to keep the quorum shedding percentage low. Maybe require periodic heartbeats even when unchanged? Or do I miss something?

but again - what do you do if somebody doesn't send the heartbeat? you treat it as zero?

A malicious validator could also simply broadcast a value of 0 and achieve the same effect, nothing enforces particular values to be sent, it is all based on local an unprovable measurements. The security against malicious behaviour comes from the fact that the stake weighted quorum percentile is used to compute the effective load shedding value, limiting the ability of malicious validators to bias the load shedding percentage.

bingyanglin · 2026-05-04T09:00:51Z

+                    .overload_info
+                    .load_shedding_percentage
+                    .load(std::sync::atomic::Ordering::Relaxed);
+                if current != last_notified_percentage {


This only sends when the percentage changes. New validators or validators recovering from partitions won't get fresh data if the system is stable. Maybe send a heartbeat every N intervals even when unchanged?

This data is only relevant during an epoch and the overload monitor is reset at the epoch boundary. I don't think it's necessary to issue every N intervals with the same status. If a node has restarted and is catching up from either clean or outdated consensus database, then all of the overload notifications live inside the DAG anyway, and will be process by consensus_handler in the correct order anyway. Could you describe the scenario where not issuing at regular intervals would be problematic?

This is an important concern that you've highlighted and I have been trying to figure out exactly how different new validator or crash and recovery situations will work. If I understand correctly now, validators will always process all commits in order, and when processing a new commit (where load shedding is done) the epoch store tables will always reflect the latest overload notification from each authority. If there was no notificiation in the epoch, the value is zero. So getting fresh data is not a concern in the sense that the data comes from commits and we are guaranteed to have fresh commit data when we need it.

Although getting fresh data from commits is not a problem, broadcasting fresh data about local conditions may be what you mean. So if a validator was overloaded when it crashed and hence the committed overload notification for this validator has a high load shedding percentage, they may not update this by broadcasting a new value when they recover from the crash. I think this may be better solved by broadcasting (even a zero value) upon start up to make sure each validator has put out a fresh value since last starting up. Does that make sense or do you think there is another reason for having a heartbeat-style approach to broadcasting?

piotrm50 · 2026-05-05T11:08:06Z

+                // quorum load shedding percentage.
+                if drop_percentage > 0 {
+                    if let Some(digest) = tx.0.transaction.executable_transaction_digest() {
+                        if should_reject_tx(drop_percentage, digest, drop_seed) {


should we make sure to not drop UserTransactions such as OverloadNotification or AuthorityCapabilities? OverloadNotification is more crucial I think. Or is that handled someplace else?

This was actually handled already, executable_transaction_digest() would return digests only for user originating transactions and system transactions, but system transactions already dealt with in earlier match arm. I've made it more explicit now by having the user_transaction_digest() method which will only return digests for user originated transaction, so OverloadNotification and AuthorityCapabilities etc will never be dropped by this load shedding.

piotrm50 · 2026-05-05T11:16:46Z

+                    .overload_info
+                    .load_shedding_percentage
+                    .load(std::sync::atomic::Ordering::Relaxed);
+                if current != last_notified_percentage {


This data is only relevant during an epoch and the overload monitor is reset at the epoch boundary. I don't think it's necessary to issue every N intervals with the same status. If a node has restarted and is catching up from either clean or outdated consensus database, then all of the overload notifications live inside the DAG anyway, and will be process by consensus_handler in the correct order anyway. Could you describe the scenario where not issuing at regular intervals would be problematic?

piotrm50 · 2026-05-05T11:25:42Z

+                kind: ConsensusTransactionKind::OverloadNotificationV1(authority, percentage),
+                ..
+            }) => {
+                if &transaction.sender_authority() != authority {


validate_transactions_feature_gating - does this test pass if there is no feature gating on this transaction kind? Perhaps the test is missing something, but in general we don't want to allow malicious validators to issue a transaction kind before it's supported by a feature flag despite the software version being able to deserialize it. Say, we merge this to develop and release, but mainnet/testnet does not have pcool enabled. Then all nodes would try to process this transaction kind (and potentially crash) without protocol supporting this feature. That's what the validate_transactions_feature_gating test was supposed to catch, so maybe it's actually gated somehow, but I figured I'd ask anyway just to make sure.

Ok, I see this is checked in consensus_validator.rs, but I added explicit check here for SignedCapabilityNotification and there is a similar TODO for UserTransactionV1, so I guess we should clean this up at some point.

piotrm50 · 2026-05-05T12:04:21Z

+        let mut weighted_values: Vec<(u8, StakeUnit)> = committee
+            .members()
+            .map(|(authority, stake)| {
+                let percentage = notifications.get(authority).copied().unwrap_or(0);


but again - what do you do if somebody doesn't send the heartbeat? you treat it as zero?

piotrm50 · 2026-05-05T12:06:06Z

                "Percentage of transactions shed due to consensus queue length.",
                registry)
                .unwrap(),
+            cache_backpressure_load_shedding_percentage: register_int_gauge_with_registry!(


we should add metrics to count how many transactions were actually shedded post-consensus. Optionally we should count how many notifications we've received from each authority, but I'm not sure if that's worth polluting the metrics (we have too many already 😅).

roman1e2f5p8s

one test I would add is to exercise the actual drop behavior inside process_consensus_transactions_and_commit_boundary.
see test_consensus_commit_prologue_generation at authority_tests.rs:4671 for how such a test could be constructed

roman1e2f5p8s · 2026-05-20T08:25:51Z

+                    }
+                }
+
+                if tx.0.is_user_tx_with_randomness() {


Suggested change

if tx.0.is_user_tx_with_randomness() {

if !enable_white_flag && tx.0.is_user_tx_with_randomness() {

see #10764 why it should NOT be removed.

roman1e2f5p8s · 2026-05-20T10:00:31Z

+            let mut last_notified_percentage: u32 = state
+                .overload_info
+                .local_load_shedding_percentage
+                .load(std::sync::atomic::Ordering::Relaxed);


shouldn't last_notified_percentage be initialized from the persisted view instead of the atomic on startup? something like

let mut last_notified_percentage: u32 = epoch_store .load_overload_notifications() .expect("...") .get(&authority_name) .copied() .map(u32::from) .unwrap_or(0);

roman1e2f5p8s · 2026-05-20T10:44:12Z

+                ConsensusTransactionKind::OverloadNotificationV1(_, _) => {
+                    if !self.epoch_store.protocol_config().enable_white_flag_flow() {
+                        return Err(IotaError::UnsupportedFeature {
+                            error:
+                                "OverloadNotificationV1 not supported at current protocol version"
+                                    .into(),
+                        });
+                    }
+                }


Suggested change

ConsensusTransactionKind::OverloadNotificationV1(_, _) => {

if !self.epoch_store.protocol_config().enable_white_flag_flow() {

return Err(IotaError::UnsupportedFeature {

error:

"OverloadNotificationV1 not supported at current protocol version"

.into(),

});

}

}

ConsensusTransactionKind::OverloadNotificationV1(authority_name, percentage) => {

if !self.epoch_store.protocol_config().enable_white_flag_flow() {

return Err(IotaError::UnsupportedFeature {

error:

"OverloadNotificationV1 not supported at current protocol version"

.into(),

});

}

if *percentage > 100 {

return Err(IotaError::HandleConsensusTransactionFailure(format!(

"OverloadNotificationV1 with invalid percentage {percentage} from \

authority {}",

authority_name.concise(),

)));

}

}

not just feature gate, but worth doing a cheap check - some other variants do that here.

…s drop

…pped by post consensus load shedding.

… deterministically

iota-ci added core-protocol research labels Apr 23, 2026

cyberphysic4l force-pushed the protocol-research/feat/post-consensus-load-shedding branch from a827011 to 7464fd6 Compare April 24, 2026 12:00

cyberphysic4l changed the base branch from develop to protocol-research/feat/certificate-less-pre-consensus-load-shedding April 24, 2026 12:01

cyberphysic4l changed the title ~~Protocol research/feat/post consensus load shedding~~ feat(iota-core): Post consensus load shedding Apr 27, 2026

vekkiokonio linked an issue Apr 27, 2026 that may be closed by this pull request

pcool: PoC post-consensus load shedding #11203

Open

roman1e2f5p8s force-pushed the protocol-research/feat/certificate-less-pre-consensus-load-shedding branch from 0cb3220 to 0f921e7 Compare April 29, 2026 07:57

roman1e2f5p8s force-pushed the protocol-research/feat/post-consensus-load-shedding branch from 5a62a09 to 8bf43a5 Compare April 29, 2026 12:26

roman1e2f5p8s reviewed Apr 29, 2026

View reviewed changes

Comment thread crates/iota-core/src/overload_monitor.rs Outdated

cyberphysic4l force-pushed the protocol-research/feat/post-consensus-load-shedding branch from 8bf43a5 to 0808cfd Compare April 30, 2026 15:23

cyberphysic4l marked this pull request as ready for review May 1, 2026 14:13

cyberphysic4l requested review from a team as code owners May 1, 2026 14:13

Thoralf-M reviewed May 1, 2026

View reviewed changes

Comment thread crates/iota-types/src/messages_consensus.rs Outdated

bingyanglin reviewed May 4, 2026

View reviewed changes

piotrm50 requested changes May 5, 2026

View reviewed changes

roman1e2f5p8s force-pushed the protocol-research/feat/certificate-less-pre-consensus-load-shedding branch from 014f91c to c520090 Compare May 7, 2026 09:51

roman1e2f5p8s requested review from a team as code owners May 7, 2026 09:51

roman1e2f5p8s force-pushed the protocol-research/feat/certificate-less-pre-consensus-load-shedding branch from c520090 to 8afda17 Compare May 7, 2026 12:40

cyberphysic4l force-pushed the protocol-research/feat/post-consensus-load-shedding branch from 3d4b491 to 76e20ea Compare May 7, 2026 15:46

roman1e2f5p8s removed request for a team May 7, 2026 17:06

cyberphysic4l force-pushed the protocol-research/feat/post-consensus-load-shedding branch from 76e20ea to aabd9b6 Compare May 11, 2026 14:27

cyberphysic4l changed the base branch from protocol-research/feat/certificate-less-pre-consensus-load-shedding to consensus/feat/transaction-certificateless-flow-feature May 11, 2026 14:29

roman1e2f5p8s reviewed May 20, 2026

View reviewed changes

cyberphysic4l added 18 commits May 21, 2026 09:55

feat: create overload notification v1 consensus transactions

1c574e7

feat: calculate quorum load shedding percentage

2930e51

feat: post consensus load shedding

48be734

fix: compare new load shedding percentage to quorum percentage

98575c4

feat: reuse should_reject_tx function

1fa1392

feat: queue length based load shedding

49d0bfc

fix: reuse graduated load shedding calculation function

b6caa19

feat: Add post-consensus load shedding component for writeback cache

fd6e7cf

feat: test writeback cache load shedding

3b4c6b8

feat: ensure latest overload notifications are used for post consensu…

fe50abc

…s drop

fix: formatting

2cb7701

feat: use consensus commit output for persisting overload notifications

54ea642

fix: AuthorityOverloadConfig formatting

6c11e84

feat: explicitly make sure only user-originating transactions are dro…

98ebfdb

…pped by post consensus load shedding.

fmt correctly

1424560

fix: overlay queued overload notifications to compute drop percentage…

27ff058

… deterministically

fix: use local load shedding percentage in computation feedback loop.

f8fc8bb

feat: new test for loading overload notifications

c9e6b55

cyberphysic4l force-pushed the protocol-research/feat/post-consensus-load-shedding branch from e093081 to c9e6b55 Compare May 21, 2026 09:06

cyberphysic4l changed the base branch from consensus/feat/transaction-certificateless-flow-feature to consensus/feat/pcool-feature May 21, 2026 09:07

	if tx.0.is_user_tx_with_randomness() {
	if !enable_white_flag && tx.0.is_user_tx_with_randomness() {

Conversation

cyberphysic4l commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description of change

How the change has been tested

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

roman1e2f5p8s left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

cyberphysic4l commented Apr 23, 2026 •

edited

Loading