feat(iota-core): Post consensus load shedding#11301
Conversation
a827011 to
7464fd6
Compare
0cb3220 to
0f921e7
Compare
5a62a09 to
8bf43a5
Compare
8bf43a5 to
0808cfd
Compare
| let mut weighted_values: Vec<(u8, StakeUnit)> = committee | ||
| .members() | ||
| .map(|(authority, stake)| { | ||
| let percentage = notifications.get(authority).copied().unwrap_or(0); |
There was a problem hiding this comment.
Validators that haven't sent a notification default to 0%. A malicious validator could stay silent to keep the quorum shedding percentage low. Maybe require periodic heartbeats even when unchanged? Or do I miss something?
There was a problem hiding this comment.
but again - what do you do if somebody doesn't send the heartbeat? you treat it as zero?
There was a problem hiding this comment.
A malicious validator could also simply broadcast a value of 0 and achieve the same effect, nothing enforces particular values to be sent, it is all based on local an unprovable measurements. The security against malicious behaviour comes from the fact that the stake weighted quorum percentile is used to compute the effective load shedding value, limiting the ability of malicious validators to bias the load shedding percentage.
| .overload_info | ||
| .load_shedding_percentage | ||
| .load(std::sync::atomic::Ordering::Relaxed); | ||
| if current != last_notified_percentage { |
There was a problem hiding this comment.
This only sends when the percentage changes. New validators or validators recovering from partitions won't get fresh data if the system is stable. Maybe send a heartbeat every N intervals even when unchanged?
There was a problem hiding this comment.
This data is only relevant during an epoch and the overload monitor is reset at the epoch boundary. I don't think it's necessary to issue every N intervals with the same status. If a node has restarted and is catching up from either clean or outdated consensus database, then all of the overload notifications live inside the DAG anyway, and will be process by consensus_handler in the correct order anyway. Could you describe the scenario where not issuing at regular intervals would be problematic?
There was a problem hiding this comment.
This is an important concern that you've highlighted and I have been trying to figure out exactly how different new validator or crash and recovery situations will work. If I understand correctly now, validators will always process all commits in order, and when processing a new commit (where load shedding is done) the epoch store tables will always reflect the latest overload notification from each authority. If there was no notificiation in the epoch, the value is zero. So getting fresh data is not a concern in the sense that the data comes from commits and we are guaranteed to have fresh commit data when we need it.
Although getting fresh data from commits is not a problem, broadcasting fresh data about local conditions may be what you mean. So if a validator was overloaded when it crashed and hence the committed overload notification for this validator has a high load shedding percentage, they may not update this by broadcasting a new value when they recover from the crash. I think this may be better solved by broadcasting (even a zero value) upon start up to make sure each validator has put out a fresh value since last starting up. Does that make sense or do you think there is another reason for having a heartbeat-style approach to broadcasting?
| // quorum load shedding percentage. | ||
| if drop_percentage > 0 { | ||
| if let Some(digest) = tx.0.transaction.executable_transaction_digest() { | ||
| if should_reject_tx(drop_percentage, digest, drop_seed) { |
There was a problem hiding this comment.
should we make sure to not drop UserTransactions such as OverloadNotification or AuthorityCapabilities? OverloadNotification is more crucial I think. Or is that handled someplace else?
There was a problem hiding this comment.
This was actually handled already, executable_transaction_digest() would return digests only for user originating transactions and system transactions, but system transactions already dealt with in earlier match arm. I've made it more explicit now by having the user_transaction_digest() method which will only return digests for user originated transaction, so OverloadNotification and AuthorityCapabilities etc will never be dropped by this load shedding.
| .overload_info | ||
| .load_shedding_percentage | ||
| .load(std::sync::atomic::Ordering::Relaxed); | ||
| if current != last_notified_percentage { |
There was a problem hiding this comment.
This data is only relevant during an epoch and the overload monitor is reset at the epoch boundary. I don't think it's necessary to issue every N intervals with the same status. If a node has restarted and is catching up from either clean or outdated consensus database, then all of the overload notifications live inside the DAG anyway, and will be process by consensus_handler in the correct order anyway. Could you describe the scenario where not issuing at regular intervals would be problematic?
| kind: ConsensusTransactionKind::OverloadNotificationV1(authority, percentage), | ||
| .. | ||
| }) => { | ||
| if &transaction.sender_authority() != authority { |
There was a problem hiding this comment.
validate_transactions_feature_gating - does this test pass if there is no feature gating on this transaction kind? Perhaps the test is missing something, but in general we don't want to allow malicious validators to issue a transaction kind before it's supported by a feature flag despite the software version being able to deserialize it. Say, we merge this to develop and release, but mainnet/testnet does not have pcool enabled. Then all nodes would try to process this transaction kind (and potentially crash) without protocol supporting this feature. That's what the validate_transactions_feature_gating test was supposed to catch, so maybe it's actually gated somehow, but I figured I'd ask anyway just to make sure.
There was a problem hiding this comment.
Ok, I see this is checked in consensus_validator.rs, but I added explicit check here for SignedCapabilityNotification and there is a similar TODO for UserTransactionV1, so I guess we should clean this up at some point.
| let mut weighted_values: Vec<(u8, StakeUnit)> = committee | ||
| .members() | ||
| .map(|(authority, stake)| { | ||
| let percentage = notifications.get(authority).copied().unwrap_or(0); |
There was a problem hiding this comment.
but again - what do you do if somebody doesn't send the heartbeat? you treat it as zero?
| "Percentage of transactions shed due to consensus queue length.", | ||
| registry) | ||
| .unwrap(), | ||
| cache_backpressure_load_shedding_percentage: register_int_gauge_with_registry!( |
There was a problem hiding this comment.
we should add metrics to count how many transactions were actually shedded post-consensus. Optionally we should count how many notifications we've received from each authority, but I'm not sure if that's worth polluting the metrics (we have too many already 😅).
014f91c to
c520090
Compare
c520090 to
8afda17
Compare
3d4b491 to
76e20ea
Compare
76e20ea to
aabd9b6
Compare
roman1e2f5p8s
left a comment
There was a problem hiding this comment.
one test I would add is to exercise the actual drop behavior inside process_consensus_transactions_and_commit_boundary.
see test_consensus_commit_prologue_generation at authority_tests.rs:4671 for how such a test could be constructed
| } | ||
| } | ||
|
|
||
| if tx.0.is_user_tx_with_randomness() { |
There was a problem hiding this comment.
| if tx.0.is_user_tx_with_randomness() { | |
| if !enable_white_flag && tx.0.is_user_tx_with_randomness() { |
see #10764 why it should NOT be removed.
| let mut last_notified_percentage: u32 = state | ||
| .overload_info | ||
| .local_load_shedding_percentage | ||
| .load(std::sync::atomic::Ordering::Relaxed); |
There was a problem hiding this comment.
shouldn't last_notified_percentage be initialized from the persisted view instead of the atomic on startup? something like
let mut last_notified_percentage: u32 = epoch_store
.load_overload_notifications()
.expect("...")
.get(&authority_name)
.copied()
.map(u32::from)
.unwrap_or(0);| ConsensusTransactionKind::OverloadNotificationV1(_, _) => { | ||
| if !self.epoch_store.protocol_config().enable_white_flag_flow() { | ||
| return Err(IotaError::UnsupportedFeature { | ||
| error: | ||
| "OverloadNotificationV1 not supported at current protocol version" | ||
| .into(), | ||
| }); | ||
| } | ||
| } |
There was a problem hiding this comment.
| ConsensusTransactionKind::OverloadNotificationV1(_, _) => { | |
| if !self.epoch_store.protocol_config().enable_white_flag_flow() { | |
| return Err(IotaError::UnsupportedFeature { | |
| error: | |
| "OverloadNotificationV1 not supported at current protocol version" | |
| .into(), | |
| }); | |
| } | |
| } | |
| ConsensusTransactionKind::OverloadNotificationV1(authority_name, percentage) => { | |
| if !self.epoch_store.protocol_config().enable_white_flag_flow() { | |
| return Err(IotaError::UnsupportedFeature { | |
| error: | |
| "OverloadNotificationV1 not supported at current protocol version" | |
| .into(), | |
| }); | |
| } | |
| if *percentage > 100 { | |
| return Err(IotaError::HandleConsensusTransactionFailure(format!( | |
| "OverloadNotificationV1 with invalid percentage {percentage} from \ | |
| authority {}", | |
| authority_name.concise(), | |
| ))); | |
| } | |
| } |
not just feature gate, but worth doing a cheap check - some other variants do that here.
…pped by post consensus load shedding.
… deterministically
e093081 to
c9e6b55
Compare
Description of change
Adds a post-consensus load shedding path for the certificate-less (white-flag) flow. Validators now broadcast their measured load shedding percentage to peers via a new consensus transaction, and each validator deterministically drops user transactions during post-consensus categorization based on the stake-weighted quorum percentile of those load shedding percentages — so the entire committee sheds the same set of transactions in the same round.
Prior to white-flag changes, the load shedding percentage was only used locally and was based on execution queue latency. There was also a transactions manager queue based threshold and writeback cache threshold above which everything was dropped. Now the previous latency based load shedding percentage is combined with a transaction manager queue and writeback cache based load shedding percentage, and all the dropping for these is done after consensus.
How the change has been tested