[consensus] HACK: Skip proposals to simulate proposer faults in P90 test#19328

Closed
danielxiangzl wants to merge 3 commits into main from daniel/latency-skip-proposal

@danielxiangzl
Contributor

Summary

  • ~20% of validators (last address byte mod 10 < 2) skip both regular and optimistic proposals 50% of the time (even rounds)
  • Lightweight alternative to killing pods — simulates proposer failures directly in consensus
  • Prototype/experiment code — not for merge to main
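The selection rule in the summary can be sketched as follows. This is a minimal illustration, not the PR's actual code: the function names `is_designated_faulty` and `should_skip_proposal`, and the choice to model an address as a 32-byte array, are assumptions made for the example.

```rust
/// Hypothetical helper: a validator is designated "faulty" when the last
/// byte of its address, taken mod 10, is less than 2.
fn is_designated_faulty(address: &[u8; 32]) -> bool {
    address[31] % 10 < 2
}

/// Hypothetical helper: a designated validator skips its proposal
/// (regular or optimistic) whenever the round number is even.
fn should_skip_proposal(address: &[u8; 32], round: u64) -> bool {
    is_designated_faulty(address) && round % 2 == 0
}
```

Of the 256 possible last-byte values, 52 satisfy `byte % 10 < 2`, which is where the "~20%" figure comes from if last bytes are roughly uniform. If proposers were also elected uniformly, about 20% x 50% = 10% of rounds would see no proposal; leader reputation is expected to shift that rate once it reacts.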

Test plan

  • Run forge land_blocking (P90 latency test) and observe round timeout rate
  • Verify faulty validators trigger timeouts and leader reputation kicks in

🤖 Generated with Claude Code

danielxiangzl and others added 2 commits April 3, 2026 13:11
Replaces the land_blocking forge suite with a latency-focused test that
uses a mainnet-representative validator distribution (~70% EU, ~20% NA,
~10% Asia) instead of the previous even 25%/25%/25%/25% four-region split.

The even split over-weights Asia (25% vs ~2% on mainnet) and under-weights
EU, making P90 thresholds misleading. The new distribution causes EU
proposers to dominate rounds as they do on mainnet, exercising the actual
latency bottlenecks (distant proposers racing with EU batch arrival).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
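To make the distribution in this commit message concrete, the sketch below splits a validator count across regions by percentage weight. It is an illustration only: `split_by_weight` is a hypothetical helper, the region labels are placeholders, and the largest-remainder rounding is an assumption rather than what the forge suite does.

```rust
/// Split `total` validators across regions in proportion to `weights_pct`
/// (percentages assumed to sum to 100), using the largest-remainder method.
fn split_by_weight(
    total: usize,
    weights_pct: &[(&'static str, usize)],
) -> Vec<(&'static str, usize)> {
    // For each region: (name, floor share, remainder out of 100).
    let mut shares: Vec<(&'static str, usize, usize)> = weights_pct
        .iter()
        .map(|&(name, pct)| (name, total * pct / 100, total * pct % 100))
        .collect();
    let assigned: usize = shares.iter().map(|s| s.1).sum();

    // Hand the leftover seats to the regions with the largest remainders.
    let mut order: Vec<usize> = (0..shares.len()).collect();
    order.sort_by(|&a, &b| shares[b].2.cmp(&shares[a].2));
    for &i in order.iter().take(total - assigned) {
        shares[i].1 += 1;
    }

    shares.into_iter().map(|(name, n, _)| (name, n)).collect()
}
```

For a hypothetical 20-validator cluster, the 70/20/10 weights give 14, 4, and 2 validators, whereas the previous even four-region split would give 5 per region.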
@danielxiangzl danielxiangzl added the CICD:run-forge-e2e-perf Run the e2e perf forge only label Apr 4, 2026
Last 2 validators (by ordered index) skip both regular and optimistic
proposals 50% of the time (even rounds). Uses last 2 to avoid EU nodes
which dominate the front of the ordered list in mainnet-like distribution.

This is prototype/experiment code — not for merge to main.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
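The index-based rule in this commit message can be sketched as below. This is a minimal illustration under stated assumptions, not the committed code: `is_in_faulty_tail` and `should_skip_proposal_by_index` are hypothetical names, and the validator's position in the ordered list is passed in as a plain index.

```rust
/// Hypothetical helper: true if `author_index` is one of the last
/// `num_faulty` positions in an ordered list of `num_validators` validators.
fn is_in_faulty_tail(author_index: usize, num_validators: usize, num_faulty: usize) -> bool {
    author_index < num_validators && author_index + num_faulty >= num_validators
}

/// Hypothetical helper: the last 2 validators skip their proposal
/// (regular or optimistic) on even rounds.
fn should_skip_proposal_by_index(author_index: usize, num_validators: usize, round: u64) -> bool {
    is_in_faulty_tail(author_index, num_validators, 2) && round % 2 == 0
}
```

With a 20-validator set, for example, only indices 18 and 19 are affected, which is 10% of validators rather than the ~20% selected by the address-byte rule in the PR summary.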
@danielxiangzl danielxiangzl force-pushed the daniel/latency-skip-proposal branch from d911a1b to 8aa9231 Compare April 4, 2026 00:50
@github-actions
Contributor

github-actions Bot commented Apr 4, 2026

✅ Forge suite realistic_env_max_load success on 8aa9231232f954754acc0b39558bcc117684099e

Forge report malformed: Expecting property name enclosed in double quotes: line 10 column 1 (char 182)
'{\n  "metrics": [\n    {\n      "test_name": "performance benchmark",\n      "metric": "submitted_txn",\n      "value": 1427880.0\n    },\n    {\n      "test_name": "performance benchmark",\n[2026-04-04T01:24:41Z INFO  aptos_forge::report] Test Ok\n      "metric": "expired_txn",\n      "value": 0.0\n    },\n    {\n      "test_name": "performance benchmark",\n      "metric": "avg_tps",\n      "value": 3500.1082648790402\n    },\n    {\n      "test_name": "performance benchmark",\n      "metric": "avg_latency",\n      "value": 292.21386173184356\n    },\n    {\n      "test_name": "performance benchmark",\n      "metric": "p50_latency",\n      "value": 200.0\n    },\n    {\n      "test_name": "performance benchmark",\n      "metric": "p90_latency",\n      "value": 300.0\n    },\n    {\n      "test_name": "performance benchmark",\n      "metric": "p99_latency",\n      "value": 1200.0\n    }\n  ],\n  "text": "performance benchmark : committed: 3500.11 txn/s, latency: 292.21 ms, (p50: 200 ms, p70: 200, p90: 300 ms, p99: 1200 ms), latency samples: 28640\\nLatency breakdown for phase 0: [\\"MempoolToBlockCreation: max: 0.151, avg: 0.111\\", \\"ConsensusProposalToOrdered: max: 0.086, avg: 0.084\\", \\"ConsensusOrderedToCommit: max: 0.016, avg: 0.015\\", \\"ConsensusProposalToCommit: max: 0.102, avg: 0.099\\"]\\nMax non-epoch-change gap was: 1 rounds at version 14301 (avg 0.00) [limit 10], 1.47s no progress at version 1651808 (avg 0.04s) [limit 30].\\nMax epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 10], 0.00s no progress at version 0 (avg 0.00s) [limit 30].\\nTest Ok"\n}'
Trailing Log Lines:
networkchaos.chaos-mesh.org "4-gcp--as-southeast1-to-3-gcp--us-east4-netem" deleted from forge-e2e-pr-19328 namespace
[2026-04-04T01:24:30Z INFO  ureq::unit] sending request POST http://vmagent-victoria-metrics-agent.victoria-metrics.svc:8429/api/v1/import/prometheus
test CompositeNetworkTest ... ok
Test Statistics: 
performance benchmark : committed: 3500.11 txn/s, latency: 292.21 ms, (p50: 200 ms, p70: 200, p90: 300 ms, p99: 1200 ms), latency samples: 28640
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 0.151, avg: 0.111", "ConsensusProposalToOrdered: max: 0.086, avg: 0.084", "ConsensusOrderedToCommit: max: 0.016, avg: 0.015", "ConsensusProposalToCommit: max: 0.102, avg: 0.099"]
Max non-epoch-change gap was: 1 rounds at version 14301 (avg 0.00) [limit 10], 1.47s no progress at version 1651808 (avg 0.04s) [limit 30].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 10], 0.00s no progress at version 0 (avg 0.00s) [limit 30].
Test Ok

=== BEGIN JUNIT ===
<?xml version="1.0" encoding="UTF-8"?>
<testsuites name="forge" tests="1" failures="0" errors="0" uuid="5d7cea94-8e0e-4c3c-94c1-ea09267acd5b">
    <testsuite name="local" tests="1" disabled="0" errors="0" failures="0">
        <testcase name="CompositeNetworkTest(network:multi-region-network-emulation(performance benchmark)) with ">
        </testcase>
    </testsuite>
</testsuites>
=== END JUNIT ===
[2026-04-04T01:24:41Z INFO  aptos_forge::backend::k8s::cluster_helper] Deleting namespace forge-e2e-pr-19328: Some(NamespaceStatus { conditions: None, phase: Some("Terminating") })
[2026-04-04T01:24:41Z INFO  aptos_forge::backend::k8s::cluster_helper] aptos-node resources for Forge removed in namespace: forge-e2e-pr-19328
[2026-04-04T01:24:41Z INFO  ureq::unit] sending request POST http://vmagent-victoria-metrics-agent.victoria-metrics.svc:8429/api/v1/import/prometheus

test result: ok. 1 passed; 0 soft failed; 0 hard failed; 0 filtered out

Debugging output:
NAME                                         READY   STATUS      RESTARTS   AGE
aptos-node-0-validator-0                     1/1     Running     0          14m
aptos-node-1-validator-0                     1/1     Running     0          14m
aptos-node-10-validator-0                    1/1     Running     0          14m
aptos-node-11-validator-0                    1/1     Running     0          14m
aptos-node-12-validator-0                    1/1     Running     0          14m
aptos-node-13-validator-0                    1/1     Running     0          14m
aptos-node-14-validator-0                    1/1     Running     0          14m
aptos-node-15-validator-0                    1/1     Running     0          14m
aptos-node-16-validator-0                    1/1     Running     0          14m
aptos-node-17-validator-0                    1/1     Running     0          14m
aptos-node-18-validator-0                    1/1     Running     0          14m
aptos-node-19-validator-0                    1/1     Running     0          14m
aptos-node-2-validator-0                     1/1     Running     0          14m
aptos-node-3-validator-0                     1/1     Running     0          14m
aptos-node-4-validator-0                     1/1     Running     0          14m
aptos-node-5-validator-0                     1/1     Running     0          14m
aptos-node-6-validator-0                     1/1     Running     0          14m
aptos-node-7-validator-0                     1/1     Running     0          14m
aptos-node-8-validator-0                     1/1     Running     0          14m
aptos-node-9-validator-0                     1/1     Running     0          14m
forge-testnet-deployer-4h6qq                 0/1     Completed   0          14m
genesis-aptos-genesis-eforge618dfd42-nvgdn   0/1     Completed   0          14m

Labels

CICD:run-forge-e2e-perf Run the e2e perf forge only
