bridge: implement additional methods for self_heal#584
Conversation
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
|
@claude review |
|
Claude finished @marcello33's task in 2m 29s —— View job Code Review: bridge self_heal improvements
SummaryGood refactor that generalizes the self-heal loop from handling only I found one logic bug, one misleading log, a documentation inaccuracy, and a few minor items. Issues1. Bug: misleading "remaining" log when loop breaks early on failure
if remaining := l1MaxNonce - heimdallNonce - queued; remaining > 0 {
rl.Logger.Warn("Self-healing: stake event backlog exceeds per-cycle limit; ...")
}The loop (line 217-225) breaks early when Suggestion: Distinguish the two exit reasons: hitLimit := queued >= maxStakeNoncesPerCycle
total := l1MaxNonce - heimdallNonce
if hitLimit && total > queued {
rl.Logger.Warn("Self-healing: stake event backlog exceeds per-cycle limit; will continue next cycle", ...)
}No log is needed on the failure path — 2. Documentation: incorrect event names in
|
Codecov Report❌ Patch coverage is ❌ Your patch check has failed because the patch coverage (65.1%) is below the target coverage (90.0%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## develop #584 +/- ##
=========================================
+ Coverage 42.1% 43.0% +0.8%
=========================================
Files 172 172
Lines 17722 17823 +101
=========================================
+ Hits 7476 7678 +202
+ Misses 9167 9044 -123
- Partials 1079 1101 +22
... and 2 files with indirect coverage changes
🚀 New features to boost your workflow:
|
Code ReviewTwo cross-chain security issues found in the newly added Issue 1: Missing receipt.Status check before trusting event logs File: heimdall-v2/bridge/listener/rootchain_selfheal_graph.go Lines 336 to 342 in bacaef6 After fetching the L1 transaction receipt, the code extracts logs without verifying the transaction succeeded. A reverted transaction can still carry logs, and a compromised subgraph endpoint could return a tx hash for a failed transaction. Rule violated (cross-chain.md, L1 Receipt Validation): Check receipt.Status == 1 (success) before trusting event logs -- reverted txs still emit events in some edge cases Red Flag (cross-chain.md): Trusting event data without receipt status check Suggested fix: Add a Issue 2: Missing log.Address verification against expected StakingInfo contract File: heimdall-v2/bridge/listener/rootchain_selfheal_graph.go Lines 341 to 348 in bacaef6 The log is selected solely by its numeric index from the receipt. It is never verified that The normal (non-self-heal) listener path uses Rules violated:
Suggested fix: After finding the log with Note: The pre-existing |
|
@claude addressed all your points. Review again. |
|
Claude encountered an error —— View job I'll analyze this and get back to you. |
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. |
|



Summary
Extends the bridge self-heal loop to cover the full set of nonce-gated stake events emitted by the L1
StakingInfocontract:StakeUpdate(already covered), plusSignerChangeandUnstakeInit(new). The three events share a single per-validator nonce counter, so recovery is unified into one combined GraphQL query that returns the max L1 nonce across all three entities, then fetches and replays the missing event when Heimdall lags.Also, this PR fixes the buggy bridge related metrics which were not reporting correctly.
Operator note: metric rename
The three self-heal Prometheus metrics have been renamed and moved into the
heimdallv2namespace to match the rest of heimdall's metrics and follow Prometheus naming convention (snake_case +_totalsuffix). The previous mixed-case names were silently filtered out by Datadog/Grafana queries.self_healing_<chain>_StakeUpdateheimdallv2_self_healing_stake_events_processed_totalself_healing_<chain>_StateSyncedheimdallv2_self_healing_state_syncs_processed_totalself_healing_<chain>_NewHeaderBlockheimdallv2_self_healing_checkpoint_acks_processed_totalUpdate dashboards / alerting rules accordingly at deploy time.
Subgraph dependency
The new query path requires
signerChangesandunstakeInitsentities in the deployed Polygon PoS subgraph, added hereCross-chain validation hardening
getStakeEventLogByNoncenow validates the L1 receipt before trusting the log:receipt.Status == ReceiptStatusSuccessful(rejects reverted txs)log.Address == StakingInfo address from ChainManager params(rejects logs from unexpected contracts)Rollout
pos-opssub_graph_urlis already switched tov0.0.2, so this PR is deployment-ready.Test plan
Self-healing: retrieved nonces for validatorlines in logs and nosubgraph returned errors.heimdallv2_self_healing_stake_events_processed_totalexposed and incrementing on a synthetic stuck nonce.