Skip to content

bridge: improvements#575

Merged
marcello33 merged 12 commits into
developfrom
kamui/bridge-improvements
May 19, 2026
Merged

bridge: improvements#575
marcello33 merged 12 commits into
developfrom
kamui/bridge-improvements

Conversation

@kamuikatsurgi
Copy link
Copy Markdown
Member

@kamuikatsurgi kamuikatsurgi commented Apr 23, 2026

Description

  • Batch receipt prefetch in ExtendVote: Pre-fetches L1 tx receipts in a single batch RPC call before side handler execution, with a timeout derived from CometBFT’s timeout_precommit / 2.
  • Event lookup optimisation: Replaces per-log linear ABI scan in the root chain listener with a pre-built topic → event map for O(1) lookups.
  • Bridge handler hardening: Adds early returns after json.Marshal / UnpackLog errors in all root chain log handlers (previously continued with nil/corrupt data), and normalises error log keys.
  • Clerk mempool check simplification: Replaces raw HTTP + manual base64/JSON decoding with the typed CometBFT RPC client (UnconfirmedTxs), and adds a direct event-record existence check before broadcasting.

Copilot AI review requested due to automatic review settings April 23, 2026 09:08
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes Heimdall’s bridge + vote-extension execution path by reducing redundant L1 RPC calls (batch receipt prefetch + caching), speeding up rootchain event dispatch (topic→event map), and hardening several bridge handlers to fail fast on encoding/decoding errors.

Changes:

  • Add L1 receipt prefetching (batch JSON-RPC) and introduce caches for finalized block headers + receipts.
  • Replace per-log ABI linear scans with a prebuilt topic→event lookup; stagger task ETAs and increase retry count.
  • Simplify clerk mempool checking by switching to the typed CometBFT RPC client and add an event-record existence check before broadcasting.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
helper/receipt.go Adds PrefetchReceipts batch prefetch + cache-warming for L1 receipts.
helper/call.go Introduces LRU receipt cache + finalized block header cache; adds batch receipt RPC.
helper/call_test.go Adds unit tests for finalized block cache semantics and concurrency.
go.mod Adds github.com/hashicorp/golang-lru/v2 as a direct dependency.
bridge/util/common.go Minor cleanup (whitespace-only changes in diff).
bridge/processor/base.go Replaces raw HTTP mempool fetch with typed CometBFT UnconfirmedTxs.
bridge/processor/clerk.go Uses event-record existence check before broadcasting clerk events.
bridge/listener/rootchain.go Adds event topic map, task staggering, and increases task retry count.
bridge/listener/rootchain_selfheal.go Uses event topic map lookup and guards against empty topics.
bridge/listener/rootchain_log.go Adds early returns on marshal/unpack errors; normalizes log keys.
bridge/listener/rootchain_listener_test.go Adds tests around event map, stagger interval, and early-return behavior.
app/abci.go Adds receipt prefetch step to ExtendVoteHandler and tx-hash extraction helpers.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread bridge/processor/clerk.go Outdated
Comment thread app/abci.go
Comment thread app/abci.go
Comment thread helper/receipt.go Outdated
Comment thread bridge/listener/rootchain_listener_test.go Outdated
Comment thread bridge/processor/base.go
Comment thread bridge/listener/rootchain.go Outdated
Comment thread bridge/listener/rootchain.go Outdated
@claude
Copy link
Copy Markdown

claude Bot commented Apr 23, 2026

Code review

Found 2 issues. Checked for bugs and CLAUDE.md compliance.

Issue 1: Data race on taskStaggerDelay (bridge/listener/rootchain.go)

taskStaggerDelay is written in queryAndBroadcastEvents (runs in the StartHeaderProcess goroutine, line 93) and read in SendTaskWithDelay via rl.taskStaggerDelay (line 269). However, SendTaskWithDelay is also called from the startSelfHealing goroutine (line 102) via processCheckpointAck, processStakeUpdate (which further fans out into goroutines with sync.WaitGroup), and processStateSynced, all of which call handleLog then SendTaskWithDelay.

There is no mutex, atomic, or channel protecting this field. The Go race detector will flag this as a data race.

Suggested fix: Make taskStaggerDelay a local variable in queryAndBroadcastEvents and pass it as a parameter to handleLog/SendTaskWithDelay, since the self-heal paths do not set it and would use a stale value anyway. Alternatively, use atomic.Int64 or a sync.Mutex.

Per security.md: Tests pass with -race flag to detect data races.

Issue 2: RetryCount change from 3 to 10 breaks existing tests (bridge/listener/rootchain.go)

The change at line 265 updates signature.RetryCount from 3 to 10, but the pre-existing test file bridge/listener/rootchain_test.go still asserts the old value at line 81 (mock matcher sig.RetryCount != 3 returns false, so mock will not match) and at line 611 (require.Equal(t, 3, signature.RetryCount) will fail).

Per CLAUDE.md CI Requirements: All tests pass (make test).

@claude
Copy link
Copy Markdown

claude Bot commented Apr 23, 2026

test

@claude
Copy link
Copy Markdown

claude Bot commented Apr 23, 2026

Inline Review Comment - bridge/listener/rootchain.go line 242

Data race: taskStaggerDelay is written here and read in SendTaskWithDelay (line 269). SendTaskWithDelay is also called from the startSelfHealing goroutine (line 102). No synchronization protects this field. Fix: make taskStaggerDelay a local variable and pass it as a parameter.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 23, 2026

Inline Review Comment - bridge/listener/rootchain.go line 265

RetryCount changed from 3 to 10 but pre-existing tests in rootchain_test.go still assert value 3: line 81 (mock matcher) and line 611 (require.Equal). These tests will fail.

@kamuikatsurgi
Copy link
Copy Markdown
Member Author

@claude review.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 23, 2026

Claude finished @kamuikatsurgi's task in 3m 9s —— View job


Code Review: bridge improvements

  • Gather context and read all changed files
  • Review ABCI changes (app/abci.go) — consensus-critical path
  • Review caching layer (helper/call.go, helper/receipt.go)
  • Review cache tests (helper/call_test.go)
  • Review bridge listener changes (rootchain.go, rootchain_log.go, rootchain_selfheal.go)
  • Review bridge listener tests (rootchain_listener_test.go, rootchain_test.go)
  • Review processor changes (base.go, clerk.go)
  • Review remaining files (go.mod, bridge/util/common.go)
  • Check for previously reported issues (data race, test breakage)
  • Post comprehensive review

Found 1 issue and 3 suggestions. Previous review issues (data race, test breakage) have been addressed.

Issue

1. UnconfirmedTxs with nil limit can fetch the entire mempoolbridge/processor/base.go:208

bp.httpClient.UnconfirmedTxs(ctx, nil) passes nil for the limit parameter, which fetches all unconfirmed transactions. During network congestion this could return thousands of transactions, causing unnecessary memory allocation and iteration on the bridge process (which is not on the consensus-critical path, but still an operational concern).

Consider passing a reasonable limit. The function only checks for clerk MsgEventRecord txs with a matching (txHash, logIndex) pair — there's no need to fetch the entire mempool:

limit := 1000 // reasonable upper bound
result, err := bp.httpClient.UnconfirmedTxs(ctx, &limit)

Suggestions

1. Stagger race between batch processing and self-healbridge/listener/rootchain.go:247-251 / rootchain_selfheal.go:367-369

The data race is fixed with staggerMu — good. However, there's a minor timing interaction: if queryAndBroadcastEvents (goroutine A) sets taskStaggerDelay = 5s then calls handleLog, and concurrently processEvent (self-heal goroutine B) resets taskStaggerDelay = 0 before goroutine A's SendTaskWithDelay reads it, the batch event loses its stagger. This isn't a safety issue (the event still fires), but it reduces the effectiveness of the thundering-herd mitigation during concurrent batch + self-heal processing.

A cleaner approach would be to pass the stagger as a parameter to handleLog/SendTaskWithDelay instead of storing it as mutable shared state. For self-heal, pass 0; for batch, pass the computed delay. This eliminates the shared state entirely.

2. Hardcoded prefetch timeoutapp/abci.go:975

prefetchReceiptsTimeout = 500ms assumes timeout_precommit = 1s. If operators configure a different timeout_precommit, this constant won't track it. Since this is on the consensus-critical ExtendVoteHandler path, an oversized timeout could delay vote extension delivery. Consider reading the value from CometBFT config, or at minimum adding a comment that operators should adjust this if they change timeout_precommit.

3. PR description says "retry count from 3 to 10" but the code uses 5 (bridge/listener/rootchain.go:273). Minor doc inconsistency — consider updating the PR description.

What looks good

  • Receipt caching design is sound. The LRU cache stores immutable receipt data; finality is always rechecked in GetConfirmedTxReceipt. The monotonic finalized block cache correctly avoids regression. Thread-safety is handled by hashicorp/golang-lru (thread-safe by default) and sync.RWMutex for the finalized block cache.

  • Event map optimization (rootchain.go:66-72) is a clean improvement. The e := event copy inside the range loop correctly avoids the loop variable capture issue. O(1) lookup replaces per-log linear ABI scan.

  • Handler hardening (rootchain_log.go) — all handlers now return early on json.Marshal and UnpackLog errors, preventing nil/corrupt data from reaching SendTaskWithDelay. Previously several handlers continued after errors.

  • Clerk existence check (clerk.go:114-128) — checking GetClerkEventRecord before broadcasting avoids unnecessary transactions for already-processed events.

  • Mempool check simplification (base.go:188-237) — replacing raw HTTP + manual base64/JSON decoding with the typed UnconfirmedTxs RPC client is cleaner and less error-prone. The nil check for bp.httpClient (line 196) handles test/benchmark cases.

  • PrefetchReceipts (helper/receipt.go) — correctly type-asserts to *ContractCaller (line 68-71), filters already-cached hashes (line 74-80), and uses batch RPC. The "finality checked later" design is safe since GetConfirmedTxReceipt always verifies finality.

  • Test coverage — new tests for event map, stagger behavior, early-return on bad ABI, finalized block cache semantics, and concurrency. The TestFinalizedBlockCache_ConcurrentAccess test with 100 goroutines will catch race conditions under -race.

  • BatchGetMainChainTxReceipts (helper/call.go:1192-1222) — properly validates receipt.BlockNumber != nil before caching, has nil check for MainChainRPCClient, and gracefully handles per-element errors.


@claude
Copy link
Copy Markdown

claude Bot commented Apr 23, 2026

Code review

Found 1 issue. Checked for bugs and CLAUDE.md compliance.

Issue: Stale taskStaggerDelay leaks into self-heal checkpoint ACK path (bridge/listener/rootchain_selfheal.go)

processCheckpointAck calls SendTaskWithDelay at line 174 without resetting taskStaggerDelay to 0 first. In contrast, processEvent correctly resets it at line 367-369. Since SendTaskWithDelay now reads and applies taskStaggerDelay to the ETA, checkpoint ACK tasks dispatched by self-heal will inherit whatever stagger was left over from the last batch of events processed by queryAndBroadcastEvents, adding an unpredictable delay.

Comment thread helper/call.go Outdated
Comment thread bridge/listener/rootchain.go Outdated
Comment thread bridge/listener/rootchain.go Outdated
Copilot AI review requested due to automatic review settings April 24, 2026 11:08
@kamuikatsurgi kamuikatsurgi changed the title bridge: batch prefetch receipts, cache L1 data, harden event handlers bridge: improvements Apr 24, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 24, 2026

Codecov Report

❌ Patch coverage is 37.72242% with 175 lines in your changes missing coverage. Please review.
✅ Project coverage is 42.4%. Comparing base (fcf72a4) to head (c450f22).

Files with missing lines Patch % Lines
helper/call.go 40.5% 47 Missing ⚠️
bridge/listener/rootchain.go 23.7% 45 Missing ⚠️
app/abci.go 62.5% 22 Missing and 5 partials ⚠️
bridge/processor/base.go 0.0% 23 Missing ⚠️
bridge/listener/rootchain_log.go 32.0% 17 Missing ⚠️
helper/receipt.go 40.0% 8 Missing and 1 partial ⚠️
bridge/listener/rootchain_selfheal.go 0.0% 4 Missing ⚠️
bridge/processor/clerk.go 0.0% 2 Missing ⚠️
bridge/util/common.go 0.0% 1 Missing ⚠️

❌ Your patch check has failed because the patch coverage (37.7%) is below the target coverage (90.0%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff            @@
##           develop    #575     +/-   ##
=========================================
+ Coverage     42.1%   42.4%   +0.2%     
=========================================
  Files          172     172             
  Lines        17722   17899    +177     
=========================================
+ Hits          7476    7601    +125     
- Misses        9167    9210     +43     
- Partials      1079    1088      +9     
Files with missing lines Coverage Δ
bridge/listener/heimdall.go 8.8% <100.0%> (ø)
bridge/util/common.go 19.5% <0.0%> (-1.8%) ⬇️
bridge/processor/clerk.go 0.0% <0.0%> (ø)
bridge/listener/rootchain_selfheal.go 0.0% <0.0%> (ø)
helper/receipt.go 72.7% <40.0%> (-27.3%) ⬇️
bridge/listener/rootchain_log.go 35.8% <32.0%> (+21.4%) ⬆️
bridge/processor/base.go 1.8% <0.0%> (+0.3%) ⬆️
app/abci.go 58.7% <62.5%> (+0.3%) ⬆️
bridge/listener/rootchain.go 8.3% <23.7%> (+8.3%) ⬆️
helper/call.go 14.5% <40.5%> (+4.1%) ⬆️

... and 1 file with indirect coverage changes

Files with missing lines Coverage Δ
bridge/listener/heimdall.go 8.8% <100.0%> (ø)
bridge/util/common.go 19.5% <0.0%> (-1.8%) ⬇️
bridge/processor/clerk.go 0.0% <0.0%> (ø)
bridge/listener/rootchain_selfheal.go 0.0% <0.0%> (ø)
helper/receipt.go 72.7% <40.0%> (-27.3%) ⬇️
bridge/listener/rootchain_log.go 35.8% <32.0%> (+21.4%) ⬆️
bridge/processor/base.go 1.8% <0.0%> (+0.3%) ⬆️
app/abci.go 58.7% <62.5%> (+0.3%) ⬆️
bridge/listener/rootchain.go 8.3% <23.7%> (+8.3%) ⬆️
helper/call.go 14.5% <40.5%> (+4.1%) ⬆️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread app/abci.go Outdated
Comment thread go.mod Outdated
@kamuikatsurgi
Copy link
Copy Markdown
Member Author

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Apr 24, 2026

Claude finished @kamuikatsurgi's task in 5m 1s —— View job


Code Review: bridge improvements

  • Gather context and read all changed files
  • Review consensus-critical path (app/abci.go, app/abci_test.go)
  • Review caching layer (helper/call.go, helper/receipt.go, helper/call_test.go)
  • Review bridge listener (rootchain.go, rootchain_log.go, rootchain_selfheal.go)
  • Review bridge listener tests (rootchain_listener_test.go, rootchain_test.go, rootchain_log_test.go)
  • Review processor changes (base.go, clerk.go)
  • Review remaining files (bridge/util/common.go)
  • Check status of previously reported issues
  • Post comprehensive review

Found 2 issues and 2 suggestions. All previously reported issues have been addressed.

Issues

1. getPrefetchMu lazy initialization is not thread-safehelper/call.go:1341-1346

func (c *ContractCaller) getPrefetchMu() *sync.RWMutex {
	if c.prefetchMu == nil {
		c.prefetchMu = &sync.RWMutex{}
	}
	return c.prefetchMu
}

This check-then-write has a TOCTOU race. If two goroutines call getPrefetchMu() concurrently on a ContractCaller where prefetchMu is nil, both see nil, both create a new sync.RWMutex, and each operates on a different instance — silently defeating all mutex protection. NewContractCaller() initializes the field, so this only affects manually-constructed instances (tests today), but the pattern is dangerous if construction paths change later. Fix: use sync.Once or always initialize eagerly in all construction paths, and remove the lazy init.

Fix this →

2. Duplicate import of checkpoint typesapp/abci.go:28-29

"github.com/0xPolygon/heimdall-v2/x/checkpoint/types"
checkpointTypes "github.com/0xPolygon/heimdall-v2/x/checkpoint/types"

The same package is imported twice under different aliases. This compiles (Go allows it if both names are used) but is confusing, redundant, and likely unintentional. Use one or the other consistently.

Fix this →

Suggestions

1. UnconfirmedTxs with nil limit fetches entire mempoolbridge/processor/base.go:208

bp.httpClient.UnconfirmedTxs(ctx, nil) passes nil for the limit, which fetches all unconfirmed transactions. During congestion this could be thousands of txs, causing unnecessary memory allocation and iteration. Consider passing a reasonable upper bound:

limit := 1000
result, err := bp.httpClient.UnconfirmedTxs(ctx, &limit)

2. Potential deadlock on panic during lock holdhelper/call.go:524-536 + app/abci.go:251-256

If a panic occurs while prefetchMu.RLock() is held inside getOrFetchReceipt or GetConfirmedTxReceipt, the deferred EndPrefetchRound() in ExtendVoteHandler will call prefetchMu.Lock(), causing a deadlock (RLock held → panic unwinds → defer calls Lock → blocks forever). The code between lock/unlock is simple enough that panics are extremely unlikely in practice, but using defer prefetchMu.RUnlock() immediately after each lock acquisition would make this bulletproof.

Previously reported issues — Status

All previously reported issues have been addressed:

Issue Status
Data race on taskStaggerDelay ✅ Fixed — stagger passed as parameter, no shared state
RetryCount test breakage (3→5) ✅ Fixed — tests updated
Stale stagger in self-heal paths ✅ Fixed — self-heal passes 0
Off-by-one in stagger (first event = 1s) ✅ Fixed — increment after use, first event = 0
Receipt LRU cache reorg risk ✅ Redesigned — round-scoped map with Begin/EndPrefetchRound
httpClient nil check in base processor ✅ Added
TxHash validation for MsgEventRecord ✅ Added verifyHexTxHash with hexutil.Decode + length check
Prefetch log level too high ✅ Downgraded to Debug

What looks good

  • Round-scoped prefetch lifecycleBeginPrefetchRound/EndPrefetchRound cleanly scopes the receipt cache to a single ExtendVote round. Combined with the finality check in GetConfirmedTxReceipt (which always verifies against a finalized block before accepting), this is safe for post-merge Ethereum where finalized blocks are never reorged.

  • Stagger as parameter — Passing staggerDelay through handleLog → handler → SendTaskWithDelay eliminates all shared mutable state. Self-heal correctly passes 0. This is exactly the fix reviewers previously suggested.

  • Event map optimization (rootchain.go:61-67) — e := event copy inside the range loop correctly avoids the loop variable capture issue. O(1) topic→event lookup replaces per-log linear ABI scan.

  • Handler hardening (rootchain_log.go) — All handlers now return early on json.Marshal and UnpackLog errors, plus error log keys are normalized to lowercase "error".

  • TxHash validation (app/abci.go:1083-1089) — verifyHexTxHash properly validates 0x-prefixed 32-byte hex; verifyTxHash checks len == 32. Both prevent malformed hashes from entering the prefetch batch. Good test coverage in TestExtractTxHashMsgEventRecordValidation.

  • Clerk existence check (clerk.go:114-128) — Checking GetClerkEventRecord before broadcasting avoids unnecessary mempool queries and duplicate transactions.

  • Mempool check simplification (base.go:188-237) — Typed UnconfirmedTxs RPC client replaces raw HTTP + manual base64/JSON decoding. Early type assertion to *clerkTypes.MsgEventRecord simplifies the matching logic.

  • Test coverage — Good new tests for event map lookups, stagger semantics, early-return on bad ABI, prefetch round lifecycle, concurrent access, and TxHash validation edge cases.


Comment thread bridge/processor/base.go
Copilot AI review requested due to automatic review settings April 24, 2026 11:54
@kamuikatsurgi
Copy link
Copy Markdown
Member Author

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Apr 24, 2026

Claude finished @kamuikatsurgi's task in 2m 21s —— View job


Code Review: bridge improvements

  • Gather context and read all changed files
  • Review consensus-critical path (app/abci.go, app/abci_test.go)
  • Review caching layer (helper/call.go, helper/receipt.go, helper/call_test.go)
  • Review bridge listener (rootchain.go, rootchain_log.go, rootchain_selfheal.go)
  • Review bridge listener tests (rootchain_listener_test.go, rootchain_test.go, rootchain_log_test.go)
  • Review processor changes (base.go, clerk.go)
  • Review remaining files (bridge/util/common.go)
  • Check status of previously reported issues
  • Post comprehensive review

Found 2 issues and 2 suggestions. All previously reported issues have been addressed.

Issues

1. Duplicate import in new test filebridge/listener/rootchain_listener_test.go:9-10

"github.com/ethereum/go-ethereum/common"
ethCommon "github.com/ethereum/go-ethereum/common"

The same package is imported under two names. Go technically allows this (both aliases resolve and are used: common.HexToHash on line 19 and ethCommon.Hash on line 17), but it's confusing and unnecessary. The production code in rootchain.go already uses ethCommon consistently. Use one alias throughout. Copilot also flagged this — some Go versions may reject it outright.

Fix this →

2. PrefetchReceipts overwrites the prefetched map instead of merginghelper/receipt.go:81-83

caller.prefetchMu.Lock()
caller.prefetchedReceipts = receipts
caller.prefetchMu.Unlock()

BatchGetMainChainTxReceipts only returns receipts where receipt.BlockNumber != nil and the individual batch element succeeded. If some hashes fail (e.g., RPC timeout on a subset), the returned receipts map is smaller than what was requested. But PrefetchReceipts replaces the entire prefetchedReceipts map (which BeginPrefetchRound initialized as empty). This is fine today because PrefetchReceipts is called exactly once per round.

However, if PrefetchReceipts were ever called twice in the same round (e.g., retry logic), the second call would overwrite the first call's results. This is a latent issue, not currently triggered. Worth a comment or defensive merge:

caller.prefetchMu.Lock()
for h, r := range receipts {
    caller.prefetchedReceipts[h] = r
}
caller.prefetchMu.Unlock()

Suggestions

1. prefetchMu nil on zero-value ContractCallerhelper/call.go

BeginPrefetchRound and EndPrefetchRound both call c.prefetchMu.Lock() without a nil guard. Tests TestBeginPrefetchRound_ZeroValuePanics and TestEndPrefetchRound_ZeroValuePanics correctly document this panics on zero-value ContractCaller. In production, NewContractCaller() always initializes it, so this is safe. But if any future code path constructs a ContractCaller{} directly and reaches ExtendVoteHandler, it'll crash the node. Consider either: (a) a nil guard at the top of BeginPrefetchRound/EndPrefetchRound, or (b) always initialize prefetchMu in the struct literal (field default).

2. checkTxAgainstMempool limited to 100 txsbridge/processor/base.go:208-209

limit := 100
result, err := bp.httpClient.UnconfirmedTxs(ctx, &limit)

Good that you addressed the previous nil-limit issue (which defaulted to 30). CometBFT caps at maxPerPage=100 server-side. Under heavy load, duplicate clerk txs beyond position 100 won't be caught, and the processor will re-broadcast. This isn't a safety issue (Heimdall rejects duplicates downstream), but worth a comment noting the server-side cap and that this is a best-effort dedup.

Previously Reported Issues — Status

All previously reported issues have been addressed:

Issue Status
Data race on taskStaggerDelay ✅ Fixed — stagger passed as local variable/parameter, no shared state
RetryCount test breakage (3→5) ✅ Fixed — tests updated to expect 5
Stale stagger in self-heal paths ✅ Fixed — self-heal passes 0 explicitly
Off-by-one in stagger (first event = 1s) ✅ Fixed — increment after assignment, first event = 0
httpClient nil check in base processor ✅ Added at line 196
TxHash validation for MsgEventRecord ✅ Added verifyHexTxHash with hexutil.Decode + length check
Prefetch log level too high ✅ Downgraded to Debug
Receipt cache reorg risk (LRU) ✅ Redesigned — round-scoped map with Begin/EndPrefetchRound, no persistent LRU
UnconfirmedTxs nil limit (30 default) ✅ Fixed — explicit limit := 100
Stagger counter inflation from skipped logs ✅ Fixed — validLogCount only increments for matched events

What Looks Good

  • Round-scoped prefetch lifecycleBeginPrefetchRound/EndPrefetchRound cleanly bounds the receipt cache to a single ExtendVote round. The finalized header cache is also reset per-round, preventing stale finality data from persisting. The fast path in GetConfirmedTxReceipt (line 789-793) checks cachedFinalizedHeader and falls through to a fresh GetMainChainFinalizedBlock() call if the receipt block is ahead — correct design.

  • Stagger as parameter — Passing staggerDelay through handleLog → handler → SendTaskWithDelay as a time.Duration parameter eliminates all shared mutable state. Self-heal paths correctly pass 0. This is the clean solution reviewers suggested.

  • Event map optimization (rootchain.go:61-67) — e := event copy inside the range loop correctly avoids the Go loop variable capture issue. O(1) topic→event lookup replaces per-log linear ABI scan.

  • Handler hardening (rootchain_log.go) — All handlers now return early on json.Marshal and UnpackLog errors, preventing nil/corrupt data from reaching SendTaskWithDelay. Error log keys are normalized to lowercase "error".

  • TxHash validation (app/abci.go:1082-1089) — verifyTxHash checks len == 32 for byte hashes; verifyHexTxHash uses hexutil.Decode + length check for string hashes. Both prevent malformed hashes from entering the prefetch batch. Test coverage in TestExtractTxHashMsgEventRecordValidation covers valid, non-prefixed, and short hashes.

  • Clerk existence check (clerk.go:114-128) — GetClerkEventRecord check before broadcasting avoids unnecessary mempool queries and duplicate transactions.

  • Mempool check simplification (base.go:188-237) — Typed UnconfirmedTxs RPC client replaces raw HTTP + manual base64/JSON decoding. Direct type assertion to *clerkTypes.MsgEventRecord simplifies matching logic.

  • BatchGetMainChainTxReceipts (helper/call.go:1150-1180) — Properly validates receipt.BlockNumber != nil before including in results, has nil check for MainChainRPCClient, and gracefully handles per-element errors.

  • Test coverage — Good new tests for event map lookups, stagger semantics, early-return on bad ABI, prefetch round lifecycle, concurrent access, and TxHash validation edge cases.


Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread bridge/listener/rootchain_listener_test.go
Comment thread bridge/listener/rootchain.go
Comment thread bridge/listener/rootchain_listener_test.go Outdated
Comment thread bridge/processor/clerk.go
@claude
Copy link
Copy Markdown

claude Bot commented Apr 24, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

marcello33
marcello33 previously approved these changes Apr 30, 2026
Copilot AI review requested due to automatic review settings May 5, 2026 09:20
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented May 5, 2026

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread app/abci.go
@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

@marcello33 marcello33 merged commit 598beb9 into develop May 19, 2026
25 of 26 checks passed
@kamuikatsurgi kamuikatsurgi deleted the kamui/bridge-improvements branch May 19, 2026 11:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants