L-02 Fix false clearing of finality violation error in case of termproary …#429
Conversation
✅ API Diff Results - No breaking changes |
|
it's a pr between two feature branches, not sure entirely sure if you need my (or anyone's) review. added @pavel-raykov since he probably has more context over this codebase |
…ation-on-temp-errs # Conflicts: # pkg/logpoller/log_poller.go # pkg/logpoller/log_poller_internal_test.go # pkg/logpoller/log_poller_test.go # pkg/logpoller/orm_test.go
📊 API Diff Results
|
There was a problem hiding this comment.
Pull request overview
This PR narrows when the log poller clears a latched finality-violation health error, moving the reset from the outer PollAndSaveLogs success path into the deeper reorg/canonical-chain verification path. It also adds regression tests around several transient-failure scenarios to confirm the latch is preserved until recovery is verified.
Changes:
- Remove unconditional clearing of
finalityViolatedafter any nil return frompollAndSaveLogs. - Clear the finality latch only after
getCurrentBlockMaybeHandleReorgsucceeds during polling. - Add internal tests for transient RPC/DB-like failures and one happy-path recovery case.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
pkg/logpoller/log_poller.go |
Moves finality-latch clearing into the canonical-chain verification portion of polling. |
pkg/logpoller/log_poller_internal_test.go |
Adds targeted regression tests for latch persistence and successful clearing behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Problem
Possible Clearing of finalityViolated Masks Unresolved Finality Violations
The PollAndSaveLogs function clears the finalityViolated latch whenever pollAndSaveLogs returns nil, but that helper returns nil on several non-success paths: failed latestBlocks or latestSafeBlock reads, failed InsertLogsWithBlocks writes, and even when there are no new blocks to process. As a result, after a genuine finality violation has been latched, a later soft failure can incorrectly clear that latch, emit a misleading "completed successfully" log, and cause Healthy to report no error even though no successful recovery poll has actually occurred.
The interaction between the backup and primary pollers can amplify this behaviour. Both run in the same select loop, and when a finality violation is detected by BackupPollAndSaveLogs and latched via finalityViolated, the primary poller (which runs far more frequently) may clear it on its very next tick. Thus, there is a risk of persistent oscillation where the true error state is visible only in brief windows and Healthy reports healthy for the majority of the time, even though the underlying violation has not been resolved.
Fix
Clear finality violation error only once we successfully confirm that DB state matches RPC