Skip to content

fix: silence expected user-workflow failures from Sentry error alerts#526

Merged
chrisli30 merged 2 commits into
stagingfrom
fix/sentry-error-noise
Apr 16, 2026
Merged

fix: silence expected user-workflow failures from Sentry error alerts#526
chrisli30 merged 2 commits into
stagingfrom
fix/sentry-error-noise

Conversation

@chrisli30
Copy link
Copy Markdown
Member

Summary

PR #463 bridged every logger.Error(...) call to Sentry. That surfaced a pre-existing shape problem: many Error-level logs fire on expected user-workflow outcomes (simulated reverts, on-chain UserOp reverts, workflow-step failures), which were harmless before the bridge but now flood Sentry. The 2026-04-16 weekly review shows 14 issues all with the same culprit frame (pkg/logger func1) and all first-seen 2026-03-04 — the PR #463 deploy date.

This PR:

  • Downgrades expected user-workflow failure sites to Warn across:
    • tenderly_client.go — simulated revert (simulation.status=false, call-trace revert)
    • vm_runner_contract_write.go — method-execution-failed summary
    • executor.go + engine.go — per-task/per-simulation completed-with-failures summaries
  • Removes 5 leftover CRITICAL DEBUG traces in vm_runner_contract_write.go — one of which (ReturnData is nil) was a top Sentry issue even though tenderly_client.go explicitly nulls ReturnData on a reverted simulation (expected post-condition).
  • Extracts preset.LogBundlerError to classify bundler errors:
    • On-chain reverts (success=false in UserOperationEvent) go to Warn (the target contract reverted, not our bug).
    • Infra/AA failures (bundler unreachable, AA21 prefund, AA25 nonce, paymaster revert) stay Error and remain in Sentry.
  • Applies the classifier to three call sites: contract-write UserOp, ETH-transfer UserOp, and the withdrawal RPC path in aggregator/rpc_server.go (previously unconditionally Error, now covered).

Real infra failures (bundler down, AA21, storage writes, scheduler errors, etc.) keep paging via the untouched logger.Error sites.

Test plan

  • go build ./pkg/logger/... ./pkg/erc4337/... ./core/taskengine/... ./aggregator/... clean
  • go test ./pkg/logger/... passes
  • go test -run IsUserOpRevert^|LogBundlerError ./pkg/erc4337/preset/ — 10 new subtests pass (nil, unrelated, AA21, AA25, direct marker, wrapped marker; Warn/Error dispatch; nil-logger guard)
  • Post-deploy: confirm the 10 INVESTIGATE and 4 MONITOR Sentry issues in the 2026-04-16 weekly review stop emitting new events from these code paths
  • Post-deploy: confirm real infra failures (e.g. bundler restart, AA21 on an underfunded wallet) still page Sentry

🤖 Generated with Claude Code

will-dz added 2 commits April 16, 2026 12:13
PR #463 bridged every logger.Error(...) call to Sentry, which flooded the
dashboard with user-workflow failures (contract reverts, failed Tenderly
simulations, on-chain UserOp reverts) that are expected outcomes, not
infra bugs. This downgrades those sites to Warn and removes a set of
leftover CRITICAL DEBUG traces.

- vm_runner_contract_write.go:
  - Remove 5 leftover CRITICAL DEBUG log statements; the ReturnData
    nil path is expected when simulation reverts (tenderly_client.go
    explicitly nulls ReturnData in that case).
  - Downgrade method execution failed summary to Warn.
  - In the bundler-failed branch, keep Error for real infra/AA problems
    (AA21 prefund, bundler unreachable, etc.) but drop to Warn when the
    underlying error is success=false in UserOperationEvent (on-chain
    revert of the user target contract).
- vm_runner_eth_transfer.go: same bundler Error/Warn branching.
- tenderly_client.go: downgrade the two simulation-revert sites to Warn.
- executor.go and engine.go: downgrade the task/simulation completed
  with failures summary lines to Warn; ExecutionStatus_FAILED on the
  persisted execution is the authoritative signal.
…al path

The on-chain-revert vs infra-failure classification added in the prior
commit was duplicated across contract-write and eth-transfer bundler
failure sites, and the withdrawal RPC path still unconditionally logged
Error for the same kinds of bundler failures. Pull the logic into a
single helper in pkg/erc4337/preset so all three call sites share it.

- New pkg/erc4337/preset/bundler_error.go:
  - IsUserOpRevert(err) predicate
  - LogBundlerError(lgr, err, msg, tags...) picks Warn for on-chain
    reverts, Error for infra/AA failures
- Replaces the inline branches in vm_runner_contract_write.go and
  vm_runner_eth_transfer.go
- Applies the same classification to the withdrawal UserOp failure site
  in aggregator/rpc_server.go so user-level withdrawal reverts (e.g.
  ERC20 transfer reverted) no longer page Sentry
- Tests cover nil, unrelated errors, AA-code infra errors, direct/wrapped
  revert markers, and the nil-logger guard
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reduces noisy Sentry alerts by downgrading expected user-workflow failures (simulated/on-chain reverts and per-task failure summaries) from Error to Warn, while keeping true infra/AA failures at Error via a new bundler error classifier in the ERC-4337 preset layer.

Changes:

  • Add preset.IsUserOpRevert + preset.LogBundlerError to classify on-chain UserOp reverts as Warn and keep infra/AA failures as Error, with focused unit tests.
  • Replace several logger.Error(...) calls with Warn(...) for expected simulation/workflow failure summaries and Tenderly revert outcomes.
  • Remove / downgrade noisy “CRITICAL DEBUG” logs around Tenderly return data handling.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

Show a summary per file
File Description
pkg/erc4337/preset/bundler_error.go Introduces revert detection + severity-aware bundler error logging helper.
pkg/erc4337/preset/bundler_error_test.go Adds unit tests validating classification and nil-logger safety.
core/taskengine/vm_runner_eth_transfer.go Uses preset.LogBundlerError for ETH transfer UserOp failures to avoid paging on reverts.
core/taskengine/vm_runner_contract_write.go Uses preset.LogBundlerError, removes noisy debug traces, and downgrades method-failure summary to Warn.
core/taskengine/tenderly_client.go Downgrades simulation revert logs to Warn (expected workflow outcomes).
core/taskengine/executor.go Downgrades task “completed with failures” summary from Error to Warn.
core/taskengine/engine.go Downgrades workflow simulation failure summary from Error to Warn.
aggregator/rpc_server.go Uses preset.LogBundlerError on withdrawal UserOp failure path for revert-vs-infra severity.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@chrisli30 chrisli30 merged commit 3263c78 into staging Apr 16, 2026
23 checks passed
@chrisli30 chrisli30 deleted the fix/sentry-error-noise branch April 16, 2026 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants