Skip to content

fix: stale smart wallet detection, dep CVE bumps, gRPC log noise, CI runners#520

Merged
chrisli30 merged 5 commits into
mainfrom
staging
Apr 11, 2026
Merged

fix: stale smart wallet detection, dep CVE bumps, gRPC log noise, CI runners#520
chrisli30 merged 5 commits into
mainfrom
staging

Conversation

@chrisli30

@chrisli30 chrisli30 commented Apr 11, 2026

Copy link
Copy Markdown
Member

Summary

Bundles four changes ready to ship from staging to main:

  • fix: detect stale smart wallet rows from upgraded factories — adds a wsalt:<owner>:<factory>:<salt> secondary index so StoreWallet, GetWallet, and ListWallets can detect when a factory's account implementation upgrade has produced a new derivation address for the same triple. The previously canonical row is flagged StaleDerivation=true and hard-filtered out of ListWallets responses (preserved on disk because the on-chain wallet may still hold assets). Ships /ava backfill-wallet-salt-index for one-off cleanup of existing zombie rows. This fixes the three-salt-0 collision observed on Base for 0xc60e71bd….
  • chore: bump go-ethereum to 1.16.9 and labstack/echo to 4.15.1 — clears HIGH GHSA-2gjw-fg97-vg3r (go-ethereum p2p DoS) and the matching MEDIUM. Echo bump removes its transitive golang-jwt/jwt v3 dep; we only use middleware.Logger / middleware.Recover, not the JWT middleware, so the removal is invisible to our code.
  • fix: downgrade gRPC reconnect failure log from Error to Warn — quiets noisy operator-reconnect errors that aren't actually actionable.
  • ci: switch lightweight workflows from Blacksmith to GitHub-hosted runners — restores hosted runners for low-cost workflows.

Storage migration

The wallet-salt-index fix introduces a new BadgerDB key prefix (wsalt:<owner>:<factory>:<salt> → address) but does NOT modify or migrate any existing w:<owner>:<address> rows on its own. The new code path starts populating the index for new wallet creations and detects future implementation upgrades immediately upon deploy. Existing zombie rows from past Base factory upgrades remain in BadgerDB and stay visible in ListWallets responses until the operator runs /ava backfill-wallet-salt-index during a maintenance window.

Rollout sequence:

  1. Merge this PR → release builds → Watchtower picks up new image → aggregator restarts with new code (backwards-compatible).
  2. Let it soak for a day or two; the prevention path is now active for any new wallet creations.
  3. Schedule maintenance window per chain:
    docker stop aggregator-base
    docker run --rm --volumes-from aggregator-base avaprotocol/ap-avs:latest \
      backfill-wallet-salt-index --config /app/config/aggregator-base.yaml --dry-run
    # inspect summary, then re-run without --dry-run
    docker start aggregator-base
  4. Sanity check via getWallets SDK call against the previously affected EOA.

The migration was dry-run validated against the staging Sepolia DB (220 wallets, 0 stale — Sepolia's factory hasn't been upgraded). Base will be the meaningful run.

Test plan

  • go build ./... clean
  • go test ./core/taskengine/ -run 'TestStoreWallet_|TestLookupCanonicalWalletAddress|TestMarkWalletStale|TestMarkPreviousCanonicalStaleIfAny|TestListWallets_HardFiltersStaleRows|TestBackfillWalletSaltIndex' — 13 new unit tests pass, including a regression test that replays the exact 3-salt-0 collision from the Base report
  • TestExecuteTask_SequentialContractWrites_Sepolia end-to-end against the live staging bundler via SSH tunnel — passes (USDC approval + USDC→WETH swap executed on-chain through the bundler)
  • Standalone migration script dry-run on Sepolia BadgerDB (220 rows scanned, 0 stale, all 220 secondary index entries would be written)
  • /ava backfill-wallet-salt-index --config config/aggregator-sepolia.yaml --dry-run cobra subcommand registers and runs end-to-end (errored out at the BadgerDB lock check, which is the expected operational guard)
  • make test shows the same 5 pre-existing live-network failures unrelated to this PR (Base bundler tests; TestExecuteTask_SequentialContractWrites_Sepolia now passes with the staging tunnel up)
  • Post-merge: Watchtower picks up the new image; verify aggregator-base and aggregator-ethereum start cleanly
  • Post-merge: schedule maintenance window and run backfill-wallet-salt-index on Base + Ethereum

…ners

Claude agent, Claude code review, and release workflows are API-bound,
not compute-bound. Using free GitHub-hosted runners instead of paid
Blacksmith 4-vCPU runners for these lightweight jobs.

Compute-heavy workflows (tests, Docker builds, deploys) remain on
Blacksmith for caching and performance benefits.
DNS resolution failures during ping reconnection are transient and
self-heal on the next 5-second retry. Logging at Error level sends
unnecessary noise to Sentry (EIGENLAYER-AVS-19).
Wallets are keyed by their derived address (w:<owner>:<address>), so
when a factory's account implementation is upgraded and the same
(owner, factory, salt) triple starts deriving to a new address, the old
row stays in BadgerDB and ListWallets surfaces both as separate entries
with the same salt -- the three salt-0 collision observed on Base for
0xc60e71bd0f2e6d8832Fea1a2d56091C48493C788.

Add a wsalt:<owner>:<factory>:<salt> -> address secondary index that
StoreWallet writes atomically alongside the primary key. The fresh
derivation paths in GetWallet / ListWallets consult it before inserting
and, on mismatch, flip the previous wallet to StaleDerivation=true so
ListWallets hard-filters it from the response. Stale rows are preserved
on disk because the on-chain wallet may still hold assets.

Ship /ava backfill-wallet-salt-index for one-off cleanup of existing
zombie rows. Walks every wallet, re-derives via the live RPC, writes the
index for canonical rows, flags mismatches stale. Operational notes for
the production Docker rollout are in --help.

Tests cover schema atomicity, the engine prevention path, the
ListWallets filter, and the migration core with a fake deriver --
including a regression test that replays the exact 3-salt-0 collision
from the Base report.
Clears two of the open dependabot advisories on staging:

- HIGH GHSA-2gjw-fg97-vg3r: go-ethereum DoS via malicious p2p message
  (also clears MEDIUM GHSA tracking the same fix). Single patch
  version bump within the 1.16.x line.

- The labstack/echo bump removes the transitive github.com/golang-jwt/jwt
  v3.2.2 dependency that came in via echo/middleware. We only use
  middleware.Logger() and middleware.Recover(), not the JWT middleware,
  so removing the v3 jwt is invisible to our code. v3 is no longer
  maintained and the only fix path is to drop it.

Note: jwt v3 still appears in go.sum as a transitive dep of
github.com/Layr-Labs/eigensdk-go (used by its fireblocks signing
client, which we do not invoke). Even the latest eigensdk-go
v1.0.0-rc.1 still pulls jwt v3, so HIGH GHSA-mh63-6h87-95cp cannot
be cleared from this repo without an upstream change.

The labstack/echo bump pulled in incidental upgrades to
golang.org/x/{crypto,net,sync,sys,text,time}, mattn/go-colorable, and
stretchr/testify. All are minor patch bumps surfaced by go mod tidy.
@socket-security

socket-security Bot commented Apr 11, 2026

Copy link
Copy Markdown

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Updatedgithub.com/​labstack/​echo/​v4@​v4.12.0 ⏵ v4.15.172 +1100100100100
Updatedgolang.org/​x/​crypto@​v0.45.0 ⏵ v0.46.074 +1100100100100
Updatedgithub.com/​ethereum/​go-ethereum@​v1.16.8 ⏵ v1.16.976 +199 +1710010070
Updatedgithub.com/​stretchr/​testify@​v1.10.0 ⏵ v1.11.196 +1100100100100

View full report

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR bundles production-ready fixes and chores: it adds a (owner, factory, salt) → address secondary index to detect/mark stale smart-wallet rows after factory implementation upgrades (plus a backfill command + tests), bumps several Go dependencies to address CVEs, reduces gRPC reconnect log severity, and switches select CI workflows back to GitHub-hosted runners.

Changes:

  • Add wallet salt secondary index + stale-row handling, plus backfill tooling and unit tests.
  • Bump go-ethereum / echo (and related transitive deps) to remediate reported advisories.
  • Reduce operational noise (gRPC reconnect log level) and update CI runners.

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
core/taskengine/schema.go Adds wsalt: key builder, canonical lookup helper, and extends StoreWallet/MarkWalletStale to maintain the new secondary index.
core/taskengine/engine.go Detects factory upgrades on write paths and hard-filters StaleDerivation wallets from ListWallets.
core/taskengine/migrate_wallet_salt_index.go Introduces the backfill core that scans wallet rows, writes wsalt: entries, and marks stale derivations.
core/taskengine/wallet_salt_index_test.go Adds a comprehensive unit test suite covering index writes, stale marking, list filtering, and backfill behaviors.
cmd/backfillWalletSaltIndex.go Adds /ava backfill-wallet-salt-index cobra subcommand wrapper around the backfill core.
scripts/migration/backfill_wallet_salt_index/main.go Adds a standalone go run wrapper for local iteration of the backfill.
model/user.go Adds StaleDerivation field (and docs) to SmartWallet model.
operator/worker_loop.go Downgrades gRPC ping reconnect failure log from Error to Warn.
go.mod Bumps go-ethereum, echo, testify, and several x/* libraries.
go.sum Updates dependency checksums consistent with go.mod bumps.
.github/workflows/release-on-pr-close.yml Switches workflow runner from Blacksmith to ubuntu-latest.
.github/workflows/claude.yml Switches workflow runner from Blacksmith to ubuntu-latest.
.github/workflows/claude-code-review.yml Switches workflow runner from Blacksmith to ubuntu-latest.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

wallet.Address.Hex(), owner.Hex(), factory.Hex(), wallet.Salt.String())
if !opts.DryRun {
if setErr := db.Set(indexKey, []byte(strings.ToLower(wallet.Address.Hex()))); setErr != nil {
logf(" [err] write secondary index: %v", setErr)

Copilot AI Apr 11, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BackfillWalletSaltIndex currently logs write failures for secondary-index updates but still returns a nil error at the end. That can make an operator think the migration succeeded even though the index write failed (partial/incorrect backfill). Consider returning an error immediately on db.Set failure (or collecting failures and returning a summarized error) so the CLI exits non-zero and the run can be retried safely.

Suggested change
logf(" [err] write secondary index: %v", setErr)
logf(" [err] write secondary index: %v", setErr)
return stats, fmt.Errorf("write secondary index for owner=%s factory=%s salt=%s address=%s: %w",
owner.Hex(), factory.Hex(), wallet.Salt.String(), wallet.Address.Hex(), setErr)

Copilot uses AI. Check for mistakes.
Comment on lines +150 to +157
stats.NewlyMarkedStale++
logf(" [stale] %s ≠ live derivation %s (owner=%s factory=%s salt=%s)",
wallet.Address.Hex(), derived.Hex(), owner.Hex(), factory.Hex(), wallet.Salt.String())
if !opts.DryRun {
if markErr := MarkWalletStale(db, owner, wallet.Address.Hex()); markErr != nil {
logf(" [err] mark stale: %v", markErr)
}
}

Copilot AI Apr 11, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BackfillWalletSaltIndex also swallows failures when marking a wallet stale (it only logs). If MarkWalletStale fails, the stats will report NewlyMarkedStale but the DB will remain unchanged. This should return an error (or at least track and return a final error) so the migration run can be treated as failed.

Suggested change
stats.NewlyMarkedStale++
logf(" [stale] %s ≠ live derivation %s (owner=%s factory=%s salt=%s)",
wallet.Address.Hex(), derived.Hex(), owner.Hex(), factory.Hex(), wallet.Salt.String())
if !opts.DryRun {
if markErr := MarkWalletStale(db, owner, wallet.Address.Hex()); markErr != nil {
logf(" [err] mark stale: %v", markErr)
}
}
logf(" [stale] %s ≠ live derivation %s (owner=%s factory=%s salt=%s)",
wallet.Address.Hex(), derived.Hex(), owner.Hex(), factory.Hex(), wallet.Salt.String())
if opts.DryRun {
stats.NewlyMarkedStale++
continue
}
if markErr := MarkWalletStale(db, owner, wallet.Address.Hex()); markErr != nil {
logf(" [err] mark stale: %v", markErr)
return stats, fmt.Errorf("mark wallet stale for owner=%s wallet=%s: %w", owner.Hex(), wallet.Address.Hex(), markErr)
}
stats.NewlyMarkedStale++

Copilot uses AI. Check for mistakes.
Comment on lines +65 to +68
items, err := db.GetByPrefix([]byte("w:"))
if err != nil {
return nil, fmt.Errorf("scan wallet records: %w", err)
}

Copilot AI Apr 11, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

db.GetByPrefix loads all wallet rows (keys + full values) into memory before processing. On large BadgerDBs this can cause high memory usage or OOM during the migration. Consider iterating with a Badger iterator (or extending storage.Storage with a streaming iterator) so records are processed incrementally.

Copilot uses AI. Check for mistakes.
Comment on lines +76 to +79
fmt.Printf("Config: %s\n", configPath)
fmt.Printf("DB path: %s\n", nodeConfig.DbPath)
fmt.Printf("RPC URL: %s\n", nodeConfig.SmartWallet.EthRpcUrl)
fmt.Printf("Mode: %s\n", backfillModeLabel(backfillWalletSaltIndexDryRun))

Copilot AI Apr 11, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This command prints the full smart-wallet RPC URL to stdout. If the URL contains credentials/API keys (common for managed RPC providers), this can leak secrets into logs/CI artifacts/terminal scrollback. Consider redacting userinfo/query params (or only printing host) by default, and optionally printing the full URL only behind an explicit --verbose/--print-secrets flag.

Copilot uses AI. Check for mistakes.
Comment on lines +73 to +76
fmt.Printf("Config: %s\n", *configPath)
fmt.Printf("DB path: %s\n", cfg.DbPath)
fmt.Printf("RPC URL: %s\n", cfg.SmartWallet.EthRpcUrl)
fmt.Printf("Mode: %s\n", modeLabel(*dryRun))

Copilot AI Apr 11, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The standalone migration wrapper prints the full RPC URL to stdout, which can leak API keys if the URL embeds credentials or query params. Consider redacting sensitive parts by default (print host only) and require an explicit flag to print the full URL.

Copilot uses AI. Check for mistakes.
Three changes from the PR #520 review:

1. BackfillWalletSaltIndex now fails fast (returns a non-nil error) when
   db.Set on the secondary index or MarkWalletStale on a zombie row
   fails. The previous behavior logged the failure and returned nil,
   so the CLI exited 0 and operators thought the migration succeeded
   on a partial backfill. Both write paths are idempotent, so the
   operator can fix the underlying storage issue and re-run.

2. Add taskengine.RedactRPCURL and use it in both the /ava
   backfill-wallet-salt-index subcommand and the standalone script.
   Tenderly, Alchemy, and Infura embed API keys directly in the URL
   path or query string, so the previous "RPC URL: <full-url>" status
   line dumped provider secrets into operator terminal scrollback and
   any captured CI logs. The redacted form preserves scheme + host
   for diagnostics while replacing the path with <redacted>.

3. Add test coverage:
   - TestRedactRPCURL with 11 cases (Tenderly/Infura/Alchemy paths,
     query-string credentials, userinfo credentials, localhost
     tunnels, bare hosts, and unparseable inputs).
   - TestBackfillWalletSaltIndex_FailsFastOnSetError and
     TestBackfillWalletSaltIndex_FailsFastOnMarkStaleError use a
     closingDB wrapper around the real BadgerStorage to inject
     storage failures and confirm the migration surfaces them.

Skipped Copilot suggestion #3 (db.GetByPrefix loads everything into
memory). It's an existing pattern across the taskengine package, the
migration runs once during a maintenance window with the aggregator
stopped, and realistic Base wallet counts (thousands) fit comfortably.
A streaming iterator should be a storage.Storage interface change
addressing all consumers, not piecemeal here.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants