Skip to content

fix: Tenderly state overrides for simulation balance propagation; include chainId audience claim in JWT#519

Merged
chrisli30 merged 5 commits into
mainfrom
staging
Apr 10, 2026
Merged

fix: Tenderly state overrides for simulation balance propagation; include chainId audience claim in JWT#519
chrisli30 merged 5 commits into
mainfrom
staging

Conversation

@chrisli30

Copy link
Copy Markdown
Member

This pull request introduces several improvements and fixes across the aggregator, authentication, simulation, and summarization components. The most significant changes are the addition of simulation state injection for event triggers (especially for ERC20 transfers), enhancements to loop step error handling, and improved extraction of contract addresses in summarization logic. Comprehensive tests for simulation state management and ERC20 balance overrides have also been added.

Simulation and State Injection Enhancements:

  • Added the injectEventTriggerStateOverrides and injectTransferEventState methods in core/taskengine/engine.go to automatically inject balance changes into the simulation state when a Transfer event is detected, ensuring downstream contract write simulations reflect the correct balances. This fixes issues where simulated contract writes would incorrectly fail due to missing balance updates. [1] [2]

  • Introduced core/taskengine/simulation_state_test.go with detailed tests for simulation state map operations, ERC20 balance slot computation, and integration scenarios where balance overrides are required for successful contract write simulations.

Authentication and JWT Handling:

  • Modified JWT creation in aggregator/key.go to ensure the aud claim uses the smart wallet chain ID, aligning with the verifier's expectations and supporting cross-chain configurations. Also updated the issuer to use a constant. [1] [2]

  • Updated JWT verification logic in core/auth/user.go to allow admin-role JWTs to manage any wallet and to clarify subject matching for per-wallet keys.

Loop Step and Summarization Improvements:

  • Changed loop step result handling in core/taskengine/manual_trigger_loop_test.go so that partial iteration failures cause the step to report success=false with a descriptive error, supporting better detection of partial successes.

  • Improved contract address and method name extraction in summarization logic for LOOP steps in core/taskengine/summarizer_context_memory.go, including a fallback that checks iteration-level metadata. Added extractResolvedContractAddressFromLoopOutput for robust address extraction from loop outputs. [1] [2]

Other:

  • Added a missing import for structpb in summarizer context memory tests.

@chrisli30

Copy link
Copy Markdown
Member Author

@claude, review this PR

@claude

claude Bot commented Apr 10, 2026

Copy link
Copy Markdown

Claude finished @chrisli30's task in 3m 43s —— View job


PR Review: fix: Tenderly state overrides for simulation balance propagation

  • Read changed files and gather context
  • Review simulation state changes (simulation_state.go, engine.go, vm.go)
  • Review authentication/JWT changes (aggregator/key.go, core/auth/user.go)
  • Review Tenderly client changes (tenderly_client.go)
  • Review summarization changes (summarizer_context_memory.go)
  • Review test coverage
  • Post final review

Summary

This PR bundles four distinct fixes. The simulation state injection for ERC20 Transfer triggers is the most substantial piece of new infrastructure; the JWT audience fix and loop partial-failure reporting are well-motivated corrections. The summarization fallback is a clean improvement.

Overall the approach is sound. There are several items below worth addressing before merge.


Issues Found

1. Logic bug: non-zero balance falls through to an error, not the reference-holder probe

core/taskengine/simulation_state.go:205–292

The comment at line 227 says "Non-zero balance but no slot matched — fall through to reference-holder probe", but the reference-holder block is gated on if expectedBalance.Sign() == 0 (line 233). When a holder has a non-zero balance and none of commonBalanceSlots match, execution skips the reference-holder section entirely and hits the final return -1, error on line 292. The comment is wrong and the behavior is incorrect for tokens with unusual storage layouts — those will fail slot detection rather than falling back.

Fix: restructure so the reference-holder probe runs any time the direct probe fails, regardless of balance:

// After the primary probe fails, try reference holders unconditionally
// (for zero-balance holders AND for non-standard layouts).

Fix this →


2. Excessive Info-level logging in production simulation paths

core/taskengine/engine.go:2992–3004, 3085–3107, 3119–3132

Several log lines that look like diagnostic instrumentation were left at Info level:

n.logger.Info("SimulateTask: state override check", ...)
n.logger.Info("injectEventTriggerStateOverrides: called", ...)
n.logger.Info("injectTransferEventState: called", ...)
n.logger.Info("injectTransferEventState: extracted fields", ...)

These fire on every simulation. They should be Debug (or removed). The ones that are already Debug are fine. Warn/Info for success paths like "Discovered ERC20 balance storage slot" inside simulation_state.go are also noisier than needed — those can be Debug.

Fix this →


3. Double RPC dial in InjectERC20BalanceChange

core/taskengine/simulation_state.go:317–335

ProbeERC20BalanceSlot already dials rpcURL, queries the chain, and closes the client. InjectERC20BalanceChange then dials it again to fetch currentBalance. Since the context is shared, the second dial could reuse the connection pool, but it still pays the dial overhead twice. Consider accepting a *ethclient.Client as a parameter, or refactoring to pass the already-open client through.


4. callData append aliasing (low-severity, but fragile)

core/taskengine/simulation_state.go:188–189, 246, 330–331

balanceOfSig := crypto.Keccak256([]byte("balanceOf(address)"))[:4]
callData := append(balanceOfSig, common.LeftPadBytes(holder.Bytes(), 32)...)

crypto.Keccak256 returns a 32-byte slice; slicing to [:4] gives len=4, cap=32. append will write into the backing array rather than allocate. This doesn't cause a bug here because balanceOfSig is never read after the append, but the pattern is fragile. The conventional safe form is:

sig := crypto.Keccak256([]byte("balanceOf(address)"))[:4]
callData := make([]byte, 4+32)
copy(callData, sig)
copy(callData[4:], common.LeftPadBytes(holder.Bytes(), 32))

5. Breaking behavioral change for loop partial failures — needs explicit callout

core/taskengine/vm.go:4886–4926, core/taskengine/manual_trigger_loop_test.go

The PR changes loop partial-failure reporting from success=true (the behavior explicitly set in issue #511) to success=false. This is a deliberate design change and the motivation is sound (better detection at the execution level), but it inverts the semantics documented in #511 and could affect any client or system that currently treats success=true on loop steps as "the loop ran, inspect individual outputs."

The PR description mentions this as "changed loop step result handling so that partial iteration failures cause the step to report success=false", but given it reverses an explicitly documented behavior from a recent issue, it warrants a clear note in the PR (or a linked issue explaining the rationale).


6. user.go: duplicate admin role check paths

core/auth/user.go:62–106

After this change, there are two independent admin-role checks — one in the sub == "apikey" path (lines 63–86) and one in the new EOA-subject path (lines 93–101). Both are correct in isolation, but the duplication means a future change to admin logic needs to be applied in both places. Consider extracting hasAdminRole(claims) as a small helper.

Fix this →


7. Minor: wrapResultWithTxHash is now a thin wrapper

core/taskengine/vm.go:4295, 4300–4303

wrapResultWithTxHash is still called at line 4295 for the ethTransfer path, so it must stay. But since it now only delegates to wrapResultWithMetadata, it could be inlined to reduce the indirection. Not blocking, just a housekeeping note.


Positive Notes

  • The SimulationStateMap design is clean — mutex-protected, address-normalized to lowercase, separate maps for storage and ETH balances.
  • The Tenderly state-object merge logic in tenderly_client.go is a clear improvement over the old single-contract override.
  • The invalid_state_storage retry now strips storage overrides across all addresses (not just the target contract), which is the correct fix.
  • JWT audience correction in aggregator/key.go is well-commented and the guard against a zero SmartWallet.ChainID prevents silent misconfiguration.
  • Test coverage is solid: simulation_state_test.go covers basic ops, ERC20 slot computation, and the integration scenario; summarizer_context_memory_test.go adds two meaningful new cases for LOOP steps.

@claude

claude Bot commented Apr 10, 2026

Copy link
Copy Markdown

Code Review

Overall this is solid, well-structured work with good test coverage. A few issues worth addressing before merge.


Bugs / Correctness

1. Partial iteration failure detection is broken when firstError is nil

core/taskengine/vm.go, near the end of executeLoopWithQueue:

if iterationFailCount > 0 && firstError != nil {

If per-iteration runner errors don't propagate into firstError (e.g. the runner captures the error internally and returns a nil result but no error), you can have iterationFailCount > 0 with firstError == nil. In that case the loop reports success=true with nil results in the array — a silent failure. The gate should be iterationFailCount > 0 alone; the error message can fall back to a generic string when firstError is nil.


2. vm.smartWalletConfig nil check missing on ERC20 path

core/taskengine/engine.go, injectTransferEventState:

// ETH path — nil guarded ✓
if vm.smartWalletConfig != nil && vm.smartWalletConfig.EthRpcUrl != "" {

// ERC20 path — no nil guard ✗
if err := vm.simulationState.InjectERC20BalanceChange(ctx, vm.smartWalletConfig, ...)

InjectERC20BalanceChange does handle a nil config by returning an error, so this won't panic, but it's an asymmetry that will produce a "no RPC URL available" warning log on every simulation run when the config is absent. Align the two paths for clarity.


Performance

3. Two RPC connections opened per ERC20 balance injection

InjectERC20BalanceChange calls ProbeERC20BalanceSlot, which opens an ethclient connection, closes it, and then InjectERC20BalanceChange opens a second connection to query balanceOf. Consider threading the client through or reusing it:

client, err := ethclient.DialContext(ctx, rpcURL)
if err != nil { ... }
defer client.Close()
mappingSlot, err := s.probeERC20BalanceSlotWithClient(ctx, client, tokenContract, holder)

The current approach opens up to two TCP connections per simulated transfer event.


4. simulationState.IsEmpty() acquires a mutex inside an Info-level log call

core/taskengine/engine.go, injectEventTriggerStateOverrides:

n.logger.Info("injectEventTriggerStateOverrides: called",
    "simulationStateEmpty", vm.simulationState.IsEmpty())

IsEmpty() takes s.mu.Lock(). Calling it just to populate a log field (that's usually discarded) on every simulation is unnecessary lock contention. Either promote this to Debug level or read the field without locking for a hint value in a log context.

Similarly, the new Info log added directly in SimulateTask fires on every simulation, not just event-trigger ones:

n.logger.Info("SimulateTask: state override check",
    "triggerType", triggerType, ...)

This belongs at Debug level.


Code Quality

5. Hardcoded "admin" string in VerifyJwtKeyForUser

core/auth/user.go:

if roleStr, ok := v.(string); ok && ApiRole(roleStr) == "admin" {

If there's an AdminRole constant in this package, use it here. If not, define one. Magic strings in auth logic are fragile — a typo here silently breaks admin access rather than failing at compile time.


6. RawStateDiff []interface{} is untyped

ContractWriteSimulationResult.RawStateDiff and MergeRawStateDiff([]interface{}) do repeated .(map[string]interface{}) assertions. A small struct:

type StateDiffEntry struct {
    Address  string `json:"address"`
    Key      string `json:"key"`
    Original string `json:"original"`
    Dirty    string `json:"dirty"`
}

would remove the defensive type assertions in MergeRawStateDiff and make the JSON contract explicit.


7. erc20BalanceSlot allocates on every call

func erc20BalanceSlot(holder common.Address, mappingSlot int64) common.Hash {
    key := common.LeftPadBytes(holder.Bytes(), 32)
    slot := common.LeftPadBytes(big.NewInt(mappingSlot).Bytes(), 32)
    return crypto.Keccak256Hash(append(key, slot...))
}

This path is called in a loop over commonBalanceSlots for every probe attempt. A single [64]byte on the stack avoids three heap allocations per iteration:

var buf [64]byte
copy(buf[12:32], holder.Bytes())
big.NewInt(mappingSlot).FillBytes(buf[32:])
return crypto.Keccak256Hash(buf[:])

Test Coverage

8. Integration test silently skips in CI

TestSimulationStateMap_ContractWriteWithBalanceOverride calls t.Skip when no test config is available. Given that most CI environments won't have a Tenderly key + Sepolia RPC, the "with_state_override_succeeds" assertion — the one that validates the core fix — never runs. Either gate it behind a build tag (//go:build integration) and add it to make test/integration, or extract it into a separate integration test file so the skip is explicit rather than hidden inside a regular go test run.


Minor Nits

  • commonBalanceSlots includes slot 51 which is unusual — worth a brief comment explaining which token/protocol uses it.
  • In BuildStateObjects, calling s.getOrCreateAddrObj is documented "Must be called with s.mu held" but the comment isn't on the call sites — easy to miss in a future refactor.
  • The wrapResultWithTxHash wrapper now just delegates to wrapResultWithMetadata with an empty contractAddress. The old function can be deleted if no callers outside this file use it (or kept as a convenience wrapper — either is fine, just confirm there are no stale callers).

chrisli30 pushed a commit that referenced this pull request Apr 10, 2026
- Handle iterationFailCount > 0 with firstError == nil (fallback message)
- Add nil guard for vm.smartWalletConfig on ERC20 injection path
- Demote debug/trace logs from Info to Debug level
- Add comment explaining slot 51 (Compound cToken contracts)
- Handle iterationFailCount > 0 with firstError == nil (generic fallback)
- Add nil guard for vm.smartWalletConfig on ERC20 injection path
- Demote simulation debug logs from Info to Debug level
- Remove IsEmpty() mutex acquisition from log arguments
- Add comment explaining commonBalanceSlots entries
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants