Skip to content

Fix parallel execution metrics and opcode tracing#11445

Closed
benaadams wants to merge 12 commits into
masterfrom
processing-stats
Closed

Fix parallel execution metrics and opcode tracing#11445
benaadams wants to merge 12 commits into
masterfrom
processing-stats

Conversation

@benaadams
Copy link
Copy Markdown
Member

@benaadams benaadams commented Apr 30, 2026

Changes

  • Preserve the block-processing metric scope across BAL parallel validation workers so EVM opcode, call, and storage counters are attributed to block processing.
  • Aggregate ProcessingStats over the actual report-window delta instead of the last processed block, keeping block count/range, MGas, transactions, blobs, and throughput aligned when multiple blocks are logged together.
  • Forward only parallel-safe block tracers into BAL parallel worker receipt tracers, and make the opcode tracing plugin safe for concurrent transaction tracing.
  • Optimize the opcode block tracer hot path by removing AsyncLocal and the shared transaction-completion lock, using thread-local active tx state plus one-shot atomic accumulation.
  • Keep BAL parallel validation enabled for validated EIP-8037 blocks and cover the ordered gas-limit validation behavior.
  • Stabilize CI-sensitive tests around ParallelUnbalancedWork worker scheduling, Merge.Plugin payload-processing timeouts, and SnapSync server-state persistence.
  • Add regression coverage for processing-stat aggregation, parallel metric scope propagation, opcode tracing under parallel execution, and BAL EIP-8037 validation.

Types of changes

What types of changes does your code introduce?

  • Bugfix (a non-breaking change that fixes an issue)

Testing

Requires testing

  • Yes

If yes, did you write tests?

  • Yes

Notes on testing

  • dotnet test --project src/Nethermind/Nethermind.Consensus.Test/Nethermind.Consensus.Test.csproj -c release -- --filter FullyQualifiedName~ProcessingStatsTests
  • dotnet test --project src/Nethermind/Nethermind.Consensus.Test/Nethermind.Consensus.Test.csproj -c checked -- --filter FullyQualifiedName~ProcessingStatsTests
  • dotnet test --project src/Nethermind/Nethermind.Blockchain.Test/Nethermind.Blockchain.Test.csproj -c release -- --filter FullyQualifiedName~Parallel_validation_preserves_processing_thread_metric_scope_for_worker_transactions
  • dotnet test --project src/Nethermind/Nethermind.Blockchain.Test/Nethermind.Blockchain.Test.csproj -c release -- --filter FullyQualifiedName~Parallel_validation_forwards_parallel_safe_block_tracer_to_worker_transactions
  • dotnet test --project src/Nethermind/Nethermind.Blockchain.Test/Nethermind.Blockchain.Test.csproj -c release -- --filter "FullyQualifiedName~Parallel_validation_preserves_processing_thread_metric_scope_for_worker_transactions|FullyQualifiedName~PrepareForProcessing_keeps_parallel_bal_execution|FullyQualifiedName~IncrementalValidation_rejects_eip8037_tx_when_worst_case_exceeds_ordered_remaining_gas"
  • dotnet test --project src/Nethermind/Nethermind.OpcodeTracing.Plugin.Test/Nethermind.OpcodeTracing.Plugin.Test.csproj -c release
  • dotnet test --project src/Nethermind/Nethermind.Core.Test/Nethermind.Core.Test.csproj -c checked -- --filter FullyQualifiedName~ParallelUnbalancedWorkTests.For_WhenWorkerFaults_OtherWorkersStopFetchingWork
  • dotnet build src/Nethermind/Nethermind.Merge.Plugin.Test/Nethermind.Merge.Plugin.Test.csproj -c release
  • dotnet build src/Nethermind/Nethermind.Consensus/Nethermind.Consensus.csproj -c release
  • dotnet build src/Nethermind/Nethermind.Synchronization.Test/Nethermind.Synchronization.Test.csproj -c release --no-restore

Documentation

Requires documentation update

  • Yes
  • No

Requires explanation in Release Notes

  • Yes
  • No

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 30, 2026

Claude finished @benaadams's task in 4m 32s —— View job


PR Review: Fix Processing stats for parallel and aggregation

  • Gather context and load repo rules
  • Read changed files
  • Analyze correctness, thread-safety, and performance
  • Post inline comments for specific issues
  • Write final review summary

Summary

This PR fixes two distinct bugs in processing statistics:

  1. Aggregation across multi-block branches (ProcessingStats.cs, BlockchainProcessor.cs): The old UpdateStats(Block?, ...) signature only saw the last block in a branch. The new UpdateStats(IReadOnlyList<Block>, ...) call correctly accumulates gas, transactions, blob counts, and block count across all blocks in a branch. The BlockFrom calculation was also wrong — block.Number - chunkBlocks + 1 is incorrect when blocks arrive in batches larger than 1 across reporting windows; it's now tracked exactly via _chunkFirstBlockNumber.

  2. Parallel worker metrics (ParallelBlockValidationTransactionsExecutor.cs): EVM op-code metrics (MainThreadOpCodes, MainThreadSLoadOpcode, etc.) route increments based on ProcessingThread.IsBlockProcessingThread. Before this fix, parallel workers ran with that flag unset, so their metric increments landed in _otherXxx counters rather than _mainXxx, causing the CaptureStartStatsUpdateStats delta to miss all parallel transaction work. Verified in Evm/Metrics.cs that the counters are global and use Interlocked operations, so the propagation pattern is thread-safe.


Findings

No Critical, High, or Medium issues. Low findings noted via inline comments:

# Severity File Description
1 Low ProcessingStats.cs:222 Metrics.GasUsed = block.GasUsed uses last block's gas, while data.GasUsed (used elsewhere in the same method) is the batch total. Likely intentional as a live-monitoring metric, but deserves a comment.
2 Low ProcessingStats.cs:270 _chunkBlobs accumulates after the HasStateForBlock early-return guard (line 244), while other chunk counters accumulate before it. Pre-existing asymmetry, unchanged by this PR, but worth noting since blob-counting was touched.
3 Low ProcessingStatsTests.cs:83 The UpdateStats_aggregates_multiple_updates_until_report_window test uses a 1100 ms Task.Delay to cross the 1-second reporting threshold. Functional but slow and fragile on a busy CI host.

What looks good

  • Correctness of aggregation: All six _chunk* fields now accumulate across the whole branch before the 1-second reporting window, and _chunkFirstBlockNumber is properly reset to -1 sentinel after each report.
  • Thread-safety: GenerateReport is serialized under _reportLock; Evm.Metrics counters use Interlocked; IsBlockProcessingThread is thread-local, so save/restore in finally is the right pattern.
  • Object pool cleanup: BlockDataPolicy.Return now resets the new fields (BlockCount, FirstBlockNumber, GasUsed, TransactionCount, BlobCount), preventing stale data from a recycled object.
  • Test coverage: Both bugs get direct regression tests. [NonParallelizable] on ProcessingStatsTests correctly avoids metric pollution from concurrent test runs.
  • Parallel exception safety: The existing TrySetCanceled unblocking path for IncrementalValidation is preserved and the finally correctly restores IsBlockProcessingThread even on the i == 0 early-return path.

Mergeable.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates processing statistics reporting to correctly handle batched/parallel block processing by aggregating totals across multiple processed blocks and preserving metric scope during parallel transaction execution.

Changes:

  • Change IProcessingStats.UpdateStats to accept a list of processed blocks and aggregate gas/tx/blob totals per update.
  • Fix reporting window block-range (BlockFrom) when multiple blocks are processed per stats update.
  • Preserve ProcessingThread.IsBlockProcessingThread across parallel transaction-validation workers; add coverage tests for both stats aggregation and the processing-thread flag behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/Nethermind/Nethermind.Consensus/Processing/ProcessingStats.cs Aggregate stats across multiple blocks; track first block number in the reporting window.
src/Nethermind/Nethermind.Consensus/Processing/IProcessingStats.cs Update API contract/docs to accept multiple processed blocks.
src/Nethermind/Nethermind.Consensus/Processing/BlockchainProcessor.cs Pass the processed block batch into stats tracking.
src/Nethermind/Nethermind.Consensus/Processing/BlockProcessor.ParallelBlockValidationTransactionsExecutor.cs Ensure ProcessingThread metric scope is preserved across parallel workers.
src/Nethermind/Nethermind.Consensus.Test/ProcessingStatsTests.cs New tests validating multi-block aggregation/reporting behavior.
src/Nethermind/Nethermind.Blockchain.Test/BlockProcessorTests.cs New tests for BAL parallel execution behavior and processing-thread flag propagation.
Comments suppressed due to low confidence (1)

src/Nethermind/Nethermind.Consensus/Processing/ProcessingStats.cs:224

  • UpdateStats can now batch multiple blocks, so chunkBlocks may become >1 within a single call. The code later in GenerateReport still performs MEV/reward detection and multiple state reads (HasStateForBlock/GetBalance) even though the logging path skips rewards when chunkBlocks > 1. Consider short-circuiting those reward/state-root calculations when chunkBlocks > 1 to avoid unnecessary work during catch-up / parallel processing windows.
            long chunkFirstBlockNumber = _chunkFirstBlockNumber;
            if (chunkFirstBlockNumber == -1)
            {
                chunkFirstBlockNumber = data.FirstBlockNumber;
                _chunkFirstBlockNumber = chunkFirstBlockNumber;
            }

            long chunkBlocks = (_chunkBlocks += data.BlockCount);

            Metrics.Blocks = blockNumber;
            Metrics.BlockchainHeight = blockNumber;

            Metrics.Transactions += data.TransactionCount;
            Metrics.TotalDifficulty = block.TotalDifficulty ?? UInt256.Zero;
            Metrics.LastDifficulty = block.Difficulty;
            Metrics.GasUsed = block.GasUsed;
            Metrics.GasLimit = block.GasLimit;


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/Nethermind/Nethermind.Consensus/Processing/IProcessingStats.cs
Comment thread src/Nethermind/Nethermind.Consensus/Processing/ProcessingStats.cs
Comment thread src/Nethermind/Nethermind.Consensus/Processing/ProcessingStats.cs Outdated
Comment thread src/Nethermind/Nethermind.Consensus.Test/ProcessingStatsTests.cs Outdated
@benaadams
Copy link
Copy Markdown
Member Author

@claude re-review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 30, 2026

Claude finished @benaadams's task in 3m 40s —— View job


PR Re-Review: Fix Processing stats for parallel and aggregation

  • Load repo rules
  • Read all changed files
  • Analyze correctness, thread-safety, and performance
  • Write final review summary

Summary

Re-review is clean. The three Low findings from the previous review are all addressed. No new issues found.


Previous Low Findings — Resolved

# Severity Status Notes
1 Low ✅ Resolved A clarifying comment was added at ProcessingStats.cs:222: // These gauges describe the latest processed block; chunk totals above use data.GasUsed and data.TransactionCount.
2 Low ✅ Fixed _chunkBlobs now accumulates before the HasStateForBlock early-return guard (line 245), making it symmetric with all other chunk fields. The old code had blobs after the guard; the new code aligns them.
3 Low ✅ Fixed UpdateStats_aggregates_multiple_updates_until_report_window no longer uses a Task.Delay. The TestProcessingStats subclass overrides GetReportMs() and exposes AdvanceReportWindow() to advance the virtual clock by 1001 ms — purely synchronous, no timing sensitivity.

Additional Verification (re-review specific)

UnsafeQueueUserWorkItem confirms the fix is necessary. ParallelUnbalancedWork.For enqueues worker items via ThreadPool.UnsafeQueueUserWorkItem, which explicitly does NOT capture ExecutionContext. Since ProcessingThread.IsBlockProcessingThread is AsyncLocal<bool> stored in ExecutionContext, worker threads would have received false (the default) without the explicit capture-and-restore in this PR. The fix is the only correct approach for this work-queue pattern.

i == 0 slot is now correctly attributed. The ApplyStateChanges slot (index 0) can run on any worker thread, including thread-pool threads that would previously have had IsBlockProcessingThread = false. The slot is now correctly attributed to the block processing context, consistent with all tx-processing slots.

Accumulation ordering is fully symmetric. All eight _chunk* accumulators (_chunkMGas, _chunkProcessingMicroseconds, _chunkTx, _chunkFirstBlockNumber, _chunkBlocks, _opCodes/_callOps/etc., and now _chunkBlobs) are incremented before the HasStateForBlock early-return guard. When the state check fails for a particular block, all chunk totals are still captured in the next report — the correct behavior for a sliding-window aggregator.

_chunkFirstBlockNumber = -1 sentinel is safe. Block numbers are non-negative; -1 cannot be a valid block number. The sentinel is reset in the pool policy, in the post-report reset block, and initialized in the field declaration. All paths are covered.

No stale pool data. BlockDataPolicy.Return now zeros the five new fields (BlockCount, FirstBlockNumber, GasUsed, TransactionCount, BlobCount). Recycled BlockData objects start clean.


Mergeable.

@github-actions github-actions Bot added the evm label Apr 30, 2026
@benaadams benaadams changed the title Fix Processing stats for parallel and aggregation Fix parallel execution metrics and opcode tracing Apr 30, 2026
@benaadams
Copy link
Copy Markdown
Member Author

@claude re-review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 30, 2026

Claude finished @benaadams's task in 3m 37s —— View job


PR Re-Review: Fix parallel execution metrics and opcode tracing

  • Load repo rules
  • Read all changed files (including new opcode tracing changes)
  • Analyze correctness, thread-safety, and performance
  • Write final review summary

Summary

This re-review covers the new additions since the last pass: IParallelSafeBlockTracer, CompositeBlockTracer.GetParallelSafeTracer, the BlockReceiptsTracer.OtherTracer accessor, and the OpcodeBlockTracer concurrency overhaul. The ProcessingStats and parallel-worker metrics changes previously reviewed remain clean.


OpcodeBlockTracer thread-safety analysis

The locking design is correct:

  • _currentTxTracer is AsyncLocal<OpcodeCountingTxTracer?> — each parallel worker gets its own slot even when threads are reused from the ThreadPool.
  • _lock serialises all writes to _builder, _transactions, and _opcodeCounters (via Accumulate) and all reads in Build. No reader operates outside the lock.
  • EndBlockTrace is only called once, by the main thread, after ParallelUnbalancedWork.For has joined all workers (CombineReceipts is called after the parallel loop, and the worker BlockReceiptsTracer instances do not have EndBlockTrace called on them).
  • TOCTOU between the lock-release in StartNewTxTrace and the new OpcodeCountingTxTracer() allocation is benign: the null-conditional _builder?.Accumulate(...) in EndTxTrace silently discards a tracer created after EndBlockTrace set _builder = null. Correct behaviour for a late-arriving tx.
  • OpcodeTraceBuilder.Accumulate is always called inside _lock, so the non-atomic _transactions++ and array += value in AccumulateInto are safe.

The OpcodeBlockTracerTests.Counts_all_transaction_traces_when_completed_in_parallel test exercises all of this correctly.


Findings

No Critical, High, or Medium issues.

One Low finding (new code):

# Severity File Issue
1 Low CompositeBlockTracer.cs (~line 130) LINQ in GetParallelSafeTracer

Detail — Low: parallelSafeTracers.Any(static tracer => tracer.IsTracingRewards) violates the repo rule "No LINQ when a simple for/foreach works." A for loop over the just-constructed parallelSafeTracers list is straightforward here. Not a hot path, but inconsistent with the codebase standard.

Fix this →


What looks good

  • IParallelSafeBlockTracer marker interface: Clean opt-in contract — tracers that are not parallel-safe are silently skipped (filtered to NullBlockTracer) rather than causing a runtime failure.
  • GetParallelSafeTracer composition: Correct recursive descent through nested CompositeBlockTracers. Returns NullBlockTracer.Instance when nothing is parallel-safe, a single tracer directly when only one qualifies, and a new CompositeBlockTracer otherwise — avoids unnecessary wrapper allocation.
  • Parallel worker tracer wiring: parallelSafeTracer is resolved once before the loop, then set on each worker BlockReceiptsTracer after StartNewBlockTrace (so the shared tracer's StartNewBlockTrace is only called by the main thread, not duplicated n times by workers). This is the correct call ordering.
  • BlockReceiptsTracer.OtherTracer accessor: Narrow, read-only. Only the parallel executor needs it; the existing SetOtherTracer mutation path is unchanged.
  • InternalsVisibleTo in a new file: Correct pattern (InternalsVisibility.cs) — keeps the attribute separate from class files.
  • Test coverage: Parallel_validation_forwards_parallel_safe_block_tracer_to_worker_transactions verifies that StartedTransactions, EndedTransactions, and OpcodeCount all equal txCount, giving direct regression coverage for the tracer forwarding path.

Mergeable (Low finding only).

@github-actions github-actions Bot added the sync label May 1, 2026
@benaadams
Copy link
Copy Markdown
Member Author

Fixed in #11436

@benaadams benaadams closed this May 8, 2026
@benaadams benaadams deleted the processing-stats branch May 8, 2026 19:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants