Pathological compile time on multiplied modifier (m m m m …) with many _; placeholders — affects legacy and via-IR/SSA-CFG #16699

@msooseth

Description

A small (~3.2 KB, 126 lines) Solidity source consisting of a single modifier with many _; placeholders applied four times (m m m m) on a function causes the legacy codegen + optimizer pipeline to take 141.6 s to compile, compared to 28 ms for --via-ir --optimize — roughly a 5000× slowdown. The legacy pipeline without --optimize is also slow (10.3 s); both legacy configurations use ~1.5 GB of peak RAM, vs 19 MB for the via-IR pipelines.

Configuration                                           Time  Peak RSS
--via-ir                                               46 ms     19 MB
--via-ir --optimize                                    28 ms     19 MB
--via-ir --optimize --experimental --via-ssa-cfg       30 ms     19 MB
legacy (no opt)                                    10 313 ms   1547 MB
legacy --optimize                                 141 595 ms   1522 MB
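
The ratios quoted in the description can be rechecked from the table with a few lines of Python. This is only arithmetic on the reported numbers; the names `measurements` and `ratios` are made up for the sketch.

```python
# Timings (ms) and peak RSS (MB) copied from the table above.
measurements = {
    "--via-ir": (46, 19),
    "--via-ir --optimize": (28, 19),
    "--via-ir --optimize --experimental --via-ssa-cfg": (30, 19),
    "legacy (no opt)": (10_313, 1547),
    "legacy --optimize": (141_595, 1522),
}


def ratios(baseline):
    """Return {config: (time ratio, RSS ratio)} relative to `baseline`."""
    base_ms, base_mb = measurements[baseline]
    return {
        name: (ms / base_ms, mb / base_mb)
        for name, (ms, mb) in measurements.items()
    }


for name, (t, m) in ratios("--via-ir --optimize").items():
    print(f"{name:50s} time x{t:8.1f}   RSS x{m:5.1f}")
```

Against `--via-ir --optimize`, `legacy --optimize` comes out at about 5057× the time and about 80× the peak RSS, which matches the "roughly 5000×" figure above.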

Found via differential fuzzing.

Environment

  • Compiler version: 0.8.35-develop.2026.5.7+commit.b83005c9.Linux.g++
  • Compilation pipeline (legacy, IR, EOF): legacy is affected (both with and without --optimize); all three IR-based configurations are fast.
  • Target EVM version (as per compiler settings): osaka
  • Framework/IDE: solc command line
  • EVM execution environment / backend / blockchain client: N/A — pure compilation
  • Operating system: Linux 7.0.3-arch1-2

Steps to Reproduce

solc --bin --evm-version osaka --optimize C.sol  # ~141 s
solc --bin --evm-version osaka            C.sol  # ~10 s
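
To re-time all five rows of the table in one go, a small harness along these lines may help. It is a sketch under assumptions: only the two legacy commands above are spelled out in this report, so the flag combinations for the via-IR rows are inferred from the configuration labels, and the helper names (`CONFIGS`, `command`, `measure`, `run_all`) are hypothetical. It measures wall-clock time only, not peak RSS.

```python
import subprocess
import time

# Flag sets per configuration. The via-IR entries are inferred from the
# table labels (an assumption); the legacy ones mirror the commands above.
CONFIGS = {
    "legacy (no opt)": [],
    "legacy --optimize": ["--optimize"],
    "--via-ir": ["--via-ir"],
    "--via-ir --optimize": ["--via-ir", "--optimize"],
    "--via-ir --optimize --experimental --via-ssa-cfg": [
        "--via-ir", "--optimize", "--experimental", "--via-ssa-cfg",
    ],
}


def command(flags, source="C.sol", solc="solc"):
    """Build the solc command line for one configuration."""
    return [solc, "--bin", "--evm-version", "osaka", *flags, source]


def measure(flags, source="C.sol"):
    """Run solc once and return the wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(command(flags, source), check=True,
                   stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def run_all(source="C.sol"):
    """Time every configuration and print one line per row."""
    for name, flags in CONFIGS.items():
        print(f"{name:50s} {measure(flags, source) * 1000:10.0f} ms")
```

Calling `run_all()` in a directory containing `C.sol`, with `solc` on the `PATH`, prints one timing per configuration; expect the two legacy rows to take minutes in total.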

Full source (126 lines): source.sol.

Inline source (abridged):
contract C {
    uint256 public x;
    modifier m() {
        for (uint256 i; i < 10; i++) {         _;
            return;
            for ( i; i < 10; i++) {
            {
            {
            assembly {
                for { let a := 0} lt(a,1) { a := add(a, 1) } { continue let b := 42 }
                for { let a := 0} lt(a,1) { a := add(a, 1) } { continue let b := 42 }
            }
            }
            return;
            for (uint256 i; i < 10; i++) {
                _;
                uint t;
                uint8 x = 0xff;
                for (uint256 i; i < 10; i++) { _; return; ++x; }
            }
            }
            { /* more nested for/_;/return/assembly … */ }
            /* 12 `_;` placeholders in total in this modifier body, interleaved
               with `return;`, deeply-nested for-loops and inline assembly */
            ...
        }
    }

    function f() public m  m m m returns (uint) {  // modifier applied 4 times
        for (uint256 i = 0; i < 10; i++) {
            ++x;
        }
    }
}

The full text is in the gist. The salient feature is that modifier m contains 12 _; placeholder sites, and is applied four times on f(). In the legacy codegen each placeholder site inlines the next modifier's body, so the chain m m m m f materialises on the order of 12⁴ ≈ 2 × 10⁴ inlined copies of the inner code path, embedded inside heavily-nested control-flow (nested for-loops, inline assembly with continue/dead-store patterns, and many return; statements that the legacy frontend leaves in place).
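
The multiplication can be made concrete with a simplified counting model. It assumes, as described above, that every `_;` site inlines the next body in full, that sites following a `return;` are still emitted, and that nothing is shared between copies. The function name `inlined_copies` is made up for this sketch and does not correspond to anything in the compiler.

```python
def inlined_copies(placeholders, applications):
    """Count body instances when every `_;` inlines the next body in full.

    Simplified model (an assumption): every placeholder site is emitted,
    including those after `return;`, and no copy is shared.
    """
    # The i-th modifier application (0 = outermost) sits below i levels of
    # placeholders, so it is instantiated placeholders**i times.
    modifier_bodies = sum(placeholders ** i for i in range(applications))
    # The function body sits below all `applications` levels.
    function_bodies = placeholders ** applications
    return {"modifier_bodies": modifier_bodies,
            "function_bodies": function_bodies}


for k in range(1, 5):
    print(k, inlined_copies(12, k))
# last line printed: 4 {'modifier_bodies': 1885, 'function_bodies': 20736}
```

Under this model, four applications of a 12-placeholder modifier give 12⁴ = 20736 copies of the body of f() and 1 + 12 + 144 + 1728 = 1885 copies of the modifier body, and each further application multiplies both by roughly 12.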

Nature of the slowdown

The slowdown is a legacy-codegen compile-time issue, not a runtime / output bug. All five configurations produce bytecode successfully. The via-IR pipeline (with or without optimizer, and with or without SSA-CFG) handles this source in under 50 ms; only the legacy pipeline is slow:

  1. Without --optimize, legacy already takes ~10 s and 1.5 GB of RAM. Even the no-optimize legacy build runs the peephole optimiser and JumpdestRemover, which dominate.
  2. With --optimize, legacy jumps to ~142 s. The added cost is overwhelmingly in evmasm::BlockDeduplicator::deduplicate() on the assembled EVM code.

The Solidity feature driving the blow-up is modifier application multiplied across _; placeholders: m m m m on f, where m contains 12 _;, produces a multiplicative expansion of the inner body inside the legacy ContractCompiler::appendModifierOrFunctionCode chain. Each instance generates structurally similar EVM blocks, feeding the assembly optimiser an enormous list of basic blocks.

Relevant perf data

Full perf top-50 reports and flamegraphs are in the attached gist; the salient parts:

legacy --optimize — 141.6 s (flamegraph, perf top50)

93.09%  evmasm::Assembly::optimiseInternal
65.99%  evmasm::BlockDeduplicator::deduplicate
57.02% 25.19%  BlockDeduplicator::deduplicate lambda                  ← 25% self
17.31%  9.66%  BlockDeduplicator::BlockIterator::operator++           ← ~10% self
 9.80%  9.79%  evmasm::AssemblyItem::instruction                      ← ~10% self
 8.68%  CommonSubexpressionEliminator::getOptimizedItems
 7.91%  5.98%  evmasm::SemanticInformation::altersControlFlow         ← ~6% self
 7.06%  PeepholeOptimiser::optimise

BlockDeduplicator::deduplicate alone accounts for ~66% of the entire 141 s run, with ~25% self-time in its per-pair comparison lambda.

legacy (no opt) — 10.3 s (flamegraph, perf top50)

84.49%  evmasm::Assembly::optimiseInternal     ← runs even without --optimize
66.42%  6.41%   PeepholeOptimiser::optimise
50.93% 16.78%   applyMethods<PushPop, OpPop, OpStop, …>     ← ~17% self
14.56%  9.32%   AssemblyItem::operator==(Instruction)        ← ~9% self
14.35%  3.64%   JumpdestRemover::optimise
13.76% 13.59%   std::vector<AssemblyItem>::push_back
 9.13%  9.12%   evmasm::AssemblyItem::instruction
 7.50%  7.43%   AssemblyItem::bytesRequired

Even without --optimize, the legacy assembler still runs a peephole pass and JumpdestRemover; together they dominate the 10 s run — the input EVM assembly is simply very large.

via-IR pipelines — under 50 ms each

All three via-IR configurations finish in 28–46 ms and use 19 MB of peak RSS. Profiles are essentially flat (see attached). The IR pipeline either avoids producing the same item explosion (dead-code / reachability done earlier) or handles it without quadratic blow-up.

Potential reasons

A few observations, presented without proposing a fix:

  • BlockDeduplicator::deduplicate is ~66% of the legacy --optimize run, with 25% self-time in the per-pair comparison lambda and 10% self in the iterator advance. The dominant frames are consistent with roughly quadratic block-pair comparison over a very long block list.

  • Modifier-inlining multiplier. m has 12 _; sites, and f is decorated with m m m m. In the legacy ContractCompiler pipeline each _; inlines the next modifier's body in full, so the chain produces on the order of 12⁴ ≈ 2 × 10⁴ copies of the inner body, with the modifier-body assembly itself replicated similarly. The legacy frontend does not drop the code that follows a return; statement before emitting EVM, so dead branches still reach the optimizer.

  • Even without --optimize (legacy, 10 s), peephole + JumpdestRemover dominate, with applyMethods<…> at 17% self and AssemblyItem::operator== at 9% self. The size of the emitted item vector — driven by the modifier-inlining blow-up above — appears to be the underlying driver across both legacy configurations.

  • All three via-IR pipelines are unaffected (28–46 ms, 19 MB). The two codegens scale very differently on modifier-heavy / dead-code-heavy inputs (contrast with e.g. Slow compilation under via-ir & SSA-CFG on deeply nested try/catch #16697, where via-IR is the slow path).
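
To give a feel for why a "roughly quadratic" pass would hurt at this size, here is a toy count of unordered block pairs. It is only an illustration of the hypothesis in the first bullet, not a description of how BlockDeduplicator is implemented, and it assumes one candidate block per inlined copy of the body of f() (12**k copies for k applications of m). The name `pair_comparisons` is hypothetical.

```python
from math import comb


def pair_comparisons(blocks):
    """Number of unordered block pairs: blocks * (blocks - 1) / 2."""
    return comb(blocks, 2)


# Assumption: one candidate block per inlined copy of f's body.
for k in range(1, 5):
    blocks = 12 ** k
    print(f"k={k}: {blocks:>6} blocks -> {pair_comparisons(blocks):>12,} pairs")
```

For k = 4 this gives 20736 blocks and 214,980,480 pairs. Each extra application of m multiplies the block count by 12 and the pair count by roughly 144, so any pass with near-quadratic cost is far more sensitive to the inlining blow-up than a linear one.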

Attachments

All artefacts are in a single gist: https://gist.github.com/msooseth/b221f0ac40d878147f1209370050558a

  • source.sol — the full reproducer (126 lines)
  • noOpt_viaIR=false.perf_top50.txt, opt_viaIR=false.perf_top50.txt — slow legacy profiles
  • noOpt_viaIR=true.perf_top50.txt, opt_viaIR=true.perf_top50.txt, opt_ssaCFG.perf_top50.txt — fast via-IR reference profiles
  • corresponding *.flamegraph.svg files for each configuration
