Skip to content

Cranelift: excessively large constant pools and deferred traps can cause MachBuffer assertion failures #12968

@cfallin

Description

@cfallin

In Cranelift's MachBuffer, which manages the physical layout and intra-function relocation/label patching of machine code, we currently have functionality for constant pools and deferred traps that ride along with the "islands" of jump veneers used to extend label distances.

The original implementation of MachBuffer did not have these concepts (they were added later in #6011 and #6384) and the effect of these on the veneer generation invariants is somewhat problematic. A recently discovered case: on aarch64, where the range for a forward reference to a constant pool (from an ldr instruction) is limited by a 19-bit offset field, building a single basic block with more than ~64k constant loads (e.g. vconst instructions) is sufficient to cause an assertion failure, because we cannot insert an island in the middle of a basic block.

The correctness of the original design of veneer islands rested on a few key choices: (i) we check the forward label reference deadline (nearest expiring range) against the worst-case basic block size between every pair of basic blocks, emitting an island if necessary; (ii) a basic block can only add a few new unresolved forward label references (from the terminator branch; note that br_table uses 32-bit PC-rel jump offsets in a table so isn't relevant here), so we can't "generate deadlines faster than we can meet them" so to speak; and (iii) every label reference had either a veneer or natively supported a range large enough for our largest input module (e.g. 2GiB on x86-64), so we could always extend.

Constants and deferred traps break this model: they can increase the island size arbitrarily as we pass through a single basic block during emission, such that we can create a closer deadline than we can possibly meet, and they can also push out the start of branch veneers when we do decide to emit an island such that they are already out-of-range. And, constant references cannot have veneers: unlike a branch, where we can always put another branch at the target, we are emitting data that will be loaded directly.

There are several fixes that might seem to preserve constant pools and deferred traps:

  • We could support emission of islands in the middle of basic blocks.
  • We could interleave the handling of constants/deferred traps and veneers in island emission, sorting by deadline, rather than handle one or the other first. (We previously did veneers first, but that created problems for constant range; now we do constant first, and that creates problems for veneer range.)
  • We could always emit long-range references to the constant pool (e.g. ADRP+ADR on aarch64, which has a multi-GiB PC-rel range)
  • Or we could give up and do both inline again, like we did pre-2023. (This means that a constant load either jumps around the inline constant, or uses a sequence of instructions to build it up on RISC ISAs, e.g. MOVK+MOVZ on aarch64.)

A few thoughts:

  • Islands in the middle of basic blocks are certainly possible, but complicate emission code somewhat. Effectively we need to emit a jump around the island, that jump needs to have a large enough range for the island size, and we need to account for that jump when checking the deadline. All solvable issues. Code quality suffers (mid-block jump always taken) but maybe that's the price to pay with very very large function bodies; and it's fundamentally unavoidable if blocks are arbitrarily large.
  • Interleaving constants+veneers, I think/will claim, is also necessary: with arbitrary large pending island size, it's possible to force island emission with a reference of shorter range either to a constant or branch target, and so we need to be able to start the island with the most urgent deadline in either case.

I think combining those two bits, plus always considering deadlines wrt worst-case island size, is sufficient, but I would want to write a proof (like I did for label resolution/tracking) to be sure.

I want to get a temperature check from others, though: this is also getting pretty complex, and there's a certain appeal of the simplicity we had pre-2023. We know it works no matter what. Is there any appetite to going back to that?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions