Skip to content

[Transformation] Add a compiler pass that sinks measurement operations past independent ops#4749

Open
cabreraam wants to merge 5 commits into
NVIDIA:mainfrom
cabreraam:sink-msmts
Open

[Transformation] Add a compiler pass that sinks measurement operations past independent ops#4749
cabreraam wants to merge 5 commits into
NVIDIA:mainfrom
cabreraam:sink-msmts

Conversation

@cabreraam

Copy link
Copy Markdown
Collaborator

Adds a new sink-measurements that operates on funcOps and moves Quake measurement operations later within a block when doing so is semantically safe.
The pass sinks measurements past operations that do not consume the measurement result and do not touch the measured quantum target.
It uses conservative quantum alias checks, only proving non-aliasing for cases such as distinct local quake.alloca values or different constant-index quake.extract_ref values from the same local veq.

The barriers (ops we won't move measurement ops past) include:

  • users of the measurement result, such as quake.discriminate
  • operations that may alias the measured target
  • other measurement operations
  • region operations
  • terminators
  • side-effecting non-Quake operations

Adds cudaq/test/Transforms/sink_measurements.qke with coverage for

  • cases where measurement ops can be sunk
  • cases where we stop conservatively due to encountered barriers to sinking, i.e.,
    • aliasing
      • e.g., we cannot determine that two !quake.refs don't point to the same qubit
    • dynamic indices
      • e.g., runtime dependent indexing into veqs
    • whole veq operands
    • downstream ops depend on depend on measurement targets
    • region op
    • side-effecting classical ops.

Signed-off-by: Anthony Cabrera <antcabrera@nvidia.com>
…s past independent ops

Signed-off-by: Anthony Cabrera <antcabrera@nvidia.com>
Signed-off-by: Anthony Cabrera <antcabrera@nvidia.com>
@github-actions

github-actions Bot commented Jun 16, 2026

Copy link
Copy Markdown

CI Summary (push) — ✅ passed

Run #28112597840 · ✅ 6 · ⏩ 7 · ❌ 0 · ⛔ 0

Top-level jobs (13)
Job Result
binaries ⏩ skipped
build_and_test ✅ success
config_devdeps ✅ success
config_source_build ⏩ skipped
config_wheeldeps ✅ success
devdeps ✅ success
docker_image ⏩ skipped
gen_code_coverage ⏩ skipped
metadata ✅ success
python_metapackages ⏩ skipped
python_wheels ⏩ skipped
source_build ⏩ skipped
wheeldeps ✅ success
⏩ Skipped jobs (7) — intentionally skipped on PR builds; run on merge_group / workflow_dispatch
Job
binaries
config_source_build
docker_image
gen_code_coverage
python_metapackages
python_wheels
source_build
All sub-jobs (42) — every matrix leg, with links
Job Status Link
Build and test (amd64, gcc12, openmpi) / Dev environment (Debug) ✅ success view
Build and test (amd64, gcc12, openmpi) / Dev environment (Python) ✅ success view
Build and test (amd64, llvm, openmpi) / Dev environment (Debug) ✅ success view
Build and test (amd64, llvm, openmpi) / Dev environment (Python) ✅ success view
Build and test (arm64, llvm, openmpi) / Dev environment (Debug) ✅ success view
Build and test (arm64, llvm, openmpi) / Dev environment (Python) ✅ success view
CI Summary ❔ in_progress view
Configure build (devdeps) ✅ success view
Configure build (source_build) ⏩ skipped view
Configure build (wheeldeps) ✅ success view
Create CUDA Quantum installer ⏩ skipped view
Create Docker images ⏩ skipped view
Create Python metapackages ⏩ skipped view
Create Python wheels ⏩ skipped view
Gen code coverage ⏩ skipped view
Load dependencies (amd64, gcc12) / Caching ✅ success view
Load dependencies (amd64, gcc12) / Finalize ✅ success view
Load dependencies (amd64, gcc12) / Metadata ✅ success view
Load dependencies (amd64, llvm) / Caching ✅ success view
Load dependencies (amd64, llvm) / Finalize ✅ success view
Load dependencies (amd64, llvm) / Metadata ✅ success view
Load dependencies (arm64, gcc12) / Caching ✅ success view
Load dependencies (arm64, gcc12) / Finalize ✅ success view
Load dependencies (arm64, gcc12) / Metadata ✅ success view
Load dependencies (arm64, llvm) / Caching ✅ success view
Load dependencies (arm64, llvm) / Finalize ✅ success view
Load dependencies (arm64, llvm) / Metadata ✅ success view
Load source build cache ⏩ skipped view
Load wheel dependencies (amd64, 12.6) / Caching ✅ success view
Load wheel dependencies (amd64, 12.6) / Finalize ✅ success view
Load wheel dependencies (amd64, 12.6) / Metadata ✅ success view
Load wheel dependencies (amd64, 13.0) / Caching ✅ success view
Load wheel dependencies (amd64, 13.0) / Finalize ✅ success view
Load wheel dependencies (amd64, 13.0) / Metadata ✅ success view
Load wheel dependencies (arm64, 12.6) / Caching ✅ success view
Load wheel dependencies (arm64, 12.6) / Finalize ✅ success view
Load wheel dependencies (arm64, 12.6) / Metadata ✅ success view
Load wheel dependencies (arm64, 13.0) / Caching ✅ success view
Load wheel dependencies (arm64, 13.0) / Finalize ✅ success view
Load wheel dependencies (arm64, 13.0) / Metadata ✅ success view
Prepare cache clean-up ❔ in_progress view
Retrieve PR info ✅ success view
✅ Required checks (6/6) — declared in .github/required-checks.yml for push
Required check Status Link
Build and test (amd64, llvm, openmpi) / Dev environment (Debug) ✅ success view
Build and test (amd64, llvm, openmpi) / Dev environment (Python) ✅ success view
Build and test (arm64, llvm, openmpi) / Dev environment (Debug) ✅ success view
Build and test (arm64, llvm, openmpi) / Dev environment (Python) ✅ success view
Build and test (amd64, gcc12, openmpi) / Dev environment (Debug) ✅ success view
Build and test (amd64, gcc12, openmpi) / Dev environment (Python) ✅ success view

@atgeller atgeller left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one comment about pass flexibility.


// 2. candidate has regions (since we don't currently look into regions for
// sinking)
if (candidate->getNumRegions() > 0)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a similar test for if the candidate is in the same block? That would make this pass more flexible to go after the lower-to-cfg pass.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, that caused me to consider: what about function calls? Presumably they can use measurement results in arbitrary ways, and do arbitrary aliasing of qubits.

@schweitzpgi

schweitzpgi commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

General question: we already have passes to erase measurements and to add measurements (back). Do we need another one?

Also, why do this in the reference semantics when it is easier in value semantics?

@schweitzpgi schweitzpgi left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As stated previously, this should really be done in value-semantics form of quake.

Does it handle measurements of veq types?

The barriers (ops we won't move measurement ops past) include:

  • users of the measurement result, such as quake.discriminate
  • operations that may alias the measured target
  • other measurement operations
  • region operations
  • terminators
  • side-effecting non-Quake operations

This is too constrained, btw.

You can never move an operation into a position where its uses dominate the operation itself. (That's invalid SSA.) You can always "move" an operation ahead of or behind operations that have no relationship at all. This is, in fact, semantically true and does not require modifying the IR in any way whatsoever.

  %2 = op1 %0
  %3 = op2 %1

is exactly the same as

  %3 = op2 %1
  %2 = op1 %0

(assuming these operation are pure). When in value-semantics form measurement operations are pure, btw.

Also, side-effecting classical compute operations don't effect quantum operations like measurements. They would run on completely distinct hardware.

if (lhsLoc.baseIsLocalAlloca && rhsLoc.baseIsLocalAlloca)
return false; // different known unique bases cannot alias
return true; // otherwise, we conservatively assume they may alias
} else {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else after return


/// Want to check if `candidate` uses any result produced by *producer
bool usesAnyResultOf(mlir::Operation *candidate, mlir::Operation *producer) {
for (mlir::Value result : producer->getResults()) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unneeded braces

@cabreraam

Copy link
Copy Markdown
Collaborator Author

@schweitzpgi tl;dr: seeking alignment. Is it reasonable to re-work this pass using value semantics so it can feed the analysis of UnitaryOpGroupingAnalysis?


Thanks for the feedback, @schweitzpgi! Based on your insights, I've been doing some research this morning into the existing parts of the codebase to address your comments.

why do this in the reference semantics when it is easier in value semantics?

Yeah, value semantics makes more sense; I agree. I won't need to reason about aliasing with value semantics, and instead just use use-def/dominance rather than aliasing. I'm going to rework the pass in that direction. Thanks for the suggestion!

we already have passes to erase measurements and to add measurements (back). Do we need another one?

Understood. Definitely don't want to add redundant features; based on the passes that I've found that touch measurement operations, my understanding is that those passes cover different measurement lifecycle tasks:

  • add-measurements adds terminal measurements if they're missing
  • observe-ansatz removes/replaces measurements for observe
  • combine-measurements / expand-measurements reshape measurement ops.
  • mapping already defers terminal measurements internally for routing.

The intent for my pass is different from those passes. It would not add, erase, replace, combine, or expand measurements. Instead, it would preserve existing terminal readout and move it later in value-semantics form when SSA/dominance allows. My thought is that this pass can normalize IR and feed UnitaryOpGroupingAnalysis so independent measurements do not split otherwise contiguous unitary regions.

@schweitzpgi

schweitzpgi commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

@schweitzpgi tl;dr: seeking alignment. Is it reasonable to re-work this pass using value semantics so it can feed the analysis of UnitaryOpGroupingAnalysis?

Thanks for the feedback, @schweitzpgi! Based on your insights, I've been doing some research this morning into the existing parts of the codebase to address your comments.

why do this in the reference semantics when it is easier in value semantics?

Yeah, value semantics makes more sense; I agree. I won't need to reason about aliasing with value semantics, and instead just use use-def/dominance rather than aliasing. I'm going to rework the pass in that direction. Thanks for the suggestion!

we already have passes to erase measurements and to add measurements (back). Do we need another one?

Understood. Definitely don't want to add redundant features; based on the passes that I've found that touch measurement operations, my understanding is that those passes cover different measurement lifecycle tasks:

  • add-measurements adds terminal measurements if they're missing
  • observe-ansatz removes/replaces measurements for observe
  • combine-measurements / expand-measurements reshape measurement ops.
  • mapping already defers terminal measurements internally for routing.

The intent for my pass is different from those passes. It would not add, erase, replace, combine, or expand measurements. Instead, it would preserve existing terminal readout and move it later in value-semantics form when SSA/dominance allows. My thought is that this pass can normalize IR and feed UnitaryOpGroupingAnalysis so independent measurements do not split otherwise contiguous unitary regions.

Thanks for the detailed explanation. Super helpful!

We still need to be mindful of barking up the wrong tree here and simply relinking a bunch of linked lists without semantic justification.

Let's take a concrete example. Here is a list of operations. The list is just a notional structure to hold the IR in a specific order to preserve implicit constraints (like side-effects or I/O, etc.). When we're dealing with pure operations, the list has no semantics.

In reference form, we have side-effects to preserve the implicit gate ordering.

  func.func @__nvqpp__mlirgen__function_f._Z1fv() attributes {"cudaq-entrypoint", "cudaq-kernel", no_this} {
    %cst = arith.constant 3.000000e+00 : f64
    %cst_0 = arith.constant 4.000000e+00 : f64
    %0 = quake.alloca !quake.veq<5>
    %1 = quake.extract_ref %0[0] : (!quake.veq<5>) -> !quake.ref
    quake.x %1 : (!quake.ref) -> ()
    %2 = quake.extract_ref %0[1] : (!quake.veq<5>) -> !quake.ref
    quake.x %2 : (!quake.ref) -> ()
    %3 = quake.extract_ref %0[2] : (!quake.veq<5>) -> !quake.ref
    quake.x %3 : (!quake.ref) -> ()
    %4 = quake.extract_ref %0[3] : (!quake.veq<5>) -> !quake.ref
    %5 = quake.extract_ref %0[0] : (!quake.veq<5>) -> !quake.ref
    quake.x [%4] %5 : (!quake.ref, !quake.ref) -> ()
    %6 = quake.extract_ref %0[1] : (!quake.veq<5>) -> !quake.ref
    %measOut = quake.mz %6 : (!quake.ref) -> !cc.measure_handle
    %7 = quake.discriminate %measOut : (!cc.measure_handle) -> i1
    %8 = cc.cast unsigned %7 : (i1) -> i32
    %9 = quake.extract_ref %0[3] : (!quake.veq<5>) -> !quake.ref
    %10 = quake.extract_ref %0[0] : (!quake.veq<5>) -> !quake.ref
    quake.y [%9] %10 : (!quake.ref, !quake.ref) -> ()
    %11 = quake.extract_ref %0[4] : (!quake.veq<5>) -> !quake.ref
    %12 = quake.extract_ref %0[2] : (!quake.veq<5>) -> !quake.ref
    quake.z [%11] %12 : (!quake.ref, !quake.ref) -> ()
    %13 = quake.extract_ref %0[3] : (!quake.veq<5>) -> !quake.ref
    quake.rx (%cst_0) %13 : (f64, !quake.ref) -> ()
    %14 = quake.extract_ref %0[0] : (!quake.veq<5>) -> !quake.ref
    %measOut_1 = quake.mz %14 : (!quake.ref) -> !cc.measure_handle
    %15 = quake.discriminate %measOut_1 : (!cc.measure_handle) -> i1
    %16 = cc.cast unsigned %15 : (i1) -> i32
    %17 = quake.extract_ref %0[2] : (!quake.veq<5>) -> !quake.ref
    quake.rx (%cst) %17 : (f64, !quake.ref) -> ()
    %18 = quake.extract_ref %0[2] : (!quake.veq<5>) -> !quake.ref
    %measOut_2 = quake.mz %18 : (!quake.ref) -> !cc.measure_handle
    %19 = quake.discriminate %measOut_2 : (!cc.measure_handle) -> i1
    %20 = cc.cast unsigned %19 : (i1) -> i32
    call @_Z7dagwoodiii(%8, %16, %20) : (i32, i32, i32) -> ()
    return
  }

In this form the 3 mz operations, the 3 discriminate operations, and the 3 cast operations can all be physically moved (modifying the linked list at the IR level) in the IR to just above the call to dagwood.

But let's go ahead and convert this to value semantics.

  func.func @__nvqpp__mlirgen__function_f._Z1fv() attributes {"cudaq-entrypoint", "cudaq-kernel", no_this} {
    %cst = arith.constant 3.000000e+00 : f64
    %0 = quake.null_wire
    %1 = quake.null_wire
    %2 = quake.null_wire
    %3 = quake.null_wire
    %4 = quake.null_wire
    %5 = quake.x %0 : (!quake.wire) -> !quake.wire
    %6 = quake.x %1 : (!quake.wire) -> !quake.wire
    %7 = quake.x %2 : (!quake.wire) -> !quake.wire
    %8:2 = quake.x [%3] %5 : (!quake.wire, !quake.wire) -> (!quake.wire, !quake.wire)
    %measOut, %wires = quake.mz %6 : (!quake.wire) -> (!cc.measure_handle, !quake.wire)
    %9 = quake.discriminate %measOut : (!cc.measure_handle) -> i1
    %10 = cc.cast unsigned %9 : (i1) -> i32
    %11:2 = quake.y [%8#0] %8#1 : (!quake.wire, !quake.wire) -> (!quake.wire, !quake.wire)
    %12:2 = quake.z [%4] %7 : (!quake.wire, !quake.wire) -> (!quake.wire, !quake.wire)
    %measOut_0, %wires_1 = quake.mz %11#1 : (!quake.wire) -> (!cc.measure_handle, !quake.wire)
    %13 = quake.discriminate %measOut_0 : (!cc.measure_handle) -> i1
    %14 = cc.cast unsigned %13 : (i1) -> i32
    %15 = quake.rx (%cst) %12#1 : (f64, !quake.wire) -> !quake.wire
    %measOut_2, %wires_3 = quake.mz %15 : (!quake.wire) -> (!cc.measure_handle, !quake.wire)
    %16 = quake.discriminate %measOut_2 : (!cc.measure_handle) -> i1
    %17 = cc.cast unsigned %16 : (i1) -> i32
    call @_Z7dagwoodiii(%10, %14, %17) : (i32, i32, i32) -> ()
    return
  }

Here, the linked list in the IR structure has no semantics. There are literally no side-effects here. (Other than the call to dagwood which is needed to anchor this sea of ops and prevent DCE from erasing everything.)

It is perfectly legitimate to run this code without any changes to the IR itself through a process and transcribe it as the sea-of-ops graph that it actually is.

flowchart TD
  cst["%cst = arith.constant 3.0 : f64"]
  n0["%0 = quake.null_wire"]
  n1["%1 = quake.null_wire"]
  n2["%2 = quake.null_wire"]
  n3["%3 = quake.null_wire"]
  n4["%4 = quake.null_wire"]

  x0["%5 = quake.x %0"]
  x1["%6 = quake.x %1"]
  x2["%7 = quake.x %2"]

  x3["%8:2 = quake.x [%3] %5"]
  y0["%11:2 = quake.y [%8#0] %8#1"]
  z0["%12:2 = quake.z [%4] %7"]

  mz0["%measOut, %wires = quake.mz %6"]
  disc0["%9 = quake.discriminate %measOut"]
  cast0["%10 = cc.cast unsigned %9 -> i32"]

  mz1["%measOut_0, %wires_1 = quake.mz %11#1"]
  disc1["%13 = quake.discriminate %measOut_0"]
  cast1["%14 = cc.cast unsigned %13 -> i32"]

  rx0["%15 = quake.rx(%cst) %12#1"]
  mz2["%measOut_2, %wires_3 = quake.mz %15"]
  disc2["%16 = quake.discriminate %measOut_2"]
  cast2["%17 = cc.cast unsigned %16 -> i32"]

  callDag["call @_Z7dagwoodiii(%10, %14, %17)"]

  %% Wires into first layer of X gates
  n0 --> x0
  n1 --> x1
  n2 --> x2
  n3 --> x3
  n4 --> z0

  %% Dataflow between quantum gates
  x0 --> x3
  x1 --> mz0
  x2 --> z0
  x3 --> y0
  z0 --> rx0

  %% Measurement and classical path 0
  mz0 --> disc0
  disc0 --> cast0

  %% Measurement and classical path 1
  y0 --> mz1
  mz1 --> disc1
  disc1 --> cast1

  %% Rotation / measurement / classical path 2
  cst --> rx0
  rx0 --> mz2
  mz2 --> disc2
  disc2 --> cast2

  %% Final call
  cast0 --> callDag
  cast1 --> callDag
  cast2 --> callDag
Loading

This shows that physically moving the IR around and relinking it in its containers is quite simply unnecessary for the compiler.

What's nice about this graph visualization (and not relying on the linear transcription above it), is that it is immediately obvious that there are 3 completely unrelated subgraphs of quake IR that can all be run in parallel. dagwood just has to wait for the dataflow of those 3 subgraphs to conclude and once it has, it can be called. So we can leverage pure analysis here to ascertain these subgraphs and there is no effort involved in massaging Operations around in the IR into any peculiar positions required.

@sacpis sacpis left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cabreraam Overall LGTM. Have doubt about one of the quantum op followed by the classical one.

Comment on lines +97 to +99
for (mlir::Value target : msmt.getTargets()) {
for (auto operand : cudaq::quake::getQuantumOperands(candidate))
if (mayAliasQuantumValues(target, operand))

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a measurement can be incorrectly sunk past a quake.exp_pauli or quake.apply that operates on the measured qubit. Here the function relies on getQuantumOperands and getQuantumResults which call getQuantumTypesFromRange. That function assumes the operand layout as (classical...)(quantum..). It skips the leading classical operands (// Skip over classical types at the beginning). Then if any remaining operands is classical, it returns an empty range.

Can we please investigate this issue?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two quake ops violate the assumption by placing a classical operand after quantum ones are

  1. quake.exp_pauli: args are [parameters(float), controls(q), targets(q), pauli(cc.ptr/charspan)].
    The trailing pauli operand forces the empty return.
  2. quake.apply: [indirect_callee(cc.callable), controls(q), actuals(AnyType)]
    Any classical actual after the quantum controls forces the empty return.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback @sacpis!

Yes, I agree. As currently written, we would:

  1. incorrectly sink a measurement past an exp_pauli because of an incorrect generalization about where classical and quantum arguments appear in quake ops.
  2. Same for apply

Using the value-semantics approach will allow us to account for these issues.

I am working on a rewrite that will address this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants