[Transformation] Add a compiler pass that sinks measurement operations past independent ops by cabreraam · Pull Request #4749 · NVIDIA/cuda-quantum

cabreraam · 2026-06-16T21:17:02Z

Adds a new sink-measurements that operates on funcOps and moves Quake measurement operations later within a block when doing so is semantically safe.
The pass sinks measurements past operations that do not consume the measurement result and do not touch the measured quantum target.
It uses conservative quantum alias checks, only proving non-aliasing for cases such as distinct local quake.alloca values or different constant-index quake.extract_ref values from the same local veq.

The barriers (ops we won't move measurement ops past) include:

users of the measurement result, such as quake.discriminate
operations that may alias the measured target
other measurement operations
region operations
terminators
side-effecting non-Quake operations

Adds cudaq/test/Transforms/sink_measurements.qke with coverage for

cases where measurement ops can be sunk
cases where we stop conservatively due to encountered barriers to sinking, i.e.,
- aliasing
  - e.g., we cannot determine that two !quake.refs don't point to the same qubit
- dynamic indices
  - e.g., runtime dependent indexing into veqs
- whole veq operands
- downstream ops depend on depend on measurement targets
- region op
- side-effecting classical ops.

Signed-off-by: Anthony Cabrera <antcabrera@nvidia.com>

…s past independent ops Signed-off-by: Anthony Cabrera <antcabrera@nvidia.com>

Signed-off-by: Anthony Cabrera <antcabrera@nvidia.com>

github-actions · 2026-06-16T22:00:56Z

CI Summary (`push`) — ✅ passed

Run #28112597840 · ✅ 6 · ⏩ 7 · ❌ 0 · ⛔ 0

Top-level jobs (13)

Job	Result
`binaries`	⏩ skipped
`build_and_test`	✅ success
`config_devdeps`	✅ success
`config_source_build`	⏩ skipped
`config_wheeldeps`	✅ success
`devdeps`	✅ success
`docker_image`	⏩ skipped
`gen_code_coverage`	⏩ skipped
`metadata`	✅ success
`python_metapackages`	⏩ skipped
`python_wheels`	⏩ skipped
`source_build`	⏩ skipped
`wheeldeps`	✅ success

⏩ Skipped jobs (7) — intentionally skipped on PR builds; run on merge_group / workflow_dispatch

Job
`binaries`
`config_source_build`
`docker_image`
`gen_code_coverage`
`python_metapackages`
`python_wheels`
`source_build`

All sub-jobs (42) — every matrix leg, with links

Job	Status	Link
Build and test (amd64, gcc12, openmpi) / Dev environment (Debug)	✅ success	view
Build and test (amd64, gcc12, openmpi) / Dev environment (Python)	✅ success	view
Build and test (amd64, llvm, openmpi) / Dev environment (Debug)	✅ success	view
Build and test (amd64, llvm, openmpi) / Dev environment (Python)	✅ success	view
Build and test (arm64, llvm, openmpi) / Dev environment (Debug)	✅ success	view
Build and test (arm64, llvm, openmpi) / Dev environment (Python)	✅ success	view
CI Summary	❔ in_progress	view
Configure build (devdeps)	✅ success	view
Configure build (source_build)	⏩ skipped	view
Configure build (wheeldeps)	✅ success	view
Create CUDA Quantum installer	⏩ skipped	view
Create Docker images	⏩ skipped	view
Create Python metapackages	⏩ skipped	view
Create Python wheels	⏩ skipped	view
Gen code coverage	⏩ skipped	view
Load dependencies (amd64, gcc12) / Caching	✅ success	view
Load dependencies (amd64, gcc12) / Finalize	✅ success	view
Load dependencies (amd64, gcc12) / Metadata	✅ success	view
Load dependencies (amd64, llvm) / Caching	✅ success	view
Load dependencies (amd64, llvm) / Finalize	✅ success	view
Load dependencies (amd64, llvm) / Metadata	✅ success	view
Load dependencies (arm64, gcc12) / Caching	✅ success	view
Load dependencies (arm64, gcc12) / Finalize	✅ success	view
Load dependencies (arm64, gcc12) / Metadata	✅ success	view
Load dependencies (arm64, llvm) / Caching	✅ success	view
Load dependencies (arm64, llvm) / Finalize	✅ success	view
Load dependencies (arm64, llvm) / Metadata	✅ success	view
Load source build cache	⏩ skipped	view
Load wheel dependencies (amd64, 12.6) / Caching	✅ success	view
Load wheel dependencies (amd64, 12.6) / Finalize	✅ success	view
Load wheel dependencies (amd64, 12.6) / Metadata	✅ success	view
Load wheel dependencies (amd64, 13.0) / Caching	✅ success	view
Load wheel dependencies (amd64, 13.0) / Finalize	✅ success	view
Load wheel dependencies (amd64, 13.0) / Metadata	✅ success	view
Load wheel dependencies (arm64, 12.6) / Caching	✅ success	view
Load wheel dependencies (arm64, 12.6) / Finalize	✅ success	view
Load wheel dependencies (arm64, 12.6) / Metadata	✅ success	view
Load wheel dependencies (arm64, 13.0) / Caching	✅ success	view
Load wheel dependencies (arm64, 13.0) / Finalize	✅ success	view
Load wheel dependencies (arm64, 13.0) / Metadata	✅ success	view
Prepare cache clean-up	❔ in_progress	view
Retrieve PR info	✅ success	view

✅ Required checks (6/6) — declared in .github/required-checks.yml for push

Required check	Status	Link
Build and test (amd64, llvm, openmpi) / Dev environment (Debug)	✅ success	view
Build and test (amd64, llvm, openmpi) / Dev environment (Python)	✅ success	view
Build and test (arm64, llvm, openmpi) / Dev environment (Debug)	✅ success	view
Build and test (arm64, llvm, openmpi) / Dev environment (Python)	✅ success	view
Build and test (amd64, gcc12, openmpi) / Dev environment (Debug)	✅ success	view
Build and test (amd64, gcc12, openmpi) / Dev environment (Python)	✅ success	view

atgeller

LGTM, just one comment about pass flexibility.

atgeller · 2026-06-22T23:11:54Z

+
+  // 2. candidate has regions (since we don't currently look into regions for
+  // sinking)
+  if (candidate->getNumRegions() > 0)


Should we have a similar test for if the candidate is in the same block? That would make this pass more flexible to go after the lower-to-cfg pass.

Actually, that caused me to consider: what about function calls? Presumably they can use measurement results in arbitrary ways, and do arbitrary aliasing of qubits.

schweitzpgi · 2026-06-23T16:17:18Z

General question: we already have passes to erase measurements and to add measurements (back). Do we need another one?

Also, why do this in the reference semantics when it is easier in value semantics?

schweitzpgi

As stated previously, this should really be done in value-semantics form of quake.

Does it handle measurements of veq types?

The barriers (ops we won't move measurement ops past) include:

users of the measurement result, such as quake.discriminate

operations that may alias the measured target

other measurement operations

region operations

terminators

side-effecting non-Quake operations

This is too constrained, btw.

You can never move an operation into a position where its uses dominate the operation itself. (That's invalid SSA.) You can always "move" an operation ahead of or behind operations that have no relationship at all. This is, in fact, semantically true and does not require modifying the IR in any way whatsoever.

  %2 = op1 %0
  %3 = op2 %1

is exactly the same as

  %3 = op2 %1
  %2 = op1 %0

(assuming these operation are pure). When in value-semantics form measurement operations are pure, btw.

Also, side-effecting classical compute operations don't effect quantum operations like measurements. They would run on completely distinct hardware.

schweitzpgi · 2026-06-23T17:11:53Z

+    if (lhsLoc.baseIsLocalAlloca && rhsLoc.baseIsLocalAlloca)
+      return false; // different known unique bases cannot alias
+    return true;    // otherwise, we conservatively assume they may alias
+  } else {


else after return

schweitzpgi · 2026-06-23T17:12:33Z

+
+/// Want to check if `candidate` uses any result produced by *producer
+bool usesAnyResultOf(mlir::Operation *candidate, mlir::Operation *producer) {
+  for (mlir::Value result : producer->getResults()) {


unneeded braces

cabreraam · 2026-06-23T19:20:18Z

@schweitzpgi tl;dr: seeking alignment. Is it reasonable to re-work this pass using value semantics so it can feed the analysis of UnitaryOpGroupingAnalysis?

Thanks for the feedback, @schweitzpgi! Based on your insights, I've been doing some research this morning into the existing parts of the codebase to address your comments.

why do this in the reference semantics when it is easier in value semantics?

Yeah, value semantics makes more sense; I agree. I won't need to reason about aliasing with value semantics, and instead just use use-def/dominance rather than aliasing. I'm going to rework the pass in that direction. Thanks for the suggestion!

we already have passes to erase measurements and to add measurements (back). Do we need another one?

Understood. Definitely don't want to add redundant features; based on the passes that I've found that touch measurement operations, my understanding is that those passes cover different measurement lifecycle tasks:

add-measurements adds terminal measurements if they're missing
observe-ansatz removes/replaces measurements for observe
combine-measurements / expand-measurements reshape measurement ops.
mapping already defers terminal measurements internally for routing.

The intent for my pass is different from those passes. It would not add, erase, replace, combine, or expand measurements. Instead, it would preserve existing terminal readout and move it later in value-semantics form when SSA/dominance allows. My thought is that this pass can normalize IR and feed UnitaryOpGroupingAnalysis so independent measurements do not split otherwise contiguous unitary regions.

schweitzpgi · 2026-06-24T17:52:37Z

@schweitzpgi tl;dr: seeking alignment. Is it reasonable to re-work this pass using value semantics so it can feed the analysis of UnitaryOpGroupingAnalysis?

Thanks for the feedback, @schweitzpgi! Based on your insights, I've been doing some research this morning into the existing parts of the codebase to address your comments.

why do this in the reference semantics when it is easier in value semantics?

Yeah, value semantics makes more sense; I agree. I won't need to reason about aliasing with value semantics, and instead just use use-def/dominance rather than aliasing. I'm going to rework the pass in that direction. Thanks for the suggestion!

we already have passes to erase measurements and to add measurements (back). Do we need another one?

Understood. Definitely don't want to add redundant features; based on the passes that I've found that touch measurement operations, my understanding is that those passes cover different measurement lifecycle tasks:

add-measurements adds terminal measurements if they're missing

observe-ansatz removes/replaces measurements for observe

combine-measurements / expand-measurements reshape measurement ops.

mapping already defers terminal measurements internally for routing.

The intent for my pass is different from those passes. It would not add, erase, replace, combine, or expand measurements. Instead, it would preserve existing terminal readout and move it later in value-semantics form when SSA/dominance allows. My thought is that this pass can normalize IR and feed UnitaryOpGroupingAnalysis so independent measurements do not split otherwise contiguous unitary regions.

Thanks for the detailed explanation. Super helpful!

We still need to be mindful of barking up the wrong tree here and simply relinking a bunch of linked lists without semantic justification.

Let's take a concrete example. Here is a list of operations. The list is just a notional structure to hold the IR in a specific order to preserve implicit constraints (like side-effects or I/O, etc.). When we're dealing with pure operations, the list has no semantics.

In reference form, we have side-effects to preserve the implicit gate ordering.

  func.func @__nvqpp__mlirgen__function_f._Z1fv() attributes {"cudaq-entrypoint", "cudaq-kernel", no_this} {
    %cst = arith.constant 3.000000e+00 : f64
    %cst_0 = arith.constant 4.000000e+00 : f64
    %0 = quake.alloca !quake.veq<5>
    %1 = quake.extract_ref %0[0] : (!quake.veq<5>) -> !quake.ref
    quake.x %1 : (!quake.ref) -> ()
    %2 = quake.extract_ref %0[1] : (!quake.veq<5>) -> !quake.ref
    quake.x %2 : (!quake.ref) -> ()
    %3 = quake.extract_ref %0[2] : (!quake.veq<5>) -> !quake.ref
    quake.x %3 : (!quake.ref) -> ()
    %4 = quake.extract_ref %0[3] : (!quake.veq<5>) -> !quake.ref
    %5 = quake.extract_ref %0[0] : (!quake.veq<5>) -> !quake.ref
    quake.x [%4] %5 : (!quake.ref, !quake.ref) -> ()
    %6 = quake.extract_ref %0[1] : (!quake.veq<5>) -> !quake.ref
    %measOut = quake.mz %6 : (!quake.ref) -> !cc.measure_handle
    %7 = quake.discriminate %measOut : (!cc.measure_handle) -> i1
    %8 = cc.cast unsigned %7 : (i1) -> i32
    %9 = quake.extract_ref %0[3] : (!quake.veq<5>) -> !quake.ref
    %10 = quake.extract_ref %0[0] : (!quake.veq<5>) -> !quake.ref
    quake.y [%9] %10 : (!quake.ref, !quake.ref) -> ()
    %11 = quake.extract_ref %0[4] : (!quake.veq<5>) -> !quake.ref
    %12 = quake.extract_ref %0[2] : (!quake.veq<5>) -> !quake.ref
    quake.z [%11] %12 : (!quake.ref, !quake.ref) -> ()
    %13 = quake.extract_ref %0[3] : (!quake.veq<5>) -> !quake.ref
    quake.rx (%cst_0) %13 : (f64, !quake.ref) -> ()
    %14 = quake.extract_ref %0[0] : (!quake.veq<5>) -> !quake.ref
    %measOut_1 = quake.mz %14 : (!quake.ref) -> !cc.measure_handle
    %15 = quake.discriminate %measOut_1 : (!cc.measure_handle) -> i1
    %16 = cc.cast unsigned %15 : (i1) -> i32
    %17 = quake.extract_ref %0[2] : (!quake.veq<5>) -> !quake.ref
    quake.rx (%cst) %17 : (f64, !quake.ref) -> ()
    %18 = quake.extract_ref %0[2] : (!quake.veq<5>) -> !quake.ref
    %measOut_2 = quake.mz %18 : (!quake.ref) -> !cc.measure_handle
    %19 = quake.discriminate %measOut_2 : (!cc.measure_handle) -> i1
    %20 = cc.cast unsigned %19 : (i1) -> i32
    call @_Z7dagwoodiii(%8, %16, %20) : (i32, i32, i32) -> ()
    return
  }

In this form the 3 mz operations, the 3 discriminate operations, and the 3 cast operations can all be physically moved (modifying the linked list at the IR level) in the IR to just above the call to dagwood.

But let's go ahead and convert this to value semantics.

  func.func @__nvqpp__mlirgen__function_f._Z1fv() attributes {"cudaq-entrypoint", "cudaq-kernel", no_this} {
    %cst = arith.constant 3.000000e+00 : f64
    %0 = quake.null_wire
    %1 = quake.null_wire
    %2 = quake.null_wire
    %3 = quake.null_wire
    %4 = quake.null_wire
    %5 = quake.x %0 : (!quake.wire) -> !quake.wire
    %6 = quake.x %1 : (!quake.wire) -> !quake.wire
    %7 = quake.x %2 : (!quake.wire) -> !quake.wire
    %8:2 = quake.x [%3] %5 : (!quake.wire, !quake.wire) -> (!quake.wire, !quake.wire)
    %measOut, %wires = quake.mz %6 : (!quake.wire) -> (!cc.measure_handle, !quake.wire)
    %9 = quake.discriminate %measOut : (!cc.measure_handle) -> i1
    %10 = cc.cast unsigned %9 : (i1) -> i32
    %11:2 = quake.y [%8#0] %8#1 : (!quake.wire, !quake.wire) -> (!quake.wire, !quake.wire)
    %12:2 = quake.z [%4] %7 : (!quake.wire, !quake.wire) -> (!quake.wire, !quake.wire)
    %measOut_0, %wires_1 = quake.mz %11#1 : (!quake.wire) -> (!cc.measure_handle, !quake.wire)
    %13 = quake.discriminate %measOut_0 : (!cc.measure_handle) -> i1
    %14 = cc.cast unsigned %13 : (i1) -> i32
    %15 = quake.rx (%cst) %12#1 : (f64, !quake.wire) -> !quake.wire
    %measOut_2, %wires_3 = quake.mz %15 : (!quake.wire) -> (!cc.measure_handle, !quake.wire)
    %16 = quake.discriminate %measOut_2 : (!cc.measure_handle) -> i1
    %17 = cc.cast unsigned %16 : (i1) -> i32
    call @_Z7dagwoodiii(%10, %14, %17) : (i32, i32, i32) -> ()
    return
  }

Here, the linked list in the IR structure has no semantics. There are literally no side-effects here. (Other than the call to dagwood which is needed to anchor this sea of ops and prevent DCE from erasing everything.)

It is perfectly legitimate to run this code without any changes to the IR itself through a process and transcribe it as the sea-of-ops graph that it actually is.

flowchart TD
  cst["%cst = arith.constant 3.0 : f64"]
  n0["%0 = quake.null_wire"]
  n1["%1 = quake.null_wire"]
  n2["%2 = quake.null_wire"]
  n3["%3 = quake.null_wire"]
  n4["%4 = quake.null_wire"]

  x0["%5 = quake.x %0"]
  x1["%6 = quake.x %1"]
  x2["%7 = quake.x %2"]

  x3["%8:2 = quake.x [%3] %5"]
  y0["%11:2 = quake.y [%8#0] %8#1"]
  z0["%12:2 = quake.z [%4] %7"]

  mz0["%measOut, %wires = quake.mz %6"]
  disc0["%9 = quake.discriminate %measOut"]
  cast0["%10 = cc.cast unsigned %9 -> i32"]

  mz1["%measOut_0, %wires_1 = quake.mz %11#1"]
  disc1["%13 = quake.discriminate %measOut_0"]
  cast1["%14 = cc.cast unsigned %13 -> i32"]

  rx0["%15 = quake.rx(%cst) %12#1"]
  mz2["%measOut_2, %wires_3 = quake.mz %15"]
  disc2["%16 = quake.discriminate %measOut_2"]
  cast2["%17 = cc.cast unsigned %16 -> i32"]

  callDag["call @_Z7dagwoodiii(%10, %14, %17)"]

  %% Wires into first layer of X gates
  n0 --> x0
  n1 --> x1
  n2 --> x2
  n3 --> x3
  n4 --> z0

  %% Dataflow between quantum gates
  x0 --> x3
  x1 --> mz0
  x2 --> z0
  x3 --> y0
  z0 --> rx0

  %% Measurement and classical path 0
  mz0 --> disc0
  disc0 --> cast0

  %% Measurement and classical path 1
  y0 --> mz1
  mz1 --> disc1
  disc1 --> cast1

  %% Rotation / measurement / classical path 2
  cst --> rx0
  rx0 --> mz2
  mz2 --> disc2
  disc2 --> cast2

  %% Final call
  cast0 --> callDag
  cast1 --> callDag
  cast2 --> callDag

This shows that physically moving the IR around and relinking it in its containers is quite simply unnecessary for the compiler.

What's nice about this graph visualization (and not relying on the linear transcription above it), is that it is immediately obvious that there are 3 completely unrelated subgraphs of quake IR that can all be run in parallel. dagwood just has to wait for the dataflow of those 3 subgraphs to conclude and once it has, it can be called. So we can leverage pure analysis here to ascertain these subgraphs and there is no effort involved in massaging Operations around in the IR into any peculiar positions required.

sacpis

@cabreraam Overall LGTM. Have doubt about one of the quantum op followed by the classical one.

sacpis · 2026-06-29T16:26:23Z

+  for (mlir::Value target : msmt.getTargets()) {
+    for (auto operand : cudaq::quake::getQuantumOperands(candidate))
+      if (mayAliasQuantumValues(target, operand))


Seems like a measurement can be incorrectly sunk past a quake.exp_pauli or quake.apply that operates on the measured qubit. Here the function relies on getQuantumOperands and getQuantumResults which call getQuantumTypesFromRange. That function assumes the operand layout as (classical...)(quantum..). It skips the leading classical operands (// Skip over classical types at the beginning). Then if any remaining operands is classical, it returns an empty range.

Can we please investigate this issue?

Two quake ops violate the assumption by placing a classical operand after quantum ones are

quake.exp_pauli: args are [parameters(float), controls(q), targets(q), pauli(cc.ptr/charspan)].
The trailing pauli operand forces the empty return.

quake.apply: [indirect_callee(cc.callable), controls(q), actuals(AnyType)]
Any classical actual after the quantum controls forces the empty return.

Thanks for the feedback @sacpis!

Yes, I agree. As currently written, we would:

incorrectly sink a measurement past an exp_pauli because of an incorrect generalization about where classical and quantum arguments appear in quake ops.

Same for apply

Using the value-semantics approach will allow us to account for these issues.

I am working on a rewrite that will address this!

cabreraam added 3 commits June 15, 2026 16:47

sink measurements

35e5acf

Signed-off-by: Anthony Cabrera <antcabrera@nvidia.com>

[transformation] add a compiler pass that sinks measurement operation…

323a758

…s past independent ops Signed-off-by: Anthony Cabrera <antcabrera@nvidia.com>

formatting test cases

249a606

Signed-off-by: Anthony Cabrera <antcabrera@nvidia.com>

cabreraam requested review from atgeller, khalatepradnya, sacpis and schweitzpgi June 16, 2026 21:17

atgeller approved these changes Jun 22, 2026

View reviewed changes

Merge branch 'main' into sink-msmts

11d45de

schweitzpgi reviewed Jun 23, 2026

View reviewed changes

Merge branch 'main' into sink-msmts

49b80d2

sacpis reviewed Jun 29, 2026

View reviewed changes

Uh oh!

Conversation

cabreraam commented Jun 16, 2026

Uh oh!

github-actions Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CI Summary (push) — ✅ passed

Uh oh!

atgeller left a comment

Choose a reason for hiding this comment

Uh oh!

atgeller Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

atgeller Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

schweitzpgi commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

schweitzpgi left a comment

Choose a reason for hiding this comment

Uh oh!

schweitzpgi Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

schweitzpgi Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

cabreraam commented Jun 23, 2026

Uh oh!

schweitzpgi commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sacpis left a comment

Choose a reason for hiding this comment

Uh oh!

sacpis Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

sacpis Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

cabreraam Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

github-actions Bot commented Jun 16, 2026 •

edited

Loading

CI Summary (`push`) — ✅ passed

schweitzpgi commented Jun 23, 2026 •

edited

Loading

schweitzpgi commented Jun 24, 2026 •

edited

Loading