[Transformation] Add a compiler pass that sinks measurement operations past independent ops#4749
[Transformation] Add a compiler pass that sinks measurement operations past independent ops#4749cabreraam wants to merge 5 commits into
Conversation
Signed-off-by: Anthony Cabrera <antcabrera@nvidia.com>
…s past independent ops Signed-off-by: Anthony Cabrera <antcabrera@nvidia.com>
Signed-off-by: Anthony Cabrera <antcabrera@nvidia.com>
CI Summary (
|
| Job | Result |
|---|---|
binaries |
⏩ skipped |
build_and_test |
✅ success |
config_devdeps |
✅ success |
config_source_build |
⏩ skipped |
config_wheeldeps |
✅ success |
devdeps |
✅ success |
docker_image |
⏩ skipped |
gen_code_coverage |
⏩ skipped |
metadata |
✅ success |
python_metapackages |
⏩ skipped |
python_wheels |
⏩ skipped |
source_build |
⏩ skipped |
wheeldeps |
✅ success |
⏩ Skipped jobs (7) — intentionally skipped on PR builds; run on merge_group / workflow_dispatch
| Job |
|---|
binaries |
config_source_build |
docker_image |
gen_code_coverage |
python_metapackages |
python_wheels |
source_build |
All sub-jobs (42) — every matrix leg, with links
| Job | Status | Link |
|---|---|---|
| Build and test (amd64, gcc12, openmpi) / Dev environment (Debug) | ✅ success | view |
| Build and test (amd64, gcc12, openmpi) / Dev environment (Python) | ✅ success | view |
| Build and test (amd64, llvm, openmpi) / Dev environment (Debug) | ✅ success | view |
| Build and test (amd64, llvm, openmpi) / Dev environment (Python) | ✅ success | view |
| Build and test (arm64, llvm, openmpi) / Dev environment (Debug) | ✅ success | view |
| Build and test (arm64, llvm, openmpi) / Dev environment (Python) | ✅ success | view |
| CI Summary | ❔ in_progress | view |
| Configure build (devdeps) | ✅ success | view |
| Configure build (source_build) | ⏩ skipped | view |
| Configure build (wheeldeps) | ✅ success | view |
| Create CUDA Quantum installer | ⏩ skipped | view |
| Create Docker images | ⏩ skipped | view |
| Create Python metapackages | ⏩ skipped | view |
| Create Python wheels | ⏩ skipped | view |
| Gen code coverage | ⏩ skipped | view |
| Load dependencies (amd64, gcc12) / Caching | ✅ success | view |
| Load dependencies (amd64, gcc12) / Finalize | ✅ success | view |
| Load dependencies (amd64, gcc12) / Metadata | ✅ success | view |
| Load dependencies (amd64, llvm) / Caching | ✅ success | view |
| Load dependencies (amd64, llvm) / Finalize | ✅ success | view |
| Load dependencies (amd64, llvm) / Metadata | ✅ success | view |
| Load dependencies (arm64, gcc12) / Caching | ✅ success | view |
| Load dependencies (arm64, gcc12) / Finalize | ✅ success | view |
| Load dependencies (arm64, gcc12) / Metadata | ✅ success | view |
| Load dependencies (arm64, llvm) / Caching | ✅ success | view |
| Load dependencies (arm64, llvm) / Finalize | ✅ success | view |
| Load dependencies (arm64, llvm) / Metadata | ✅ success | view |
| Load source build cache | ⏩ skipped | view |
| Load wheel dependencies (amd64, 12.6) / Caching | ✅ success | view |
| Load wheel dependencies (amd64, 12.6) / Finalize | ✅ success | view |
| Load wheel dependencies (amd64, 12.6) / Metadata | ✅ success | view |
| Load wheel dependencies (amd64, 13.0) / Caching | ✅ success | view |
| Load wheel dependencies (amd64, 13.0) / Finalize | ✅ success | view |
| Load wheel dependencies (amd64, 13.0) / Metadata | ✅ success | view |
| Load wheel dependencies (arm64, 12.6) / Caching | ✅ success | view |
| Load wheel dependencies (arm64, 12.6) / Finalize | ✅ success | view |
| Load wheel dependencies (arm64, 12.6) / Metadata | ✅ success | view |
| Load wheel dependencies (arm64, 13.0) / Caching | ✅ success | view |
| Load wheel dependencies (arm64, 13.0) / Finalize | ✅ success | view |
| Load wheel dependencies (arm64, 13.0) / Metadata | ✅ success | view |
| Prepare cache clean-up | ❔ in_progress | view |
| Retrieve PR info | ✅ success | view |
✅ Required checks (6/6) — declared in .github/required-checks.yml for push
| Required check | Status | Link |
|---|---|---|
| Build and test (amd64, llvm, openmpi) / Dev environment (Debug) | ✅ success | view |
| Build and test (amd64, llvm, openmpi) / Dev environment (Python) | ✅ success | view |
| Build and test (arm64, llvm, openmpi) / Dev environment (Debug) | ✅ success | view |
| Build and test (arm64, llvm, openmpi) / Dev environment (Python) | ✅ success | view |
| Build and test (amd64, gcc12, openmpi) / Dev environment (Debug) | ✅ success | view |
| Build and test (amd64, gcc12, openmpi) / Dev environment (Python) | ✅ success | view |
atgeller
left a comment
There was a problem hiding this comment.
LGTM, just one comment about pass flexibility.
|
|
||
| // 2. candidate has regions (since we don't currently look into regions for | ||
| // sinking) | ||
| if (candidate->getNumRegions() > 0) |
There was a problem hiding this comment.
Should we have a similar test for if the candidate is in the same block? That would make this pass more flexible to go after the lower-to-cfg pass.
There was a problem hiding this comment.
Actually, that caused me to consider: what about function calls? Presumably they can use measurement results in arbitrary ways, and do arbitrary aliasing of qubits.
|
General question: we already have passes to erase measurements and to add measurements (back). Do we need another one? Also, why do this in the reference semantics when it is easier in value semantics? |
schweitzpgi
left a comment
There was a problem hiding this comment.
As stated previously, this should really be done in value-semantics form of quake.
Does it handle measurements of veq types?
The barriers (ops we won't move measurement ops past) include:
- users of the measurement result, such as quake.discriminate
- operations that may alias the measured target
- other measurement operations
- region operations
- terminators
- side-effecting non-Quake operations
This is too constrained, btw.
You can never move an operation into a position where its uses dominate the operation itself. (That's invalid SSA.) You can always "move" an operation ahead of or behind operations that have no relationship at all. This is, in fact, semantically true and does not require modifying the IR in any way whatsoever.
%2 = op1 %0
%3 = op2 %1is exactly the same as
%3 = op2 %1
%2 = op1 %0(assuming these operation are pure). When in value-semantics form measurement operations are pure, btw.
Also, side-effecting classical compute operations don't effect quantum operations like measurements. They would run on completely distinct hardware.
| if (lhsLoc.baseIsLocalAlloca && rhsLoc.baseIsLocalAlloca) | ||
| return false; // different known unique bases cannot alias | ||
| return true; // otherwise, we conservatively assume they may alias | ||
| } else { |
|
|
||
| /// Want to check if `candidate` uses any result produced by *producer | ||
| bool usesAnyResultOf(mlir::Operation *candidate, mlir::Operation *producer) { | ||
| for (mlir::Value result : producer->getResults()) { |
|
@schweitzpgi tl;dr: seeking alignment. Is it reasonable to re-work this pass using value semantics so it can feed the analysis of Thanks for the feedback, @schweitzpgi! Based on your insights, I've been doing some research this morning into the existing parts of the codebase to address your comments.
Yeah, value semantics makes more sense; I agree. I won't need to reason about aliasing with value semantics, and instead just use use-def/dominance rather than aliasing. I'm going to rework the pass in that direction. Thanks for the suggestion!
Understood. Definitely don't want to add redundant features; based on the passes that I've found that touch measurement operations, my understanding is that those passes cover different measurement lifecycle tasks:
The intent for my pass is different from those passes. It would not add, erase, replace, combine, or expand measurements. Instead, it would preserve existing terminal readout and move it later in value-semantics form when SSA/dominance allows. My thought is that this pass can normalize IR and feed |
Thanks for the detailed explanation. Super helpful! We still need to be mindful of barking up the wrong tree here and simply relinking a bunch of linked lists without semantic justification. Let's take a concrete example. Here is a list of operations. The list is just a notional structure to hold the IR in a specific order to preserve implicit constraints (like side-effects or I/O, etc.). When we're dealing with pure operations, the list has no semantics. In reference form, we have side-effects to preserve the implicit gate ordering. func.func @__nvqpp__mlirgen__function_f._Z1fv() attributes {"cudaq-entrypoint", "cudaq-kernel", no_this} {
%cst = arith.constant 3.000000e+00 : f64
%cst_0 = arith.constant 4.000000e+00 : f64
%0 = quake.alloca !quake.veq<5>
%1 = quake.extract_ref %0[0] : (!quake.veq<5>) -> !quake.ref
quake.x %1 : (!quake.ref) -> ()
%2 = quake.extract_ref %0[1] : (!quake.veq<5>) -> !quake.ref
quake.x %2 : (!quake.ref) -> ()
%3 = quake.extract_ref %0[2] : (!quake.veq<5>) -> !quake.ref
quake.x %3 : (!quake.ref) -> ()
%4 = quake.extract_ref %0[3] : (!quake.veq<5>) -> !quake.ref
%5 = quake.extract_ref %0[0] : (!quake.veq<5>) -> !quake.ref
quake.x [%4] %5 : (!quake.ref, !quake.ref) -> ()
%6 = quake.extract_ref %0[1] : (!quake.veq<5>) -> !quake.ref
%measOut = quake.mz %6 : (!quake.ref) -> !cc.measure_handle
%7 = quake.discriminate %measOut : (!cc.measure_handle) -> i1
%8 = cc.cast unsigned %7 : (i1) -> i32
%9 = quake.extract_ref %0[3] : (!quake.veq<5>) -> !quake.ref
%10 = quake.extract_ref %0[0] : (!quake.veq<5>) -> !quake.ref
quake.y [%9] %10 : (!quake.ref, !quake.ref) -> ()
%11 = quake.extract_ref %0[4] : (!quake.veq<5>) -> !quake.ref
%12 = quake.extract_ref %0[2] : (!quake.veq<5>) -> !quake.ref
quake.z [%11] %12 : (!quake.ref, !quake.ref) -> ()
%13 = quake.extract_ref %0[3] : (!quake.veq<5>) -> !quake.ref
quake.rx (%cst_0) %13 : (f64, !quake.ref) -> ()
%14 = quake.extract_ref %0[0] : (!quake.veq<5>) -> !quake.ref
%measOut_1 = quake.mz %14 : (!quake.ref) -> !cc.measure_handle
%15 = quake.discriminate %measOut_1 : (!cc.measure_handle) -> i1
%16 = cc.cast unsigned %15 : (i1) -> i32
%17 = quake.extract_ref %0[2] : (!quake.veq<5>) -> !quake.ref
quake.rx (%cst) %17 : (f64, !quake.ref) -> ()
%18 = quake.extract_ref %0[2] : (!quake.veq<5>) -> !quake.ref
%measOut_2 = quake.mz %18 : (!quake.ref) -> !cc.measure_handle
%19 = quake.discriminate %measOut_2 : (!cc.measure_handle) -> i1
%20 = cc.cast unsigned %19 : (i1) -> i32
call @_Z7dagwoodiii(%8, %16, %20) : (i32, i32, i32) -> ()
return
}In this form the 3 mz operations, the 3 discriminate operations, and the 3 cast operations can all be physically moved (modifying the linked list at the IR level) in the IR to just above the call to But let's go ahead and convert this to value semantics. func.func @__nvqpp__mlirgen__function_f._Z1fv() attributes {"cudaq-entrypoint", "cudaq-kernel", no_this} {
%cst = arith.constant 3.000000e+00 : f64
%0 = quake.null_wire
%1 = quake.null_wire
%2 = quake.null_wire
%3 = quake.null_wire
%4 = quake.null_wire
%5 = quake.x %0 : (!quake.wire) -> !quake.wire
%6 = quake.x %1 : (!quake.wire) -> !quake.wire
%7 = quake.x %2 : (!quake.wire) -> !quake.wire
%8:2 = quake.x [%3] %5 : (!quake.wire, !quake.wire) -> (!quake.wire, !quake.wire)
%measOut, %wires = quake.mz %6 : (!quake.wire) -> (!cc.measure_handle, !quake.wire)
%9 = quake.discriminate %measOut : (!cc.measure_handle) -> i1
%10 = cc.cast unsigned %9 : (i1) -> i32
%11:2 = quake.y [%8#0] %8#1 : (!quake.wire, !quake.wire) -> (!quake.wire, !quake.wire)
%12:2 = quake.z [%4] %7 : (!quake.wire, !quake.wire) -> (!quake.wire, !quake.wire)
%measOut_0, %wires_1 = quake.mz %11#1 : (!quake.wire) -> (!cc.measure_handle, !quake.wire)
%13 = quake.discriminate %measOut_0 : (!cc.measure_handle) -> i1
%14 = cc.cast unsigned %13 : (i1) -> i32
%15 = quake.rx (%cst) %12#1 : (f64, !quake.wire) -> !quake.wire
%measOut_2, %wires_3 = quake.mz %15 : (!quake.wire) -> (!cc.measure_handle, !quake.wire)
%16 = quake.discriminate %measOut_2 : (!cc.measure_handle) -> i1
%17 = cc.cast unsigned %16 : (i1) -> i32
call @_Z7dagwoodiii(%10, %14, %17) : (i32, i32, i32) -> ()
return
}Here, the linked list in the IR structure has no semantics. There are literally no side-effects here. (Other than the call to It is perfectly legitimate to run this code without any changes to the IR itself through a process and transcribe it as the sea-of-ops graph that it actually is. flowchart TD
cst["%cst = arith.constant 3.0 : f64"]
n0["%0 = quake.null_wire"]
n1["%1 = quake.null_wire"]
n2["%2 = quake.null_wire"]
n3["%3 = quake.null_wire"]
n4["%4 = quake.null_wire"]
x0["%5 = quake.x %0"]
x1["%6 = quake.x %1"]
x2["%7 = quake.x %2"]
x3["%8:2 = quake.x [%3] %5"]
y0["%11:2 = quake.y [%8#0] %8#1"]
z0["%12:2 = quake.z [%4] %7"]
mz0["%measOut, %wires = quake.mz %6"]
disc0["%9 = quake.discriminate %measOut"]
cast0["%10 = cc.cast unsigned %9 -> i32"]
mz1["%measOut_0, %wires_1 = quake.mz %11#1"]
disc1["%13 = quake.discriminate %measOut_0"]
cast1["%14 = cc.cast unsigned %13 -> i32"]
rx0["%15 = quake.rx(%cst) %12#1"]
mz2["%measOut_2, %wires_3 = quake.mz %15"]
disc2["%16 = quake.discriminate %measOut_2"]
cast2["%17 = cc.cast unsigned %16 -> i32"]
callDag["call @_Z7dagwoodiii(%10, %14, %17)"]
%% Wires into first layer of X gates
n0 --> x0
n1 --> x1
n2 --> x2
n3 --> x3
n4 --> z0
%% Dataflow between quantum gates
x0 --> x3
x1 --> mz0
x2 --> z0
x3 --> y0
z0 --> rx0
%% Measurement and classical path 0
mz0 --> disc0
disc0 --> cast0
%% Measurement and classical path 1
y0 --> mz1
mz1 --> disc1
disc1 --> cast1
%% Rotation / measurement / classical path 2
cst --> rx0
rx0 --> mz2
mz2 --> disc2
disc2 --> cast2
%% Final call
cast0 --> callDag
cast1 --> callDag
cast2 --> callDag
This shows that physically moving the IR around and relinking it in its containers is quite simply unnecessary for the compiler. What's nice about this graph visualization (and not relying on the linear transcription above it), is that it is immediately obvious that there are 3 completely unrelated subgraphs of quake IR that can all be run in parallel. |
sacpis
left a comment
There was a problem hiding this comment.
@cabreraam Overall LGTM. Have doubt about one of the quantum op followed by the classical one.
| for (mlir::Value target : msmt.getTargets()) { | ||
| for (auto operand : cudaq::quake::getQuantumOperands(candidate)) | ||
| if (mayAliasQuantumValues(target, operand)) |
There was a problem hiding this comment.
Seems like a measurement can be incorrectly sunk past a quake.exp_pauli or quake.apply that operates on the measured qubit. Here the function relies on getQuantumOperands and getQuantumResults which call getQuantumTypesFromRange. That function assumes the operand layout as (classical...)(quantum..). It skips the leading classical operands (// Skip over classical types at the beginning). Then if any remaining operands is classical, it returns an empty range.
Can we please investigate this issue?
There was a problem hiding this comment.
Two quake ops violate the assumption by placing a classical operand after quantum ones are
- quake.exp_pauli: args are [parameters(float), controls(q), targets(q), pauli(cc.ptr/charspan)].
The trailing pauli operand forces the empty return. - quake.apply: [indirect_callee(cc.callable), controls(q), actuals(AnyType)]
Any classical actual after the quantum controls forces the empty return.
There was a problem hiding this comment.
Thanks for the feedback @sacpis!
Yes, I agree. As currently written, we would:
- incorrectly sink a measurement past an
exp_paulibecause of an incorrect generalization about where classical and quantum arguments appear in quake ops. - Same for
apply
Using the value-semantics approach will allow us to account for these issues.
I am working on a rewrite that will address this!
Adds a new
sink-measurementsthat operates onfuncOps and moves Quake measurement operations later within a block when doing so is semantically safe.The pass sinks measurements past operations that do not consume the measurement result and do not touch the measured quantum target.
It uses conservative quantum alias checks, only proving non-aliasing for cases such as distinct local
quake.allocavalues or different constant-indexquake.extract_refvalues from the same localveq.The barriers (ops we won't move measurement ops past) include:
Adds
cudaq/test/Transforms/sink_measurements.qkewith coverage for!quake.refs don't point to the same qubitveqsveqoperands