| 1 | +# CUDA-QX QEC Skill Benchmark |
| 2 | + |
| 3 | +Evaluation prompts for the `cuda-qx-qec` skill. The benchmark uses the same |
| 4 | +methodology as the solvers benchmark and can be run via `scripts/score_benchmark.py`. |
| 5 | + |
| 6 | +## Methodology |
| 7 | + |
| 8 | +Run three passes: |
| 9 | + |
| 10 | +1. With skill enabled |
| 11 | +2. Without skill (control) |
| 12 | +3. Activation pass (see below) |
| 13 | + |
| 14 | +Two scoring layers: |
| 15 | + |
| 16 | +**Human rubric** (per scenario, 0–8): |
| 17 | + |
| 18 | +- Correctness (0–2): facts true, paths/APIs real |
| 19 | +- Specificity (0–2): cites files, exact API names, exact kwargs |
| 20 | +- Coverage (0–2): hits each "must include" item |
| 21 | +- No hallucinations (0–2): no "must not include" items |
| 22 | + |
| 22 | +Maximum: 12 scenarios × 8 points + 10 activation points = 106. |
| 24 | + |
| 25 | +**Substring proxy** (`scripts/score_benchmark.py`): |
| 26 | + |
| 27 | +- Coverage max for QEC = 45 (sum of `must_include` items) |
| 27 | +- Purity max for QEC = 13 (one per `must_not_include` item; scenario 11 has two) |
| 29 | +- Activation = 10 |
| 29 | +- Substring total max for QEC = 45 + 13 + 10 = 68. Always pair with a human pass. |
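
The substring layer can be pictured with a minimal sketch. This is not the contents of `scripts/score_benchmark.py`: the helper name `score_response`, the case-insensitive matching, and the sample response text are all assumptions made for illustration.

```python
def score_response(response, must_include, must_not_include):
    """Return (coverage, purity) for one scenario using plain substring checks.

    Hypothetical helper: case-insensitive matching is an assumption,
    not confirmed behavior of scripts/score_benchmark.py.
    """
    text = response.lower()
    # Coverage: one point per required substring that appears.
    coverage = sum(1 for item in must_include if item.lower() in text)
    # Purity: one point per forbidden substring that is absent.
    purity = sum(1 for item in must_not_include if item.lower() not in text)
    return coverage, purity


# Scenario 3 items, copied from the scenario list in this file.
must_include = ['backend="gpu"', "RuntimeError", "auto", "cpu"]
must_not_include = ["silently falls back when gpu requested"]

# Made-up response text, used only to exercise the helper.
sample = 'Pass backend="gpu"; a RuntimeError is raised if no GPU is found.'
coverage, purity = score_response(sample, must_include, must_not_include)
print(coverage, purity)  # 2 1
```

A check like this rewards exact strings and cannot tell a correct paraphrase from a miss, which is one reason the human rubric stays in the loop.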
| 31 | + |
| 32 | +--- |
| 33 | + |
| 34 | +## Scenario Prompts |
| 35 | + |
| 36 | +### 1. F-order Parity Matrix |
| 37 | + |
| 38 | +**prompt:** "My `qec.get_decoder('pymatching', H)` raises a runtime error. |
| 39 | +H is `dtype=uint8` but ordered as Fortran. Help." |
| 40 | + |
| 41 | +- must_include: "C-order", "C-contiguous" |
| 42 | +- must_not_include: "F-order is supported" |
| 43 | + |
| 44 | +### 2. cuStabilizer Missing |
| 45 | + |
| 46 | +**prompt:** "Importing `cudaq_qec` fails with `libcustabilizer` not found." |
| 47 | + |
| 48 | +- must_include: `cuquantum-python-cu12`, `cuquantum-python-cu13`, |
| 49 | + `26.03.0` |
| 50 | +- must_not_include: "uninstall cudaq_qec" |
| 51 | + |
| 52 | +### 3. dem_sampling Backend Choice |
| 53 | + |
| 54 | +**prompt:** "How do I force GPU sampling in `dem_sampling`, and what |
| 55 | +happens when GPU isn't available?" |
| 56 | + |
| 57 | +- must_include: `backend="gpu"`, `RuntimeError`, `auto`, `cpu` |
| 58 | +- must_not_include: "silently falls back when gpu requested" |
| 59 | + |
| 60 | +### 4. PyTorch CPU Tensor |
| 61 | + |
| 62 | +**prompt:** "Can I pass a PyTorch CPU tensor to `dem_sampling`?" |
| 63 | + |
| 64 | +- must_include: "no", "convert to NumPy" |
| 65 | +- must_not_include: "yes, supported" |
| 66 | + |
| 67 | +### 5. Steane Memory Circuit Shape |
| 68 | + |
| 69 | +**prompt:** "What is the shape of `syndromes` returned by |
| 70 | +`sample_memory_circuit(steane, prep0, numShots=10, numRounds=4)`?" |
| 71 | + |
| 72 | +- must_include: `(40, 6)`, "numShots * numRounds" |
| 73 | +- must_not_include: `(10, 4, 6)` |
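
The expected shape in this scenario comes from stacking every round of every shot along a single row axis. The arithmetic can be sketched as follows, assuming the column count is the Steane code's six stabilizer generators; the variable names are illustrative and no `cudaq_qec` call is made.

```python
num_shots = 10
num_rounds = 4
num_stabilizers = 6  # the Steane code has six stabilizer generators

# Rows are flattened across shots and rounds: numShots * numRounds.
flat_shape = (num_shots * num_rounds, num_stabilizers)
# The shape the rubric rejects keeps shots and rounds as separate axes.
nested_shape = (num_shots, num_rounds, num_stabilizers)

print(flat_shape)    # (40, 6)
print(nested_shape)  # (10, 4, 6)
```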
| 74 | + |
| 75 | +### 6. Custom Python Code Requirements |
| 76 | + |
| 77 | +**prompt:** "What attributes must my `@qec.code` class expose?" |
| 78 | + |
| 79 | +- must_include: `stabilizers`, `pauli_observables`, `operation_encodings` |
| 80 | +- must_include: `get_num_data_qubits`, `get_num_ancilla_qubits` |
| 81 | +- must_not_include: "only stabilizers are required" |
| 82 | + |
| 83 | +### 7. Sliding Window Decoder Setup |
| 84 | + |
| 85 | +**prompt:** "How do I wrap pymatching in a sliding window decoder over a |
| 86 | +distance-5 surface code?" |
| 87 | + |
| 88 | +- must_include: `sliding_window`, `window_size`, `step_size`, |
| 89 | + `num_syndromes_per_round`, `inner_decoder_name` |
| 90 | +- must_include: `error_rate_vec` |
| 91 | +- must_not_include: "sliding_window does not require an inner decoder" |
| 92 | + |
| 93 | +### 8. NV-QLDPC Config Fields |
| 94 | + |
| 95 | +**prompt:** "What does `nv_qldpc_decoder_config` accept?" |
| 96 | + |
| 97 | +- must_include: `use_osd`, `bp_method`, `proc_float`, `error_rate_vec`, |
| 98 | + `max_iterations` |
| 99 | +- must_include: "default to None" |
| 100 | +- must_not_include: "all fields are required" |
| 101 | + |
| 102 | +### 9. Tensor Network Decoder Install |
| 103 | + |
| 104 | +**prompt:** "I get `ModuleNotFoundError: quimb` when loading the tensor |
| 105 | +network decoder. Fix?" |
| 106 | + |
| 107 | +- must_include: `cudaq-qec[tensor_network_decoder]` (or `[all]`), |
| 108 | + `quimb`, `cuquantum-python` |
| 109 | +- must_not_include: "quimb is included by default" |
| 110 | + |
| 111 | +### 10. License Surprise |
| 112 | + |
| 113 | +**prompt:** "Is `cudaq-qec` Apache 2.0 like `cudaq-solvers`?" |
| 114 | + |
| 115 | +- must_include: "no", `LicenseRef-NVIDIA-Proprietary`, |
| 116 | + `libs/qec/pyproject.toml` |
| 117 | +- must_not_include: "yes, both are Apache 2.0" |
| 118 | + |
| 119 | +### 11. Operation Enum |
| 120 | + |
| 121 | +**prompt:** "What logical operations does `qec.operation` expose?" |
| 122 | + |
| 123 | +- must_include: `prep0`, `prep1`, `prepp`, `prepm`, `stabilizer_round` |
| 124 | +- must_include: `cx`, `cz`, `h`, `s` |
| 125 | +- must_not_include: "swap", "measure_x" (not in the enum) |
| 126 | + |
| 127 | +### 12. DEM Variants |
| 128 | + |
| 129 | +**prompt:** "What's the difference between `dem_from_memory_circuit`, |
| 130 | +`x_dem_from_memory_circuit`, and `z_dem_from_memory_circuit`?" |
| 131 | + |
| 132 | +- must_include: "X errors only", "Z errors only", "full" |
| 133 | +- must_include: `DetectorErrorModel`, `detector_error_matrix` |
| 134 | +- must_not_include: "they are aliases" |
| 135 | + |
| 136 | +--- |
| 137 | + |
| 138 | +## Activation Tests |
| 139 | + |
| 140 | +| # | prompt | should_activate | |
| 141 | +| --- | --- | --- | |
| 142 | +| A1 | "Build a Steane memory experiment" | Y | |
| 143 | +| A2 | "Wrap pymatching in a sliding window decoder" | Y | |
| 144 | +| A3 | "Run dem_sampling on a parity check matrix" | Y | |
| 145 | +| A4 | "Configure NV-QLDPC for a custom code" | Y | |
| 146 | +| A5 | "Install CUDA-Q from pip" | N | |
| 147 | +| A6 | "Write a VQE for H2" | N | |
| 148 | +| A7 | "Run a Bell state kernel" | N | |
| 149 | +| A8 | "Generate a tensor-network decoder" | Y | |
| 150 | +| A9 | "Use ADAPT-VQE on a molecule" | N | |
| 151 | +| A10 | "Construct a detector error model from memory rounds" | Y | |
| 152 | + |
| 153 | +## Sources |
| 154 | + |
| 155 | +- `libs/qec/python/cudaq_qec/__init__.py` |
| 156 | +- `libs/qec/python/bindings/py_decoder.cpp`, `py_code.cpp`, |
| 157 | + `py_dem_sampling.cpp` |
| 158 | +- `libs/qec/python/cudaq_qec/dem_sampling.py` |
| 159 | +- `libs/qec/python/cudaq_qec/plugins/decoders/tensor_network_decoder.py` |
| 160 | +- `libs/qec/python/tests/test_*.py` |
| 161 | +- `libs/qec/pyproject.toml.cu12` |
| 162 | +- `docs/sphinx/components/qec/introduction.rst` |