This example demonstrates using Mellea's Instruct-Validate-Repair (IVR) pattern to generate Qiskit quantum computing code that automatically passes flake8-qiskit-migration validation rules (QKT rules).
Takes a prompt containing deprecated Qiskit code and:
- Generates corrected code using the LLM
- Validates the output against QKT rules
- Automatically repairs the code if validation fails (up to 10 attempts)
# Run the example (uses default deprecated code prompt)
uv run docs/examples/instruct_validate_repair/qiskit_code_validation/qiskit_code_validation.pyDependencies (mellea, flake8-qiskit-migration) are automatically installed.
- Ollama backend running locally (
ollama serve) - Compatible model:
hf.co/Qiskit/mistral-small-3.2-24b-qiskit-GGUF:latest(recommended — domain-specialized; see Changing the Model) - flake8-qiskit-migration: Automatically installed when using
uv run
- Instruction: LLM generates code following structured requirements
- Post-condition validation: Validates generated code against QKT rules (see Qiskit Migration Guide)
- Repair loop: Automatically repairs code that fails validation (up to 10 attempts)
The example supports two repair strategies (see Sampling Strategies):
- RepairTemplateStrategy (default): Adds validation failure reasons directly to the instruction and retries generation
- MultiTurnStrategy: Builds conversation history by adding validation failures as new user messages
To switch strategies, edit the use_multiturn_strategy variable in test_qiskit_code_validation()
Note: MultiTurnStrategy requires ChatContext() while RepairTemplateStrategy works with SimpleContext(). The example automatically selects the appropriate context based on your strategy choice.
Benchmarks on mistral-small-3.2-24b-qiskit model:
| Dataset | Strategy | First Pass (QKT) | Post-Repair (QKT) |
|---|---|---|---|
| QHE | RepairTemplate | 97.4% | 100% |
| MultiTurn | 95.4% | 100% | |
| QKT | RepairTemplate | 88.9% | 100% |
| MultiTurn | 97.8% | 100% |
Datasets:
- QHE (QiskitHumanEval): 151 general Qiskit code generation tasks
- QKT: 45 Qiskit version migration tasks requiring fixes to deprecated APIs
Note: Pass rates measure whether generated code passes QKT validation rules, not whether the code correctly solves the prompt. On QHE, the model achieves ~27.8% correctness when running the QHE check() test suite against the generated code. Full benchmark data and analysis are available in @ajbozarth's toolbox repo.
qiskit_code_validation/
├── qiskit_code_validation.py # Main example
├── validation_helpers.py # Validation utilities
└── README.md # This file
validation_helpers.py provides:
extract_code_from_markdown(): Extracts code from markdown blocksvalidate_qiskit_migration(): Validates against QKT rulesvalidate_input_code(): Pre-validates input prompts
To try different prompts, edit the prompt variable in test_qiskit_code_validation() function. Here are some examples you can copy/paste:
Bell State Circuit:
prompt = "create a bell state circuit"List Backends:
prompt = "use qiskit to list fake backends"Random Circuit:
prompt = "give me a random qiskit circuit"Toffoli Gate:
prompt = """Complete this code:
```python
from qiskit import QuantumCircuit
qc = QuantumCircuit(3)
qc.toffoli(0, 1, 2)
# draw the circuit
```
"""Entanglement Circuit:
prompt = """from qiskit import QuantumCircuit
# create an entanglement state circuit
"""The default prompt demonstrates fixing deprecated Qiskit APIs:
prompt = """from qiskit import BasicAer, QuantumCircuit, execute
backend = BasicAer.get_backend('qasm_simulator')
qc = QuantumCircuit(5, 5)
qc.h(0)
qc.cnot(0, range(1, 5))
qc.measure_all()
# run circuit on the simulator"""This code uses deprecated APIs (BasicAer, execute) that the LLM will automatically fix to use modern Qiskit APIs.
Runtime Service with Estimator:
prompt = """from qiskit.circuit.random import random_circuit
from qiskit.quantum_info import SparsePauliOp
from qiskit_ibm_runtime import Estimator, Options, QiskitRuntimeService, Session
# create a Qiskit random circuit named "circuit" with 2 qubits, depth 2, seed 1.
# After that, generate an observable type SparsePauliOp("IY"). Run it in the backend "ibm_sherbrooke" using QiskitRuntimeService inside a session
# Instantiate the runtime Estimator primitive using the session and the options optimization level 3 and resilience level 2. Run the estimator
# Conclude the code printing the observable, expectation value and the metadata of the job."""Bell Circuit with Runtime Service:
prompt = """from qiskit import QuantumCircuit
from qiskit_ibm_runtime import QiskitRuntimeService
# define a Bell circuit and run it in ibm_salamanca using QiskitRuntimeService"""When you run the example with the default deprecated code prompt, you'll see:
====== Prompt ======
from qiskit import BasicAer, QuantumCircuit, execute
backend = BasicAer.get_backend('qasm_simulator')
qc = QuantumCircuit(5, 5)
qc.h(0)
qc.cnot(0, range(1, 5))
qc.measure_all()
# run circuit on the simulator
======================
Validation failed with 1 error(s):
QKT101: QuantumCircuit.cnot() has been removed in Qiskit 1.0; use `.cx()` instead
====== Result (23.1s, 2 attempt(s)) ======
```python
from qiskit_aer import AerSimulator, QuantumCircuit
backend = AerSimulator()
qc = QuantumCircuit(5, 5)
qc.h(0)
qc.cx(0, range(1, 5))
qc.measure_all()
```
======================
✓ Code passes Qiskit migration validation
Note: The exact output may vary depending on the model and its interpretation of the prompt.
To try a different model, edit the model_id variable in the test_qiskit_code_validation() function:
model_id = "hf.co/Qiskit/mistral-small-3.2-24b-qiskit-GGUF:latest"The default model is a Qiskit-specialized fine-tune of Mistral Small. It requires a large initial download (~15GB) but produces reliable results without a system prompt.
General-purpose models (e.g. granite4:micro-h) can be used as a lighter alternative but have significantly lower correctness on Qiskit tasks. When using a non-specialized model, set system_prompt = QISKIT_SYSTEM_PROMPT to improve results.
The grounding_context parameter accepts a dict[str, str] of additional context passed to the LLM alongside the prompt. Keys act as section labels and values are the content. This is useful for injecting relevant documentation snippets, RAG results, or API references at inference time.
Example — injecting migration guide excerpts:
grounding_context = {
"primitives_migration": (
"SamplerV2 replaces the legacy execute() function. "
"Use: sampler = SamplerV2(backend); job = sampler.run([circuit]); result = job.result()"
),
"transpilation": (
"Use generate_preset_pass_manager() instead of transpile(). "
"Example: pm = generate_preset_pass_manager(optimization_level=1, backend=backend); isa_circuit = pm.run(circuit)"
),
}
code, success, attempts = generate_validated_qiskit_code(
m, prompt, strategy, grounding_context=grounding_context
)Error: Connection refused
Solution: Start Ollama with ollama serve
Error: model 'hf.co/Qiskit/mistral-small-3.2-24b-qiskit-GGUF:latest' not found
Solution: Pull the model first:
ollama pull hf.co/Qiskit/mistral-small-3.2-24b-qiskit-GGUF:latestIf using a general-purpose model, it may not have enough Qiskit knowledge to pass validation consistently. Try:
- Switching to the Qiskit-specialized model (
hf.co/Qiskit/mistral-small-3.2-24b-qiskit-GGUF:latest) - Setting
system_prompt = QISKIT_SYSTEM_PROMPTto guide the model toward modern Qiskit APIs - Using simpler prompts
ModuleNotFoundError: No module named 'flake8_qiskit_migration'
Solution: Use uv run which auto-installs dependencies