Commit fd51c2b

docs: update qiskit_code_validation example defaults (#743)
* docs: update qiskit_code_validation example defaults

Switch default model to hf.co/Qiskit/mistral-small-3.2-24b-qiskit-GGUF and inline QISKIT_SYSTEM_PROMPT as a documented optional tuning aid for non-specialized models. Update README to match.

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* docs: update qiskit_code_validation README benchmark results

Replace outdated benchmark table with completed run data and add check() correctness finding (~32.5% on QHE).

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

* docs: add grounding_context usage example to qiskit_code_validation README

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

---------

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
1 parent 39bf127 commit fd51c2b

2 files changed

Lines changed: 132 additions & 74 deletions


docs/examples/instruct_validate_repair/qiskit_code_validation/README.md

Lines changed: 44 additions & 32 deletions
@@ -8,7 +8,7 @@ Takes a prompt containing deprecated Qiskit code and:
 1. Detects QKT violations in the input code
 2. Passes those violations to the LLM as context
 3. Generates corrected code that passes QKT validation
-4. Automatically repairs the code if validation fails (up to 5 attempts)
+4. Automatically repairs the code if validation fails (up to 10 attempts)
 
 ## Quick Start
 
@@ -22,7 +22,7 @@ Dependencies (`mellea`, `flake8-qiskit-migration`) are automatically installed.
 ## Requirements
 
 - **Ollama backend** running locally (`ollama serve`)
-- **Compatible model**: e.g., `hf.co/Qiskit/mistral-small-3.2-24b-qiskit-GGUF:latest` or `granite4:small-h`
+- **Compatible model**: `hf.co/Qiskit/mistral-small-3.2-24b-qiskit-GGUF:latest` (recommended — domain-specialized; see [Changing the Model](#changing-the-model))
 - **flake8-qiskit-migration**: Automatically installed when using `uv run`
 
 ## How It Works
@@ -32,7 +32,7 @@ Dependencies (`mellea`, `flake8-qiskit-migration`) are automatically installed.
 1. **Pre-condition validation**: Validates the input prompt and any code it contains
 2. **Instruction**: LLM generates code following structured requirements
 3. **Post-condition validation**: Validates generated code against QKT rules (see [Qiskit Migration Guide](https://docs.quantum.ibm.com/api/migration-guides))
-4. **Repair loop**: Automatically repairs code that fails validation (up to 5 attempts)
+4. **Repair loop**: Automatically repairs code that fails validation (up to 10 attempts)
 
 ### Sampling Strategies
 
@@ -47,20 +47,20 @@ To switch strategies, edit the `use_multiturn_strategy` variable in `test_qiskit
 
 #### Strategy Performance Comparison
 
-Benchmarks on `mistral-small-3.2-24b-qiskit` model (pass rates measure QKT validation only, not correctness):
+Benchmarks on `mistral-small-3.2-24b-qiskit` model, no system prompt:
 
-| Dataset | Strategy | First Pass | Post-Repair |
+| Dataset | Strategy | First Pass (QKT) | Post-Repair (QKT) |
 |---------|----------|------------|-------------|
-| **QHE** | RepairTemplate | 78.2% | **99.3%** |
-| | MultiTurn | 77.5% | 96.7% |
-| **QKT** | RepairTemplate | 54.1% | **83.8%** |
-| | MultiTurn | 37.8% | 70.3% |
+| **QHE** | RepairTemplate | 98.0% | **100%** |
+| | MultiTurn | **100%** | **100%** |
+| **QKT** | RepairTemplate | 98.0% | **100%** |
+| | MultiTurn | 93.3% | **100%** |
 
 **Datasets:**
-- **QHE** (QiskitHumanEval): Code generation tasks testing general Qiskit programming
-- **QKT**: Qiskit version migration tasks requiring fixes to deprecated APIs
+- **QHE** (QiskitHumanEval): 151 general Qiskit code generation tasks
+- **QKT**: 45 Qiskit version migration tasks requiring fixes to deprecated APIs
 
-**Note:** These benchmarks measure whether generated code passes QKT validation rules, not whether the code correctly solves the prompt. Both aspects are important for production use.
+**Note:** Pass rates measure whether generated code passes QKT validation rules, not whether the code correctly solves the prompt. On QHE, the model achieves ~32.5% correctness when running the QHE check() test suite against the generated code. Full benchmark data and analysis are available in @ajbozarth's [toolbox repo](https://github.com/ajbozarth/toolbox/tree/main/mellea/qiskit_code_validation/benchmarking).
 
 ### Code Structure
 
@@ -183,22 +183,17 @@ qc.measure_all()
 Validation failed with 1 error(s):
 QKT101: QuantumCircuit.cnot() has been removed in Qiskit 1.0; use `.cx()` instead
 
-====== Result (83.5s) ======
+====== Result (23.1s, 2 attempt(s)) ======
 ```python
-from qiskit_aer import AerSimulator
-from qiskit import QuantumCircuit
+from qiskit_aer import AerSimulator, QuantumCircuit
 
 backend = AerSimulator()
 
 qc = QuantumCircuit(5, 5)
 qc.h(0)
-qc.cx(0, range(1, 5)) # Fixed: use .cx() instead of .cnot()
+qc.cx(0, range(1, 5))
 qc.measure_all()
-
-job = backend.run(qc)
-result = job.result()
 ```
-I fixed the code by replacing `QuantumCircuit.cnot()` with `QuantumCircuit.cx()` as required by Qiskit 1.0. I also replaced the deprecated `BasicAer.get_backend('qasm_simulator')` with `AerSimulator()`. This code should now pass Qiskit migration validation (QKT rules).
 ======================
 
 ✓ Code passes Qiskit migration validation
@@ -211,13 +206,35 @@ I fixed the code by replacing `QuantumCircuit.cnot()` with `QuantumCircuit.cx()`
 To try a different model, edit the `model_id` variable in the `test_qiskit_code_validation()` function:
 
 ```python
-# Uncomment one to try different models
-# model_id = "granite4:micro-h"
-# model_id = "granite4:small-h"
 model_id = "hf.co/Qiskit/mistral-small-3.2-24b-qiskit-GGUF:latest"
 ```
 
-**Note**: Smaller models (like `granite4:micro-h`) may not have enough Qiskit knowledge to pass validation consistently. The Qiskit-specific model or `granite4:small-h` work best.
+The default model is a Qiskit-specialized fine-tune of Mistral Small. It requires a large initial download (~15GB) but produces reliable results without a system prompt.
+
+General-purpose models (e.g. `granite4:micro-h`) can be used as a lighter alternative but have significantly lower correctness on Qiskit tasks. When using a non-specialized model, set `system_prompt = QISKIT_SYSTEM_PROMPT` to improve results.
+
+## Using Grounding Context
+
+The `grounding_context` parameter accepts a `dict[str, str]` of additional context passed to the LLM alongside the prompt. Keys act as section labels and values are the content. This is useful for injecting relevant documentation snippets, RAG results, or API references at inference time.
+
+**Example — injecting migration guide excerpts:**
+
+```python
+grounding_context = {
+    "primitives_migration": (
+        "SamplerV2 replaces the legacy execute() function. "
+        "Use: sampler = SamplerV2(backend); job = sampler.run([circuit]); result = job.result()"
+    ),
+    "transpilation": (
+        "Use generate_preset_pass_manager() instead of transpile(). "
+        "Example: pm = generate_preset_pass_manager(optimization_level=1, backend=backend); isa_circuit = pm.run(circuit)"
+    ),
+}
+
+code, success, attempts = generate_validated_qiskit_code(
+    m, prompt, strategy, grounding_context=grounding_context
+)
+```
 
 ## Troubleshooting
 
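The grounding context section suggests filling `grounding_context` with documentation snippets or RAG results at inference time. As a sketch under stated assumptions, the hypothetical `build_grounding_context` below picks entries from a small local corpus by naive substring matching. The note text is reused from this commit's README example and system prompt, while `MIGRATION_NOTES`, `TRIGGERS` and the matching rule are illustrative stand-ins for a real retriever, not part of mellea or of the example.

```python
# Hypothetical local corpus of migration notes, keyed by the section label
# that would become a grounding_context key. Note text is taken from this commit.
MIGRATION_NOTES: dict[str, str] = {
    "primitives_migration": (
        "SamplerV2 replaces the legacy execute() function. "
        "Use: sampler = SamplerV2(backend); job = sampler.run([circuit]); result = job.result()"
    ),
    "transpilation": (
        "Use generate_preset_pass_manager() instead of transpile(). "
        "Example: pm = generate_preset_pass_manager(optimization_level=1, backend=backend); "
        "isa_circuit = pm.run(circuit)"
    ),
    "simulators": (
        "Import as `from qiskit_aer import AerSimulator`, "
        "not `from qiskit.providers.aer import AerSimulator`"
    ),
}

# Hypothetical trigger substrings: naive keyword matching in place of real retrieval.
TRIGGERS: dict[str, tuple[str, ...]] = {
    "primitives_migration": ("execute", "BasicAer"),
    "transpilation": ("transpile(",),
    "simulators": ("BasicAer", "qiskit.providers.aer"),
}


def build_grounding_context(prompt: str) -> dict[str, str]:
    """Return only the notes whose trigger substrings occur in the prompt."""
    return {
        label: MIGRATION_NOTES[label]
        for label, triggers in TRIGGERS.items()
        if any(t in prompt for t in triggers)
    }


prompt = "from qiskit import BasicAer, QuantumCircuit, execute\n"
print(sorted(build_grounding_context(prompt)))
# → ['primitives_migration', 'simulators']
```

The returned dict could then be passed as `grounding_context=` to `generate_validated_qiskit_code()`. Since that function only forwards `grounding_context` when it is truthy, a prompt that matches no trigger simply runs without extra context.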
@@ -237,9 +254,9 @@ ollama pull hf.co/Qiskit/mistral-small-3.2-24b-qiskit-GGUF:latest
 ```
 
 ### Validation Always Fails
-If using smaller models (e.g., `granite4:micro-h`), they may not have enough Qiskit knowledge. Try:
-- Using a larger model (`granite4:small-h` or the Qiskit-specific model)
-- Reducing prompt complexity
+If using a general-purpose model, it may not have enough Qiskit knowledge to pass validation consistently. Try:
+- Switching to the Qiskit-specialized model (`hf.co/Qiskit/mistral-small-3.2-24b-qiskit-GGUF:latest`)
+- Setting `system_prompt = QISKIT_SYSTEM_PROMPT` to guide the model toward modern Qiskit APIs
 - Using simpler prompts
 
 ### Import Error: flake8-qiskit-migration
@@ -248,8 +265,3 @@ ModuleNotFoundError: No module named 'flake8_qiskit_migration'
 ```
 **Solution**: Use `uv run` which auto-installs dependencies
 
-## Future Work
-
-The following enhancements are planned for future iterations:
-
-1. **Enable Smaller Models** - Add system prompt or grounding context with Qiskit API documentation to help smaller models perform accurate migrations. This would allow removing the `pytest.mark.skip` marker and make the example run in standard test suites.

docs/examples/instruct_validate_repair/qiskit_code_validation/qiskit_code_validation.py

Lines changed: 88 additions & 42 deletions
@@ -15,7 +15,7 @@
 1. **Pre-condition validation**: Validate prompt content and any input code
 2. **Instruction**: LLM generates code following structured requirements
 3. **Post-condition validation**: Validate generated code against QKT rules
-4. **Repair loop**: Automatically repair code that fails validation (up to 5 attempts)
+4. **Repair loop**: Automatically repair code that fails validation (up to 10 attempts)
 
 Requirements:
 - flake8-qiskit-migration: Installed automatically when run via `uv run`
@@ -30,18 +30,51 @@
 
 from validation_helpers import validate_input_code, validate_qiskit_migration
 
-import mellea
+from mellea import MelleaSession, start_session
 from mellea.backends import ModelOption
 from mellea.stdlib.context import ChatContext, SimpleContext
 from mellea.stdlib.requirements import req, simple_validate
 from mellea.stdlib.sampling import MultiTurnStrategy, RepairTemplateStrategy
 
+# Optional system prompt for models not specialized for Qiskit.
+# Set system_prompt = QISKIT_SYSTEM_PROMPT in test_qiskit_code_validation() to enable.
+QISKIT_SYSTEM_PROMPT = """\
+You are the Qiskit code assistant, a Qiskit coding expert developed by IBM Quantum. \
+Your mission is to help users write good Qiskit code and advise them on best practices \
+for quantum computing using Qiskit and IBM Quantum and its hardware and services. \
+You stick to the user request, without adding non-requested information or yapping.
+
+When doing code generation, you always generate Python and Qiskit code. If the input \
+you received only contains code, your task is to complete the code without adding extra \
+explanations or text.
+
+The current version of `qiskit` is `2.1`. Ensure your code is valid Python and Qiskit. \
+The official documentation is available at https://quantum.cloud.ibm.com/docs/en. \
+Avoid `https://qiskit.org` links as they are not active.
+
+Code standards — never use deprecated methods:
+- Transpilation: use `generate_preset_pass_manager()` instead of `transpile()`
+- Execution: use `SamplerV2` or `EstimatorV2` primitives instead of `execute()`
+- Provider: `qiskit-ibmq-provider` / `IBMQ` was deprecated in 2023; use `qiskit-ibm-runtime` instead
+- Simulator: import as `from qiskit_aer import AerSimulator`, not `from qiskit.providers.aer import AerSimulator`
+- Random circuits: import as `from qiskit.circuit.random import random_circuit`
+
+When no backend is specified, default to `ibm_fez`, `ibm_marrakesh`, `ibm_pittsburg`, or `ibm_kingston`. \
+Avoid simulators unless explicitly requested.
+
+The four steps of a Qiskit pattern: (1) Map problem to quantum circuits and operators. \
+(2) Optimize for target hardware. (3) Execute on target hardware. (4) Post-process results.
+"""
+
 
 def generate_validated_qiskit_code(
-    m: mellea.MelleaSession,
+    m: MelleaSession,
     prompt: str,
     strategy: MultiTurnStrategy | RepairTemplateStrategy,
-) -> str:
+    *,
+    system_prompt: str | None = None,
+    grounding_context: dict[str, str] | None = None,
+) -> tuple[str, bool, int]:
     """Generate Qiskit code that passes Qiskit migration validation.
 
     This function implements the Instruct-Validate-Repair pattern:
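`QISKIT_SYSTEM_PROMPT` is an ordinary (non-raw) triple-quoted string, so a backslash at the end of a source line is an escaped newline: `"""\` keeps the string from starting with a newline, and each trailing ` \` joins a wrapped line to the next, while blank lines still separate paragraphs. The space kept before each backslash is what stops words from running together. A minimal check of that behavior, using a shortened stand-in string rather than the real prompt:

```python
# A trailing backslash inside a non-raw string literal removes the newline,
# so wrapped source lines are joined; a blank line still yields "\n\n".
PROMPT = """\
You are a coding assistant. \
Stay on topic.

Never use deprecated methods.
"""

# The backslash right after the opening quotes means no leading "\n".
assert not PROMPT.startswith("\n")
print(repr(PROMPT))
# → 'You are a coding assistant. Stay on topic.\n\nNever use deprecated methods.\n'
```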
@@ -54,34 +87,34 @@ def generate_validated_qiskit_code(
         m: Mellea session
         prompt: User prompt for code generation
         strategy: Sampling strategy for handling validation failures
+        system_prompt: Optional system prompt passed via ModelOption.SYSTEM_PROMPT
+        grounding_context: Optional grounding context dict passed to m.instruct()
 
     Returns:
-        Generated code that passes validation
-
-    Raises:
-        ValueError: If prompt validation fails
+        Tuple of (generated_code, success, attempts_used)
     """
     # Pre-validate input code if present — include violations as context rather than failing
     is_valid, error_msg = validate_input_code(prompt)
-    input_code_errors = None
     if not is_valid:
         print(
             f"Input code has QKT violations, including as context for LLM: {error_msg}"
         )
-        input_code_errors = error_msg
-
-    # Build the instruction prompt, optionally augmented with input code violations
-    instruct_prompt = prompt
-    if input_code_errors is not None:
-        instruct_prompt = (
+        prompt = (
             f"{prompt}\n\n"
             f"Note: the code above has the following Qiskit migration issues that must be fixed:\n"
-            f"{input_code_errors}"
+            f"{error_msg}"
         )
 
+    # Only pass optional kwargs if they have values — avoids passing None to m.instruct()
+    extra: dict = {}
+    if grounding_context:
+        extra["grounding_context"] = grounding_context
+    if system_prompt:
+        extra["model_options"] = {ModelOption.SYSTEM_PROMPT: system_prompt}
+
     # Generate code with output validation only
     code_candidate = m.instruct(
-        instruct_prompt,
+        prompt,
         requirements=[
             req(
                 "Code must pass Qiskit migration validation (QKT rules)",
@@ -90,10 +123,17 @@ def generate_validated_qiskit_code(
         ],
         strategy=strategy,
         return_sampling_results=True,
+        **extra,
+    )
+
+    attempts = (
+        len(code_candidate.sample_generations)
+        if code_candidate.sample_generations
+        else 1
     )
 
     if code_candidate.success:
-        return str(code_candidate.result)
+        return str(code_candidate.result), True, attempts
     else:
         print("Code generation did not fully succeed, returning best attempt")
         # Log detailed validation failure reasons
@@ -105,9 +145,13 @@ def generate_validated_qiskit_code(
             )
         # Return best attempt even if validation failed
         if code_candidate.sample_generations:
-            return str(code_candidate.sample_generations[0].value or "")
+            return (
+                str(code_candidate.sample_generations[-1].value or ""),
+                False,
+                attempts,
+            )
         print("No code generations available")
-        return ""
+        return "", False, attempts
 
 
 def test_qiskit_code_validation() -> None:
@@ -117,16 +161,14 @@ def test_qiskit_code_validation() -> None:
     that uses old APIs (BasicAer, execute) and having the LLM fix it to use
     modern Qiskit APIs that pass QKT validation rules.
     """
-    # Strategy selection - True for MultiTurnStrategy, False for RepairTemplateStrategy
-    # MultiTurnStrategy: Adds validation failure reasons as a new user message in the conversation
-    # RepairTemplateStrategy: Adds validation failure reasons to the instruction and retries
-    use_multiturn_strategy = False
-
-    # Model selection - uncomment one to try different models
-    # model_id = "granite4:micro-h"
-    # model_id = "granite4:small-h"
+    # Model — requires Ollama with the model pulled locally
+    # See README.md for model options and tradeoffs
     model_id = "hf.co/Qiskit/mistral-small-3.2-24b-qiskit-GGUF:latest"
 
+    # System prompt — None uses the model's built-in Qiskit knowledge (default)
+    # Set to QISKIT_SYSTEM_PROMPT when using a model not specialized for Qiskit
+    system_prompt = None
+
     # Prompt - replace with your own or see README.md for examples
     prompt = """from qiskit import BasicAer, QuantumCircuit, execute
 
@@ -144,37 +186,41 @@ def test_qiskit_code_validation() -> None:
     print(prompt)
     print("======================\n")
 
+    # Strategy selection - True for MultiTurnStrategy, False for RepairTemplateStrategy
+    # MultiTurnStrategy: Adds validation failure reasons as a new user message in the conversation
+    # RepairTemplateStrategy: Adds validation failure reasons to the instruction and retries
+    use_multiturn_strategy = False
+
     # Initialize the required context
     ctx = ChatContext() if use_multiturn_strategy else SimpleContext()
+    if use_multiturn_strategy:
+        strategy: MultiTurnStrategy | RepairTemplateStrategy = MultiTurnStrategy(
+            loop_budget=10
+        )
+    else:
+        strategy = RepairTemplateStrategy(loop_budget=10)
 
-    with mellea.start_session(
+    with start_session(
         model_id=model_id,
         backend_name="ollama",
         ctx=ctx,
         model_options={ModelOption.TEMPERATURE: 0.8, ModelOption.MAX_NEW_TOKENS: 2048},
     ) as m:
         start_time = time.time()
 
-        if use_multiturn_strategy:
-            strategy: MultiTurnStrategy | RepairTemplateStrategy = MultiTurnStrategy(
-                loop_budget=5
-            )
-        else:
-            strategy = RepairTemplateStrategy(loop_budget=5)
-
-        code = generate_validated_qiskit_code(m, prompt, strategy)
+        code, success, attempts = generate_validated_qiskit_code(
+            m, prompt, strategy, system_prompt=system_prompt
+        )
         elapsed = time.time() - start_time
 
-        print(f"\n====== Result ({elapsed:.1f}s) ======")
+        print(f"\n====== Result ({elapsed:.1f}s, {attempts} attempt(s)) ======")
         print(code)
         print("======================\n")
 
-        # Validate the generated code
-        is_valid, error_msg = validate_qiskit_migration(code)
-
-        if is_valid:
+        if success:
             print("✓ Code passes Qiskit migration validation")
         else:
+            _, error_msg = validate_qiskit_migration(code)
             print("✗ Validation errors:")
             print(error_msg)
 