docs: update qiskit_code_validation example defaults (#743)
* docs: update qiskit_code_validation example defaults
Switch default model to hf.co/Qiskit/mistral-small-3.2-24b-qiskit-GGUF
and inline QISKIT_SYSTEM_PROMPT as a documented optional tuning aid for
non-specialized models. Update README to match.
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
* docs: update qiskit_code_validation README benchmark results
Replace outdated benchmark table with completed run data and add
check() correctness finding (~32.5% on QHE).
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
* docs: add grounding_context usage example to qiskit_code_validation README
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
---------
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
+- **Compatible model**: `hf.co/Qiskit/mistral-small-3.2-24b-qiskit-GGUF:latest` (recommended; domain-specialized; see [Changing the Model](#changing-the-model))
 - **flake8-qiskit-migration**: Automatically installed when using `uv run`
 
 ## How It Works
@@ -32,7 +32,7 @@ Dependencies (`mellea`, `flake8-qiskit-migration`) are automatically installed.
 1. **Pre-condition validation**: Validates the input prompt and any code it contains
 2. **Instruction**: LLM generates code following structured requirements
 3. **Post-condition validation**: Validates generated code against QKT rules (see [Qiskit Migration Guide](https://docs.quantum.ibm.com/api/migration-guides))
-4. **Repair loop**: Automatically repairs code that fails validation (up to 5 attempts)
+4. **Repair loop**: Automatically repairs code that fails validation (up to 10 attempts)
 
 ### Sampling Strategies
 
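The generate, validate, and repair steps listed in the hunk above can be sketched roughly as follows. This is a minimal illustration, not the example's actual implementation: `validate`, `generate_with_repair`, and `fake_llm` are hypothetical names, the regex check is a stand-in for the real `flake8-qiskit-migration` rules, and the pre-condition step is omitted.

```python
import re
from typing import Callable

# Hypothetical stand-in for the QKT rules; the real example relies on
# flake8-qiskit-migration rather than a regex table.
_REMOVED_CALLS = {
    r"\.cnot\(": "QKT101: QuantumCircuit.cnot() has been removed in Qiskit 1.0; use `.cx()` instead",
}


def validate(code: str) -> list[str]:
    """Return one message per removed-API pattern found in the code."""
    return [msg for pattern, msg in _REMOVED_CALLS.items() if re.search(pattern, code)]


def generate_with_repair(
    prompt: str, generate: Callable[[str], str], max_attempts: int = 10
) -> tuple[str, int]:
    """Generate code, validate it, and re-prompt with the errors until it passes."""
    current_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        code = generate(current_prompt)
        errors = validate(code)
        if not errors:
            return code, attempt
        # Feed the validation errors back so the next attempt can repair them.
        current_prompt = f"{prompt}\n\nPrevious attempt failed validation:\n" + "\n".join(errors)
    raise RuntimeError(f"validation still failing after {max_attempts} attempts")


def fake_llm(prompt: str) -> str:
    """Toy generator: emits removed-API code first, fixed code once it sees the error."""
    return "qc.cx(0, 1)" if "QKT101" in prompt else "qc.cnot(0, 1)"


code, attempts = generate_with_repair("Entangle qubits 0 and 1", fake_llm)
print(code, attempts)
```

The attempt cap mirrors the "up to 10 attempts" wording in the updated README line; everything else is an assumption made for illustration.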
@@ -47,20 +47,20 @@ To switch strategies, edit the `use_multiturn_strategy` variable in `test_qiskit
 
 #### Strategy Performance Comparison
 
-Benchmarks on `mistral-small-3.2-24b-qiskit` model (pass rates measure QKT validation only, not correctness):
+Benchmarks on `mistral-small-3.2-24b-qiskit` model, no system prompt:
-- **QHE** (QiskitHumanEval): Code generation tasks testing general Qiskit programming
-- **QKT**: Qiskit version migration tasks requiring fixes to deprecated APIs
+- **QHE** (QiskitHumanEval): 151 general Qiskit code generation tasks
+- **QKT**: 45 Qiskit version migration tasks requiring fixes to deprecated APIs
 
-**Note:** These benchmarks measure whether generated code passes QKT validation rules, not whether the code correctly solves the prompt. Both aspects are important for production use.
+**Note:** Pass rates measure whether generated code passes QKT validation rules, not whether the code correctly solves the prompt. On QHE, the model achieves ~32.5% correctness when running the QHE check() test suite against the generated code. Full benchmark data and analysis are available in @ajbozarth's [toolbox repo](https://github.com/ajbozarth/toolbox/tree/main/mellea/qiskit_code_validation/benchmarking).
 
 ### Code Structure
 
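The note in the hunk above separates two metrics: the share of outputs that pass QKT validation and the share that pass the task's `check()` tests. A small sketch of how the two could be computed from per-task records follows. The `TaskResult` type, the `rate` helper, and the four records are hypothetical and are not benchmark data.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    task_id: str
    passes_validation: bool  # generated code triggers no QKT rule
    passes_check: bool       # generated code passes the task's check() tests


def rate(results: list[TaskResult], attr: str) -> float:
    """Fraction of results for which the given boolean attribute is True."""
    return sum(getattr(r, attr) for r in results) / len(results)


# Hypothetical records, for illustration only.
results = [
    TaskResult("task_a", True, True),
    TaskResult("task_b", True, False),
    TaskResult("task_c", True, False),
    TaskResult("task_d", False, False),
]

validation_rate = rate(results, "passes_validation")
correctness_rate = rate(results, "passes_check")
print(f"validation pass rate: {validation_rate:.1%}")
print(f"check() correctness:  {correctness_rate:.1%}")
```

In this toy data, code can pass validation while still failing its tests, which is the gap the note describes.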
@@ -183,22 +183,17 @@ qc.measure_all()
 Validation failed with 1 error(s):
 QKT101: QuantumCircuit.cnot() has been removed in Qiskit 1.0; use `.cx()` instead
 
-====== Result (83.5s) ======
+====== Result (23.1s, 2 attempt(s)) ======
 ```python
-from qiskit_aer import AerSimulator
-from qiskit import QuantumCircuit
+from qiskit_aer import AerSimulator, QuantumCircuit
 
 backend = AerSimulator()
 
 qc = QuantumCircuit(5, 5)
 qc.h(0)
-qc.cx(0, range(1, 5)) # Fixed: use .cx() instead of .cnot()
+qc.cx(0, range(1, 5))
 qc.measure_all()
-
-job = backend.run(qc)
-result = job.result()
 ```
-I fixed the code by replacing `QuantumCircuit.cnot()` with `QuantumCircuit.cx()` as required by Qiskit 1.0. I also replaced the deprecated `BasicAer.get_backend('qasm_simulator')` with `AerSimulator()`. This code should now pass Qiskit migration validation (QKT rules).
 ======================
 
 ✓ Code passes Qiskit migration validation
@@ -211,13 +206,35 @@ I fixed the code by replacing `QuantumCircuit.cnot()` with `QuantumCircuit.cx()`
 To try a different model, edit the `model_id` variable in the `test_qiskit_code_validation()` function:
-**Note**: Smaller models (like `granite4:micro-h`) may not have enough Qiskit knowledge to pass validation consistently. The Qiskit-specific model or `granite4:small-h` work best.
+The default model is a Qiskit-specialized fine-tune of Mistral Small. It requires a large initial download (~15GB) but produces reliable results without a system prompt.
+
+General-purpose models (e.g. `granite4:micro-h`) can be used as a lighter alternative but have significantly lower correctness on Qiskit tasks. When using a non-specialized model, set `system_prompt = QISKIT_SYSTEM_PROMPT` to improve results.
+
+## Using Grounding Context
+
+The `grounding_context` parameter accepts a `dict[str, str]` of additional context passed to the LLM alongside the prompt. Keys act as section labels and values are the content. This is useful for injecting relevant documentation snippets, RAG results, or API references at inference time.
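To make the "keys act as section labels" idea concrete, here is a rough sketch of how such a dict might be flattened into text placed ahead of a prompt. The `render_grounding_context` helper and the bracketed-label layout are assumptions for illustration only; the README does not say how `mellea` actually formats grounding context.

```python
def render_grounding_context(grounding_context: dict[str, str]) -> str:
    """Flatten labeled sections into one text block (hypothetical layout)."""
    sections = [f"[{label}]\n{content}" for label, content in grounding_context.items()]
    return "\n\n".join(sections)


# Hypothetical section, used only to show the shape of the output.
context = {"api_notes": "Use .cx() instead of the removed .cnot()."}
prompt = "Fix the deprecated calls in this circuit."

full_prompt = f"{render_grounding_context(context)}\n\n{prompt}"
print(full_prompt)
```

Dict insertion order is preserved in Python, so sections would appear in the order they were added.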
+
+**Example (injecting migration guide excerpts):**
+
+```python
+grounding_context = {
+    "primitives_migration": (
+        "SamplerV2 replaces the legacy execute() function. "
+        "Use: sampler = SamplerV2(backend); job = sampler.run([circuit]); result = job.result()"
+    ),
+    "transpilation": (
+        "Use generate_preset_pass_manager() instead of transpile(). "
-If using smaller models (e.g., `granite4:micro-h`), they may not have enough Qiskit knowledge. Try:
-- Using a larger model (`granite4:small-h` or the Qiskit-specific model)
-- Reducing prompt complexity
+If using a general-purpose model, it may not have enough Qiskit knowledge to pass validation consistently. Try:
+- Switching to the Qiskit-specialized model (`hf.co/Qiskit/mistral-small-3.2-24b-qiskit-GGUF:latest`)
+- Setting `system_prompt = QISKIT_SYSTEM_PROMPT` to guide the model toward modern Qiskit APIs
 - Using simpler prompts
 
 ### Import Error: flake8-qiskit-migration
@@ -248,8 +265,3 @@ ModuleNotFoundError: No module named 'flake8_qiskit_migration'
 ```
 **Solution**: Use `uv run` which auto-installs dependencies
 
-## Future Work
-
-The following enhancements are planned for future iterations:
-
-1. **Enable Smaller Models** - Add system prompt or grounding context with Qiskit API documentation to help smaller models perform accurate migrations. This would allow removing the `pytest.mark.skip` marker and make the example run in standard test suites.