Skip to content

Commit d9892f9

Browse files
authored
docs: multi-turn strategy to qiskit ivr example (#717)
Signed-off-by: va <va@us.ibm.com>
1 parent 2f2c20e commit d9892f9

2 files changed

Lines changed: 56 additions & 11 deletions

File tree

docs/examples/instruct_validate_repair/qiskit_code_validation/README.md

Lines changed: 30 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
# Qiskit Code Validation with Instruct-Validate-Repair
22

3-
This example demonstrates using Mellea's Instruct-Validate-Repair (IVR) pattern to generate Qiskit quantum computing code that automatically passes flake8-qiskit-migration validation rules (QKT rules).
3+
This example demonstrates using Mellea's Instruct-Validate-Repair (IVR) pattern to generate Qiskit quantum computing code that automatically passes `flake8-qiskit-migration` validation rules (QKT rules).
44

55
## What This Example Does
66

@@ -34,6 +34,34 @@ Dependencies (`mellea`, `flake8-qiskit-migration`) are automatically installed.
3434
3. **Post-condition validation**: Validates generated code against QKT rules (see [Qiskit Migration Guide](https://docs.quantum.ibm.com/api/migration-guides))
3535
4. **Repair loop**: Automatically repairs code that fails validation (up to 5 attempts)
3636

37+
### Sampling Strategies
38+
39+
The example supports two repair strategies (see [Sampling Strategies](../README.md#sampling-strategies)):
40+
41+
- **RepairTemplateStrategy** (default): Adds validation failure reasons directly to the instruction and retries generation
42+
- **MultiTurnStrategy**: Builds conversation history by adding validation failures as new user messages
43+
44+
To switch strategies, edit the `use_multiturn_strategy` variable in `test_qiskit_code_validation()`
45+
46+
**Note**: `MultiTurnStrategy` requires `ChatContext()` while `RepairTemplateStrategy` works with `SimpleContext()`. The example automatically selects the appropriate context based on your strategy choice.
47+
48+
#### Strategy Performance Comparison
49+
50+
Benchmarks on `mistral-small-3.2-24b-qiskit` model (pass rates measure QKT validation only, not correctness):
51+
52+
| Dataset | Strategy | First Pass | Post-Repair |
53+
|---------|----------|------------|-------------|
54+
| **QHE** | RepairTemplate | 78.2% | **99.3%** |
55+
| | MultiTurn | 77.5% | 96.7% |
56+
| **QKT** | RepairTemplate | 54.1% | **83.8%** |
57+
| | MultiTurn | 37.8% | 70.3% |
58+
59+
**Datasets:**
60+
- **QHE** (QiskitHumanEval): Code generation tasks testing general Qiskit programming
61+
- **QKT**: Qiskit version migration tasks requiring fixes to deprecated APIs
62+
63+
**Note:** These benchmarks measure whether generated code passes QKT validation rules, not whether the code correctly solves the prompt. Both aspects are important for production use.
64+
3765
### Code Structure
3866

3967
```
@@ -224,6 +252,4 @@ ModuleNotFoundError: No module named 'flake8_qiskit_migration'
224252

225253
The following enhancements are planned for future iterations:
226254

227-
1. **MultiTurnStrategy Integration** - Try using `MultiTurnStrategy` (see [Sampling Strategies](../README.md#sampling-strategies)) which builds conversation history by adding validation failures as new user messages, to see if this approach improves results over the current `RepairTemplateStrategy` which adds failures directly to the instruction.
228-
229-
2. **Enable Smaller Models** - Add system prompt or grounding context with Qiskit API documentation to help smaller models perform accurate migrations. This would allow removing the `pytest.mark.skip` marker and make the example run in standard test suites.
255+
1. **Enable Smaller Models** - Add system prompt or grounding context with Qiskit API documentation to help smaller models perform accurate migrations. This would allow removing the `pytest.mark.skip` marker and make the example run in standard test suites.

docs/examples/instruct_validate_repair/qiskit_code_validation/qiskit_code_validation.py

Lines changed: 26 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -27,31 +27,33 @@
2727
"""
2828

2929
import time
30-
from typing import Literal
3130

3231
from validation_helpers import validate_input_code, validate_qiskit_migration
3332

3433
import mellea
3534
from mellea.backends import ModelOption
35+
from mellea.stdlib.context import ChatContext, SimpleContext
3636
from mellea.stdlib.requirements import req, simple_validate
37-
from mellea.stdlib.sampling import RepairTemplateStrategy
37+
from mellea.stdlib.sampling import MultiTurnStrategy, RepairTemplateStrategy
3838

3939

4040
def generate_validated_qiskit_code(
41-
m: mellea.MelleaSession, prompt: str, max_repair_attempts: int = 5
41+
m: mellea.MelleaSession,
42+
prompt: str,
43+
strategy: MultiTurnStrategy | RepairTemplateStrategy,
4244
) -> str:
4345
"""Generate Qiskit code that passes Qiskit migration validation.
4446
4547
This function implements the Instruct-Validate-Repair pattern:
4648
1. Pre-validates input code
4749
2. Instructs the LLM with structured requirements
4850
3. Validates output against QKT rules
49-
4. Repairs code if validation fails (up to max_repair_attempts times)
51+
4. Repairs code if validation fails (up to the strategy's loop_budget times)
5052
5153
Args:
5254
m: Mellea session
5355
prompt: User prompt for code generation
54-
max_repair_attempts: Maximum number of repair attempts for validation failures
56+
strategy: Sampling strategy for handling validation failures
5557
5658
Returns:
5759
Generated code that passes validation
@@ -86,7 +88,7 @@ def generate_validated_qiskit_code(
8688
validation_fn=simple_validate(validate_qiskit_migration),
8789
)
8890
],
89-
strategy=RepairTemplateStrategy(loop_budget=max_repair_attempts),
91+
strategy=strategy,
9092
return_sampling_results=True,
9193
)
9294

@@ -115,6 +117,11 @@ def test_qiskit_code_validation() -> None:
115117
that uses old APIs (BasicAer, execute) and having the LLM fix it to use
116118
modern Qiskit APIs that pass QKT validation rules.
117119
"""
120+
# Strategy selection - True for MultiTurnStrategy, False for RepairTemplateStrategy
121+
# MultiTurnStrategy: Adds validation failure reasons as a new user message in the conversation
122+
# RepairTemplateStrategy: Adds validation failure reasons to the instruction and retries
123+
use_multiturn_strategy = False
124+
118125
# Model selection - uncomment one to try different models
119126
# model_id = "granite4:micro-h"
120127
# model_id = "granite4:small-h"
@@ -137,13 +144,25 @@ def test_qiskit_code_validation() -> None:
137144
print(prompt)
138145
print("======================\n")
139146

147+
# Initialize the required context
148+
ctx = ChatContext() if use_multiturn_strategy else SimpleContext()
149+
140150
with mellea.start_session(
141151
model_id=model_id,
142152
backend_name="ollama",
153+
ctx=ctx,
143154
model_options={ModelOption.TEMPERATURE: 0.8, ModelOption.MAX_NEW_TOKENS: 2048},
144155
) as m:
145156
start_time = time.time()
146-
code = generate_validated_qiskit_code(m, prompt, max_repair_attempts=5)
157+
158+
if use_multiturn_strategy:
159+
strategy: MultiTurnStrategy | RepairTemplateStrategy = MultiTurnStrategy(
160+
loop_budget=5
161+
)
162+
else:
163+
strategy = RepairTemplateStrategy(loop_budget=5)
164+
165+
code = generate_validated_qiskit_code(m, prompt, strategy)
147166
elapsed = time.time() - start_time
148167

149168
print(f"\n====== Result ({elapsed:.1f}s) ======")

0 commit comments

Comments
 (0)