Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Qiskit Code Validation with Instruct-Validate-Repair

This example demonstrates using Mellea's Instruct-Validate-Repair (IVR) pattern to generate Qiskit quantum computing code that automatically passes flake8-qiskit-migration validation rules (QKT rules).
This example demonstrates using Mellea's Instruct-Validate-Repair (IVR) pattern to generate Qiskit quantum computing code that automatically passes `flake8-qiskit-migration` validation rules (QKT rules).

## What This Example Does

Expand Down Expand Up @@ -34,6 +34,34 @@ Dependencies (`mellea`, `flake8-qiskit-migration`) are automatically installed.
3. **Post-condition validation**: Validates generated code against QKT rules (see [Qiskit Migration Guide](https://docs.quantum.ibm.com/api/migration-guides))
4. **Repair loop**: Automatically repairs code that fails validation (up to 5 attempts)

### Sampling Strategies

The example supports two repair strategies (see [Sampling Strategies](../README.md#sampling-strategies)):

- **RepairTemplateStrategy** (default): Adds validation failure reasons directly to the instruction and retries generation
- **MultiTurnStrategy**: Builds conversation history by adding validation failures as new user messages

To switch strategies, edit the `use_multiturn_strategy` variable in `test_qiskit_code_validation()`

**Note**: `MultiTurnStrategy` requires `ChatContext()` while `RepairTemplateStrategy` works with `SimpleContext()`. The example automatically selects the appropriate context based on your strategy choice.

#### Strategy Performance Comparison

Benchmarks on `mistral-small-3.2-24b-qiskit` model (pass rates measure QKT validation only, not correctness):

| Dataset | Strategy | First Pass | Post-Repair |
|---------|----------|------------|-------------|
| **QHE** | RepairTemplate | 78.2% | **99.3%** |
| | MultiTurn | 77.5% | 96.7% |
| **QKT** | RepairTemplate | 54.1% | **83.8%** |
| | MultiTurn | 37.8% | 70.3% |

**Datasets:**
- **QHE** (QiskitHumanEval): Code generation tasks testing general Qiskit programming
- **QKT**: Qiskit version migration tasks requiring fixes to deprecated APIs

**Note:** These benchmarks measure whether generated code passes QKT validation rules, not whether the code correctly solves the prompt. Both aspects are important for production use.

### Code Structure

```
Expand Down Expand Up @@ -224,6 +252,4 @@ ModuleNotFoundError: No module named 'flake8_qiskit_migration'

The following enhancements are planned for future iterations:

1. **MultiTurnStrategy Integration** - Try using `MultiTurnStrategy` (see [Sampling Strategies](../README.md#sampling-strategies)) which builds conversation history by adding validation failures as new user messages, to see if this approach improves results over the current `RepairTemplateStrategy` which adds failures directly to the instruction.

2. **Enable Smaller Models** - Add system prompt or grounding context with Qiskit API documentation to help smaller models perform accurate migrations. This would allow removing the `pytest.mark.skip` marker and make the example run in standard test suites.
1. **Enable Smaller Models** - Add system prompt or grounding context with Qiskit API documentation to help smaller models perform accurate migrations. This would allow removing the `pytest.mark.skip` marker and make the example run in standard test suites.
Original file line number Diff line number Diff line change
Expand Up @@ -27,31 +27,33 @@
"""

import time
from typing import Literal

from validation_helpers import validate_input_code, validate_qiskit_migration

import mellea
from mellea.backends import ModelOption
from mellea.stdlib.context import ChatContext, SimpleContext
from mellea.stdlib.requirements import req, simple_validate
from mellea.stdlib.sampling import RepairTemplateStrategy
from mellea.stdlib.sampling import MultiTurnStrategy, RepairTemplateStrategy


def generate_validated_qiskit_code(
m: mellea.MelleaSession, prompt: str, max_repair_attempts: int = 5
m: mellea.MelleaSession,
prompt: str,
strategy: MultiTurnStrategy | RepairTemplateStrategy,
) -> str:
"""Generate Qiskit code that passes Qiskit migration validation.

This function implements the Instruct-Validate-Repair pattern:
1. Pre-validates input code
2. Instructs the LLM with structured requirements
3. Validates output against QKT rules
4. Repairs code if validation fails (up to max_repair_attempts times)
4. Repairs code if validation fails (up to the strategy's loop_budget times)

Args:
m: Mellea session
prompt: User prompt for code generation
max_repair_attempts: Maximum number of repair attempts for validation failures
strategy: Sampling strategy for handling validation failures

Returns:
Generated code that passes validation
Expand Down Expand Up @@ -86,7 +88,7 @@ def generate_validated_qiskit_code(
validation_fn=simple_validate(validate_qiskit_migration),
)
],
strategy=RepairTemplateStrategy(loop_budget=max_repair_attempts),
strategy=strategy,
return_sampling_results=True,
)

Expand Down Expand Up @@ -115,6 +117,11 @@ def test_qiskit_code_validation() -> None:
that uses old APIs (BasicAer, execute) and having the LLM fix it to use
modern Qiskit APIs that pass QKT validation rules.
"""
# Strategy selection - True for MultiTurnStrategy, False for RepairTemplateStrategy
# MultiTurnStrategy: Adds validation failure reasons as a new user message in the conversation
# RepairTemplateStrategy: Adds validation failure reasons to the instruction and retries
use_multiturn_strategy = False

# Model selection - uncomment one to try different models
# model_id = "granite4:micro-h"
# model_id = "granite4:small-h"
Expand All @@ -137,13 +144,25 @@ def test_qiskit_code_validation() -> None:
print(prompt)
print("======================\n")

# Initialize the required context
ctx = ChatContext() if use_multiturn_strategy else SimpleContext()

with mellea.start_session(
model_id=model_id,
backend_name="ollama",
ctx=ctx,
model_options={ModelOption.TEMPERATURE: 0.8, ModelOption.MAX_NEW_TOKENS: 2048},
) as m:
start_time = time.time()
code = generate_validated_qiskit_code(m, prompt, max_repair_attempts=5)

if use_multiturn_strategy:
strategy: MultiTurnStrategy | RepairTemplateStrategy = MultiTurnStrategy(
loop_budget=5
)
else:
strategy = RepairTemplateStrategy(loop_budget=5)

code = generate_validated_qiskit_code(m, prompt, strategy)
elapsed = time.time() - start_time

print(f"\n====== Result ({elapsed:.1f}s) ======")
Expand Down
Loading