@@ -5,10 +5,9 @@ This example demonstrates using Mellea's Instruct-Validate-Repair (IVR) pattern
 ## What This Example Does

 Takes a prompt containing deprecated Qiskit code and:
-1. Detects QKT violations in the input code
-2. Passes those violations to the LLM as context
-3. Generates corrected code that passes QKT validation
-4. Automatically repairs the code if validation fails (up to 10 attempts)
+1. Generates corrected code using the LLM
+2. Validates the output against QKT rules
+3. Automatically repairs the code if validation fails (up to 10 attempts)

 ## Quick Start

@@ -29,10 +28,9 @@ Dependencies (`mellea`, `flake8-qiskit-migration`) are automatically installed.

 ### The IVR Pipeline

-1. **Pre-condition validation**: Validates the input prompt and any code it contains
-2. **Instruction**: LLM generates code following structured requirements
-3. **Post-condition validation**: Validates generated code against QKT rules (see [Qiskit Migration Guide](https://docs.quantum.ibm.com/api/migration-guides))
-4. **Repair loop**: Automatically repairs code that fails validation (up to 10 attempts)
+1. **Instruction**: LLM generates code following structured requirements
+2. **Post-condition validation**: Validates generated code against QKT rules (see [Qiskit Migration Guide](https://docs.quantum.ibm.com/api/migration-guides))
+3. **Repair loop**: Automatically repairs code that fails validation (up to 10 attempts)

 ### Sampling Strategies

@@ -47,20 +45,20 @@ To switch strategies, edit the `use_multiturn_strategy` variable in `test_qiskit

 #### Strategy Performance Comparison

-Benchmarks on `mistral-small-3.2-24b-qiskit` model, no system prompt:
+Benchmarks on `mistral-small-3.2-24b-qiskit` model:

 | Dataset | Strategy | First Pass (QKT) | Post-Repair (QKT) |
 |---------|----------|------------|-------------|
-| **QHE** | RepairTemplate | 98.0% | **100%** |
-| | MultiTurn | **100%** | **100%** |
-| **QKT** | RepairTemplate | 98.0% | **100%** |
-| | MultiTurn | 93.3% | **100%** |
+| **QHE** | RepairTemplate | 97.4% | **100%** |
+| | MultiTurn | 95.4% | **100%** |
+| **QKT** | RepairTemplate | 88.9% | **100%** |
+| | MultiTurn | **97.8%** | **100%** |

 **Datasets:**
 - **QHE** (QiskitHumanEval): 151 general Qiskit code generation tasks
 - **QKT**: 45 Qiskit version migration tasks requiring fixes to deprecated APIs

-**Note:** Pass rates measure whether generated code passes QKT validation rules, not whether the code correctly solves the prompt. On QHE, the model achieves ~32.5% correctness when running the QHE check() test suite against the generated code. Full benchmark data and analysis are available in @ajbozarth's [toolbox repo](https://github.com/ajbozarth/toolbox/tree/main/mellea/qiskit_code_validation/benchmarking).
+**Note:** Pass rates measure whether generated code passes QKT validation rules, not whether the code correctly solves the prompt. On QHE, the model achieves ~27.8% correctness when running the QHE check() test suite against the generated code. Full benchmark data and analysis are available in @ajbozarth's [toolbox repo](https://github.com/ajbozarth/toolbox/tree/main/mellea/qiskit_code_validation/benchmarking).

 ### Code Structure

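For readers of this diff, the three-step pipeline the updated README describes (instruct, validate, repair up to 10 attempts) can be sketched roughly as below. This is a minimal illustration, not the example's actual implementation: `instruct_validate_repair`, `toy_generate`, and `toy_validate` are hypothetical names, the generator is a stub rather than an LLM call, and the validator is a toy string check rather than the real QKT rule set. Only the 10-attempt limit and the `(generated_code, success, attempts_used)` return shape come from the PR.

```python
from typing import Callable, Optional, Tuple

# Hypothetical stand-ins: a generator maps (prompt, feedback) to code,
# and a validator maps code to (is_valid, error_message).
Generator = Callable[[str, Optional[str]], str]
Validator = Callable[[str], Tuple[bool, str]]


def instruct_validate_repair(
    prompt: str,
    generate: Generator,
    validate: Validator,
    max_attempts: int = 10,
) -> Tuple[str, bool, int]:
    """Return (generated_code, success, attempts_used)."""
    feedback: Optional[str] = None
    code = ""
    for attempt in range(1, max_attempts + 1):
        # Instruct: generate code, passing along the last validation error (if any).
        code = generate(prompt, feedback)
        # Validate: check the output only; the input prompt is not pre-validated.
        is_valid, error_msg = validate(code)
        if is_valid:
            return code, True, attempt
        # Repair: feed the error back into the next attempt.
        feedback = error_msg
    return code, False, max_attempts


def toy_generate(prompt: str, feedback: Optional[str]) -> str:
    # Toy behaviour: the first attempt is "bad"; it is fixed once feedback arrives.
    if feedback is None:
        return "from qiskit import execute"
    return "from qiskit import transpile"


def toy_validate(code: str) -> Tuple[bool, str]:
    # Toy rule for illustration only; not an actual QKT check.
    if "execute" in code:
        return False, "toy rule: 'execute' is flagged"
    return True, ""


code, success, attempts = instruct_validate_repair(
    "migrate this snippet", toy_generate, toy_validate
)
print(code, success, attempts)
```

In this sketch the validator's error message is the only thing carried between attempts, which mirrors the README change: violations are discovered by validating the generated output, not by pre-scanning the input.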
@@ -93,18 +93,6 @@ def generate_validated_qiskit_code(
     Returns:
         Tuple of (generated_code, success, attempts_used)
     """
-    # Pre-validate input code if present — include violations as context rather than failing
-    is_valid, error_msg = validate_input_code(prompt)
-    if not is_valid:
-        print(
-            f"Input code has QKT violations, including as context for LLM: {error_msg}"
-        )
-        prompt = (
-            f"{prompt}\n\n"
-            f"Note: the code above has the following Qiskit migration issues that must be fixed:\n"
-            f"{error_msg}"
-        )
-
     # Only pass optional kwargs if they have values — avoids passing None to m.instruct()
     extra: dict = {}
     if grounding_context:
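The hunk is cut off right after `if grounding_context:`, so the rest of that block is not visible here. As a general illustration of the pattern the retained comment describes (forwarding only the optional keyword arguments that actually have values), here is a small sketch. `build_optional_kwargs`, `instruct_stub`, and `other_option` are hypothetical names introduced for this example; `instruct_stub` merely echoes its inputs and is not Mellea's `m.instruct()`.

```python
from typing import Any, Dict, Optional


def build_optional_kwargs(**candidates: Optional[Any]) -> Dict[str, Any]:
    """Keep only keyword arguments with a truthy value (same test as `if value:`)."""
    return {name: value for name, value in candidates.items() if value}


def instruct_stub(description: str, **kwargs: Any) -> Dict[str, Any]:
    # Hypothetical stand-in for the real call; it just returns what it received.
    return {"description": description, **kwargs}


extra = build_optional_kwargs(
    grounding_context={"docs": "migration notes"},  # has a value, so it is forwarded
    other_option=None,  # hypothetical option with no value, so it is dropped
)
result = instruct_stub("Fix the deprecated Qiskit code", **extra)
print(sorted(result))
```

Because unset options are dropped before the call, the callee's own defaults apply instead of receiving an explicit `None`.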