Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -65,4 +65,5 @@ target/
pyosmo/config.py.bak
.pytest_cache

.hypothesis
.hypothesis
models
474 changes: 474 additions & 0 deletions doc/prd_tla_plus_verification.md

Large diffs are not rendered by default.

262 changes: 262 additions & 0 deletions doc/prompt_tla_verification_agent.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,262 @@
# Agent Prompt: Implement TLA+ Formal Verification for PyOsmo

## Context

You are working on the PyOsmo project (`/Users/oopee/git/pyosmo`), a Model-Based Testing framework for Python. Your task is to implement formal verification of the PyOsmo engine using TLA+ specifications, following the PRD at `doc/prd_tla_plus_verification.md` and the technical investigation at `doc/tla_plus_investigation.md`.

## Your Mission

Implement the TLA+ formal verification work in phases. Complete each phase fully before moving to the next. After each phase, run verification to confirm correctness before proceeding.

---

## Phase 0: Fix Discovered Bugs

Before writing specifications, fix the bugs discovered during investigation. These are documented in Section 7 of the PRD.

1. **Fix `StepCoverage` range validation** (`pyosmo/end_conditions/step_coverage.py:12`):
- Current: `if coverage_percent in range(1, 100)` raises exception (inverted logic)
- Fix: Should reject values NOT in valid range. Valid range is 1-100 inclusive.

2. **Fix `StepCoverage.end_suite()`** (`pyosmo/end_conditions/step_coverage.py:25-32`):
- Current: Checks `current_test_case` coverage (per-test)
- Fix: Should check cumulative coverage across ALL test cases in the suite

3. **Fix `OsmoTestCaseRecord.is_running()`** (`pyosmo/history/test_case.py:19-20`):
- Current: Returns True when stopped (inverted)
- Fix: Rename to `is_stopped()` or invert the logic. Check all callers.

4. **Fix empty `available_steps` crash** (`pyosmo/osmo.py`):
- Current: If all guards return False, `algorithm.choose()` gets empty list → `IndexError`
- Fix: Check for empty available steps before calling algorithm. Raise a clear `OsmoError` explaining no steps are available.

5. **Fix `TestStepLog.name`** (`pyosmo/history/test_step_log.py:22-23`):
- Current: Uses `_step.function_name` instead of `_step.name`
- Fix: Use `_step.name` for semantic step name in history/statistics

After each fix:
- Run `pytest pyosmo/tests/` to verify no regressions
- Run `ruff check pyosmo/ --fix && ruff format pyosmo/`
- Add a regression test if one doesn't exist for the fixed behavior

---

## Phase 1: Hypothesis Property Tests

Before writing TLA+ specs, implement property-based tests using Hypothesis that verify the same properties. These serve as immediate value and validation baseline.

Create file: `pyosmo/tests/test_properties.py`

Implement these property tests:

### 1.1 Hook Ordering Property
```python
# Test that lifecycle hooks always execute in correct order regardless of:
# - Number of steps per test
# - Number of tests per suite
# - Which steps are available
# - Whether errors occur
# Capture hook calls via a tracking model, verify ordering invariant holds.
```

### 1.2 Error Strategy Combinations (all 16 pairs)
```python
# For each combination of test_strategy × suite_strategy:
# ("raise", "ignore", "ignore_asserts", "allow_count") × same
# Run with models that produce errors at random positions.
# Verify:
# - AlwaysRaise at test level: error always propagates to suite level
# - AlwaysIgnore: suite always completes normally
# - AllowCount(n): exactly n errors tolerated, (n+1)th propagates
```

### 1.3 Balancing Algorithm Fairness
```python
# Given N steps all always available, after k*N iterations:
# - max_count - min_count <= 1
# Given steps with changing availability:
# - No available step is starved (count stays within bounds)
```

### 1.4 End Condition Composition
```python
# And(Length(a), Length(b)) terminates after max(a, b) steps
# Or(Length(a), Length(b)) terminates after min(a, b) steps
# StepCoverage works correctly at both test and suite level
```

### 1.5 Empty Available Steps
```python
# When all guards return False, engine raises clear error
# When guards alternate (some steps available, then none), behavior is correct
```

After implementing:
- Run `pytest pyosmo/tests/test_properties.py -v`
- All tests must pass
- Run `ruff check pyosmo/ --fix && ruff format pyosmo/`

---

## Phase 2: TLA+ Engine State Machine Specification

Create directory: `pyosmo/specs/tla/`

### 2.1 Write `PyOsmoEngine.tla`

Model the core execution engine state machine. Reference the specification sketch in Section 4.1 of `doc/tla_plus_investigation.md` but expand it to include:

- All phase transitions (idle, suite, test, step)
- Lifecycle hook execution tracking
- Two-level error handling with configurable strategies
- End condition evaluation
- Dynamic step availability (guards changing between iterations)
- History state (test count, step count, error count)

The specification should be approximately 150-250 lines.

### 2.2 Write `PyOsmoEngine.cfg`

TLC configuration with small constants:
```
CONSTANTS
Steps = {s1, s2, s3, s4}
MaxSteps = 5
MaxTests = 3
ErrorStrategies = {"raise", "ignore", "allow_count"}

INVARIANT TypeInvariant
INVARIANT ValidPhaseTransitions
INVARIANT HookOrderingInvariant
INVARIANT MonotonicCounters
INVARIANT SingleActiveTestCase

PROPERTY SuiteTerminates
```

### 2.3 Write `ErrorStrategyCascade.tla`

Focused specification for error strategy interactions:
- Model all 4 strategies at test level and suite level
- Track error occurrence, propagation, and swallowing
- Verify no silent error loss under AlwaysRaise
- Verify AllowCount threshold semantics

### 2.4 Write `ErrorStrategyCascade.cfg`

Configuration testing all 16 strategy combinations.

### 2.5 Verify with TLC

Run TLC on each specification:
```bash
java -jar tla2tools.jar -config PyOsmoEngine.cfg PyOsmoEngine.tla
java -jar tla2tools.jar -config ErrorStrategyCascade.cfg ErrorStrategyCascade.tla
```

If TLC is not available locally, document the expected verification results based on the specification analysis. Note any properties that cannot be verified without TLC and explain why.

---

## Phase 3: Algorithm Fairness Specifications

### 3.1 Write `BalancingAlgorithm.tla`

Model the deterministic balancing algorithm:
- State: step_counts function, available set, chosen step
- Action: Choose minimum-count step, increment its count
- Invariant: After |Steps| iterations, max - min <= 1
- Liveness: All available steps eventually executed

### 3.2 Write `WeightedBalancing.tla`

Model the weighted balancing algorithm:
- State: step_counts, weights (fixed), available set
- Action: Compute normalized weights, subtract history norm, rescue negatives, choose
- Invariant: All computed weights > 0 after normalization
- Safety: No division by zero in normalization

Use integer arithmetic scaled by 100 to model the floating-point behavior.

### 3.3 Write configurations and verify

---

## Phase 4: End Condition Specifications

### 4.1 Write `EndConditions.tla`

Model composable end conditions:
- Length, StepCoverage, Time (abstracted), Endless
- And/Or combinators
- Test-level vs suite-level semantics
- Verify termination properties

### 4.2 Verify no infinite loops

For all non-Endless configurations with finite step sets, prove suite eventually terminates.

---

## Phase 5: Documentation and Integration

### 5.1 Add README for specs
Create `pyosmo/specs/tla/README.md` explaining:
- What each spec verifies
- How to run TLC
- How to interpret counterexamples
- Correspondence between TLA+ and Python code

### 5.2 Update CLAUDE.md
Add a section about the TLA+ specifications to the project's CLAUDE.md.

### 5.3 Final verification
- Run full test suite: `pytest pyosmo/tests/`
- Run linter: `ruff check pyosmo/ --fix && ruff format pyosmo/`
- Verify all property tests pass
- Verify TLA+ specs are syntactically valid (if tooling available)

---

## Constraints and Guidelines

1. **Read before writing**: Always read the relevant source files before modifying them. Understand the existing code patterns.

2. **Test after each change**: Run `pytest pyosmo/tests/` after every code modification. Do not proceed if tests fail.

3. **Lint after each change**: Run `ruff check pyosmo/ --fix && ruff format pyosmo/` after code changes.

4. **Follow existing patterns**: Match the project's code style (double quotes, 120 char lines, type hints encouraged).

5. **Minimal changes for bug fixes**: Fix only what's broken. Don't refactor surrounding code.

6. **TLA+ specs should be self-documenting**: Include comments mapping each action to the Python source location (file:line).

7. **If TLC is not installed**: Write the specifications correctly anyway. Document that verification requires `tla2tools.jar` (available from https://github.com/tlaplus/tlaplus/releases). The specs themselves have value as executable documentation.

8. **If stuck on TLA+ syntax**: Use standard TLA+ patterns from Leslie Lamport's examples. The key operators are:
- `/\` (and), `\/` (or), `=>` (implies)
- `'` (next state), `UNCHANGED` (stuttering)
- `[]` (always), `<>` (eventually), `~>` (leads-to)
- `WF_vars(Action)` (weak fairness)

9. **Commit messages**: If committing, use descriptive messages like "Add TLA+ engine state machine specification" or "Fix StepCoverage range validation bug".

10. **Do not skip phases**: Complete Phase 0 (bug fixes) before Phase 1 (Hypothesis tests) before Phase 2 (TLA+ specs). Each phase validates the previous.

---

## Definition of Done

- [ ] All 5 bugs from Section 7 of PRD are fixed with regression tests
- [ ] Property-based tests in `test_properties.py` pass (5+ properties)
- [ ] `PyOsmoEngine.tla` specification written with safety and liveness properties
- [ ] `ErrorStrategyCascade.tla` covers all 16 strategy combinations
- [ ] `BalancingAlgorithm.tla` proves fairness property
- [ ] `WeightedBalancing.tla` proves weight positivity
- [ ] `EndConditions.tla` proves termination for non-Endless configs
- [ ] All TLC configs written
- [ ] README for specs directory exists
- [ ] `pytest pyosmo/tests/` passes with zero failures
- [ ] `ruff check pyosmo/` passes with zero errors
- [ ] No regressions in existing test suite
Loading