Commit a5a62f7

Add AI-driven PyOsmo model generator and refinement scripts.
- Introduce `generate_model.py` for automating PyOsmo model creation using Claude and Playwright.
- Add `refine_model.py` to optimize generated models based on test execution history.
- Include prompt templates for the Claude agent in `prompt_template.py`.
- Provide example outputs and usage documentation in `README.md` and `example_output/`.
- Extend `history.py` with JSON export utilities to facilitate model refinement.
1 parent a70f9dd commit a5a62f7

6 files changed

Lines changed: 551 additions & 0 deletions

examples/ai_web_agent/README.md

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
# AI Web Agent - PyOsmo Model Generator

Generate PyOsmo model-based tests for any web application using Claude + Playwright.

## How it works

1. **`generate_model.py`**: A Claude agent explores a web page via Playwright MCP, discovers interactive elements, and generates a PyOsmo model with `step_*`/`guard_*` methods.

2. **`refine_model.py`**: Runs the generated model, collects test history as JSON, and sends it to Claude for analysis. The agent fixes errors, improves coverage, and adds missing steps.

3. **`prompt_template.py`**: System prompts that teach the agent PyOsmo patterns.
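
The `step_*`/`guard_*` naming convention mentioned above can be illustrated without a browser. The sketch below is not PyOsmo's implementation; `CounterModel` and `available_steps` are hypothetical names used only to show how a runner could pair each step with its optional guard and skip steps whose guard returns `False`.

```python
class CounterModel:
    """Toy model: the counter may not exceed 3."""

    def __init__(self):
        self.count = 0

    def step_increment(self):
        self.count += 1

    def guard_increment(self):
        # The step is only offered while the counter is below 3.
        return self.count < 3

    def step_reset(self):
        # No guard_reset exists, so this step is always available.
        self.count = 0


def available_steps(model):
    """Return names of step_* methods whose guard_* (if any) returns True."""
    names = []
    for attr in sorted(dir(model)):
        if not attr.startswith("step_"):
            continue
        name = attr[len("step_"):]
        guard = getattr(model, f"guard_{name}", None)
        if guard is None or guard():
            names.append(name)
    return names


model = CounterModel()
print(available_steps(model))  # ['increment', 'reset']
model.count = 3
print(available_steps(model))  # ['reset']
```

Keeping guards as plain methods on the model means the allowed sequences are decided by model state alone, which is what lets a generated model be refined by editing guards rather than the runner.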

## Setup

```bash
# Install dependencies
pip install claude-agent-sdk playwright pyosmo

# Install Playwright browsers
playwright install chromium

# Set your API key
export ANTHROPIC_API_KEY=your-key-here
```

## Usage

### Generate a model

```bash
python generate_model.py https://todomvc.com/examples/react/dist/ -o todo_model.py
```

### Refine a model

```bash
# Single refinement pass
python refine_model.py todo_model.py --url https://todomvc.com/examples/react/dist/

# Multiple iterations
python refine_model.py todo_model.py --url https://todomvc.com/examples/react/dist/ --iterations 3
```
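
The source of `refine_model.py` is not shown in this diff, so the following is only a sketch of what an `--iterations N` loop might look like. The function `refine` and the two callables `run_model` and `ask_agent` are hypothetical stand-ins for "run the model and collect history" and "send source plus history to Claude"; whether the real script stops early once no errors remain is an open assumption.

```python
from typing import Callable


def refine(source: str,
           run_model: Callable[[str], dict],
           ask_agent: Callable[[str, dict], str],
           iterations: int = 1) -> str:
    """Run the model, hand source + history to the agent, keep its answer."""
    for _ in range(iterations):
        history = run_model(source)          # history as a parsed JSON dict
        source = ask_agent(source, history)  # agent returns the full updated file
    return source


# Toy stand-ins so the loop can be exercised without a browser or an API key.
def fake_run(source: str) -> dict:
    return {"statistics": {"error_count": source.count("BUG")}}


def fake_agent(source: str, history: dict) -> str:
    # "Fix" one problem per pass when the history reports errors.
    if history["statistics"]["error_count"] > 0:
        return source.replace("BUG", "ok", 1)
    return source


print(refine("BUG BUG step", fake_run, fake_agent, iterations=3))  # ok ok step
```

Passing the runner and the agent in as callables keeps the loop testable in isolation; the real script would presumably read and write the model file around it.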

### Run the generated model directly

```bash
python todo_model.py
```

## Example output

See `example_output/todo_app_model.py` for a sample of what the agent generates for a TodoMVC application.
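
The sample model gives `add_todo` a weight of 5, while steps without a `weight_*` method use the default of 1. Assuming weights act as relative odds among the steps whose guards currently pass (how PyOsmo's own algorithms pick a step is not shown in this commit), the effect can be sketched with the standard library. `choose_step` is a hypothetical helper, not a PyOsmo function.

```python
import random


def choose_step(weights: dict[str, float], rng: random.Random) -> str:
    """Pick one step name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


rng = random.Random(0)  # fixed seed so repeated runs make the same choices
enabled = {"add_todo": 5, "toggle_todo": 1, "delete_todo": 1}

counts = {name: 0 for name in enabled}
for _ in range(7000):
    counts[choose_step(enabled, rng)] += 1

# add_todo should be chosen roughly five times as often as either other step.
print(counts)
```

Because the weights are relative, a weight of 5 means "five times as likely as a weight-1 step", not a 5% or 50% probability; the actual share depends on which other steps are enabled at that moment.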

## How the refinement loop works

```
┌─────────────┐     ┌──────────────┐     ┌──────────────┐
│  Generate   │────>│  Run model   │────>│   Collect    │
│   model     │     │ with PyOsmo  │     │  history JSON│
└─────────────┘     └──────────────┘     └──────┬───────┘

                    ┌──────────────┐            │
                    │  Write back  │<───────────┘
                    │   refined    │   Claude analyzes
                    │    model     │   errors & coverage
                    └──────────────┘
```

The history JSON includes statistics, step frequencies, transition pairs, and per-test error details, which the agent uses to diagnose and fix issues.
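
To make the history fields concrete, here is a small sketch that reads such a JSON document and pulls out the two signals the refinement prompt focuses on: reported errors and steps that never ran. The top-level keys (`statistics.error_count`, `statistics.step_frequency`, `step_pairs`, `test_cases[].errors[]`) come from the prompt template in this commit; the inner shape of each error (`step`, `message`), the `step_pairs` encoding, and the helper `summarize_history` are assumptions for illustration.

```python
import json


def summarize_history(history: dict, known_steps: list[str]) -> dict:
    """Collect errors and never-executed steps from a history dict."""
    stats = history.get("statistics", {})
    freq = stats.get("step_frequency", {})
    errors = [
        (err.get("step"), err.get("message"))
        for case in history.get("test_cases", [])
        for err in case.get("errors", [])
    ]
    return {
        "error_count": stats.get("error_count", len(errors)),
        # Steps absent from the frequency table are treated as 0 executions.
        "never_ran": [s for s in known_steps if freq.get(s, 0) == 0],
        "errors": errors,
    }


# Hand-written sample in the assumed shape, not real tool output.
sample = json.loads("""
{
  "statistics": {"error_count": 1,
                 "step_frequency": {"add_todo": 12, "toggle_todo": 3}},
  "step_pairs": {"add_todo->toggle_todo": 3},
  "test_cases": [
    {"errors": [{"step": "toggle_todo", "message": "Timeout waiting for .toggle"}]}
  ]
}
""")

print(summarize_history(sample, ["add_todo", "toggle_todo", "delete_todo"]))
```

A step that never ran usually points at an overly strict guard, while an error tied to a step name points at a selector, wait, or state-tracking problem in that step.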
Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
"""Example PyOsmo model for a Todo application.

This is a sample of what the AI agent generates. It models a typical
TodoMVC-style application with add, complete, delete, and filter actions.
"""

from playwright.sync_api import Page

from pyosmo import Osmo
from pyosmo.end_conditions import Length


class TodoAppModel:
    """Model-based test for a Todo web application."""

    def __init__(self, page: Page, url: str):
        self.page = page
        self.url = url
        self.todo_count = 0
        self.completed_count = 0
        self.current_filter = "all"  # "all", "active", or "completed"

    def before_test(self):
        """Navigate to app and reset state."""
        self.page.goto(self.url)
        self.page.wait_for_selector(".new-todo")
        self.todo_count = 0
        self.completed_count = 0
        self.current_filter = "all"

    def _visible_count(self):
        """Number of items the current filter view is expected to show."""
        if self.current_filter == "active":
            return self.todo_count - self.completed_count
        if self.current_filter == "completed":
            return self.completed_count
        return self.todo_count

    # --- Add a todo item ---

    def step_add_todo(self):
        input_field = self.page.locator(".new-todo")
        input_field.fill(f"Task {self.todo_count + 1}")
        input_field.press("Enter")
        self.todo_count += 1

    def guard_add_todo(self):
        return self.todo_count < 10

    def weight_add_todo(self):
        return 5  # Adding items is the most common action

    # --- Toggle a todo as complete ---

    def step_toggle_todo(self):
        items = self.page.locator(".todo-list li:not(.completed) .toggle")
        items.first.click()
        self.completed_count += 1

    def guard_toggle_todo(self):
        # Active items are hidden in the "completed" view, so the click would time out there.
        active_count = self.todo_count - self.completed_count
        return active_count > 0 and self.current_filter != "completed"

    # --- Delete a todo item ---

    def step_delete_todo(self):
        item = self.page.locator(".todo-list li").first
        # The first visible item may already be completed; keep both counters in sync.
        was_completed = "completed" in (item.get_attribute("class") or "").split()
        item.hover()
        item.locator(".destroy").click()
        self.todo_count -= 1
        if was_completed:
            self.completed_count -= 1

    def guard_delete_todo(self):
        # Only items shown by the current filter can be hovered and deleted.
        return self._visible_count() > 0

    # --- Filter: show all ---

    def step_filter_all(self):
        self.page.click('a[href="#/"]')
        self.current_filter = "all"

    def guard_filter_all(self):
        return self.todo_count > 0

    # --- Filter: show active ---

    def step_filter_active(self):
        self.page.click('a[href="#/active"]')
        self.current_filter = "active"

    def guard_filter_active(self):
        return self.todo_count > 0

    # --- Filter: show completed ---

    def step_filter_completed(self):
        self.page.click('a[href="#/completed"]')
        self.current_filter = "completed"

    def guard_filter_completed(self):
        return self.completed_count > 0

    # --- Assertions (run after every step) ---

    def after(self):
        """Verify todo count display matches model state."""
        active_count = self.todo_count - self.completed_count
        if active_count > 0:
            count_text = self.page.locator(".todo-count").text_content() or ""
            assert str(active_count) in count_text, (
                f"expected {active_count} active items, page shows {count_text!r}"
            )


if __name__ == "__main__":
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()

        model = TodoAppModel(page=page, url="https://todomvc.com/examples/react/dist/")

        osmo = Osmo(model)
        osmo.test_end_condition = Length(30)
        osmo.generate()

        print(osmo.history.to_json())
        browser.close()
Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
#!/usr/bin/env python3
"""Generate a PyOsmo model for a web application using Claude + Playwright.

Usage:
    python generate_model.py <URL> [--output model.py]

Requires:
    pip install claude-agent-sdk
    npx @anthropic-ai/claude-code mcp add playwright -- npx @playwright/mcp@latest
"""

import argparse
import asyncio
import sys
from pathlib import Path

from prompt_template import PYOSMO_MODEL_REFERENCE


async def generate_model(url: str, output_path: str) -> None:
    # Import lazily so a missing SDK produces a readable message instead of a traceback.
    try:
        from claude_agent_sdk import Agent, AgentConfig, MCPServer
    except ImportError:
        print("Error: claude-agent-sdk is required. Install with: pip install claude-agent-sdk")
        sys.exit(1)

    # Playwright MCP server that gives the agent browser control.
    playwright_mcp = MCPServer(
        name="playwright",
        command="npx",
        args=["@playwright/mcp@latest"],
    )

    agent = Agent(
        model="claude-sonnet-4-5-20250929",
        config=AgentConfig(
            system_prompt=PYOSMO_MODEL_REFERENCE,
            mcp_servers=[playwright_mcp],
        ),
    )

    user_prompt = f"""\
Explore the web application at {url} and generate a PyOsmo model for it.

Steps:
1. Navigate to {url} and observe the page structure
2. Click around to discover interactive elements, forms, navigation, and states
3. Generate a PyOsmo model class with step_*/guard_* methods using Playwright selectors
4. Include state tracking and assertions where appropriate
5. Output ONLY the complete Python file content, no extra explanation

Save the model to: {output_path}
"""

    print(f"Exploring {url} and generating model...")
    result = await agent.run(user_prompt)
    # Only report success if the agent actually wrote the file.
    if Path(output_path).exists():
        print(f"Agent completed. Model saved to {output_path}")
    else:
        print(f"Agent completed, but {output_path} was not created.")
    print(f"Result: {result}")


def main() -> None:
    parser = argparse.ArgumentParser(description="Generate a PyOsmo model from a web application")
    parser.add_argument("url", help="URL of the web application to model")
    parser.add_argument("--output", "-o", default="generated_model.py", help="Output file path (default: generated_model.py)")
    args = parser.parse_args()

    asyncio.run(generate_model(args.url, args.output))


if __name__ == "__main__":
    main()
Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@
"""System prompt template that teaches the Claude agent how to build PyOsmo models."""

PYOSMO_MODEL_REFERENCE = """\
You are an expert test engineer. Your job is to explore a web application and
generate a PyOsmo model for model-based testing that exercises the application.

## PyOsmo Model Structure

A PyOsmo model is a Python class whose methods describe **steps** (actions),
**guards** (preconditions), and optional **weights** (relative selection likelihoods).

### Naming convention (preferred)

```python
class WebAppModel:
    def before_test(self):
        \"\"\"Reset state before each test.\"\"\"
        self.page.goto(self.url)

    # --- steps ---
    def step_click_login(self):
        self.page.click("#login-btn")

    # --- guards: return True when the step is allowed ---
    def guard_click_login(self):
        return self.page.is_visible("#login-btn")

    # --- weights (optional, default=1) ---
    def weight_click_login(self):
        return 5  # 5× more likely than default
```

### Lifecycle hooks
- `before_suite()` – once before all tests
- `before_test()` – before each test case (reset state here)
- `before()` / `after()` – before/after every step
- `after_test()` – after each test case
- `after_suite()` – once after all tests

### Guards
Every `step_X` can have a matching `guard_X` that returns True/False.
If the guard returns False the step is **not available** for selection.
At least one step must always be available.

### State tracking
Use `self.*` attributes to track application state (logged_in, item_count, current_page, etc.).
Update state in steps and check it in guards. This is how you model valid sequences.

## Guidelines for exploring and modeling a web page

1. **Navigate** to the target URL. Observe the page structure: links, buttons, forms, navigation.
2. **Identify actions** a user can take – clicking links, filling forms, toggling elements, navigating.
3. **Map each action to a `step_*` method** using Playwright selectors.
4. **Add guards** for actions that are only valid in certain states (e.g., can only logout when logged in).
5. **Track state** with `self.*` variables – update in steps, check in guards.
6. **Add `before_test`** to navigate to the starting URL and reset state.
7. **Keep the model focused** – 5-15 steps is a good range for a first model.
8. **Use robust selectors** – prefer `data-testid`, `role`, or visible text over brittle CSS paths.
9. **Add assertions** where possible – e.g., after clicking "Add to cart", assert the cart count increased.

## Output format

Generate a single Python file that:
- Imports `from playwright.sync_api import Page`
- Defines a model class with a constructor accepting `page: Page` and `url: str`
- Has `before_test` reset to the starting URL
- Has `step_*` / `guard_*` methods for each discovered action
- Includes an `if __name__ == '__main__'` block that runs the model with PyOsmo

Example `__main__` block:

```python
if __name__ == "__main__":
    from playwright.sync_api import sync_playwright
    from pyosmo import Osmo
    from pyosmo.end_conditions import Length

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        model = WebAppModel(page=page, url="https://example.com")
        osmo = Osmo(model)
        osmo.test_end_condition = Length(20)
        osmo.generate()
        # Print results as JSON for analysis
        print(osmo.history.to_json())
        browser.close()
```
"""

REFINEMENT_PROMPT = """\
You are an expert test engineer refining a PyOsmo model based on test execution results.

You will receive:
1. The current model source code
2. A JSON history from the last test run (produced by `history.to_json()`)

## How to read the history JSON

- `statistics.error_count` – total errors across all tests
- `statistics.step_frequency` – how often each step ran
- `step_pairs` – which step transitions occurred (and how often)
- `test_cases[].errors[]` – specific errors with step name and message

## Refinement strategy

1. **Fix errors first**: look at `test_cases[].errors[]`. Common causes:
   - Selector changed or element not found → update the selector
   - Guard too permissive → tighten the guard condition
   - Missing wait → add `page.wait_for_selector()` before interacting
   - State tracking wrong → fix state updates

2. **Improve coverage**: look at `statistics.step_frequency`.
   - Steps with 0 executions → guard may be too restrictive
   - Steps dominating → lower their weight or add more variety

3. **Check transitions**: look at `step_pairs`.
   - Missing expected transitions → guards may block valid paths
   - Unexpected transitions → may need new guards

4. **Add new steps**: if the test explored pages with actions not yet modeled,
   add new `step_*`/`guard_*` methods.

Output the **complete updated model file**.
"""
