398 changes: 398 additions & 0 deletions packages/testing-framework/POC-GHERKIN.md

Large diffs are not rendered by default.

146 changes: 105 additions & 41 deletions packages/testing-framework/example/README.md
@@ -1,55 +1,119 @@
# Midscene v2 Testing Framework — Example
# Midscene v2 Testing Framework — Examples

A self-contained demo of [`@midscene/testing-framework`](..)
(the AI-native v2 UI testing framework, Phase 0). Copy this folder out, install,
set your model env vars, and run.
Two related examples live here:

## What it shows
1. **Three authoring styles, one test suite** (`style-1-gherkin/`,
`style-2-js/`, `style-3-overlay/`) — the flow-IR POC. **Start here.**
2. A copy-out **YAML runner** demo (`e2e/` + `midscene.config.ts`) — the
Phase 0 node engine. See [below](#the-phase-0-yaml-runner-example).

- A **config-style** `uiAgent` (web) in `midscene.config.ts`: environment lives
  in config, never in the case YAML.
- The full node model in `e2e/*.yaml`:
  - `ui`: natural-language UI actions (run by Midscene's UI Agent)
  - `verify`: gating judgment with a forced pass/fail verdict
  - `soft`: non-gating soft assertion (failure → warning only)
  - `agent`: advisory free exploration (never gates)
  - custom **runtime** nodes (`prepareCartFixture`, `notify`) via `defineRuntime`
- A `$name` **skill** reference (`$catalog`) backed by `skills/catalog/SKILL.md`.
- The **output contract**: steps record natural-language conclusions that later
  `verify` / `agent` nodes reference by name.
## Which style do I need?

## Run it
**You probably only need style 1.** Plain `.feature` files run end-to-end
with nothing else — no JS, no step definitions, no overlay. If your tests
are plain English with no computed data, stop there.

```bash
# 1. install
pnpm install # or npm install / yarn
- **Style 1 — pure Gherkin**: the default, complete on its own.
- **Style 2 — pure JS/TS**: for engineering-owned suites that want loops,
types, and computed args as first-class code.
- **Style 3 — overlay**: **optional — an escape hatch, never a
requirement.** It only earns its keep when Gherkin is the contract AND a
handful of scenarios need programmatic exceptions:
  - bind-time computed values (dates, env-derived data Gherkin cannot
    express),
  - environment-specific tweaks without forking the feature file (skip a
    scenario in CI, downgrade a verify to soft, insert a step),
  - a safe seam between non-engineer-owned prose and engineer-owned JS:
    the overlay is sparse and drift-validated at bind time, so it cannot
    silently rot.

## Three interchangeable styles of the SAME suite

The style folders author the **same multi-file test suite** for the static
shop in `demo-app/`. They are *alternative surfaces*, not different suites:
all three compile to one shared intermediate representation (flow-IR) and
run through the same executor, so you pick a style per team — or mix them —
without changing semantics. No step-definition code exists anywhere; every
step is natural language executed by AI agents.

# 2. configure the model (UI Agent + Pi share one endpoint)
cp .env.example .env # then edit, or export the vars in your shell
| Folder | Style | Read this first | Choose it when |
| --- | --- | --- | --- |
| `style-1-gherkin/` | Pure Gherkin `.feature` files | `flows/login.feature` | Non-engineers own the suite; specs are the shared language. Fully sufficient on its own. |
| `style-2-js/` | Pure JS/TS fluent API | `flows/index.ts` | The suite is generated or heavily dynamic (loops, computed prompts). |
| `style-3-overlay/` | OPTIONAL: Gherkin source of truth + sparse JS overlay | `checkout.overlay.ts` | Escape hatch only: Gherkin stays canonical, but a few scenarios need computed values or env tweaks. Binds **style 1's** feature files — nothing is duplicated. |

# 3. run all cases
pnpm test
Inside each style the layout shows real-world modular reuse:

```text
style-1-gherkin/
  flows/                  # SHARED flow definitions (@flow scenarios)
    login.feature         # "Login": params/returns declared as tags
    add-to-cart.feature   # "Add product to cart"
  features/               # independent test modules; they CALL the shared
    cart.feature          # flows but do not define them
    checkout.feature
    smoke.feature

# run a single case
pnpm test:one
style-2-js/
  flows/index.ts          # the same two flows, declared with defineFlow()
  features/               # one module per .feature twin
    cart.flows.ts
    checkout.flows.ts
    smoke.flows.ts

style-3-overlay/
  checkout.overlay.ts     # sparse patch over style-1's checkout.feature
```

By default the demo runs against the bundled static page in `site/index.html`
(offline). Set `DEMO_URL` to point at your own app.
Cross-file resolution is the suite's job, not the file's: `compileSuite()`
compiles every `.feature` under a directory and merges all `@flow`
definitions into **one registry** (duplicate flow names across files fail
loudly), then each module's scenarios run against it. The JS side gets the
same effect by importing the shared registry from `flows/index.ts`.
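
The merge step described above can be pictured with a minimal sketch. This is not the real `compileSuite()` implementation: `FlowDef`, `mergeFlowRegistries`, and the field names are hypothetical, and the params/returns shown are inferred from how the feature files call the two flows.

```typescript
// Hypothetical shape of one @flow definition; the real flow-IR types are
// not shown in this README.
interface FlowDef {
  name: string;
  params: string[];
  returns: string[];
  sourceFile: string;
}

// Merge the flows found in each file into one registry.
// A flow name that appears twice is an error, never a silent override.
function mergeFlowRegistries(perFile: FlowDef[][]): Map<string, FlowDef> {
  const merged = new Map<string, FlowDef>();
  for (const flows of perFile) {
    for (const flow of flows) {
      const existing = merged.get(flow.name);
      if (existing !== undefined) {
        throw new Error(
          `Duplicate flow "${flow.name}" in ${existing.sourceFile} and ${flow.sourceFile}`,
        );
      }
      merged.set(flow.name, flow);
    }
  }
  return merged;
}

// Assumed contents of the two shared flow files.
const loginFlows: FlowDef[] = [
  { name: 'Login', params: ['role'], returns: ['greeting'], sourceFile: 'flows/login.feature' },
];
const cartFlows: FlowDef[] = [
  {
    name: 'Add product to cart',
    params: ['product'],
    returns: ['price'],
    sourceFile: 'flows/add-to-cart.feature',
  },
];

const flowRegistry = mergeFlowRegistries([loginFlows, cartFlows]);
```

Failing at merge time means a name clash surfaces before any scenario runs, which is presumably what "fail loudly" is meant to guarantee.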

Results are written to `midscene_run/output/summary.json`, and Midscene HTML
reports for the UI steps land in `midscene_run/report/`.
Key concepts, explained in context in the "read this first" files:

## Layout
- **Flow** — a named, parameterized, reusable prompt sequence. Fresh
variable scope inside (only declared params visible), only declared
`returns` flow back to the caller.
- **Capture / `remember`** — the UI agent extracts a value from the screen
into a machine-owned variable table; later prompts use `{name}`
placeholders that are substituted mechanically *before* any model sees
the text.
- **Keyword mapping** — Given/When → UI actions; Then → fail-closed
`verify` (a general agent must report a pass/fail verdict); `@soft` /
`Soft()` → warn-only checks.
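
A rough sketch of two of these concepts, placeholder substitution and keyword mapping, may help. The names `substitute` and `mapKeywords` are hypothetical, and treating an unbound placeholder or a leading `And` as an error is an assumption; only the behaviour described in the list above comes from this README.

```typescript
type Keyword = 'Given' | 'When' | 'Then' | 'And' | 'But';
type NodeKind = 'ui' | 'verify' | 'soft';

// Replace each {name} with its captured value before any model sees the text.
// Assumption: an unbound placeholder is an error rather than passed through.
function substitute(text: string, table: Map<string, string>): string {
  return text.replace(/\{(\w+)\}/g, (_whole: string, key: string): string => {
    const value = table.get(key);
    if (value === undefined) {
      throw new Error(`Unbound placeholder {${key}}`);
    }
    // A function replacer's result is inserted literally, so "$" is safe here.
    return value;
  });
}

// Map step keywords to node kinds. And/But inherit the previous primary
// keyword; `soft` models a scenario tagged @soft.
function mapKeywords(keywords: Keyword[], soft: boolean = false): NodeKind[] {
  const kinds: NodeKind[] = [];
  let primary: 'Given' | 'When' | 'Then' | undefined = undefined;
  for (const keyword of keywords) {
    if (keyword !== 'And' && keyword !== 'But') {
      primary = keyword;
    }
    if (primary === undefined) {
      throw new Error(`${keyword} has no preceding primary keyword`);
    }
    kinds.push(primary === 'Then' ? (soft ? 'soft' : 'verify') : 'ui');
  }
  return kinds;
}

const variableTable = new Map<string, string>([['price', '$24.50']]);
const renderedPrompt = substitute(
  'the cart lists "Camp Mug" with quantity 1 at {price}',
  variableTable,
);
// renderedPrompt: the cart lists "Camp Mug" with quantity 1 at $24.50

const stepKinds = mapKeywords(['When', 'And', 'Then', 'And']);
// stepKinds: ['ui', 'ui', 'verify', 'verify']
```

Because substitution is a plain string operation over a machine-owned table, the model never has to recall a captured value on its own.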

```text
.
  midscene.config.ts        # uiAgent + discovery + runtime nodes
  e2e/
    product-detail.yaml     # ui + verify + soft + agent
    add-to-cart.yaml        # custom node + $catalog skill + verify + agent + notify
  skills/
    catalog/SKILL.md        # a $name skill (Pi discovers/loads it)
  site/
    index.html              # tiny static demo app
### Run it

```bash
pnpm --filter @midscene/testing-framework demo # offline, no keys
pnpm --filter @midscene/testing-framework demo -- --live # real browser+model
```

The demo runs the suite module-by-module in all three styles, narrates each
prompt/variable/verdict, and proves the styles are equivalent by comparing
execution traces. See `../POC-GHERKIN.md` for the full design.

## The Phase 0 YAML runner example

A self-contained demo of the YAML node engine: copy this folder out,
install, set model env vars, and run.

- A **config-style** `uiAgent` (web) in `midscene.config.ts`.
- The full node model in `e2e/*.yaml`: `ui`, `verify`, `soft`, `agent`,
plus custom **runtime** nodes (`prepareCartFixture`, `notify`) via
`defineRuntime`.
- A `$name` **skill** reference (`$catalog`) backed by
`skills/catalog/SKILL.md`.

```bash
pnpm install
cp .env.example .env # or export MIDSCENE_MODEL_* in your shell
pnpm test # midscene-tf run
pnpm test:one # single case
```

By default it runs against the bundled static page in `site/index.html`;
set `DEMO_URL` to point at your own app. Results land in
`midscene_run/output/summary.json`, HTML reports in `midscene_run/report/`.
162 changes: 162 additions & 0 deletions packages/testing-framework/example/demo-app/index.html
@@ -0,0 +1,162 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Midscene POC Shop</title>
<style>
* { box-sizing: border-box; font-family: system-ui, sans-serif; }
body { margin: 0; color: #1f2933; }
header { display: flex; justify-content: space-between; align-items: center; padding: 16px 24px; background: #0b5fff; color: #fff; }
header button { padding: 8px 14px; border: none; border-radius: 6px; cursor: pointer; }
header .right { display: flex; align-items: center; gap: 12px; }
main { padding: 24px; max-width: 640px; margin: 0 auto; }
section { display: none; }
section.active { display: block; }
.card { border: 1px solid #e4e7eb; border-radius: 10px; padding: 16px; margin-bottom: 16px; }
.price { color: #0b5fff; font-weight: 600; }
button.primary { padding: 12px 20px; background: #0b5fff; color: #fff; border: none; border-radius: 6px; font-size: 16px; cursor: pointer; }
select, input { padding: 10px; border: 1px solid #cbd2d9; border-radius: 6px; margin-right: 8px; }
.row { display: flex; align-items: center; gap: 8px; margin: 12px 0; }
.qty { display: inline-flex; align-items: center; gap: 8px; }
.qty button { width: 32px; height: 32px; border: 1px solid #cbd2d9; background: #fff; border-radius: 6px; cursor: pointer; font-size: 16px; }
#cartTotal { font-size: 20px; font-weight: 700; }
.muted { color: #52606d; font-size: 14px; }
</style>
</head>
<body>
<header>
<strong>Midscene POC Shop</strong>
<div class="right">
<span id="greeting" hidden></span>
<span id="cartBadge">Cart: 0 items</span>
<button id="navLogin">Login</button>
</div>
</header>
<main>
<section id="home" class="active">
<h2>Welcome to the POC shop</h2>
<div class="card">
<h3>Trail Backpack</h3>
<div class="price">$129.00</div>
<p class="muted">Lightweight 28L pack for day hikes.</p>
<button class="primary" data-add="Trail Backpack">Add to cart</button>
</div>
<div class="card">
<h3>Camp Mug</h3>
<div class="price">$24.50</div>
<p class="muted">Enamel mug that survives the campfire.</p>
<button class="primary" data-add="Camp Mug">Add to cart</button>
</div>
<button id="openCart">Open cart</button>
</section>

<section id="login">
<h2>Sign in</h2>
<div class="row">
<select id="role">
<option value="admin">admin</option>
<option value="guest">guest</option>
</select>
<button class="primary" id="signIn">Sign in with saved test credentials</button>
</div>
</section>

<section id="dashboard">
<h2 id="dashboardTitle"></h2>
<p class="muted">You are signed in. Use the header to keep shopping.</p>
<button id="backHome">Back to shop</button>
</section>

<section id="cart">
<h2>Your cart</h2>
<div id="cartLines"></div>
<div class="row">
<input id="coupon" placeholder="Coupon code" />
<button id="applyCoupon">Apply coupon</button>
</div>
<p>Total: <span id="cartTotal">$0.00</span></p>
<p class="muted" id="couponNote" hidden></p>
<button id="backHome2">Back to shop</button>
</section>
</main>

<script>
const PRICES = { 'Trail Backpack': 129.0, 'Camp Mug': 24.5 };
const state = { role: null, items: {}, discount: 0 };
const show = (id) => {
  document
    .querySelectorAll('section')
    .forEach((s) => s.classList.toggle('active', s.id === id));
};
const money = (n) => `$${n.toFixed(2)}`;
const itemCount = () =>
  Object.values(state.items).reduce((sum, qty) => sum + qty, 0);
const render = () => {
  const lines = document.getElementById('cartLines');
  lines.innerHTML = '';
  let total = 0;
  for (const [name, qty] of Object.entries(state.items)) {
    if (qty <= 0) continue;
    total += PRICES[name] * qty;
    const line = document.createElement('div');
    line.className = 'card row';
    line.innerHTML =
      `<span>${name}</span>` +
      `<span class="qty"><button data-dec="${name}">−</button>` +
      `<span>Qty: <strong>${qty}</strong></span>` +
      `<button data-inc="${name}">+</button></span>` +
      `<span class="price">${money(PRICES[name] * qty)}</span>`;
    lines.appendChild(line);
  }
  total *= 1 - state.discount;
  document.getElementById('cartTotal').textContent = money(total);
  const n = itemCount();
  document.getElementById('cartBadge').textContent =
    `Cart: ${n} item${n === 1 ? '' : 's'}`;
};
document.getElementById('navLogin').addEventListener('click', () => show('login'));
document.getElementById('signIn').addEventListener('click', () => {
  state.role = document.getElementById('role').value;
  const pretty = state.role[0].toUpperCase() + state.role.slice(1);
  const greeting = document.getElementById('greeting');
  greeting.textContent = `Hello, ${pretty}!`;
  greeting.hidden = false;
  document.getElementById('dashboardTitle').textContent = `Dashboard (${state.role})`;
  show('dashboard');
});
document.body.addEventListener('click', (event) => {
  const t = event.target;
  if (!(t instanceof HTMLElement)) return;
  if (t.dataset.add) {
    state.items[t.dataset.add] = (state.items[t.dataset.add] ?? 0) + 1;
    render();
  }
  if (t.dataset.inc) {
    state.items[t.dataset.inc] += 1;
    render();
  }
  if (t.dataset.dec) {
    state.items[t.dataset.dec] = Math.max(0, state.items[t.dataset.dec] - 1);
    render();
  }
});
document.getElementById('openCart').addEventListener('click', () => {
  render();
  show('cart');
});
document.getElementById('applyCoupon').addEventListener('click', () => {
  const code = document.getElementById('coupon').value.trim();
  if (!code) return;
  state.discount = 0.1;
  const note = document.getElementById('couponNote');
  note.textContent = `Coupon "${code}" applied: 10% off.`;
  note.hidden = false;
  render();
});
for (const id of ['backHome', 'backHome2']) {
  document.getElementById(id).addEventListener('click', () => show('home'));
}
</script>
</body>
</html>
@@ -0,0 +1,30 @@
# An independent test module. It calls the shared "Login" and "Add product
# to cart" flows (defined under ../flows) and only authors what is specific
# to cart management. Cross-file resolution is the suite's job: compile all
# .feature files with `compileSuite`, run each module's scenarios against
# the merged flow registry.
#
# Keyword → runtime mapping (no step definitions anywhere):
#   Given/When → ui action performed by the Midscene UI Agent
#   Then       → verify: a general agent must report a pass/fail verdict;
#                fail (or no verdict) FAILS the scenario (fail-closed)
#   And/But    → inherit the previous primary keyword
Feature: Cart management

  Background:
    Given the demo shop is open on the home page

  Scenario: Cart shows the added product with quantity and price
    When I run the "Login" flow with role "guest"
    # The flow's declared return {price} lands in this scenario's variable
    # table; the Then steps below use it after mechanical substitution.
    And I run the "Add product to cart" flow with product "Camp Mug"
    Then the cart lists "Camp Mug" with quantity 1 at {price}
    And the cart badge in the header shows 1 item

  Scenario: Increasing the quantity updates the total
    When I run the "Login" flow with role "guest"
    And I run the "Add product to cart" flow with product "Camp Mug"
    When I increase the "Camp Mug" quantity in the cart to 2
    Then the cart total equals twice {price}
    And the cart badge in the header shows 2 items
@@ -0,0 +1,22 @@
# The checkout module reuses the same shared flows as cart.feature — flows
# are written once under ../flows and called everywhere. This file is also
# the binding target of style 3: ../../style-3-overlay/checkout.overlay.ts
# patches the admin journey with a computed coupon code WITHOUT this file
# changing — it stays the single human-readable source of truth.
Feature: Checkout

  Background:
    Given the demo shop is open on the home page

  Scenario: Checkout as admin
    When I run the "Login" flow with role "admin"
    And I run the "Add product to cart" flow with product "Trail Backpack"
    Then the cart total equals {price}
    But the cart does not show any error banner

  # The @soft tag downgrades this scenario's Then steps from verify to soft:
  # a failed soft check records a warning but never fails the scenario. Use
  # it for advisory checks that should not gate a run.
  @soft
  Scenario: Promo banner is advisory
    Then a promo banner is visible at the top of the page
@@ -0,0 +1,18 @@
# Smoke module: a login matrix over the shared "Login" flow. Scenario
# Outline examples are expanded at compile time by the Gherkin pickles
# compiler — "<role>" is replaced per example row. Note the two kinds of
# placeholders: "<role>" is compile-time (Gherkin examples), "{greeting}"
# is runtime (filled by the Login flow's declared return when it executes).
Feature: Login smoke

  Background:
    Given the demo shop is open on the home page

  Scenario Outline: Login greets every role
    When I run the "Login" flow with role "<role>"
    Then the header greets the user with {greeting}

    Examples:
      | role  |
      | admin |
      | guest |