AppiumTestDistribution · saikrishna321 · Apr 14, 2026 · Apr 13, 2026
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -19,11 +19,9 @@ jobs:
       - uses: actions/setup-node@v4
         with:
           node-version: '20'
-          cache: npm
-          cache-dependency-path: package-lock.json
 
       - name: Install dependencies
-        run: npm ci
+        run: npm install --no-package-lock
 
       - name: Format check (Prettier)
         run: npm run format:check

diff --git a/docs/qa-observed-assertions.md b/docs/qa-observed-assertions.md
@@ -0,0 +1,99 @@
+# QA: Observed Behavior Assertions
+
+## Problem
+
+AppClaw completes a task and declares success, but gives no structured record of _what it observed_ — prices, confirmation messages, order numbers, screen states. On the next run there is no way to know if the outcome was the same.
+
+## Concept
+
+After a successful run, an LLM call reads the agent's step history and extracts observable facts as assertions:
+
+```
+Run: "complete checkout for 1 large oat milk latte"
+Observed assertions:
+  ✓ Order confirmation screen appeared
+  ✓ Item: "Oat Milk Latte, Large" shown
+  ✓ Price shown: $6.95
+  ✓ Payment method: Apple Pay
+  ✓ Estimated ready time shown
+  ✓ Completed in 4 steps
+```
+
+On subsequent runs these become **soft assertions** — the agent flags any that no longer hold.
+
+## Assertion Types
+
+| Type            | Example                              | How detected                       |
+| --------------- | ------------------------------------ | ---------------------------------- |
+| Screen appeared | "Order confirmation screen appeared" | Screen fingerprint match           |
+| Text present    | "Price shown: $6.95"                 | LLM extraction from DOM/screenshot |
+| Step count      | "Completed in 4 steps"               | `stepsInRun` from trajectory       |
+| Element state   | "Apple Pay button was selected"      | LLM extraction                     |
+
+## Proposed Design
+
+### Extraction (async, post-run)
+
+```typescript
+// After successful finalize()
+const assertions = await extractAssertions(stepHistory, goal, llmClient);
+saveAssertions(appId, goalHash, assertions);
+```
+
+Prompt to LLM:
+
+```
+Given this agent run transcript, extract 3-6 observable facts about the outcome
+as short assertion strings. Focus on: screens that appeared, values shown,
+actions completed. Be specific. Format: one assertion per line.
+```
+
+### Storage
+
+`~/.appclaw/assertions/<appId>/<goalHash>.json`
+
+```json
+{
+  "goal": "complete checkout",
+  "appId": "com.starbucks",
+  "extractedAt": 1712345678,
+  "assertions": ["Order confirmation screen appeared", "Price shown: $6.95", "Completed in 4 steps"]
+}
+```
+
+### Soft assertion check on next run
+
+At run end, retrieve stored assertions and ask the LLM:
+
+```
+Previous run observed: ["Order confirmation screen appeared", "Price shown: $6.95"]
+Based on the current run transcript, which of these still hold? Which do not?
+```
+
+Emit the result in the terminal and in the HTML report.
+
+### Hard assertions in YAML flows
+
+QA engineers can also write explicit assertions in flow files:
+
+```yaml
+steps:
+  - tap checkout
+  - ...
+assertions:
+  - order confirmation screen is visible
+  - price displayed is under $10
+  - no error messages present
+```
+
+These run after all steps complete and fail the flow if any assertion fails.
+
+## Files to Touch
+
+- New: `src/assertions/extractor.ts` — LLM-based assertion extraction
+- New: `src/assertions/checker.ts` — compare assertions against current run
+- New: `src/assertions/store.ts` — persist/load assertion sets
+- `src/flow/parse-yaml-flow.ts` — parse `assertions:` block from YAML
+- `src/flow/run-yaml-flow.ts` — run assertion checker after steps complete
+- `src/agent/loop.ts` — trigger async extraction on success
+- `src/report/writer.ts` — include assertion results in HTML report
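
Reviewer sketch for the extraction step in docs/qa-observed-assertions.md: the prompt asks for "one assertion per line", so the reply needs light cleanup before it is saved in the JSON shape shown under Storage. The following is a minimal sketch under stated assumptions, not the proposed `extractor.ts`. The names `AssertionRecord`, `parseAssertionLines`, and `buildRecord` are hypothetical, and the cap of six lines simply mirrors the "3-6" in the prompt.

```typescript
// Hypothetical record shape, mirroring the JSON example in the Storage section.
interface AssertionRecord {
  goal: string;
  appId: string;
  extractedAt: number; // Unix time in seconds, as in the example
  assertions: string[];
}

// Turn a "one assertion per line" LLM reply into clean strings.
// Strips a leading bullet, check mark, or list number; drops blank and duplicate lines.
function parseAssertionLines(raw: string, max: number = 6): string[] {
  const result: string[] = [];
  for (const line of raw.split(/\r?\n/)) {
    const cleaned = line.replace(/^\s*(?:[-*•✓✗]|\d+[.)])\s+/, "").trim();
    if (cleaned === "" || result.indexOf(cleaned) !== -1) continue;
    result.push(cleaned);
    if (result.length >= max) break;
  }
  return result;
}

// Assemble the record that would be written to <goalHash>.json (file I/O not shown).
function buildRecord(goal: string, appId: string, raw: string, now: Date = new Date()): AssertionRecord {
  return {
    goal: goal,
    appId: appId,
    extractedAt: Math.floor(now.getTime() / 1000),
    assertions: parseAssertionLines(raw),
  };
}
```

The prefix pattern requires whitespace after the marker, so a value such as `6.95 shown` is not mistaken for a numbered list item.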
diff --git a/docs/qa-personas.md b/docs/qa-personas.md
@@ -0,0 +1,68 @@
+# QA: Test Persona Profiles
+
+## Problem
+
+Every QA test run requires a specific user context — free vs premium, new vs returning, admin vs regular. Currently this must be spelled out in the goal on every run, making flows verbose and hard to reuse.
+
+## Proposed Design
+
+### Persona files at `.appclaw/env/personas/<name>.yaml`
+
+```yaml
+# .appclaw/env/personas/premium-user.yaml
+name: premium-user
+credentials:
+  email: qa+premium@company.com
+  password: $SECRET_PREMIUM_PASS # interpolated from .appclaw/env/secrets
+state:
+  subscription: active
+  cart: empty
+  onboarding: completed
+  notifications: denied
+```
+
+### CLI usage
+
+```bash
+appclaw --flow checkout.yaml --persona premium-user
+appclaw --flow onboarding.yaml --persona new-user
+```
+
+### YAML flow usage
+
+```yaml
+persona: premium-user
+steps:
+  - tap the checkout button
+  - ...
+```
+
+## How It Works
+
+1. Persona file is loaded at run start
+2. Persona fields are injected into the LLM system prompt as context:
+   ```
+   CURRENT USER PERSONA: premium-user
+   - Subscription: active
+   - Cart: empty
+   - Onboarding: completed
+   ```
+3. Credentials are available for interpolation in steps:
+   ```yaml
+   - type $persona.credentials.email into the email field
+   ```
+4. Secret references (values starting with `$`, such as `$SECRET_PREMIUM_PASS`) are resolved from `.appclaw/env/secrets` before injection
+
+## Personas to Ship With (Examples)
+
+- `new-user` — no account, fresh install state
+- `free-user` — logged in, free tier limits apply
+- `premium-user` — logged in, all features unlocked
+- `admin` — elevated permissions
+
+## Files to Touch
+
+- `src/flow/run-yaml-flow.ts` — load and inject persona at run start
+- `src/config.ts` — add `--persona` CLI flag
+- `src/llm/prompts.ts` — inject persona context into system prompt
+- New: `src/persona/loader.ts` — load, validate, interpolate persona files
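
Reviewer sketch for steps 3 and 4 of "How It Works" in docs/qa-personas.md: one way the secret resolution and `$persona.*` interpolation could fit together. This is an illustration under assumptions rather than the planned `loader.ts`. It assumes the persona YAML and the secrets store have already been parsed into plain objects, and the names `PersonaTree`, `resolveSecrets`, and `interpolateStep` are hypothetical.

```typescript
// Hypothetical in-memory shape for a parsed persona file (YAML parsing not shown).
type PersonaTree = { [key: string]: string | PersonaTree };

// Replace every leaf value that starts with "$" with the matching secret.
// Assumption: the secrets store was already read into a plain name/value map.
function resolveSecrets(node: PersonaTree, secrets: { [name: string]: string }): PersonaTree {
  const out: PersonaTree = {};
  for (const key of Object.keys(node)) {
    const value = node[key];
    if (value === undefined) continue;
    if (typeof value !== "string") {
      out[key] = resolveSecrets(value, secrets);
    } else if (value.charAt(0) === "$") {
      const secret = secrets[value.slice(1)];
      if (secret === undefined) throw new Error("Unknown secret: " + value);
      out[key] = secret;
    } else {
      out[key] = value;
    }
  }
  return out;
}

// Expand "$persona.a.b" references inside a natural-language step.
function interpolateStep(step: string, persona: PersonaTree): string {
  const pattern = /\$persona\.([A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*)/g;
  return step.replace(pattern, (match: string, path: string) => {
    let current: string | PersonaTree | undefined = persona;
    for (const key of path.split(".")) {
      if (current === undefined || typeof current === "string") {
        current = undefined;
        break;
      }
      current = current[key];
    }
    if (typeof current !== "string") throw new Error("Cannot resolve " + match);
    return current;
  });
}
```

Failing loudly on an unknown secret or path keeps a typo in a persona file from being typed into the app as a literal `$...` string.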
diff --git a/docs/qa-regression-baseline.md b/docs/qa-regression-baseline.md
@@ -0,0 +1,65 @@
+# QA: Trajectory as Regression Baseline
+
+## Problem
+
+AppClaw's trajectory store already records the exact path a successful run took — the sequence of actions, selectors, and step counts. This is a regression baseline sitting unused. There is no way today to compare a current run against a previous one and flag changes.
+
+## Insight
+
+A regression is detectable when:
+
+- The same goal on the same app **took more steps** than before
+- A screen that used to appear **no longer appears**
+- An action that always worked **now fails**
+- The completion path **diverged** from the recorded trajectory
+
+## Proposed Design
+
+### Regression report per run
+
+After each run, compare against the stored trajectory for the same (goal, app, platform) and emit a diff:
+
+```
+Regression Check: "complete checkout"  app: com.starbucks
+─────────────────────────────────────────────────────────
+✓  Step 1: find_and_click "Add to Cart"         (same)
+✓  Step 2: find_and_click "Proceed to Checkout" (same)
+⚠  Step 3: NEW — dismiss_popup "Enable notifications"  (not in baseline)
+✓  Step 4: find_and_click "Apple Pay"           (same)
+✗  Step 5: MISSING — order confirmation screen  (appeared in baseline, not now)
+
+Steps: 4 (baseline: 4) ✓  |  New steps: 1  |  Missing steps: 1
+```
+
+### CLI flags
+
+```bash
+appclaw --flow checkout.yaml --check-regression
+appclaw --flow checkout.yaml --update-baseline   # overwrite stored baseline
+```
+
+### Baseline storage
+
+Extend `TrajectoryEntry` in `src/memory/types.ts` with an ordered step sequence (not just the winning final action) so full path comparison is possible.
+
+Alternatively, store baselines separately at `~/.appclaw/baselines/<appId>/<goalHash>.json`, which leaves the `TrajectoryEntry` schema untouched.
+
+## Step Count Heuristic (Quick Win)
+
+Without full path comparison, step count delta alone is a useful signal:
+
+```
+⚠ Regression risk: "complete checkout" took 7 steps (baseline: 4). App may have added screens.
+```
+
+This requires no schema changes — `stepsInRun` is already stored in `TrajectoryEntry`.
+
+This warning can therefore be surfaced in the run summary right away.
+
+## Files to Touch
+
+- `src/memory/types.ts` — extend `TrajectoryEntry` with step sequence (optional, for full diff)
+- `src/memory/retriever.ts` — add baseline comparison function
+- `src/agent/loop.ts` — emit regression warning at run end
+- `src/report/writer.ts` — include regression diff in HTML report
+- `src/config.ts` — add `--check-regression` and `--update-baseline` flags
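
Reviewer sketch for docs/qa-regression-baseline.md: the per-run report and the step count heuristic could be prototyped as below. This is one possible approach, not the design the doc commits to. It assumes each step has been serialized to a comparable string and uses a longest-common-subsequence diff, a technique chosen for this sketch rather than named in the doc. `DiffEntry`, `diffTrajectories`, and `stepCountWarning` are hypothetical names.

```typescript
type DiffEntry = { kind: "same" | "new" | "missing"; step: string };

// Classify steps as same / new / missing using a longest-common-subsequence table.
function diffTrajectories(baseline: string[], current: string[]): DiffEntry[] {
  const n = baseline.length;
  const m = current.length;
  // lcs[i][j] = LCS length of baseline[i..] and current[j..]
  const lcs: number[][] = [];
  for (let i = 0; i <= n; i++) {
    const row: number[] = [];
    for (let j = 0; j <= m; j++) row.push(0);
    lcs.push(row);
  }
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = baseline[i] === current[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  const out: DiffEntry[] = [];
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (baseline[i] === current[j]) {
      out.push({ kind: "same", step: current[j] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.push({ kind: "missing", step: baseline[i] });
      i++;
    } else {
      out.push({ kind: "new", step: current[j] });
      j++;
    }
  }
  while (i < n) out.push({ kind: "missing", step: baseline[i++] });
  while (j < m) out.push({ kind: "new", step: current[j++] });
  return out;
}

// Quick-win heuristic: warn only when the run took more steps than the baseline.
function stepCountWarning(goal: string, stepsInRun: number, baselineSteps: number): string | null {
  if (stepsInRun <= baselineSteps) return null;
  return `⚠ Regression risk: "${goal}" took ${stepsInRun} steps (baseline: ${baselineSteps}). App may have added screens.`;
}
```

Fed the four baseline steps and the four current steps from the example report, the diff yields same, same, new, same, missing, which matches the report's five rows.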
diff --git a/docs/qa-step-libraries.md b/docs/qa-step-libraries.md
@@ -0,0 +1,79 @@
+# QA: Named Setup Steps / Step Libraries
+
+## Problem
+
+Common setup sequences (login, clear cart, reset permissions, onboard a new user) are written out in full in every flow file that needs them. When the login flow changes, every flow that embeds it must be updated. There is no reuse.
+
+## Concept
+
+Named step sequences stored as shared YAML fragments, referenced from any flow:
+
+```yaml
+# .appclaw/steps/login-as-admin.yaml
+name: login-as-admin
+description: Log in using admin credentials, handle 2FA if prompted
+steps:
+  - tap the Sign In button
+  - type $persona.credentials.email into the email field
+  - type $persona.credentials.password into the password field
+  - tap Login
+  - if OTP screen appears, wait for human input
+```
+
+Referenced in any flow:
+
+```yaml
+setup:
+  - use: login-as-admin
+  - use: clear-cart
+
+steps:
+  - tap checkout
+  - ...
+```
+
+## Step Library Locations
+
+Resolution order (first match wins):
+
+1. `.appclaw/steps/` — project-level, checked into repo
+2. `~/.appclaw/steps/` — user-level, shared across projects
+3. Built-in steps shipped with AppClaw (login helpers, permission handlers)
+
+## Built-in Steps to Ship
+
+| Name                    | Description                                       |
+| ----------------------- | ------------------------------------------------- |
+| `dismiss-notifications` | Deny notification permission prompt if it appears |
+| `dismiss-tracking`      | Deny app tracking permission if it appears        |
+| `clear-cart`            | Navigate to cart and remove all items             |
+| `logout`                | Navigate to account settings and log out          |
+| `wait-for-network`      | Wait until a loading spinner disappears           |
+
+## Composability
+
+Steps can reference other steps:
+
+```yaml
+# .appclaw/steps/fresh-checkout-session.yaml
+steps:
+  - use: logout
+  - use: login-as-free-user
+  - use: clear-cart
+```
+
+## Discoverable via CLI
+
+```bash
+appclaw --list-steps                    # list all available named steps
+appclaw --list-steps --filter login     # filter by name
+appclaw --run-step login-as-admin       # run a single named step in isolation
+```
+
+## Files to Touch
+
+- `src/flow/parse-yaml-flow.ts` — resolve `use:` references, load step files
+- `src/flow/run-yaml-flow.ts` — execute referenced steps inline
+- New: `src/flow/step-library.ts` — resolve step files from project + user + built-in paths
+- `src/config.ts` — add `--list-steps`, `--run-step` flags
+- New: `src/flow/builtin-steps/` — built-in step YAML files
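
Reviewer sketch for docs/qa-step-libraries.md: the "first match wins" resolution order and nested `use:` references could be handled roughly as follows. This is a sketch under assumptions, not the planned `step-library.ts`. Libraries are modelled as in-memory maps instead of directories on disk, and the cycle check is an addition the doc does not mention. `StepItem`, `Library`, `findStep`, and `expandSteps` are hypothetical names.

```typescript
// A flow item is either a plain natural-language step or a reference to a named step.
type StepItem = string | { use: string };
// Hypothetical in-memory stand-in for one library directory: step name -> its items.
type Library = { [name: string]: StepItem[] };

// Look up a named step; libraries are ordered project, user, built-in (first match wins).
function findStep(name: string, libraries: Library[]): StepItem[] {
  for (const lib of libraries) {
    if (Object.prototype.hasOwnProperty.call(lib, name)) {
      const steps = lib[name];
      if (steps !== undefined) return steps;
    }
  }
  throw new Error("Unknown named step: " + name);
}

// Flatten `use:` references into plain steps, rejecting circular references.
function expandSteps(items: StepItem[], libraries: Library[], stack: string[] = []): string[] {
  const out: string[] = [];
  for (const item of items) {
    if (typeof item === "string") {
      out.push(item);
      continue;
    }
    if (stack.indexOf(item.use) !== -1) {
      throw new Error("Circular step reference: " + stack.concat([item.use]).join(" -> "));
    }
    const nested = expandSteps(findStep(item.use, libraries), libraries, stack.concat([item.use]));
    for (const step of nested) out.push(step);
  }
  return out;
}
```

Because the stack tracks only the current reference chain, the same named step can still be used several times in one flow without being flagged as a cycle.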