Commit 679a181

Adopt Cypress-inspired QA reliability
1 parent 9eae297 commit 679a181

20 files changed

Lines changed: 655 additions & 142 deletions

ARCHITECTURE.md

Lines changed: 12 additions & 2 deletions
@@ -74,6 +74,8 @@ agent/src/
 │ ├── actions.ts # Maps command strings to browser method execution
 │ ├── browser-agent.ts # Unified browser context, state cache, actions
 │ ├── console-listener.ts
+│ ├── cypress-runtime.ts # Retryable assertions, failure screenshots, command log
+│ ├── fixtures.ts # Non-sensitive fixture lookup for task steps
 │ ├── login-runner.ts # Secure credential autofill and validation
 │ ├── network-listener.ts# Collects network errors & intercepting API payloads
 │ ├── page-analyzer.ts # Computes accessible DOM representation
@@ -156,14 +158,22 @@ To prevent the agent from performing destructive actions in production/staging e
 - **Safe Tool Whitelist**: Tools that only observe or perform standard form interaction (e.g., `open_url`, `click_by_index`, `scroll`, `hover`) bypass filters immediately, preventing false positives.
 - **Intent Pattern Matching**: Unknown or custom tools are analyzed against safety rules (regex check) for action flags before execution. This prevents data fields (like entering `email: "delete-me@gmail.com"`) from triggering message-send blockages.

-### 4. Zero-Dependency OOXML Excel Builder (`excel.ts`)
+### 4. Cypress-Inspired Reliability Layer (`cypress-runtime.ts`)
+QaAgent stays Playwright-native but adopts Cypress-style reliability patterns for explicit task steps:
+- Query/assertion steps retry until a timeout and re-check the current DOM each attempt.
+- Mutating actions are recorded as single-shot commands while Playwright handles actionability waits.
+- Failed commands can capture a failure screenshot.
+- Every explicit task command is written to a structured Command Log with status, attempts, duration, error, and screenshot path.
+- Fixture references load reusable non-sensitive values from `agent/fixtures`.
+
+### 5. Zero-Dependency OOXML Excel Builder (`excel.ts`)
 To remain lightweight and portable, the Excel report generator uses **no external libraries** like `exceljs` or `xlsx`. It compiles raw OpenXML files directly:
 - Writes structure files: `[Content_Types].xml`, `xl/styles.xml`, `xl/workbook.xml`, `xl/worksheets/sheet1.xml`, etc.
 - Serializes screenshots into PNG files under `xl/media/` and writes `drawing.xml` elements to position screenshots inside cells.
 - Standardizes styling: formats headers (purple background, bold white text), severity tiers (Red/Critical, Amber/High, Yellow/Medium, Blue/Low), and column widths.
 - Bundles them using a lightweight, pure Node.js CRC32-based ZIP compiler.

-### 5. Autonomous Explorer (`autonomous-explorer.ts`)
+### 6. Autonomous Explorer (`autonomous-explorer.ts`)
 In Codex/no-API mode, the agent isn't passive. It crawls and checks sites dynamically:
 - Locates navbar, sidebar, and tab navigation links.
 - Explores linked pages (restricted to the same origin URL).
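
The retry behaviour described for the reliability layer (re-check the DOM on each attempt until a timeout) could look roughly like the sketch below. This is only an illustration under stated assumptions: `retryUntil`, `RetryOptions`, and `RetryResult` are hypothetical names, and the real implementation in `cypress-runtime.ts` is not shown in this diff and may differ.

```typescript
// Hypothetical sketch: names and option shapes are assumptions, not the code in cypress-runtime.ts.
interface RetryOptions {
  timeoutMs: number; // overall budget, e.g. the task's defaultCommandTimeoutMs
  pollIntervalMs: number; // pause between attempts, e.g. the task's pollIntervalMs
}

interface RetryResult<T> {
  value: T;
  attempts: number;
}

const sleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `check` repeatedly until it resolves or the time budget is spent.
// `check` is expected to re-query the page itself, so every attempt sees the current DOM.
async function retryUntil<T>(check: () => Promise<T>, options: RetryOptions): Promise<RetryResult<T>> {
  const deadline = Date.now() + options.timeoutMs;
  let attempts = 0;
  let lastError: unknown;
  for (;;) {
    attempts += 1;
    try {
      return { value: await check(), attempts };
    } catch (error) {
      lastError = error; // remember the most recent failure for the final message
    }
    if (Date.now() >= deadline) break;
    await sleep(options.pollIntervalMs);
  }
  const message = lastError instanceof Error ? lastError.message : String(lastError);
  throw new Error(`Assertion failed after ${attempts} attempt(s): ${message}`);
}
```

The check always runs at least once, even with a zero timeout, and the attempt count is returned so a caller can record it in a command log.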

README.md

Lines changed: 19 additions & 0 deletions
@@ -27,6 +27,7 @@ QaAgent runs a local, highly-instrumented Playwright browser, captures trace evi
 * **Autonomous Crawling & Testing**: In Codex/no-API mode, the agent automatically discovers links, sidebar items, tabs, and modals within the same origin, tests form validation, and takes full-page screenshots at every step.
 * **Dual Execution Modes**: Choose **Codex/no-API mode** (ideal for local-first execution with local credentials) or **Groq API mode** (autonomous agent CLI loop utilizing model-driven tool calls).
 * **Multi-Strategy Selector Healing**: Automatically attempts to recover from failing CSS selectors using selectors history memory, text hints, ARIA roles, or indexed state coordinates before raising a failure.
+* **Cypress-Inspired Reliability**: Retryable assertions, fixture-backed task values, failure screenshots, and a Command Log sheet make dynamic UI runs easier to debug without adding Cypress as a runtime dependency.
 * **Two-Tier Safety Guard**: A proactive firewall blocking destructive actions (deletes, settings alterations, payments, bulk updates, and message broadcast sends) by default. Safe tools bypass checks to eliminate false positives.
 * **Fleshed-out QA Detectors**: Automated DOM audits checking for accessibility faults, invalid forms, pagination/horizontal scrolling failures in tables, and console/network bottlenecks.
 * **Misleading UI Detection**: An API response interceptor capturing HTTP payloads to confirm if a user-facing success toast matches the actual server API response.
@@ -54,6 +55,24 @@ Run with a task file:
 npm run agent:codex -- --task-file agent/tasks/example-task.json --headed
 ```

+Cypress-style task assertions:
+```json
+{
+  "cypress": {
+    "defaultCommandTimeoutMs": 5000,
+    "pollIntervalMs": 100,
+    "screenshotOnFailure": true
+  },
+  "steps": [
+    { "action": "assert_visible", "selector": "h1" },
+    { "action": "assert_text", "expected": "Example Domain" },
+    { "action": "assert_url_includes", "expected": "example.com" }
+  ]
+}
+```
+
+More details: [agent/integrations/cypress/README.md](agent/integrations/cypress/README.md).
+
 ---

 ## 🏛️ Architecture & System Design
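
To make the `cypress` block in the task JSON above more concrete, here is a minimal sketch of how such a config might be typed and merged with defaults. Everything here is an assumption for illustration: `CypressConfig`, `AssertionStep`, `resolveCypressConfig`, and `effectiveTimeoutMs` are hypothetical names, the default values simply mirror the README example, and the per-step `timeoutMs` override is the one described in `agent/integrations/cypress/README.md`. The project's real types live in `shared/types.ts` and are not shown in this diff.

```typescript
// Assumed shapes based on the task JSON; the real types in shared/types.ts may differ.
interface CypressConfig {
  defaultCommandTimeoutMs: number;
  pollIntervalMs: number;
  screenshotOnFailure: boolean;
}

interface AssertionStep {
  action: string;
  selector?: string;
  expected?: string;
  timeoutMs?: number; // optional per-step override of the task default
}

// Assumption: defaults mirror the values used in the README example.
const DEFAULT_CYPRESS_CONFIG: CypressConfig = {
  defaultCommandTimeoutMs: 5000,
  pollIntervalMs: 100,
  screenshotOnFailure: true,
};

// Fills in any setting the task file leaves out.
function resolveCypressConfig(partial?: Partial<CypressConfig>): CypressConfig {
  return { ...DEFAULT_CYPRESS_CONFIG, ...partial };
}

// A step-level timeout wins over the task-level default.
function effectiveTimeoutMs(step: AssertionStep, config: CypressConfig): number {
  return step.timeoutMs ?? config.defaultCommandTimeoutMs;
}
```

For example, a step such as `{ "action": "assert_visible", "selector": "h1", "timeoutMs": 1500 }` would use 1500 ms under this sketch, while a step without `timeoutMs` would fall back to the task default.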

agent/fixtures/example-user.json

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
{
2+
"name": "Qa Test User",
3+
"email": "qa.user@example.com",
4+
"phone": "+91 90000 00000",
5+
"company": "QaAgent Demo Co",
6+
"city": "Bengaluru",
7+
"role": "QA Tester"
8+
}
Lines changed: 58 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,58 @@
1+
# Cypress-Inspired Reliability Layer
2+
3+
QaAgent remains TypeScript + Playwright. It does not add Cypress as a runtime dependency. This integration adopts the Cypress ideas that fit a local-first QA agent:
4+
5+
- Retry query/assertion steps until a timeout, instead of relying on fixed sleeps.
6+
- Keep mutating actions single-shot while still letting Playwright wait for actionability.
7+
- Capture a failure screenshot when an assertion or command fails.
8+
- Record every explicit task command in a structured Command Log with status, attempts, duration, error, and failure screenshot path.
9+
- Support fixture references for reusable non-sensitive test values.
10+
- Keep task steps independent so a later run does not depend on browser state from a previous run.
11+
12+
## Task Config
13+
14+
```json
15+
{
16+
"cypress": {
17+
"defaultCommandTimeoutMs": 5000,
18+
"pollIntervalMs": 100,
19+
"screenshotOnFailure": true,
20+
"fixtureDir": "agent/fixtures"
21+
}
22+
}
23+
```
24+
25+
## Assertion Steps
26+
27+
```json
28+
[
29+
{ "action": "assert_visible", "selector": "h1" },
30+
{ "action": "assert_text", "selector": "main", "expected": "Dashboard" },
31+
{ "action": "assert_url_includes", "expected": "/dashboard" },
32+
{ "action": "assert_count", "selector": "table tbody tr", "count": 10 }
33+
]
34+
```
35+
36+
Assertions re-query the page until they pass or hit `defaultCommandTimeoutMs`. Use `timeoutMs` on a step to override the task default.
37+
38+
## Fixtures
39+
40+
Fixtures live in `agent/fixtures` by default and must contain fake or non-sensitive values only.
41+
42+
```json
43+
{ "action": "fill_by_label", "text": "Email", "fixture": "example-user.email" }
44+
```
45+
46+
The fixture reference above loads `agent/fixtures/example-user.json` and reads the `email` key. Passwords, tokens, cookies, customer records, and payment data must not be stored in fixtures.
47+
48+
## Reports
49+
50+
Markdown and Excel reports include a `Cypress-Style Command Log`. This makes flaky UI failures easier to debug because each command shows:
51+
52+
- pass/fail status
53+
- command kind
54+
- command target
55+
- retry attempts
56+
- duration
57+
- error text
58+
- failure screenshot path
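
The fixture reference format described in this file (`example-user.email` meaning "file `example-user.json`, key `email`") can be illustrated with a small lookup sketch. This is an assumption-laden illustration, not the project's code: `parseFixtureRef` and `lookupFixture` are hypothetical helpers, and fixtures are passed in as an in-memory map so the example stays self-contained, whereas the real `fixtures.ts` (not shown in this diff) presumably reads JSON files from `fixtureDir`. The optional `fixture:` prefix matches the one that `fixtureRef` in `actions.ts` adds.

```typescript
// Hypothetical sketch; the actual lookup lives in fixtures.ts and is not shown in this diff.
type FixtureFile = Record<string, unknown>;

interface FixtureRef {
  file: string; // e.g. "example-user", meaning example-user.json
  key: string; // e.g. "email"
}

// Splits "example-user.email" (optionally prefixed with "fixture:") at the first dot.
function parseFixtureRef(ref: string): FixtureRef {
  const bare = ref.startsWith("fixture:") ? ref.slice("fixture:".length) : ref;
  const dot = bare.indexOf(".");
  if (dot <= 0 || dot === bare.length - 1) {
    throw new Error(`Invalid fixture reference "${ref}". Expected "<file>.<key>".`);
  }
  return { file: bare.slice(0, dot), key: bare.slice(dot + 1) };
}

// Assumption: fixtures are already parsed and keyed by file name without the .json extension.
function lookupFixture(ref: string, fixtures: Record<string, FixtureFile>): string {
  const { file, key } = parseFixtureRef(ref);
  const data = fixtures[file];
  if (!data) throw new Error(`Fixture file not found: ${file}.json`);
  const value = data[key];
  if (typeof value !== "string") {
    throw new Error(`Fixture key "${key}" is missing or not a string in ${file}.json`);
  }
  return value;
}
```

Failing loudly on a missing file or key, rather than returning an empty string, keeps a mistyped reference from silently filling a form field with nothing.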

agent/scripts/quality-gate.ts

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -33,6 +33,7 @@ function run(name: string, command: string, args: string[]): void {
3333
function secretScan(): void {
3434
const roots = [
3535
"agent/src",
36+
"agent/fixtures",
3637
"agent/scripts",
3738
"agent/tests",
3839
"agent/tasks",

agent/src/api-agent/groq-tool-loop.ts

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -145,6 +145,7 @@ export async function runGroqToolLoop(task: QaTask, headed: boolean, maxSteps: n
145145
tracePath,
146146
browserState: state,
147147
coverage,
148+
commandLog: browser.recorder.commandLog(),
148149
qaChecklist: detected.checklist,
149150
memoryNotes: [
150151
`QA profile: ${task.qaProfile}`,
@@ -190,6 +191,7 @@ export async function runGroqToolLoop(task: QaTask, headed: boolean, maxSteps: n
190191
screenshots,
191192
tracePath,
192193
coverage,
194+
commandLog: browser.recorder.commandLog(),
193195
qaChecklist: {},
194196
memoryNotes: [...coverage.notes, ...(tracePath ? [`Trace: ${tracePath}`] : [])],
195197
loginResult,

agent/src/browser/actions.ts

Lines changed: 126 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -1,50 +1,148 @@
11
import type { BrowserAgent } from "./browser-agent.js";
2-
import type { TaskStep } from "../shared/types.js";
2+
import { resolveStepValue } from "./fixtures.js";
3+
import { oneAttempt, retryAssertion, runCypressCommand } from "./cypress-runtime.js";
4+
import type { QaTask, TaskStep } from "../shared/types.js";
35

4-
export async function runTaskStep(browser: BrowserAgent, step: TaskStep): Promise<string | undefined> {
6+
export async function runTaskStep(browser: BrowserAgent, step: TaskStep, task?: QaTask): Promise<string | undefined> {
57
switch (step.action) {
68
case "open":
7-
await browser.openUrl(step.url || "");
8-
return undefined;
9+
return runCypressCommand(browser, task, step, { kind: "action", name: "open", target: required(step.url, "url") }, async () => {
10+
await browser.openUrl(required(step.url, "url"));
11+
return oneAttempt(undefined);
12+
});
913
case "click":
10-
await browser.click(required(step.selector, "selector"));
11-
return undefined;
14+
return runCypressCommand(browser, task, step, { kind: "action", name: "click" }, async () => {
15+
await browser.click(required(step.selector, "selector"));
16+
return oneAttempt(undefined);
17+
});
1218
case "click_by_index":
13-
await browser.clickByIndex(requiredNumber(step.index, "index"));
14-
return undefined;
19+
return runCypressCommand(browser, task, step, { kind: "action", name: "click_by_index", target: String(requiredNumber(step.index, "index")) }, async () => {
20+
await browser.clickByIndex(requiredNumber(step.index, "index"));
21+
return oneAttempt(undefined);
22+
});
1523
case "click_by_text":
16-
await browser.clickByText(required(step.text || step.value, "text"));
17-
return undefined;
24+
return runCypressCommand(browser, task, step, { kind: "action", name: "click_by_text", target: textValue(step, task, "text") }, async () => {
25+
await browser.clickByText(textValue(step, task, "text"));
26+
return oneAttempt(undefined);
27+
});
1828
case "click_by_role":
19-
await browser.clickByRole(required(step.role, "role"), step.text || step.value);
20-
return undefined;
29+
return runCypressCommand(browser, task, step, { kind: "action", name: "click_by_role", target: roleTarget(step, task) }, async () => {
30+
await browser.clickByRole(required(step.role, "role"), optionalTextValue(step, task));
31+
return oneAttempt(undefined);
32+
});
2133
case "fill":
22-
await browser.fill(required(step.selector, "selector"), step.value || "");
23-
return undefined;
34+
return runCypressCommand(browser, task, step, { kind: "action", name: "fill" }, async () => {
35+
await browser.fill(required(step.selector, "selector"), valueFromStep(step, task));
36+
return oneAttempt(undefined);
37+
});
2438
case "fill_by_label":
25-
await browser.fillByLabel(required(step.text || step.selector, "label"), step.value || "");
26-
return undefined;
39+
return runCypressCommand(browser, task, step, { kind: "action", name: "fill_by_label", target: required(step.text || step.selector, "label") }, async () => {
40+
await browser.fillByLabel(required(step.text || step.selector, "label"), valueFromStep(step, task));
41+
return oneAttempt(undefined);
42+
});
2743
case "fill_by_placeholder":
28-
await browser.fillByPlaceholder(required(step.text || step.selector, "placeholder"), step.value || "");
29-
return undefined;
44+
return runCypressCommand(browser, task, step, { kind: "action", name: "fill_by_placeholder", target: required(step.text || step.selector, "placeholder") }, async () => {
45+
await browser.fillByPlaceholder(required(step.text || step.selector, "placeholder"), valueFromStep(step, task));
46+
return oneAttempt(undefined);
47+
});
3048
case "fill_by_name":
31-
await browser.fillByName(required(step.text || step.selector, "name"), step.value || "");
32-
return undefined;
49+
return runCypressCommand(browser, task, step, { kind: "action", name: "fill_by_name", target: required(step.text || step.selector, "name") }, async () => {
50+
await browser.fillByName(required(step.text || step.selector, "name"), valueFromStep(step, task));
51+
return oneAttempt(undefined);
52+
});
3353
case "press":
34-
await browser.press(required(step.selector, "selector"), step.key || "Enter");
35-
return undefined;
54+
return runCypressCommand(browser, task, step, { kind: "action", name: "press", target: required(step.selector, "selector") }, async () => {
55+
await browser.press(required(step.selector, "selector"), step.key || "Enter");
56+
return oneAttempt(undefined);
57+
});
3658
case "wait":
37-
if (step.selector) await browser.waitForSelector(step.selector);
38-
else await browser.wait(Number(step.value || 1000));
39-
return undefined;
59+
return runCypressCommand(browser, task, step, { kind: step.selector ? "query" : "system", name: "wait" }, async () => {
60+
if (step.selector) await browser.waitForSelector(step.selector);
61+
else await browser.wait(Number(step.value || 1000));
62+
return oneAttempt(undefined);
63+
});
4064
case "screenshot":
41-
return browser.screenshot(step.label || "task");
65+
return runCypressCommand(browser, task, step, { kind: "system", name: "screenshot", target: step.label || "task" }, async () =>
66+
oneAttempt(await browser.screenshot(step.label || "task"))
67+
);
4268
case "analyze":
43-
await browser.saveBrowserState();
44-
return undefined;
69+
return runCypressCommand(browser, task, step, { kind: "system", name: "analyze", target: step.label || "browser-state" }, async () => {
70+
await browser.saveBrowserState();
71+
return oneAttempt(undefined);
72+
});
73+
case "assert_visible":
74+
return runCypressCommand(browser, task, step, { kind: "assertion", name: "assert_visible" }, () =>
75+
retryAssertion(browser, task, step, async () => {
76+
const visible = await isStepTargetVisible(browser, step, task);
77+
if (!visible) throw new Error(`Expected target to be visible: ${step.selector || step.text || step.role || step.value || step.label || "unknown"}`);
78+
return undefined;
79+
})
80+
);
81+
case "assert_text":
82+
return runCypressCommand(browser, task, step, { kind: "assertion", name: "assert_text", target: expectedValue(step, task) }, () =>
83+
retryAssertion(browser, task, step, async () => {
84+
const expected = expectedValue(step, task);
85+
const actual = step.selector
86+
? await browser.activePage.locator(step.selector).first().innerText({ timeout: 750 }).catch(() => "")
87+
: await browser.activePage.locator("body").innerText({ timeout: 750 }).catch(() => "");
88+
if (!actual.includes(expected)) throw new Error(`Expected page text to include "${expected}".`);
89+
return undefined;
90+
})
91+
);
92+
case "assert_url_includes":
93+
return runCypressCommand(browser, task, step, { kind: "assertion", name: "assert_url_includes", target: expectedValue(step, task) }, () =>
94+
retryAssertion(browser, task, step, async () => {
95+
const expected = expectedValue(step, task);
96+
const currentUrl = browser.getUrl();
97+
if (!currentUrl.includes(expected)) throw new Error(`Expected URL "${currentUrl}" to include "${expected}".`);
98+
return undefined;
99+
})
100+
);
101+
case "assert_count":
102+
return runCypressCommand(browser, task, step, { kind: "assertion", name: "assert_count", target: required(step.selector, "selector") }, () =>
103+
retryAssertion(browser, task, step, async () => {
104+
const expectedCount = requiredNumber(step.count, "count");
105+
const actualCount = await browser.activePage.locator(required(step.selector, "selector")).count();
106+
if (actualCount !== expectedCount) throw new Error(`Expected ${expectedCount} element(s), found ${actualCount}.`);
107+
return undefined;
108+
})
109+
);
45110
}
46111
}
47112

113+
async function isStepTargetVisible(browser: BrowserAgent, step: TaskStep, task?: QaTask): Promise<boolean> {
114+
if (step.selector) return browser.activePage.locator(step.selector).first().isVisible().catch(() => false);
115+
if (step.role) return browser.activePage.getByRole(step.role as never, optionalTextValue(step, task) ? { name: optionalTextValue(step, task) } : undefined).first().isVisible().catch(() => false);
116+
return browser.activePage.getByText(textValue(step, task, "text"), { exact: false }).first().isVisible().catch(() => false);
117+
}
118+
119+
function valueFromStep(step: TaskStep, task?: QaTask): string {
120+
return resolveStepValue(step.value || fixtureRef(step.fixture), task);
121+
}
122+
123+
function expectedValue(step: TaskStep, task?: QaTask): string {
124+
return required(resolveStepValue(step.expected || step.text || step.value || step.url || fixtureRef(step.fixture), task), "expected");
125+
}
126+
127+
function textValue(step: TaskStep, task: QaTask | undefined, name: string): string {
128+
return required(resolveStepValue(step.text || step.value || fixtureRef(step.fixture), task), name);
129+
}
130+
131+
function optionalTextValue(step: TaskStep, task?: QaTask): string | undefined {
132+
const value = resolveStepValue(step.text || step.value || fixtureRef(step.fixture), task);
133+
return value || undefined;
134+
}
135+
136+
function roleTarget(step: TaskStep, task?: QaTask): string {
137+
const name = optionalTextValue(step, task);
138+
return `${required(step.role, "role")}${name ? `:${name}` : ""}`;
139+
}
140+
141+
function fixtureRef(fixture: string | undefined): string | undefined {
142+
if (!fixture) return undefined;
143+
return fixture.startsWith("fixture:") ? fixture : `fixture:${fixture}`;
144+
}
145+
48146
function required(value: string | undefined, name: string): string {
49147
if (!value) throw new Error(`Missing ${name} for task step.`);
50148
return value;
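
Every case in `runTaskStep` now goes through `runCypressCommand`, whose body lives in `cypress-runtime.ts` and is not part of the hunk above. As a rough mental model only, a wrapper of this kind might time the callback and append one entry to a command log, along the lines of the sketch below. The names `runLoggedCommand`, `CommandLogEntry`, and `AttemptResult` are hypothetical, the field names are guessed from the Command Log description in this commit, and failure screenshots are left out for brevity.

```typescript
// Hypothetical shapes: field names follow the Command Log description, not the real types.
type CommandKind = "action" | "query" | "assertion" | "system";

interface CommandLogEntry {
  kind: CommandKind;
  name: string;
  target?: string;
  status: "passed" | "failed";
  attempts: number;
  durationMs: number;
  error?: string;
}

interface AttemptResult<T> {
  value: T;
  attempts: number;
}

// Runs one command, records its outcome in `log`, and rethrows on failure
// so the caller still sees the original error.
async function runLoggedCommand<T>(
  log: CommandLogEntry[],
  meta: { kind: CommandKind; name: string; target?: string },
  run: () => Promise<AttemptResult<T>>
): Promise<T> {
  const startedAt = Date.now();
  try {
    const result = await run();
    log.push({ ...meta, status: "passed", attempts: result.attempts, durationMs: Date.now() - startedAt });
    return result.value;
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    // This wrapper cannot see how many attempts a failed callback made, so it records one.
    log.push({ ...meta, status: "failed", attempts: 1, durationMs: Date.now() - startedAt, error: message });
    throw error;
  }
}
```

Under this model a single-shot action would return `{ value, attempts: 1 }` from its callback (the role `oneAttempt` appears to play in the diff), while a retried assertion would report however many attempts it needed.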
