6 changes: 4 additions & 2 deletions apps/report/src/components/sidebar/index.less
@@ -435,7 +435,8 @@
}

// Light mode Tag styles
.cache-tag, .xpath-tag {
.cache-tag,
.xpath-tag {
color: #1890ff;
background-color: #e0f5ff;
}
@@ -650,7 +651,8 @@
}

// Tag styles for dark mode
.cache-tag, .xpath-tag {
.cache-tag,
.xpath-tag {
color: #1890ff !important;
background-color: rgba(24, 144, 255, 0.15) !important;
}
40 changes: 36 additions & 4 deletions apps/site/docs/en/automate-with-scripts-in-yaml.mdx
@@ -9,7 +9,7 @@ Midscene offers a way to perform automation using `.yaml` files, which helps you
Here is an example. By reading its content, you should be able to understand how it works.

```yaml
web:
page:
url: https://www.bing.com

tasks:
@@ -41,10 +41,10 @@ To execute YAML workflows from the command line, install the Midscene CLI. See [

Script files use YAML format to describe automation tasks. It defines the target to be manipulated (like a webpage or an Android app) and the series of steps to perform.

A standard `.yaml` script file includes a `web`, `android`, `ios`, `harmony`, or `computer` section to configure the environment, an optional `agent` section to configure AI agent behavior, and a `tasks` section to define the automation tasks.
A standard `.yaml` script file includes a `page`, `browser`, `web`, `android`, `ios`, `harmony`, or `computer` section to configure the environment, an optional `agent` section to configure AI agent behavior, and a `tasks` section to define the automation tasks.

```yaml
web:
page:
url: https://www.bing.com

# The tasks section defines the series of steps to be executed
@@ -56,6 +56,7 @@ tasks:
- aiAssert: The results show weather information
```

Use `page:` for a page-level Agent. Use `browser:` when one Agent should manage a browser and its active page. `web:` remains supported as a compatibility entry; `web.mode: browser` maps to BrowserAgent and plain `web:` maps to PageAgent. Do not combine `page`, `browser`, `web`, or the deprecated `target` in the same script.
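
The mapping described above can be sketched as a small resolver. This is only an illustration, not Midscene's actual implementation: `ScriptTargets`, `AgentKind`, and `resolveAgentKind` are hypothetical names, and because the text does not say which Agent the deprecated `target` maps to, the sketch reports it as `unspecified`.

```typescript
// Hypothetical shape of the web-related top-level keys of a parsed YAML script.
interface ScriptTargets {
  page?: { url: string };
  browser?: { url: string };
  web?: { url: string; mode?: string };
  target?: { url: string }; // deprecated entry
}

type AgentKind = 'PageAgent' | 'BrowserAgent' | 'unspecified';

// Returns which kind of Agent a script would use, and rejects scripts
// that combine more than one of the web target keys.
function resolveAgentKind(script: ScriptTargets): AgentKind {
  const keys = (['page', 'browser', 'web', 'target'] as const).filter(
    (key) => script[key] !== undefined,
  );
  if (keys.length === 0) {
    throw new Error('No web target section found');
  }
  if (keys.length > 1) {
    throw new Error(`Do not combine: ${keys.join(', ')}`);
  }

  switch (keys[0]) {
    case 'page':
      return 'PageAgent';
    case 'browser':
      return 'BrowserAgent';
    case 'web':
      // `web.mode: browser` maps to BrowserAgent, plain `web:` to PageAgent.
      return script.web?.mode === 'browser' ? 'BrowserAgent' : 'PageAgent';
    default:
      // Deprecated `target`: the mapping is not stated in the text.
      return 'unspecified';
  }
}
```

For example, `resolveAgentKind({ web: { url: 'https://example.com', mode: 'browser' } })` yields `'BrowserAgent'`, while passing both `page` and `browser` throws.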

### The `agent` part

@@ -141,10 +142,36 @@ tasks:
```


### The `web` part
### The web target part

Recommended page-level target:

```yaml
page:
url: https://example.com
```

Recommended browser-level target:

```yaml
browser:
url: https://example.com
autoFollowNewPage: true
```

Compatibility form:

```yaml
web:
mode: browser
url: https://example.com
autoFollowNewPage: true
```

Shared options:

```yaml
page:
# The URL to visit, required. If `serve` is provided, provide the relative path.
url: <url>

@@ -195,8 +222,13 @@ web:
unstableLogContent: <boolean | path-to-unstable-log-file>

# Whether to restrict page navigation to the current tab, optional, defaults to true.
# Page mode only. Do not use it with `browser:` or `web.mode: browser`.
forceSameTabNavigation: <boolean>

# Whether BrowserAgent should automatically continue in newly opened pages, optional, defaults to false.
# Browser mode only. Use `browser:` or `web.mode: browser`.
autoFollowNewPage: <boolean>

# CDP endpoint, optional. Connects to an existing browser instance via CDP instead of launching a new one.
# Mutually exclusive with bridgeMode.
cdpEndpoint: ws://localhost:9222/devtools/browser
12 changes: 8 additions & 4 deletions apps/site/docs/en/integrate-with-playwright.mdx
@@ -265,16 +265,20 @@ After the command executes successfully, it will output: `Midscene - report file

### About opening in a new tab

Each Agent instance is bound to a single page. To make debugging easier, Midscene intercepts new tabs by default (for example, links with `target="_blank"`) and opens them in the current page.
`PlaywrightAgent` is a page-level Agent: each instance is bound to a single page. To make debugging easier, Midscene intercepts new tabs by default (for example, links with `target="_blank"`) and opens them in the current page.

If you want to restore opening in a new tab, set `forceSameTabNavigation` to `false`—but you’ll need to create a new Agent instance for each new tab.
If you want to restore opening in a new tab while keeping the Agent on the original page, set `forceSameTabNavigation` to `false` and create a new Agent instance for each new tab yourself.

If one Agent should manage page switching for the whole browser context, use `PlaywrightBrowserAgent`. Enable `autoFollowNewPage` when subsequent actions should automatically continue in the newly opened tab.

```typescript
const mid = new PlaywrightAgent(page, {
forceSameTabNavigation: false,
const mid = new PlaywrightBrowserAgent(context, page, {
autoFollowNewPage: true,
});
```

Use `new PlaywrightBrowserAgent(context, page, options)` when you explicitly choose the initial active page. Use `PlaywrightBrowserAgent.create(context, options)` when you want Midscene to choose or create the initial active page; the factory uses `initialPage` when provided, otherwise it reuses the first existing context page or creates a new page.
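
The selection order of the factory can be sketched as a pure decision function. This is a hedged illustration of the rule stated above, not the library's code: `InitialPageChoice` and `chooseInitialPage` are hypothetical names, and the caller is assumed to pass the pages that already exist in the context.

```typescript
// Describes where the initial active page comes from.
type InitialPageChoice<P> =
  | { source: 'initialPage'; page: P }
  | { source: 'existing'; page: P }
  | { source: 'create' };

// Mirrors the documented order: explicit `initialPage` first,
// then the first existing page, otherwise a new page must be created.
function chooseInitialPage<P>(
  existingPages: readonly P[],
  initialPage?: P,
): InitialPageChoice<P> {
  if (initialPage !== undefined) {
    return { source: 'initialPage', page: initialPage };
  }
  const first = existingPages[0];
  if (first !== undefined) {
    return { source: 'existing', page: first };
  }
  return { source: 'create' };
}
```

With Playwright, the first argument would typically be `context.pages()`; a `create` result means the caller would go on to call `context.newPage()`.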

### Browser support

Some Midscene web automation features rely on Chrome DevTools Protocol (CDP), which is provided by Chromium-based browsers. These include browser-level events, touch gestures, and CDP fallback paths used by specific interactions.
12 changes: 8 additions & 4 deletions apps/site/docs/en/integrate-with-puppeteer.mdx
@@ -104,16 +104,20 @@ After the above command executes successfully, the console will output: `Midscen

### About opening in a new tab

Each Agent instance is bound to a single page. For easier debugging, Midscene intercepts new tabs by default (for example, links with `target="_blank"`) and opens them in the current page.
`PuppeteerAgent` is a page-level Agent: each instance is bound to a single page. For easier debugging, Midscene intercepts new tabs by default (for example, links with `target="_blank"`) and opens them in the current page.

If you want to allow new tabs again, set `forceSameTabNavigation` to `false`—but you must create a new Agent instance for each new tab.
If you want to allow new tabs again while keeping the Agent on the original page, set `forceSameTabNavigation` to `false` and create a new Agent instance for each new tab yourself.

If one Agent should manage page switching for the whole browser, use `PuppeteerBrowserAgent`. Enable `autoFollowNewPage` when subsequent actions should automatically continue in the newly opened tab.

```typescript
const mid = new PuppeteerAgent(page, {
forceSameTabNavigation: false,
const mid = new PuppeteerBrowserAgent(browser, page, {
autoFollowNewPage: true,
});
```

Use `new PuppeteerBrowserAgent(browser, page, options)` when you explicitly choose the initial active page. Use `PuppeteerBrowserAgent.create(browser, options)` when you want Midscene to choose or create the initial active page; the factory uses `initialPage` when provided, otherwise it reuses the first existing browser page or creates a new page.

### Browser support

Some Midscene web automation features rely on Chrome DevTools Protocol (CDP), which is provided by Chromium-based browsers. These include browser-level events, touch gestures, and CDP fallback paths used by specific interactions.
64 changes: 56 additions & 8 deletions apps/site/docs/en/web-api-reference.mdx
@@ -22,20 +22,22 @@ PuppeteerAgent, PlaywrightAgent, and Chrome Bridge share one action space; the M
- `Reload` &mdash; Reload the page.
- `GoBack` &mdash; Navigate back in history.

## PuppeteerAgent {#puppeteer-agent}
## PuppeteerPageAgent / PuppeteerAgent {#puppeteer-agent}

Use Midscene against a Puppeteer-controlled browser when you need AI actions in your own Puppeteer workflows.

`PuppeteerPageAgent` is bound to one Puppeteer `Page`. `PuppeteerAgent` remains an alias for backward compatibility.

### Import

```ts
import { PuppeteerAgent } from '@midscene/web/puppeteer';
import { PuppeteerPageAgent } from '@midscene/web/puppeteer';
```

### Constructor

```ts
const agent = new PuppeteerAgent(page, {
const agent = new PuppeteerPageAgent(page, {
// browser-specific options...
});
```
@@ -56,11 +58,33 @@ In addition to the base agent options, Puppeteer exposes:

:::info

- One agent per page: by default (`forceSameTabNavigation: true`), Midscene opens new links in the current tab for easier debugging. Set it to `false` if you want new tabs, and create a new agent per tab.
- One agent per page: by default (`forceSameTabNavigation: true`), Midscene opens new links in the current tab for easier debugging. Set it to `false` if you want normal new-tab behavior and create a new `PuppeteerAgent` for each page yourself. Use `PuppeteerBrowserAgent` when the same Agent should manage browser-level page switching.
- `PuppeteerAgent` / `PuppeteerPageAgent` remains page-scoped for compatibility. It does not expose browser-level page switching unless you explicitly choose `PuppeteerBrowserAgent`.
- For the full list of interaction methods, see [API reference (Common)](./api#interaction-methods).

:::

### Browser agent

Use `PuppeteerBrowserAgent` when one Midscene Agent should manage page switching inside a Puppeteer browser. It is bound to a browser instance, keeps one active page, and can optionally follow newly opened pages.

```ts
const agent = new PuppeteerBrowserAgent(browser, page, {
autoFollowNewPage: true,
});
```

- Constructor: `new PuppeteerBrowserAgent(browser, page, options?)` &mdash; Use this when you explicitly choose the initial active page.
- Factory: `PuppeteerBrowserAgent.create(browser, options?)` &mdash; Use this when you want Midscene to choose or create the initial active page. It uses `initialPage` if provided; otherwise it reuses the first existing browser page, or creates a new page if none exists.
- `initialPage: Page` &mdash; Initial Puppeteer page for the factory.
- `autoFollowNewPage: boolean` &mdash; Automatically switch the active page when the browser opens a new page. Default `false`.
- `newPageTimeout: number` &mdash; Timeout for `waitForNewPage`. Default `5000`.
- `activePage: Page` &mdash; Current page controlled by the Browser Agent.
- `pages()` &mdash; List pages from the bound browser.
- `newPage()` &mdash; Create a new page and make it active.
- `setActivePage(page: Page)` &mdash; Explicitly set which Puppeteer page the Browser Agent controls next.
- `waitForNewPage(action?, options?)` &mdash; Wait for a newly opened page without implicitly switching the active page.

### Examples

#### Quick start
@@ -111,20 +135,22 @@ await browser.disconnect();

- [Integrate with Puppeteer](./integrate-with-puppeteer) for installation, fixtures, and remote-CDP guidance.

## PlaywrightAgent {#playwright-agent}
## PlaywrightPageAgent / PlaywrightAgent {#playwright-agent}

Use Midscene inside a Playwright browser for AI-driven testing or automation alongside your Playwright flows.

`PlaywrightPageAgent` is bound to one Playwright `Page`. `PlaywrightAgent` remains an alias for backward compatibility.

### Import

```ts
import { PlaywrightAgent } from '@midscene/web/playwright';
import { PlaywrightPageAgent } from '@midscene/web/playwright';
```

### Constructor

```ts
const agent = new PlaywrightAgent(page, {
const agent = new PlaywrightPageAgent(page, {
// browser-specific options...
});
```
@@ -143,11 +169,33 @@ const agent = new PlaywrightAgent(page, {

:::info

- One agent per page: with `forceSameTabNavigation` (default `true`), Midscene intercepts new tabs for stability. Set it to `false` to allow new tabs and create a separate agent for each.
- One agent per page: with `forceSameTabNavigation` (default `true`), Midscene intercepts new tabs for stability. Set it to `false` to allow normal new tabs and create a new `PlaywrightAgent` for each page yourself. Use `PlaywrightBrowserAgent` when the same Agent should manage browser-context-level page switching.
- `PlaywrightAgent` / `PlaywrightPageAgent` remains page-scoped for compatibility. It does not expose browser-level page switching unless you explicitly choose `PlaywrightBrowserAgent`.
- For the full list of interaction methods, see [API reference (Common)](./api#interaction-methods).

:::

### Browser agent

Use `PlaywrightBrowserAgent` when one Midscene Agent should manage page switching inside a Playwright browser context. It is bound to a browser context, keeps one active page, and can optionally follow newly opened pages.

```ts
const agent = new PlaywrightBrowserAgent(context, page, {
autoFollowNewPage: true,
});
```

- Constructor: `new PlaywrightBrowserAgent(context, page, options?)` &mdash; Use this when you explicitly choose the initial active page.
- Factory: `PlaywrightBrowserAgent.create(context, options?)` &mdash; Use this when you want Midscene to choose or create the initial active page. It uses `initialPage` if provided; otherwise it reuses the first existing context page, or creates a new page if none exists.
- `initialPage: Page` &mdash; Initial Playwright page for the factory.
- `autoFollowNewPage: boolean` &mdash; Automatically switch the active page when the context opens a new page. Default `false`.
- `newPageTimeout: number` &mdash; Timeout for `waitForNewPage`. Default `5000`.
- `activePage: Page` &mdash; Current page controlled by the Browser Agent.
- `pages()` &mdash; List pages from the bound browser context.
- `newPage()` &mdash; Create a new page and make it active.
- `setActivePage(page: Page)` &mdash; Explicitly set which Playwright page the Browser Agent controls next.
- `waitForNewPage(action?, options?)` &mdash; Wait for a newly opened page without implicitly switching the active page.
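
Because `waitForNewPage` does not switch the active page by itself, a caller that wants to work in the new page has to call `setActivePage` explicitly. The sketch below shows one way to do that against a locally declared stand-in interface. `BrowserAgentLike` and `runInNewPage` are hypothetical names, and it is an assumption (not confirmed by the text) that `waitForNewPage` resolves to the newly opened page.

```typescript
// Hypothetical stand-in for the Browser Agent members listed above.
// Assumption: waitForNewPage runs the optional action and resolves to the new page.
interface BrowserAgentLike<P> {
  activePage: P;
  setActivePage(page: P): void | Promise<void>;
  waitForNewPage(action?: () => Promise<void>): Promise<P>;
}

// Opens a new page, runs `work` with that page active,
// then restores the previously active page even if `work` fails.
async function runInNewPage<P>(
  agent: BrowserAgentLike<P>,
  openPage: () => Promise<void>,
  work: () => Promise<void>,
): Promise<void> {
  const original = agent.activePage;
  const opened = await agent.waitForNewPage(openPage);
  await agent.setActivePage(opened);
  try {
    await work();
  } finally {
    await agent.setActivePage(original);
  }
}
```

When the subsequent steps should simply continue in the new page, enabling `autoFollowNewPage` avoids this manual switching.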

### Examples

#### Quick start
14 changes: 7 additions & 7 deletions apps/site/docs/en/yaml-script-runner.mdx
@@ -7,7 +7,7 @@ Midscene defines a YAML-based scripting format so you can quickly author automat
For example, you can write a YAML script like this:

```yaml
web:
page:
url: https://www.bing.com

tasks:
@@ -70,7 +70,7 @@ npm i @midscene/cli --save-dev
Create `bing-search.yaml` to drive a web browser:

```yaml
web:
page:
url: https://www.bing.com

tasks:
@@ -160,7 +160,7 @@ After execution, the output directory contains:

### Run in headed mode

> `web` scenarios only
> Web page scenarios only

Headed mode opens the browser window. By default, scripts run headless.

@@ -178,10 +178,10 @@ midscene /path/to/yaml --keep-window

CDP mode lets YAML scripts connect to an existing browser instance via Chrome DevTools Protocol, without launching a new browser. This is useful for reusing an existing browser session, connecting to remote browsers, or cloud browser services.

Set `cdpEndpoint` in the `web` section:
Set `cdpEndpoint` in the `page` section:

```diff
web:
page:
url: https://www.bing.com
+ cdpEndpoint: ws://localhost:9222/devtools/browser
```
@@ -194,12 +194,12 @@ CDP mode and bridge mode are mutually exclusive. In CDP mode, Midscene will only

### Use bridge mode

> `web` scenarios only
> Web page scenarios only

Bridge mode lets YAML scripts drive your existing desktop browser so you can reuse cookies, extensions, or state. Install the Chrome extension, then add:

```diff
web:
page:
url: https://www.bing.com
+ bridgeMode: newTabWithUrl
```