17 changes: 12 additions & 5 deletions apps/site/docs/en/web-api-reference.mdx
@@ -15,8 +15,8 @@ PuppeteerAgent, PlaywrightAgent, and Chrome Bridge share one action space; the M
- `Scroll` — Scroll from an element or screen center; supports scroll-to-top/bottom/left/right helpers.
- `DragAndDrop` — Drag from one element to another.
- `LongPress` — Long-press a target element with optional duration.
-- `Swipe` — Touch-style swipe gesture (available when `enableTouchEventsInActionSpace` is `true`).
-- `Pinch` — Two-finger pinch gesture for zoom in/out (available when `enableTouchEventsInActionSpace` is `true`; Chromium-based browsers only for Playwright).
+- `Swipe` — Touch-style swipe gesture (available when `interactionMode` is `touch`, or the legacy `enableTouchEventsInActionSpace` is `true`).
+- `Pinch` — Two-finger pinch gesture for zoom in/out (available when `interactionMode` is `touch`, or the legacy `enableTouchEventsInActionSpace` is `true`; Chromium-based browsers only for Playwright).
- `ClearInput` — Clear the contents of an input field.
- `Navigate` — Open a URL in the current tab.
- `Reload` — Reload the page.
@@ -47,7 +47,8 @@ In addition to the base agent options, Puppeteer exposes:
- `forceSameTabNavigation: boolean` — Restrict navigation to the current tab. Default `true`.
- `waitForNavigationTimeout: number` — Maximum wait when a step causes navigation. Default `5000` (set `0` to skip waiting).
- `waitForNetworkIdleTimeout: number` — Wait for network idle between actions to reduce flakiness. Default `2000` (set `0` to skip waiting).
-- `enableTouchEventsInActionSpace: boolean` — Add touch gestures (like swipe) to the action space so the agent can handle touch-only interactions. Default `false`.
+- `interactionMode: 'mouse' | 'touch'` — Choose the overall interaction mode. Default `mouse`. `touch` exposes touch gestures in the action space and uses gesture-based scrolling by default.
+- `enableTouchEventsInActionSpace: boolean` — Legacy compatibility option. When `true`, it behaves like `interactionMode: 'touch'` for touch actions and default scrolling.
- `forceChromeSelectRendering: boolean` — Force `select` elements to render with Chrome's base-select styling so they're visible in screenshots/element extraction; requires Puppeteer > `24.6.0`.
- `customActions: DeviceAction[]` — Register bespoke actions defined via `defineAction` so planning can call domain-specific steps.
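
The relationship between `interactionMode` and the legacy `enableTouchEventsInActionSpace` flag described in the options above can be pictured with a small sketch. The function `resolveInteractionModeSketch` and its option type are hypothetical names used for illustration only; the library's own `resolveWebPageInteractionOptions` is not shown in these docs, and the behavior when both options are set with conflicting values is an assumption.

```typescript
// Hypothetical sketch, not the library's resolveWebPageInteractionOptions.
type InteractionModeSketch = 'mouse' | 'touch';

interface InteractionOptionsSketch {
  interactionMode?: InteractionModeSketch;
  // Legacy flag kept for compatibility.
  enableTouchEventsInActionSpace?: boolean;
}

function resolveInteractionModeSketch(
  opts: InteractionOptionsSketch = {},
): InteractionModeSketch {
  // Following the docs: touch is active when interactionMode is 'touch',
  // or when the legacy flag is true.
  // Assumption: if the two options conflict, touch wins.
  if (opts.interactionMode === 'touch') return 'touch';
  if (opts.enableTouchEventsInActionSpace === true) return 'touch';
  // Documented default.
  return 'mouse';
}
```

Under this reading, existing configurations that only set the legacy flag keep working, while new code can pass `interactionMode: 'touch'` directly.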

@@ -133,7 +134,8 @@ const agent = new PlaywrightAgent(page, {
- `forceSameTabNavigation: boolean` — Keep automation inside the active tab. Default `true`.
- `waitForNavigationTimeout: number` — Wait time for navigation completion. Default `5000` (set `0` to disable).
- `waitForNetworkIdleTimeout: number` — Wait between actions for network idle. Default `2000` (set `0` to disable).
-- `enableTouchEventsInActionSpace: boolean` — Add touch gestures (like swipe) to the action space so the agent can handle touch-only interactions. Default `false`.
+- `interactionMode: 'mouse' | 'touch'` — Choose the overall interaction mode. Default `mouse`. `touch` exposes touch gestures in the action space and uses gesture-based scrolling by default. `touch` requires a Chromium-based browser for Playwright.
+- `enableTouchEventsInActionSpace: boolean` — Legacy compatibility option. When `true`, it behaves like `interactionMode: 'touch'` for touch actions and default scrolling.
- `forceChromeSelectRendering: boolean` — Force `select` elements to render with Chrome's base-select styling so they're visible in screenshots/element extraction; requires Playwright ≥ `1.52.0`.
- `customActions: DeviceAction[]` — Extend planning with project-specific actions.

@@ -241,18 +243,23 @@ Call `connectCurrentTab` or `connectNewTabWithUrl` before issuing other actions.
```ts
 function connectCurrentTab(options?: {
   forceSameTabNavigation?: boolean;
+  interactionMode?: 'mouse' | 'touch';
 }): Promise<void>;
```

- `options.forceSameTabNavigation` (default `true`) intercepts new tabs and opens them in the current tab to simplify debugging; set to `false` if you want normal new-tab behavior (create a separate agent per tab).
+- `options.interactionMode` (default `'mouse'`) controls the connected tab as mouse or touch. `touch` uses gesture-based scrolling by default.
- Resolves on a successful handshake with the active tab; rejects if the extension is not allowed to connect.

#### `connectNewTabWithUrl()`

```ts
 function connectNewTabWithUrl(
   url: string,
-  options?: { forceSameTabNavigation?: boolean },
+  options?: {
+    forceSameTabNavigation?: boolean;
+    interactionMode?: 'mouse' | 'touch';
+  },
 ): Promise<void>;
```

17 changes: 12 additions & 5 deletions apps/site/docs/zh/web-api-reference.mdx
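@@ -15,8 +15,8 @@ PuppeteerAgent, PlaywrightAgent, and Chrome Bridge share one Action Space; M
 - `Scroll`: Scroll starting from an element or from the center of the screen; supports scrolling to the top/bottom/left/right.
 - `DragAndDrop`: Drag from one element to another.
 - `LongPress`: Long-press the target element, with an optional custom duration.
-- `Swipe`: Touch-style swipe (available when `enableTouchEventsInActionSpace` is enabled).
-- `Pinch`: Two-finger pinch gesture for zooming in/out (available when `enableTouchEventsInActionSpace` is enabled; on Playwright, Chromium-based browsers only).
+- `Swipe`: Touch-style swipe (available when `interactionMode` is `touch`, or with the legacy option `enableTouchEventsInActionSpace: true`).
+- `Pinch`: Two-finger pinch gesture for zooming in/out (available when `interactionMode` is `touch`, or with the legacy option `enableTouchEventsInActionSpace: true`; on Playwright, Chromium-based browsers only).
 - `ClearInput`: Clear the contents of an input field.
 - `Navigate`: Open the specified URL in the current tab.
 - `Reload`: Reload the current page.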
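@@ -47,7 +47,8 @@ const agent = new PuppeteerAgent(page, {
 - `forceSameTabNavigation: boolean`: Always keep navigation within the current tab. Default `true`.
 - `waitForNavigationTimeout: number`: Maximum wait time when an action triggers a page navigation. Default `5000` (set to `0` to skip waiting).
 - `waitForNetworkIdleTimeout: number`: Time to wait for network idle after each action. Default `2000` (set to `0` to disable).
-- `enableTouchEventsInActionSpace: boolean`: Add touch gestures (such as swipe) to the action space, for pages that require touch events. Default `false`.
+- `interactionMode: 'mouse' | 'touch'`: Controls the overall interaction mode. Default `mouse`. When set to `touch`, touch gestures are exposed in the action space and gesture-based scrolling is used by default.
+- `enableTouchEventsInActionSpace: boolean`: Legacy compatibility option. When set to `true`, touch actions and the default scrolling behavior are handled as with `interactionMode: 'touch'`.
 - `forceChromeSelectRendering: boolean`: Force `select` elements to use Chrome's base-select styling, so that native system styling does not make them invisible in screenshots/element extraction; requires Puppeteer > `24.6.0`.
 - `customActions: DeviceAction[]`: Register custom actions via `defineAction` so the planner can call domain-specific steps.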

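@@ -133,7 +134,8 @@ const agent = new PlaywrightAgent(page, {
 - `forceSameTabNavigation: boolean`: Force execution within the current tab. Default `true`.
 - `waitForNavigationTimeout: number`: Time to wait for navigation to complete. Default `5000` (set to `0` to disable).
 - `waitForNetworkIdleTimeout: number`: Time to wait for network idle after each action. Default `2000` (set to `0` to disable).
-- `enableTouchEventsInActionSpace: boolean`: Add touch gestures (such as swipe) to the action space, for pages that require touch events. Default `false`.
+- `interactionMode: 'mouse' | 'touch'`: Controls the overall interaction mode. Default `mouse`. When set to `touch`, touch gestures are exposed in the action space and gesture-based scrolling is used by default. On Playwright, `touch` supports Chromium-based browsers only.
+- `enableTouchEventsInActionSpace: boolean`: Legacy compatibility option. When set to `true`, touch actions and the default scrolling behavior are handled as with `interactionMode: 'touch'`.
 - `forceChromeSelectRendering: boolean`: Force `select` elements to use Chrome's base-select styling, so that native system styling does not make them invisible in screenshots/element extraction; requires Playwright ≥ `1.52.0`.
 - `customActions: DeviceAction[]`: Add project-specific actions for the planner to call.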

@@ -241,18 +243,23 @@ const agent = new AgentOverChromeBridge({
```ts
 function connectCurrentTab(options?: {
   forceSameTabNavigation?: boolean;
+  interactionMode?: 'mouse' | 'touch';
 }): Promise<void>;
```

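 - `options.forceSameTabNavigation` (default `true`) intercepts new tabs and opens them in the current page to simplify debugging; set it to `false` to keep normal new-tab behavior, in which case you need to create a new Agent for each new tab.
+- `options.interactionMode` (default `'mouse'`) controls whether the connected tab uses mouse or touch interaction. When set to `touch`, gesture-based scrolling is used by default.
 - Connects to the currently active tab and returns `Promise<void>` on success; throws an error if the extension has not allowed the connection.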

#### `connectNewTabWithUrl()`

```ts
 function connectNewTabWithUrl(
   url: string,
-  options?: { forceSameTabNavigation?: boolean },
+  options?: {
+    forceSameTabNavigation?: boolean;
+    interactionMode?: 'mouse' | 'touch';
+  },
 ): Promise<void>;
```

14 changes: 13 additions & 1 deletion packages/web-integration/src/bridge-mode/agent-cli-side.ts
@@ -1,5 +1,9 @@
import { Agent, type AgentOpt } from '@midscene/core/agent';
import { assert } from '@midscene/shared/utils';
+import {
+  InteractionMode,
+  resolveWebPageInteractionOptions,
+} from '../web-element';
import { commonWebActionsForWebPage } from '../web-page';
import type { KeyboardAction, MouseAction } from '../web-page';
import {
@@ -51,6 +55,9 @@ export const getBridgePageInCliSide = (options?: {
       await server.call(BridgeEvent.UpdateAgentStatus, [message]);
     },
   };
+  const state = {
+    interactionMode: InteractionMode.Mouse,
+  };

const proxyPage = new Proxy(page, {
get(target, prop, receiver) {
@@ -69,7 +76,8 @@ export const getBridgePageInCliSide = (options?: {
       }

       if (prop === 'actionSpace') {
-        return () => commonWebActionsForWebPage(proxyPage);
+        return () =>
+          commonWebActionsForWebPage(proxyPage, state.interactionMode);
}

if (Object.keys(page).includes(prop)) {
@@ -109,6 +117,8 @@ export const getBridgePageInCliSide = (options?: {
       // Special handling for methods that support timeout in options
       if (prop === 'connectNewTabWithUrl') {
         return async (url: string, options?: BridgeConnectTabOptions) => {
+          state.interactionMode =
+            resolveWebPageInteractionOptions(options).interactionMode;
const timeout = options?.timeout;
const caller = bridgeCaller(prop, timeout);
return await caller(url, options);
@@ -117,6 +127,8 @@ export const getBridgePageInCliSide = (options?: {

       if (prop === 'connectCurrentTab') {
         return async (options?: BridgeConnectTabOptions) => {
+          state.interactionMode =
+            resolveWebPageInteractionOptions(options).interactionMode;
const timeout = options?.timeout;
const caller = bridgeCaller(prop, timeout);
return await caller(options);
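
The pattern in this file (record the interaction mode when a connect method is called, then use it when `actionSpace` is read) can be sketched in isolation. Everything below is a simplified stand-in: `BridgePageSketch`, `createBridgeProxySketch`, and the action name lists are hypothetical and do not reproduce the real `commonWebActionsForWebPage` output.

```typescript
type BridgeModeSketch = 'mouse' | 'touch';

interface ConnectOptionsSketch {
  interactionMode?: BridgeModeSketch;
  timeout?: number;
}

// Hypothetical minimal page surface; the real bridge page has many more members.
interface BridgePageSketch {
  connectCurrentTab(options?: ConnectOptionsSketch): Promise<void>;
  actionSpace(): string[];
}

function createBridgeProxySketch(page: BridgePageSketch): BridgePageSketch {
  // Mutable state shared by the intercepted properties, defaulting to mouse.
  const state: { interactionMode: BridgeModeSketch } = {
    interactionMode: 'mouse',
  };

  return new Proxy(page, {
    get(target, prop, receiver) {
      if (prop === 'connectCurrentTab') {
        return async (options?: ConnectOptionsSketch) => {
          // Record the mode before forwarding the call.
          state.interactionMode = options?.interactionMode ?? 'mouse';
          return target.connectCurrentTab(options);
        };
      }
      if (prop === 'actionSpace') {
        // Illustrative action names only; touch mode adds gesture actions.
        return () =>
          state.interactionMode === 'touch'
            ? ['Scroll', 'LongPress', 'Swipe', 'Pinch']
            : ['Scroll', 'LongPress'];
      }
      return Reflect.get(target, prop, receiver);
    },
  });
}
```

Because the mode is stored before the underlying connect call is forwarded, a later `actionSpace()` read reflects the most recent connect options.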
8 changes: 8 additions & 0 deletions packages/web-integration/src/bridge-mode/common.ts
@@ -1,3 +1,5 @@
+import type { InteractionMode } from '../web-element';
+
export const DefaultBridgeServerHost = '127.0.0.1';
export const DefaultBridgeServerPort = 3766;
export const DefaultLocalEndpoint = `http://${DefaultBridgeServerHost}:${DefaultBridgeServerPort}`;
@@ -42,6 +44,12 @@ export interface BridgeConnectTabOptions {
    * @default true
    */
   forceSameTabNavigation?: boolean;
+  /**
+   * Choose how the connected tab should be controlled.
+   * `touch` enables gesture-based scrolling by default.
+   * @default 'mouse'
+   */
+  interactionMode?: InteractionMode;
/**
* Custom timeout for connecting to the tab in milliseconds.
* @default 30000 (30 seconds)
11 changes: 10 additions & 1 deletion packages/web-integration/src/bridge-mode/page-browser-side.ts
@@ -1,5 +1,9 @@
import { assert } from '@midscene/shared/utils';
import ChromeExtensionProxyPage from '../chrome-extension/page';
+import {
+  type InteractionMode,
+  resolveWebPageInteractionOptions,
+} from '../web-element';
import type {
ChromePageDestroyOptions,
KeyboardAction,
@@ -34,9 +38,10 @@ export class ExtensionBridgePageBrowserSide extends ChromeExtensionProxyPage {
       type: 'log' | 'status',
     ) => void = () => {},
     forceSameTabNavigation = true,
+    interactionMode?: InteractionMode,
     public onConnectionRequest?: () => Promise<boolean>,
   ) {
-    super(forceSameTabNavigation);
+    super(forceSameTabNavigation, interactionMode);
}

private async setupBridgeClient() {
@@ -181,6 +186,8 @@ export class ExtensionBridgePageBrowserSide extends ChromeExtensionProxyPage {
     if (options?.forceSameTabNavigation) {
       this.forceSameTabNavigation = true;
     }
+    const interactionOptions = resolveWebPageInteractionOptions(options);
+    this.interactionMode = interactionOptions.interactionMode;

await this.setActiveTabId(tabId);
}
@@ -199,6 +206,8 @@ export class ExtensionBridgePageBrowserSide extends ChromeExtensionProxyPage {
     if (options?.forceSameTabNavigation) {
       this.forceSameTabNavigation = true;
     }
+    const interactionOptions = resolveWebPageInteractionOptions(options);
+    this.interactionMode = interactionOptions.interactionMode;

await this.setActiveTabId(tabId);
}
49 changes: 39 additions & 10 deletions packages/web-integration/src/chrome-extension/page.ts
@@ -5,7 +5,11 @@
The page must be active when interacting with it.
*/

-import { limitOpenNewTabScript } from '@/web-element';
+import {
+  type InteractionMode,
+  limitOpenNewTabScript,
+  resolveWebPageInteractionOptions,
+} from '@/web-element';
import type {
ElementCacheFeature,
ElementTreeNode,
@@ -50,6 +54,8 @@ export default class ChromeExtensionProxyPage implements AbstractInterface {

   public forceSameTabNavigation: boolean;

+  public interactionMode: InteractionMode;
+
private viewportSize?: Size;

private activeTabId: number | null = null;
@@ -60,12 +66,19 @@ export default class ChromeExtensionProxyPage implements AbstractInterface {

public _continueWhenFailedToAttachDebugger = false;

-  constructor(forceSameTabNavigation: boolean) {
+  constructor(
+    forceSameTabNavigation: boolean,
+    interactionMode?: InteractionMode,
+  ) {
     this.forceSameTabNavigation = forceSameTabNavigation;
+    const interactionOptions = resolveWebPageInteractionOptions({
+      interactionMode,
+    });
+    this.interactionMode = interactionOptions.interactionMode;
   }

   actionSpace(): DeviceAction[] {
-    return commonWebActionsForWebPage(this);
+    return commonWebActionsForWebPage(this, this.interactionMode);
}

public async setActiveTabId(tabId: number) {
@@ -682,13 +695,29 @@ export default class ChromeExtensionProxyPage implements AbstractInterface {
       const finalX = startX || this.latestMouseX;
       const finalY = startY || this.latestMouseY;
       await this.showMousePointer(finalX, finalY);
-      await this.sendCommandToDebugger('Input.dispatchMouseEvent', {
-        type: 'mouseWheel',
-        x: finalX,
-        y: finalY,
-        deltaX,
-        deltaY,
-      });
+      if (this.interactionMode === 'touch') {
+        await this.sendCommandToDebugger('Input.synthesizeScrollGesture', {
+          x: finalX,
+          y: finalY,
+          // synthesizeScrollGesture uses gesture distances, whose directions are
+          // opposite to wheel deltas for the same visual scroll result.
+          xDistance: -deltaX,
+          yDistance: -deltaY,
+          // speed is measured in pixels per second, so it must stay very high;
+          // otherwise our "scroll to edge" calls would take a long time to finish.
+          speed: 9999999,
+          repeatCount: 0,
+          preventFling: true,
+        });
+      } else {
+        await this.sendCommandToDebugger('Input.dispatchMouseEvent', {
+          type: 'mouseWheel',
+          x: finalX,
+          y: finalY,
+          deltaX,
+          deltaY,
+        });
+      }
this.latestMouseX = finalX;
this.latestMouseY = finalY;
},
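
The branch above maps one scroll request onto two different Chrome DevTools Protocol commands. A minimal sketch of that mapping as a pure function follows; `buildScrollCommandSketch` and `DebuggerCommandSketch` are hypothetical names that are not part of the PR, while the CDP method names and parameters are the ones used in the hunk.

```typescript
type ScrollModeSketch = 'mouse' | 'touch';

interface DebuggerCommandSketch {
  method: string;
  params: Record<string, number | string | boolean>;
}

// Hypothetical helper: returns the command that would be sent to the debugger.
function buildScrollCommandSketch(
  mode: ScrollModeSketch,
  x: number,
  y: number,
  deltaX: number,
  deltaY: number,
): DebuggerCommandSketch {
  if (mode === 'touch') {
    return {
      method: 'Input.synthesizeScrollGesture',
      params: {
        x,
        y,
        // Gesture distances point the opposite way to wheel deltas.
        xDistance: -deltaX,
        yDistance: -deltaY,
        // Very high speed (pixels per second) so long scrolls finish quickly.
        speed: 9999999,
        repeatCount: 0,
        preventFling: true,
      },
    };
  }
  return {
    method: 'Input.dispatchMouseEvent',
    params: { type: 'mouseWheel', x, y, deltaX, deltaY },
  };
}
```

For example, a wheel-style request with `deltaY = 300` becomes a gesture with `yDistance = -300` in touch mode, which is the sign flip the code comment in the hunk describes.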
1 change: 1 addition & 0 deletions packages/web-integration/src/index.ts
@@ -4,6 +4,7 @@ export type { PlayWrightAiFixtureType } from './playwright';
export { Agent as PageAgent, type AgentOpt } from '@midscene/core/agent';
export { PuppeteerAgent } from './puppeteer';
export { PlaywrightAgent } from './playwright';
+export { InteractionMode } from './web-element';
export { StaticPageAgent, StaticPage } from './static';
export { WebMidsceneTools } from './mcp-tools';
export { webPlaygroundPlatform } from './platform';
5 changes: 4 additions & 1 deletion packages/web-integration/src/playwright/ai-fixture.ts
@@ -2,7 +2,7 @@ import { rmSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { PlaywrightAgent, type PlaywrightWebPage } from '@/playwright/index';
-import type { WebPageAgentOpt } from '@/web-element';
+import type { InteractionMode, WebPageAgentOpt } from '@/web-element';
import type { Cache } from '@midscene/core';
import type { AgentOpt, Agent as PageAgent } from '@midscene/core/agent';
import { processCacheConfig } from '@midscene/core/utils';
@@ -60,12 +60,14 @@ export const PlaywrightAiFixture = (options?: {
   forceSameTabNavigation?: boolean;
   waitForNetworkIdleTimeout?: number;
   waitForNavigationTimeout?: number;
+  interactionMode?: InteractionMode;
   cache?: PlaywrightCache;
 }) => {
   const {
     forceSameTabNavigation = true,
     waitForNetworkIdleTimeout = DEFAULT_WAIT_FOR_NETWORK_IDLE_TIMEOUT,
     waitForNavigationTimeout = DEFAULT_WAIT_FOR_NAVIGATION_TIMEOUT,
+    interactionMode,
cache,
} = options ?? {};

@@ -95,6 +97,7 @@ export const PlaywrightAiFixture = (options?: {
       pageAgentMap[idForPage] = new PlaywrightAgent(page, {
         testId: `playwright-${testId}-${idForPage}`,
         forceSameTabNavigation,
+        interactionMode,
cache: cacheConfig,
groupName: title,
groupDescription: file,
18 changes: 17 additions & 1 deletion packages/web-integration/src/playwright/index.ts
@@ -7,7 +7,11 @@ export { PlaywrightAiFixture } from './ai-fixture';
export { overrideAIConfig } from '@midscene/shared/env';
export { WebPage as PlaywrightWebPage } from './page';
export type { WebPageAgentOpt } from '@/web-element';
-import type { WebPageAgentOpt } from '@/web-element';
+export { InteractionMode } from '@/web-element';
+import {
+  type WebPageAgentOpt,
+  resolveWebPageInteractionOptions,
+} from '@/web-element';
import { getDebug } from '@midscene/shared/logger';
import semver from 'semver';
import {
@@ -46,6 +50,18 @@ export class PlaywrightAgent extends PageAgent<PlaywrightWebPage> {
         '[midscene] PlaywrightAgent requires a valid Playwright page instance. Please make sure to pass a valid page object.',
       );
     }
+
+    const { interactionMode } = resolveWebPageInteractionOptions(opts);
+
+    if (interactionMode === 'touch') {
+      const browserName = page.context().browser()?.browserType().name();
+      if (browserName && browserName !== 'chromium') {
+        throw new Error(
+          `[midscene] touch interaction requires a Chromium-based Playwright browser, but current browser is "${browserName}". Gesture scrolling is not supported in Firefox/WebKit.`,
+        );
+      }
+    }
+
const webPage = new PlaywrightWebPage(page, opts);
super(webPage, opts);
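
The constructor guard above can be read as a small standalone check. The sketch below mirrors its logic with a hypothetical function `assertTouchSupportedSketch` that takes the browser name as a plain string, so it runs without Playwright; the error text is shortened and is not the PR's exact message.

```typescript
// Hypothetical standalone version of the guard; not part of the PR.
function assertTouchSupportedSketch(
  interactionMode: 'mouse' | 'touch',
  browserName: string | undefined,
): void {
  // Mouse mode has no browser restriction.
  if (interactionMode !== 'touch') return;
  // As in the hunk, an unknown browser name is allowed through.
  if (browserName && browserName !== 'chromium') {
    throw new Error(
      `touch interaction requires a Chromium-based browser, got "${browserName}"`,
    );
  }
}
```

Note that the check is skipped when the browser name cannot be determined, which matches the `browserName &&` condition in the diff: in that case touch mode is allowed rather than rejected.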
