Use this doc when you need to customize Midscene's Android automation or review Android-only constructor options. For shared parameters (reporting, hooks, caching, etc.), see the platform-agnostic API reference (Common).
AndroidDevice uses the following action space; the Midscene Agent can use these actions while planning tasks:
- `Tap`: Tap an element.
- `DoubleClick`: Double-tap an element.
- `Input`: Enter text with `replace` / `typeOnly` / `clear` modes (`append` is a deprecated alias for `typeOnly`). Supports an optional `autoDismissKeyboard` parameter.
- `Scroll`: Scroll from an element or screen center in any direction, with helpers to reach the top, bottom, left, or right.
- `DragAndDrop`: Drag from one element to another.
- `KeyboardPress`: Press a specified key.
- `LongPress`: Long-press a target element with optional duration.
- `PullGesture`: Pull up or down (e.g., to refresh) with optional distance and duration.
- `Pinch`: Two-finger pinch gesture. Use `scale > 1` to zoom in, `scale < 1` to zoom out.
- `ClearInput`: Clear the contents of an input field.
- `Launch`: Open a web URL or `package/.Activity` string.
- `Terminate`: Force-stop an app by package name.
- `RunAdbShell`: Execute raw `adb shell` commands.
- `AndroidBackButton`: Trigger the system back action.
- `AndroidHomeButton`: Return to the home screen.
- `AndroidRecentAppsButton`: Open the multitasking/recent apps view.
Create a connection to an adb-managed device that an AndroidAgent can drive.
import { AndroidDevice, getConnectedDevices } from '@midscene/android';

const device = new AndroidDevice(deviceId, {
  // device options...
});

- `deviceId: string`: Value returned by `adb devices` or `getConnectedDevices()`.
- `autoDismissKeyboard?: boolean`: Automatically hide the keyboard after input. Default `true`.
- `keyboardDismissStrategy?: 'esc-first' | 'back-first'`: Order for dismissing keyboards. Default `'esc-first'`.
- `androidAdbPath?: string`: Custom path to the adb executable.
- `remoteAdbHost?: string` / `remoteAdbPort?: number`: Point to a remote adb server.
- `imeStrategy?: 'always-yadb' | 'yadb-for-non-ascii'`: Choose when to invoke yadb for text input. Default `'yadb-for-non-ascii'`.
  - `'yadb-for-non-ascii'` (default): Uses yadb for Unicode characters (including Latin Unicode like ö, é, ñ), Chinese, Japanese, and format specifiers (like `%s`, `%d`). Pure ASCII text uses the faster native `adb input text`.
  - `'always-yadb'`: Always uses yadb for all text input, providing maximum compatibility at the cost of slightly slower input for pure ASCII text.
- `displayId?: number`: Target a specific virtual display if the device mirrors multiple displays.
- `screenshotResizeScale?: number`: Deprecated. This option has been removed and no longer has any effect. Use `screenshotShrinkFactor` in `AgentOpt` instead to control the size of screenshots sent to the AI model.
- `minScreenshotBufferSize?: number`: Screenshot buffer size validation threshold in bytes. Buffers below this value are treated as failed or corrupted captures. Default `1024` (1 KB). Set to `0` to skip only this size check; Midscene still rejects empty buffers and invalid image formats.
- `alwaysRefreshScreenInfo?: boolean`: Re-query rotation and screen size every step. Default `false`.
- `scrcpyConfig?: object`: Scrcpy high-performance screenshot configuration, disabled by default. See Scrcpy Screenshot Mode below.
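The `imeStrategy` rules above can be sketched as a small predicate. This is an illustration of the documented behavior only, not Midscene's implementation: `wouldUseYadb` is a hypothetical helper, and it assumes a "format specifier" means a percent sign followed by a letter.

```typescript
// Illustrative only: mirrors the documented imeStrategy rules, not Midscene's source.
type ImeStrategy = 'always-yadb' | 'yadb-for-non-ascii';

// Assumption: a format specifier is "%" followed by a letter (e.g. %s, %d).
const FORMAT_SPECIFIER = /%[a-zA-Z]/;
// Matches any character outside the 7-bit ASCII range (ö, é, ñ, CJK, ...).
const NON_ASCII = /[^\x00-\x7F]/;

// Hypothetical helper: returns true when the documented rules say yadb is used.
function wouldUseYadb(
  text: string,
  strategy: ImeStrategy = 'yadb-for-non-ascii',
): boolean {
  if (strategy === 'always-yadb') return true;
  return NON_ASCII.test(text) || FORMAT_SPECIFIER.test(text);
}
```

Under these assumptions, plain text such as `hello world` would take the native `adb input text` path, while `café` or `100%s` would go through yadb.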
By default, Midscene captures screenshots via adb shell screencap, which takes ~500–2000ms per call. Enabling Scrcpy mode streams H.264 video from the device and captures frames in real time, reducing screenshot latency to approximately 100–200ms.
How to enable:
const device = new AndroidDevice(deviceId, {
  scrcpyConfig: {
    enabled: true,
  },
});

Optional parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `enabled` | `boolean` | `false` | Enable Scrcpy screenshots |
| `maxSize` | `number` | `0` | Max video dimension (width or height). `0` = no scaling |
| `videoBitRate` | `number` | `2000000` | H.264 encoding bitrate (bps) |
| `idleTimeoutMs` | `number` | `30000` | Auto-disconnect after idle (ms). Set to `0` to disable |
:::tip
Scrcpy mode automatically falls back to ADB screenshots if the connection fails. No extra error handling is needed.
:::
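To make the defaults in the table concrete, here is a minimal sketch of how a partial `scrcpyConfig` could be merged with them. The `ScrcpyConfig` interface and `resolveScrcpyConfig` helper are hypothetical names written for this example; only the field names and default values come from the table.

```typescript
// Local type mirroring the documented scrcpyConfig fields (not imported from Midscene).
interface ScrcpyConfig {
  enabled: boolean;
  maxSize: number;       // 0 = no scaling
  videoBitRate: number;  // bits per second
  idleTimeoutMs: number; // 0 = never auto-disconnect
}

// Defaults copied from the parameter table.
const SCRCPY_DEFAULTS: ScrcpyConfig = {
  enabled: false,
  maxSize: 0,
  videoBitRate: 2_000_000,
  idleTimeoutMs: 30_000,
};

// Hypothetical helper: fill in any omitted field with its documented default.
function resolveScrcpyConfig(overrides: Partial<ScrcpyConfig> = {}): ScrcpyConfig {
  return { ...SCRCPY_DEFAULTS, ...overrides };
}

// Example: enable Scrcpy and cap the longer video edge at 1024 px.
const cappedConfig = resolveScrcpyConfig({ enabled: true, maxSize: 1024 });
```

The resolved object has the same shape as the `scrcpyConfig` option, so a value like `cappedConfig` could be passed to the `AndroidDevice` constructor.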
- Discover devices with `getConnectedDevices()`; the `udid` matches `adb devices`.
- Supports remote adb via `remoteAdbHost` / `remoteAdbPort`; set `androidAdbPath` if adb is not on PATH.
- Use `screenshotShrinkFactor` in `AgentOpt` to cut latency on high-DPI devices.
import { AndroidAgent, AndroidDevice, getConnectedDevices } from '@midscene/android';

const [first] = await getConnectedDevices();
const device = new AndroidDevice(first.udid);
await device.connect();

const agent = new AndroidAgent(device, {
  aiActionContext: 'If a permissions dialog appears, accept it.',
});

await agent.launch('https://www.ebay.com');
await agent.aiAct('search "Headphones" and wait for results');
const items = await agent.aiQuery(
  '{itemTitle: string, price: number}[], find item in list and corresponding price',
);
console.log(items);

await agent.launch('com.android.settings/.Settings');
await agent.back();
await agent.home();

Wire Midscene's AI planner to an AndroidDevice for UI automation.
import { AndroidAgent } from '@midscene/android';

const agent = new AndroidAgent(device, {
  // common agent options...
});

- `customActions?: DeviceAction[]`: Extend planning with actions defined via `defineAction`.
- `appNameMapping?: Record<string, string>`: Map friendly app names to package names. When you pass an app name to `launch(target)`, the agent looks up the package name in this mapping. If no mapping is found, it attempts to launch `target` as-is. User-provided mappings take precedence over default mappings.
- All other fields match API constructors: `generateReport`, `reportFileName`, `aiActionContext`, `modelConfig`, `cacheId`, `createOpenAIClient`, `onTaskStartTip`, and more.
:::info
- Use one agent per device connection.
- Android-only helpers such as `launch`, `terminate`, and `runAdbShell` are also exposed in YAML scripts. See Android platform-specific actions.
- For shared interaction methods, see API reference (Common).
:::
Launch a web URL or native Android activity/package.
function launch(target: string): Promise<void>;

- `target: string`: Can be a web URL, a string in `package/.Activity` format (e.g., `com.android.settings/.Settings`), an app package name, or an app name. If you pass an app name and it exists in `appNameMapping`, it is automatically resolved to the mapped package name; otherwise, `target` is launched as-is.
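The accepted `target` forms can be pictured with a small classifier. This is a sketch under stated assumptions, not how Midscene resolves targets internally: `resolveLaunchTarget` and the `LaunchTarget` type are hypothetical, URLs are detected by an `http(s)://` prefix, and any other string containing `/` is treated as `package/.Activity`.

```typescript
// Hypothetical result type for this sketch.
type LaunchTarget =
  | { kind: 'url'; value: string }
  | { kind: 'activity'; value: string }
  | { kind: 'package'; value: string };

// Hypothetical helper illustrating the documented target forms.
function resolveLaunchTarget(
  target: string,
  appNameMapping: Record<string, string> = {},
): LaunchTarget {
  // Assumption: web URLs start with http:// or https://.
  if (/^https?:\/\//i.test(target)) return { kind: 'url', value: target };
  // Assumption: any other string with a slash is package/.Activity.
  if (target.includes('/')) return { kind: 'activity', value: target };
  // App name: use the mapped package if present, otherwise pass through as-is.
  const hasMapping = Object.prototype.hasOwnProperty.call(appNameMapping, target);
  const value = hasMapping ? (appNameMapping[target] ?? target) : target;
  return { kind: 'package', value };
}

// Example mapping: a friendly name resolved to a package name.
const settingsTarget = resolveLaunchTarget('Settings', {
  Settings: 'com.android.settings',
});
```

A name with no mapping entry falls through unchanged, which matches the "launched as-is" behavior described above.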
Run a raw adb shell command through the connected device. Pass only the shell command itself, without the adb shell prefix.
function runAdbShell(command: string, opt?: { timeout?: number }): Promise<string>;

- `command: string`: Command passed verbatim to `adb shell`. For example, use `input tap 100 200`, not `adb shell input tap 100 200`.
- `opt.timeout?: number`: Optional command execution timeout in milliseconds.
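If commands come from existing scripts that already include the `adb shell` prefix, a small normalizer can strip it before calling `runAdbShell`. `toShellCommand` is a hypothetical helper written for this example and is not part of Midscene.

```typescript
// Hypothetical helper: drop a leading "adb shell" so only the shell command remains.
function toShellCommand(input: string): string {
  return input.trim().replace(/^adb\s+shell\s+/, '');
}

// A prefixed command and an already-bare command normalize to the same form.
const tapCommand = toShellCommand('adb shell input tap 100 200');
const batteryCommand = toShellCommand('dumpsys battery');
```

The normalized string could then be passed as the `command` argument, for example `agent.runAdbShell(tapCommand)`.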
const result = await agent.runAdbShell('dumpsys battery', { timeout: 60 * 1000 });
console.log(result);

await agent.runAdbShell('input tap 100 200');

Terminate (force-stop) a running Android app.
function terminate(uri: string): Promise<void>;

- `uri: string`: Package name, app name in `appNameMapping`, or `package/.Activity` (only the package part is used).
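The note that only the package part is used can be illustrated as follows. `packageForTerminate` is a hypothetical helper for this sketch; the order of the steps (mapping lookup first, then dropping everything from the first `/`) is an assumption rather than documented behavior.

```typescript
// Hypothetical helper: derive the package name that would be force-stopped.
function packageForTerminate(
  uri: string,
  appNameMapping: Record<string, string> = {},
): string {
  // Assumption: app names are resolved through the mapping before parsing.
  const hasMapping = Object.prototype.hasOwnProperty.call(appNameMapping, uri);
  const resolved = hasMapping ? (appNameMapping[uri] ?? uri) : uri;
  // Keep only the package part of package/.Activity.
  const slash = resolved.indexOf('/');
  return slash === -1 ? resolved : resolved.slice(0, slash);
}

const fromActivity = packageForTerminate('com.android.settings/.Settings');
const fromName = packageForTerminate('Settings', { Settings: 'com.android.settings' });
```

Both calls in the example resolve to `com.android.settings`.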
await agent.terminate('com.android.settings');

- `agent.back(): Promise<void>`: Trigger the Android system Back action.
- `agent.home(): Promise<void>`: Return to the launcher.
- `agent.recentApps(): Promise<void>`: Open the Recents/Overview screen.
Create an AndroidAgent from any connected adb device.
function agentFromAdbDevice(
  deviceId?: string,
  opts?: PageAgentOpt & AndroidDeviceOpt,
): Promise<AndroidAgent>;

- `deviceId?: string`: Connect to a specific device; omitted means "first available".
- `opts?: PageAgentOpt & AndroidDeviceOpt`: Combine agent options with AndroidDevice settings.
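The "first available" rule for an omitted `deviceId` can be sketched against a device list of the shape returned by `getConnectedDevices()`. `pickDevice` and the local `ConnectedDevice` type are hypothetical, and the sketch assumes "first available" simply means the first entry in the list; Midscene may apply additional checks.

```typescript
// Local type mirroring the documented getConnectedDevices() entries.
interface ConnectedDevice {
  udid: string;
  state: string;
  port?: number;
}

// Hypothetical helper: choose a device by id, or the first one when no id is given.
function pickDevice(devices: ConnectedDevice[], deviceId?: string): ConnectedDevice {
  if (deviceId === undefined) {
    const first = devices[0];
    if (!first) throw new Error('No adb devices connected');
    return first;
  }
  const match = devices.find((d) => d.udid === deviceId);
  if (!match) throw new Error(`Device ${deviceId} not found`);
  return match;
}

// Sample data for illustration only.
const sampleDevices: ConnectedDevice[] = [
  { udid: 'emulator-5554', state: 'device' },
  { udid: 'emulator-5556', state: 'device' },
];
const defaultDevice = pickDevice(sampleDevices);
const namedDevice = pickDevice(sampleDevices, 'emulator-5556');
```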
Enumerate adb devices Midscene can drive.
function getConnectedDevices(): Promise<Array<{
  udid: string;
  state: string;
  port?: number;
}>>;

- Android getting started for setup and scripting steps.