4 changes: 4 additions & 0 deletions apps/report/src/components/detail-panel/index.tsx
Original file line number Diff line number Diff line change
@@ -261,6 +261,10 @@ const DetailPanel = (): JSX.Element => {
highlightElements = [activeTask.output.element];
}

if (Array.isArray(activeTask.output?.elements)) {
highlightElements = [...highlightElements, ...activeTask.output.elements];
}

// Extract elements from param
if (activeTask.param) {
// For Planning tasks, extract from output.actions[0].param
3 changes: 2 additions & 1 deletion apps/report/src/components/store/index.tsx
@@ -207,7 +207,8 @@ export const useExecutionDump = create<DumpStoreType>((set, get) => {
console.log('will set task', task);
if (
task.type === 'Insight' ||
- (task.type === 'Planning' && task.subType === 'Locate')
+ (task.type === 'Planning' &&
+ (task.subType === 'Locate' || task.subType === 'LocateAll'))
) {
const dump = getTaskServiceDump(task);
set({
45 changes: 45 additions & 0 deletions apps/site/docs/en/api.mdx
@@ -981,6 +981,51 @@ const locateInfo = await agent.aiLocate(
console.log(locateInfo);
```

### `agent.aiLocateAll()`

Locate all elements that match a natural language description in one model call.

- Type

```typescript
function aiLocateAll(
locate: string | Object,
options?: Object,
): Promise<
Array<{
rect: {
left: number;
top: number;
width: number;
height: number;
};
center: [number, number];
dpr?: number; // device pixel ratio
}>
>;
```

- Parameters:

- `locate: string | Object` - A natural language description shared by all target elements, or [prompting with images](#prompting-with-images).
- `options?: Object` - Optional configuration object. `uiContext` and image prompt options are supported. Single-element locate options (`xpath`, `cacheable`, `deepLocate`, `deepThink`) are not supported by `aiLocateAll()`; passing any of them throws an error.

- Return Value:

- Returns all visible elements that match the description.
- Results are ordered top to bottom, then left to right.
- Returns an empty array when no matching element is found.
- Like `aiLocate()`, `rect` may be an approximate box when the underlying model only supports point grounding. Prefer `center` when you only need a stable click target.

- Examples:

```typescript
const buttons = await agent.aiLocateAll('all Add to cart buttons');
for (const button of buttons) {
console.log(button.center);
}
```
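
As a rough illustration of consuming the result, the sketch below re-sorts a result array in reading order and collects click targets. `sortInReadingOrder` and `clickTargets` are hypothetical helpers, not part of Midscene, and the exact comparison (no tolerance for slightly misaligned rows) is an assumption; the ordering logic the library actually uses is not shown here.

```typescript
// Shapes mirror the documented return type of aiLocateAll().
type Rect = { left: number; top: number; width: number; height: number };
type LocatedElement = { rect: Rect; center: [number, number]; dpr?: number };

// Hypothetical helper: sort top to bottom, then left to right.
function sortInReadingOrder(elements: LocatedElement[]): LocatedElement[] {
  // Copy first so the caller's array is left untouched.
  return [...elements].sort(
    (a, b) => a.rect.top - b.rect.top || a.rect.left - b.rect.left,
  );
}

// Hypothetical helper: collect click targets; an empty input yields an empty list.
function clickTargets(elements: LocatedElement[]): Array<[number, number]> {
  return sortInReadingOrder(elements).map((element) => element.center);
}

// Made-up sample data standing in for an aiLocateAll() result.
const sample: LocatedElement[] = [
  { rect: { left: 200, top: 100, width: 80, height: 30 }, center: [240, 115] },
  { rect: { left: 20, top: 300, width: 80, height: 30 }, center: [60, 315] },
  { rect: { left: 20, top: 100, width: 80, height: 30 }, center: [60, 115] },
];

console.log(clickTargets(sample));
console.log(clickTargets([]));
```

Working from `center` rather than `rect` follows the note above about point-grounding models, where `rect` may only be approximate.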

### `agent.aiWaitFor()`

Wait until a specified condition, described in natural language, becomes true. Considering the cost of AI calls, the check interval will not exceed the specified `checkIntervalMs`.
45 changes: 45 additions & 0 deletions apps/site/docs/zh/api.mdx
@@ -974,6 +974,51 @@ const locateInfo = await agent.aiLocate('页面顶部的登录按钮');
console.log(locateInfo);
```

### `agent.aiLocateAll()`

Locate all matching elements from a natural language description in a single model call.

- Type

```typescript
function aiLocateAll(
locate: string | Object,
options?: Object,
): Promise<
Array<{
rect: {
left: number;
top: number;
width: number;
height: number;
};
center: [number, number];
dpr?: number; // device pixel ratio
}>
>;
```

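- Parameters:

- `locate: string | Object` - A natural language description shared by all target elements, or [prompting with images](#使用图片作为提示词).
- `options?: Object` - Optional configuration object. `uiContext` and image prompt options are supported. Single-element locate optimizations such as `xpath`, the locate cache, and `deepLocate` are not applied to `aiLocateAll()`; passing these options throws an error.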

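- Return Value:

- Returns all elements that are visible and match the description.
- Results are ordered top to bottom, then left to right.
- Returns an empty array when no matching element is found.
- Like `aiLocate()`, `rect` may be an approximate box when the underlying model only supports point grounding. Prefer `center` when you only need a stable click target.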

- Examples:

```typescript
const buttons = await agent.aiLocateAll('所有加入购物车按钮');
for (const button of buttons) {
console.log(button.center);
}
```

### `agent.aiWaitFor()`

Wait for a condition to be met. Considering the cost of AI services, the check interval will not exceed `checkIntervalMs` milliseconds.
62 changes: 62 additions & 0 deletions packages/core/src/agent/agent.ts
@@ -18,6 +18,7 @@ import {
type ExecutionRecorderItem,
type ExecutionTask,
type ExecutionTaskLog,
type LocateAllOption,
type LocateOption,
type LocateResultElement,
type OnTaskStartTip,
@@ -75,6 +76,7 @@ import {
TaskExecutionError,
TaskExecutor,
locatePlanForLocate,
locatePlanForLocateAll,
withFileChooser,
} from './tasks';
import {
@@ -100,6 +102,10 @@ const defaultServiceExtractOption: ServiceExtractOption = {
screenshotIncluded: true,
};

type LocateAllResultItem = Pick<LocateResultElement, 'rect' | 'center'> & {
dpr?: number;
};

export type AiActOptions = {
cacheable?: boolean;
fileChooserAccept?: string | string[];
@@ -116,6 +122,30 @@ type AiActInternalOptions = AiActOptions & {
};
};

const unsupportedLocateAllOptionKeys = [
'deepLocate',
'deepThink',
'xpath',
'cacheable',
'fileChooserAccept',
] as const;

function assertLocateAllOptionsSupported(opt?: LocateAllOption) {
if (!opt || typeof opt !== 'object') {
return;
}

const providedUnsupportedKeys = unsupportedLocateAllOptionKeys.filter((key) =>
Object.prototype.hasOwnProperty.call(opt, key),
);
assert(
providedUnsupportedKeys.length === 0,
`aiLocateAll does not support these single-element locate options: ${providedUnsupportedKeys.join(
', ',
)}. Supported options are uiContext and image prompt options.`,
);
}
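
A minimal, self-contained sketch of the guard above, for readers who want to try its behavior in isolation. The key list is copied from `unsupportedLocateAllOptionKeys`; `findUnsupportedKeys` and `assertSupported` are hypothetical stand-ins that throw a plain `Error` instead of calling the shared `assert`. One consequence of the own-property check is worth noting: a key that is present but set to `undefined` is still rejected.

```typescript
// Keys mirrored from unsupportedLocateAllOptionKeys in this diff.
const unsupportedKeys = [
  'deepLocate',
  'deepThink',
  'xpath',
  'cacheable',
  'fileChooserAccept',
] as const;

// Hypothetical stand-in: list the unsupported keys present on the options object.
function findUnsupportedKeys(opt?: Record<string, unknown>): string[] {
  if (!opt || typeof opt !== 'object') {
    return [];
  }
  // Presence is checked by own property, not by value.
  return unsupportedKeys.filter((key) =>
    Object.prototype.hasOwnProperty.call(opt, key),
  );
}

// Hypothetical stand-in for the assertion: throw when any unsupported key is present.
function assertSupported(opt?: Record<string, unknown>): void {
  const found = findUnsupportedKeys(opt);
  if (found.length > 0) {
    throw new Error(`aiLocateAll does not support: ${found.join(', ')}`);
  }
}

try {
  assertSupported({ xpath: undefined });
} catch (error) {
  console.log((error as Error).message);
}
```

Callers that build an options object by spreading `aiLocate()` options would therefore need to delete those keys rather than blank them out.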

export class Agent<
InterfaceType extends AbstractInterface = AbstractInterface,
> {
@@ -1141,6 +1171,38 @@ export class Agent<
} as Pick<LocateResultElement, 'rect' | 'center'>;
}

async aiLocateAll(
prompt: TUserPrompt,
opt?: LocateAllOption,
): Promise<LocateAllResultItem[]> {
assertLocateAllOptionsSupported(opt);
const locateParam = buildDetailedLocateParam(prompt, opt);
assert(locateParam, 'cannot get locate param for aiLocateAll');
const locateAllParam = { prompt: locateParam.prompt };
const locatePlan = locatePlanForLocateAll(locateAllParam);
const plans = [locatePlan];
const defaultModel = this.resolveModelRuntime('default');
const planningModel = this.resolveModelRuntime('planning');

const { output } = await this.taskExecutor.runPlans(
taskTitleStr('LocateAll', locateParamStr(locateAllParam)),
plans,
planningModel,
defaultModel,
opt?.uiContext ? { uiContext: opt.uiContext } : undefined,
);

const { elements } = output;

return (elements || []).map(
(element: LocateResultElement & { dpr?: number }) => ({
rect: element.rect,
center: element.center,
dpr: element.dpr,
}),
);
}
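
The final mapping step in `aiLocateAll()` can be sketched on its own, assuming the task output may omit `elements` and may carry internal fields that should not reach callers. `InternalElement` (including its `description` field) and `toPublicResults` are hypothetical names used only for illustration.

```typescript
type Rect = { left: number; top: number; width: number; height: number };

// Hypothetical internal shape; `description` stands in for any extra field.
type InternalElement = {
  rect: Rect;
  center: [number, number];
  dpr?: number;
  description?: string;
};

// Public shape, matching LocateAllResultItem in this diff.
type LocateAllResultItem = {
  rect: Rect;
  center: [number, number];
  dpr?: number;
};

// Project each element onto the public shape; a missing list becomes [].
function toPublicResults(elements?: InternalElement[]): LocateAllResultItem[] {
  return (elements || []).map((element) => ({
    rect: element.rect,
    center: element.center,
    dpr: element.dpr,
  }));
}

const projected = toPublicResults([
  {
    rect: { left: 0, top: 0, width: 10, height: 10 },
    center: [5, 5],
    dpr: 2,
    description: 'internal only',
  },
]);
console.log(projected);
```

Picking fields explicitly, rather than spreading the element, keeps the returned objects limited to `rect`, `center`, and `dpr`.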

async aiAssert(
assertion: TUserPrompt,
msg?: string,
120 changes: 120 additions & 0 deletions packages/core/src/agent/task-builder.ts
@@ -12,7 +12,9 @@ import type {
ExecutionTaskActionApply,
ExecutionTaskApply,
ExecutionTaskHitBy,
ExecutionTaskPlanningLocateAllApply,
ExecutionTaskPlanningLocateApply,
LocateAllResultWithDump,
LocateResultElement,
LocateResultWithDump,
PlanningAction,
@@ -98,6 +100,16 @@ export function locatePlanForLocate(param: string | DetailedLocateParam) {
return locatePlan;
}

export function locatePlanForLocateAll(param: string | DetailedLocateParam) {
const locate = normalizeLocateParam(param);
const locatePlan: PlanningAction<PlanningLocateParam> = {
type: 'LocateAll',
param: locate,
thought: '',
};
return locatePlan;
}

interface TaskBuilderDeps {
interfaceInstance: AbstractInterface;
service: Service;
@@ -175,6 +187,14 @@ export class TaskBuilder {
context,
),
],
[
'LocateAll',
(plan) =>
this.handleLocateAllPlan(
plan as PlanningAction<PlanningLocateParam>,
context,
),
],
['Finished', (plan) => this.handleFinishedPlan(plan, context)],
]);

@@ -213,6 +233,14 @@ export class TaskBuilder {
context.tasks.push(taskLocate);
}

private async handleLocateAllPlan(
plan: PlanningAction<PlanningLocateParam>,
context: PlanBuildContext,
): Promise<void> {
const taskLocate = this.createLocateAllTask(plan, plan.param, context);
context.tasks.push(taskLocate);
}

private async handleActionPlan(
plan: PlanningAction,
context: PlanBuildContext,
@@ -708,4 +736,96 @@ export class TaskBuilder {

return taskLocator;
}

private createLocateAllTask(
plan: PlanningAction<PlanningLocateParam>,
detailedLocateParam: DetailedLocateParam | string,
context: PlanBuildContext,
): ExecutionTaskPlanningLocateAllApply {
const { defaultModel, abortSignal } = context;
const locateParam = normalizeLocateParam(detailedLocateParam);

const taskLocator: ExecutionTaskPlanningLocateAllApply = {
type: 'Planning',
subType: 'LocateAll',
param: locateParam,
thought: plan.thought,
executor: async (param, taskContext) => {
const { task } = taskContext;
let { uiContext } = taskContext;

assert(
param?.prompt,
`No prompt to locate all, param=${JSON.stringify(param)}`,
);

if (!uiContext) {
uiContext = await this.service.contextRetrieverFn();
}

assert(uiContext, 'uiContext is required for Service task');

let locateDump: ServiceDump | undefined;
let locateResult: LocateAllResultWithDump | undefined;

const applyDump = (dump?: ServiceDump) => {
if (!dump) {
return;
}
locateDump = dump;
task.log = {
dump,
rawResponse: dump.taskInfo?.rawResponse,
rawChoiceMessage: dump.taskInfo?.rawChoiceMessage,
};
task.usage = withUsageIntent(dump.taskInfo?.usage, 'default');
if (dump.taskInfo?.reasoning_content) {
task.reasoning_content = dump.taskInfo.reasoning_content;
}
};

const timing = taskContext.task.timing;
try {
setTimingFieldOnce(timing, 'callAiStart');
locateResult = await this.service.locateAll(
param,
{
context: uiContext,
},
defaultModel,
abortSignal,
);
applyDump(locateResult.dump);
} catch (error) {
if (error instanceof ServiceError) {
applyDump(error.dump);
}
throw error;
} finally {
setTimingFieldOnce(timing, 'callAiEnd');
}

const invalidElementReason = locateResult.elements
.map((element) => invalidLocateElementReason(element))
.find((reason): reason is string => !!reason);
if (invalidElementReason) {
if (locateDump) {
throw new ServiceError(invalidElementReason, locateDump);
}
throw new Error(invalidElementReason);
}

return {
output: {
elements: locateResult.elements.map((element) => ({
...element,
dpr: uiContext.deprecatedDpr,
})),
},
};
},
};

return taskLocator;
}
}
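
The validation step in `createLocateAllTask` fails the whole task on the first invalid element. A small sketch of that `map` plus type-guarded `find` pattern follows. The body of `invalidReason` is a made-up validator, since `invalidLocateElementReason` is not shown in this diff, and `Candidate` is a hypothetical type.

```typescript
type Rect = { left: number; top: number; width: number; height: number };
type Candidate = { rect?: Rect; center?: [number, number] };

// Hypothetical validator: returns a reason string, or undefined when the element is fine.
function invalidReason(element: Candidate): string | undefined {
  if (!element.rect) {
    return 'element has no rect';
  }
  if (element.rect.width <= 0 || element.rect.height <= 0) {
    return 'element rect has no area';
  }
  return undefined;
}

// Return the first failure in input order; the type guard narrows the result to string.
function firstInvalidReason(elements: Candidate[]): string | undefined {
  return elements
    .map((element) => invalidReason(element))
    .find((reason): reason is string => !!reason);
}

const valid: Candidate = {
  rect: { left: 0, top: 0, width: 10, height: 10 },
  center: [5, 5],
};
const zeroWidth: Candidate = {
  rect: { left: 0, top: 0, width: 0, height: 10 },
};

console.log(firstInvalidReason([valid, zeroWidth, {}]));
```

Because only the first reason is reported, one bad element hides any later ones; collecting all reasons would be an alternative design, at the cost of a longer error message.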
2 changes: 1 addition & 1 deletion packages/core/src/agent/tasks.ts
@@ -37,7 +37,7 @@ import { assert } from '@midscene/shared/utils';
import { ExecutionSession } from './execution-session';
import { TaskBuilder } from './task-builder';
import type { TaskCache } from './task-cache';
- export { locatePlanForLocate } from './task-builder';
+ export { locatePlanForLocate, locatePlanForLocateAll } from './task-builder';
import { setTimingFieldOnce } from '@/task-timing';
import { descriptionOfTree } from '@midscene/shared/extractor';
import { type TaskTitleType, taskTitleStr } from './ui-utils';