52 changes: 52 additions & 0 deletions .agents/skills/use-appclaw-agent-cli/SKILL.md
@@ -0,0 +1,52 @@
---
name: use-appclaw-agent-cli
description: >
  Use the appclaw-agent CLI to directly open, inspect, and interact with a
  mobile app via terminal commands, without writing a YAML flow. Trigger this
  skill when the user asks to open an app, tap or fill a UI element, check
  visibility, or perform any one-off device interaction that does not require a
  reusable flow file.
---

# AppClaw Agent CLI

When the user asks you to operate or inspect a mobile device interactively
(open an app, tap a button, check visibility, etc.) using terminal commands
rather than a YAML flow:

1. Verify that `appclaw-agent` is installed, then run `appclaw-agent help workflow`.
2. Use a descriptive named session for the task.
3. Inspect with `snapshot -i --json` before choosing a target.
4. Prefer returned `@eN` references or durable selectors for interaction.
5. Take a new snapshot after each state-changing action.
6. Use `--vision` only when explicitly requested, or when visual targeting is required and vision is configured.
7. Close the named session when the task is complete.

## Scrolling: direction reference

**Always use `scroll`, never `swipe`.** The parser treats the two as aliases, but `swipe up` is ambiguous: in everyday touch-gesture usage, swiping up reveals content below (a scroll down), while AppClaw treats `swipe up` as `scroll up`. `scroll` has no such ambiguity: `scroll down` reveals content below and `scroll up` reveals content above.

| Goal | Command |
| ----------------------------------- | ------------------------------------------------------- |
| See content **below** (scroll down) | `appclaw-agent --session <name> scroll down --json` |
| See content **above** (scroll up) | `appclaw-agent --session <name> scroll up --json` |
| Scroll down within an element | `appclaw-agent --session <name> scroll @eN down --json` |
| Scroll up within an element | `appclaw-agent --session <name> scroll @eN up --json` |

**Never use `swipe @eN <direction>`.** Element-scoped swipe crashes with `swipeElement is not a function`; use `scroll @eN <direction>` instead.
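
For scripted use, the direction table above can be encoded in a small helper so that a viewing goal ("see content below") always becomes a `scroll` command and never a `swipe`. This is a sketch, not part of `appclaw-agent`: `scroll_command` is a hypothetical helper name, it only prints the command line instead of running it, and the ref check assumes element refs always look like `@e1` (`@e` followed by digits).

```sh
# Hypothetical helper: map a viewing goal to the scroll command from the table.
# Usage: scroll_command <session> <below|above> [@eN]
# It only prints the command; it never produces a `swipe`.
scroll_command() {
  session=$1
  goal=$2
  ref=${3-}
  case $goal in
    below) direction=down ;; # see content below -> scroll down
    above) direction=up ;;   # see content above -> scroll up
    *)
      printf 'unknown goal: %s (expected below or above)\n' "$goal" >&2
      return 1
      ;;
  esac
  if [ -z "$ref" ]; then
    printf '%s\n' "appclaw-agent --session $session scroll $direction --json"
    return 0
  fi
  # Element-scoped scrolling needs an @eN ref taken from the latest snapshot.
  case $ref in
    @e | @e*[!0-9]*)
      printf 'invalid element ref: %s\n' "$ref" >&2
      return 1
      ;;
    @e*) printf '%s\n' "appclaw-agent --session $session scroll $ref $direction --json" ;;
    *)
      printf 'invalid element ref: %s\n' "$ref" >&2
      return 1
      ;;
  esac
}
```

For example, `scroll_command login above @e3` prints `appclaw-agent --session login scroll @e3 up --json`.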

## Assertions must always be visual

**Never use DOM presence (`is visible`, snapshot element checks) as the sole assertion.** The DOM may contain elements that are off-screen, scrolled out of view, or clipped, so an element's presence in the DOM does not mean the user can see it.

For every assertion or verification step:

1. Take a screenshot: `appclaw-agent --session <name> screenshot /tmp/<name>.png`
2. Read the screenshot image with the Read tool and visually analyze what is actually rendered on screen.
3. Base your pass/fail verdict **only on what you can see in the screenshot**, not on DOM presence.
4. If the target content is not clearly visible in the screenshot, the assertion **fails** — even if a DOM element exists for it.

This applies to any check phrased as "verify X is present", "confirm X appears", "assert X is visible", or similar.
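
Step 1 of the procedure above can be wrapped so that each verification step gets its own screenshot file and an empty capture is caught before any visual verdict is attempted. A sketch, assuming a POSIX shell: `capture_assertion_screenshot` and the `APPCLAW_AGENT` override (used only to substitute a stub when testing) are hypothetical and not part of `appclaw-agent`, and the per-step file name is an illustrative choice. The pass/fail judgment itself still comes from reading the image.

```sh
# Hypothetical helper: capture one screenshot per assertion step and print its path.
# Usage: capture_assertion_screenshot <session> <step-label>
# APPCLAW_AGENT is a hypothetical test override; it defaults to the real CLI.
capture_assertion_screenshot() {
  session=$1
  step=$2
  shot="/tmp/${session}-${step}.png"
  # Remove any stale file so an old image can never stand in for the current screen.
  rm -f "$shot"
  "${APPCLAW_AGENT:-appclaw-agent}" --session "$session" screenshot "$shot" || return 1
  # A missing or empty file cannot support a visual verdict, so treat it as a failure.
  if [ ! -s "$shot" ]; then
    printf 'no screenshot captured at %s\n' "$shot" >&2
    return 1
  fi
  printf '%s\n' "$shot"
}
```

The printed path is the file to open with the Read tool; if the helper fails, the assertion fails too.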

The installed CLI help is the source of truth for supported commands.
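
The numbered workflow at the top of this skill (verify the CLI, use a named session, snapshot, act, re-snapshot, close) can be sketched as one shell function that always closes its session, even when a step fails. This is an illustration under stated assumptions: `run_session_task` and the `APPCLAW_AGENT` test override are hypothetical, the platform is fixed to Android as in the README example, and the target ref is passed in up front only to keep the example short. In a real task the agent picks the ref after reading the first snapshot. The `open`, `snapshot`, `press`, and `close` invocations are the ones shown in the README.

```sh
# Hypothetical sketch of the workflow: verify, open, inspect, act, re-inspect, close.
# Usage: run_session_task <session> <app-id> <@eN>
# APPCLAW_AGENT is a hypothetical test override; it defaults to the real CLI.
run_session_task() {
  session=$1
  app_id=$2
  ref=$3
  agent=${APPCLAW_AGENT:-appclaw-agent}
  # Step 1: make sure the CLI is available before touching the device.
  if ! command -v "$agent" >/dev/null 2>&1; then
    printf '%s not found; install it with: npm install -g appclaw-agent\n' "$agent" >&2
    return 127
  fi
  status=0
  # Steps 2-5: chain with && so the first failing command stops the sequence.
  "$agent" --session "$session" open "$app_id" --platform android &&
    "$agent" --session "$session" snapshot -i --json >"/tmp/${session}-before.json" &&
    "$agent" --session "$session" press "$ref" --json &&
    "$agent" --session "$session" snapshot -i --json >"/tmp/${session}-after.json" ||
    status=$?
  # Step 7: close the named session whether or not the steps above succeeded.
  "$agent" --session "$session" close
  return "$status"
}
```

The sketch chains commands with `&&` instead of relying on `set -e`, because `set -e` is ignored inside a function that is called from an `if` condition or an `||` list, which would let a failed step run on silently.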
8 changes: 4 additions & 4 deletions .github/workflows/ci.yml
@@ -18,10 +18,10 @@ jobs:

      - uses: actions/setup-node@v4
        with:
-          node-version: '20'
+          node-version: '22'

      - name: Install dependencies
-        run: npm install --no-package-lock
+        run: npm ci

      - name: Format check (Prettier)
        run: npm run format:check
@@ -43,13 +43,13 @@ jobs:

      - uses: actions/setup-node@v4
        with:
-          node-version: '20'
+          node-version: '22'
          cache: npm
          cache-dependency-path: vscode-extension/package-lock.json

      - name: Install dependencies
        working-directory: vscode-extension
-        run: npm ci
+        run: npm install --no-package-lock

      - name: Build
        working-directory: vscode-extension
43 changes: 43 additions & 0 deletions .github/workflows/publish-agent.yml
@@ -0,0 +1,43 @@
name: Release appclaw-agent

on:
  push:
    branches: [main]
  workflow_dispatch:

permissions:
  contents: write
  issues: write
  pull-requests: write
  id-token: write

jobs:
  release:
    name: Semantic Release
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: packages/appclaw-agent
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
          persist-credentials: false

      - uses: actions/setup-node@v4
        with:
          node-version: '22'
          registry-url: 'https://registry.npmjs.org'

      - name: Install dependencies
        run: npm install --no-package-lock

      - name: Build
        run: npm run build

      - name: Semantic Release
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          NPM_TOKEN: ${{ secrets.NPM_TOKEN }}
          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
        run: npx semantic-release
28 changes: 26 additions & 2 deletions README.md
@@ -73,7 +73,7 @@ Screenshot-first mode using Stark (df-vision + Gemini) for element location. Req
```env
LLM_PROVIDER=gemini
LLM_API_KEY=your-gemini-api-key
-LLM_MODEL=gemini-3.1-flash-lite-preview
+LLM_MODEL=gemini-3.1-flash-lite
AGENT_MODE=vision
```

@@ -285,7 +285,7 @@ All configuration is via `.env`:
| **LLM** | | |
| `LLM_PROVIDER` | `gemini` | LLM provider (`anthropic`, `openai`, `gemini`, `groq`, `ollama`) |
| `LLM_API_KEY` | — | API key for your provider (not used for local Ollama; see `OLLAMA_*` for cloud URL / auth) |
-| `LLM_MODEL` | (auto) | Model override (e.g. `gemini-3.1-flash-lite-preview`, `claude-sonnet-4-20250514`) |
+| `LLM_MODEL` | (auto) | Model override (e.g. `gemini-3.1-flash-lite`, `claude-sonnet-4-20250514`) |
| `OLLAMA_BASE_URL` | (default) | Ollama API base URL (e.g. remote or Docker). Empty = `http://127.0.0.1:11434` (`LLM_PROVIDER=ollama`) |
| `OLLAMA_API_KEY` | — | Optional Bearer token for Ollama Cloud or authenticated endpoints (`LLM_PROVIDER=ollama`) |
| `AGENT_MODE` | `vision` | `dom` (XML locators) or `vision` (screenshot-first) |
@@ -377,6 +377,30 @@ This installs two skills:

Skills are auto-discovered if you're working inside a clone of this repo.

## Agent-Driven Device CLI

For Claude Code, Gemini CLI, Codex CLI, and other agents that can run terminal
commands, install the separate agent-native CLI:

```sh
npm install -g appclaw-agent
appclaw-agent help workflow
```

`appclaw-agent` maintains named device sessions across commands and returns
compact UI references for deterministic interaction:

```sh
appclaw-agent --session login open com.example.app --platform android
appclaw-agent --session login snapshot -i --json
appclaw-agent --session login press @e1 --json
appclaw-agent --session login close
```

Install the `use-appclaw-agent-cli` skill to teach a supported agent this
workflow. Vision operations are opt-in: pass `--vision` explicitly, and only
when AppClaw vision is configured.

## License

Licensed under the Apache License, Version 2.0. See `LICENSE` for the full text.
10 changes: 8 additions & 2 deletions landing/index.html
@@ -1852,7 +1852,12 @@ <h1 class="reveal reveal-delay-1">
<div class="providers-label">Where you run it</div>
<p class="run-surface-text">
<a href="https://www.npmjs.com/package/appclaw" target="_blank" rel="noopener">CLI</a>
-(<code>npx appclaw</code>) for terminals and CI, or the
+(<code>npx appclaw</code>) for terminals and CI,
+<a href="https://www.npmjs.com/package/appclaw-agent" target="_blank" rel="noopener"
+>Agent CLI</a
+>
+(<code>npm i -g appclaw-agent</code>) for AI coding agents like Claude Code and Gemini
+CLI, or the
<a
href="https://marketplace.visualstudio.com/items?itemName=AppClaw.appclaw"
target="_blank"
@@ -2328,7 +2333,8 @@ <h2 class="reveal reveal-delay-1">Ready to automate your<br />mobile apps?</h2>
</a>
</div>
<div class="cta-meta reveal">
-CLI<span>·</span>Cursor &amp; VS Code<span>·</span>Apache 2.0<span>·</span>BYO LLM Key
+CLI<span>·</span>Agent CLI<span>·</span>Cursor &amp; VS Code<span>·</span>Apache
+2.0<span>·</span>BYO LLM Key
</div>
</div>
</section>