| 1 | +You are interacting with an Android emulator running Kolibri, a WebView-based app. Your goal is to inspect the current screen state and, if needed, interact with the UI to accomplish the user's task. |
| 2 | + |
| 3 | +## Step 1: Capture the screen |
| 4 | + |
| 5 | +```bash |
| 6 | +mkdir -p /tmp/claude |
| 7 | +adb exec-out screencap -p > /tmp/claude/screenshot.png |
| 8 | +``` |
| 9 | + |
| 10 | +Then read the screenshot image at `/tmp/claude/screenshot.png` to see what's on screen. |
| 11 | + |
| 12 | +## Step 2: Inspect the UI |
| 13 | + |
| 14 | +Kolibri renders its UI inside a WebView, so there are two separate inspection tools, and each one sees only part of the screen. Use the right one: |
| 15 | + |
| 16 | +### For WebView content (Kolibri UI — buttons, text, forms, navigation): |
| 17 | +```bash |
| 18 | +python3 scripts/cdp_helper.py dump |
| 19 | +``` |
| 20 | +This uses the Chrome DevTools Protocol (CDP) to list all visible DOM elements with their `text`, `id`, `classes`, and `role`. It is the same view Maestro gets when using `androidWebViewHierarchy: devtools`. |
| 21 | + |
| 22 | +You can also click WebView elements directly: |
| 23 | +```bash |
| 24 | +python3 scripts/cdp_helper.py click "CONTINUE" |
| 25 | +python3 scripts/cdp_helper.py click "EXPLORE" |
| 26 | +``` |
| 27 | + |
| 28 | +Or run arbitrary JavaScript: |
| 29 | +```bash |
| 30 | +python3 scripts/cdp_helper.py js "document.title" |
| 31 | +``` |
| 32 | + |
| 33 | +### For native Android UI (system dialogs, permission prompts, toasts): |
| 34 | +```bash |
| 35 | +adb shell uiautomator dump /sdcard/window_dump.xml && adb shell cat /sdcard/window_dump.xml |
| 36 | +``` |
| 37 | +Use this when you see a native Android dialog (e.g. "Allow notifications?" or another permission request). These are **not** visible to CDP. Parse the XML to find the element by its `text` attribute, read its `bounds`, then tap the center of those bounds with `adb shell input tap <x> <y>`. |
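As a rough sketch of the parsing step, the function below prints the `bounds` of the first node whose `text` attribute matches exactly. `bounds_for_text` is a hypothetical helper name, not part of adb or uiautomator. It assumes that attribute values in the dump never contain a literal `>` (they are expected to appear escaped as `&gt;`); a real XML parser would be more robust.

```bash
# Hypothetical helper: read uiautomator XML on stdin and print the bounds
# of the first node whose text attribute equals $1.
bounds_for_text() {
  # 1. Split on '>' so each tag sits on its own line (the dump is one long line).
  # 2. Keep tags whose text attribute matches $1 exactly (fixed-string match).
  # 3. Print the bounds value of the first match.
  tr '>' '\n' \
    | grep -F " text=\"$1\"" \
    | sed -n 's/.* bounds="\([^"]*\)".*/\1/p' \
    | head -n 1
}
```

One possible use is `adb shell cat /sdcard/window_dump.xml | bounds_for_text "Allow"`, which prints a bounds string of the form `[left,top][right,bottom]`, or nothing if no node matches.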
| 38 | + |
| 39 | +**How to tell which tool to use:** If the screenshot shows a system dialog with rounded corners overlaying the app, use uiautomator. For everything else (Kolibri's own UI), use CDP. |
| 40 | + |
| 41 | +## Step 3: Check recent logs (if needed) |
| 42 | + |
| 43 | +```bash |
| 44 | +adb logcat -d -t 50 |
| 45 | +``` |
| 46 | + |
| 47 | +For Kolibri-specific logs: |
| 48 | +```bash |
| 49 | +adb logcat -d -t 50 -s python.stdout:V python.stderr:V KolibriWebView:V KolibriServer:V |
| 50 | +``` |
| 51 | + |
| 52 | +## Step 4: Interact with the UI |
| 53 | + |
| 54 | +### WebView elements (preferred) |
| 55 | +Use the CDP helper to click by text — this avoids coordinate math entirely: |
| 56 | +```bash |
| 57 | +python3 scripts/cdp_helper.py click "Button Text" |
| 58 | +``` |
| 59 | + |
| 60 | +### Native elements (system dialogs only) |
| 61 | +Derive tap coordinates from uiautomator `bounds="[left,top][right,bottom]"`: |
| 62 | +- x = (left + right) / 2 |
| 63 | +- y = (top + bottom) / 2 |
| 64 | + |
| 65 | +```bash |
| 66 | +adb shell input tap <x> <y> |
| 67 | +``` |
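The center arithmetic above can be sketched as a small shell function. `bounds_center` is a hypothetical helper, not an adb command, and it assumes non-negative integer coordinates in the `[left,top][right,bottom]` form.

```bash
# Hypothetical helper: print "x y" for the center of a uiautomator bounds string.
# Assumes non-negative integer coordinates, e.g. "[0,1200][1080,1400]".
bounds_center() {
  # Replace every non-digit with a space: "[l,t][r,b]" -> " l t  r b "
  coords=$(printf '%s' "$1" | tr -c '0-9' ' ')
  # Word-split the four numbers into $1..$4 (left, top, right, bottom).
  set -- $coords
  if [ "$#" -ne 4 ]; then
    echo "unrecognized bounds string" >&2
    return 1
  fi
  # Integer midpoint on each axis.
  echo "$(( ($1 + $3) / 2 )) $(( ($2 + $4) / 2 ))"
}
```

Because the function prints two space-separated numbers, the result can be passed straight through, for example `adb shell input tap $(bounds_center "[0,1200][1080,1400]")`.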
| 68 | + |
| 69 | +### Other interactions |
| 70 | +```bash |
| 71 | +adb shell input text "<text>" # Type text (encode each space as %s) |
| 72 | +adb shell input swipe 540 1500 540 500 300 # Swipe up to scroll down (x1 y1 x2 y2 duration_ms) |
| 73 | +adb shell input keyevent 4 # BACK |
| 74 | +adb shell input keyevent 66 # ENTER |
| 75 | +``` |
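The `%s` encoding for `input text` can be sketched as a one-line helper. `encode_input_text` is a hypothetical name, and the sketch handles spaces only; characters that are special to the device shell (quotes, `&`, `;` and similar) may need extra escaping that is not shown here.

```bash
# Hypothetical helper: replace each space with %s, the form that
# `adb shell input text` expects. Other special characters are left untouched.
encode_input_text() {
  printf '%s' "$1" | sed 's/ /%s/g'
}
```

A call would then look like `adb shell input text "$(encode_input_text 'my device name')"`.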
| 76 | + |
| 77 | +## Step 5: Verify |
| 78 | + |
| 79 | +After every interaction, take another screenshot and read it to confirm the action had the intended effect. Repeat the inspect-act loop until the task is complete. |
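As a coarse pre-check before reading the new screenshot, the two captures can be compared byte for byte. This is only a sketch: `screen_changed` is a hypothetical helper, and a difference does not prove the action had the intended effect, since something unrelated such as the status bar clock may also change the image. It complements the visual check and does not replace it.

```bash
# Hypothetical helper: succeed (exit 0) only when the two files differ.
# cmp exits 0 for identical files, 1 for different files, >1 on error.
screen_changed() {
  status=0
  cmp -s "$1" "$2" || status=$?
  [ "$status" -eq 1 ]
}
```

One way to use it is to copy the previous capture to a second path (say `/tmp/claude/before.png`, an assumed name) before acting, take a fresh screenshot, and run `screen_changed /tmp/claude/before.png /tmp/claude/screenshot.png`.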
| 80 | + |
| 81 | +## Workflow summary |
| 82 | + |
| 83 | +1. Screenshot -> Read image -> Inspect (CDP for WebView, uiautomator for native) -> Understand state |
| 84 | +2. Decide action -> Click via CDP or tap via adb -> Screenshot again -> Verify |
| 85 | +3. Repeat until done |
| 86 | + |
| 87 | +Always read the screenshot image visually — the CDP dump shows text content but not layout, and uiautomator cannot see inside the WebView. |