|
37 | 37 | - [Event Triggers](#event-triggers) |
38 | 38 | - [Run History](#run-history) |
39 | 39 | - [Report Generation](#report-generation) |
| 40 | + - [Observability (Prometheus / OpenTelemetry)](#observability-prometheus--opentelemetry) |
40 | 41 | - [Remote Automation (Socket / REST)](#remote-automation-socket--rest) |
41 | 42 | - [Plugin Loader](#plugin-loader) |
42 | 43 | - [Shell Command Execution](#shell-command-execution) |
|
60 | 61 | - **Image Recognition** — locate UI elements on screen using OpenCV template matching with configurable threshold |
61 | 62 | - **Accessibility Element Finder** — query the OS accessibility tree (Windows UIA / macOS AX) to locate buttons, menus, and controls by name/role |
62 | 63 | - **AI Element Locator (VLM)** — describe a UI element in plain language and let a vision-language model (Anthropic / OpenAI) find its screen coordinates |
63 | | -- **OCR** — extract text from screen regions using Tesseract; wait for, click, or locate rendered text; regex search and full-region dump |
| 64 | +- **OCR** — extract text from screen regions through three pluggable backends (Tesseract for ASCII, EasyOCR for CJK without an external binary, PaddleOCR for highest-quality Chinese / Japanese / Korean). Single unified API + canonical language codes; backend chosen by `backend=` kwarg, `AUTOCONTROL_OCR_BACKEND` env var, or auto-detection. Wait for, click, or locate rendered text; regex search and full-region dump |
64 | 65 | - **LLM Action Planner** — translate a plain-language description into a validated `AC_*` action list using Claude |
65 | 66 | - **Runtime Variables & Control Flow** — `${var}` substitution at execution time, plus `AC_set_var` / `AC_inc_var` / `AC_if_var` / `AC_for_each` / `AC_loop` / `AC_retry` for data-driven scripts |
66 | 67 | - **Remote Desktop** — stream this machine's screen and accept remote input over a token-authenticated TCP protocol, *or* connect to another machine and view + control it (host + viewer GUIs included). Optional TLS (HTTPS-grade encryption), WebSocket transport (ws:// + wss:// for browser / firewall-friendly clients), persistent 9-digit Host ID, host→viewer audio streaming, bidirectional clipboard sync (text + image), and chunked file transfer (drag-drop + progress bar; arbitrary destination path; no size cap). Plus folder sync (additive mirror — local deletions never propagate) and a self-hosted coturn TURN config bundle generator (turnserver.conf + systemd unit + docker-compose + README). **AnyDesk-style popout**: when the viewer authenticates, the live remote desktop opens in its own resizable top-level window so the control panel stays uncluttered. The Remote Desktop tabs are wrapped in `QScrollArea` so the panel stays usable on small windows and stretches edge-to-edge on 4K displays. Driveable headlessly via `je_auto_control` and over MCP through the new `ac_remote_*` tools |
|
94 | 95 | - **OpenAPI 3.1 + Swagger UI** — `GET /openapi.json` (auth-gated, generated from the live route table) + `GET /docs` (browser Swagger UI with bearer token bar). Drift test in CI catches new routes added without metadata. |
95 | 96 | - **Configuration Bundle** — single-file JSON export/import of user config (admin hosts, address book, trusted viewers, known hosts, host service, IDs). Atomic write with `<name>.bak.<timestamp>` backups; CLI `python -m je_auto_control.utils.config_bundle export|import`; `POST /config/{export,import}`; GUI buttons on the REST API tab. |
96 | 97 | - **USB Passthrough (experimental, opt-in)** — wire-level protocol over a WebRTC `usb` DataChannel (10 opcodes, CREDIT-based flow control, 16 KiB payload cap). Host-side `UsbPassthroughSession` end-to-end on the Linux libusb backend; Windows `WinUSB` backend with full ctypes wiring (hardware-unverified); macOS `IOKit` skeleton. Viewer-side blocking client (`UsbPassthroughClient` → `ClientHandle.control_transfer / bulk_transfer / interrupt_transfer`). Persistent ACL (`~/.je_auto_control/usb_acl.json`, default deny, mode 0600) with host-side prompt QDialog and tamper-evident audit-log integration. Default off — opt-in via `enable_usb_passthrough(True)` or `JE_AUTOCONTROL_USB_PASSTHROUGH=1`. Phase 2e external security review checklist included; default-on requires sign-off. |
| 98 | +- **Observability (Prometheus + OpenTelemetry)** — stdlib-only `Counter` / `Gauge` / `Histogram` registry with a tiny built-in HTTP exporter on `/metrics`, plus an OpenTelemetry-compatible tracer that upgrades to real OTel spans when the SDK is installed. The executor and agent loop emit `autocontrol_action_calls_total{action,outcome}`, `autocontrol_action_duration_seconds`, and `autocontrol_agent_steps_total{tool,outcome}` automatically — drop the URL into a Prometheus scrape config and you have a Grafana dashboard with zero per-script wiring. |
97 | 99 |
|
98 | 100 | --- |
99 | 101 |
|
@@ -334,6 +336,14 @@ third-party components and their licenses. |
334 | 336 |
|
335 | 337 | ## Quick Start |
336 | 338 |
|
| 339 | +Looking for copy-pasteable end-to-end scripts instead of API snippets? |
| 340 | +The [`examples/`](examples/) directory has 17 self-contained programs |
| 341 | +covering screenshot + click, OCR, the headless scheduler, remote |
| 342 | +desktop, the agent loop, observability, recording / replay, runtime |
| 343 | +variables, window management, hotkeys, image triggers, HTML reports, |
| 344 | +the MCP stdio bridge, the REST API, the secrets vault, and plugin |
| 345 | +loading. |
| 346 | + |
337 | 347 | ### Mouse Control |
338 | 348 |
|
339 | 349 | ```python |
@@ -463,12 +473,26 @@ ac.click_text("Submit") |
463 | 473 | ac.wait_for_text("Loading complete", timeout=15.0) |
464 | 474 | ``` |
465 | 475 |
|
| 476 | +Backend selection: set `AUTOCONTROL_OCR_BACKEND` to `tesseract`, `easyocr`, or `paddleocr`,
| 477 | +or pass `backend=` per call; otherwise auto-detection picks the first
| 478 | +backend that imports:
| 479 | + |
| 480 | +```python |
| 481 | +ac.find_text_matches("登入", lang="chi_tra", backend="easyocr") |
| 482 | +ac.click_text("Sign in", backend="tesseract") |
| 483 | +``` |
| 484 | + |
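The selection order described above can be sketched as a small resolver. This is a minimal illustration, not AutoControl's implementation: `resolve_ocr_backend`, the `_BACKEND_MODULES` mapping, the module names, the kwarg-over-env precedence, and the probe order are all assumptions made for the example.

```python
import importlib.util
import os

# Hypothetical mapping from backend name to the module it needs.
# Module names and probe order are assumptions for this sketch.
_BACKEND_MODULES = {
    "tesseract": "pytesseract",
    "easyocr": "easyocr",
    "paddleocr": "paddleocr",
}


def _module_present(name):
    """Return True if the backend's module can be located (without importing it)."""
    return importlib.util.find_spec(_BACKEND_MODULES[name]) is not None


def resolve_ocr_backend(backend=None, env=None, is_available=_module_present):
    """Pick a backend: explicit kwarg, then env var, then auto-detection."""
    env = os.environ if env is None else env

    # An explicit request (kwarg first, env var second) skips detection.
    requested = backend or env.get("AUTOCONTROL_OCR_BACKEND")
    if requested:
        requested = requested.strip().lower()
        if requested not in _BACKEND_MODULES:
            raise ValueError(f"unknown OCR backend: {requested!r}")
        return requested

    # Auto-detection: the first backend whose module is available wins.
    for name in _BACKEND_MODULES:
        if is_available(name):
            return name
    raise RuntimeError("no OCR backend is installed")
```

Passing `env` and `is_available` explicitly keeps the resolver easy to exercise in tests without touching the real environment or installing any OCR package.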
466 | 485 | If Tesseract is not on `PATH`, point at it explicitly: |
467 | 486 |
|
468 | 487 | ```python |
469 | 488 | ac.set_tesseract_cmd(r"C:\Program Files\Tesseract-OCR\tesseract.exe") |
470 | 489 | ``` |
471 | 490 |
|
| 491 | +Backend install paths and the canonical lang-code table are in |
| 492 | +[docs/source/Eng/doc/ocr_backends/ocr_backends_doc.rst](docs/source/Eng/doc/ocr_backends/ocr_backends_doc.rst) |
| 493 | +(or the [繁體中文](docs/source/Zh/doc/ocr_backends/ocr_backends_doc.rst) |
| 494 | +version). |
| 495 | + |
472 | 496 | Dump every recognised text record in a region (or full screen), or |
473 | 497 | search by regex when the text varies: |
474 | 498 |
|
@@ -1086,6 +1110,36 @@ xml_string = je_auto_control.generate_xml() |
1086 | 1110 |
|
1087 | 1111 | Reports include: function name, parameters, timestamp, and exception info (if any) for each recorded action. HTML reports display successful actions in cyan and failed actions in red. |
1088 | 1112 |
|
| 1113 | +### Observability (Prometheus / OpenTelemetry) |
| 1114 | + |
| 1115 | +Stdlib-only metric primitives plus an OpenTelemetry-compatible tracer |
| 1116 | +fallback. The executor and agent loop emit call counts and latency |
| 1117 | +histograms automatically — no per-script wiring required. |
| 1118 | + |
| 1119 | +```python |
| 1120 | +import je_auto_control as ac |
| 1121 | + |
| 1122 | +# Expose /metrics on http://127.0.0.1:9090 for Prometheus to scrape. |
| 1123 | +exporter = ac.default_metrics_exporter() |
| 1124 | +exporter.start() |
| 1125 | + |
| 1126 | +# Add your own metric — same shapes as prometheus_client. |
| 1127 | +counter = ac.default_metric_registry().register(ac.MetricCounter( |
| 1128 | + "myapp_widgets_built_total", "widgets built", |
| 1129 | + label_names=("kind",), |
| 1130 | +)) |
| 1131 | +counter.inc(labels={"kind": "blue"}) |
| 1132 | + |
| 1133 | +# Wrap a callable in a span — no-op until opentelemetry-api is installed. |
| 1134 | +@ac.traced("my_pipeline.process_one") |
| 1135 | +def process_one(item): ... |
| 1136 | +``` |
| 1137 | + |
| 1138 | +Built-in metrics are listed in |
| 1139 | +[docs/source/Eng/doc/observability/observability_doc.rst](docs/source/Eng/doc/observability/observability_doc.rst) |
| 1140 | +(or the [繁體中文](docs/source/Zh/doc/observability/observability_doc.rst) |
| 1141 | +version). |
| 1142 | + |
1089 | 1143 | ### Remote Automation (Socket / REST) |
1090 | 1144 |
|
1091 | 1145 | Two servers are available — a raw TCP socket and a stdlib HTTP/REST |
@@ -1348,6 +1402,13 @@ cd AutoControl |
1348 | 1402 | pip install -r dev_requirements.txt |
1349 | 1403 | ``` |
1350 | 1404 |
|
| 1405 | +Reproducible installs use the committed `uv.lock`: |
| 1406 | + |
| 1407 | +```bash |
| 1408 | +uv sync # install pinned versions across the whole dep tree |
| 1409 | +uv lock --upgrade # re-resolve and upgrade every pinned version (plain `uv lock` suffices after editing pyproject.toml)
| 1410 | +``` |
| 1411 | + |
1351 | 1412 | ### Running Tests |
1352 | 1413 |
|
1353 | 1414 | ```bash |
|