Commit 79d5099

README: surface examples, OCR backends, observability, uv.lock
All three READMEs (English, Simplified Chinese, Traditional Chinese) updated to reflect what landed this session:

- examples/: each README now has a short pointer to the 17-script directory at the top of Quick Start, before the API-snippet sections.
- OCR backends: the OCR feature bullet and the OCR Quick Start section now mention the three pluggable backends (Tesseract / EasyOCR / PaddleOCR) and the AUTOCONTROL_OCR_BACKEND env var, and link to the per-language backend docs.
- Observability: a new Quick Start section and a feature bullet cover the Prometheus /metrics exporter and the OpenTelemetry-compatible tracer, both of which AutoControl ships with stdlib-only implementations.
- uv.lock: Development > Setting Up now shows how to use the committed lockfile (uv sync / uv lock --upgrade).

No changes to docs/source/: the new OCR-backends and observability pages were already wired into both language indexes earlier in the session.
1 parent 9720a5c commit 79d5099

3 files changed

Lines changed: 174 additions & 3 deletions

README.md

Lines changed: 62 additions & 1 deletion
@@ -37,6 +37,7 @@
 - [Event Triggers](#event-triggers)
 - [Run History](#run-history)
 - [Report Generation](#report-generation)
+- [Observability (Prometheus / OpenTelemetry)](#observability-prometheus--opentelemetry)
 - [Remote Automation (Socket / REST)](#remote-automation-socket--rest)
 - [Plugin Loader](#plugin-loader)
 - [Shell Command Execution](#shell-command-execution)
@@ -60,7 +61,7 @@
 - **Image Recognition** — locate UI elements on screen using OpenCV template matching with configurable threshold
 - **Accessibility Element Finder** — query the OS accessibility tree (Windows UIA / macOS AX) to locate buttons, menus, and controls by name/role
 - **AI Element Locator (VLM)** — describe a UI element in plain language and let a vision-language model (Anthropic / OpenAI) find its screen coordinates
-- **OCR** — extract text from screen regions using Tesseract; wait for, click, or locate rendered text; regex search and full-region dump
+- **OCR** — extract text from screen regions through three pluggable backends (Tesseract for ASCII, EasyOCR for CJK without an external binary, PaddleOCR for highest-quality Chinese / Japanese / Korean). Single unified API + canonical language codes; backend chosen by `backend=` kwarg, `AUTOCONTROL_OCR_BACKEND` env var, or auto-detection. Wait for, click, or locate rendered text; regex search and full-region dump
 - **LLM Action Planner** — translate a plain-language description into a validated `AC_*` action list using Claude
 - **Runtime Variables & Control Flow** — `${var}` substitution at execution time, plus `AC_set_var` / `AC_inc_var` / `AC_if_var` / `AC_for_each` / `AC_loop` / `AC_retry` for data-driven scripts
 - **Remote Desktop** — stream this machine's screen and accept remote input over a token-authenticated TCP protocol, *or* connect to another machine and view + control it (host + viewer GUIs included). Optional TLS (HTTPS-grade encryption), WebSocket transport (ws:// + wss:// for browser / firewall-friendly clients), persistent 9-digit Host ID, host→viewer audio streaming, bidirectional clipboard sync (text + image), and chunked file transfer (drag-drop + progress bar; arbitrary destination path; no size cap). Plus folder sync (additive mirror — local deletions never propagate) and a self-hosted coturn TURN config bundle generator (turnserver.conf + systemd unit + docker-compose + README). **AnyDesk-style popout**: when the viewer authenticates, the live remote desktop opens in its own resizable top-level window so the control panel stays uncluttered. The Remote Desktop tabs are wrapped in `QScrollArea` so the panel stays usable on small windows and stretches edge-to-edge on 4K displays. Driveable headlessly via `je_auto_control` and over MCP through the new `ac_remote_*` tools
@@ -94,6 +95,7 @@
 - **OpenAPI 3.1 + Swagger UI** — `GET /openapi.json` (auth-gated, generated from the live route table) + `GET /docs` (browser Swagger UI with bearer token bar). Drift test in CI catches new routes added without metadata.
 - **Configuration Bundle** — single-file JSON export/import of user config (admin hosts, address book, trusted viewers, known hosts, host service, IDs). Atomic write with `<name>.bak.<timestamp>` backups; CLI `python -m je_auto_control.utils.config_bundle export|import`; `POST /config/{export,import}`; GUI buttons on the REST API tab.
 - **USB Passthrough (experimental, opt-in)** — wire-level protocol over a WebRTC `usb` DataChannel (10 opcodes, CREDIT-based flow control, 16 KiB payload cap). Host-side `UsbPassthroughSession` end-to-end on the Linux libusb backend; Windows `WinUSB` backend with full ctypes wiring (hardware-unverified); macOS `IOKit` skeleton. Viewer-side blocking client (`UsbPassthroughClient``ClientHandle.control_transfer / bulk_transfer / interrupt_transfer`). Persistent ACL (`~/.je_auto_control/usb_acl.json`, default deny, mode 0600) with host-side prompt QDialog and tamper-evident audit-log integration. Default off — opt-in via `enable_usb_passthrough(True)` or `JE_AUTOCONTROL_USB_PASSTHROUGH=1`. Phase 2e external security review checklist included; default-on requires sign-off.
+- **Observability (Prometheus + OpenTelemetry)** — stdlib-only `Counter` / `Gauge` / `Histogram` registry with a tiny built-in HTTP exporter on `/metrics`, plus an OpenTelemetry-compatible tracer that upgrades to real OTel spans when the SDK is installed. The executor and agent loop emit `autocontrol_action_calls_total{action,outcome}`, `autocontrol_action_duration_seconds`, and `autocontrol_agent_steps_total{tool,outcome}` automatically — drop the URL into a Prometheus scrape config and you have a Grafana dashboard with zero per-script wiring.

 ---

@@ -334,6 +336,14 @@ third-party components and their licenses.

 ## Quick Start

+Looking for copy-pasteable end-to-end scripts instead of API snippets?
+The [`examples/`](examples/) directory has 17 self-contained programs
+covering screenshot + click, OCR, the headless scheduler, remote
+desktop, the agent loop, observability, recording / replay, runtime
+variables, window management, hotkeys, image triggers, HTML reports,
+the MCP stdio bridge, the REST API, the secrets vault, and plugin
+loading.
+
 ### Mouse Control

 ```python
@@ -463,12 +473,26 @@ ac.click_text("Submit")
 ac.wait_for_text("Loading complete", timeout=15.0)
 ```

+Backend selection — set ``AUTOCONTROL_OCR_BACKEND=tesseract|easyocr|paddleocr``
+or pass ``backend=`` per call; otherwise auto-detection picks the first
+one that imports:
+
+```python
+ac.find_text_matches("登入", lang="chi_tra", backend="easyocr")
+ac.click_text("Sign in", backend="tesseract")
+```
+
 If Tesseract is not on `PATH`, point at it explicitly:

 ```python
 ac.set_tesseract_cmd(r"C:\Program Files\Tesseract-OCR\tesseract.exe")
 ```

+Backend install paths and the canonical lang-code table are in
+[docs/source/Eng/doc/ocr_backends/ocr_backends_doc.rst](docs/source/Eng/doc/ocr_backends/ocr_backends_doc.rst)
+(or the [繁體中文](docs/source/Zh/doc/ocr_backends/ocr_backends_doc.rst)
+version).
+
 Dump every recognised text record in a region (or full screen), or
 search by regex when the text varies:

@@ -1086,6 +1110,36 @@ xml_string = je_auto_control.generate_xml()

 Reports include: function name, parameters, timestamp, and exception info (if any) for each recorded action. HTML reports display successful actions in cyan and failed actions in red.

+### Observability (Prometheus / OpenTelemetry)
+
+Stdlib-only metric primitives plus an OpenTelemetry-compatible tracer
+fallback. The executor and agent loop emit call counts and latency
+histograms automatically — no per-script wiring required.
+
+```python
+import je_auto_control as ac
+
+# Expose /metrics on http://127.0.0.1:9090 for Prometheus to scrape.
+exporter = ac.default_metrics_exporter()
+exporter.start()
+
+# Add your own metric — same shapes as prometheus_client.
+counter = ac.default_metric_registry().register(ac.MetricCounter(
+"myapp_widgets_built_total", "widgets built",
+label_names=("kind",),
+))
+counter.inc(labels={"kind": "blue"})
+
+# Wrap a callable in a span — no-op until opentelemetry-api is installed.
+@ac.traced("my_pipeline.process_one")
+def process_one(item): ...
+```
+
+Built-in metrics are listed in
+[docs/source/Eng/doc/observability/observability_doc.rst](docs/source/Eng/doc/observability/observability_doc.rst)
+(or the [繁體中文](docs/source/Zh/doc/observability/observability_doc.rst)
+version).
+
 ### Remote Automation (Socket / REST)

 Two servers are available — a raw TCP socket and a stdlib HTTP/REST
@@ -1348,6 +1402,13 @@ cd AutoControl
 pip install -r dev_requirements.txt
 ```

+Reproducible installs use the committed `uv.lock`:
+
+```bash
+uv sync # install pinned versions across the whole dep tree
+uv lock --upgrade # refresh after editing pyproject.toml
+```
+
 ### Running Tests

 ```bash

README/README_zh-CN.md

Lines changed: 56 additions & 1 deletion
@@ -36,6 +36,7 @@
 - [事件触发器](#事件触发器)
 - [执行历史](#执行历史)
 - [报告生成](#报告生成)
+- [可观测性(Prometheus / OpenTelemetry)](#可观测性prometheus--opentelemetry)
 - [远程自动化(Socket / REST)](#远程自动化socket--rest)
 - [插件加载器](#插件加载器)
 - [Shell 命令执行](#shell-命令执行)
@@ -59,7 +60,7 @@
 - **图像识别** — 使用 OpenCV 模板匹配在屏幕上定位 UI 元素,支持可配置的检测阈值
 - **Accessibility 元件搜索** — 通过操作系统无障碍树(Windows UIA / macOS AX)按名称/角色定位按钮、菜单、控件
 - **AI 元件定位(VLM)** — 用自然语言描述 UI 元素,由视觉语言模型(Anthropic / OpenAI)返回屏幕坐标
-- **OCR** — 使用 Tesseract 从屏幕提取文字,可搜索、点击或等待文字出现;支持 regex 搜索与整块区域 dump
+- **OCR** — 三个可插拔后端(Tesseract 用于 ASCII、EasyOCR 无外部可执行文件且支持 CJK、PaddleOCR 中/日/韩质量最高),统一 API 与标准语言代码;后端由 `backend=` 参数、`AUTOCONTROL_OCR_BACKEND` 环境变量或自动探测决定。可搜索、点击或等待文字出现;支持 regex 搜索与整块区域 dump
 - **LLM 动作规划器** — 用 Claude 把自然语言描述翻译成验证过的 `AC_*` 动作清单
 - **运行期变量与流程控制** — 执行时 `${var}` 替换,加上 `AC_set_var` / `AC_inc_var` / `AC_if_var` / `AC_for_each` / `AC_loop` / `AC_retry` 让脚本数据驱动
 - **远程桌面** — 用 token 认证的 TCP 协议串流本机画面并接收输入,**或** 连接到他机观看与控制(host + viewer GUI 内置)。可选 TLS(HTTPS 级加密)、WebSocket 传输(``ws://`` + ``wss://``,穿墙/浏览器友好)、持久化 9 位数 Host ID、host→viewer 音频串流、双向剪贴板同步(文字 + 图片)、分块文件传输(拖放 + 进度条;任意目的路径;无大小上限)。另含文件夹同步(增量镜像 — 本地删除不会传出去)与自建 coturn TURN 配置包生成器(turnserver.conf + systemd unit + docker-compose + README)。**AnyDesk 风格弹出窗口**:viewer 认证成功后远程桌面会开在独立的可调整大小顶层窗口,控制面板保持简洁;Remote Desktop 子分页外层包了 `QScrollArea`,小窗口下可滚动、4K 屏幕下会铺满。同时支持 headless API 与 MCP 工具 (`ac_remote_*`) 直接驱动
@@ -331,6 +332,12 @@ sudo apt-get install cmake libssl-dev

 ## 快速开始

+想要可以直接复制粘贴的完整脚本而不只是 API 片段?
+[`examples/`](../examples/) 目录收录 17 个独立示例:截屏+点击、OCR、
+调度器、远程桌面、agent loop、可观测性、录制/回放、运行期变量、
+窗口管理、热键、图像触发器、HTML 报告、MCP stdio bridge、REST API、
+secret vault,以及插件加载。
+
 ### 鼠标控制

 ```python
@@ -457,12 +464,24 @@ ac.click_text("Submit")
 ac.wait_for_text("加载完成", timeout=15.0)
 ```

+选择后端 — 设置 ``AUTOCONTROL_OCR_BACKEND=tesseract|easyocr|paddleocr``
+或在调用时传入 ``backend=``;都不设置时会自动挑第一个 import 成功的:
+
+```python
+ac.find_text_matches("登录", lang="chi_sim", backend="easyocr")
+ac.click_text("Sign in", backend="tesseract")
+```
+
 若 Tesseract 不在 `PATH` 中,可手动指定路径:

 ```python
 ac.set_tesseract_cmd(r"C:\Program Files\Tesseract-OCR\tesseract.exe")
 ```

+各后端安装路径与标准语言代码表见
+[docs/source/Eng/doc/ocr_backends/ocr_backends_doc.rst](../docs/source/Eng/doc/ocr_backends/ocr_backends_doc.rst)
+[繁体中文版本](../docs/source/Zh/doc/ocr_backends/ocr_backends_doc.rst)
+
 把区域(或整屏)内所有识别到的文字 dump 出来,或用 regex 搜索变动内容:

 ```python
@@ -999,6 +1018,35 @@ xml_string = je_auto_control.generate_xml()

 报告内容包含:每个记录动作的函数名称、参数、时间戳及异常信息(如有)。HTML 报告中成功的动作以青色显示,失败的动作以红色显示。

+### 可观测性(Prometheus / OpenTelemetry)
+
+纯标准库的 metric 原语加上 OpenTelemetry 兼容 tracer,
+executor 与 agent loop 会自动发送调用次数与延迟分布 metric,
+不用手动 instrument。
+
+```python
+import je_auto_control as ac
+
+# 在 http://127.0.0.1:9090 开放 /metrics,给 Prometheus scrape。
+exporter = ac.default_metrics_exporter()
+exporter.start()
+
+# 自定义 metric — 形状与 prometheus_client 相同。
+counter = ac.default_metric_registry().register(ac.MetricCounter(
+"myapp_widgets_built_total", "widgets built",
+label_names=("kind",),
+))
+counter.inc(labels={"kind": "blue"})
+
+# 把 callable 包进 span — 未安装 opentelemetry-api 时为 no-op。
+@ac.traced("my_pipeline.process_one")
+def process_one(item): ...
+```
+
+内建 metric 清单见
+[docs/source/Eng/doc/observability/observability_doc.rst](../docs/source/Eng/doc/observability/observability_doc.rst)
+[繁体中文版](../docs/source/Zh/doc/observability/observability_doc.rst)
+
 ### 远程自动化(Socket / REST)

 提供两种服务器:原始 TCP socket 与纯 stdlib HTTP/REST。默认均绑定
@@ -1237,6 +1285,13 @@ cd AutoControl
 pip install -r dev_requirements.txt
 ```

+可复现的安装走已 commit 的 `uv.lock`:
+
+```bash
+uv sync # 依锁文件同步整条依赖链
+uv lock --upgrade # 编辑 pyproject.toml 后重新锁
+```
+
 ### 运行测试

 ```bash

README/README_zh-TW.md

Lines changed: 56 additions & 1 deletion
@@ -36,6 +36,7 @@
 - [事件觸發器](#事件觸發器)
 - [執行歷史](#執行歷史)
 - [報告產生](#報告產生)
+- [可觀測性(Prometheus / OpenTelemetry)](#可觀測性prometheus--opentelemetry)
 - [遠端自動化(Socket / REST)](#遠端自動化socket--rest)
 - [外掛載入器](#外掛載入器)
 - [Shell 命令執行](#shell-命令執行)
@@ -59,7 +60,7 @@
 - **圖像辨識** — 使用 OpenCV 模板匹配在螢幕上定位 UI 元素,支援可設定的偵測閾值
 - **Accessibility 元件搜尋** — 透過作業系統無障礙樹(Windows UIA / macOS AX)依名稱/角色定位按鈕、選單、控制項
 - **AI 元件定位(VLM)** — 用自然語言描述 UI 元素,交由視覺語言模型(Anthropic / OpenAI)取得螢幕座標
-- **OCR** — 使用 Tesseract 從螢幕擷取文字,可搜尋、點擊或等待文字出現;支援 regex 搜尋與整塊區域 dump
+- **OCR** — 三個可插拔後端(Tesseract 用於 ASCII、EasyOCR 不需外部執行檔且支援 CJK、PaddleOCR 中/日/韓品質最佳),統一 API 與標準語言代碼;後端由 `backend=` 參數、`AUTOCONTROL_OCR_BACKEND` 環境變數或自動偵測決定。可搜尋、點擊或等待文字出現;支援 regex 搜尋與整塊區域 dump
 - **LLM 動作規劃器** — 用 Claude 把自然語言描述翻譯成驗證過的 `AC_*` 動作清單
 - **執行期變數與流程控制** — 執行時 `${var}` 取代,加上 `AC_set_var` / `AC_inc_var` / `AC_if_var` / `AC_for_each` / `AC_loop` / `AC_retry` 讓腳本資料驅動
 - **遠端桌面** — 用 token 認證的 TCP 協定串流本機畫面並接收輸入,**或** 連線到他機觀看與控制(host + viewer GUI 皆內建)。可選 TLS(HTTPS 級加密)、WebSocket 傳輸(``ws://`` + ``wss://``,穿牆/瀏覽器友善)、持久化 9 位數 Host ID、host→viewer 音訊串流、雙向剪貼簿同步(文字 + 圖片)、分塊檔案傳輸(拖放 + 進度條;任意目的路徑;無大小上限)。另含資料夾同步(增量鏡像 — 本地刪除不會傳出去)與自架 coturn TURN 設定包產生器(turnserver.conf + systemd unit + docker-compose + README)。**AnyDesk 風格彈出視窗**:viewer 認證成功後遠端桌面會開在獨立的可調整大小頂層視窗,控制面板維持簡潔;Remote Desktop 子分頁外層包了 `QScrollArea`,小視窗下可捲動、4K 螢幕下會延展到整寬。同時可由 headless API 與 MCP 工具(`ac_remote_*`)直接驅動
@@ -331,6 +332,12 @@ sudo apt-get install cmake libssl-dev

 ## 快速開始

+想要可以直接複製貼上的完整腳本而不只是 API 片段?
+[`examples/`](../examples/) 資料夾收錄 17 支獨立範例:截圖+點擊、OCR、
+排程器、遠端桌面、agent loop、可觀測性、錄製/回放、執行期變數、
+視窗管理、熱鍵、影像觸發、HTML 報告、MCP stdio bridge、REST API、
+secret vault,以及外掛載入。
+
 ### 滑鼠控制

 ```python
@@ -457,12 +464,24 @@ ac.click_text("Submit")
 ac.wait_for_text("載入完成", timeout=15.0)
 ```

+選擇後端 — 設定 ``AUTOCONTROL_OCR_BACKEND=tesseract|easyocr|paddleocr``
+或在呼叫時傳入 ``backend=``;都不設定時會自動挑第一個 import 成功的:
+
+```python
+ac.find_text_matches("登入", lang="chi_tra", backend="easyocr")
+ac.click_text("Sign in", backend="tesseract")
+```
+
 若 Tesseract 不在 `PATH` 中,可手動指定路徑:

 ```python
 ac.set_tesseract_cmd(r"C:\Program Files\Tesseract-OCR\tesseract.exe")
 ```

+各後端安裝路徑與標準語言代碼表見
+[docs/source/Eng/doc/ocr_backends/ocr_backends_doc.rst](../docs/source/Eng/doc/ocr_backends/ocr_backends_doc.rst)
+[繁體中文版本](../docs/source/Zh/doc/ocr_backends/ocr_backends_doc.rst)
+
 把區域(或整螢幕)內所有辨識到的文字 dump 出來,或用 regex 搜尋變動內容:

 ```python
@@ -999,6 +1018,35 @@ xml_string = je_auto_control.generate_xml()

 報告內容包含:每個紀錄動作的函式名稱、參數、時間戳記及例外資訊(如有)。HTML 報告中成功的動作以青色顯示,失敗的動作以紅色顯示。

+### 可觀測性(Prometheus / OpenTelemetry)
+
+純標準函式庫的 metric 元件加上 OpenTelemetry 相容 tracer,
+executor 與 agent loop 都會自動發送呼叫次數與延遲分布 metric,
+不用手動 instrument。
+
+```python
+import je_auto_control as ac
+
+# 在 http://127.0.0.1:9090 開放 /metrics,給 Prometheus scrape。
+exporter = ac.default_metrics_exporter()
+exporter.start()
+
+# 自訂 metric — 形狀與 prometheus_client 相同。
+counter = ac.default_metric_registry().register(ac.MetricCounter(
+"myapp_widgets_built_total", "widgets built",
+label_names=("kind",),
+))
+counter.inc(labels={"kind": "blue"})
+
+# 把 callable 包進 span — 未安裝 opentelemetry-api 時為 no-op。
+@ac.traced("my_pipeline.process_one")
+def process_one(item): ...
+```
+
+內建 metric 清單見
+[docs/source/Eng/doc/observability/observability_doc.rst](../docs/source/Eng/doc/observability/observability_doc.rst)
+[繁體中文版本](../docs/source/Zh/doc/observability/observability_doc.rst)
+
 ### 遠端自動化(Socket / REST)

 提供兩種伺服器:原始 TCP socket 與純 stdlib HTTP/REST。預設均綁定
@@ -1237,6 +1285,13 @@ cd AutoControl
 pip install -r dev_requirements.txt
 ```

+可重現的安裝走已 commit 的 `uv.lock`:
+
+```bash
+uv sync # 依鎖檔同步整條相依鏈
+uv lock --upgrade # 編輯 pyproject.toml 後重新鎖
+```
+
 ### 執行測試

 ```bash

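The Observability sections added in this commit describe a stdlib-only exporter that serves Prometheus metrics on `/metrics`. The following hypothetical sketch shows roughly what such an endpoint returns. It renders a labelled counter in the Prometheus text exposition format and serves it with `http.server`. It is not AutoControl's `MetricCounter` or `default_metrics_exporter`, whose internals this commit does not show. The names `LabelledCounter` and `serve_metrics` are invented for the example, and only counters are covered (no gauges or histograms).

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Dict, Optional, Tuple


def _escape_label(value: str) -> str:
    # The text format requires escaping backslash, double quote and newline
    # inside label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class LabelledCounter:
    """Hypothetical thread-safe counter rendered in Prometheus text format."""

    def __init__(self, name: str, help_text: str, label_names: Tuple[str, ...] = ()):
        self.name = name
        self.help_text = help_text
        self.label_names = tuple(label_names)
        self._values: Dict[Tuple[str, ...], float] = {}
        self._lock = threading.Lock()

    def inc(self, amount: float = 1.0, labels: Optional[Dict[str, str]] = None) -> None:
        labels = labels or {}
        if set(labels) != set(self.label_names):
            raise ValueError(f"expected labels {self.label_names}, got {tuple(labels)}")
        if amount < 0:
            raise ValueError("a counter can only go up")
        key = tuple(labels[n] for n in self.label_names)
        with self._lock:
            self._values[key] = self._values.get(key, 0.0) + amount

    def render(self) -> str:
        # One HELP/TYPE header, then one sample line per label combination.
        lines = [f"# HELP {self.name} {self.help_text}", f"# TYPE {self.name} counter"]
        with self._lock:
            samples = sorted(self._values.items())
        for key, value in samples:
            if key:
                pairs = ",".join(
                    f'{n}="{_escape_label(v)}"' for n, v in zip(self.label_names, key)
                )
                lines.append(f"{self.name}{{{pairs}}} {value}")
            else:
                lines.append(f"{self.name} {value}")
        return "\n".join(lines) + "\n"


def serve_metrics(counter: LabelledCounter, host: str = "127.0.0.1", port: int = 9090):
    """Serve counter.render() on GET /metrics from a daemon thread."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = counter.render().encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep scrape requests out of stderr

    server = ThreadingHTTPServer((host, port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `serve_metrics(counter)` and then requesting `http://127.0.0.1:9090/metrics` would return the `# HELP` and `# TYPE` lines followed by one sample per label combination. For example, a single `counter.inc(labels={"kind": "blue"})` produces `myapp_widgets_built_total{kind="blue"} 1.0`. Passing `port=0` lets the OS choose a free port, which can then be read from `server.server_address[1]`.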