37 | 37 | - [Event Triggers](#event-triggers) |
38 | 38 | - [Run History](#run-history) |
39 | 39 | - [Report Generation](#report-generation) |
| 40 | + - [Observability (Prometheus / OpenTelemetry)](#observability-prometheus--opentelemetry) |
40 | 41 | - [Remote Automation (Socket / REST)](#remote-automation-socket--rest) |
41 | 42 | - [Plugin Loader](#plugin-loader) |
42 | 43 | - [Shell Command Execution](#shell-command-execution) |
60 | 61 | - **Image Recognition** — locate UI elements on screen using OpenCV template matching with configurable threshold |
61 | 62 | - **Accessibility Element Finder** — query the OS accessibility tree (Windows UIA / macOS AX) to locate buttons, menus, and controls by name/role |
62 | 63 | - **AI Element Locator (VLM)** — describe a UI element in plain language and let a vision-language model (Anthropic / OpenAI) find its screen coordinates |
63 | | -- **OCR** — extract text from screen regions using Tesseract; wait for, click, or locate rendered text; regex search and full-region dump |
| 64 | +- **OCR** — extract text from screen regions through three pluggable backends: Tesseract for Latin-script text, EasyOCR for CJK without an external binary, and PaddleOCR for the highest-quality Chinese / Japanese / Korean recognition. One unified API with canonical language codes; the backend is chosen by the `backend=` kwarg, the `AUTOCONTROL_OCR_BACKEND` env var, or auto-detection. Wait for, click, or locate rendered text; regex search and full-region dump |
64 | 65 | - **LLM Action Planner** — translate a plain-language description into a validated `AC_*` action list using Claude |
65 | 66 | - **Runtime Variables & Control Flow** — `${var}` substitution at execution time, plus `AC_set_var` / `AC_inc_var` / `AC_if_var` / `AC_for_each` / `AC_loop` / `AC_retry` for data-driven scripts |
66 | 67 | - **Remote Desktop** — stream this machine's screen and accept remote input over a token-authenticated TCP protocol, *or* connect to another machine and view + control it (host + viewer GUIs included). Optional TLS (HTTPS-grade encryption), WebSocket transport (ws:// + wss:// for browser / firewall-friendly clients), persistent 9-digit Host ID, host→viewer audio streaming, bidirectional clipboard sync (text + image), and chunked file transfer (drag-drop + progress bar; arbitrary destination path; no size cap). Plus folder sync (additive mirror — local deletions never propagate) and a self-hosted coturn TURN config bundle generator (turnserver.conf + systemd unit + docker-compose + README). **AnyDesk-style popout**: when the viewer authenticates, the live remote desktop opens in its own resizable top-level window so the control panel stays uncluttered. The Remote Desktop tabs are wrapped in `QScrollArea` so the panel stays usable on small windows and stretches edge-to-edge on 4K displays. Driveable headlessly via `je_auto_control` and over MCP through the new `ac_remote_*` tools |
94 | 95 | - **OpenAPI 3.1 + Swagger UI** — `GET /openapi.json` (auth-gated, generated from the live route table) + `GET /docs` (browser Swagger UI with bearer token bar). Drift test in CI catches new routes added without metadata. |
95 | 96 | - **Configuration Bundle** — single-file JSON export/import of user config (admin hosts, address book, trusted viewers, known hosts, host service, IDs). Atomic write with `<name>.bak.<timestamp>` backups; CLI `python -m je_auto_control.utils.config_bundle export|import`; `POST /config/{export,import}`; GUI buttons on the REST API tab. |
96 | 97 | - **USB Passthrough (experimental, opt-in)** — wire-level protocol over a WebRTC `usb` DataChannel (10 opcodes, CREDIT-based flow control, 16 KiB payload cap). Host-side `UsbPassthroughSession` end-to-end on the Linux libusb backend; Windows `WinUSB` backend with full ctypes wiring (hardware-unverified); macOS `IOKit` skeleton. Viewer-side blocking client (`UsbPassthroughClient` → `ClientHandle.control_transfer / bulk_transfer / interrupt_transfer`). Persistent ACL (`~/.je_auto_control/usb_acl.json`, default deny, mode 0600) with host-side prompt QDialog and tamper-evident audit-log integration. Default off — opt-in via `enable_usb_passthrough(True)` or `JE_AUTOCONTROL_USB_PASSTHROUGH=1`. Phase 2e external security review checklist included; default-on requires sign-off. |
| 98 | +- **Observability (Prometheus + OpenTelemetry)** — stdlib-only `Counter` / `Gauge` / `Histogram` registry with a small built-in HTTP exporter serving `/metrics`, plus an OpenTelemetry-compatible tracer that upgrades to real OTel spans when the SDK is installed. The executor and agent loop emit `autocontrol_action_calls_total{action,outcome}`, `autocontrol_action_duration_seconds`, and `autocontrol_agent_steps_total{tool,outcome}` automatically: add the exporter URL to a Prometheus scrape config and the metrics are ready to chart in Grafana with no per-script wiring. |
97 | 99 |
98 | 100 | --- |
99 | 101 |
@@ -334,6 +336,14 @@ third-party components and their licenses. |
334 | 336 |
335 | 337 | ## Quick Start |
336 | 338 |
| 339 | +Looking for copy-pasteable end-to-end scripts instead of API snippets? |
| 340 | +The [`examples/`](examples/) directory has 17 self-contained programs |
| 341 | +covering screenshot + click, OCR, the headless scheduler, remote |
| 342 | +desktop, the agent loop, observability, recording / replay, runtime |
| 343 | +variables, window management, hotkeys, image triggers, HTML reports, |
| 344 | +the MCP stdio bridge, the REST API, the secrets vault, and plugin |
| 345 | +loading. |
| 346 | + |
337 | 347 | ### Mouse Control |
338 | 348 |
339 | 349 | ```python |
@@ -463,12 +473,26 @@ ac.click_text("Submit") |
463 | 473 | ac.wait_for_text("Loading complete", timeout=15.0) |
464 | 474 | ``` |
465 | 475 |
| 476 | +**Backend selection.** Set `AUTOCONTROL_OCR_BACKEND=tesseract|easyocr|paddleocr` |
| 477 | +or pass `backend=` per call; otherwise auto-detection picks the first |
| 478 | +installed backend that imports successfully: |
| 479 | + |
| 480 | +```python |
| 481 | +ac.find_text_matches("登入", lang="chi_tra", backend="easyocr") |
| 482 | +ac.click_text("Sign in", backend="tesseract") |
| 483 | +``` |
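The precedence described above can be sketched in plain Python. This is a hypothetical resolver, not AutoControl's implementation: it assumes the per-call `backend=` argument beats the environment variable, and that auto-detection probes the module names `pytesseract`, `easyocr`, and `paddleocr` in that order.

```python
import importlib.util
import os
from typing import Callable, Mapping, Optional

# Assumed probe order and import names; the real library may differ.
_CANDIDATES = (
    ("tesseract", "pytesseract"),
    ("easyocr", "easyocr"),
    ("paddleocr", "paddleocr"),
)
_KNOWN = {name for name, _ in _CANDIDATES}


def _module_available(module: str) -> bool:
    # find_spec returns None for a missing top-level module instead of raising.
    return importlib.util.find_spec(module) is not None


def resolve_ocr_backend(
    backend: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    is_available: Callable[[str], bool] = _module_available,
) -> str:
    """Pick a backend: explicit kwarg, then env var, then first importable one."""
    env = os.environ if env is None else env
    choice = backend or env.get("AUTOCONTROL_OCR_BACKEND")
    if choice:
        choice = choice.strip().lower()
        if choice not in _KNOWN:
            raise ValueError(f"unknown OCR backend: {choice!r}")
        return choice
    for name, module in _CANDIDATES:
        if is_available(module):
            return name
    raise RuntimeError("no OCR backend is installed")
```

Injecting `env` and `is_available` keeps the resolver testable without touching the real environment or installed packages.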
| 484 | + |
466 | 485 | If Tesseract is not on `PATH`, point at it explicitly: |
467 | 486 |
468 | 487 | ```python |
469 | 488 | ac.set_tesseract_cmd(r"C:\Program Files\Tesseract-OCR\tesseract.exe") |
470 | 489 | ``` |
471 | 490 |
| 491 | +Backend install paths and the canonical lang-code table are in |
| 492 | +[docs/source/Eng/doc/ocr_backends/ocr_backends_doc.rst](docs/source/Eng/doc/ocr_backends/ocr_backends_doc.rst) |
| 493 | +(or the [繁體中文](docs/source/Zh/doc/ocr_backends/ocr_backends_doc.rst) |
| 494 | +version). |
| 495 | + |
472 | 496 | Dump every recognised text record in a region (or full screen), or |
473 | 497 | search by regex when the text varies: |
474 | 498 |
@@ -577,24 +601,175 @@ viewer.send_input({"action": "type", "text": "hello"}) |
577 | 601 | viewer.disconnect() |
578 | 602 | ``` |
579 | 603 |
580 | | -GUI: **Remote Desktop** tab with two sub-tabs. |
581 | | - |
582 | | -- **Host** — token field with a *Generate* button, security warning |
583 | | - about the bind address, start / stop controls, refreshing port + |
584 | | - viewer-count status, and a 4 fps preview pane below the controls so |
585 | | - the user being remoted sees what viewers see. |
586 | | -- **Viewer** — address / port / token form, *Connect* / *Disconnect*, |
587 | | - and a custom frame-display widget that paints incoming JPEG frames |
588 | | - scaled with `KeepAspectRatio`. Mouse / wheel / key events on the |
589 | | - display are remapped from widget coordinates back to the remote |
590 | | - screen's pixel space using the latest frame's dimensions, then |
591 | | - forwarded as `INPUT` messages. |
| 604 | +GUI: **Remote Desktop** tab opens to the **Quick Connect** screen |
| 605 | +(AnyDesk-style) by default: a large Host ID on one side, a single input |
| 606 | +that accepts `host:port`, `ws://`, `wss://`, or a 9-digit Host ID on |
| 607 | +the other, with *Connect* and *Start hosting* as the two primary |
| 608 | +buttons. Recent connections are remembered across sessions. Advanced |
| 609 | +per-transport sub-tabs (legacy TCP / WS host + viewer, WebRTC host + |
| 610 | +viewer with manual SDP / custom codecs / TLS pinning) stay one click |
| 611 | +away. WebRTC sub-tabs lazy-load so a stock install without the |
| 612 | +`[webrtc]` extra still opens the tab. |
592 | 613 |
593 | 614 | > ⚠️ Anyone with the host:port and token gets full mouse / keyboard |
594 | 615 | > control of the host machine. Default bind is `127.0.0.1`; expose |
595 | 616 | > externally only via SSH tunnel or TLS front-end. The token is the |
596 | 617 | > only line of defence — treat it like a password. |
597 | 618 |
| 619 | +**Quick Connect headless API.** The transport coordinator that backs |
| 620 | +the GUI input box is also exported, so scripts can dispatch the same |
| 621 | +way: |
| 622 | + |
| 623 | +```python |
| 624 | +from je_auto_control import parse_remote_desktop_target |
| 625 | +parse_remote_desktop_target("192.168.1.10:5555") |
| 626 | +# ConnectTarget(kind='tcp', host='192.168.1.10', port=5555, ...) |
| 627 | +parse_remote_desktop_target("ws://hub:8765/desk") |
| 628 | +# ConnectTarget(kind='ws', host='hub', port=8765, path='/desk') |
| 629 | +parse_remote_desktop_target("123-456-789") |
| 630 | +# ConnectTarget(kind='webrtc_id', host_id='123456789') |
| 631 | +``` |
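To make the dispatch rules concrete, here is a hypothetical stand-alone parser for the three input shapes the box accepts. It is not `parse_remote_desktop_target`: the `Target` type, the dash/space tolerance in Host IDs, and the default ports 80 / 443 for `ws://` / `wss://` URLs without an explicit port are all assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Target:
    """Hypothetical result type; the real ConnectTarget may carry more fields."""
    kind: str                    # "tcp", "ws", "wss", or "webrtc_id"
    host: Optional[str] = None
    port: Optional[int] = None
    path: str = ""
    host_id: Optional[str] = None


_HOST_ID = re.compile(r"^\d{3}[- ]?\d{3}[- ]?\d{3}$")


def parse_target(text: str) -> Target:
    text = text.strip()
    if _HOST_ID.match(text):
        # 9-digit Host ID; strip any dash or space separators.
        return Target(kind="webrtc_id", host_id=re.sub(r"\D", "", text))
    if text.lower().startswith(("ws://", "wss://")):
        parts = urlsplit(text)  # urlsplit lowercases the scheme
        default_port = 443 if parts.scheme == "wss" else 80
        return Target(kind=parts.scheme, host=parts.hostname,
                      port=parts.port or default_port, path=parts.path or "/")
    host, sep, port = text.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, ws(s):// URL, or Host ID: {text!r}")
    return Target(kind="tcp", host=host, port=int(port))
```

Checking the Host ID pattern first matters: a bare 9-digit string has no colon, so it would otherwise fall through to the `host:port` branch and fail.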
| 632 | + |
| 633 | +**Connection approval + view-only mode.** An optional callback gates |
| 634 | +every incoming session, AnyDesk-style. Returning `"view_only"` admits |
| 635 | +the viewer but drops their `INPUT` messages; returning a falsy value |
| 636 | +(or raising) sends `AUTH_FAIL` with the reason "rejected by host": |
| 637 | + |
| 638 | +```python |
| 639 | +from je_auto_control import RemoteDesktopHost, PendingViewer |
| 640 | + |
| 641 | +def gate(p: PendingViewer) -> str: |
| 642 | + if p.address[0].startswith("10."): |
| 643 | + return "view_only" |
| 644 | + return "full" # or True |
| 645 | + |
| 646 | +host = RemoteDesktopHost(token="tok", on_pending_viewer=gate) |
| 647 | +``` |
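The admission rules above reduce to a small decision table. A minimal sketch, assuming any truthy value other than `"view_only"` means full control; the `Admission` enum and the `decide` / `should_forward_input` helpers are hypothetical, not part of AutoControl.

```python
from enum import Enum
from typing import Any, Callable


class Admission(Enum):
    FULL = "full"
    VIEW_ONLY = "view_only"
    REJECT = "reject"


def decide(gate: Callable[[Any], Any], pending: Any) -> Admission:
    """Map an approval callback's return value onto an admission decision."""
    try:
        verdict = gate(pending)
    except Exception:
        # A crashing callback fails closed: the viewer is rejected.
        return Admission.REJECT
    if verdict == "view_only":
        return Admission.VIEW_ONLY
    return Admission.FULL if verdict else Admission.REJECT


def should_forward_input(admission: Admission) -> bool:
    # View-only viewers still receive frames, but their INPUT is dropped.
    return admission is Admission.FULL
```

Failing closed on exceptions is the important design choice: a bug in the gate should never turn into unattended full control.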
| 648 | + |
| 649 | +**IP allowlist (CIDR + exact IPs).** Reject peers outside the |
| 650 | +configured ranges *before* TLS / auth runs, so attackers can't probe |
| 651 | +further: |
| 652 | + |
| 653 | +```python |
| 654 | +host = RemoteDesktopHost( |
| 655 | + token="tok", ip_allowlist=["10.0.0.0/8", "192.168.1.100"], |
| 656 | +) |
| 657 | +``` |
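A CIDR allowlist like this can be checked with nothing but the standard `ipaddress` module. The sketch below is illustrative: the helper names are hypothetical, it treats an empty list as "no filtering" (an assumption), and it unwraps IPv4-mapped IPv6 peers so a dual-stack listener still matches IPv4 ranges. A host would run the check on the address returned by `socket.accept()` before starting the TLS handshake.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def compile_allowlist(entries: Iterable[str]) -> List[Network]:
    """Parse CIDR ranges and bare IPs; a bare IP becomes a /32 (or /128)."""
    return [ipaddress.ip_network(entry.strip(), strict=False) for entry in entries]


def is_allowed(peer_ip: str, allowlist: List[Network]) -> bool:
    if not allowlist:
        return True  # assumption: no allowlist configured means no filtering
    try:
        addr = ipaddress.ip_address(peer_ip)
    except ValueError:
        return False  # unparseable peer address: fail closed
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped  # e.g. "::ffff:10.1.2.3" from a dual-stack socket
    # Membership across IP versions is simply False, never an error.
    return any(addr in net for net in allowlist)
```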
| 658 | + |
| 659 | +**One-time share codes** — extra tokens that self-destruct on first |
| 660 | +successful auth, ideal for client-support workflows: |
| 661 | + |
| 662 | +```python |
| 663 | +host = RemoteDesktopHost(token="tok", single_use_tokens=["abc123"]) |
| 664 | +host.add_single_use_token("9k4ndx") # rotate at runtime |
| 665 | +host.revoke_single_use_token("abc123") # cancel before it's used |
| 666 | +``` |
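The self-destruct behaviour hinges on making check-and-delete atomic, so two viewers racing with the same code cannot both get in. A minimal sketch using a lock and constant-time comparison; `SingleUseTokens` is a hypothetical class, not the host's internal store.

```python
import hmac
import threading
from typing import Iterable, Optional


class SingleUseTokens:
    """Tokens consumed by their first successful authentication."""

    def __init__(self, tokens: Optional[Iterable[str]] = None) -> None:
        self._tokens = set(tokens or ())
        self._lock = threading.Lock()

    def add(self, token: str) -> None:
        with self._lock:
            self._tokens.add(token)

    def revoke(self, token: str) -> bool:
        """Cancel an unused token; returns whether it was still pending."""
        with self._lock:
            present = token in self._tokens
            self._tokens.discard(token)
            return present

    def consume(self, presented: str) -> bool:
        """Return True exactly once per token; later attempts fail."""
        with self._lock:
            match = next(
                (t for t in self._tokens
                 if hmac.compare_digest(t.encode(), presented.encode())),
                None,
            )
            if match is None:
                return False
            self._tokens.remove(match)
            return True
```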
| 667 | + |
| 668 | +**TOTP 2FA (RFC 6238, stdlib only).** Layer a 6-digit OTP on top of |
| 669 | +the token; host accepts ±1 step of clock drift: |
| 670 | + |
| 671 | +```python |
| 672 | +from je_auto_control.utils.remote_desktop.totp import ( |
| 673 | + generate_secret, generate_code, provisioning_uri, |
| 674 | +) |
| 675 | +secret = generate_secret() |
| 676 | +print(provisioning_uri(secret, account="alice")) # otpauth:// URI for QR |
| 677 | + |
| 678 | +host = RemoteDesktopHost(token="tok", totp_secret=secret) |
| 679 | +viewer = RemoteDesktopViewer( |
| 680 | + host=..., token="tok", totp_code=generate_code(secret), |
| 681 | +) |
| 682 | +``` |
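To show what the TOTP layer computes, here is a minimal stdlib sketch of RFC 6238 with the common authenticator-app defaults (HMAC-SHA1, 30-second steps, 6 digits). It is not the code in `je_auto_control.utils.remote_desktop.totp`; the function names are hypothetical, and the ±1-step window mirrors the drift tolerance described above.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at: float, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HMAC-SHA1 over the step counter, then RFC 4226 truncation."""
    counter = int(at // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def verify(secret: bytes, code: str, at: Optional[float] = None,
           step: int = 30, drift: int = 1) -> bool:
    """Accept codes from the current step and +/- `drift` neighbouring steps."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret, now + k * step, step), code)
        for k in range(-drift, drift + 1)
    )


def secret_from_base32(b32: str) -> bytes:
    # Authenticator apps share secrets as unpadded base32; restore the padding.
    b32 = b32.strip().replace(" ", "").upper()
    return base64.b32decode(b32 + "=" * (-len(b32) % 8))
```

The RFC's own test vectors (secret `12345678901234567890`, T=59 gives the 8-digit code `94287082`) are a quick way to check any implementation.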
| 683 | + |
| 684 | +**Multi-monitor selection.** Capture one specific monitor instead of |
| 685 | +the combined virtual desktop: |
| 686 | + |
| 687 | +```python |
| 688 | +from je_auto_control import list_host_monitors, RemoteDesktopHost |
| 689 | +print(list_host_monitors()) |
| 690 | +# [{'index': 0, 'is_combined': True, ...}, |
| 691 | +# {'index': 1, 'left': 0, 'top': 0, ...}, |
| 692 | +# {'index': 2, 'left': 1920, ...}] |
| 693 | +host = RemoteDesktopHost(token="tok", monitor_index=1) |
| 694 | +``` |
| 695 | + |
| 696 | +**Remote cursor overlay.** The host broadcasts its cursor position at 30 Hz |
| 697 | +(skipping repeats while the pointer is still); the viewer's popup window draws an arrow |
| 698 | +on top of the JPEG stream so you can see exactly where the host's |
| 699 | +pointer is. Disable via `enable_cursor_broadcast=False`. |
| 700 | + |
| 701 | +**Multi-viewer collaborative cursors + chat.** Adds two message types: |
| 702 | +`CHAT`, and `CURSOR` tagged with a `viewer_id`. Use a `MultiViewerHost` to |
| 703 | +relay one viewer's pointer to the others; pair with the chat channel |
| 704 | +for ad-hoc text between operators: |
| 705 | + |
| 706 | +```python |
| 707 | +host = RemoteDesktopHost( |
| 708 | + token="tok", on_chat=lambda sender, text: print(sender, ":", text), |
| 709 | +) |
| 710 | +host.broadcast_chat("session starts in 30s") |
| 711 | +host.broadcast_viewer_cursor("alice", 200, 300) |
| 712 | + |
| 713 | +viewer = RemoteDesktopViewer( |
| 714 | + host=..., on_chat=lambda s, t: ..., |
| 715 | + on_viewer_cursor=lambda vid, x, y: ..., |
| 716 | +) |
| 717 | +viewer.send_chat("ack") |
| 718 | +``` |
| 719 | + |
| 720 | +**Relative mouse mode (FPS / CAD).** A new input action sends |
| 721 | +deltas instead of absolute coordinates: |
| 722 | + |
| 723 | +```python |
| 724 | +viewer.send_input({"action": "mouse_move_relative", "dx": 5, "dy": -3}) |
| 725 | +``` |
| 726 | + |
| 727 | +**Motion-aware capture.** The capture loop now hashes each encoded |
| 728 | +JPEG; identical frames are skipped, so a static desktop produces |
| 729 | +near-zero bandwidth. New viewers are seeded with the latest frame on authentication |
| 730 | +so they never see a black popup. |
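The frame-skipping idea fits in a few lines. This sketch assumes the encoder is deterministic (identical pixels at the same quality produce identical JPEG bytes) and uses BLAKE2b as the hash; which hash AutoControl actually uses is not stated here. `FrameDeduplicator` is hypothetical; its `latest_frame` attribute stands in for the frame used to seed new viewers.

```python
import hashlib
from typing import Iterable, Iterator, Optional


class FrameDeduplicator:
    """Drop encoded frames whose bytes match the previously sent frame."""

    def __init__(self) -> None:
        self._last_digest: Optional[bytes] = None
        self.latest_frame: Optional[bytes] = None  # seed for newly joined viewers

    def should_send(self, jpeg: bytes) -> bool:
        digest = hashlib.blake2b(jpeg, digest_size=16).digest()
        if digest == self._last_digest:
            return False  # unchanged screen: spend no bandwidth
        self._last_digest = digest
        self.latest_frame = jpeg
        return True


def changed_frames(frames: Iterable[bytes]) -> Iterator[bytes]:
    """Yield only frames that differ from the one before them."""
    dedup = FrameDeduplicator()
    return (frame for frame in frames if dedup.should_send(frame))
```

Only consecutive duplicates are dropped, so returning to an earlier screen still sends a frame.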
| 731 | + |
| 732 | +**Live stats** (FPS / kbps over a rolling 3-second window, plus session totals): |
| 733 | + |
| 734 | +```python |
| 735 | +viewer.stats() |
| 736 | +# {'fps': 24.3, 'kbps': 4801.2, 'frames': 720.0, 'bytes': 1.8e7, 'uptime': 30.2} |
| 737 | +``` |
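A rolling window like the one behind `viewer.stats()` can be kept with a deque of `(timestamp, size)` pairs. A minimal sketch, assuming totals are session-wide while FPS and kbps cover only the last 3 seconds; `StreamStats` and its injectable clock are hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Tuple


class StreamStats:
    """Rolling FPS / kbps over a short window, plus whole-session totals."""

    def __init__(self, window: float = 3.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._window = window
        self._clock = clock
        self._started = clock()
        self._recent: Deque[Tuple[float, int]] = deque()
        self._frames = 0
        self._bytes = 0

    def record(self, size: int) -> None:
        now = self._clock()
        self._recent.append((now, size))
        self._frames += 1
        self._bytes += size
        self._trim(now)

    def _trim(self, now: float) -> None:
        # Evict samples older than the window from the left end.
        while self._recent and now - self._recent[0][0] > self._window:
            self._recent.popleft()

    def snapshot(self) -> Dict[str, float]:
        now = self._clock()
        self._trim(now)
        # Sessions younger than the window divide by their real age instead.
        span = min(self._window, max(now - self._started, 1e-9))
        recent_bytes = sum(size for _, size in self._recent)
        return {
            "fps": len(self._recent) / span,
            "kbps": recent_bytes * 8 / 1000 / span,
            "frames": float(self._frames),
            "bytes": float(self._bytes),
            "uptime": now - self._started,
        }
```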
| 738 | + |
| 739 | +**JPEG sequence recorder (no PyAV needed).** TCP-path session |
| 740 | +capture: each frame written to disk plus `manifest.json` so it can |
| 741 | +be replayed at original cadence: |
| 742 | + |
| 743 | +```python |
| 744 | +from je_auto_control.utils.remote_desktop.jpeg_recorder import ( |
| 745 | + JpegSequenceRecorder, |
| 746 | +) |
| 747 | +rec = JpegSequenceRecorder("~/recordings/2026-05-23") |
| 748 | +rec.start() |
| 749 | +viewer = RemoteDesktopViewer(host=..., on_frame=rec.record_frame) |
| 750 | +# ... session ... |
| 751 | +rec.stop() # writes manifest.json next to the .jpg files |
| 752 | +``` |
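To illustrate the record-then-replay idea, here is a hypothetical recorder with the same `start()` / `record_frame()` / `stop()` shape. The manifest schema (`{"frames": [{"file": ..., "t": ...}]}`) is an assumption; the real `manifest.json` may use different keys.

```python
import json
import time
from pathlib import Path
from typing import Callable, List


class SimpleJpegRecorder:
    """Write each frame as NNNNNN.jpg and its time offset to manifest.json."""

    def __init__(self, directory: str,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._dir = Path(directory).expanduser()
        self._clock = clock
        self._entries: List[dict] = []
        self._t0 = 0.0

    def start(self) -> None:
        self._dir.mkdir(parents=True, exist_ok=True)
        self._entries.clear()
        self._t0 = self._clock()

    def record_frame(self, jpeg: bytes) -> None:
        name = f"{len(self._entries):06d}.jpg"
        (self._dir / name).write_bytes(jpeg)
        self._entries.append({"file": name, "t": self._clock() - self._t0})

    def stop(self) -> Path:
        manifest = self._dir / "manifest.json"
        manifest.write_text(json.dumps({"frames": self._entries}, indent=2))
        return manifest


def replay_delays(manifest_path: Path) -> List[float]:
    """Seconds to wait before showing each frame to reproduce the cadence."""
    frames = json.loads(manifest_path.read_text())["frames"]
    times = [entry["t"] for entry in frames]
    return [b - a for a, b in zip([0.0] + times, times)]
```

Storing offsets rather than wall-clock timestamps keeps the manifest independent of when the session happened.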
| 753 | + |
| 754 | +**TCP relay (WebRTC fallback).** When P2P fails (strict NAT, mobile |
| 755 | +CGNAT, hotel Wi-Fi), both peers connect outbound to a relay and |
| 756 | +present the same 32-byte session ID; the relay then pipes bytes between |
| 757 | +them. The same module ships an `encode_handshake(role, session_id)` |
| 758 | +helper for clients: |
| 759 | + |
| 760 | +```python |
| 761 | +from je_auto_control.utils.remote_desktop.relay import RelayServer |
| 762 | +relay = RelayServer(bind="0.0.0.0", port=9000) # NOSONAR # public relay |
| 763 | +relay.start() |
| 764 | +``` |
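The handshake format behind `encode_handshake` is not documented in this section, so the sketch below invents one purely for illustration: a 4-byte magic, one role byte, then the 32-byte session ID. It also shows how a relay might pair the two peers. Every name here is hypothetical, and the pairing table is single-threaded; a real relay would guard it with a lock and expire stale entries.

```python
import secrets
from typing import Dict, Optional, Tuple

# Hypothetical wire format: 4-byte magic + 1 role byte + 32-byte session ID.
MAGIC = b"ACR1"
ROLES = {"host": 0, "viewer": 1}
SESSION_ID_LEN = 32
HANDSHAKE_LEN = len(MAGIC) + 1 + SESSION_ID_LEN


def new_session_id() -> bytes:
    # The ID is the only thing pairing two peers, so make it unguessable.
    return secrets.token_bytes(SESSION_ID_LEN)


def encode(role: str, session_id: bytes) -> bytes:
    if len(session_id) != SESSION_ID_LEN:
        raise ValueError("session ID must be exactly 32 bytes")
    return MAGIC + bytes([ROLES[role]]) + session_id


def decode(frame: bytes) -> Tuple[str, bytes]:
    if len(frame) != HANDSHAKE_LEN or not frame.startswith(MAGIC):
        raise ValueError("malformed handshake")
    role_byte = frame[len(MAGIC)]
    for name, value in ROLES.items():
        if value == role_byte:
            return name, frame[len(MAGIC) + 1:]
    raise ValueError(f"unknown role byte {role_byte}")


class Pairing:
    """Match a host and a viewer that presented the same session ID."""

    def __init__(self) -> None:
        self._waiting: Dict[bytes, Tuple[str, object]] = {}

    def arrive(self, role: str, session_id: bytes,
               conn: object) -> Optional[Tuple[object, object]]:
        waiting = self._waiting.get(session_id)
        if waiting is None:
            self._waiting[session_id] = (role, conn)
            return None  # first peer waits for its partner
        if waiting[0] == role:
            raise ValueError(f"a {role} is already waiting on this session")
        del self._waiting[session_id]
        return waiting[1], conn  # relay now pipes bytes between these two
```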
| 765 | + |
| 766 | +**Service installer (unattended host).** `python -m |
| 767 | +je_auto_control.utils.remote_desktop.host_service ...` |
| 768 | +exposes `configure` / `init` / `run` plus per-platform installers: |
| 769 | +`install-windows-service` / `uninstall-windows-service` (pywin32), |
| 770 | +`generate-launchd` / `uninstall-launchd`, `generate-systemd` / |
| 771 | +`uninstall-systemd`. |
| 772 | + |
598 | 773 | **Encrypted transports + alternate protocols.** Pass an `ssl_context` |
599 | 774 | to either `RemoteDesktopHost` or `RemoteDesktopViewer` to wrap every |
600 | 775 | connection in TLS. For firewall-friendly access, use the in-tree |
@@ -935,6 +1110,36 @@ xml_string = je_auto_control.generate_xml() |
935 | 1110 |
936 | 1111 | Reports include: function name, parameters, timestamp, and exception info (if any) for each recorded action. HTML reports display successful actions in cyan and failed actions in red. |
937 | 1112 |
| 1113 | +### Observability (Prometheus / OpenTelemetry) |
| 1114 | + |
| 1115 | +Stdlib-only metric primitives plus an OpenTelemetry-compatible tracer |
| 1116 | +fallback. The executor and agent loop emit call counts and latency |
| 1117 | +histograms automatically — no per-script wiring required. |
| 1118 | + |
| 1119 | +```python |
| 1120 | +import je_auto_control as ac |
| 1121 | + |
| 1122 | +# Expose /metrics on http://127.0.0.1:9090 for Prometheus to scrape. |
| 1123 | +exporter = ac.default_metrics_exporter() |
| 1124 | +exporter.start() |
| 1125 | + |
| 1126 | +# Add your own metric — same shapes as prometheus_client. |
| 1127 | +counter = ac.default_metric_registry().register(ac.MetricCounter( |
| 1128 | + "myapp_widgets_built_total", "widgets built", |
| 1129 | + label_names=("kind",), |
| 1130 | +)) |
| 1131 | +counter.inc(labels={"kind": "blue"}) |
| 1132 | + |
| 1133 | +# Wrap a callable in a span — no-op until opentelemetry-api is installed. |
| 1134 | +@ac.traced("my_pipeline.process_one") |
| 1135 | +def process_one(item): ... |
| 1136 | +``` |
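Under the hood, a `/metrics` endpoint just serves text in the Prometheus exposition format. This hypothetical `TinyCounter` (not `ac.MetricCounter`) shows what a labelled counter renders to, including label-value escaping.

```python
import threading
from typing import Dict, Optional, Sequence, Tuple


def _escape(value: str) -> str:
    # Exposition format: escape backslash, double quote, and newline in label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class TinyCounter:
    """A monotonic counter with labels, rendered as Prometheus text."""

    def __init__(self, name: str, help_text: str,
                 label_names: Sequence[str] = ()) -> None:
        self.name = name
        self.help_text = help_text
        self.label_names = tuple(label_names)
        self._values: Dict[Tuple[str, ...], float] = {}
        self._lock = threading.Lock()

    def inc(self, amount: float = 1.0,
            labels: Optional[Dict[str, str]] = None) -> None:
        if amount < 0:
            raise ValueError("counters can only go up")
        labels = labels or {}
        if set(labels) != set(self.label_names):
            raise ValueError(f"expected labels {self.label_names}, got {sorted(labels)}")
        key = tuple(str(labels[name]) for name in self.label_names)
        with self._lock:
            self._values[key] = self._values.get(key, 0.0) + amount

    def render(self) -> str:
        lines = [f"# HELP {self.name} {self.help_text}",
                 f"# TYPE {self.name} counter"]
        with self._lock:
            samples = sorted(self._values.items())
        for key, value in samples:
            if key:
                pairs = ",".join(f'{name}="{_escape(val)}"'
                                 for name, val in zip(self.label_names, key))
                lines.append(f"{self.name}{{{pairs}}} {value}")
            else:
                lines.append(f"{self.name} {value}")
        return "\n".join(lines) + "\n"
```

Serving `render()` with `Content-Type: text/plain; version=0.0.4` is enough for a Prometheus scrape.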
| 1137 | + |
| 1138 | +Built-in metrics are listed in |
| 1139 | +[docs/source/Eng/doc/observability/observability_doc.rst](docs/source/Eng/doc/observability/observability_doc.rst) |
| 1140 | +(or the [繁體中文](docs/source/Zh/doc/observability/observability_doc.rst) |
| 1141 | +version). |
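The "no-op until installed" behaviour is a standard optional-import pattern. A minimal sketch, assuming only the public `opentelemetry` API (`trace.get_tracer`, `Tracer.start_as_current_span`); this local `traced` is a stand-in, not `ac.traced` itself.

```python
import functools
from typing import Any, Callable, TypeVar

try:
    from opentelemetry import trace as _otel_trace
except ImportError:  # OpenTelemetry API not installed
    _otel_trace = None

F = TypeVar("F", bound=Callable[..., Any])


def traced(span_name: str) -> Callable[[F], F]:
    """Run the wrapped function inside a span named `span_name`, if possible."""
    def decorator(func: F) -> F:
        if _otel_trace is None:
            return func  # no wrapper at all: zero overhead without OpenTelemetry
        tracer = _otel_trace.get_tracer(func.__module__)

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            with tracer.start_as_current_span(span_name):
                return func(*args, **kwargs)

        return wrapper  # type: ignore[return-value]
    return decorator
```

With only the API package installed, spans are non-recording; they start being exported once an SDK `TracerProvider` is configured.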
| 1142 | + |
938 | 1143 | ### Remote Automation (Socket / REST) |
939 | 1144 |
940 | 1145 | Two servers are available — a raw TCP socket and a stdlib HTTP/REST |
@@ -1197,6 +1402,13 @@ cd AutoControl |
1197 | 1402 | pip install -r dev_requirements.txt |
1198 | 1403 | ``` |
1199 | 1404 |
| 1405 | +Reproducible installs use the committed `uv.lock`: |
| 1406 | + |
| 1407 | +```bash |
| 1408 | +uv sync # install pinned versions across the whole dep tree |
| 1409 | +uv lock --upgrade # refresh after editing pyproject.toml |
| 1410 | +``` |
| 1411 | + |
1200 | 1412 | ### Running Tests |
1201 | 1413 |
1202 | 1414 | ```bash |