Commit 4f0ec2e

Merge pull request #194 from Integration-Automation/dev
dev → main: Phase 6–10, examples, OCR backends, observability
2 parents 28ee721 + 6d72a07 commit 4f0ec2e

166 files changed

Lines changed: 21005 additions & 224 deletions

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
+# Reusable GitHub Actions workflow — drop this into a repo that hosts
+# AutoControl action JSON files (``*.action.json`` by default) and get
+# PR-level validation for free. The workflow:
+# 1. Installs je_auto_control from PyPI (or a configurable ref).
+# 2. Globs every action JSON file matching ``files``.
+# 3. Runs ``python -m je_auto_control.utils.action_lint`` over each.
+# Any ``error``-severity finding fails the workflow.
+
+name: action-json-lint
+
+on:
+  workflow_call:
+    inputs:
+      files:
+        description: "Glob for action JSON files to lint."
+        required: false
+        type: string
+        default: "**/*.action.json"
+      autocontrol_ref:
+        description: "Pip spec for je_auto_control (e.g. je_auto_control==0.1.0 or git+https://...)."
+        required: false
+        type: string
+        default: "je_auto_control"
+
+jobs:
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+
+      - name: Install je_auto_control
+        env:
+          AUTOCONTROL_REF: ${{ inputs.autocontrol_ref }}
+        run: |
+          python -m pip install --upgrade pip
+          python -m pip install "$AUTOCONTROL_REF"
+
+      - name: Lint action JSON files
+        shell: bash
+        env:
+          FILES_GLOB: ${{ inputs.files }}
+        run: |
+          shopt -s globstar nullglob
+          files=( $FILES_GLOB )
+          if [ ${#files[@]} -eq 0 ]; then
+            echo "No files matched $FILES_GLOB — nothing to lint."
+            exit 0
+          fi
+          echo "Linting ${#files[@]} files..."
+          python -m je_auto_control.utils.action_lint "${files[@]}"
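
To try the same glob-then-lint flow locally, the last workflow step can be approximated in Python. This is a minimal sketch: it assumes only the `python -m je_auto_control.utils.action_lint` entry point named in the workflow, and the helper names `collect_action_files` and `build_lint_command` are hypothetical, not part of the project.

```python
import sys
from pathlib import Path


def collect_action_files(root: str, pattern: str = "**/*.action.json") -> list:
    """Expand a recursive glob, similar to `shopt -s globstar nullglob`."""
    return sorted(str(p) for p in Path(root).glob(pattern) if p.is_file())


def build_lint_command(files: list) -> list:
    """Return the argv for the linter, or [] when nothing matched."""
    if not files:
        # Mirrors the workflow's early exit: no matches is not an error.
        return []
    return [sys.executable, "-m", "je_auto_control.utils.action_lint", *files]
```

Passing the result of `build_lint_command` to `subprocess.run(..., check=True)` would fail on a non-zero exit code, which is presumably how `error`-severity findings surface.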

.github/workflows/quality.yml

Lines changed: 1 addition & 1 deletion
@@ -81,7 +81,7 @@ jobs:
           # for any sub-package the snapshot doesn't include
           # (admin, usb, remote_desktop, vision, …).
           pip install -e .
-          pip install ruff==0.15.13 bandit==1.9.4 pytest==9.0.3 pytest-timeout==2.4.0 pytest-rerunfailures==15.1 PySide6==6.11.1
+          pip install ruff==0.15.14 bandit==1.9.4 pytest==9.0.3 pytest-timeout==2.4.0 pytest-rerunfailures==15.1 PySide6==6.11.1
 
       - name: Run headless pytest suite
         run: pytest test/unit_test/headless/ -v --tb=short --timeout=120

README.md

Lines changed: 225 additions & 13 deletions
@@ -37,6 +37,7 @@
 - [Event Triggers](#event-triggers)
 - [Run History](#run-history)
 - [Report Generation](#report-generation)
+- [Observability (Prometheus / OpenTelemetry)](#observability-prometheus--opentelemetry)
 - [Remote Automation (Socket / REST)](#remote-automation-socket--rest)
 - [Plugin Loader](#plugin-loader)
 - [Shell Command Execution](#shell-command-execution)
@@ -60,7 +61,7 @@
 - **Image Recognition** — locate UI elements on screen using OpenCV template matching with configurable threshold
 - **Accessibility Element Finder** — query the OS accessibility tree (Windows UIA / macOS AX) to locate buttons, menus, and controls by name/role
 - **AI Element Locator (VLM)** — describe a UI element in plain language and let a vision-language model (Anthropic / OpenAI) find its screen coordinates
-- **OCR** — extract text from screen regions using Tesseract; wait for, click, or locate rendered text; regex search and full-region dump
+- **OCR** — extract text from screen regions through three pluggable backends (Tesseract for ASCII, EasyOCR for CJK without an external binary, PaddleOCR for highest-quality Chinese / Japanese / Korean). Single unified API + canonical language codes; backend chosen by `backend=` kwarg, `AUTOCONTROL_OCR_BACKEND` env var, or auto-detection. Wait for, click, or locate rendered text; regex search and full-region dump
 - **LLM Action Planner** — translate a plain-language description into a validated `AC_*` action list using Claude
 - **Runtime Variables & Control Flow** — `${var}` substitution at execution time, plus `AC_set_var` / `AC_inc_var` / `AC_if_var` / `AC_for_each` / `AC_loop` / `AC_retry` for data-driven scripts
 - **Remote Desktop** — stream this machine's screen and accept remote input over a token-authenticated TCP protocol, *or* connect to another machine and view + control it (host + viewer GUIs included). Optional TLS (HTTPS-grade encryption), WebSocket transport (ws:// + wss:// for browser / firewall-friendly clients), persistent 9-digit Host ID, host→viewer audio streaming, bidirectional clipboard sync (text + image), and chunked file transfer (drag-drop + progress bar; arbitrary destination path; no size cap). Plus folder sync (additive mirror — local deletions never propagate) and a self-hosted coturn TURN config bundle generator (turnserver.conf + systemd unit + docker-compose + README). **AnyDesk-style popout**: when the viewer authenticates, the live remote desktop opens in its own resizable top-level window so the control panel stays uncluttered. The Remote Desktop tabs are wrapped in `QScrollArea` so the panel stays usable on small windows and stretches edge-to-edge on 4K displays. Driveable headlessly via `je_auto_control` and over MCP through the new `ac_remote_*` tools
@@ -94,6 +95,7 @@
 - **OpenAPI 3.1 + Swagger UI** — `GET /openapi.json` (auth-gated, generated from the live route table) + `GET /docs` (browser Swagger UI with bearer token bar). Drift test in CI catches new routes added without metadata.
 - **Configuration Bundle** — single-file JSON export/import of user config (admin hosts, address book, trusted viewers, known hosts, host service, IDs). Atomic write with `<name>.bak.<timestamp>` backups; CLI `python -m je_auto_control.utils.config_bundle export|import`; `POST /config/{export,import}`; GUI buttons on the REST API tab.
 - **USB Passthrough (experimental, opt-in)** — wire-level protocol over a WebRTC `usb` DataChannel (10 opcodes, CREDIT-based flow control, 16 KiB payload cap). Host-side `UsbPassthroughSession` end-to-end on the Linux libusb backend; Windows `WinUSB` backend with full ctypes wiring (hardware-unverified); macOS `IOKit` skeleton. Viewer-side blocking client (`UsbPassthroughClient` → `ClientHandle.control_transfer / bulk_transfer / interrupt_transfer`). Persistent ACL (`~/.je_auto_control/usb_acl.json`, default deny, mode 0600) with host-side prompt QDialog and tamper-evident audit-log integration. Default off — opt-in via `enable_usb_passthrough(True)` or `JE_AUTOCONTROL_USB_PASSTHROUGH=1`. Phase 2e external security review checklist included; default-on requires sign-off.
+- **Observability (Prometheus + OpenTelemetry)** — stdlib-only `Counter` / `Gauge` / `Histogram` registry with a tiny built-in HTTP exporter on `/metrics`, plus an OpenTelemetry-compatible tracer that upgrades to real OTel spans when the SDK is installed. The executor and agent loop emit `autocontrol_action_calls_total{action,outcome}`, `autocontrol_action_duration_seconds`, and `autocontrol_agent_steps_total{tool,outcome}` automatically — drop the URL into a Prometheus scrape config and you have a Grafana dashboard with zero per-script wiring.
 
 ---
 

@@ -334,6 +336,14 @@ third-party components and their licenses.
 
 ## Quick Start
 
+Looking for copy-pasteable end-to-end scripts instead of API snippets?
+The [`examples/`](examples/) directory has 17 self-contained programs
+covering screenshot + click, OCR, the headless scheduler, remote
+desktop, the agent loop, observability, recording / replay, runtime
+variables, window management, hotkeys, image triggers, HTML reports,
+the MCP stdio bridge, the REST API, the secrets vault, and plugin
+loading.
+
 ### Mouse Control
 
 ```python
@@ -463,12 +473,26 @@ ac.click_text("Submit")
 ac.wait_for_text("Loading complete", timeout=15.0)
 ```
 
+Backend selection — set ``AUTOCONTROL_OCR_BACKEND=tesseract|easyocr|paddleocr``
+or pass ``backend=`` per call; otherwise auto-detection picks the first
+one that imports:
+
+```python
+ac.find_text_matches("登入", lang="chi_tra", backend="easyocr")
+ac.click_text("Sign in", backend="tesseract")
+```
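
The selection order described above (explicit `backend=`, then the `AUTOCONTROL_OCR_BACKEND` environment variable, then the first backend that imports) could be resolved roughly as follows. This is a sketch under assumptions, not the library's code: the function name `resolve_backend`, the import names in `BACKEND_MODULES`, the probing order, and the rule that the keyword argument wins over the environment variable are all guesses for illustration.

```python
import importlib.util
import os

# Assumed import names for each backend; the README does not confirm them.
BACKEND_MODULES = {
    "tesseract": "pytesseract",
    "easyocr": "easyocr",
    "paddleocr": "paddleocr",
}


def _importable(module: str) -> bool:
    """True when the module can be found without actually importing it."""
    return importlib.util.find_spec(module) is not None


def resolve_backend(backend=None, env=None, is_available=_importable) -> str:
    """Pick a backend: explicit argument, then env var, then auto-detection."""
    env = os.environ if env is None else env
    choice = backend or env.get("AUTOCONTROL_OCR_BACKEND")
    if choice:
        if choice not in BACKEND_MODULES:
            raise ValueError(f"unknown OCR backend: {choice!r}")
        return choice
    # Auto-detection: first backend whose module is importable.
    for name, module in BACKEND_MODULES.items():
        if is_available(module):
            return name
    raise RuntimeError("no OCR backend is importable")
```

Injecting `env` and `is_available` keeps the resolver testable on machines where none of the OCR packages are installed.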
+
 If Tesseract is not on `PATH`, point at it explicitly:
 
 ```python
 ac.set_tesseract_cmd(r"C:\Program Files\Tesseract-OCR\tesseract.exe")
 ```
 
+Backend install paths and the canonical lang-code table are in
+[docs/source/Eng/doc/ocr_backends/ocr_backends_doc.rst](docs/source/Eng/doc/ocr_backends/ocr_backends_doc.rst)
+(or the [繁體中文](docs/source/Zh/doc/ocr_backends/ocr_backends_doc.rst)
+version).
+
 Dump every recognised text record in a region (or full screen), or
 search by regex when the text varies:
 
@@ -577,24 +601,175 @@ viewer.send_input({"action": "type", "text": "hello"})
 viewer.disconnect()
 ```
 
-GUI: **Remote Desktop** tab with two sub-tabs.
-
-- **Host** — token field with a *Generate* button, security warning
-about the bind address, start / stop controls, refreshing port +
-viewer-count status, and a 4 fps preview pane below the controls so
-the user being remoted sees what viewers see.
-- **Viewer** — address / port / token form, *Connect* / *Disconnect*,
-and a custom frame-display widget that paints incoming JPEG frames
-scaled with `KeepAspectRatio`. Mouse / wheel / key events on the
-display are remapped from widget coordinates back to the remote
-screen's pixel space using the latest frame's dimensions, then
-forwarded as `INPUT` messages.
+GUI: **Remote Desktop** tab opens to the **Quick Connect** screen
+(AnyDesk-style) by default — huge Host ID on one side, a single input
+that accepts `host:port`, `ws://`, `wss://`, or a 9-digit Host ID on
+the other, with *Connect* and *Start hosting* as the two primary
+buttons. Recent connections are remembered across sessions. Advanced
+per-transport sub-tabs (legacy TCP / WS host + viewer, WebRTC host +
+viewer with manual SDP / custom codecs / TLS pinning) stay one click
+away. WebRTC sub-tabs lazy-load so a stock install without the
+`[webrtc]` extra still opens the tab.
 
 > ⚠️ Anyone with the host:port and token gets full mouse / keyboard
 > control of the host machine. Default bind is `127.0.0.1`; expose
 > externally only via SSH tunnel or TLS front-end. The token is the
 > only line of defence — treat it like a password.
 
+**Quick Connect headless API.** The transport coordinator that backs
+the GUI input box is also exported, so scripts can dispatch the same
+way:
+
+```python
+from je_auto_control import parse_remote_desktop_target
+parse_remote_desktop_target("192.168.1.10:5555")
+# ConnectTarget(kind='tcp', host='192.168.1.10', port=5555, ...)
+parse_remote_desktop_target("ws://hub:8765/desk")
+# ConnectTarget(kind='ws', host='hub', port=8765, path='/desk')
+parse_remote_desktop_target("123-456-789")
+# ConnectTarget(kind='webrtc_id', host_id='123456789')
+```
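
To make the dispatch rules concrete, here is a stdlib-only sketch of how one input string might be sorted into the three target kinds shown above. It is not the implementation of `parse_remote_desktop_target`: the function name `classify_target`, the plain-dict return value, and the choice to report `wss://` URLs with kind `"wss"` are assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit


def classify_target(text: str) -> dict:
    """Classify a Quick Connect style string as ws/wss, Host ID, or host:port."""
    text = text.strip()
    if text.startswith(("ws://", "wss://")):
        url = urlsplit(text)
        return {"kind": url.scheme, "host": url.hostname,
                "port": url.port, "path": url.path}
    # A Host ID is nine digits, optionally grouped with dashes or spaces.
    digits = text.replace("-", "").replace(" ", "")
    if re.fullmatch(r"\d{9}", digits):
        return {"kind": "webrtc_id", "host_id": digits}
    host, sep, port = text.rpartition(":")
    if sep and host and port.isdigit():
        return {"kind": "tcp", "host": host, "port": int(port)}
    raise ValueError(f"unrecognised remote desktop target: {text!r}")
```

Checking the URL schemes first keeps the `host:port` branch from misreading the colon in `ws://`.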
+
+**Connection approval + view-only mode.** Optional callback gates
+every incoming session AnyDesk-style. Returning `"view_only"` admits
+the viewer but drops their `INPUT` messages; returning a falsy value
+(or raising) sends `AUTH_FAIL` "rejected by host":
+
+```python
+from je_auto_control import RemoteDesktopHost, PendingViewer
+
+def gate(p: PendingViewer) -> str:
+    if p.address[0].startswith("10."):
+        return "view_only"
+    return "full"  # or True
+
+host = RemoteDesktopHost(token="tok", on_pending_viewer=gate)
+```
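
The callback contract above boils down to three outcomes. A host could normalise the callback's return value along these lines; the helper name `resolve_decision` and the `"reject"` label are hypothetical, and treating any other truthy value as full access is an assumption rather than documented behaviour.

```python
def resolve_decision(callback, pending) -> str:
    """Map a gate callback's result to 'full', 'view_only', or 'reject'."""
    try:
        result = callback(pending)
    except Exception:
        # A raising callback is treated the same as an explicit rejection.
        return "reject"
    if result == "view_only":
        return "view_only"
    # Falsy values (None, False, "") reject; "full", True, etc. admit.
    return "full" if result else "reject"
```

With this shape, the input path only needs to check for `"view_only"` before dropping `INPUT` messages, and the auth path only needs to check for `"reject"`.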
+
+**IP allowlist (CIDR + exact IPs).** Reject peers outside the
+configured ranges *before* TLS / auth runs, so attackers can't probe
+further:
+
+```python
+host = RemoteDesktopHost(
+    token="tok", ip_allowlist=["10.0.0.0/8", "192.168.1.100"],
+)
+```
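
A mixed list of CIDR ranges and exact addresses like the one above can be checked with the standard `ipaddress` module. This is an illustrative sketch, not the host's actual code; `build_allowlist` and `is_allowed` are hypothetical names.

```python
import ipaddress


def build_allowlist(entries):
    """Parse CIDR ranges and bare IPs into network objects."""
    # strict=False turns a bare address into a single-host network
    # and tolerates host bits set in a CIDR entry.
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_allowed(peer_ip: str, networks) -> bool:
    """True when the peer address falls inside any allowlisted network."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in networks)
```

Parsing the entries once at startup means the per-connection check is a cheap membership test that can run before any TLS or auth work.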
+
+**One-time share codes** — extra tokens that self-destruct on first
+successful auth, ideal for client-support workflows:
+
+```python
+host = RemoteDesktopHost(token="tok", single_use_tokens=["abc123"])
+host.add_single_use_token("9k4ndx")  # rotate at runtime
+host.revoke_single_use_token("abc123")  # cancel before it's used
+```
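
The "self-destruct on first successful auth" behaviour can be modelled as a locked set from which a matching token is removed at the moment it is accepted. The class below is a sketch with a hypothetical name (`SingleUseTokens`); the constant-time comparison and the lock are design assumptions, not details confirmed by the README.

```python
import hmac
import threading


class SingleUseTokens:
    """Thread-safe pool of tokens that are each valid for one auth."""

    def __init__(self, tokens=()):
        self._tokens = set(tokens)
        self._lock = threading.Lock()

    def add(self, token: str) -> None:
        with self._lock:
            self._tokens.add(token)

    def revoke(self, token: str) -> None:
        with self._lock:
            self._tokens.discard(token)

    def consume(self, candidate: str) -> bool:
        """Accept the candidate once, then forget it."""
        with self._lock:
            match = None
            for token in self._tokens:
                # compare_digest avoids leaking match position via timing.
                if hmac.compare_digest(token.encode(), candidate.encode()):
                    match = token
            if match is None:
                return False
            self._tokens.discard(match)
            return True
```

Holding the lock across the lookup and the removal prevents two simultaneous viewers from both succeeding with the same code.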
+
+**TOTP 2FA (RFC 6238, stdlib only).** Layer a 6-digit OTP on top of
+the token; host accepts ±1 step of clock drift:
+
+```python
+from je_auto_control.utils.remote_desktop.totp import (
+    generate_secret, generate_code, provisioning_uri,
+)
+secret = generate_secret()
+print(provisioning_uri(secret, account="alice"))  # otpauth:// URI for QR
+
+host = RemoteDesktopHost(token="tok", totp_secret=secret)
+viewer = RemoteDesktopViewer(
+    host=..., token="tok", totp_code=generate_code(secret),
+)
+```
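
For readers unfamiliar with RFC 6238, the code generation and the ±1 step drift check can be written with the standard library alone. This sketch is independent of the project's `totp` module: the function names `totp` and `verify` are hypothetical, and it takes the secret as raw bytes, whereas `generate_secret()` may well return a base32 string.

```python
import hashlib
import hmac
import struct
import time


def totp(secret: bytes, at: float, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 code (HMAC-SHA1) for the time step containing `at`."""
    counter = int(at // step)
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation from RFC 4226: low nibble of the last byte is the offset.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def verify(secret: bytes, code: str, at=None, step: int = 30, drift: int = 1) -> bool:
    """Accept codes from the current step and `drift` steps either side."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret, now + k * step, step), code)
        for k in range(-drift, drift + 1)
    )
```

With the RFC 6238 test secret `b"12345678901234567890"` and `at=59`, the 8-digit code is `94287082`, which matches the published test vector.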
+
+**Multi-monitor selection.** Capture one specific monitor instead of
+the combined virtual desktop:
+
+```python
+from je_auto_control import list_host_monitors, RemoteDesktopHost
+print(list_host_monitors())
+# [{'index': 0, 'is_combined': True, ...},
+# {'index': 1, 'left': 0, 'top': 0, ...},
+# {'index': 2, 'left': 1920, ...}]
+host = RemoteDesktopHost(token="tok", monitor_index=1)
+```
+
+**Remote cursor overlay.** Host broadcasts cursor position at 30 Hz
+(deduped on still desktops); the viewer's popup window draws an arrow
+on top of the JPEG stream so you can see exactly where the host's
+pointer is. Disable via `enable_cursor_broadcast=False`.
+
+**Multi-viewer collaborative cursors + chat.** Two new message types
+(`CHAT` and `CURSOR` with `viewer_id`). Use a `MultiViewerHost` to
+relay one viewer's pointer to the others; pair with the chat channel
+for ad-hoc text between operators:
+
+```python
+host = RemoteDesktopHost(
+    token="tok", on_chat=lambda sender, text: print(sender, ":", text),
+)
+host.broadcast_chat("session starts in 30s")
+host.broadcast_viewer_cursor("alice", 200, 300)
+
+viewer = RemoteDesktopViewer(
+    host=..., on_chat=lambda s, t: ...,
+    on_viewer_cursor=lambda vid, x, y: ...,
+)
+viewer.send_chat("ack")
+```
+
+**Relative mouse mode (FPS / CAD).** New input action that sends
+deltas instead of absolute coordinates:
+
+```python
+viewer.send_input({"action": "mouse_move_relative", "dx": 5, "dy": -3})
+```
+
+**Motion-aware capture.** The capture loop now hashes each encoded
+JPEG; identical frames are skipped, so a static desktop produces
+~zero bandwidth. New viewers are seeded with the latest frame on auth
+so they never see a black popup.
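
The hash-and-skip idea can be sketched in a few lines. This is an illustration only: the class name `FrameDeduper` and the choice of BLAKE2b as the hash are assumptions, since the README does not say which digest the capture loop uses.

```python
import hashlib


class FrameDeduper:
    """Skip frames whose encoded bytes match the previous frame."""

    def __init__(self):
        self._last_digest = None
        # Retained so a newly authenticated viewer can be seeded at once.
        self.latest_frame = None

    def should_send(self, jpeg_bytes: bytes) -> bool:
        digest = hashlib.blake2b(jpeg_bytes, digest_size=16).digest()
        if digest == self._last_digest:
            return False  # identical to the previous frame: nothing to send
        self._last_digest = digest
        self.latest_frame = jpeg_bytes
        return True
```

Comparing digests of the encoded bytes only catches exact repeats, so this relies on the encoder being deterministic for an unchanged screen.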
+
+**Live stats** (FPS / kbps / totals over a 3-second window):
+
+```python
+viewer.stats()
+# {'fps': 24.3, 'kbps': 4801.2, 'frames': 720.0, 'bytes': 1.8e7, 'uptime': 30.2}
+```
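
Windowed rates like these are typically computed from a queue of recent samples plus running totals. The sketch below shows one way to do it; `WindowStats` is a hypothetical name, `kbps` is assumed to mean kilobits per second, and the `uptime` field is left out for brevity.

```python
from collections import deque


class WindowStats:
    """Rates over a sliding time window plus lifetime totals."""

    def __init__(self, window: float = 3.0):
        self.window = window
        self._samples = deque()  # (timestamp, byte_count) pairs
        self.frames = 0
        self.bytes = 0

    def record(self, timestamp: float, size: int) -> None:
        self._samples.append((timestamp, size))
        self.frames += 1
        self.bytes += size
        self._trim(timestamp)

    def _trim(self, now: float) -> None:
        # Drop samples that have aged out of the window.
        while self._samples and now - self._samples[0][0] > self.window:
            self._samples.popleft()

    def snapshot(self, now: float) -> dict:
        self._trim(now)
        window_bytes = sum(size for _, size in self._samples)
        return {
            "fps": len(self._samples) / self.window,
            "kbps": window_bytes * 8 / 1000 / self.window,
            "frames": self.frames,
            "bytes": self.bytes,
        }
```

Passing timestamps in explicitly, rather than calling `time.monotonic()` inside, keeps the arithmetic easy to test.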
+
+**JPEG sequence recorder (no PyAV needed).** TCP-path session
+capture: each frame written to disk plus `manifest.json` so it can
+be replayed at original cadence:
+
+```python
+from je_auto_control.utils.remote_desktop.jpeg_recorder import (
+    JpegSequenceRecorder,
+)
+rec = JpegSequenceRecorder("~/recordings/2026-05-23")
+rec.start()
+viewer = RemoteDesktopViewer(host=..., on_frame=rec.record_frame)
+# ... session ...
+rec.stop()  # writes manifest.json next to the .jpg files
+```
+
+**TCP relay (WebRTC fallback).** When P2P fails (strict NAT, mobile
+CGNAT, hotel Wi-Fi), both peers connect outbound to a relay and
+exchange a shared 32-byte session ID; the relay pipes bytes between
+them. Same module ships an `encode_handshake(role, session_id)`
+helper for clients:
+
+```python
+from je_auto_control.utils.remote_desktop.relay import RelayServer
+relay = RelayServer(bind="0.0.0.0", port=9000)  # NOSONAR # public relay
+relay.start()
+```
+
+**Service installer (unattended host).** `python -m
+je_auto_control.utils.remote_desktop.host_service ...`
+exposes `configure` / `init` / `run` plus per-platform installers:
+`install-windows-service` / `uninstall-windows-service` (pywin32),
+`generate-launchd` / `uninstall-launchd`, `generate-systemd` /
+`uninstall-systemd`.
+
 **Encrypted transports + alternate protocols.** Pass an `ssl_context`
 to either `RemoteDesktopHost` or `RemoteDesktopViewer` to wrap every
 connection in TLS. For firewall-friendly access, use the in-tree
@@ -935,6 +1110,36 @@ xml_string = je_auto_control.generate_xml()
 
 Reports include: function name, parameters, timestamp, and exception info (if any) for each recorded action. HTML reports display successful actions in cyan and failed actions in red.
 
+### Observability (Prometheus / OpenTelemetry)
+
+Stdlib-only metric primitives plus an OpenTelemetry-compatible tracer
+fallback. The executor and agent loop emit call counts and latency
+histograms automatically — no per-script wiring required.
+
+```python
+import je_auto_control as ac
+
+# Expose /metrics on http://127.0.0.1:9090 for Prometheus to scrape.
+exporter = ac.default_metrics_exporter()
+exporter.start()
+
+# Add your own metric — same shapes as prometheus_client.
+counter = ac.default_metric_registry().register(ac.MetricCounter(
+    "myapp_widgets_built_total", "widgets built",
+    label_names=("kind",),
+))
+counter.inc(labels={"kind": "blue"})
+
+# Wrap a callable in a span — no-op until opentelemetry-api is installed.
+@ac.traced("my_pipeline.process_one")
+def process_one(item): ...
+```
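
To show what a scrape of `/metrics` would contain for a labelled counter like the one above, here is a minimal stdlib sketch that renders the Prometheus text exposition format. `LabeledCounter` is a hypothetical stand-in, not `ac.MetricCounter`, and it skips label-value escaping for brevity.

```python
class LabeledCounter:
    """Tiny counter with labels that renders Prometheus text format."""

    def __init__(self, name, help_text, label_names=()):
        self.name = name
        self.help_text = help_text
        self.label_names = tuple(label_names)
        self._values = {}  # label-value tuple -> running total

    def inc(self, amount=1.0, labels=None):
        labels = labels or {}
        key = tuple(labels[name] for name in self.label_names)
        self._values[key] = self._values.get(key, 0.0) + amount

    def render(self) -> str:
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            pairs = ",".join(
                f'{n}="{v}"' for n, v in zip(self.label_names, key)
            )
            suffix = f"{{{pairs}}}" if pairs else ""
            lines.append(f"{self.name}{suffix} {value}")
        return "\n".join(lines) + "\n"
```

Two `inc(labels={"kind": "blue"})` calls on a counter named `myapp_widgets_built_total` render a sample line reading `myapp_widgets_built_total{kind="blue"} 2.0`, preceded by the `# HELP` and `# TYPE` comment lines.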
+
+Built-in metrics are listed in
+[docs/source/Eng/doc/observability/observability_doc.rst](docs/source/Eng/doc/observability/observability_doc.rst)
+(or the [繁體中文](docs/source/Zh/doc/observability/observability_doc.rst)
+version).
+
 ### Remote Automation (Socket / REST)
 
 Two servers are available — a raw TCP socket and a stdlib HTTP/REST
@@ -1197,6 +1402,13 @@ cd AutoControl
 pip install -r dev_requirements.txt
 ```
 
+Reproducible installs use the committed `uv.lock`:
+
+```bash
+uv sync  # install pinned versions across the whole dep tree
+uv lock --upgrade  # refresh after editing pyproject.toml
+```
+
 ### Running Tests
 
 ```bash
