CLAUDE.md
+64 -21 (64 additions & 21 deletions)
@@ -1,6 +1,6 @@
 # FileAutomation

-Automation-first Python library for local file / directory / zip operations, HTTP downloads, and Google Drive integration. Actions are defined as JSON and dispatched through a central registry so they can be executed in-process, from disk, or over a TCP socket.
+Automation-first Python library for local file / directory / zip operations, HTTP downloads, and remote storage (Google Drive, S3, Azure Blob, Dropbox, SFTP). Actions are defined as JSON and dispatched through a central registry so they can be executed in-process, from disk, over a TCP socket, or over HTTP.

 ## Architecture

@@ -9,32 +9,48 @@ Automation-first Python library for local file / directory / zip operations, HTT
 ```
 automation_file/
 ├── __init__.py   # Public API facade (every name users import)
 ```
 - `ActionRegistry` — mutable name → callable mapping. `register`, `register_many`, `resolve`, `unregister`, `event_dict` (live view for legacy callers).
-- `ActionExecutor` — holds a registry and runs JSON action lists. `execute_action(list|dict)`, `execute_files(paths)`, `add_command_to_executor(mapping)`.
+- `ActionExecutor` — holds a registry and runs JSON action lists. `execute_action(list|dict, validate_first=False, dry_run=False)`, `execute_action_parallel(list, max_workers=None)`, `validate(list) -> list[str]`, `execute_files(paths)`, `add_command_to_executor(mapping)`.
 - `CallbackExecutor` — runs a registered trigger, then a user callback, sharing the executor's registry.
 - `PackageLoader` — imports a package by name and registers its top-level functions / classes / builtins as `<package>_<member>`.
 - `GoogleDriveClient` — wraps OAuth2 credential loading; exposes `service` lazily. `later_init(token_path, credentials_path)` bootstraps; `require_service()` raises if not initialised.
-- `TCPActionServer` — threaded TCP server that deserialises a JSON action list per connection. Defaults to loopback.
+- `S3Client` / `AzureBlobClient` / `DropboxClient` / `SFTPClient` — lazy-import singleton wrappers around the optional SDKs. Each exposes `later_init(...)` plus `close()` where relevant. Operations are registered via `register_<backend>_ops(registry)`.
+- `TCPActionServer` — threaded TCP server that deserialises a JSON action list per connection. Defaults to loopback; optional `shared_secret` enforces `AUTH <secret>\n` prefix.
+- `Quota` — frozen dataclass capping bytes and wall-clock seconds per action or block (`check_size`, `time_budget` context manager, `wraps` decorator). `0` disables each cap.
+- `retry_on_transient(max_attempts, backoff_base, backoff_cap, retriable)` — decorator that retries with capped exponential back-off and raises `RetryExhaustedException` chained to the last error.
+- `safe_join(root, user_path)` / `is_within(root, path)` — path traversal guard; `safe_join` raises `PathTraversalException` when the resolved path escapes `root`.
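
To make the `Quota` entry above more concrete, here is a minimal sketch of how a frozen dataclass might cap bytes and wall-clock seconds. It is not the library's code. The names `check_size`, `time_budget`, `wraps`, `max_bytes`, and `max_seconds` come from this diff, while `QuotaSketch`, `QuotaExceeded`, and the choice to check the time budget only when the block exits are assumptions made for illustration.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass
from functools import wraps as _wraps


class QuotaExceeded(Exception):
    """Hypothetical exception name, used only in this sketch."""


@dataclass(frozen=True)
class QuotaSketch:
    max_bytes: int = 0       # 0 disables the size cap
    max_seconds: float = 0   # 0 disables the time cap

    def check_size(self, nbytes: int) -> None:
        # Raise when a size cap is set and the given size exceeds it.
        if self.max_bytes and nbytes > self.max_bytes:
            raise QuotaExceeded(f"{nbytes} bytes exceeds cap of {self.max_bytes}")

    @contextmanager
    def time_budget(self):
        # Assumption: the budget is checked when the block exits;
        # this sketch does not interrupt code that is still running.
        start = time.monotonic()
        yield
        elapsed = time.monotonic() - start
        if self.max_seconds and elapsed > self.max_seconds:
            raise QuotaExceeded(f"{elapsed:.3f}s exceeds cap of {self.max_seconds}s")

    def wraps(self, func):
        # Decorator form: run the whole call under the time budget.
        @_wraps(func)
        def wrapper(*args, **kwargs):
            with self.time_budget():
                return func(*args, **kwargs)
        return wrapper
```

Under these assumptions, `QuotaSketch(max_seconds=5).wraps(fn)` would guard a whole operation, while `check_size` would be called wherever a payload size is known up front.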

 ## Branching & CI

 - `main` branch: stable releases, publishes `automation_file` to PyPI (version in `stable.toml`).
 - `dev` branch: development, publishes `automation_file_dev` to PyPI (version in `dev.toml`).
-- Keep both TOMLs in sync when bumping.
+- Keep both TOMLs in sync when bumping. `[project.optional-dependencies]` (s3/azure/dropbox/sftp/dev) must also stay in sync.
 - CI: GitHub Actions (Windows, Python 3.10 / 3.11 / 3.12) — one matrix workflow per branch: `.github/workflows/ci-dev.yml`, `.github/workflows/ci-stable.yml`.
-- CI steps: install deps → `pytest tests/ -v`.
+- CI steps: `lint` (ruff check + ruff format --check + mypy) → `pytest` with coverage → uploads `coverage.xml` as an artifact.
+- Stable branch additionally runs a `publish` job on push to `main`: builds the sdist + wheel, `twine check`, `twine upload` using `PYPI_API_TOKEN`, then `gh release create v<version> --generate-notes`.
+- `pre-commit` is configured (`.pre-commit-config.yaml`): trailing-whitespace, eof-fixer, check-yaml, check-toml, check-added-large-files, ruff, ruff-format, mypy. Install with `pre-commit install` after cloning.
 - Unit tests live under `tests/` (pytest). Fixtures in `tests/conftest.py` (`sample_file`, `sample_dir`).
-- Tests cover every module in `core/`, `local/`, `remote/url_validator`, `project/`, `server/`, `utils/`, plus a facade smoke test.
-- Google Drive / HTTP-download code paths that require real credentials or network access are **not** exercised in CI — only their URL-validation / input-validation guards are.
+- Tests cover every module in `core/`, `local/`, `remote/url_validator`, `project/`, `server/`, `utils/`, plus a facade smoke test, retry/quota/safe_paths, HTTP+TCP auth, and optional-backend registration.
+- Google Drive / HTTP-download / S3 / Azure / Dropbox / SFTP code paths that require real credentials or network access are **not** exercised in CI — only their URL-validation, auth, and guard-clause behaviour are.
 - Run all tests before submitting changes: `python -m pytest tests/ -v`.

 ## Conventions
@@ -121,6 +148,22 @@ All code must follow secure-by-default principles. Review every change against t
 - Do not remove the loopback guard to "make it easier to test remotely". The server dispatches arbitrary registry commands; exposing it to the network is equivalent to exposing a Python REPL.
 - The server accepts a single JSON payload per connection (`recv(8192)`). Do not raise that limit without also adding a length-framed protocol.
 - `quit_server` triggers an orderly shutdown; do not add an administrative bypass that skips the loopback check.
+- Optional `shared_secret=` enforces an `AUTH <secret>\n` prefix; the comparison uses `hmac.compare_digest` (constant time). Never log the secret or the raw payload.
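
The shared-secret rule above could look roughly like the following sketch. The function name `strip_auth_prefix` and its return convention are hypothetical and not taken from the library. Only the `AUTH <secret>` line format and the use of `hmac.compare_digest` come from this diff.

```python
import hmac


def strip_auth_prefix(payload: bytes, shared_secret: str):
    """Return the bytes after a valid AUTH line, or None if the check fails.

    Hypothetical helper for illustration; the real server may frame and
    report failures differently.
    """
    # The first line is expected to be: AUTH <secret>
    header, sep, body = payload.partition(b"\n")
    if not sep or not header.startswith(b"AUTH "):
        return None
    presented = header[len(b"AUTH "):]
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(presented, shared_secret.encode("utf-8")):
        return None
    return body
```

Returning `None` rather than an error message keeps the secret and the raw payload out of logs and replies, in line with the rule above.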
+
+### HTTP server
+- `HTTPActionServer` / `start_http_action_server` mirror the TCP server's posture: loopback-only by default, `allow_non_loopback=True` required to bind elsewhere, optional `shared_secret` enforced as `Authorization: Bearer <secret>` using `hmac.compare_digest`.
+- Only `POST /actions` is handled. Request body capped at 1 MB — do not raise without also switching to a streaming parser.
+- Any caller resolving a user-supplied path against a trusted root must go through `automation_file.local.safe_paths.safe_join` (raises `PathTraversalException`) or the `is_within` check. Never concatenate + `Path.resolve()` yourself and skip the containment check — symlinks and `..` segments bypass naive string checks.
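
As an illustration of the containment check described above, here is a minimal sketch of a resolve-then-verify guard. It is not the code in `automation_file.local.safe_paths`. The names `safe_join_sketch`, `is_within_sketch`, and `PathEscapesRoot` are stand-ins chosen for this example.

```python
from pathlib import Path


class PathEscapesRoot(Exception):
    """Stand-in for the library's PathTraversalException (sketch only)."""


def is_within_sketch(root, path) -> bool:
    # Resolve both sides so symlinks and ".." segments are normalised
    # before the containment test.
    root_resolved = Path(root).resolve()
    path_resolved = Path(path).resolve()
    return path_resolved == root_resolved or root_resolved in path_resolved.parents


def safe_join_sketch(root, user_path) -> Path:
    # Join first, resolve second, then check containment on the result.
    candidate = (Path(root) / user_path).resolve()
    if not is_within_sketch(root, candidate):
        raise PathEscapesRoot(f"{user_path!r} resolves outside the trusted root")
    return candidate
```

The key point is the order of operations: the check runs on the fully resolved path, so an absolute `user_path` or a `../` sequence is caught even though the raw string may look harmless.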
+
+### SFTP host verification
+- `SFTPClient` uses `paramiko.RejectPolicy()` — unknown hosts are rejected, never auto-added. Callers pass `known_hosts=` explicitly or rely on `~/.ssh/known_hosts`. Do not swap in `AutoAddPolicy` for convenience.
+
+### Reliability (retry / quota)
+- `retry_on_transient` only retries the exception types passed via `retriable=(…)`. Never widen to bare `Exception` — masks logic bugs as transient failures. Always exhausts to `RetryExhaustedException` chained with `raise ... from err`.
+- `Quota(max_bytes=…, max_seconds=…)` — prefer `Quota.wraps(...)` over inline checks when guarding a whole operation. `0` disables each cap.
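
The retry rule above can be sketched as a decorator with capped exponential back-off. This is an illustrative approximation, not the library's `retry_on_transient`. The parameter names follow this diff, but the default values, the exact back-off formula, and the stand-in names `retry_sketch` and `RetryExhausted` are assumptions.

```python
import functools
import time


class RetryExhausted(Exception):
    """Stand-in for RetryExhaustedException (sketch only)."""


def retry_sketch(max_attempts=3, backoff_base=0.1, backoff_cap=2.0,
                 retriable=(ConnectionError,)):
    # Defaults here are assumptions chosen for the example.
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except retriable as err:
                    # Only the listed exception types are retried; anything
                    # else propagates immediately as a real bug.
                    if attempt == max_attempts:
                        raise RetryExhausted(
                            f"gave up after {attempt} attempts"
                        ) from err
                    # Exponential back-off: base, 2*base, 4*base, ... capped.
                    time.sleep(min(backoff_cap, backoff_base * 2 ** (attempt - 1)))
        return wrapper
    return decorator
```

Because `retriable` is an explicit tuple, a `ValueError` or `KeyError` raised by the wrapped function surfaces on the first attempt instead of being retried.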

 ### Google Drive
 - Credentials are stored at the caller-supplied `token_path` with `encoding="utf-8"`. Never log or print the token contents.