Commit 359cd6e

Add validation, parallel/dry-run, retry, quota, optional cloud backends, HTTP server
- Core: validate_action, execute_action_parallel, dry_run, retry_on_transient, Quota
- Local: safe_join/is_within path traversal guard
- Servers: optional shared-secret auth (TCP AUTH prefix + HTTP Bearer); add HTTPActionServer
- Optional backends behind extras: s3, azure_blob, dropbox_api, sftp (lazy-imported)
- CLI: subcommands (zip, unzip, download, create-file, server, http-server, drive-upload)
- CI: ruff + mypy lint job, pytest-cov coverage upload, auto twine+release on main
- Add pre-commit config; bump dev 0.0.31->0.0.32, stable 0.0.29->0.0.30
- Docs: architecture, usage, API refs rewritten to cover new modules
1 parent c198fc8 commit 359cd6e

57 files changed, +2597 -152 lines changed

.github/workflows/ci-dev.yml

Lines changed: 29 additions & 3 deletions
@@ -12,7 +12,27 @@ permissions:
   contents: read
 
 jobs:
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+          cache: pip
+      - name: Install tooling
+        run: |
+          python -m pip install --upgrade pip
+          pip install ruff mypy
+      - name: Ruff check
+        run: ruff check automation_file tests
+      - name: Ruff format check
+        run: ruff format --check automation_file tests
+      - name: Mypy
+        run: mypy automation_file
+
   pytest:
+    needs: lint
     runs-on: windows-latest
     strategy:
       fail-fast: false
@@ -29,6 +49,12 @@ jobs:
         run: |
           python -m pip install --upgrade pip wheel
           pip install -r dev_requirements.txt
-          pip install pytest
-      - name: Run pytest
-        run: python -m pytest tests/ -v --tb=short
+          pip install pytest pytest-cov
+      - name: Run pytest with coverage
+        run: python -m pytest tests/ -v --tb=short --cov=automation_file --cov-report=term-missing --cov-report=xml
+      - name: Upload coverage artifact
+        if: matrix.python-version == '3.12'
+        uses: actions/upload-artifact@v4
+        with:
+          name: coverage-xml
+          path: coverage.xml

.github/workflows/ci-stable.yml

Lines changed: 70 additions & 3 deletions
@@ -12,7 +12,27 @@ permissions:
   contents: read
 
 jobs:
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+          cache: pip
+      - name: Install tooling
+        run: |
+          python -m pip install --upgrade pip
+          pip install ruff mypy
+      - name: Ruff check
+        run: ruff check automation_file tests
+      - name: Ruff format check
+        run: ruff format --check automation_file tests
+      - name: Mypy
+        run: mypy automation_file
+
   pytest:
+    needs: lint
     runs-on: windows-latest
     strategy:
       fail-fast: false
@@ -29,6 +49,53 @@ jobs:
         run: |
           python -m pip install --upgrade pip wheel
           pip install -r requirements.txt
-          pip install pytest
-      - name: Run pytest
-        run: python -m pytest tests/ -v --tb=short
+          pip install pytest pytest-cov
+      - name: Run pytest with coverage
+        run: python -m pytest tests/ -v --tb=short --cov=automation_file --cov-report=term-missing --cov-report=xml
+      - name: Upload coverage artifact
+        if: matrix.python-version == '3.12'
+        uses: actions/upload-artifact@v4
+        with:
+          name: coverage-xml
+          path: coverage.xml
+
+  publish:
+    needs: pytest
+    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
+    runs-on: ubuntu-latest
+    permissions:
+      contents: write
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+          cache: pip
+      - name: Install build tools
+        run: |
+          python -m pip install --upgrade pip
+          pip install build twine
+      - name: Use stable.toml as pyproject.toml
+        run: cp stable.toml pyproject.toml
+      - name: Extract version
+        id: version
+        run: |
+          VERSION=$(python -c "import tomllib; print(tomllib.load(open('pyproject.toml','rb'))['project']['version'])")
+          echo "version=${VERSION}" >> "$GITHUB_OUTPUT"
+      - name: Build sdist and wheel
+        run: python -m build
+      - name: Twine check
+        run: twine check dist/*
+      - name: Twine upload to PyPI
+        env:
+          TWINE_USERNAME: __token__
+          TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
+        run: twine upload --non-interactive dist/*
+      - name: Create GitHub Release
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: |
+          gh release create "v${{ steps.version.outputs.version }}" dist/* \
+            --title "v${{ steps.version.outputs.version }}" \
+            --generate-notes

.pre-commit-config.yaml

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v5.0.0
+    hooks:
+      - id: trailing-whitespace
+      - id: end-of-file-fixer
+      - id: check-yaml
+      - id: check-toml
+      - id: check-added-large-files
+        args: ["--maxkb=500"]
+
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.6.9
+    hooks:
+      - id: ruff
+        args: ["--fix"]
+      - id: ruff-format
+
+  - repo: https://github.com/pre-commit/mirrors-mypy
+    rev: v1.11.2
+    hooks:
+      - id: mypy
+        additional_dependencies: []
+        args: ["--config-file=mypy.ini"]
+        files: ^automation_file/

CLAUDE.md

Lines changed: 64 additions & 21 deletions
@@ -1,6 +1,6 @@
 # FileAutomation
 
-Automation-first Python library for local file / directory / zip operations, HTTP downloads, and Google Drive integration. Actions are defined as JSON and dispatched through a central registry so they can be executed in-process, from disk, or over a TCP socket.
+Automation-first Python library for local file / directory / zip operations, HTTP downloads, and remote storage (Google Drive, S3, Azure Blob, Dropbox, SFTP). Actions are defined as JSON and dispatched through a central registry so they can be executed in-process, from disk, over a TCP socket, or over HTTP.
 
 ## Architecture
 
@@ -9,32 +9,48 @@ Automation-first Python library for local file / directory / zip operations, HTT
 ```
 automation_file/
 ├── __init__.py # Public API facade (every name users import)
-├── __main__.py # CLI entry (argparse dispatcher)
+├── __main__.py # CLI entry (argparse dispatcher, subcommands + legacy flags)
 ├── exceptions.py # Exception hierarchy (FileAutomationException base)
 ├── logging_config.py # file_automation_logger (file + stderr handlers)
 ├── core/
 │   ├── action_registry.py # ActionRegistry — name -> callable (Registry + Command)
 │   ├── action_executor.py # ActionExecutor — runs JSON action lists (Facade + Template Method)
 │   ├── callback_executor.py # CallbackExecutor — trigger then callback composition
 │   ├── package_loader.py # PackageLoader — dynamically registers package members
-│   └── json_store.py # Thread-safe read/write of JSON action files
+│   ├── json_store.py # Thread-safe read/write of JSON action files
+│   ├── retry.py # retry_on_transient — capped exponential back-off decorator
+│   └── quota.py # Quota — size + time budget guards
 ├── local/ # Strategy modules — each file is a batch of pure operations
 │   ├── file_ops.py
 │   ├── dir_ops.py
-│   └── zip_ops.py
+│   ├── zip_ops.py
+│   └── safe_paths.py # safe_join / is_within — path traversal guard
 ├── remote/
 │   ├── url_validator.py # SSRF guard for outbound URLs
-│   ├── http_download.py # SSRF-validated HTTP download with size/timeout caps
-│   └── google_drive/
-│       ├── client.py # GoogleDriveClient (Singleton Facade)
-│       ├── delete_ops.py
-│       ├── download_ops.py
-│       ├── folder_ops.py
-│       ├── search_ops.py
-│       ├── share_ops.py
-│       └── upload_ops.py
+│   ├── http_download.py # SSRF-validated HTTP download with size/timeout caps + retry
+│   ├── google_drive/
+│   │   ├── client.py # GoogleDriveClient (Singleton Facade)
+│   │   ├── delete_ops.py
+│   │   ├── download_ops.py
+│   │   ├── folder_ops.py
+│   │   ├── search_ops.py
+│   │   ├── share_ops.py
+│   │   └── upload_ops.py
+│   ├── s3/ # Optional — pip install automation_file[s3]
+│   │   ├── client.py # S3Client (lazy boto3 import)
+│   │   ├── upload_ops.py
+│   │   ├── download_ops.py
+│   │   ├── delete_ops.py
+│   │   └── list_ops.py
+│   ├── azure_blob/ # Optional — pip install automation_file[azure]
+│   │   └── {client,upload,download,delete,list}_ops.py
+│   ├── dropbox_api/ # Optional — pip install automation_file[dropbox]
+│   │   └── {client,upload,download,delete,list}_ops.py
+│   └── sftp/ # Optional — pip install automation_file[sftp]
+│       └── {client,upload,download,delete,list}_ops.py
 ├── server/
-│   └── tcp_server.py # Loopback-only TCP server executing JSON actions
+│   ├── tcp_server.py # Loopback-only TCP server executing JSON actions (optional shared-secret auth)
+│   └── http_server.py # Loopback-only HTTP server (POST /actions, optional Bearer auth)
 ├── project/
 │   ├── project_builder.py # ProjectBuilder (Builder pattern)
 │   └── templates.py # Scaffolding templates
@@ -53,32 +69,43 @@ automation_file/
 ## Key types
 
 - `ActionRegistry` — mutable name → callable mapping. `register`, `register_many`, `resolve`, `unregister`, `event_dict` (live view for legacy callers).
-- `ActionExecutor` — holds a registry and runs JSON action lists. `execute_action(list|dict)`, `execute_files(paths)`, `add_command_to_executor(mapping)`.
+- `ActionExecutor` — holds a registry and runs JSON action lists. `execute_action(list|dict, validate_first=False, dry_run=False)`, `execute_action_parallel(list, max_workers=None)`, `validate(list) -> list[str]`, `execute_files(paths)`, `add_command_to_executor(mapping)`.
 - `CallbackExecutor` — runs a registered trigger, then a user callback, sharing the executor's registry.
 - `PackageLoader` — imports a package by name and registers its top-level functions / classes / builtins as `<package>_<member>`.
 - `GoogleDriveClient` — wraps OAuth2 credential loading; exposes `service` lazily. `later_init(token_path, credentials_path)` bootstraps; `require_service()` raises if not initialised.
-- `TCPActionServer` — threaded TCP server that deserialises a JSON action list per connection. Defaults to loopback.
+- `S3Client` / `AzureBlobClient` / `DropboxClient` / `SFTPClient` — lazy-import singleton wrappers around the optional SDKs. Each exposes `later_init(...)` plus `close()` where relevant. Operations are registered via `register_<backend>_ops(registry)`.
+- `TCPActionServer` — threaded TCP server that deserialises a JSON action list per connection. Defaults to loopback; optional `shared_secret` enforces `AUTH <secret>\n` prefix.
+- `HTTPActionServer` — `ThreadingHTTPServer` exposing `POST /actions`. Defaults to loopback; optional `shared_secret` enforces `Authorization: Bearer <secret>`.
+- `Quota` — frozen dataclass capping bytes and wall-clock seconds per action or block (`check_size`, `time_budget` context manager, `wraps` decorator). `0` disables each cap.
+- `retry_on_transient(max_attempts, backoff_base, backoff_cap, retriable)` — decorator that retries with capped exponential back-off and raises `RetryExhaustedException` chained to the last error.
+- `safe_join(root, user_path)` / `is_within(root, path)` — path traversal guard; `safe_join` raises `PathTraversalException` when the resolved path escapes `root`.
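
Note on the `Quota` entry above: the diff shown here does not include `core/quota.py`, so the following is only a rough sketch of how a frozen dataclass with `check_size`, `time_budget`, and `wraps` could behave, with `0` disabling each cap. `QuotaExceededError` is a hypothetical stand-in for whatever exception the library raises, and the sketch measures elapsed time after the block finishes rather than interrupting it.

```python
import functools
import time
from contextlib import contextmanager
from dataclasses import dataclass


class QuotaExceededError(Exception):
    """Hypothetical stand-in; the library's real exception is not shown in this diff."""


@dataclass(frozen=True)
class Quota:
    max_bytes: int = 0        # 0 disables the size cap
    max_seconds: float = 0.0  # 0 disables the time cap

    def check_size(self, nbytes: int) -> None:
        """Raise if nbytes exceeds the byte cap (when the cap is enabled)."""
        if self.max_bytes and nbytes > self.max_bytes:
            raise QuotaExceededError(f"{nbytes} bytes exceeds cap of {self.max_bytes}")

    @contextmanager
    def time_budget(self):
        """Time the enclosed block and raise afterwards if it overran."""
        start = time.monotonic()
        yield
        elapsed = time.monotonic() - start
        if self.max_seconds and elapsed > self.max_seconds:
            raise QuotaExceededError(f"{elapsed:.3f}s exceeds cap of {self.max_seconds}s")

    def wraps(self, func):
        """Decorator form: run the whole call inside time_budget()."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with self.time_budget():
                return func(*args, **kwargs)
        return wrapper


if __name__ == "__main__":
    quota = Quota(max_bytes=1024, max_seconds=2)
    quota.check_size(512)                    # within the cap, no error
    print(quota.wraps(lambda: "done")())     # fast call passes the time budget
```

Because the check runs on exit, an overrunning operation still completes before the error is raised; a guard that must abort work in flight would need a different mechanism.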
 
 ## Branching & CI
 
 - `main` branch: stable releases, publishes `automation_file` to PyPI (version in `stable.toml`).
 - `dev` branch: development, publishes `automation_file_dev` to PyPI (version in `dev.toml`).
-- Keep both TOMLs in sync when bumping.
+- Keep both TOMLs in sync when bumping. `[project.optional-dependencies]` (s3/azure/dropbox/sftp/dev) must also stay in sync.
 - CI: GitHub Actions (Windows, Python 3.10 / 3.11 / 3.12) — one matrix workflow per branch: `.github/workflows/ci-dev.yml`, `.github/workflows/ci-stable.yml`.
-- CI steps: install deps → `pytest tests/ -v`.
+- CI steps: `lint` (ruff check + ruff format --check + mypy) → `pytest` with coverage → uploads `coverage.xml` as an artifact.
+- Stable branch additionally runs a `publish` job on push to `main`: builds the sdist + wheel, `twine check`, `twine upload` using `PYPI_API_TOKEN`, then `gh release create v<version> --generate-notes`.
+- `pre-commit` is configured (`.pre-commit-config.yaml`): trailing-whitespace, eof-fixer, check-yaml, check-toml, check-added-large-files, ruff, ruff-format, mypy. Install with `pre-commit install` after cloning.
 
 ## Development
 
 ```bash
-python -m pip install -r dev_requirements.txt pytest
+python -m pip install -r dev_requirements.txt pytest pytest-cov
+python -m pip install -e ".[dev]" # ruff, mypy, pre-commit
 python -m pytest tests/ -v --tb=short
+ruff check automation_file/ tests/
+ruff format --check automation_file/ tests/
+mypy automation_file/
 python -m automation_file --help
 ```
 
 **Testing:**
 - Unit tests live under `tests/` (pytest). Fixtures in `tests/conftest.py` (`sample_file`, `sample_dir`).
-- Tests cover every module in `core/`, `local/`, `remote/url_validator`, `project/`, `server/`, `utils/`, plus a facade smoke test.
-- Google Drive / HTTP-download code paths that require real credentials or network access are **not** exercised in CI — only their URL-validation / input-validation guards are.
+- Tests cover every module in `core/`, `local/`, `remote/url_validator`, `project/`, `server/`, `utils/`, plus a facade smoke test, retry/quota/safe_paths, HTTP+TCP auth, and optional-backend registration.
+- Google Drive / HTTP-download / S3 / Azure / Dropbox / SFTP code paths that require real credentials or network access are **not** exercised in CI — only their URL-validation, auth, and guard-clause behaviour are.
 - Run all tests before submitting changes: `python -m pytest tests/ -v`.
 
 ## Conventions
@@ -121,6 +148,22 @@ All code must follow secure-by-default principles. Review every change against t
 - Do not remove the loopback guard to "make it easier to test remotely". The server dispatches arbitrary registry commands; exposing it to the network is equivalent to exposing a Python REPL.
 - The server accepts a single JSON payload per connection (`recv(8192)`). Do not raise that limit without also adding a length-framed protocol.
 - `quit_server` triggers an orderly shutdown; do not add an administrative bypass that skips the loopback check.
+- Optional `shared_secret=` enforces an `AUTH <secret>\n` prefix; the comparison uses `hmac.compare_digest` (constant time). Never log the secret or the raw payload.
+
+### HTTP server
+- `HTTPActionServer` / `start_http_action_server` mirror the TCP server's posture: loopback-only by default, `allow_non_loopback=True` required to bind elsewhere, optional `shared_secret` enforced as `Authorization: Bearer <secret>` using `hmac.compare_digest`.
+- Only `POST /actions` is handled. Request body capped at 1 MB — do not raise without also switching to a streaming parser.
+- Responses are JSON. Auth failures return `401`; malformed JSON returns `400`; unknown paths return `404`.
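
Note on the two shared-secret checks described above: the server modules are not part of the visible diff, so this is a hedged sketch of what the `AUTH <secret>\n` prefix check and the `Authorization: Bearer <secret>` check might look like with `hmac.compare_digest`. The helper names `strip_tcp_auth` and `bearer_ok` are hypothetical.

```python
import hmac
from typing import Optional


def strip_tcp_auth(payload: bytes, shared_secret: str) -> Optional[bytes]:
    """Return the body after a valid AUTH line, or None if auth fails.

    Expected framing (per the notes above): b"AUTH <secret>" + newline + JSON body.
    """
    head, sep, body = payload.partition(b"\n")
    if not sep or not head.startswith(b"AUTH "):
        return None
    presented = head[len(b"AUTH "):]
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(presented, shared_secret.encode("utf-8")):
        return None
    return body


def bearer_ok(authorization: Optional[str], shared_secret: str) -> bool:
    """Validate an HTTP Authorization header of the form 'Bearer <secret>'."""
    if not authorization or not authorization.startswith("Bearer "):
        return False
    presented = authorization[len("Bearer "):]
    return hmac.compare_digest(presented.encode("utf-8"), shared_secret.encode("utf-8"))


if __name__ == "__main__":
    print(strip_tcp_auth(b'AUTH s3cret\n[["noop"]]', "s3cret"))
    print(bearer_ok("Bearer wrong", "s3cret"))
```

Both helpers return a plain failure value instead of echoing what was presented, which keeps the secret and the raw payload out of any log line built from the result.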
+
+### Path traversal
+- Any caller resolving a user-supplied path against a trusted root must go through `automation_file.local.safe_paths.safe_join` (raises `PathTraversalException`) or the `is_within` check. Never concatenate + `Path.resolve()` yourself and skip the containment check — symlinks and `..` segments bypass naive string checks.
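
Note on the containment rule above: `local/safe_paths.py` is not in the visible diff, so the sketch below only illustrates the general resolve-then-check technique the bullet describes. The function names follow the ones quoted in the notes, but the bodies are assumptions, and `PathTraversalException` here is a local stand-in for the library's class.

```python
from pathlib import Path


class PathTraversalException(Exception):
    """Local stand-in for the library's exception of the same name."""


def is_within(root, path) -> bool:
    """True if path, once fully resolved, is root itself or lives under it."""
    root_resolved = Path(root).resolve()
    path_resolved = Path(path).resolve()
    return path_resolved == root_resolved or root_resolved in path_resolved.parents


def safe_join(root, user_path) -> Path:
    """Join user_path onto root and refuse results that escape root."""
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (Path(root) / user_path).resolve()
    if not is_within(root, candidate):
        raise PathTraversalException(f"{user_path!r} escapes {str(root)!r}")
    return candidate


if __name__ == "__main__":
    print(safe_join(".", "reports/2024.txt"))
    try:
        safe_join(".", "../outside.txt")
    except PathTraversalException as err:
        print("rejected:", err)
```

An absolute `user_path` replaces the root during the join, so it fails the containment check unless it already points inside the root.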
+
+### SFTP host verification
+- `SFTPClient` uses `paramiko.RejectPolicy()` — unknown hosts are rejected, never auto-added. Callers pass `known_hosts=` explicitly or rely on `~/.ssh/known_hosts`. Do not swap in `AutoAddPolicy` for convenience.
+
+### Reliability (retry / quota)
+- `retry_on_transient` only retries the exception types passed via `retriable=(…)`. Never widen to bare `Exception` — masks logic bugs as transient failures. Always exhausts to `RetryExhaustedException` chained with `raise ... from err`.
+- `Quota(max_bytes=…, max_seconds=…)` — prefer `Quota.wraps(...)` over inline checks when guarding a whole operation. `0` disables each cap.
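
Note on the retry rule above: `core/retry.py` is not in the visible diff, so this is a minimal sketch of a decorator with the quoted parameter names. The default values and the doubling schedule are assumptions, and `RetryExhaustedException` is defined locally as a stand-in.

```python
import functools
import time


class RetryExhaustedException(Exception):
    """Local stand-in for the library's exception of the same name."""


def retry_on_transient(max_attempts=3, backoff_base=0.5, backoff_cap=8.0,
                       retriable=(ConnectionError, TimeoutError)):
    """Retry only the listed exception types, sleeping min(cap, base * 2**n)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except retriable as err:
                    if attempt == max_attempts:
                        # Chain to the last error so the root cause stays visible.
                        raise RetryExhaustedException(
                            f"{func.__name__} failed after {max_attempts} attempts"
                        ) from err
                    time.sleep(min(backoff_cap, backoff_base * 2 ** (attempt - 1)))
        return wrapper
    return decorator


if __name__ == "__main__":
    state = {"calls": 0}

    @retry_on_transient(max_attempts=3, backoff_base=0.01, retriable=(ConnectionError,))
    def flaky():
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("temporary failure")
        return "ok"

    print(flaky(), "after", state["calls"], "calls")
```

Exceptions outside `retriable` propagate on the first attempt, which is the behaviour the bullet asks for: a `ValueError` from a logic bug is never retried.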
 
 ### Google Drive
 - Credentials are stored at the caller-supplied `token_path` with `encoding="utf-8"`. Never log or print the token contents.
