diff --git a/.githooks/pre-commit b/.githooks/pre-commit
index b549fd8..8f5e69f 100644
--- a/.githooks/pre-commit
+++ b/.githooks/pre-commit
@@ -13,7 +13,7 @@ FAIL=0
 # locale/ catalogs are mechanically extracted from the impressum/privacy/terms templates above —
 # they cannot avoid carrying the same address/email strings. Treating them as public is consistent
 # with the source templates being public.
-ALLOW_RE='^(app/templates/(impressum|privacy|terms)\.html|COMMERCIAL-LICENSE\.md|docs/gdpr-account-deletion-design\.md|docs/api-usage-guide\.md|docs/self-hosting\.md|docs-internal/.*|\.githooks/.*|\.github/workflows/scope-guard\.yml|CHANGELOG\.md|locale/.*\.(po|pot|mo))$'
+ALLOW_RE='^(app/templates/(impressum|privacy|terms)\.html|COMMERCIAL-LICENSE\.md|docs/gdpr-account-deletion-design\.md|docs/api-usage-guide\.md|docs/self-hosting\.md|docs/dpa-template\.md|docs-internal/.*|\.githooks/.*|\.github/workflows/scope-guard\.yml|CHANGELOG\.md|locale/.*\.(po|pot|mo))$'
 # Personal/operational identifiers that should never land in public code.
 PATTERNS='lennart\.seidel@icloud\.com|lennart@filemorph\.io|Reetwerder|21029 Hamburg'
diff --git a/.githooks/pre-push b/.githooks/pre-push
index 9583fe8..40bd925 100644
--- a/.githooks/pre-push
+++ b/.githooks/pre-push
@@ -15,7 +15,7 @@ set -e
 ZERO=0000000000000000000000000000000000000000
 # Same patterns as pre-commit — keep in sync.
-ALLOW_RE='^(app/templates/(impressum|privacy|terms)\.html|COMMERCIAL-LICENSE\.md|docs/gdpr-account-deletion-design\.md|docs/api-usage-guide\.md|docs/self-hosting\.md|docs-internal/.*|\.githooks/.*|\.github/workflows/scope-guard\.yml|CHANGELOG\.md|locale/.*\.(po|pot|mo))$'
+ALLOW_RE='^(app/templates/(impressum|privacy|terms)\.html|COMMERCIAL-LICENSE\.md|docs/gdpr-account-deletion-design\.md|docs/api-usage-guide\.md|docs/self-hosting\.md|docs/dpa-template\.md|docs-internal/.*|\.githooks/.*|\.github/workflows/scope-guard\.yml|CHANGELOG\.md|locale/.*\.(po|pot|mo))$'
 PATTERNS='lennart\.seidel@icloud\.com|lennart@filemorph\.io|Reetwerder|21029 Hamburg'
 OPS_PATTERNS='/opt/filemorph(/|$|[[:space:]])|/var/log/filemorph|/home/deploy([[:space:]]|/)|Hetzner CX|HETZNER_HOST|HETZNER_SSH_USER|HETZNER_SSH_KEY|OPS_REPO_DISPATCH_PAT|GHCR_PAT|appleboy/ssh-action'
 SECRET_ASSIGN='(JWT_SECRET|SMTP_PASSWORD|STRIPE_SECRET_KEY|STRIPE_WEBHOOK_SECRET|DATABASE_URL|API_KEY|POSTGRES_PASSWORD|GHCR_PAT|OPS_REPO_DISPATCH_PAT|HETZNER_SSH_KEY)[[:space:]]*=[[:space:]]*[^[:space:]$]'
diff --git a/docs/dpa-template.md b/docs/dpa-template.md
new file mode 100644
index 0000000..95f0b9a
--- /dev/null
+++ b/docs/dpa-template.md
@@ -0,0 +1,206 @@
+# Data Processing Agreement (DPA) — Template
+
+**Status:** Skeleton template, finalised individually in pilot conversations.
+**Last reviewed:** 2026-05-08
+
+This document is the starting point for a Data Processing Agreement (DPA)
+under Article 28 GDPR between a FileMorph Compliance-Edition customer
+(*controller*) and the FileMorph operator (*processor*). It is published
+in the open-source repository so a procurement reviewer can read the
+substance before requesting a binding contract.
+
+The text below is **not a binding contract** as-is. The final DPA is
+drafted in the pilot conversation with each customer, with the bracketed
+placeholders filled in from the concrete deployment context (instance
+location, scope of processing, named contact, etc.). When you are ready
When you are ready +to finalise, contact `legal@filemorph.io`. + +For the public sub-processor list referenced in §6 below, see +[`docs/sub-processors.md`](sub-processors.md). + +--- + +## 1. Parties + +**Controller** (the customer): +- Legal name: `[CUSTOMER LEGAL NAME]` +- Address: `[CUSTOMER ADDRESS]` +- Authorised signatory: `[NAME, ROLE]` + +**Processor** (the FileMorph operator): +- Legal name: Lennart Seidel +- Address: Reetwerder 25b, 21029 Hamburg, Germany +- Contact: `legal@filemorph.io` + +The processor is the operator of the FileMorph Compliance-Edition +deployment named in §3 (the "Service"). For self-hosted deployments +operated entirely on the controller's own infrastructure, the controller +is also the operator and this template does not apply — there is no +processor relationship. + +## 2. Subject matter and duration + +The processor processes personal data on behalf of the controller for +the sole purpose of operating the Service. Processing begins on +`[EFFECTIVE DATE]` and continues for the term of the underlying service +agreement, ending no later than thirty (30) days after termination +(during which residual processing for deletion or export is permitted). + +## 3. Nature and purpose of processing + +The Service performs file conversion, compression, and integrity-attested +output generation for files uploaded by the controller's authorised +users. Processing operations include: + +- Receiving uploaded files via HTTPS +- Running format conversion / compression in transient memory and + ephemeral filesystem locations +- Returning the converted output and a SHA-256 integrity header +- Writing structured logs (no file content; only metadata: tier, format + pair, byte counts, duration, success flag) +- Recording audit events for actions affecting accounts or entitlements + (registration, login, key creation, deletion, billing changes) + +The Service does **not** perform any analytics, profiling, advertising, +or data sale. + +## 4. 
Categories of data subjects and personal data + +**Data subjects:** +- The controller's employees, agents, or contractors who use the + Service (the "users") +- Any natural persons whose personal data appears in files the users + upload — categories not known to the processor in advance + +**Personal data:** +- User identifiers: email address (registration), bcrypt password hash, + IP address (request logs only, rotated within 30 days), session JWT + identifiers +- File contents during processing — deleted from memory and disk + immediately after the converted output is returned (typical + retention: seconds; absolute upper bound: 10 minutes via startup + sweep, see `app/main.py`) +- Audit-event records (see §5 below) — retained per the controller's + configured retention policy + +## 5. Audit log and integrity attestation + +Every Compliance-Edition deployment writes a tamper-evident audit log +(SHA-256 hash chain, see `app/core/audit.py` and Migration 005). Each +entry contains: + +- Event type, timestamp, actor identifier, actor IP, payload digest +- Hash of the previous event (chain integrity) + +The audit log is the controller's record of processing activities under +Article 30 GDPR. The retention period defaults to `[RETENTION DAYS]` +and is configurable via the `AUDIT_RETENTION_DAYS` environment variable. + +Each converted output carries an `X-Output-SHA256` response header so +the controller can independently verify integrity. + +## 6. Sub-processors + +The processor uses the sub-processors listed in +[`docs/sub-processors.md`](sub-processors.md). The default list applies +unless the controller and processor agree in writing to a reduced +scope at finalisation. + +The processor will inform the controller of any intended additions or +replacements at least thirty (30) days in advance. 
The controller may +object on reasonable grounds; in such case the parties will negotiate +in good faith, and absent agreement either party may terminate the +service agreement. + +## 7. Technical and organisational measures (TOM) + +The processor implements the measures documented in: +- [`docs/security-overview.md`](security-overview.md) +- [`docs/threat-model.md`](threat-model.md) +- [`docs/patch-policy.md`](patch-policy.md) +- [`docs/incident-response.md`](incident-response.md) +- [`docs/release-signing.md`](release-signing.md) + +These cover: encryption in transit (TLS 1.2+, HSTS), at-rest scope (no +persistent file storage by design), access control (timing-safe API key +validation, JWT-bound roles, admin role with database recheck per +request), key management, software-supply-chain hardening (cosign-signed +images, signed Git tags, CycloneDX SBOM), and incident-response +timelines. + +A summary of the measures is appended at finalisation as +"Annex II — Technical and Organisational Measures" tailored to the +specific deployment. + +## 8. Controller's instructions and rights + +The processor processes personal data only on documented instructions +from the controller, including with regard to transfers to third +countries. The instructions are this DPA and any subsequent written +instructions from the named contact in §1. + +The controller has the right to: + +- Receive on request, in a commonly used machine-readable format, all + personal data processed on its behalf (Art. 20 GDPR) +- Audit the processor's compliance with this DPA, on reasonable notice + and at the controller's expense, no more than once per twelve months + unless an incident has been reported +- Demand erasure of all personal data after termination, except where + Union or Member-State law requires retention (notably: tax-relevant + records under HGB §257 / AO §147, ten-year period) + +## 9. 
Personal data breach notification + +If the processor becomes aware of a personal data breach affecting the +controller's data, the processor will notify the controller without +undue delay and in any case **within 72 hours** of becoming aware. The +notification will include: nature of the breach, categories and +approximate number of data subjects and records concerned, likely +consequences, measures taken or proposed. + +The processor's incident-response procedure (see +[`docs/incident-response.md`](incident-response.md)) governs the +internal handling of the breach. + +## 10. Return or deletion at end of provision + +Upon termination of the underlying service agreement, the processor +will, at the controller's choice: + +- Return all personal data in a commonly used machine-readable format + within thirty (30) days, or +- Delete all personal data and certify the deletion in writing + +Records that the processor is legally obliged to retain (tax records, +fraud-prevention records under §257 HGB / §147 AO) are retained for +the statutory period and deleted thereafter without further request. + +## 11. Liability and limitations + +Liability for breach of this DPA is governed by the underlying service +agreement. Each party is liable for its own infringements of Articles +82–84 GDPR. Joint and several liability under Art. 82(4) GDPR is not +excluded. + +## 12. Governing law + +This DPA is governed by the laws of the Federal Republic of Germany. +Place of jurisdiction is Hamburg, Germany. + +--- + +## How to finalise + +1. Review the bracketed placeholders in §1 and §2 and fill them with + the deployment context. +2. Replace `[RETENTION DAYS]` in §5 with the configured value. +3. Append "Annex II — Technical and Organisational Measures" with the + measures specific to the deployment (instance location, network + segmentation, on-call, penetration-test status). +4. 
Both parties counter-sign a printed PDF or qualified-electronic + signature; the FileMorph operator counter-signature is provided + from `legal@filemorph.io` after the pilot conversation closes. + +Send the completed draft to `legal@filemorph.io`. Turnaround is +typically two business days. diff --git a/docs/sprint-s1-technology-first.md b/docs/sprint-s1-technology-first.md deleted file mode 100644 index 0abacd2..0000000 --- a/docs/sprint-s1-technology-first.md +++ /dev/null @@ -1,66 +0,0 @@ -# Sprint S1 — Technology First (Done-List) - -*Stand: 2026-04-24* - -A wave of backend-hygiene, bandwidth-awareness, observability, and static-asset -hardening shipped under the Technology-First motto. Recorded here because these -items were never priority-ranked backlog tickets — they came out of a live tech -audit. Keep for historical context when reading the affected code. - ---- - -## Shipped - -| Tag | Commit | What | Why | -|---|---|---|---| -| S1-A | `daefe10` | Event-loop-safe encoding · gzip · DB-pool hygiene · PNG squeeze | Every synchronous C-binding call now runs in `asyncio.to_thread`; a single slow convert stops blocking every other user | -| S1-B | `78acb98` | Per-tier output cap (bandwidth amplification guard) | Small input → huge output is the one path that bypasses the upload-size quota; new `max_output_size_mb` closes it | -| S1-B.fix | `f29bc9d` | Right-size Business/Enterprise output cap to 500 MB | Earlier 2 GB cap would have OOM-killed a small-RAM server under concurrent batches | -| S1-C | `b61d29a` | Static-cache headers + smart-format warning UI | `CachingStaticFiles` class serves `/static/*` with short revalidate by default, far-future `immutable` for hashed names; UI warns when a lossy-→-lossless reconvert would balloon the file | -| S3 | `8adfc84` | `FileResponse` + `BackgroundTask` streaming | Output no longer buffered in RAM — critical for 500 MB Business uploads | -| S4-foundation | `edeb4c4` | Structured logs: tier + rejection events | PII-free 
JSON log lines (tier/format/size only); builds the base for later billing + metrics | -| S1-D | `04f4f01` | Batch UI in the web app (multi-file + ZIP download) | API had `/convert/batch` for months; UI only ever posted one file. Closed the silent capability gap | -| S1-E | `5b9b361` | `/ready` probe + `uvicorn --timeout-keep-alive 65` | Distinguishes "container alive" from "DB pool alive"; keep-alive sized above typical CDN 60 s idle so connections survive a full idle window | -| S1-F | `a03f557` | Self-hosted Tailwind — drop `cdn.tailwindcss.com`, tighten CSP | `script-src 'self'` (no CDN allowance), removed the inline-config SHA-256 hash, standalone Tailwind CLI under `.tools/` | -| scope-scrub | `303c3fa` | Scrub production-server specifics from public comments | Generic wording for RAM sizing + CDN idle timeout — keeps the public repo deployment-agnostic | -| S1.5 | `ccca957` | `API_BASE` split — heavy upload POSTs to optional separate subdomain | Lets the main site sit behind a proxy with a body-size cap while uploads bypass it via a tunnel subdomain; zero-config same-origin default keeps tests + dev unchanged | -| S1-G | `75b0c11` | Cache-busting hash on tailwind bundle | `tailwind..css` picked up by `CachingStaticFiles` regex → `Cache-Control: public, max-age=31536000, immutable`. Browsers cache forever; a rebuild rotates the filename | - -All commits landed on `main` with individual CI green; audited as one batch -(Datenschutz / Security / Frontend / Backend / Tech) before push. - ---- - -## Server-side / dashboard-only (tracked, not in the app) - -Not in this repo because they configure the deployment, not the app: - -- `uvicorn --workers 2` — RAM-budget call; needs server-side observation first. -- CDN proxy flip for the main site (DDoS / WAF / hidden origin IP). Blocked only - on the app side by S1.5 (done); flip is a DNS toggle once uploads route - through the separate subdomain. 
-- Prometheus / Grafana stack — deferred to observability sprint; `/metrics`
-  endpoint wiring is app-side but live value needs a scrape target.
-- Edge rate-limiting rule on `/api/v1/(convert|compress|morph)`. Redundant with
-  slowapi but cheap Layer-7 insurance.
-
----
-
-## Deferred to S2 (Morph + Smart-Tech)
-
-Explicitly out of scope for S1, queued as the next decision point:
-
-- `/api/v1/morph` — structural file adjustments (resize / crop / split / trim /
-  strip-EXIF / compress-to-target). "Morph > Convert" per the motto.
-- Compress-to-target-size via binary search (already scaffolded in the
-  compressor; surface as a first-class option).
-- Smart output routing — auto-pick AVIF / WebP based on `Accept:` header.
-- Size-preview before upload (client-side estimate).
-- Problem-centric preset tiles on the landing page (e.g. "shrink this to fit an
-  email attachment") instead of format-list cards.
-
----
-
-## Cross-References
-
-- Commit range: `98020dc..75b0c11` on `main`