|
| 1 | +# End-to-End Encryption (E2EE) Design — Option A Sign-off |
| 2 | + |
| 3 | +Status: Ready for sign-off |
| 4 | +Owner: Kyaw |
| 5 | + |
| 6 | +## 1. Scope and Goals |
| 7 | +- Web-first PWA with future native iOS/Android. |
| 8 | +- Protect messages, files/assets (3D/logo/media), docs, and analytics events. |
| 9 | +- Privacy-preserving analytics client-side; server-side limited to minimized metadata. |
| 10 | +- Retrofit behind feature flags; backward-compatible bridging for non-E2EE content. |
| 11 | + |
| 12 | +## 2. Threat Model |
| 13 | +- Adversaries: curious/compromised servers, external attackers (MITM), malicious clients, stolen devices. |
| 14 | +- Trust boundaries: Only clients see plaintext/keys. Servers handle ciphertext, minimal metadata, and capability tokens. IdP proves identity only. |
| 15 | +- Security goals: Confidentiality & integrity; forward secrecy and post-compromise security; deniable auth for messages; verifiable membership/state. |
| 16 | +- Out of scope initially: traffic analysis resistance; plaintext content scanning; hardware tamper resistance beyond platform enclaves. |
| 17 | + |
| 18 | +## 3. Cryptographic Primitives and Libraries |
| 19 | + |
| 20 | +### 3.1 Argon2id Parameters (desktop vs mobile) |
| 21 | +- Desktop/Web: m=64 MiB, t=3, p=1 — balances user-perceived latency with robust GPU/ASIC resistance for key wrapping. Typical derivation latency ~200–500 ms on contemporary laptops. |
| 22 | +- Mobile: m=32 MiB, t=3, p=1 — reduces memory pressure and thermal impact while keeping meaningful resistance. Target latency ~300–700 ms depending on device class. |
| 23 | +- Salt length: 16 bytes (random per vault). Output length: 32 bytes (KEK). |
| 24 | +- Rationale: Memory-hardness is the dominant cost lever; t=3 provides reasonable compute amplification without excessive energy draw on battery devices. |
| 25 | + |
| 26 | +- Ed25519 (signing) for identities and devices. |
| 27 | +- X25519 (ECDH) for key agreement; sealed boxes for key wrapping (HPKE-ready abstraction). |
| 28 | +- Messaging: Signal/Double Ratchet via libsignal-client (WASM) for 1:1/small groups. |
| 29 | +- Large groups: Signal sender keys with periodic rotation; MLS on roadmap. |
| 30 | +- Files: AES-256-GCM streaming with HKDF-derived per-chunk nonces; BLAKE3 for chunk and whole-file digests. |
| 31 | +- KDF: HKDF-SHA256; Password KDF: Argon2id (m=64 MiB, t=3, p=1; mobile fallback m=32 MiB). Salt: 16 bytes. Output: 32 bytes. |
| 32 | +- Hashing: BLAKE3 for content addressing and integrity. |
| 33 | + |
| 34 | +## 4. Identity & Device State Machines |
| 35 | +### 4.1 Identity Keys |
| 36 | +States: uninitialized -> generated -> backed_up (optional) -> compromised (revoked) |
| 37 | +Transitions: |
| 38 | +- generate: create Ed25519 identity key pair |
| 39 | +- backup: wrap private key with Argon2id-derived KEK; store vault in IndexedDB |
| 40 | +- revoke: mark identity compromised; re-enroll devices |
| 41 | + |
| 42 | +### 4.2 Device Enrollment |
| 43 | +States: new -> pending_attestation -> verified -> revoked |
| 44 | +Transitions: |
| 45 | +- new: device generates Ed25519 (sign) + X25519 (DH) |
| 46 | +- provision: QR shows {device_pubkeys, nonce}; trusted device scans and verifies SAS |
| 47 | +- attest: trusted device signs attestation binding device to identity |
| 48 | +- verify: server records attestation; device becomes verified |
| 49 | +- revoke: immediate revocation; triggers rotations |
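The enrollment states above can be sketched as a small transition table. This is a minimal illustration, not the project's implementation: the event names follow the transition list, but which state each event starts from and leads to is an assumption (in particular, `attest` is modeled as leaving the device in `pending_attestation` until the server records it).

```python
from enum import Enum


class DeviceState(Enum):
    NEW = "new"
    PENDING_ATTESTATION = "pending_attestation"
    VERIFIED = "verified"
    REVOKED = "revoked"


# (current state, event) -> next state.
# Assumed mapping of the documented transitions onto the documented states.
TRANSITIONS = {
    (DeviceState.NEW, "provision"): DeviceState.PENDING_ATTESTATION,
    (DeviceState.PENDING_ATTESTATION, "attest"): DeviceState.PENDING_ATTESTATION,
    (DeviceState.PENDING_ATTESTATION, "verify"): DeviceState.VERIFIED,
}


def advance(state: DeviceState, event: str) -> DeviceState:
    """Return the next state, or raise ValueError on an illegal transition."""
    # Revocation is allowed from any state that is not already revoked.
    if event == "revoke" and state is not DeviceState.REVOKED:
        return DeviceState.REVOKED
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state.value} on {event!r}") from None
```

Keeping the table explicit makes it easy to reject out-of-order calls, such as a `verify` on a device that was never provisioned.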
| 50 | + |
| 51 | +## 5. Messaging Sessions |
| 52 | +- 1:1 and small groups: Double Ratchet with prekeys from libsignal. |
| 53 | +- Device revocation: peers refuse messages from revoked devices. |
| 54 | +- Group sender keys: per-room sender key rotated on membership change and every 7 days; per-recipient key wraps. |
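The sender-key rotation rule (rotate on membership change and every 7 days) could be checked with something like the following. The function name and arguments are hypothetical; the sketch only covers the decision, not the key generation or the per-recipient wraps.

```python
from datetime import datetime, timedelta, timezone

ROTATION_INTERVAL = timedelta(days=7)  # from the rotation policy above


def needs_rotation(key_created_at, members_at_creation, current_members, now=None):
    """Return True if the room's sender key should be rotated.

    members_* are iterables of device ids (hypothetical representation).
    """
    now = now or datetime.now(timezone.utc)
    membership_changed = set(members_at_creation) != set(current_members)
    expired = now - key_created_at >= ROTATION_INTERVAL
    return membership_changed or expired
```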
| 55 | + |
| 56 | +## 6. File Encryption and Sharing |
| 57 | +See also Mermaid diagrams in docs/diagrams for provisioning, file-share, and rekey flows. |
| 58 | + |
| 59 | +### 5.1 Canonicalization (signatures & tokens) |
| 60 | +- Manifest signing: Canonical JSON per RFC 8785 (JSON Canonicalization Scheme, JCS). Remove the `sig` field before canonicalization; sign the result with device Ed25519. Verification recomputes canonical JSON and verifies the signature. |
| 61 | +- PASETO payload normalization: When computing or verifying detached request signatures or audit hashes, canonicalize the JSON payload using the same JCS rules and sort header fields if applicable. Avoid including transient fields (e.g., `iat` skew-adjusted values) in hash commitments. |
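As a rough illustration of the manifest signing input, the sketch below drops the `sig` field and serializes the rest with sorted keys and no insignificant whitespace. It only approximates RFC 8785: Python's `json` module matches JCS for ASCII keys, plain strings, and integers, but not for floating-point formatting or UTF-16 key ordering, so a production implementation should use a dedicated JCS library. The function name is hypothetical.

```python
import json


def canonical_bytes(manifest: dict) -> bytes:
    """Approximate JCS bytes of a manifest with its `sig` field removed."""
    unsigned = {k: v for k, v in manifest.items() if k != "sig"}
    text = json.dumps(
        unsigned,
        sort_keys=True,          # deterministic key order
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # emit UTF-8 rather than \u escapes
    )
    return text.encode("utf-8")
```

The signer and the verifier both call the same function, so the signature covers identical bytes regardless of the field order in which the manifest was received.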
| 62 | + |
| 63 | +### 6.1 Streaming Encryption |
| 64 | +- Per-file random DEK (256-bit). |
| 65 | +- Chunk size 512 KB–2 MB (adaptive). Max single-object size: 5 GB (resumable uploads). |
| 66 | +- Nonce derivation: nonce_i = HKDF(DEK, info="file-chunk" || chunk_index)[0..12] |
| 67 | +- AES-256-GCM over each chunk; produce per-chunk BLAKE3 and cumulative whole-file BLAKE3. |
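The nonce derivation above can be sketched with a standard-library HKDF-SHA256 (RFC 5869). Two details are assumptions because the design does not specify them: the chunk index is encoded as an 8-byte big-endian integer, and the HKDF salt is empty. The AES-256-GCM step itself is omitted.

```python
import hashlib
import hmac
import struct


def hkdf_sha256(ikm: bytes, info: bytes, length: int, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand."""
    if not salt:
        salt = b"\x00" * hashlib.sha256().digest_size
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def chunk_nonce(dek: bytes, chunk_index: int) -> bytes:
    """12-byte GCM nonce for one chunk: HKDF(DEK, "file-chunk" || index)[0..12]."""
    info = b"file-chunk" + struct.pack(">Q", chunk_index)  # assumed index encoding
    return hkdf_sha256(dek, info, 12)
```

Because the nonce depends only on the DEK and the chunk index, a resumed upload can recompute it for any chunk without storing per-chunk state. The same property means a DEK must never be reused for different file contents.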
| 68 | + |
| 69 | +### 6.2 Manifest Format (signed by device Ed25519) |
| 70 | +``` |
| 71 | +version: 1 |
| 72 | +algo: aes-256-gcm |
| 73 | +chunk_size: <bytes> |
| 74 | +length: <bytes> |
| 75 | +blake3_file: <hex> |
| 76 | +chunks: |
| 77 | + - index: 0 |
| 78 | + offset: 0 |
| 79 | + size: <bytes> |
| 80 | + blake3: <hex> |
| 81 | + - ... |
| 82 | +key_wraps: omitted in manifest; stored adjacent by object_id |
| 83 | +sig: ed25519_sign(signing_device_privkey, JCS(manifest_without_sig)) |
| 84 | +``` |
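To show how the `chunks` entries relate to `length` and `chunk_size`, here is a small sketch that lays out the chunk table for a file. The function name is hypothetical, and the per-chunk `blake3` digest is left out because BLAKE3 is not in the Python standard library.

```python
def plan_chunks(length: int, chunk_size: int) -> list[dict]:
    """Return index/offset/size entries covering `length` bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    offset = 0
    index = 0
    while offset < length:
        size = min(chunk_size, length - offset)  # last chunk may be short
        chunks.append({"index": index, "offset": offset, "size": size})
        offset += size
        index += 1
    return chunks
```

For example, `plan_chunks(2500, 1024)` yields three entries with sizes 1024, 1024, and 452.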
| 85 | + |
| 86 | +### 6.3 DEK Sharing |
| 87 | +- For each recipient device X25519 pubkey, create sealed box of DEK. |
| 88 | +- Store wraps: key_wraps(object_id, device_id, wrap_ciphertext) |
| 89 | +- Rekey on membership change; rewrap to active devices. |
| 90 | + |
| 91 | +## 7. Capability Tokens |
| 92 | +- Format: PASETO v4.public (Ed25519-signed by server capability key). |
| 93 | +Claims: |
| 94 | +- sub: user or device id |
| 95 | +- scope: [object:get|put, room:read, room:write, membership:manage] |
| 96 | +- resource: URI or prefix (e.g., s3://bucket/path/object-id) |
| 97 | +- exp: expiry; iat/nbf: issued-at and not-before times |
| 98 | +- region: enforced data residency region (controls bucket/prefix routing) |
| 99 | +- tid/nonce: unique token id to prevent replay |
| 100 | +- TTLs: object GET/PUT 15 minutes; room/event scopes 1 hour. |
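A server-side check of these claims might look roughly like the sketch below. It assumes the PASETO signature has already been verified, that time claims have been parsed into Unix seconds, and that `scope` is a list of strings such as `"object:get"`. All names are illustrative rather than taken from the codebase.

```python
import time


def check_capability(claims: dict, *, action: str, resource: str, region: str, now=None) -> bool:
    """Return True if already-verified claims authorize this request."""
    now = time.time() if now is None else now
    if now >= claims["exp"]:
        return False  # expired
    if now < claims.get("nbf", claims.get("iat", 0)):
        return False  # not yet valid
    if action not in claims["scope"]:
        return False  # scope does not cover the action
    # Prefix match; prefixes should end in a delimiter so that
    # ".../team-a/" cannot match ".../team-ab/...".
    if not resource.startswith(claims["resource"]):
        return False
    # Data residency: the token's region must match the serving region.
    return claims["region"] == region
```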
| 101 | + |
| 102 | +## 8. APIs (Server) |
| 103 | +- POST /devices/attest |
| 104 | +- POST /devices/revoke |
| 105 | +- POST /rooms |
| 106 | +- POST /rooms/:id/members |
| 107 | +- POST /rooms/:id/rotate |
| 108 | +- POST /capabilities |
| 109 | +- PUT /objects/:id (requires capability) |
| 110 | +- GET /objects/:id (requires capability) |
| 111 | +- WS /events |
| 112 | + |
| 113 | +## 9. Storage Schema (Postgres + S3-compatible) |
| 114 | +Residency enforcement: region claim in capabilities is validated and mapped to storage bucket/prefix; mismatches are rejected at capability validation and storage routing. |
| 115 | +- users(id, identity_pubkey_hash, oidc_sub, region) |
| 116 | +- devices(id, user_id, ed25519_pub, x25519_pub, attestation_sig, status) |
| 117 | +- rooms(id, created_by, policy) |
| 118 | +- memberships(room_id, device_id, role, since, status) |
| 119 | +- sender_keys(room_id, epoch, key_id, wrapped_keys jsonb, created_at) |
| 120 | +- objects(id, owner, room_id, bucket, path, blake3_digest, size, manifest_sig, created_at) |
| 121 | +- key_wraps(object_id, device_id, wrap_ciphertext) |
| 122 | +- audit_events(id, actor, type, target, ts, meta) |
| 123 | + |
| 124 | +## 10. Backup & Recovery |
| 125 | +SAS & QR: Use emoji SAS (7-emoji sequence from a 64-emoji table) for human-friendly verification. QR payload schema: base64url(JSON { device_pubkeys, nonce, sig, ts }). |
| 126 | +- Key vault: private keys wrapped by Argon2id-derived KEK; IndexedDB on web; Secure Enclave/Keystore on mobile. |
| 127 | +- Optional Shamir 2-of-3 recovery (user + admin escrow + HSM) with approvals and audit. |
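The emoji SAS mentioned at the top of this section (7 symbols from a 64-entry table, so 42 bits) could be derived as sketched below. The hash input and the table are assumptions: the design does not say what is hashed, and the table here is a placeholder built from the 64 code points U+1F400 to U+1F43F, not the product's actual emoji set.

```python
import hashlib

SAS_LENGTH = 7  # emoji shown to the user
# Placeholder table: 64 animal emoji, U+1F400..U+1F43F (hypothetical choice).
EMOJI_TABLE = [chr(0x1F400 + i) for i in range(64)]


def emoji_sas(transcript: bytes) -> list[str]:
    """Map a provisioning transcript to 7 emoji (6 bits each, 42 bits total)."""
    digest = hashlib.sha256(transcript).digest()
    # Take the first 48 bits of the digest and keep the top 42.
    bits = int.from_bytes(digest[:6], "big") >> 6
    return [
        EMOJI_TABLE[(bits >> (6 * (SAS_LENGTH - 1 - i))) & 0x3F]
        for i in range(SAS_LENGTH)
    ]
```

Both devices would compute the sequence over the same transcript (for example the device public keys and nonce from the QR payload) and the user compares the two displays.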
| 128 | + |
| 129 | +## 11. Metadata Minimization |
| 130 | +Audit taxonomy and retention: capture key lifecycle events, membership changes, capability issuance/revocation, and recovery attempts. Retention: 1 year (then purge or anonymize per policy). |
| 131 | +- Store hashed identity references; coarse timestamps; encrypted membership maps when feasible. |
| 132 | +- Avoid plaintext titles/tags. No plaintext in logs. |
| 133 | + |
| 134 | +## 12. Request Signing & Replay Protection |
| 135 | +- Client signs sensitive requests with device Ed25519 over canonical payload + timestamp. |
| 136 | +- Server enforces skew window and tid uniqueness. |
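The skew window and `tid` uniqueness check can be sketched as an in-memory guard. The 300-second window is an assumed value, and a real deployment would need shared storage (the class and method names are hypothetical). Signature verification over the canonical payload and timestamp is assumed to have happened before `accept` is called.

```python
import time


class ReplayGuard:
    """Reject requests outside the skew window or with a reused tid."""

    def __init__(self, max_skew_s: float = 300):  # assumed window
        self.max_skew_s = max_skew_s
        self._seen = {}  # tid -> signed request timestamp

    def accept(self, tid: str, ts: float, now=None) -> bool:
        now = time.time() if now is None else now
        if abs(now - ts) > self.max_skew_s:
            return False  # too old or too far in the future
        # Forget ids whose timestamp has aged out of the window; a replay of
        # those would already fail the skew check above.
        self._seen = {t: s for t, s in self._seen.items() if now - s <= self.max_skew_s}
        if tid in self._seen:
            return False  # replay
        self._seen[tid] = ts
        return True
```

Pruning by the signed timestamp keeps the set bounded without opening a replay gap, because the timestamp is covered by the request signature and cannot be changed by an attacker.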
| 137 | + |
| 138 | +## 13. Performance Targets |
| 139 | +- p95 decrypt < 120 ms for 10 MB on desktop. |
| 140 | +- Streaming crypto in Web Workers; backpressure-managed I/O. |
| 141 | + |
| 142 | +## 14. Rollout & Kill Switch |
| 143 | +- Feature flags per tenant/room. |
| 144 | +- Canary cohorts; schema uses sidecar tables for isolation. |
| 145 | +- Instant kill-switch disables capability issuance for E2EE objects/rooms; existing ciphertext remains intact. |
| 146 | + |
| 147 | +## 15. CI/CD and Supply Chain |
| 148 | +- Renovate/Dependabot with grouped patch/minor; majors manual. |
| 149 | +- GitHub Actions: lint/typecheck/tests, CodeQL, SCA, SBOM (Syft), container scanning. |
| 150 | +- Signed commits and releases. |
| 151 | + |
| 152 | +## 16. Test Plan (Acceptance Gates) |
| 153 | +- Unit and property tests for: keygen, provisioning, sealed box wraps, AES-GCM streaming (vectors), manifest sign/verify, PASETO claims/validation, rotation flows. |
| 154 | +- Integration: 1:1 E2EE chat, file upload/download, membership change triggers rewrap/rotation. |
| 155 | +- Data residency pinning tests; GDPR DSR exercises. |
| 156 | + |
| 157 | +## 17. Open Items / Future Work |
| 158 | +- Evaluate MLS migration path for large rooms. |
| 159 | +- HPKE support behind wrapping abstraction. |
| 160 | +- Privacy-preserving analytics with DP budget management per org. |