143 changes: 143 additions & 0 deletions DESIGN-E2EE.md
@@ -0,0 +1,143 @@
# End-to-End Encryption (E2EE) Design — Option A Sign-off

Status: Draft for review
Owner: Kyaw

## 1. Scope and Goals
- Web-first PWA with future native iOS/Android.
- Protect messages, files/assets (3D/logo/media), docs, and analytics events.
- Privacy-preserving analytics computed client-side; server-side collection limited to minimized metadata.
- Retrofit behind feature flags; backward-compatible bridging for non-E2EE content.

## 2. Threat Model
- Adversaries: curious/compromised servers, external attackers (MITM), malicious clients, stolen devices.
- Trust boundaries: Only clients see plaintext/keys. Servers handle ciphertext, minimal metadata, and capability tokens. IdP proves identity only.
- Security goals: Confidentiality & integrity; forward secrecy and post-compromise security; deniable auth for messages; verifiable membership/state.
- Out of scope initially: traffic analysis resistance; plaintext content scanning; hardware tamper beyond platform enclaves.

## 3. Cryptographic Primitives and Libraries
- Ed25519 (signing) for identities and devices.
- X25519 (ECDH) for key agreement; sealed boxes for key wrapping (HPKE-ready abstraction).
- Messaging: Signal/Double Ratchet via libsignal-client (WASM) for 1:1/small groups.
- Large groups: Signal sender keys with periodic rotation; MLS on roadmap.
- Files: AES-256-GCM streaming with HKDF-derived per-chunk nonces; BLAKE3 for chunk and whole-file digests.
- KDF: HKDF-SHA256; Password KDF: Argon2id (high-memory, salted).
- Hashing: BLAKE3 for content addressing and integrity.

## 4. Identity & Device State Machines
### 4.1 Identity Keys
States: uninitialized -> generated -> backed_up (optional) -> compromised (revoked)
Transitions:
- generate: create Ed25519 identity key pair
- backup: wrap private key with Argon2id-derived KEK; store vault in IndexedDB
- revoke: mark identity compromised; re-enroll devices

### 4.2 Device Enrollment
States: new -> pending_attestation -> verified -> revoked
Transitions:
- new: device generates Ed25519 (sign) + X25519 (DH)
- provision: QR shows {device_pubkeys, nonce}; trusted device scans and verifies SAS
- attest: trusted device signs attestation binding device to identity
- verify: server records attestation; device becomes verified
- revoke: immediate revocation; triggers rotations
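
A minimal sketch of the enrollment state machine as a transition table. The mapping of events onto state changes is an assumption (the list above names the transitions but not their source states), and `DeviceState`, `TRANSITIONS`, and `apply_transition` are hypothetical names. The `attest` step is performed by the trusted device and is treated here as a precondition of `verify` rather than a state change.

```python
from enum import Enum


class DeviceState(Enum):
    NEW = "new"
    PENDING_ATTESTATION = "pending_attestation"
    VERIFIED = "verified"
    REVOKED = "revoked"


# Assumed mapping of events to state changes; revoke is allowed from any
# state except revoked itself.
TRANSITIONS = {
    (DeviceState.NEW, "provision"): DeviceState.PENDING_ATTESTATION,
    (DeviceState.PENDING_ATTESTATION, "verify"): DeviceState.VERIFIED,
    (DeviceState.NEW, "revoke"): DeviceState.REVOKED,
    (DeviceState.PENDING_ATTESTATION, "revoke"): DeviceState.REVOKED,
    (DeviceState.VERIFIED, "revoke"): DeviceState.REVOKED,
}


def apply_transition(state: DeviceState, event: str) -> DeviceState:
    """Return the next state, or raise ValueError if the event is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} from {state.value}") from None
```

Keeping the table explicit makes it easy to confirm that `revoked` is terminal: no entry has it as a source state.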

## 5. Messaging Sessions
- 1:1 and small groups: Double Ratchet with prekeys from libsignal.
- Device revocation: peers refuse messages from revoked devices.
- Group sender keys: per-room sender key rotated on membership change and periodically; per-recipient key wraps.

## 6. File Encryption and Sharing
### 6.1 Streaming Encryption
- Per-file random DEK (256-bit).
- Chunk size 512KB–2MB (adaptive).
- Nonce derivation: nonce_i = HKDF(DEK, info="file-chunk" || chunk_index)[0..12]
- AES-256-GCM over each chunk; produce per-chunk BLAKE3 and cumulative whole-file BLAKE3.
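
A minimal sketch of the per-chunk nonce derivation, using only the Python standard library. The formula above does not fix the HKDF salt or the byte encoding of `chunk_index`, so the empty salt and the 8-byte big-endian index used here are assumptions, as are the helper names `hkdf_sha256` and `chunk_nonce`.

```python
import hashlib
import hmac
import struct


def hkdf_sha256(ikm: bytes, info: bytes, length: int, salt: bytes = b"") -> bytes:
    """HKDF-SHA256 (RFC 5869): extract, then expand to `length` bytes."""
    if length > 255 * 32:
        raise ValueError("length too large for HKDF-SHA256")
    # RFC 5869: an absent salt is treated as HashLen zero bytes.
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def chunk_nonce(dek: bytes, chunk_index: int) -> bytes:
    """Derive the 96-bit (12-byte) AES-GCM nonce for one chunk."""
    if len(dek) != 32:
        raise ValueError("DEK must be 256 bits")
    # Assumption: the chunk index is encoded as 8 bytes, big-endian.
    info = b"file-chunk" + struct.pack(">Q", chunk_index)
    return hkdf_sha256(dek, info, 12)
```

Because the nonce is a deterministic function of the DEK and the index, the scheme relies on each DEK encrypting exactly one file and each index being used once under it.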

Comment on lines +66 to +68

⚠️ Potential issue

Clarify and harden AES-GCM nonce derivation; bind AAD to prevent misuse

The current nonce formula is ambiguous and risks incorrect implementations. For AES-GCM, nonce uniqueness per DEK is critical. Also, include AAD to bind chunk context to the ciphertext.

Apply this diff to specify an unambiguous, misuse-resistant construction and fix the markdown “reversed link” warning:

- - Nonce derivation: nonce_i = HKDF(DEK, info="file-chunk" || chunk_index)[0..12]
- - AES-256-GCM over each chunk; produce per-chunk BLAKE3 and cumulative whole-file BLAKE3.
+ - Nonce derivation (96-bit): Let PRK = HKDF-Extract(salt=blake3_file, IKM=DEK). For each chunk index i:
+   nonce_i = HKDF-Expand(PRK, info = "nonce:file-chunk:v1" || LE64(i), L = 12)  # first 12 bytes only (0..11)
+   Notes:
+   - Use LE64(i) as an 8-byte little-endian value. i must be < 2^32 in practice.
+   - "||" denotes byte concatenation.
+ - AES-256-GCM over each chunk with AAD = concat("e2ee-file:v1", version, chunk_index, object_id, algo, chunk_size)
+   to bind context and detect cross-context replays.
+ - Produce per-chunk BLAKE3 and cumulative whole-file BLAKE3 (for content addressing; not relied upon for AEAD integrity).
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

53-53: Reversed link syntax
(DEK, info="file-chunk" || chunk_index)[0..12]

(MD011, no-reversed-links)

🤖 Prompt for AI Agents
In DESIGN-E2EE.md around lines 53–55, the nonce derivation and AAD are ambiguous
and the markdown has a reversed-link warning; change the spec to derive a 96-bit
(12-byte) GCM nonce by HKDF-Expand using the file DEK as input keying material,
an explicit salt (or zero salt if none), and an info string formed
deterministically (e.g. "file-chunk-v1" || chunk_index || 0x01) and take the
first 12 bytes as the nonce to guarantee uniqueness per DEK+chunk; require
AES-256-GCM to use Additional Authenticated Data that binds chunk_index and a
file identifier (and version/algorithm tag) to the ciphertext (e.g. AAD =
"file-chunk-v1" || file_id || chunk_index) so chunk context is authenticated;
update the text to specify HKDF parameters (hash, salt behavior, info
composition), the exact nonce length (12 bytes), and that implementations must
fail if nonce collisions would occur; and fix the reversed-markdown link by
swapping the link text and URL into the canonical [text](url) form.
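
One way to keep the AAD unambiguous is to length-prefix each field instead of concatenating raw bytes, so that two different field combinations can never serialize to the same byte string. The sketch below is illustrative only: the field order follows the AAD suggested in this comment, while the 4-byte length prefix, the integer widths, and the names `_field` and `chunk_aad` are assumptions.

```python
import struct


def _field(value: bytes) -> bytes:
    """Length-prefix one field so that field boundaries are unambiguous."""
    return struct.pack(">I", len(value)) + value


def chunk_aad(object_id: str, version: int, algo: str,
              chunk_size: int, chunk_index: int) -> bytes:
    """Build the AAD for one chunk from its context fields."""
    return b"".join([
        _field(b"e2ee-file:v1"),
        _field(struct.pack(">I", version)),
        _field(struct.pack(">Q", chunk_index)),
        _field(object_id.encode("utf-8")),
        _field(algo.encode("ascii")),
        _field(struct.pack(">Q", chunk_size)),
    ])
```

Both the encrypting and the decrypting side would need to build the AAD from the manifest fields in exactly this way, since any mismatch makes AES-GCM authentication fail.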

### 6.2 Manifest Format (signed by device Ed25519)
```
version: 1
algo: aes-256-gcm
chunk_size: <bytes>
length: <bytes>
blake3_file: <hex>
chunks:
- index: 0
offset: 0
size: <bytes>
blake3: <hex>
- ...
key_wraps: omitted in manifest; stored adjacent by object_id
sig: ed25519(signing_device_pubkey, canonical_json(manifest_without_sig))
```
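
A minimal sketch of two manifest steps: laying out the chunk entries and producing deterministic bytes to sign. It assumes a fixed `chunk_size` per file and omits the BLAKE3 digests and the Ed25519 signature itself, neither of which is in the Python standard library. The serialization uses sorted keys and no whitespace, which is deterministic for simple values but is not a full canonical JSON standard; `chunk_entries` and `signing_input` are hypothetical names.

```python
import json


def chunk_entries(length: int, chunk_size: int) -> list:
    """List index, offset, and size for each chunk of a `length`-byte file."""
    if chunk_size <= 0 or length < 0:
        raise ValueError("chunk_size must be positive and length non-negative")
    entries = []
    for index, offset in enumerate(range(0, length, chunk_size)):
        entries.append({
            "index": index,
            "offset": offset,
            # The last chunk may be shorter than chunk_size.
            "size": min(chunk_size, length - offset),
        })
    return entries


def signing_input(manifest: dict) -> bytes:
    """Serialize the manifest without its `sig` field into deterministic bytes."""
    unsigned = {k: v for k, v in manifest.items() if k != "sig"}
    # Assumption: sorted keys, compact separators, UTF-8 output.
    return json.dumps(unsigned, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")
```

The verifier would rebuild the same bytes from the received manifest and check the signature over them, so signer and verifier must agree on the serialization rules exactly.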
Comment on lines +56 to +84

🛠️ Refactor suggestion

Manifest: add language tag, key id (kid), and a clear canonicalization standard

Specify YAML for linting, include a key identifier to locate the verifying key, and name the canonicalization algorithm to avoid ambiguity.

Apply this diff:

-```
+```yaml
 version: 1
+kid: <signing_device_key_id>  # stable identifier or fingerprint of the Ed25519 signing key
 algo: aes-256-gcm
 chunk_size: <bytes>
 length: <bytes>
 blake3_file: <hex>
 chunks:
   - index: 0
     offset: 0
     size: <bytes>
     blake3: <hex>
   - ...
 key_wraps: omitted in manifest; stored adjacent by object_id
-sig: ed25519(signing_device_pubkey, canonical_json(manifest_without_sig))
+sig: ed25519(signing_device_pubkey, JCS(manifest_without_sig))  # JCS = RFC 8785 canonical JSON

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.17.2)</summary>

57-57: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

In DESIGN-E2EE.md around lines 56 to 71, update the manifest format to include a
key identifier and an explicit canonicalization algorithm: add a "kid:
<signing_device_key_id>" field under version, and change the sig line to specify
JCS (RFC 8785) canonical JSON (e.g. sig: ed25519(signing_device_pubkey,
JCS(manifest_without_sig))); also mark the example as YAML for linting/clarity.
Ensure the new kid field is documented as a stable identifier or fingerprint of
the Ed25519 signing key and that the canonicalization change replaces the vague
"canonical_json" wording with "JCS(manifest_without_sig)".


</details>

<!-- This is an auto-generated comment by CodeRabbit -->


### 6.3 DEK Sharing
- For each recipient device X25519 pubkey, create sealed box of DEK.
- Store wraps: key_wraps(object_id, device_id, wrap_ciphertext)
- Rekey on membership change; rewrap to active devices.

Comment on lines +86 to +90

🛠️ Refactor suggestion

Bind DEK wraps to object and recipient; prefer HPKE-ready structure over bare sealed boxes

Sealed boxes don’t carry AAD; pairing a ciphertext with a device_id in the DB without binding risks substitution by a malicious server. Define a wrap payload that includes object_id, alg, and recipient key id, and migrate to HPKE where AAD is first-class.

Apply this diff:

-### 6.3 DEK Sharing
-- For each recipient device X25519 pubkey, create sealed box of DEK.
-- Store wraps: key_wraps(object_id, device_id, wrap_ciphertext)
-- Rekey on membership change; rewrap to active devices.
+### 6.3 DEK Sharing
+- For each recipient device (recipient_kid, X25519 pubkey), create a wrap over:
+  wrap_payload = CBOR({
+    "v": 1,
+    "object_id": <object_id>,
+    "alg": "aes-256-gcm",
+    "dek": <32 bytes>,
+    "recipient_kid": <recipient_kid>
+  })
+  Prefer HPKE(KEM=X25519, KDF=HKDF-SHA256, AEAD=ChaCha20-Poly1305 or AES-GCM) so that AAD = object_id || "dek-wrap:v1"
+  is authenticated. If using libsodium sealed boxes as a stopgap, include the above fields inside the sealed payload.
+- Store wraps: key_wraps(object_id, recipient_kid, wrap_ciphertext, alg="hpke-v1")
+- Rekey on membership change; rewrap to active devices only.
🤖 Prompt for AI Agents
In DESIGN-E2EE.md around lines 73 to 77, the DEK wrap scheme currently suggests
storing bare sealed-box ciphertexts keyed only by device_id, which allows
server-side substitution and lacks AAD; update the design to bind each wrap to
both object and recipient by defining a wrap payload structure that contains
object_id, algorithm identifier (alg), and recipient_key_id and to replace
sealed boxes with an HPKE-ready envelope that uses AAD; specify that stored rows
(key_wraps) must include object_id, device_id, recipient_key_id, alg, and
wrap_ciphertext (HPKE-encrypted blob) and that unwrapping must verify AAD
matches the object_id/recipient_key_id/alg before accepting the DEK; update
rekey/migration notes to rewrap existing sealed-box entries into the new HPKE
payload format and validate integrity on migration.
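
After a wrap has been decrypted, the recipient can check the binding before trusting the DEK. A minimal sketch, assuming the payload has already been decrypted and decoded into a dict with the field names from the `wrap_payload` suggested in this comment; `accept_unwrapped_dek` is a hypothetical helper name, and the decryption step is out of scope here.

```python
def accept_unwrapped_dek(payload: dict, object_id: str, recipient_kid: str) -> bytes:
    """Return the DEK only if the decrypted payload is bound to this context."""
    if payload.get("v") != 1:
        raise ValueError("unsupported wrap payload version")
    if payload.get("alg") != "aes-256-gcm":
        raise ValueError("unexpected content algorithm")
    # A server that swaps wrap rows between objects or devices is caught here.
    if payload.get("object_id") != object_id:
        raise ValueError("wrap is bound to a different object")
    if payload.get("recipient_kid") != recipient_kid:
        raise ValueError("wrap is bound to a different recipient key")
    dek = payload.get("dek")
    if not isinstance(dek, bytes) or len(dek) != 32:
        raise ValueError("DEK must be 32 bytes")
    return dek
```

The expected `object_id` and `recipient_kid` should come from what the client requested and from its own key store, not from the row the server returned.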

## 7. Capability Tokens
- Format: PASETO v4.public (Ed25519-signed by server capability key).
Claims:
- sub: user or device id
- scope: [object:get|put, room:read, room:write, membership:manage]
- resource: URI or prefix (e.g., s3://bucket/path/object-id)
- exp: expiry; iat/nbf: issued-at and not-before times
- region: data residency constraint
- tid/nonce: unique token id to prevent replay
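
A minimal sketch of the authorization checks a server might run on these claims once the token signature has been verified. It assumes that `exp` and `nbf` have already been parsed into Unix seconds, that `scope` is a list of strings such as `object:get`, and that `resource` is matched as a prefix; `check_capability` is a hypothetical name, and signature verification and `tid` replay tracking are not shown.

```python
import time
from typing import Optional


def check_capability(claims: dict, action: str, resource: str,
                     now: Optional[float] = None) -> None:
    """Raise PermissionError unless the verified claims allow the action."""
    now = time.time() if now is None else now
    if now >= claims["exp"]:
        raise PermissionError("token expired")
    # Fall back to iat when nbf is absent (an assumption of this sketch).
    if now < claims.get("nbf", claims.get("iat", 0)):
        raise PermissionError("token not yet valid")
    if action not in claims["scope"]:
        raise PermissionError(f"scope does not include {action}")
    # Prefix grants should end in a delimiter, otherwise "path" also
    # matches "path-other".
    if not resource.startswith(claims["resource"]):
        raise PermissionError("resource outside the granted prefix")
```

For example, a token with `scope: ["object:get"]` and `resource: "s3://bucket/path/"` would permit a GET of `s3://bucket/path/object-id` but not a PUT, and not any object under a different prefix.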

## 8. APIs (Server)
- POST /devices/attest
- POST /devices/revoke
- POST /rooms
- POST /rooms/:id/members
- POST /rooms/:id/rotate
- POST /capabilities
- PUT /objects/:id (requires capability)
- GET /objects/:id (requires capability)
- WS /events

## 9. Storage Schema (Postgres + S3-compatible)
- users(id, identity_pubkey_hash, oidc_sub, region)
- devices(id, user_id, ed25519_pub, x25519_pub, attestation_sig, status)
- rooms(id, created_by, policy)
- memberships(room_id, device_id, role, since, status)
- sender_keys(room_id, epoch, key_id, wrapped_keys jsonb, created_at)
- objects(id, owner, room_id, bucket, path, blake3_digest, size, manifest_sig, created_at)
- key_wraps(object_id, device_id, wrap_ciphertext)
- audit_events(id, actor, type, target, ts, meta)

## 10. Backup & Recovery
- Key vault: private keys wrapped by Argon2id-derived KEK; IndexedDB on web; Secure Enclave/Keystore on mobile.
- Optional Shamir 2-of-3 recovery (user + admin escrow + HSM) with approvals and audit.

## 11. Metadata Minimization
- Store hashed identity references; coarse timestamps; encrypted membership maps when feasible.
- Avoid plaintext titles/tags. No plaintext in logs.

## 12. Request Signing & Replay Protection
- Client signs sensitive requests with device Ed25519 over canonical payload + timestamp.
- Server enforces skew window and tid uniqueness.
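
A minimal sketch of the server-side skew and `tid` checks, assuming the request signature has already been verified. The 120-second skew window and 600-second `tid` retention are illustrative defaults, not values fixed by the text above, and `ReplayGuard` is a hypothetical in-memory class; a multi-instance deployment would need a shared store instead.

```python
import time
from typing import Dict, Optional


class ReplayGuard:
    """Reject requests that are stale or that reuse a token id (tid)."""

    def __init__(self, skew_seconds: float = 120.0, tid_ttl_seconds: float = 600.0):
        self.skew = skew_seconds
        self.ttl = tid_ttl_seconds
        self._seen: Dict[str, float] = {}  # tid -> time after which it is forgotten

    def check(self, tid: str, timestamp: float, now: Optional[float] = None) -> bool:
        """Return True and record the tid if the request is fresh and unseen."""
        now = time.time() if now is None else now
        # Forget tids whose retention period has elapsed.
        self._seen = {t: exp for t, exp in self._seen.items() if exp > now}
        if abs(now - timestamp) > self.skew:
            return False
        if tid in self._seen:
            return False
        self._seen[tid] = now + self.ttl
        return True
```

The retention period should be at least twice the skew window: a request stamped at time T can be accepted anywhere in [T - skew, T + skew], so its `tid` must still be remembered at the end of that interval.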

Comment on lines +134 to +137

🛠️ Refactor suggestion

Request signing: specify canonicalization, covered fields, and replay window

Define canonical JSON (e.g., JCS), exact covered headers/query params, and the replay/skew windows to ensure interoperability and security.

Apply this diff:

-## 12. Request Signing & Replay Protection
-- Client signs sensitive requests with device Ed25519 over canonical payload + timestamp.
-- Server enforces skew window and tid uniqueness.
+## 12. Request Signing & Replay Protection
+- Canonicalization: JCS (RFC 8785) over a payload that includes method, path, sorted query, selected headers (host, content-type, content-length), body hash (BLAKE3), timestamp, and tid.
+- Signature: Ed25519 device key over sign_payload; header: X-Signature: base64(sig), X-Signature-KID: <device_kid>.
+- Server enforces clock skew window ≤ 2 minutes and tid uniqueness for 10 minutes (per tenant).
+- GET/HEAD requests sign an empty body hash; all signed fields must be verified before processing.

## 13. Performance Targets
- p95 decrypt < 120 ms for 10 MB on desktop.
- Streaming crypto in Web Workers; backpressure-managed I/O.

## 14. Rollout & Kill Switch
- Feature flags per tenant/room.
- Canary cohorts; schema uses sidecar tables for isolation.
- Instant kill-switch disables capability issuance for E2EE objects/rooms; existing ciphertext remains intact.

## 15. CI/CD and Supply Chain
- Renovate/Dependabot with grouped patch/minor; majors manual.
- GitHub Actions: lint/typecheck/tests, CodeQL, SCA, SBOM (Syft), container scanning.
- Signed commits and releases.

## 16. Test Plan (Acceptance Gates)
- Unit and property tests for: keygen, provisioning, sealed box wraps, AES-GCM streaming (vectors), manifest sign/verify, PASETO claims/validation, rotation flows.
- Integration: 1:1 E2EE chat, file upload/download, membership change triggers rewrap/rotation.
- Data residency pinning tests; GDPR DSR exercises.

## 17. Open Items / Future Work
- Evaluate MLS migration path for large rooms.
- HPKE support behind wrapping abstraction.
- Privacy-preserving analytics with DP budget management per org.