From 6d3c3bd564c909ac6141e08422551b1cb4bd3bec Mon Sep 17 00:00:00 2001
From: Anton Shepilov
Date: Wed, 28 Jan 2026 16:23:13 +0100
Subject: [PATCH 1/2] docs: ADR for uploading large files

---
 docs/adr/002-large-file-upload-chunking.md | 294 +++++++++++++++++++++
 1 file changed, 294 insertions(+)
 create mode 100644 docs/adr/002-large-file-upload-chunking.md

diff --git a/docs/adr/002-large-file-upload-chunking.md b/docs/adr/002-large-file-upload-chunking.md
new file mode 100644
index 00000000000..4a9dcfca2df
--- /dev/null
+++ b/docs/adr/002-large-file-upload-chunking.md
@@ -0,0 +1,294 @@
+# ADR: Large File Upload Support with Storage-Level Chunking
+
+## Status
+
+Proposed
+
+## Context
+
+Twake Workplace (Cozy-Stack) lets users upload files via a single HTTP request; the files are stored in OpenStack Swift or on the local filesystem (Afero).
+Currently, there are practical limitations:
+
+1. Swift has a ~5GB single object limit
+2. Very large file uploads can stress server resources
+3. Long uploads are more prone to network failures
+4. We plan to add S3 API support, which has similar chunking needs (multipart uploads)
+
+### Current Architecture
+
+The VFS (Virtual File System) abstraction layer supports multiple backends:
+- **VFSSwift** (`model/vfs/vfsswift/impl_v3.go`): OpenStack Swift storage
+- **VFSAfero** (`model/vfs/vfsafero/impl.go`): Local filesystem storage
+
+Both implement the same `vfs.VFS` interface, keeping the storage implementation transparent to consumers.
+
+### Problem Statement
+
+Users cannot upload files larger than 5GB to Swift storage. We need a solution that:
+- Supports files larger than 5GB
+- Works with the existing single-request upload API
+- Is extensible to local VFS and future S3 support
+- Minimizes changes to CouchDB schema and existing code
+
+## Proposal
+
+### Storage-Level Chunking
+
+Implement chunking at the storage backend level, keeping it transparent to the HTTP API layer.
Each storage backend handles chunking internally using its native large object support:
+
+**For Swift:** Use Static Large Objects (SLO)
+- Swift automatically manages segments and manifests
+- Downloads are transparently reassembled by Swift
+- No CouchDB schema changes needed
+
+**For Local VFS (Afero):** Already handles large files natively (no chunking needed)
+
+**For Future S3:** Use S3 Multipart Upload API
+- S3/MinIO provide native multipart upload support similar to Swift SLO (streaming parts directly to disk without buffering entire files)
+- Same transparent approach can be used
+- Observe S3 protocol limits: a single object maxes out at 5TB, uploads can have at most 10,000 parts, and each part must be between 5MB and 5GB (last part can be smaller)
+- Pick part sizes small enough (and configurable) so the 10,000-part limit still covers the largest supported file; anything larger than 5TB must be chunked at the application level because the S3 API does not accept larger objects
+- **Note on checksums:** Like Swift SLO, S3 multipart ETags are not MD5 hashes of the content (they're a hash of part ETags). Application-side MD5 computation will be required for S3 multipart uploads, using the same pattern as Swift SLO
+
+### Implementation Details
+
+#### 1. Configuration (Swift-Specific)
+
+Add Swift-specific configuration under `fs.swift`:
+
+```yaml
+fs:
+  url: swift://...
+  swift:
+    # Size of each segment for SLO uploads (default: 4GB)
+    segment_size: 4294967296
+    # Files larger than this use SLO (default: same as segment_size)
+    # Set to 0 to always use SLO
+    slo_threshold: 4294967296
+```
+
+**Test override**: Tests will be able to set a tiny segment size (e.g., 1KB) via config overrides to exercise the SLO code path without uploading gigabytes.
+
+#### 2. Swift VFS Changes (`model/vfs/vfsswift/impl_v3.go`)
+
+**CreateFile method modifications:**
+
+The `CreateFile` method will be extended to:
+1. Read segment size and SLO threshold from configuration
+2. 
Determine whether to use SLO based on file size (files larger than the threshold) or streaming mode (unknown size, indicated by a negative `ByteSize`)
+3. For SLO uploads: use Swift's `StaticLargeObjectCreateFile` API with the configured chunk size, letting Swift generate collision-free segment prefixes automatically
+4. Return a new `swiftLargeFileCreationV3` struct that wraps the SLO writer and maintains its own MD5 hasher
+5. For regular uploads: continue using the existing `ObjectCreate` path unchanged
+
+**Quota enforcement for streaming uploads (unknown size):**
+
+When `ByteSize < 0` (streaming/chunked transfer encoding), the total size is unknown upfront. Quota enforcement will work as follows:
+1. Before upload: check that the instance has *some* available quota (reject if the quota is already exceeded)
+2. During upload: the `swiftLargeFileCreationV3.Write()` method will track cumulative bytes written
+3. On each write: compare cumulative bytes against the instance quota; if exceeded, call `Abort()` to clean up segments and return a quota-exceeded error
+4. The existing `vfs.CheckAvailableDiskSpace` check runs at file creation time; for streaming uploads, we add a runtime check that aborts mid-stream if the limit is hit
+5. This mirrors the existing behavior for regular uploads where Swift/the VFS layer rejects writes that exceed quota
+
+#### 3. MD5/Checksum Handling
+
+**Important**: SLO manifests don't return a single MD5 hash like regular objects. The manifest's ETag is a hash of the segment ETags, not the content.
+
+We will implement application-side MD5 computation:
+
+1. Create a new `swiftLargeFileCreationV3` struct that holds an MD5 hasher alongside the Swift file writer
+2. On each `Write()` call, update the MD5 hash before passing data to Swift
+3. On `Close()`, finalize the MD5 hash and store it in `newdoc.MD5Sum` (ignoring Swift's manifest ETag)
+4. 
Proceed with the normal close logic (CouchDB update, versioning) using our computed hash
+
+This ensures:
+- Antivirus scanning works (relies on MD5Sum)
+- File versioning works (compares MD5Sum to detect changes)
+- File integrity validation works
+
+#### 4. Failure Handling and Cleanup
+
+**Failure Scenarios:**
+
+1. **Client disconnects mid-upload**: The `swiftLargeFileCreationV3.Close()` is never called; segments remain orphaned
+2. **Server crash**: Same as above - partially written segments exist without a manifest
+3. **Write error mid-stream**: Error returned from `Write()` or `Close()`, segments may exist
+
+**Cleanup Strategy:**
+
+**Best-effort cleanup on error:**
+
+The `swiftLargeFileCreationV3` struct will implement cleanup behavior:
+- On `Close()` error: attempt to delete any written segments using `LargeObjectDelete`, then return the original error
+- New `Abort()` method: close the underlying writer and delete any segments that were written; called on context cancellation or explicit abort
+
+**Periodic garbage collection of orphaned segments:**
+
+Orphaned segments can accumulate from crashes or network failures. We will add a worker job (`worker/gc/slo_segments.go`) that:
+1. Lists all segment prefixes (objects matching `*_segments/*` pattern)
+2. For each segment prefix, checks if the parent manifest exists
+3. If the manifest is missing AND segments are older than a configurable threshold (default: 24h), deletes the orphaned segments
+4. Logs all deletions for audit trail
+
+**Configuration:**
+```yaml
+fs:
+  swift:
+    # ... existing config ...
+    # Age threshold for orphan cleanup (default: 24h)
+    orphan_segment_max_age: 24h
+```
+
+**Triggering cleanup:**
+- On upload error: immediate best-effort delete
+- On server startup: schedule GC job
+- Periodically: run GC worker (configurable interval)
+- Manual: `cozy-stack swift gc-segments` CLI command
+
+#### 5. 
Operations That Need SLO Awareness
+
+The following methods currently use `ObjectDelete` or `ObjectCopy` and need updates:
+
+| Method | Current | Change Needed |
+|--------|---------|---------------|
+| `destroyFileLocked` | `ObjectDelete` | Use `LargeObjectDelete` with fallback |
+| `cleanOldVersion` | `ObjectDelete` | Use `LargeObjectDelete` with fallback |
+| `EnsureErased` | `BulkDelete` / `ObjectDelete` | Use `LargeObjectDelete` for each |
+| `CopyFile` | `ObjectCopy` | Copy manifest + segments or re-upload |
+| `DissociateFile` | `ObjectCopy` + `ObjectDelete` | Handle SLO copy and delete |
+| `CopyFileFromOtherFS` | `ObjectCopy` | Handle SLO source objects |
+
+**Deletion pattern:**
+
+We will implement a `deleteObject` helper method that:
+1. First attempts `LargeObjectDelete` (which handles both SLO manifests with their segments and regular objects)
+2. If Swift returns `NotLargeObject` error, falls back to regular `ObjectDelete`
+3. This unified approach ensures both SLO and regular objects are deleted correctly without needing to check the object type first
+
+**Copy pattern** (for SLO objects):
+
+Swift doesn't support copying SLO manifests directly. Available options:
+1. **Copy manifest content and update segment references** - Complex: requires parsing manifest JSON, copying each segment individually, updating references
+2. **Download and re-upload** - Simple but slow: streams entire file through the server
+3. **Copy segments individually then create new manifest** - Medium complexity: server-side segment copy + new manifest creation
+
+**Chosen approach: Segment copy with new manifest (Option 3)**
+
+Rationale:
+- Avoids streaming large files through the server (unlike Option 2)
+- Keeps data server-side within Swift (efficient for same-region copies)
+- Acceptable complexity since copy/dissociate of very large files is rare
+
+Recommendation: For `CopyFile`/`DissociateFile`, detect if source is SLO and handle appropriately.
For the initial implementation, fall back to download/re-upload (Option 2) for SLO objects (a rare case, since it only affects very large files).
+
+### How Swift SLO Works
+
+```
+Upload 10GB file with 4GB segments:
+├── {container}/{objName}_segments/1234567890.123456/00000000 (4GB)
+├── {container}/{objName}_segments/1234567890.123456/00000001 (4GB)
+├── {container}/{objName}_segments/1234567890.123456/00000002 (2GB)
+└── {container}/{objName} (manifest JSON listing segments)
+
+The segment prefix includes a timestamp to avoid collisions.
+
+Download:
+Client requests {objName} → Swift reads manifest → Streams segments in order
+```
+
+## Alternatives
+
+### Alternative A: UI-Initiated Chunked Uploads
+
+**Description:** Client sends multiple HTTP requests, each containing a chunk of the file. Server assembles chunks after all are received.
+
+**Pros:**
+- Client can resume uploads after failure
+- Better progress tracking per chunk
+- Works with any storage backend uniformly
+
+**Cons:**
+- Requires new HTTP API endpoints (`POST /files/chunks/start`, `PUT /files/chunks/{id}`, `POST /files/chunks/{id}/complete`)
+- Significant UI/client changes required
+- Server must track upload sessions and handle cleanup of incomplete uploads
+- Adds complexity to CouchDB (need to track chunk metadata)
+- More HTTP round trips
+- State management for partial uploads
+
+**Implementation complexity:** High
+
+### Alternative B: Unified Chunking Layer in VFS
+
+**Description:** Add a chunking abstraction in the VFS interface that all backends implement, with chunk metadata stored in CouchDB.
+
+**Pros:**
+- Consistent behavior across all storage backends
+- Full control over chunk management
+- CouchDB knows about file structure
+
+**Cons:**
+- CouchDB schema changes required (new `chunks` field in FileDoc)
+- Must implement custom chunk assembly for downloads
+- Duplicates functionality that Swift/S3 provide natively
+- Increases complexity in all VFS operations
+- Migration needed for existing files
+
+**Implementation complexity:** High
+
+## Decision
+
+**Recommended approach: Storage-Level Chunking**
+
+Rationale for recommendation:
+1. **Minimal changes**: Only affects Swift VFS, no API or CouchDB changes
+2. **Uses native features**: Swift SLO is battle-tested and efficient
+3. **Transparent**: Existing code (downloads, file operations) works unchanged
+4. **Extensible**: Same pattern applies to S3 multipart uploads
+5. **No UI changes**: Works with existing single-request upload API
+6. **Afero compatibility**: Local VFS already handles large files, no changes needed
+
+## Consequences
+
+### Positive
+- Files larger than 5GB can be uploaded to Swift
+- Downloads work transparently (Swift handles segment assembly)
+- No CouchDB schema changes
+- No HTTP API changes
+- Easy to extend to S3 when needed
+- Minimal code surface area to maintain
+- Works with streaming uploads (unknown Content-Length)
+
+### Negative
+- Swift-specific implementation (though same pattern works for S3)
+- MD5 must be computed application-side for SLO uploads
+- Copy operations for SLO files are more complex
+- Segments use additional storage namespace (though transparent to users)
+
+### Neutral
+- Configuration options are Swift-specific (`fs.swift.segment_size`)
+- Deletion is slightly more complex (but library handles it)
+
+### Security Considerations
+- Switching to Swift SLO does not introduce new HTTP endpoints or long-lived upload sessions, so the DoS surface stays effectively the same as today's single-request uploads.
+- Segment creation remains gated by the existing per-instance quota checks in `vfs.CheckAvailableDiskSpace`, so a malicious client cannot exceed its quota by partially uploading SLO files.
+- **Streaming upload quota enforcement:** For uploads with unknown size (`ByteSize < 0`), the `swiftLargeFileCreationV3` writer tracks cumulative bytes and aborts the upload if quota is exceeded mid-stream. This prevents quota bypass via chunked transfer encoding.
+- Large uploads remain inherently expensive (bandwidth/CPU). Existing protections (rate limiting in `pkg/limits`, request timeouts, and monitoring for long-running uploads) should continue to be enforced. No additional attack vectors are introduced by SLO itself.
+
+## Testing Considerations
+
+### Unit Tests
+- Override `config.Fs.Swift.SegmentSize` to 1KB in tests to exercise SLO code path
+- Test files smaller than threshold (regular upload)
+- Test files larger than threshold (SLO upload)
+- Test streaming upload with unknown size (should use SLO)
+- Test MD5 computation matches expected value
+
+### Integration Tests
+- Test file deletion (verify segments are cleaned up)
+- Test downloads of SLO files
+- Test file versioning with SLO files
+- Test copy/dissociate operations with SLO files
+
+### Test Configuration Example
+
+Tests will configure tiny segment sizes (e.g., 1KB segments, 512-byte threshold) to exercise the SLO code path with small test files.
+For example, a 2KB test file would use 2 segments, allowing verification of upload, download, and delete operations without requiring gigabytes of test data.
From 8eab27e0afd17d77be6d3af39dff0afb643d21f1 Mon Sep 17 00:00:00 2001
From: Anton Shepilov
Date: Fri, 30 Jan 2026 11:53:18 +0100
Subject: [PATCH 2/2] docs: Added a DLO section to ADR

---
 docs/adr/002-large-file-upload-chunking.md | 32 ++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/docs/adr/002-large-file-upload-chunking.md b/docs/adr/002-large-file-upload-chunking.md
index 4a9dcfca2df..f9c0abb336e 100644
--- a/docs/adr/002-large-file-upload-chunking.md
+++ b/docs/adr/002-large-file-upload-chunking.md
@@ -234,6 +234,38 @@ Client requests {objName} → Swift reads manifest → Streams segments in order
 
 **Implementation complexity:** High
 
+### Alternative C: Swift Dynamic Large Objects (DLO) Instead of SLO
+
+**Description:** Swift offers two large object mechanisms: Static Large Objects (SLO) and Dynamic Large Objects (DLO). Both are supported by the `ncw/swift` Go library used in this project. DLO uses a simpler manifest that only stores a segment prefix; Swift dynamically discovers segments matching that prefix at download time.
+
+**Pros:**
+- Simpler manifest structure (just a prefix, no segment list)
+- Segments can be added or modified after manifest creation
+- Slightly simpler upload logic (no need to track segment metadata)
+
+**Cons:**
+- **No integrity validation**: DLO manifests don't store segment ETags, so Swift cannot detect missing or corrupted segments. Downloads succeed with incomplete or wrong data rather than returning an error.
+- **Race conditions**: Since segments are discovered dynamically, concurrent modifications can cause inconsistent reads (e.g., a download might see a partial set of segments if upload is in progress).
+- **Unpredictable content**: The downloaded content depends on what segments exist at read time, not what was originally uploaded. If a segment is deleted (disk failure, bug, manual deletion), the file silently shrinks.
+- **No size validation**: DLO cannot verify that the total size matches expectations.
+
+**Why SLO is preferred:**
+
+For a file storage system where data integrity matters, SLO provides critical guarantees that DLO lacks:
+
+| Aspect | SLO | DLO |
+|--------|-----|-----|
+| Segment list | Explicit with ETags and sizes | Dynamic prefix matching |
+| Integrity check | Swift validates each segment's ETag on download | None: trusts whatever exists |
+| Missing segment | Detected: connection dropped, client receives partial results | **Silently succeeds**: "happily ignores" the missing segment |
+| Modified segment | Detected: ETag mismatch causes failure | Silently returns different data |
+| Consistency | Immutable after creation | Can change between reads |
+| Client awareness | Failure is detectable (incomplete transfer) | No way to know data is missing |
+
+**Note on UI chunked uploads:** DLO's flexibility (adding segments after manifest creation) might seem useful for resumable UI uploads, but it provides no real advantage. Both DLO and SLO support the same upload flow: client uploads segments, then server creates manifest on completion.
+
+**Implementation complexity:** Low (similar to SLO), but unacceptable data integrity trade-offs.
+
 ## Decision
 
 **Recommended approach: Storage-Level Chunking**