Skip to content

Add Idempotency-Key header support for upload creation requests#1366

Open
m7kvqbe1 wants to merge 9 commits into
tus:mainfrom
m7kvqbe1:idempotency-key
Open

Add Idempotency-Key header support for upload creation requests#1366
m7kvqbe1 wants to merge 9 commits into
tus:mainfrom
m7kvqbe1:idempotency-key

Conversation

@m7kvqbe1
Copy link
Copy Markdown

Summary

  • Add support for the Idempotency-Key header on upload creation (POST) requests, allowing clients to safely retry without creating duplicate uploads
  • Introduce an IdempotencyKeyStore interface with two implementations: file-based (for disk storage backends, persists across restarts) and in-memory (for cloud storage backends)
  • Handle interrupted concatenation state: retry if offset is 0, return existing if complete, error if corrupted (0 < offset < size)
  • Wire up the appropriate store automatically in the CLI based on the active storage backend

Motivation

Closes #1364

Upload creation requests (particularly concatenation) are not currently idempotent. When a concat request is interrupted (e.g. by a WAF timeout), the client never receives the response and retries, creating a duplicate final upload.

This implements the approach suggested by @Acconut in #1364: a general-purpose Idempotency-Key header that works for all upload creation requests, not just concatenation. For the filestore backend, key-to-upload-ID mappings are persisted to disk alongside upload data (as .idempotency-key files), so they survive server restarts.

See also #1365 for a previous concat-only approach.

Design

New interface (IdempotencyKeyStore) added to StoreComposer, following the same pattern as Locker:

type IdempotencyKeyStore interface {
    FindUploadID(ctx context.Context, key string) (string, error)
    StoreUploadID(ctx context.Context, key string, uploadID string) error
}

Two implementations shipped:

Backend Store Durability
filestore fileidempotencystore (disk) Survives restarts
S3/GCS/Azure memoryidempotencystore (memory) Within process lifetime

Handler logic (in PostFile):

  1. Read Idempotency-Key header
  2. If store configured and key present: look up existing upload
  3. If found and complete: return existing (201 + Location)
  4. If found and incomplete concat (offset==0): retry concatenation
  5. If found and corrupted concat (0 < offset < size): return 500
  6. If found and non-concat: return existing upload info for PATCH resume
  7. If not found: create normally, store key->ID mapping

Opt-out: --disable-idempotency CLI flag.

Test plan

  • TestIdempotency/ConcatRetryComplete -- retried concat with already-complete upload returns existing
  • TestIdempotency/ConcatRetryIncomplete -- retried concat with offset==0 retries the concatenation
  • TestIdempotency/ConcatRetryCorrupted -- retried concat with partial data returns error
  • TestIdempotency/RegularUploadRetry -- retried regular upload returns existing with offset
  • TestIdempotency/NoStoreConfigured -- header ignored when no store configured
  • TestIdempotency/NoHeaderSent -- normal flow when no header
  • TestIdempotency/DeletedUploadFallsThrough -- stale mapping to deleted upload creates new upload
  • TestFileIdempotencyStore -- file-based store unit tests
  • All existing handler tests continue to pass
  • Full go test ./pkg/... passes

Introduce the IdempotencyKeyStore interface for persisting mappings from
client-provided Idempotency-Key header values to upload IDs, enabling
detection of retried upload creation requests.

Add the interface to StoreComposer following the same pattern as Locker,
with UseIdempotencyKeyStore() and Capabilities() support.

Ref: https://www.ietf.org/archive/id/draft-ietf-httpapi-idempotency-key-header-07.html
Persist idempotency key to upload ID mappings as JSON files on disk,
using {sha256(key)}.idempotency-key filenames. Stores the full original
key inside the file to guard against hash collisions.

Follows the filelocker pattern: New(path), UseIn(composer), and the
same directory can be shared with upload data.
In-memory store for cloud storage backends (S3, GCS, Azure) where no
local disk is available. Mappings are lost on process restart but still
protect against duplicate uploads within a single server lifetime.
When a client sends an Idempotency-Key header with a POST request and
an IdempotencyKeyStore is configured, the handler detects retried upload
creation requests:

- If the key maps to a completed concat upload: return it immediately
- If the key maps to an incomplete concat (offset==0): retry concat
- If the key maps to a corrupted concat (0 < offset < size): return error
- If the key maps to a non-concat upload: return it for PATCH resume
- If no mapping exists: create normally and store the key->ID mapping

Also adds Idempotency-Key to CORS AllowHeaders and updates CORS tests.
Configure the appropriate IdempotencyKeyStore for each storage backend:
- filestore: file-based store (persists to same directory as uploads)
- S3/GCS/Azure: in-memory store (lost on restart, but protects within
  a single server lifetime)

Add --disable-idempotency flag to opt out of Idempotency-Key support.
Cover the key scenarios:
- Concat retry when already complete (returns existing)
- Concat retry when incomplete at offset 0 (retries concat)
- Concat retry when corrupted (returns error)
- Regular upload retry (returns existing for PATCH resume)
- No idempotency store configured (header ignored)
- No header sent (normal flow)
- Deleted upload falls through to new creation
@Murderlon
Copy link
Copy Markdown

Review by Devin: https://app.devin.ai/review/tus/tusd/pull/1366

os.WriteFile uses O_TRUNC, so a crash between truncation and completed
write leaves an empty or partial JSON file. FindUploadID then returns a
raw json.Unmarshal error, which the handler treats as a hard 500 —
permanently breaking that idempotency key.

Fix both the cause and the symptom:
- StoreUploadID now writes to a .tmp file then renames atomically
- FindUploadID treats corrupted JSON as ErrNotFound (cache miss) so the
  handler falls through to create a new upload and overwrites the file
emitFinishEvents increments UploadsFinished metrics and sends to
CompleteUploads on every call. For an idempotent replay of an
already-completed concat upload, hooks and metrics should not fire
again — consistent with the non-concat replay path which correctly
skips them.

Only run PreFinishResponseCallback so callers can still inject
response headers on replay. Update the ConcatRetryComplete test to
no longer expect a spurious CompleteUploads event.
Add DirModePerm and FileModePerm fields to FileIdempotencyStore,
matching the filestore pattern. Fall back to 0775/0664 defaults when
zero. Wire the CLI flags through so .idempotency-key files are created
with the same permissions as upload data files.
@m7kvqbe1
Copy link
Copy Markdown
Author

Review by Devin: https://app.devin.ai/review/tus/tusd/pull/1366

Thanks @Murderlon.

I've added some fixups to this PR based on Devin's review:

@Murderlon
Copy link
Copy Markdown

cc @Acconut

Copy link
Copy Markdown
Member

@Acconut Acconut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for opening this PR. I'm wondering if this is actually capable of solving the timeout issues when concatenating large uploads. Tusd would just retry the concatenating operation when it detects an existing upload resource with the same idempotency key.

But if we assume that the concatenation operation times out because the underlying process is too slow, retrying the operation would likely not help. And tusd aborts the concatenation operation whenever the request times out on the client or proxy side. Unless the process is just sometimes too slow, and there is a chance that it will be faster at the new attempt, retrying offers little improvement. Tusd would actually have to resume the concatenation operation, but I'm not sure this is possible for all storage implementations. It might be doable for filestore and gcsstore, but not for s3store as far as I can tell.

Idempotency keys are a helpful tool to prevent duplicate upload resources from being created, but it might not be that helpful for solving concatenation failures on its own as I initially assumed. If we take a look at the original issue of duplicate uploads because of failed concatenations, the duplicates are just a symptom, but not the actual fault. That is the concatenation failing in the first place due to the request timing out. To prevent that the concatenation could be run in a background process, which has more time to complete than the short-lived request context. So even if the request times out after X seconds, the concatenation can continue. Deduplication via Idempotency-Key would still play an important role here as it ensures that client retrieves the original upload resources when it retries the upload concatenation request.

What do you think about this idea? The hurdle to getting this in tusd is higher since it currently doesn't have a system for running tasks in the background. But we might want to have this for other features as well (e.g. cleaning up old uploads).

@m7kvqbe1
Copy link
Copy Markdown
Author

#1364 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Concatenation operation is not idempotent, causing duplicate uploads on retried requests

3 participants