Skip to content
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 21 additions & 7 deletions adapter/s3.go
Original file line number Diff line number Diff line change
Expand Up @@ -35,13 +35,27 @@ const (
s3LeaderHealthPath = "/healthz/leader"
s3HealthMaxRequestBodyBytes = 1024
s3ChunkSize = 1 << 20
s3ChunkBatchOps = 16
s3XMLNamespace = "http://s3.amazonaws.com/doc/2006-03-01/"
s3DefaultRegion = "us-east-1"
s3MaxKeys = 1000
s3ListPageSize = 256
s3ManifestCleanupTimeout = 2 * time.Minute
s3MaxObjectSizeBytes = 5 * 1024 * 1024 * 1024 // 5 GiB, matching AWS S3 single PUT limit.
// s3ChunkBatchOps caps how many s3ChunkSize chunks fit in a single
// coordinator.Dispatch call. The Raft entry produced by Dispatch is
// roughly s3ChunkBatchOps × s3ChunkSize plus protobuf overhead, so
// 4 × 1 MiB = 4 MiB matches the post-PR-#593 default
// `MaxSizePerMsg = 4 MiB`. This alignment matters because
// etcd/raft sends a single entry that is *larger* than
// MaxSizePerMsg as a solo MsgApp (see util.go:limitSize), bypassing
// the documented `MaxInflight × MaxSizePerMsg` per-peer memory
// bound. With 16 × 1 MiB = 16 MiB entries the worst-case leader
// buffer was 1024 × 16 MiB = 16 GiB / peer; at 4 × 1 MiB the bound
// drops to 1024 × 4 MiB = 4 GiB / peer and matches the cap PR #593
// advertises. Per-PUT Raft commit count grows 4× (a 5 GiB PUT goes
// from 320 to 1280 entries) — absorbed by the WAL group commit
// landed in PR #600 and the smaller per-entry fsync.
s3ChunkBatchOps = 4
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

s3ChunkBatchOps を 4 に変更したことで、putObjectuploadPart での Raft エントリサイズが適切に制限されるようになりますが、一方で deleteByPrefix (line 1947) や cleanupPartBlobsAsync (line 1893) など、データ本体を含まないメタデータ操作や削除操作のバッチサイズも同時に 4 に制限されてしまいます。

削除操作はペイロードが非常に小さいため、MaxSizePerMsg (4 MiB) の制限内でもより大きなバッチ(例: 64〜128)で効率的に処理可能です。この変更により、大きなオブジェクトの削除やアップロードの中断時のクリーンアップ処理において Raft プロポーザルの回数が不必要に増加し、スループットやクリーンアップ完了までの時間に影響を与える可能性があります。データ転送を伴う書き込み用のバッチサイズと、削除・スキャン用のバッチサイズを分離することを検討してください。

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reduce chunk batch size to include protobuf overhead

Setting s3ChunkBatchOps to 4 still produces Raft payloads larger than a 4 MiB MaxSizePerMsg once encoding overhead is added (the Request/mutation protobuf plus the 1-byte raft encoding prefix in marshalRaftCommand), so full-size 4×1 MiB chunk batches continue to take etcd/raft’s oversized-first-entry path instead of truly fitting the limit. In practice this means the stated “4 MiB aligned” bound in this change is not actually guaranteed for PutObject; you need headroom (or byte-based batching) rather than 4 * s3ChunkSize exactly.

Useful? React with 👍 / 👎.

s3XMLNamespace = "http://s3.amazonaws.com/doc/2006-03-01/"
s3DefaultRegion = "us-east-1"
s3MaxKeys = 1000
s3ListPageSize = 256
s3ManifestCleanupTimeout = 2 * time.Minute
s3MaxObjectSizeBytes = 5 * 1024 * 1024 * 1024 // 5 GiB, matching AWS S3 single PUT limit.

s3TxnRetryInitialBackoff = 2 * time.Millisecond
s3TxnRetryMaxBackoff = 32 * time.Millisecond
Expand Down
Loading