perf(s3): align Raft entry size with MaxSizePerMsg via s3ChunkBatchOps=4 #636
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -35,13 +35,27 @@ const ( | |
| s3LeaderHealthPath = "/healthz/leader" | ||
| s3HealthMaxRequestBodyBytes = 1024 | ||
| s3ChunkSize = 1 << 20 | ||
| s3ChunkBatchOps = 16 | ||
| // s3ChunkBatchOps caps how many s3ChunkSize chunks fit in a single | ||
| // coordinator.Dispatch call. The Raft entry produced by Dispatch is | ||
| // roughly s3ChunkBatchOps × s3ChunkSize plus protobuf overhead, so | ||
| // 4 × 1 MiB = 4 MiB matches the post-PR-#593 default | ||
| // `MaxSizePerMsg = 4 MiB`. This alignment matters because | ||
| // etcd/raft sends a single entry that is *larger* than | ||
| // MaxSizePerMsg as a solo MsgApp (see util.go:limitSize), bypassing | ||
| // the documented `MaxInflight × MaxSizePerMsg` per-peer memory | ||
| // bound. With 16 × 1 MiB = 16 MiB entries the worst-case leader | ||
| // buffer was 1024 × 16 MiB = 16 GiB / peer; at 4 × 1 MiB the bound | ||
| // drops to 1024 × 4 MiB = 4 GiB / peer and matches the cap PR #593 | ||
| // advertises. Per-PUT Raft commit count grows 4× (a 5 GiB PUT goes | ||
| // from 320 to 1280 entries) — absorbed by the WAL group commit | ||
| // landed in PR #600 and the smaller per-entry fsync. | ||
| s3ChunkBatchOps = 4 | ||
| s3XMLNamespace = "http://s3.amazonaws.com/doc/2006-03-01/" | ||
| s3DefaultRegion = "us-east-1" | ||
| s3MaxKeys = 1000 | ||
| s3ListPageSize = 256 | ||
| s3ManifestCleanupTimeout = 2 * time.Minute | ||
| s3MaxObjectSizeBytes = 5 * 1024 * 1024 * 1024 // 5 GiB, matching AWS S3 single PUT limit. | ||
|
|
||
| s3TxnRetryInitialBackoff = 2 * time.Millisecond | ||
| s3TxnRetryMaxBackoff = 32 * time.Millisecond | ||
|
|
||
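The figures quoted in the code comment in the diff above can be checked with a short sketch. The names `maxInflight`, `worstCasePeerBuffer`, and `entriesPerPut` are hypothetical and are not identifiers from the repository; the value 1024 is taken from the "1024 ×" figures in the comment, and protobuf overhead is ignored.

```go
package main

import "fmt"

const (
	mib = 1 << 20
	gib = 1 << 30

	chunkSize     = 1 * mib // mirrors s3ChunkSize in the diff
	maxInflight   = 1024    // assumed from the "1024 x" figures in the comment
	maxObjectSize = 5 * gib // mirrors s3MaxObjectSizeBytes
)

// worstCasePeerBuffer returns the worst-case bytes buffered for one peer
// when every in-flight message carries a single entry of batchOps chunks.
func worstCasePeerBuffer(batchOps int) int64 {
	return int64(maxInflight) * int64(batchOps) * chunkSize
}

// entriesPerPut returns how many Raft entries a PUT of size bytes needs,
// rounding up to account for a final partial batch.
func entriesPerPut(size int64, batchOps int) int64 {
	entry := int64(batchOps) * chunkSize
	return (size + entry - 1) / entry
}

func main() {
	for _, ops := range []int{16, 4} {
		fmt.Printf("batchOps=%d: %d GiB/peer, %d entries for a 5 GiB PUT\n",
			ops, worstCasePeerBuffer(ops)/gib, entriesPerPut(maxObjectSize, ops))
	}
}
```

Under these assumptions the sketch reproduces the numbers in the comment: 16 GiB per peer and 320 entries at a batch size of 16, versus 4 GiB per peer and 1280 entries at a batch size of 4.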
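Changing s3ChunkBatchOps to 4 properly bounds the Raft entry size for putObject and uploadPart, but it also caps the batch size at 4 for metadata and delete operations that carry no object data, such as deleteByPrefix (line 1947) and cleanupPartBlobsAsync (line 1893). Delete operations have very small payloads, so they can be processed efficiently in much larger batches (for example, 64 to 128) while still staying within the MaxSizePerMsg (4 MiB) limit. With this change, deleting large objects or cleaning up after an aborted upload issues more Raft proposals than necessary, which may affect throughput and the time it takes for cleanup to complete. Please consider separating the batch size used for data-carrying writes from the batch size used for deletes and scans.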
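One possible shape for this separation, as a minimal sketch under stated assumptions: the constant `s3DeleteBatchOps`, its value of 64 (the low end of the range suggested above), and the `batchKeys` helper are all hypothetical and do not appear in the PR. The sketch also assumes one delete op per 1 MiB chunk key.

```go
package main

import "fmt"

const (
	s3ChunkBatchOps  = 4  // data-carrying writes: 4 x 1 MiB per Raft entry (value from the PR)
	s3DeleteBatchOps = 64 // hypothetical: deletes carry only keys, so batches can be larger
)

// batchKeys splits keys into consecutive groups of at most size elements.
// In this sketch each group stands for one Raft proposal.
func batchKeys(keys []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var out [][]string
	for start := 0; start < len(keys); start += size {
		end := start + size
		if end > len(keys) {
			end = len(keys)
		}
		out = append(out, keys[start:end])
	}
	return out
}

func main() {
	// A 5 GiB object stored as 1 MiB chunks has 5120 chunk keys.
	keys := make([]string, 5120)
	for i := range keys {
		keys[i] = fmt.Sprintf("chunk-%04d", i)
	}
	fmt.Printf("delete proposals at batch size %d: %d\n",
		s3ChunkBatchOps, len(batchKeys(keys, s3ChunkBatchOps)))
	fmt.Printf("delete proposals at batch size %d: %d\n",
		s3DeleteBatchOps, len(batchKeys(keys, s3DeleteBatchOps)))
}
```

With these assumed values, cleaning up a 5 GiB object takes 1280 proposals at a batch size of 4 and 80 at a batch size of 64. Whether 64 is a safe cap depends on the real size of each delete op, which the sketch does not measure.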