[Storage] [Blobs] Generate Unique Block Blob Ids #59347
Open
amnguye wants to merge 3 commits into
Pull request overview
This PR updates Azure Storage partitioned upload plumbing so staged blocks can use unique (non-offset-based) block IDs, reducing the chance of collisions across concurrent uploads to the same blob. It does this by extending PartitionedUploader to generate and flow a per-partition blockId through staging and commit, and by introducing a new randomized block ID generator in the Blobs layer.
Changes:
- Add block ID generation/propagation to `PartitionedUploader` and update Blob/DataLake/Files behaviors accordingly.
- Move block ID generation out of `Azure.Storage.Shared.StorageExtensions` into `Azure.Storage.Blobs` (`BlobExtensions.GenerateBlockId`), and update blob staging/commit call sites.
- Add/adjust tests and update changelog/asset tag.
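To illustrate the collision the overview describes, here is a minimal in-memory sketch, not the SDK code. `FakeBlob`, `BlockIdSketch`, and `StageFromTwoUploaders` are hypothetical names, and the rule that staging under an existing ID replaces the earlier data is a simplified model of the service assumed for this example. `FromOffset` mirrors the removed `GenerateBlockId(long offset)`, and `Randomized` follows the 48-random-byte approach shown in the diff.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Hypothetical in-memory stand-in for a blob's uncommitted block list.
// Assumption: staging under an existing ID replaces the earlier data.
public sealed class FakeBlob
{
    private readonly Dictionary<string, string> _staged = new Dictionary<string, string>();

    public void Stage(string blockId, string data) => _staged[blockId] = data;

    public int StagedCount => _staged.Count;
}

public static class BlockIdSketch
{
    // Offset-derived ID, mirroring the removed GenerateBlockId(long offset).
    public static string FromOffset(long offset)
    {
        byte[] id = new byte[48]; // 48 raw bytes => 64 Base64 characters
        BitConverter.GetBytes(offset).CopyTo(id, 0);
        return Convert.ToBase64String(id);
    }

    // Randomized ID: 48 random bytes => 64 Base64 characters.
    public static string Randomized()
    {
        byte[] id = new byte[48];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(id);
        }
        return Convert.ToBase64String(id);
    }

    // Two uploaders stage blocks at the same offsets of the same blob.
    // Returns how many distinct staged blocks survive.
    public static int StageFromTwoUploaders(Func<long, string> idForOffset)
    {
        var blob = new FakeBlob();
        long[] offsets = { 0, 4, 8 };
        foreach (string uploader in new[] { "A", "B" })
        {
            foreach (long offset in offsets)
            {
                blob.Stage(idForOffset(offset), uploader + "@" + offset);
            }
        }
        return blob.StagedCount;
    }
}
```

With offset-derived IDs the second uploader overwrites the first uploader's three staged blocks, so only three entries remain. With randomized IDs all six staged blocks keep distinct IDs, barring an astronomically unlikely collision.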
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| sdk/storage/Azure.Storage.Files.Shares/src/ShareFileClient.cs | Updates partition upload delegate signatures to accept the new blockId parameter (ignored for Shares). |
| sdk/storage/Azure.Storage.Files.DataLake/src/DataLakeFileClient.cs | Updates partition upload delegates to accept blockId and updates commit tuple deconstruction. |
| sdk/storage/Azure.Storage.Common/tests/PartitionedUploaderTests.cs | Updates mocks/signatures for new delegate parameters and updated partition tuple shape. |
| sdk/storage/Azure.Storage.Common/src/Shared/StorageExtensions.cs | Removes GenerateBlockId(long offset) from shared storage extensions. |
| sdk/storage/Azure.Storage.Common/src/Shared/PartitionedUploader.cs | Adds block ID plumbing: delegate signature updates, tuple includes BlockId, and a new GenerateBlockId behavior hook. |
| sdk/storage/Azure.Storage.Blobs/tests/GenerateBlockIdTests.cs | Adds unit tests validating base64 shape/length/uniqueness for generated IDs. |
| sdk/storage/Azure.Storage.Blobs/tests/BlockBlobClientOpenWriteTests.cs | Switches test block ID generation to BlobExtensions.GenerateBlockId(). |
| sdk/storage/Azure.Storage.Blobs/tests/BlockBlobClientOpenReadTests.cs | Switches test block ID generation to BlobExtensions.GenerateBlockId(). |
| sdk/storage/Azure.Storage.Blobs/tests/BlobClientOpenWriteTests.cs | Switches test block ID generation to BlobExtensions.GenerateBlockId(). |
| sdk/storage/Azure.Storage.Blobs/src/BlockBlobWriteStream.cs | Switches staged block IDs from offset-derived to randomized IDs. |
| sdk/storage/Azure.Storage.Blobs/src/BlockBlobClient.cs | Uses pre-generated partition block IDs for stage/commit and wires GenerateBlockId into PartitionedUploader. |
| sdk/storage/Azure.Storage.Blobs/src/BlobExtensions.cs | Adds randomized Base64 block ID generator. |
| sdk/storage/Azure.Storage.Blobs/CHANGELOG.md | Documents breaking change around block ID generation behavior (wording needs correction). |
| sdk/storage/Azure.Storage.Blobs/assets.json | Updates asset tag. |
Comments suppressed due to low confidence (3)
sdk/storage/Azure.Storage.Blobs/tests/BlockBlobClientOpenWriteTests.cs:68
- This test is a `[RecordedTest]` and stages/commits blocks using a randomized block ID. Because `blockid` is part of the request URI and the block list is part of the request body, playback matching will fail unless the recording infrastructure ignores/sanitizes `blockid` and/or disables body comparison for these requests. Consider generating deterministic block IDs in non-Live modes (e.g., based on position or `Recording.Random`) or updating the storage test sanitizers/matchers accordingly.
break;
}
string blockId = BlobExtensions.GenerateBlockId();
await client.StageBlockAsync(blockId, new MemoryStream(buffer, 0, lastReadSize));
blockIds.Add(blockId);
}
await client.CommitBlockListAsync(blockIds);
sdk/storage/Azure.Storage.Blobs/tests/BlockBlobClientOpenReadTests.cs:94
- This test is a recorded test and stages/commits blocks using a randomized block ID. Since `blockid` appears in the request URI and the committed block list appears in the request body, playback matching will fail unless the recording infrastructure ignores/sanitizes these values or disables body comparison for the relevant requests. Consider using deterministic block IDs in non-Live modes (e.g., derived from position or `Recording.Random`) or updating the storage test sanitizers/matchers accordingly.
break;
}
string blockId = BlobExtensions.GenerateBlockId();
await client.StageBlockAsync(blockId, new MemoryStream(buffer, 0, lastReadSize));
blockIds.Add(blockId);
}
await client.CommitBlockListAsync(blockIds);
}
sdk/storage/Azure.Storage.Blobs/tests/BlobClientOpenWriteTests.cs:87
- This is a recorded test that stages/commits blocks using a randomized block ID. Because `blockid` is part of the request URI and the block list is part of the request body, playback matching will fail unless the recording infrastructure ignores/sanitizes these values or disables body comparison. Consider generating deterministic block IDs in non-Live modes (e.g., derived from position or `Recording.Random`) or updating the storage test sanitizers/matchers accordingly.
{
break;
}
string blockId = BlobExtensions.GenerateBlockId();
await blockClient.StageBlockAsync(blockId, new MemoryStream(buffer, 0, lastReadSize));
blockIds.Add(blockId);
}
await blockClient.CommitBlockListAsync(blockIds);
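The suggestion shared by the three suppressed comments, deterministic block IDs outside Live mode, might be sketched as follows. This is an assumption-laden illustration: `SeededBlockIdGenerator` is a hypothetical helper, and a seeded `System.Random` stands in for the test framework's `Recording.Random`, whose actual API is not shown in this PR.

```csharp
using System;

// Hypothetical helper: block IDs drawn from a seeded pseudo-random source,
// so a playback run can regenerate the IDs that were recorded.
// Assumption: System.Random stands in for the framework's Recording.Random.
public sealed class SeededBlockIdGenerator
{
    private readonly Random _random;

    public SeededBlockIdGenerator(int seed) => _random = new Random(seed);

    public string Next()
    {
        byte[] id = new byte[48]; // 48 raw bytes => 64 Base64 characters
        _random.NextBytes(id);
        return Convert.ToBase64String(id);
    }
}
```

Two generators built from the same seed yield the same sequence of IDs, which is what request matching during playback needs. A seeded `System.Random` is not cryptographically strong, so a helper like this would belong in test code only.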
Comment on lines 52 to 56
    return string.Join("/", split);
}

public static string GenerateBlockId(long offset)
{
    // TODO #8162 - Add in a random GUID so multiple simultaneous
    // uploads won't stomp on each other and the first to commit wins.
    // This will require some changes to our test framework's
    // RecordedClientRequestIdPolicy.
    byte[] id = new byte[48]; // 48 raw bytes => 64 byte string once Base64 encoded
    BitConverter.GetBytes(offset).CopyTo(id, 0);
    return Convert.ToBase64String(id);
}

public static async Task<HttpAuthorization> GetCopyAuthorizationHeaderAsync(
    this TokenCredential tokenCredential,
Comment on lines +1807 to +1814
public static string GenerateBlockId()
{
    // Block ID must be a valid Base64 string with length <= 64 characters (before URL-encoding).
    // 48 raw bytes => 64 character Base64 string.
    byte[] id = new byte[48];
    using (var rng = System.Security.Cryptography.RandomNumberGenerator.Create())
    {
        rng.GetBytes(id);
### Features Added

### Breaking Changes
- Block IDs generated during partitioned uploads are now unique GUIDs instead of sequential integers. This ensures uniqueness across concurrent uploads to the same blob but means block IDs are no longer predictable or ordered.
Comment on lines 3464 to 3469
      async,
      cancellationToken).ConfigureAwait(false),
  Scope = operationName => client.ClientConfiguration.ClientDiagnostics.CreateScope(operationName
-     ?? $"{nameof(Azure)}.{nameof(Storage)}.{nameof(Blobs)}.{nameof(BlobClient)}.{nameof(Storage.Blobs.BlobClient.Upload)}")
+     ?? $"{nameof(Azure)}.{nameof(Storage)}.{nameof(Blobs)}.{nameof(BlobClient)}.{nameof(Storage.Blobs.BlobClient.Upload)}"),
+ GenerateBlockId = BlobExtensions.GenerateBlockId
  };
Resolves #8162
Block IDs generated during partitioned uploads are now randomized Base64 strings (48 random bytes, 64 characters once encoded) instead of values derived from the block's offset. This ensures uniqueness across concurrent uploads to the same blob but means block IDs are no longer predictable or ordered.
The change involves more than generating block IDs: each block ID generated when Stage Block occurs must also be stored in the list of block IDs to commit before we call Commit Block List.
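A rough sketch of why the generated IDs have to be carried from staging to commit: once IDs are random, their order can no longer be recovered from the IDs themselves, so the uploader must remember them in partition order. `OrderedCommitSketch` and `UploadAsync` are hypothetical names, and the dictionary is an in-memory stand-in for the service rather than the `PartitionedUploader` implementation.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class OrderedCommitSketch
{
    // Stages every partition in parallel, then "commits" by walking the
    // remembered IDs in partition order. Returns the assembled content.
    public static async Task<string> UploadAsync(
        IReadOnlyList<string> partitions,
        Func<string> generateBlockId)
    {
        // In-memory stand-in for the blob's staged (uncommitted) blocks.
        var staged = new ConcurrentDictionary<string, string>();

        // Generate one ID per partition up front, in partition order.
        string[] blockIds = partitions.Select(_ => generateBlockId()).ToArray();

        // Stage in parallel; completion order is unspecified.
        await Task.WhenAll(partitions.Select((data, i) =>
            Task.Run(() => { staged[blockIds[i]] = data; })));

        // Commit: the stored list, not the IDs themselves, defines the order.
        return string.Concat(blockIds.Select(id => staged[id]));
    }
}
```

Generating the ID before staging, rather than inside the stage call, keeps a single source of truth for the commit list even when the stage calls complete out of order.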
Other changes: