[Storage] [Blobs] Generate Unique Block Blob Ids #59347

Open: amnguye wants to merge 3 commits into Azure:main from amnguye:feature/storage/unique-block-id

Conversation

amnguye (Member) commented May 19, 2026

Resolves #8162

Block IDs generated during partitioned uploads are now randomized (48 random bytes, Base64 encoded to 64 characters) instead of being derived from the block's offset. This avoids collisions between concurrent uploads to the same blob, but it means block IDs are no longer predictable or ordered.
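
To make the collision concrete, here is a minimal sketch contrasting the offset-derived generator that this PR removes with a randomized one. The randomized variant is an assumption modeled on the snippet under review, whose body is truncated on this page; the class and method names are illustrative only.

```csharp
using System;
using System.Security.Cryptography;

public static class BlockIdComparison
{
    // Offset-derived ID, following the StorageExtensions.GenerateBlockId(long offset)
    // removed by this PR: the same offset always yields the same ID.
    public static string FromOffset(long offset)
    {
        byte[] id = new byte[48]; // 48 raw bytes => 64 characters once Base64 encoded
        BitConverter.GetBytes(offset).CopyTo(id, 0);
        return Convert.ToBase64String(id);
    }

    // Randomized ID (assumed shape): 48 random bytes, Base64 encoded.
    public static string Randomized()
    {
        byte[] id = new byte[48];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(id);
        }
        return Convert.ToBase64String(id);
    }

    public static void Main()
    {
        // Two independent uploads staging the block at offset 0 pick the same ID.
        Console.WriteLine(FromOffset(0) == FromOffset(0)); // True
        // Randomized IDs differ between uploads with overwhelming probability.
        Console.WriteLine(Randomized() == Randomized());
        Console.WriteLine(Randomized().Length); // 64
    }
}
```

With offset-derived IDs, two clients uploading to the same blob stage blocks under identical IDs, so whichever commits first wins; random IDs remove that overlap at the cost of predictability.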

The changes are more involved than they might appear because block IDs were previously generated in two places:

  • when Stage Block was occurring
  • when we call Commit Block List.

Because the IDs can no longer be recomputed from the offset, we now need to store the block ID generated when Stage Block occurs and add it to the list of block IDs to commit before we call Commit Block List.
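
The stage-then-commit flow described above can be sketched as follows. This is a simplified, sequential illustration and not the PartitionedUploader implementation: `generateBlockId`, `stageBlock`, and `commitBlockList` are hypothetical stand-ins for the ID generator hook and the two service calls.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

public static class StageThenCommitSketch
{
    // Hypothetical stand-ins: generateBlockId for the generator hook,
    // stageBlock / commitBlockList for Stage Block / Commit Block List.
    public static IReadOnlyList<string> Upload(
        IEnumerable<byte[]> partitions,
        Func<string> generateBlockId,
        Action<string, byte[]> stageBlock,
        Action<IReadOnlyList<string>> commitBlockList)
    {
        var blockIds = new List<string>();
        foreach (byte[] partition in partitions)
        {
            // Generate the ID once, stage with it, and remember it:
            // a random ID cannot be recomputed at commit time.
            string blockId = generateBlockId();
            stageBlock(blockId, partition);
            blockIds.Add(blockId);
        }
        // Commit exactly the IDs that were staged, in partition order.
        commitBlockList(blockIds);
        return blockIds;
    }

    // Placeholder generator for the demo: 48 random bytes, Base64 encoded.
    public static string NewId()
    {
        byte[] id = new byte[48];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(id);
        }
        return Convert.ToBase64String(id);
    }

    public static void Main()
    {
        var staged = new List<string>();
        var partitions = new[] { new byte[] { 1, 2 }, new byte[] { 3 } };
        Upload(
            partitions,
            NewId,
            (id, data) => staged.Add(id),
            ids => Console.WriteLine(ids.Count == staged.Count)); // True
    }
}
```

In a concurrent uploader the list would need to follow partition order rather than completion order, since Commit Block List assembles the blob in the order the IDs are given.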

Other changes:

  • Moved GenerateBlockId from StorageExtensions to BlobExtensions
    • We only called this method from Blob-related code, and the move also makes unit testing easier
  • The changes to PartitionedUploader cascaded into Share Files and DataLake, but those packages do not use the new block ID parameter in the changed delegate definitions

Copilot AI review requested due to automatic review settings May 19, 2026 23:06
github-actions bot added the Storage label (Storage Service: Queues, Blobs, Files) May 19, 2026

Copilot AI left a comment

Pull request overview

This PR updates Azure Storage partitioned upload plumbing so staged blocks can use unique (non-offset-based) block IDs, reducing the chance of collisions across concurrent uploads to the same blob. It does this by extending PartitionedUploader to generate and flow a per-partition blockId through staging and commit, and by introducing a new randomized block ID generator in the Blobs layer.

Changes:

  • Add block ID generation/propagation to PartitionedUploader and update Blob/DataLake/Files behaviors accordingly.
  • Move block ID generation out of Azure.Storage.Shared.StorageExtensions into Azure.Storage.Blobs (BlobExtensions.GenerateBlockId), and update blob staging/commit call sites.
  • Add/adjust tests and update changelog/asset tag.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| sdk/storage/Azure.Storage.Files.Shares/src/ShareFileClient.cs | Updates partition upload delegate signatures to accept the new blockId parameter (ignored for Shares). |
| sdk/storage/Azure.Storage.Files.DataLake/src/DataLakeFileClient.cs | Updates partition upload delegates to accept blockId and updates commit tuple deconstruction. |
| sdk/storage/Azure.Storage.Common/tests/PartitionedUploaderTests.cs | Updates mocks/signatures for new delegate parameters and updated partition tuple shape. |
| sdk/storage/Azure.Storage.Common/src/Shared/StorageExtensions.cs | Removes GenerateBlockId(long offset) from shared storage extensions. |
| sdk/storage/Azure.Storage.Common/src/Shared/PartitionedUploader.cs | Adds block ID plumbing: delegate signature updates, tuple includes BlockId, and a new GenerateBlockId behavior hook. |
| sdk/storage/Azure.Storage.Blobs/tests/GenerateBlockIdTests.cs | Adds unit tests validating base64 shape/length/uniqueness for generated IDs. |
| sdk/storage/Azure.Storage.Blobs/tests/BlockBlobClientOpenWriteTests.cs | Switches test block ID generation to BlobExtensions.GenerateBlockId(). |
| sdk/storage/Azure.Storage.Blobs/tests/BlockBlobClientOpenReadTests.cs | Switches test block ID generation to BlobExtensions.GenerateBlockId(). |
| sdk/storage/Azure.Storage.Blobs/tests/BlobClientOpenWriteTests.cs | Switches test block ID generation to BlobExtensions.GenerateBlockId(). |
| sdk/storage/Azure.Storage.Blobs/src/BlockBlobWriteStream.cs | Switches staged block IDs from offset-derived to randomized IDs. |
| sdk/storage/Azure.Storage.Blobs/src/BlockBlobClient.cs | Uses pre-generated partition block IDs for stage/commit and wires GenerateBlockId into PartitionedUploader. |
| sdk/storage/Azure.Storage.Blobs/src/BlobExtensions.cs | Adds randomized Base64 block ID generator. |
| sdk/storage/Azure.Storage.Blobs/CHANGELOG.md | Documents breaking change around block ID generation behavior (wording needs correction). |
| sdk/storage/Azure.Storage.Blobs/assets.json | Updates asset tag. |

Comments suppressed due to low confidence (3)

sdk/storage/Azure.Storage.Blobs/tests/BlockBlobClientOpenWriteTests.cs:68

  • This test is a [RecordedTest] and stages/commits blocks using a randomized block ID. Because blockid is part of the request URI and the block list is part of the request body, playback matching will fail unless the recording infrastructure ignores/sanitizes blockid and/or disables body comparison for these requests. Consider generating deterministic block IDs in non-Live modes (e.g., based on position or Recording.Random) or updating the storage test sanitizers/matchers accordingly.
                    break;
                }

                string blockId = BlobExtensions.GenerateBlockId();
                await client.StageBlockAsync(blockId, new MemoryStream(buffer, 0, lastReadSize));
                blockIds.Add(blockId);
            }
            await client.CommitBlockListAsync(blockIds);

sdk/storage/Azure.Storage.Blobs/tests/BlockBlobClientOpenReadTests.cs:94

  • This test is a recorded test and stages/commits blocks using a randomized block ID. Since blockid appears in the request URI and the committed block list appears in the request body, playback matching will fail unless the recording infrastructure ignores/sanitizes these values or disables body comparison for the relevant requests. Consider using deterministic block IDs in non-Live modes (e.g., derived from position or Recording.Random) or updating the storage test sanitizers/matchers accordingly.
                    break;
                }

                string blockId = BlobExtensions.GenerateBlockId();
                await client.StageBlockAsync(blockId, new MemoryStream(buffer, 0, lastReadSize));
                blockIds.Add(blockId);
            }
            await client.CommitBlockListAsync(blockIds);
        }

sdk/storage/Azure.Storage.Blobs/tests/BlobClientOpenWriteTests.cs:87

  • This is a recorded test that stages/commits blocks using a randomized block ID. Because blockid is part of the request URI and the block list is part of the request body, playback matching will fail unless the recording infrastructure ignores/sanitizes these values or disables body comparison. Consider generating deterministic block IDs in non-Live modes (e.g., derived from position or Recording.Random) or updating the storage test sanitizers/matchers accordingly.
                {
                    break;
                }

                string blockId = BlobExtensions.GenerateBlockId();
                await blockClient.StageBlockAsync(blockId, new MemoryStream(buffer, 0, lastReadSize));
                blockIds.Add(blockId);
            }
            await blockClient.CommitBlockListAsync(blockIds);
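
One way to read the "deterministic block IDs in non-Live modes" suggestion from the three comments above is to draw the ID bytes from a seeded random source. The sketch below is hypothetical and not part of the PR: it assumes the test framework can hand the test a seeded `System.Random` (as the comments' mention of Recording.Random suggests), and the class and method names are invented for illustration.

```csharp
using System;

public static class DeterministicBlockIds
{
    // Hypothetical test-only generator: takes the 48 bytes from a caller-supplied
    // Random, so a seeded instance yields a repeatable sequence of IDs.
    public static string Generate(Random random)
    {
        byte[] id = new byte[48];
        random.NextBytes(id);
        return Convert.ToBase64String(id); // 64 characters, same shape as production IDs
    }

    public static void Main()
    {
        // Same seed at record time and at playback time.
        var recordRun = new Random(12345);
        var playbackRun = new Random(12345);

        // Each ID in the sequence matches between the two runs.
        Console.WriteLine(Generate(recordRun) == Generate(playbackRun)); // True
        Console.WriteLine(Generate(recordRun) == Generate(playbackRun)); // True
    }
}
```

Because both runs produce the same IDs, the blockid in the request URI and the block list in the request body would match during playback without changing the sanitizers; the alternative raised in the comments is to sanitize or ignore those values instead.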

Comment on lines 52 to 56
return string.Join("/", split);
}

public static string GenerateBlockId(long offset)
{
// TODO #8162 - Add in a random GUID so multiple simultaneous
// uploads won't stomp on each other and the first to commit wins.
// This will require some changes to our test framework's
// RecordedClientRequestIdPolicy.
byte[] id = new byte[48]; // 48 raw bytes => 64 byte string once Base64 encoded
BitConverter.GetBytes(offset).CopyTo(id, 0);
return Convert.ToBase64String(id);
}

public static async Task<HttpAuthorization> GetCopyAuthorizationHeaderAsync(
this TokenCredential tokenCredential,
Comment on lines +1807 to +1814
public static string GenerateBlockId()
{
// Block ID must be a valid Base64 string with length <= 64 characters (before URL-encoding).
// 48 raw bytes => 64 character Base64 string.
byte[] id = new byte[48];
using (var rng = System.Security.Cryptography.RandomNumberGenerator.Create())
{
rng.GetBytes(id);
### Features Added

### Breaking Changes
- Block IDs generated during partitioned uploads are now unique GUIDs instead of sequential integers. This ensures uniqueness across concurrent uploads to the same blob but means block IDs are no longer predictable or ordered.
Comment on lines 3464 to 3469
async,
cancellationToken).ConfigureAwait(false),
Scope = operationName => client.ClientConfiguration.ClientDiagnostics.CreateScope(operationName
- ?? $"{nameof(Azure)}.{nameof(Storage)}.{nameof(Blobs)}.{nameof(BlobClient)}.{nameof(Storage.Blobs.BlobClient.Upload)}")
+ ?? $"{nameof(Azure)}.{nameof(Storage)}.{nameof(Blobs)}.{nameof(BlobClient)}.{nameof(Storage.Blobs.BlobClient.Upload)}"),
+ GenerateBlockId = BlobExtensions.GenerateBlockId
};
Development

Successfully merging this pull request may close these issues.

[P1] Storage: Use randomized 64 byte block ID when staging multiple upload blocks
