Skip to content

Add file-based transfer APIs to build cache provider interface and all cache plugins#5746

Open
Copilot wants to merge 45 commits intomainfrom
copilot/stream-cache-entry-for-http-plugin
Open

Add file-based transfer APIs to build cache provider interface and all cache plugins#5746
Copilot wants to merge 45 commits intomainfrom
copilot/stream-cache-entry-for-http-plugin

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 5, 2026

Summary

Adds optional file-based transfer APIs to the build cache provider interface (ICloudBuildCacheProvider) and implements them across all cache plugins (HTTP, Amazon S3, Azure Blob Storage). When enabled via the useDirectFileTransfersForBuildCache experiment flag, cache entries are transferred directly between local files and cloud storage without buffering entire contents in memory, preventing out-of-memory errors for large build outputs.

Details

Core changes:

  • ICloudBuildCacheProvider gains two optional methods: tryDownloadCacheEntryToFileAsync and tryUploadCacheEntryFromFileAsync. Providers that don't implement them gracefully fall back to the existing buffer-based APIs.
  • OperationBuildCache conditionally uses the file-based path when useDirectFileTransfersForBuildCache is enabled and the provider supports it. Includes cleanup of partial files on failed downloads.
  • WebClient is refactored to extract a shared _makeRawRequestAsync core used by both buffer and streaming request paths, with a new fetchStreamAsync method and Content-Encoding decompression support for streaming responses.
  • FileSystem in node-core-library gains createReadStream, createWriteStream, and createWriteStreamAsync methods (wrapped in _wrapException for consistent error handling).
  • FileSystemBuildCacheProvider is simplified — the stream method is removed since cloud providers now handle file I/O directly.

Plugin implementations:

  • HTTP plugin: Downloads via fetchStreamAsyncpipeline() to file. Uploads via createReadStreamfetchStreamAsync. Uses maxAttempts: 1 for uploads (stream consumed after first attempt), with credential fallback skipped for stream bodies.
  • Amazon S3 plugin: Downloads via streaming GET with retry support. Uploads hash the tarball on disk first via _hashFileAsync, then stream with the SHA-256 hash included in the AWS Signature V4 request — restoring full payload signing (no more UNSIGNED_PAYLOAD). No retry on uploads.
  • Azure Blob Storage plugin: Downloads via SDK-native blobClient.downloadToFile(). Uploads via blockBlobClient.uploadFile(). Parent directory creation ensured before download.

Gating:

  • Controlled by useDirectFileTransfersForBuildCache in experiments.json. Defaults to off. Falls back to buffer-based APIs if the cloud provider plugin doesn't implement the file-based methods.

How it was tested

  • Unit tests for HttpBuildCacheProvider (14 tests): buffer and file-based GET/SET, 404 cache miss, credential fallback skip for file uploads, write-not-allowed checks, retry behavior, pipeline assertions
  • Unit tests for AmazonS3Client (38 tests): buffer and file-based GET/SET, signed payload hash verification (not UNSIGNED-PAYLOAD), download retry on transient 5xx, no-retry on upload, credential validation, pipeline assertions
  • All existing tests continue to pass (51 S3 total, 14 HTTP total)
  • Verified backwards compatibility: when the experiment flag is off (default), all code paths are identical to the pre-change behavior

Impacted documentation

  • experiments.schema.json updated with useDirectFileTransfersForBuildCache description
  • common/reviews/api/rush-lib.api.md updated with new API surface
  • common/reviews/api/node-core-library.api.md updated with new FileSystem stream methods

@iclanton
Copy link
Copy Markdown
Member

iclanton commented Apr 5, 2026

@copilot - add APIs to FileSystem in node-core-library for the fs APIs you're making use of in this PR. Ideally, none of our projects (except for FileSystem itself) should use fs. Everything should use FileSystem. The change to node-core-library will also need a changefile. Take a look at the conventions in FileSystem and follow that pattern.

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 5, 2026

@copilot - add APIs to FileSystem in node-core-library for the fs APIs you're making use of in this PR. Ideally, none of our projects (except for FileSystem itself) should use fs. Everything...

Added createReadStream() and createWriteStream() static methods to FileSystem in node-core-library, along with FileSystemReadStream and FileSystemWriteStream type aliases (following the same pattern as FileSystemStats and FolderItem). Updated FileSystemBuildCacheProvider and OperationBuildCache to use the new FileSystem APIs instead of importing from node:fs directly. Change file for @rushstack/node-core-library included as a minor bump. Commit: 468caf7

Copilot AI requested a review from iclanton April 5, 2026 04:32
@iclanton
Copy link
Copy Markdown
Member

iclanton commented Apr 5, 2026

@copilot - are there async variants of those fs APIs? Should we expose and use those too?

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 5, 2026

@copilot - are there async variants of those fs APIs? Should we expose and use those too?

No, fs.createReadStream() and fs.createWriteStream() don't have async variants — they're already inherently asynchronous. They return stream objects immediately and all the actual I/O happens asynchronously through the stream events/pipeline. The fs/promises module doesn't include them for this reason.

@iclanton
Copy link
Copy Markdown
Member

iclanton commented Apr 5, 2026

@copilot - take a look at the other FileSystem APIs. Many have an optional options parameter with useful utilities. For example, the write operations have an option to create the parent folder if it doesn't exist. Consider adding useful options to these new APIs

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 5, 2026

@copilot - take a look at the other FileSystem APIs. Many have an optional options parameter with useful utilities. For example, the write operations have an option to create the parent folder if it...

Added IFileSystemCreateWriteStreamOptions with ensureFolderExists to createWriteStream(), following the same pattern as IFileSystemWriteBinaryFileOptions and other write APIs. Updated FileSystemBuildCacheProvider to use the new option instead of manual ensureFolderAsync(). For createReadStream, there aren't useful options from the existing FileSystem conventions (read methods use encoding/convertLineEndings which don't apply to raw streams). Commit: 91161d8

@iclanton
Copy link
Copy Markdown
Member

iclanton commented Apr 5, 2026

@copilot - can we add stream support to the other cache plugins?

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 5, 2026

@copilot - can we add stream support to the other cache plugins?

Added streaming support to both remaining cache plugins:

Amazon S3 plugin (733ef5c):

  • Refactored AmazonS3Client._makeRequestAsync to extract signing into _buildSignedRequest
  • Added _makeStreamRequestAsync using WebClient.fetchStreamAsync() with UNSIGNED-PAYLOAD for AWS Sig V4 signing of streaming uploads
  • Added getObjectStreamAsync() (with retry support) and uploadObjectStreamAsync() (single attempt, since consumed streams can't be replayed)
  • Added tryGetCacheEntryStreamByIdAsync and trySetCacheEntryStreamAsync to AmazonS3BuildCacheProvider

Azure Storage plugin (733ef5c):

  • Added tryGetCacheEntryStreamByIdAsync using blobClient.download() with readableStreamBody
  • Added trySetCacheEntryStreamAsync using blockBlobClient.uploadStream()
  • Added _logBlobError helper for consistent error handling across buffer and stream methods

All three cache plugins (HTTP, S3, Azure) now implement both streaming methods. All existing tests pass (45 S3, 10 Azure).

Copilot AI changed the title Add streaming APIs to build cache provider interface and HTTP plugin Add streaming APIs to build cache provider interface and all cache plugins Apr 5, 2026
@iclanton
Copy link
Copy Markdown
Member

iclanton commented Apr 5, 2026

@copilot - Can you put usage of this functionality behind a Rush experiment?

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 5, 2026

@copilot - Can you put usage of this functionality behind a Rush experiment?

Added a useStreamingBuildCache experiment flag in experiments.json. The streaming code paths in OperationBuildCache now only activate when both the experiment is enabled AND the cloud provider implements the streaming methods. Without the experiment enabled, all cache operations use the existing buffer-based approach. Commit: 2493ed8

Copy link
Copy Markdown
Member

@iclanton iclanton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot - This change introduces a lot of duplicated code. I've called out some instances of it. Can you review with that in mind and clean that up?

iclanton and others added 15 commits April 6, 2026 15:30
…ype compatibility

Agent-Logs-Url: https://github.com/microsoft/rushstack/sessions/d8f5e137-3cb0-4ed9-8391-50d7a33d804e

Co-authored-by: iclanton <5010588+iclanton@users.noreply.github.com>
…error logging, stream cleanup, and nits

Agent-Logs-Url: https://github.com/microsoft/rushstack/sessions/d0190bf7-a346-4a71-93bf-d5375d98b552

Co-authored-by: iclanton <5010588+iclanton@users.noreply.github.com>
…alue

The streaming path was fixed in 8becd08 but the buffer path still
printed the entire encodings array instead of the individual value.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Deduplicate the S3 prefix logic (repeated 4 times) into a helper,
and extract the write-permission guard (repeated in buffer and stream
set methods) to match the HTTP provider's pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both the buffer and streaming response paths duplicated the same
Content-Encoding header parsing logic. Extract into a shared helper
that returns a parsed string array or undefined.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the outer-scoped cacheEntryBuffer (which was only used as
a boolean flag at the cache-miss check) with an explicit cloudCacheHit
boolean. This scopes the buffer into the else branch and makes the
streaming path set the flag consistently too.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
A new WebClient was being constructed on every call to
_makeHttpCoreRequestAsync. Since the provider never configures any
WebClient instance state, a single class member suffices.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace verbose `terminal: terminal`, `headers: headers`, `body: body`,
and `credential: credential` with ES6 shorthand property syntax.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix credential fallback test: mock CredentialCache so cached credentials
  are available, making the test actually validate the stream-body guard
  (previously, the test passed trivially because _tryGetCredentialsAsync
  would throw before making a second request)
- Fix 504 statusText from 'BadGateway' to 'Gateway Timeout'
- Replace fragile S3 upload snapshot that captured Readable internals
  (breaks on Node.js upgrades) with targeted assertions on URL, verb,
  headers, and body identity
- Replace fail() + try/catch with expect().rejects.toThrow()
- Move Readable import to top of S3 test file instead of per-test
  dynamic import

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@iclanton iclanton force-pushed the copilot/stream-cache-entry-for-http-plugin branch from ea777a0 to f9ca5b7 Compare April 6, 2026 22:31
iclanton and others added 12 commits April 7, 2026 01:14
Replace stream-based ICloudBuildCacheProvider methods with file-path-based
alternatives that give providers full control over I/O:
- tryGetCacheEntryStreamByIdAsync → tryGetCacheEntryToFileAsync
- trySetCacheEntryStreamAsync → trySetCacheEntryFromFileAsync

Key improvements:
- S3 uploads now hash the tarball on disk before streaming, restoring
  AWS Signature V4 payload signing (removes UNSIGNED_PAYLOAD)
- Azure provider uses SDK-native uploadFile/downloadToFile
- HTTP provider uses FileSystem.createReadStream/createWriteStreamAsync
- Providers that don't need pre-upload computation (HTTP, Azure) don't
  pay the cost of hashing
- Experiment renamed to useDirectFileTransfersForBuildCache

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- tryGetCacheEntryToFileAsync → tryDownloadCacheEntryToFileAsync
- trySetCacheEntryFromFileAsync → tryUploadCacheEntryFromFileAsync

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mock FileSystem.createReadStream, createWriteStreamAsync, and
ensureFolderAsync via jest.spyOn instead of module-level fs mocks.
This is more targeted, less brittle, and consistent with the
production code's use of the FileSystem abstraction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
If tryDownloadCacheEntryToFileAsync throws mid-download, a corrupt
partial file could be left at the target path. On the next build,
tryGetCacheEntryPathByIdAsync would find it and try to untar it.
Delete the file in the catch block to prevent this.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The HTTP and S3 providers both ensure the parent directory exists
before writing the cache entry file. The Azure provider was missing
this, which would cause failures on a fresh machine where the build
cache folder hasn't been created yet.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rush change file: replace "streaming APIs" with file-based transfer
  API names and mention the experiment flag
- node-core-library change file: fix broken backtick formatting
- Azure provider: update JSDoc from "stream" to "file-based"
- HTTP test names: replace "stream consumed" / "stream bodies" with
  file-based language

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The delay variable is in milliseconds but the log message said "s".
This produced misleading output like "Will retry request in 4000s...".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… signing

Remove unused onBlobAlreadyExists parameter from _trySetBlobDataAsync

This callback was previously used to drain incoming streams when the
blob already existed. With the switch to file-based APIs, no callers
pass this parameter anymore.
…ore-library and update consumers to use FileSystem instead of fs directly

Wrap FileSystem stream methods in _wrapException for consistent error handling
- HTTP: add 404 cache miss test for tryDownloadCacheEntryToFileAsync
- HTTP: add pipeline assertion in download success test
- S3: add retry test for downloadObjectToFileAsync on transient 5xx
- S3: add pipeline assertions in download success/miss tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
For stream-body requests, the log read "unknown bytes" which is
awkward. Change to "unknown length" for clarity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@iclanton iclanton changed the title [rush] Add streaming APIs to build cache provider interface and all cache plugins Add file-based transfer APIs to build cache provider interface and all cache plugins Apr 7, 2026
type FileSystemCopyFilesAsyncFilter,
type FileSystemCopyFilesFilter,
type FileSystemReadStream,
type FileSystemWriteStream,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this should be it's own PR.

Comment on lines +557 to +559
"Authorization": "AWS4-HMAC-SHA256 Credential=accessKeyId/20200418/us-east-1/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=194608e9e7ba6d8aa4a019b3b6fd237e6b09ef1f45ff7fa60cbb81c1875538be",
"x-amz-content-sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"x-amz-date": "20200418T123242Z",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks rather sketchy to be here, even if I realize it only talks to localhost

});
});

describe('File-based requests', () => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An approach I've found helpful for this sort of code is to use a localhost server instead of mocking everything.

case WebClientProxy.Fiddler:
// For debugging, disable cert validation
// eslint-disable-next-line
process.env['NODE_TLS_REJECT_UNAUTHORIZED'] = '0';
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should just be able to inject the CA cert? There are a few different ways to have NodeJS pick up the system CAs; environment variable, CLI parameter, or you can manually update the set of accepted CAs when initializing the connection.

body: entryStream,
warningText: 'Could not write cache entry',
// Streaming uploads cannot be retried because the stream is consumed
maxAttempts: 1
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then why is this even an option to the function?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Needs triage

Development

Successfully merging this pull request may close these issues.

4 participants