Skip to content

Add streaming APIs to build cache provider interface and all cache plugins#5746

Draft
Copilot wants to merge 26 commits intomainfrom
copilot/stream-cache-entry-for-http-plugin
Draft

Add streaming APIs to build cache provider interface and all cache plugins#5746
Copilot wants to merge 26 commits intomainfrom
copilot/stream-cache-entry-for-http-plugin

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 5, 2026

  • Address review comments: fix response.resume race, stream retry bug, error logging, stream cleanup, and nits
  • Add unit tests for streaming cache functionality
    • HttpBuildCacheProvider tests: Add streaming GET/SET tests, credential fallback skip for streams, write-not-allowed check, retry behavior (maxAttempts=1 for streams) — expanded from 2 to 13 tests
    • AmazonS3Client tests: Add streaming GET/SET tests (getObjectStreamAsync, uploadObjectStreamAsync), UNSIGNED-PAYLOAD signing, no-retry for stream uploads — expanded from 32 to 37 tests (50 total with credentials and provider tests)
    • HttpBuildCacheProvider tests: Fill gaps in buffer-based tests: successful GET returning buffer, trySetCacheEntryBufferAsync (success, write-not-allowed, retry on server error)
    • Build and run tests to verify

Copilot AI and others added 4 commits April 5, 2026 03:35
…P plugin

Add optional tryGetCacheEntryStreamByIdAsync and trySetCacheEntryStreamAsync
methods to ICloudBuildCacheProvider. Implement streaming in HttpBuildCacheProvider
and update OperationBuildCache to prefer streaming when available. Add
fetchStreamAsync to WebClient and stream write support to
FileSystemBuildCacheProvider.

Agent-Logs-Url: https://github.com/microsoft/rushstack/sessions/050e10a7-3cad-4da4-93e5-9941453283b9

Co-authored-by: iclanton <5010588+iclanton@users.noreply.github.com>
@iclanton
Copy link
Copy Markdown
Member

iclanton commented Apr 5, 2026

@copilot - add APIs to FileSystem in node-core-library for the fs APIs you're making use of in this PR. Ideally, none of our projects (except for FileSystem itself) should use fs. Everything should use FileSystem. The change to node-core-library will also need a changefile. Take a look at the conventions in FileSystem and follow that pattern.

…rary and update consumers to use FileSystem instead of fs directly

Agent-Logs-Url: https://github.com/microsoft/rushstack/sessions/5ad20d4d-c9a4-4855-bb13-8dd9e2c1350b

Co-authored-by: iclanton <5010588+iclanton@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 5, 2026

@copilot - add APIs to FileSystem in node-core-library for the fs APIs you're making use of in this PR. Ideally, none of our projects (except for FileSystem itself) should use fs. Everything...

Added createReadStream() and createWriteStream() static methods to FileSystem in node-core-library, along with FileSystemReadStream and FileSystemWriteStream type aliases (following the same pattern as FileSystemStats and FolderItem). Updated FileSystemBuildCacheProvider and OperationBuildCache to use the new FileSystem APIs instead of importing from node:fs directly. Change file for @rushstack/node-core-library included as a minor bump. Commit: 468caf7

Copilot AI requested a review from iclanton April 5, 2026 04:32
@iclanton
Copy link
Copy Markdown
Member

iclanton commented Apr 5, 2026

@copilot - are there async variants of those fs APIs? Should we expose and use those too?

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 5, 2026

@copilot - are there async variants of those fs APIs? Should we expose and use those too?

No, fs.createReadStream() and fs.createWriteStream() don't have async variants — they're already inherently asynchronous. They return stream objects immediately and all the actual I/O happens asynchronously through the stream events/pipeline. The fs/promises module doesn't include them for this reason.

@iclanton
Copy link
Copy Markdown
Member

iclanton commented Apr 5, 2026

@copilot - take a look at the other FileSystem APIs. Many have an optional options parameter with useful utilities. For example, the write operations have an option to create the parent folder if it doesn't exist. Consider adding useful options to these new APIs

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 5, 2026

@copilot - take a look at the other FileSystem APIs. Many have an optional options parameter with useful utilities. For example, the write operations have an option to create the parent folder if it...

Added IFileSystemCreateWriteStreamOptions with ensureFolderExists to createWriteStream(), following the same pattern as IFileSystemWriteBinaryFileOptions and other write APIs. Updated FileSystemBuildCacheProvider to use the new option instead of manual ensureFolderAsync(). For createReadStream, there aren't useful options from the existing FileSystem conventions (read methods use encoding/convertLineEndings which don't apply to raw streams). Commit: 91161d8

@iclanton
Copy link
Copy Markdown
Member

iclanton commented Apr 5, 2026

@copilot - can we add stream support to the other cache plugins?

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 5, 2026

@copilot - can we add stream support to the other cache plugins?

Added streaming support to both remaining cache plugins:

Amazon S3 plugin (733ef5c):

  • Refactored AmazonS3Client._makeRequestAsync to extract signing into _buildSignedRequest
  • Added _makeStreamRequestAsync using WebClient.fetchStreamAsync() with UNSIGNED-PAYLOAD for AWS Sig V4 signing of streaming uploads
  • Added getObjectStreamAsync() (with retry support) and uploadObjectStreamAsync() (single attempt, since consumed streams can't be replayed)
  • Added tryGetCacheEntryStreamByIdAsync and trySetCacheEntryStreamAsync to AmazonS3BuildCacheProvider

Azure Storage plugin (733ef5c):

  • Added tryGetCacheEntryStreamByIdAsync using blobClient.download() with readableStreamBody
  • Added trySetCacheEntryStreamAsync using blockBlobClient.uploadStream()
  • Added _logBlobError helper for consistent error handling across buffer and stream methods

All three cache plugins (HTTP, S3, Azure) now implement both streaming methods. All existing tests pass (45 S3, 10 Azure).

Copilot AI changed the title Add streaming APIs to build cache provider interface and HTTP plugin Add streaming APIs to build cache provider interface and all cache plugins Apr 5, 2026
@iclanton
Copy link
Copy Markdown
Member

iclanton commented Apr 6, 2026

@copilot - WebClient handles compression in the buffer case. Is that a concern in the stream case? Is there a way we can handle that? Is that a concern for the Azure plugin?

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 6, 2026

@copilot - WebClient handles compression in the buffer case. Is that a concern in the stream case? Is there a way we can handle that? Is that a concern for the Azure plugin?

Yes, this was a gap. The buffer path in getBufferAsync() decompresses the response based on Content-Encoding (deflate, gzip, brotli), but the stream path was returning the raw IncomingMessage without decompression.

Fixed in f92d579makeStreamRequestAsync now pipes the response stream through the appropriate zlib decompression transform (createInflate/createGunzip/createBrotliDecompress) when Content-Encoding is present and noDecode is not set, using the same lazy await import('node:zlib') pattern as the buffer path.

This covers the HTTP plugin and S3 plugin since both use WebClient. The Azure plugin is not affected — it uses the Azure SDK directly (blobClient.download() / blockBlobClient.uploadStream()), which handles Content-Encoding internally.

Copy link
Copy Markdown
Member

@iclanton iclanton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the streaming build cache work — the interface design with optional streaming methods and graceful fallback is clean. A few issues I spotted during review, two of which look like functional bugs.

@@ -264,7 +404,7 @@ export class HttpBuildCacheProvider implements ICloudBuildCacheProvider {
// required. Re-attempt the request, requiring credentials this time.
Copy link
Copy Markdown
Member

@iclanton iclanton Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot - Bug: streaming upload retry with consumed stream body

The caller (trySetCacheEntryStreamAsync) correctly sets maxAttempts: 1 to prevent the normal retry path (line 419) from re-sending a consumed stream. However, this credential fallback path doesn't check maxAttempts — it always retries. If a streaming upload is initially sent without credentials and gets a 4xx response, this code will re-invoke _makeHttpCoreRequestAsync with CredentialsOptions.Required, but options.body is the already-consumed stream from the first attempt. The second request would send an empty/finished stream body.

Suggestion: skip this credential fallback when the body is a stream (e.g. check !!body && typeof (body as Readable).pipe === 'function'), or gate it on maxAttempts > 1. This would mean streaming uploads need to provide credentials upfront, which seems like a reasonable requirement.

} = response;
if (statusCode === 301 || statusCode === 302) {
// Drain the redirect response before following
response.resume();
Copy link
Copy Markdown
Member

@iclanton iclanton Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot - Bug: response.resume() is called before the redirect switch, which drains the response body even when the redirect falls through to handleResponse.

When redirect is undefined (the default — most callers don't set it) or 'manual', and the server responds with 301/302:

  1. response.resume() is called here, starting to drain the response
  2. Neither 'follow' nor 'error' matches in the switch
  3. Execution falls through to handleResponse(response, ...) at line 154
  4. In the buffer path, handleResponse attaches data/end event listeners to read the response body — but resume() already started draining it

This creates a race where response data could be lost between the resume() call and the handler attachment. The resume() call should be moved inside the case 'follow' and case 'error' branches where the response body is intentionally being discarded:

if (statusCode === 301 || statusCode === 302) {
  switch (redirect) {
    case 'follow': {
      response.resume(); // Only drain when we're discarding this response
      if (redirectUrl) {
        requestFnAsync(redirectUrl, options, true).then(resolve).catch(reject);
      } else {
        reject(new Error(`Received status code ${statusCode} with no location header: ${url}`));
      }
      return;
    }
    case 'error':
      response.resume();
      reject(new Error(`Received status code ${statusCode}: ${url}`));
      return;
  }
}

public async trySetCacheEntryBufferAsync(
terminal: ITerminal,
cacheId: string,
entryStream: Buffer
Copy link
Copy Markdown
Member

@iclanton iclanton Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot - Nit: this parameter is named `entryStream` but its type is `Buffer`. Looks like a copy-paste artifact from adding the stream variant. The interface (`ICloudBuildCacheProvider`) names it `entryBuffer` — should match for consistency.

Suggested change
entryStream: Buffer
entryBuffer: Buffer

(The references on lines 92 would need updating too.)

);
updateLocalCacheSuccess = true;
} catch (e) {
updateLocalCacheSuccess = false;
Copy link
Copy Markdown
Member

@iclanton iclanton Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot - The error is silently discarded here (and in the identical catch block at line 209). The warning at line 254 only says "Unable to update the local build cache" without any detail about why it failed. This makes debugging cache failures unnecessarily difficult.

Consider at least logging the error at verbose level:
```typescript
} catch (e) {
terminal.writeVerboseLine(`Failed to update local cache: ${e}`);
updateLocalCacheSuccess = false;
}
```

this._cloudBuildCacheProvider.trySetCacheEntryStreamAsync
) {
// Use streaming upload to avoid loading the entire cache entry into memory
const entryStream: FileSystemReadStream = FileSystem.createReadStream(localCacheEntryPath);
Copy link
Copy Markdown
Member

@iclanton iclanton Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot - Minor: if `trySetCacheEntryStreamAsync` rejects, this `FileSystemReadStream` is never explicitly destroyed, leaking the file handle until GC. Since `setCloudCacheEntryPromise` isn't awaited until line 371, the cleanup is also deferred. Consider wrapping in a try/catch that calls `entryStream.destroy()` on failure, or attaching an error handler.

headers: headers,
body: body,
redirect: 'follow',
timeoutMs: 0 // Use the default timeout
Copy link
Copy Markdown
Member

@iclanton iclanton Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot - Nit: the comment says "Use the default timeout" but passing `0` explicitly overrides the `WebClient._buildRequestOptions` default of 15 seconds and sets the Node.js socket timeout to 0 (no timeout). For streaming transfers of large cache entries, no timeout is probably intentional — but the comment should say that rather than claiming it uses the default.

…ype compatibility

Agent-Logs-Url: https://github.com/microsoft/rushstack/sessions/d8f5e137-3cb0-4ed9-8391-50d7a33d804e

Co-authored-by: iclanton <5010588+iclanton@users.noreply.github.com>
…error logging, stream cleanup, and nits

Agent-Logs-Url: https://github.com/microsoft/rushstack/sessions/d0190bf7-a346-4a71-93bf-d5375d98b552

Co-authored-by: iclanton <5010588+iclanton@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Needs triage

Development

Successfully merging this pull request may close these issues.

3 participants