Skip to content

src, scripts: use kv to cache directories#756

Open
flakey5 wants to merge 2 commits intomainfrom
flakey5/159
Open

src, scripts: use kv to cache directories#756
flakey5 wants to merge 2 commits intomainfrom
flakey5/159

Conversation

@flakey5
Copy link
Copy Markdown
Member

@flakey5 flakey5 commented Nov 15, 2025

Makes use of KV for directory listings. This aims to replace the build-r2-symlinks.mjs file with two different ones:

  • build-directory-cache.mjs - This reads the entire dist-prod bucket and caches it into the KV namespace. This is pretty much only for the initial setup of the cache and for anytime we need to rebuild all of it.
  • update-directory-cache.mjs - This reads only the necessary paths from the dist-prod bucket and updates them in the KV namespace. This will be the one that we will call in the workflow

A USE_KV environment variable is added to the worker that acts as a toggle between KV and S3 listings. This acts as a safeguard in case something goes wrong while we're still testing it, so we can easily switch between the two. Once we're more confident in KV, we can remove it + the entire S3Provider from the worker.

Closes #159
Closes #314

TODO:

  • Pass in version argument in Node core when invoking the update redirect links action
  • Create KV namespace and update the wrangler config

@flakey5 flakey5 force-pushed the flakey5/159 branch 2 times, most recently from 0ade6ef to 7fb3816 Compare January 6, 2026 04:00
@flakey5 flakey5 marked this pull request as ready for review January 6, 2026 04:28
@flakey5 flakey5 requested a review from a team as a code owner January 6, 2026 04:28
# Triggered by https://github.com/nodejs/node/blob/main/.github/workflows/update-release-links.yml
workflow_dispatch:
inputs:
version:
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking till node core pr passing this input in gets made + merged

@ovflowd
Copy link
Copy Markdown
Member

ovflowd commented Jan 6, 2026

cc @flakey5 some of the CI steps are failing

Copy link
Copy Markdown
Member

@ovflowd ovflowd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't finish the review yet, but leaving initial comments here :)

@flakey5 flakey5 requested a review from a team February 7, 2026 19:42
export async function listR2DirectoryRecursive(
client,
bucket,
directory = undefined,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
directory = undefined,
directory,

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That parameter should be optional?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does explicitly saying it defaults to undefined any different than leaving it blank?

flakey5 added 2 commits April 7, 2026 09:50
Signed-off-by: flakey5 <73616808+flakey5@users.noreply.github.com>
Signed-off-by: flakey5 <73616808+flakey5@users.noreply.github.com>
@cursor
Copy link
Copy Markdown

cursor bot commented Apr 9, 2026

PR Summary

Medium Risk
Directory listing behavior now has an alternate KV-backed source and new scripts/workflow update production KV data; mistakes could cause incorrect listings/redirects or increased errors, though a USE_KV toggle and S3 fallback reduce blast radius.

Overview
Adds KV as a new backing store for directory listings. The worker gains a DIRECTORY_CACHE binding plus USE_KV toggle, a new KvProvider, and R2Provider.readDirectory now prefers KV when enabled while also asynchronously comparing KV results against the existing cached/S3 results and reporting mismatches to Sentry.

Replaces the old release-link generation script with KV cache maintenance tooling. Introduces shared lib/ utilities (listR2Directory, retry/limits) and new scripts to either rebuild the full directory cache (build-directory-cache.mjs) or incrementally update it per release (update-directory-cache.mjs), including logic to materialize symlink-like entries and update latestVersions.json/cached directory metadata.

Plumbs KV through CI and local config. The update-links GitHub Action now requires a version input and runs update-directory-cache.mjs using Cloudflare/KV credentials; wrangler.jsonc and Vitest miniflare config add the KV namespace, and new e2e/unit tests cover KV directory routing and the new listing/symlink utilities.

Reviewed by Cursor Bugbot for commit e15aa67. Bugbot is set up for automated code reviews on this repo. Configure here.

@@ -0,0 +1,68 @@
import { S3Client } from '@aws-sdk/client-s3';
@@ -0,0 +1,245 @@
import { readFile, writeFile } from 'node:fs/promises';
import { basename, dirname } from 'node:path';
import { S3Client } from '@aws-sdk/client-s3';
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Fix All in Cursor

Bugbot Autofix prepared fixes for all 3 issues found in the latest run.

  • ✅ Fixed: Version argument crashes before null check executes
    • I now validate the raw version argument before calling toLowerCase() so missing input throws the intended explicit error.
  • ✅ Fixed: Subdirectories cached without trailing slashes in e2e tests
    • The e2e cache population now appends / to each subdirectory name so test data matches production directory-cache shape.
  • ✅ Fixed: latest key incorrectly added as directory symlink
    • I restricted added directory symlinks to keys starting with latest-, which excludes both latest and node-latest.tar.gz.

Create PR

Or push these changes by commenting:

@cursor push 29199a84ef
Preview (29199a84ef)
diff --git a/e2e-tests/util.ts b/e2e-tests/util.ts
--- a/e2e-tests/util.ts
+++ b/e2e-tests/util.ts
@@ -47,7 +47,9 @@
   });
 
   const cachedDirectory: ReadDirectoryResult = {
-    subdirectories: Object.keys(directory.subdirectories),
+    subdirectories: Object.keys(directory.subdirectories).map(
+      subdirectory => `${subdirectory}/`
+    ),
     files,
     hasIndexHtmlFile,
     lastModified: new Date(),

diff --git a/scripts/update-directory-cache.mjs b/scripts/update-directory-cache.mjs
--- a/scripts/update-directory-cache.mjs
+++ b/scripts/update-directory-cache.mjs
@@ -20,12 +20,13 @@
 import { createCloudflareClient, writeKeysToKv } from './utils/kv.mjs';
 import { createS3Client, listR2DirectoryRecursive } from './utils/r2.mjs';
 
-const VERSION = process.argv[2].toLowerCase();
-
-if (!VERSION) {
+const VERSION_ARG = process.argv[2];
+if (!VERSION_ARG) {
   throw new TypeError('version missing from args');
 }
 
+const VERSION = VERSION_ARG.toLowerCase();
+
 if (!VERSION.startsWith('v')) {
   throw new TypeError(
     'provided version not in correct format, expected vX.X.X'

diff --git a/scripts/utils/addSymlinksToDirectoryCache.mjs b/scripts/utils/addSymlinksToDirectoryCache.mjs
--- a/scripts/utils/addSymlinksToDirectoryCache.mjs
+++ b/scripts/utils/addSymlinksToDirectoryCache.mjs
@@ -55,8 +55,8 @@
 ) {
   releaseDirectory.subdirectories.push(
     ...Object.keys(latestVersionMap)
-      // We only want directories, remove the only file in the map
-      .filter(version => version !== NODE_LATEST_FILE_NAME)
+      // We only want the `latest-*` directory symlinks.
+      .filter(version => version.startsWith('latest-'))
       .map(version => `${version}/`)
   );
   releaseDirectory.subdirectories.sort();

diff --git a/scripts/utils/addSymlinksToDirectoryCache.test.ts b/scripts/utils/addSymlinksToDirectoryCache.test.ts
--- a/scripts/utils/addSymlinksToDirectoryCache.test.ts
+++ b/scripts/utils/addSymlinksToDirectoryCache.test.ts
@@ -97,6 +97,7 @@
   const latestVersionMap: LatestVersionMapping = {
     'latest-v20.x': 'v20.1.2',
     'latest-v22.x': 'v22.2.1',
+    latest: 'v22.2.1',
     'node-latest.tar.gz': 'latest/node-v22.2.1',
   };

This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.

Reviewed by Cursor Bugbot for commit e15aa67. Configure here.


if (!VERSION) {
throw new TypeError('version missing from args');
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Version argument crashes before null check executes

High Severity

process.argv[2] can be undefined when no argument is provided, and calling .toLowerCase() on it throws an unhandled TypeError with a cryptic message before the intended validation on the next line can produce the helpful 'version missing from args' error.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit e15aa67. Configure here.

});

const cachedDirectory: ReadDirectoryResult = {
subdirectories: Object.keys(directory.subdirectories),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subdirectories cached without trailing slashes in e2e tests

Medium Severity

populateDirectoryCache sets subdirectories to Object.keys(directory.subdirectories) which yields names like v20.0.0 without trailing slashes. The production cache scripts (listR2Directory) always produce subdirectory names with trailing / (from S3 CommonPrefixes). This mismatch means e2e tests validate against incorrectly shaped data, potentially hiding real bugs.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit e15aa67. Configure here.

// We only want directories, remove the only file in the map
.filter(version => version !== NODE_LATEST_FILE_NAME)
.map(version => `${version}/`)
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

latest key incorrectly added as directory symlink

Medium Severity

addLatestDirectorySymlinksToCache filters only node-latest.tar.gz from the version map, but the map also contains a latest key (mapping to e.g. v25.9.0). This causes a spurious latest/ entry to be added to the release directory's subdirectories, which isn't a real directory or symlink.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit e15aa67. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Serve /docs from Worker Using KV for listing directories

4 participants