Skip to content

fix(memory-storage): prevent storage names from escaping the storage directory#3715

Merged
B4nan merged 2 commits into
masterfrom
fix/memory-storage-name-path-traversal
Jun 5, 2026
Merged

fix(memory-storage): prevent storage names from escaping the storage directory#3715
B4nan merged 2 commits into
masterfrom
fix/memory-storage-name-path-traversal

Conversation

@B4nan

@B4nan B4nan commented Jun 4, 2026

Copy link
Copy Markdown
Member

Storage names and key-value-store record keys are used directly as on-disk path components in @crawlee/memory-storage. A value containing .. or an absolute path could therefore resolve outside the intended storage directory and create/write files elsewhere:

  • Storage namesKeyValueStore/Dataset/RequestQueue getOrCreate(name) and update({ name }) resolved path.resolve(baseDir, name), so e.g. KeyValueStore.open('../escaped') escaped the key_value_stores / datasets / request_queues directory.
  • Record keys — the storage client's setRecord({ key }) built the file path as resolve(storeDir, key). The core KeyValueStore.setValue already validates keys against a restricted charset, but the lower-level memory-storage client did not, so a key like ../escaped escaped the store directory.

Adds a shared resolveWithinDirectory(baseDirectory, segment) helper that resolves the candidate path and asserts it stays within the base directory, throwing otherwise. All name- and key-based path construction (resource-client constructors, rename in update, the findOrCache* lookup helpers, and the key-value-store filesystem entry) now routes through it.

Legitimate names/keys are unaffected — including nested segments that stay within the directory; only values that actually escape are rejected. Dataset entity IDs (sequential index) and request IDs (hashed) are internally generated and were already safe.

…directory

Storage names passed to the KeyValueStore, Dataset, and RequestQueue open
APIs were used directly as filesystem path components, so a name containing
`..` or an absolute path could resolve outside the intended storages
directory. Add a shared `resolveStorageDirectory` helper that confines the
resolved directory under its base storage directory, and route all name-based
path construction (create and rename) through it.
@github-actions github-actions Bot added this to the 142nd sprint - Tooling team milestone Jun 4, 2026
@github-actions github-actions Bot added t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics. labels Jun 4, 2026
@B4nan B4nan added the adhoc Ad-hoc unplanned task added during the sprint. label Jun 4, 2026
…irectory

Key-value store record keys are used as on-disk file names, so a key
containing `..` or an absolute path could escape the store directory when
written via the storage client's `setRecord` (the core `KeyValueStore.setValue`
already validates keys, but the lower-level client did not). Route the record
key/file path construction through the same confinement check used for storage
names, and rename the helper to `resolveWithinDirectory` since it now guards
both storage directories and record files.
@B4nan B4nan requested review from barjin and vladfrangu June 4, 2026 15:19

@barjin barjin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thank you @B4nan 👍

@B4nan B4nan merged commit a04c297 into master Jun 5, 2026
9 checks passed
@B4nan B4nan deleted the fix/memory-storage-name-path-traversal branch June 5, 2026 08:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

adhoc Ad-hoc unplanned task added during the sprint. t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants