@uppy/golden-retriever: cache blob-store size to avoid O(n²) getSize scans#6293
@uppy/golden-retriever: cache blob-store size to avoid O(n²) getSize scans#6293qxprakash wants to merge 6 commits into
Conversation
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ed50d93ec8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| * cursor-walks the entire store N times (O(n²) reads overall). | ||
| */ | ||
| async getSize(): Promise<number> { | ||
| if (this.#cachedSize !== null) return this.#cachedSize |
There was a problem hiding this comment.
Invalidate size cache when shared store is mutated elsewhere
Returning #cachedSize unconditionally after the first getSize() call makes the total-size guard stale when another Uppy instance/tab writes to the same IndexedDB namespace (the default storeName is derived from the Uppy ID, which is commonly shared). In that case, this instance never re-reads IndexedDB, so put() can allow growth past maxTotalSize (or reject valid writes after external deletions) because the cached value only tracks local put/delete calls.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
@codex check if we can use this API to detect multiple instances on tab switch and then trigger a re-scan ? https://developer.mozilla.org/en-US/docs/Web/API/Broadcast_Channel_API
There was a problem hiding this comment.
Pull request overview
This PR optimizes @uppy/golden-retriever’s IndexedDB blob storage by avoiding repeated full-store cursor scans when computing total stored blob size, which can cause heavy IndexedDB load during large multi-file batches.
Changes:
- Added an in-memory
#cachedSizeto cache total blob-store size after the firstgetSize()call. - Updated
getSize()to return the cached value when available, populating it once via a cursor walk. - Updated
put()to increment the cached total on successful writes, anddelete()to invalidate the cache.
Comments suppressed due to low confidence (2)
packages/@uppy/golden-retriever/src/IndexedDBStore.ts:206
getSize()unconditionally writesthis.#cachedSize = sizewhen the cursor completes. If multipleput()s (ordelete()) run concurrently, this can race and overwrite a newer cache value:
- two concurrent
getSize()calls both start a scan while#cachedSizeis null; - one
put()completes and increments#cachedSize, but the other scan finishes later and resets#cachedSizeback to the pre-put total; - similarly, a
delete()may set#cachedSize = null, but an in-flight scan can later set it back to a stale value.
To avoid stale totals, consider guarding cache writes with a generation/invalidation token or de-duplicating scans with an “in-flight getSize promise” so only one scan can populate the cache and late completions can’t overwrite newer/invalidation state.
async getSize(): Promise<number> {
if (this.#cachedSize !== null) return this.#cachedSize
const db = await this.#ready
const transaction = db.transaction([STORE_NAME], 'readonly')
const store = transaction.objectStore(STORE_NAME)
const request = store.index('store').openCursor(IDBKeyRange.only(this.name))
return new Promise((resolve, reject) => {
let size = 0
request.onsuccess = (event) => {
const cursor = (event.target as IDBRequest).result
if (cursor) {
size += cursor.value.data.size
cursor.continue()
} else {
this.#cachedSize = size
resolve(size)
packages/@uppy/golden-retriever/src/IndexedDBStore.ts:258
delete()invalidates the cache only after the IndexedDB request resolves. While the delete is in flight, other calls can still read a now-stale#cachedSize(andput()can still increment it) before it gets reset to null.
Consider invalidating the cache at the start of delete() (before awaiting the request) and coordinating that invalidation with any in-flight getSize() scan (e.g. via the same generation token/promise de-dupe) so callers can’t observe a stale total during concurrent operations.
async delete(fileID: UppyFileId): Promise<unknown> {
const db = await this.#ready
const transaction = db.transaction([STORE_NAME], 'readwrite')
const request = transaction.objectStore(STORE_NAME).delete(this.key(fileID))
const result = await waitForRequest(request)
// We don't track the deleted file's size, so reset the cache so the
// next getSize() does a fresh scan (which will be fast after deletions
// since the store has shrunk).
this.#cachedSize = null
return result
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /** | ||
| * In-memory size cache eliminates the O(n) cursor walk inside getSize() | ||
| * on every put(), making sequential puts O(1) instead of O(n²). | ||
| * null means "not yet initialised"; the first getSize() call populates it. | ||
| * Incremented on successful put, reset to null on delete (since we don't | ||
| * track deleted-file sizes — the next getSize() rebuilds from scratch, | ||
| * which is fast because the store shrinks after deletions). | ||
| */ | ||
| #cachedSize: number | null = null | ||
|
|
There was a problem hiding this comment.
check if we can use this API to detect multiple instances on tab switch ? https://developer.mozilla.org/en-US/docs/Web/API/Broadcast_Channel_API
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ed50d93ec8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| * cursor-walks the entire store N times (O(n²) reads overall). | ||
| */ | ||
| async getSize(): Promise<number> { | ||
| if (this.#cachedSize !== null) return this.#cachedSize |
There was a problem hiding this comment.
Invalidate cached size when other tabs mutate IndexedDB
The new #cachedSize fast path is process-local, so it becomes stale as soon as another tab/Uppy instance adds or deletes blobs in the same uppy-blobs store. In that multi-tab scenario, put() can make decisions from an out-of-date total (maxTotalSize check at put) and either allow writes that exceed the quota or reject writes after space was freed, which is a behavior regression from the previous always-scan implementation. This cache needs a cross-context invalidation signal (e.g., BroadcastChannel/storage event) or a periodic/visibility-triggered rescan before trusting the cached value.
Useful? React with 👍 / 👎.
This PR doesn't fix any existing open issue, but I've encountered this bug multiple times where
IndexedDBcrashes when trying to add multiple batches of files to Uppy's state.The bug is not consistently reproducible, but I believe this change is an improvement over
mainand will be required for my fix in #6280.AI Disclaimer: AI used
Summary ( AI generated )
IndexedDBStore.getSize()cursor-walks the entireuppy-blobsstore on every call, andput()calls it once per file. Storing a batch of N files therefore runs N full-store scans — O(n²) cursor reads overall.This PR adds an in-memory
#cachedSizecounter with the following behaviour:getSize()call populates it (one scan).put()increments it — but only while the cache is still live, so a concurrentdelete()invalidation wins instead of being silently overwritten.delete()resets it tonull, so the nextgetSize()performs a fresh scan of the (now smaller) store.After the first call,
getSize()is O(1) and a batch of N puts is O(n). There is no behaviour change —getSize()returns the same value and themaxTotalSizeguard input()is unaffected.