Skip to content

@uppy/golden-retriever: cache blob-store size to avoid O(n²) getSize scans#6293

Draft
qxprakash wants to merge 6 commits into
transloadit:mainfrom
qxprakash:idb-getsize-cache
Draft

@uppy/golden-retriever: cache blob-store size to avoid O(n²) getSize scans#6293
qxprakash wants to merge 6 commits into
transloadit:mainfrom
qxprakash:idb-getsize-cache

Conversation

@qxprakash
Copy link
Copy Markdown
Collaborator

This PR doesn't fix any existing open issue, but I've encountered this bug multiple times where IndexedDB crashes when trying to add multiple batches of files to Uppy's state.

  InvalidStateError: Failed to execute 'transaction' on 'IDBDatabase':  The database connection is closing.
      at IndexedDBStore.getSize
      at async IndexedDBStore.put
      at async Promise.all
      at async #addBlobToStores
     (golden-retriever/index.js)
     

The bug is not consistently reproducible, but I believe this change is an improvement over main and will be required for my fix in #6280.

AI Disclaimer: AI used

Summary ( AI generated )

IndexedDBStore.getSize() cursor-walks the entire uppy-blobs store on every call, and put() calls it once per file. Storing a batch of N files therefore runs N full-store scans — O(n²) cursor reads overall.

This PR adds an in-memory #cachedSize counter with the following behaviour:

  • The first getSize() call populates it (one scan).
  • put() increments it — but only while the cache is still live, so a concurrent delete() invalidation wins instead of being silently overwritten.
  • delete() resets it to null, so the next getSize() performs a fresh scan of the (now smaller) store.

After the first call, getSize() is O(1) and a batch of N puts is O(n). There is no behaviour change — getSize() returns the same value and the maxTotalSize guard in put() is unaffected.

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 14, 2026

⚠️ No Changeset found

Latest commit: ed50d93

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@qxprakash qxprakash marked this pull request as draft May 14, 2026 13:00
@qxprakash qxprakash requested a review from Copilot May 14, 2026 13:00
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ed50d93ec8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

* cursor-walks the entire store N times (O(n²) reads overall).
*/
async getSize(): Promise<number> {
if (this.#cachedSize !== null) return this.#cachedSize
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Invalidate size cache when shared store is mutated elsewhere

Returning #cachedSize unconditionally after the first getSize() call makes the total-size guard stale when another Uppy instance/tab writes to the same IndexedDB namespace (the default storeName is derived from the Uppy ID, which is commonly shared). In that case, this instance never re-reads IndexedDB, so put() can allow growth past maxTotalSize (or reject valid writes after external deletions) because the cached value only tracks local put/delete calls.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codex check if we can use this API to detect multiple instances on tab switch and then trigger a re-scan ? https://developer.mozilla.org/en-US/docs/Web/API/Broadcast_Channel_API

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes @uppy/golden-retriever’s IndexedDB blob storage by avoiding repeated full-store cursor scans when computing total stored blob size, which can cause heavy IndexedDB load during large multi-file batches.

Changes:

  • Added an in-memory #cachedSize to cache total blob-store size after the first getSize() call.
  • Updated getSize() to return the cached value when available, populating it once via a cursor walk.
  • Updated put() to increment the cached total on successful writes, and delete() to invalidate the cache.
Comments suppressed due to low confidence (2)

packages/@uppy/golden-retriever/src/IndexedDBStore.ts:206

  • getSize() unconditionally writes this.#cachedSize = size when the cursor completes. If multiple put()s (or delete()) run concurrently, this can race and overwrite a newer cache value:
  • two concurrent getSize() calls both start a scan while #cachedSize is null;
  • one put() completes and increments #cachedSize, but the other scan finishes later and resets #cachedSize back to the pre-put total;
  • similarly, a delete() may set #cachedSize = null, but an in-flight scan can later set it back to a stale value.

To avoid stale totals, consider guarding cache writes with a generation/invalidation token or de-duplicating scans with an “in-flight getSize promise” so only one scan can populate the cache and late completions can’t overwrite newer/invalidation state.

  async getSize(): Promise<number> {
    if (this.#cachedSize !== null) return this.#cachedSize
    const db = await this.#ready
    const transaction = db.transaction([STORE_NAME], 'readonly')
    const store = transaction.objectStore(STORE_NAME)
    const request = store.index('store').openCursor(IDBKeyRange.only(this.name))
    return new Promise((resolve, reject) => {
      let size = 0
      request.onsuccess = (event) => {
        const cursor = (event.target as IDBRequest).result
        if (cursor) {
          size += cursor.value.data.size
          cursor.continue()
        } else {
          this.#cachedSize = size
          resolve(size)

packages/@uppy/golden-retriever/src/IndexedDBStore.ts:258

  • delete() invalidates the cache only after the IndexedDB request resolves. While the delete is in flight, other calls can still read a now-stale #cachedSize (and put() can still increment it) before it gets reset to null.

Consider invalidating the cache at the start of delete() (before awaiting the request) and coordinating that invalidation with any in-flight getSize() scan (e.g. via the same generation token/promise de-dupe) so callers can’t observe a stale total during concurrent operations.

  async delete(fileID: UppyFileId): Promise<unknown> {
    const db = await this.#ready
    const transaction = db.transaction([STORE_NAME], 'readwrite')
    const request = transaction.objectStore(STORE_NAME).delete(this.key(fileID))
    const result = await waitForRequest(request)
    // We don't track the deleted file's size, so reset the cache so the
    // next getSize() does a fresh scan (which will be fast after deletions
    // since the store has shrunk).
    this.#cachedSize = null
    return result

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +103 to +112
/**
* In-memory size cache eliminates the O(n) cursor walk inside getSize()
* on every put(), making sequential puts O(1) instead of O(n²).
* null means "not yet initialised"; the first getSize() call populates it.
* Incremented on successful put, reset to null on delete (since we don't
* track deleted-file sizes — the next getSize() rebuilds from scratch,
* which is fast because the store shrinks after deletions).
*/
#cachedSize: number | null = null

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check if we can use this API to detect multiple instances on tab switch ? https://developer.mozilla.org/en-US/docs/Web/API/Broadcast_Channel_API

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ed50d93ec8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

* cursor-walks the entire store N times (O(n²) reads overall).
*/
async getSize(): Promise<number> {
if (this.#cachedSize !== null) return this.#cachedSize
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Invalidate cached size when other tabs mutate IndexedDB

The new #cachedSize fast path is process-local, so it becomes stale as soon as another tab/Uppy instance adds or deletes blobs in the same uppy-blobs store. In that multi-tab scenario, put() can make decisions from an out-of-date total (maxTotalSize check at put) and either allow writes that exceed the quota or reject writes after space was freed, which is a behavior regression from the previous always-scan implementation. This cache needs a cross-context invalidation signal (e.g., BroadcastChannel/storage event) or a periodic/visibility-triggered rescan before trusting the cached value.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants