fix: add IDB backing store corruption healing mechanism #780

Open
leshniak wants to merge 3 commits into Expensify:main from callstack-internal:fix/idb-corruption-detect-and-heal

Conversation

leshniak (Contributor) commented Apr 28, 2026

Details

Adds an IDB healing mechanism for Chromium's "UnknownError: Internal error opening backing store for indexedDB.open." error, which accounts for 884K errors/month, or 26.3% of all storage errors (investigation, solution design).

This is a Dexie-style heal pattern (PR1398_maxLoop) inside createStore.ts — the IDB connection manager for IDBKeyValProvider.

What it does:

  • isBackingStoreError() — detects the Chromium-specific corruption error (DOMException with name === 'UnknownError' and message containing 'Internal error opening backing store')
  • Shared healAttemptsRemaining counter (initialized to 3, reset on every successful IDB operation)
  • On backing store error + budget > 0: decrement counter, drop cached dbp, retry executeTransaction once (forces a fresh indexedDB.open())
  • Guard against stale rejection handlers clearing a newer dbp (capture reference before attaching, only clear if unchanged)
  • Clear dbp on rejection in getDB() and verifyStoreExists (fixes pre-existing bug where a cached rejected promise caused infinite failures)
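
The detection and retry steps listed above could be sketched roughly as follows. This is a minimal illustration, not the code in createStore.ts: only isBackingStoreError and healAttemptsRemaining are names taken from this description, while MAX_HEAL_ATTEMPTS, runWithHeal, and dropCachedConnection are hypothetical. The sketch also uses a structural check on name and message instead of a DOMException instance check, so that it can run outside a browser.

```typescript
// Sketch only. MAX_HEAL_ATTEMPTS, runWithHeal and dropCachedConnection are
// hypothetical names; the budget of 3 comes from the PR description.
const MAX_HEAL_ATTEMPTS = 3;
let healAttemptsRemaining = MAX_HEAL_ATTEMPTS;

// Assumption: a structural check stands in for the DOMException check the PR
// describes, so the sketch does not depend on browser globals.
function isBackingStoreError(error: unknown): boolean {
    if (typeof error !== 'object' || error === null) {
        return false;
    }
    const {name, message} = error as {name?: unknown; message?: unknown};
    return name === 'UnknownError' && typeof message === 'string' && message.includes('Internal error opening backing store');
}

// Runs `operation`. On a backing store error with budget left, it spends one
// heal attempt, drops the cached connection and retries exactly once.
async function runWithHeal<T>(operation: () => Promise<T>, dropCachedConnection: () => void): Promise<T> {
    try {
        const result = await operation();
        healAttemptsRemaining = MAX_HEAL_ATTEMPTS; // reset on every success
        return result;
    } catch (error) {
        if (!isBackingStoreError(error) || healAttemptsRemaining <= 0) {
            throw error; // other errors and an exhausted budget bypass the heal path
        }
        healAttemptsRemaining -= 1;
        dropCachedConnection(); // forces the retry to open a fresh connection
        const result = await operation(); // a second failure propagates to the caller
        healAttemptsRemaining = MAX_HEAL_ATTEMPTS;
        return result;
    }
}
```

Because the counter is shared and only reset on success, a permanently corrupt store stops triggering retries after three failed heals, while a transient failure restores the full budget as soon as one operation succeeds.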

What it does NOT do (by design per #90636):

  • No deleteDatabase(): proven to also fail when LevelDB files are corrupt
  • No MemoryOnlyProvider degradation: the cache already absorbs all writes during the session
  • No user-visible UI: the session is served correctly from cache
  • No changes to storage/index.ts or OnyxUtils.ts: those are separate issues (#90632, #90633)

Related Issues

Expensify/App#90636
Expensify/App#87862

Automated Tests

5 new tests in tests/unit/storage/providers/createStoreTest.ts:

  1. Mid-session heal: db.transaction() throws a backing store error once; the store heals by dropping the cached connection and reopening
  2. Init-time heal: indexedDB.open() rejects twice and the third attempt succeeds, across two store() calls
  3. Budget exhaustion: 3 consecutive permanent failures drain the budget to 0; the 4th call skips the heal entirely
  4. Budget reset: drain to 1, succeed (resets to 3), then 3 more heals all work (proving the reset occurred)
  5. Error classification: UnknownError with the wrong message and QuotaExceededError both bypass the heal path

All 440 tests pass.

Manual Tests

  1. Verify npm run typecheck passes
  2. Verify npm run lint passes
  3. Verify npm test passes (440/440)
  4. Integrate with Expensify/App and verify storage operations still work correctly on web
  5. Post-deploy: monitor VictoriaLogs for IDB heal log lines — fraction of users that emit them and then continue without further storage errors indicates heal success rate

Author Checklist

  • I linked the correct issue in the ### Related Issues section above
  • I wrote clear testing steps that cover the changes made in this PR
    • I added steps for local testing in the Tests section
    • I tested this PR with a High Traffic account against the staging or production API to ensure there are no regressions (e.g. long loading states that impact usability).
  • I included screenshots or videos for tests on all platforms
  • I ran the tests on all platforms & verified they passed on:
    • Android / native
    • Android / Chrome
    • iOS / native
    • iOS / Safari
    • MacOS / Chrome / Safari
  • I verified there are no console errors (if there's a console error not related to the PR, report it or open an issue for it to be fixed)
  • I followed proper code patterns (see Reviewing the code)
    • I verified that any callback methods that were added or modified are named for what the method does and never what callback they handle (i.e. toggleReport and not onIconClick)
    • I verified that the left part of a conditional rendering a React component is a boolean and NOT a string, e.g. myBool && <MyComponent />.
    • I verified that comments were added to code that is not self explanatory
    • I verified that any new or modified comments were clear, correct English, and explained "why" the code was doing something instead of only explaining "what" the code was doing.
    • I verified proper file naming conventions were followed for any new files or renamed files. All non-platform specific files are named after what they export and are not named "index.js". All platform-specific files are named for the platform the code supports as outlined in the README.
    • I verified the JSDocs style guidelines (in STYLE.md) were followed
  • If a new code pattern is added I verified it was agreed to be used by multiple Expensify engineers
  • I followed the guidelines as stated in the Review Guidelines
  • I tested other components that can be impacted by my changes (i.e. if the PR modifies a shared library or component like Avatar, I verified the components using Avatar are working as expected)
  • I verified all code is DRY (the PR doesn't include any logic written more than once, with the exception of tests)
  • I verified any variables that can be defined as constants (ie. in CONST.js or at the top of the file that uses the constant) are defined as such
  • I verified that if a function's arguments changed that all usages have also been updated correctly
  • If a new component is created I verified that:
    • A similar component doesn't exist in the codebase
    • All props are defined accurately and each prop has a /** comment above it */
    • The file is named correctly
    • The component has a clear name that is non-ambiguous and the purpose of the component can be inferred from the name alone
    • The only data being stored in the state is data necessary for rendering and nothing else
    • If we are not using the full Onyx data that we loaded, I've added the proper selector in order to ensure the component only re-renders when the data it is using changes
    • For Class Components, any internal methods passed to components event handlers are bound to this properly so there are no scoping issues (i.e. for onClick={this.submit} the method this.submit should be bound to this in the constructor)
    • Any internal methods bound to this are necessary to be bound (i.e. avoid this.submit = this.submit.bind(this); if this.submit is never passed to a component event handler like onClick)
    • All JSX used for rendering exists in the render method
    • The component has the minimum amount of code necessary for its purpose, and it is broken down into smaller components in order to separate concerns and functions
  • If any new file was added I verified that:
    • The file has a description of what it does and/or why is needed at the top of the file if the code is not self explanatory
  • If the PR modifies a generic component, I tested and verified that those changes do not break usages of that component in the rest of the App (i.e. if a shared library or component like Avatar is modified, I verified that Avatar is working as expected in all cases)
  • If the main branch was merged into this PR after a review, I tested again and verified the outcome was still expected according to the Test steps.
  • I have checked off every checkbox in the PR author checklist, including those that don't apply to this PR.

Screenshots/Videos

Android: Native

N/A — library-level change, IDB is web-only. No UI, no native code touched.

Android: mWeb Chrome

N/A — library-level change, IDB is web-only. No UI, no native code touched.

iOS: Native

N/A — library-level change, IDB is web-only. No UI, no native code touched.

iOS: mWeb Safari

N/A — library-level change, IDB is web-only. No UI, no native code touched.

MacOS: Chrome / Safari

N/A — library-level change, no UI. Verified via 440/440 unit tests passing.

Adds a Dexie-style heal pattern to createStore for Chromium's
Internal error opening backing store error (884K errors/month).

- isBackingStoreError() detects the Chromium-specific corruption
- Shared healAttemptsRemaining counter (3, reset on success)
- On backing store error: clear cached connection, retry once
- Clear dbp on rejection so retries get fresh indexedDB.open()
- 5 new tests: mid-session heal, init heal, budget exhaustion,
  budget reset, error classification

No deleteDatabase(), no provider swap, no UI changes.
Scoped to IDBKeyValProvider only -- SQLite provider untouched.

Ref: Expensify/App#90636

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
leshniak force-pushed the fix/idb-corruption-detect-and-heal branch from bd7a14f to 32a6cc8 May 15, 2026 08:59
leshniak and others added 2 commits May 15, 2026 13:02
- Capture dbp reference before attaching reject handler; only clear if
  dbp hasn't been replaced by a concurrent heal/retry (prevents stale
  rejection handler from clearing a newer promise)
- Add comment documenting concurrent store() budget drain behavior
- Fix test formatting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
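
The stale-handler guard described in this commit might look roughly like the sketch below. It assumes a cached promise named dbp, as in the PR; createCachedOpener, getDB's exact shape, dropCached, and the generic open callback are hypothetical stand-ins for the real connection code, which would call indexedDB.open().

```typescript
// Sketch only. createCachedOpener, dropCached and `open` are hypothetical;
// `dbp` mirrors the cached connection promise named in the PR.
function createCachedOpener<T>(open: () => Promise<T>) {
    let dbp: Promise<T> | undefined;

    function getDB(): Promise<T> {
        if (dbp) {
            return dbp;
        }
        // Capture the reference before attaching the rejection handler.
        const current = open();
        dbp = current;
        current.catch(() => {
            // Clear the cache only if it still holds this exact promise. If a
            // heal or retry has already replaced it, leave the newer one alone.
            if (dbp === current) {
                dbp = undefined;
            }
        });
        return current;
    }

    // Called by the heal path to force the next getDB() to reopen.
    function dropCached(): void {
        dbp = undefined;
    }

    return {getDB, dropCached};
}
```

Without the identity check, a slow rejection from an abandoned open attempt could wipe out a healthy connection promise created in the meantime, which would trigger an unnecessary reopen.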
…ose)

The heal path clears the cached dbp and reopens via indexedDB.open(),
but does not call db.close() on the old IDBDatabase. Updated comments
and log messages from 'close + reopen' to 'drop cached connection and
reopen' to match what the code actually does.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
leshniak changed the title from "fix: detect IDB backing store corruption and heal or degrade gracefully" to "fix: add IDB backing store corruption healing mechanism" May 15, 2026
leshniak marked this pull request as ready for review May 15, 2026 13:32
leshniak requested a review from a team as a code owner May 15, 2026 13:32
melvin-bot (Bot) requested review from Beamanator and removed request for a team May 15, 2026 13:33